tweag / HaskellR

The full power of R in Haskell.

Home Page: https://tweag.github.io/HaskellR

Getting values from R into the 'pure' Haskell world (using inline-r in source files)

OscarSouth opened this issue · comments

I've managed to get data into Haskell using the insight gained from reading this issue:
#294

This prints to the terminal fine in the REPL. However, I'm using inline-r in compiled source code, rather than using H or IHaskell. The problem I'm running into is that my incoming data is getting 'stuck in IO', whereas I'd prefer to get that data into Haskell's pure universe for further processing.

The test functions I'm using (with types) are the following to define what data I want to retrieve:

rDef  :: Double -> R s [Double]
rDef n = do
  R.fromSomeSEXP <$> [r| rnorm(n_hs) |]

(I'd actually prefer to use fromSEXP with a type rather than fromSomeSEXP, but I couldn't get that working yet)

Then the following brings that data into Haskell.

rRet  :: Double -> IO [Double]
rRet n = runRegion $ rDef n

(These are pretty much taken from the previously mentioned issue; I just delivered the argument differently and replaced dynSEXP with fromSomeSEXP.)

On a quick tangent, I also tried to generalise the return function:

rRet'  :: (Double -> R s [Double]) -> Double -> IO [Double]
rRet' f n = runRegion $ f n

This failed, however, so I discarded this approach for the time being.

I do understand that I can most likely (certainly!) get this to work in a more optimised way (and I do understand the inner workings of R to a workable level). Right now though I'm still digesting the inline-r documentation available while building up my Haskell fluency (which is 'ok' at this point), so I'm currently prototyping the program I'm working on using relatively small data and approaching problems in the most understandable way (for me at least, at this time).

Could anyone who knows the library more intimately than me share any insight into preferable approaches to the problem I'm exploring?

Thanks,
Oscar

ps.
As an additional 'add on' -- how would I get Int data into Haskell from R using the above functions? I've not had any problem with String or Double, but Int hasn't worked how I'd have expected it to.
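One possible explanation for the Int question, offered as an assumption rather than something confirmed at this point in the thread: R integer vectors hold 32-bit values, and whole-number literals such as 3 are doubles in R unless written as 3L. If so, the Haskell element type to ask for would be Int32 rather than Int, with a widening step afterwards. The sketch below shows only that widening step in plain Haskell; `widen` is a hypothetical helper name and nothing here calls inline-r.

```haskell
import Data.Int (Int32)

-- Hypothetical helper: widen 32-bit values (the width of R integers)
-- to Haskell's native Int for further pure processing.
widen :: [Int32] -> [Int]
widen = map fromIntegral

main :: IO ()
main = print (widen [1, 2, 3])
```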

Hello! Sorry for the late response.
Technically, all calls to R involve foreign library calls, mutable memory operations and handling of object lifetimes, so they must be in IO. However, there is an rsafe quasi-quote that allows you to splice your R operation into pure code, so you can write:

foo = runRegion $ do
   let x = [rsafe| calltoR|]
       y = anotherPureFunction x
   return y

However, that road may lead to failures. I'll try to describe the potential problems tomorrow, so that we can work out whether your problem can be restructured so that the IO nature of R calls is not a problem, or whether we can provide a safe API that would let you keep a non-IO API.

Thanks for the reply!
I'm looking forward to reading your follow up with potential problems.

My present use case is that I've written a small Markov Chain module in Haskell which takes in a dataset and pre-processes it into a set of data structures suitable for interactive use in the rest of the program.

My code to load and transform the tabular dataset into the required shape for the Markov Chain module to ingest is written in R. That data would then be passed into Haskell for processing by the Markov Chain module, and the resulting data structures sent back to R for further analysis/plotting/logging (at this point the program would be in IO).

As my Markov Chain module is all pure code, I'd like to be able to get the R data into Haskell 'cleanly', so that it can be pre-processed purely before entering IO for the interactive aspects of the system.
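A minimal sketch of how a pure module can sit behind an IO boundary, assuming the data is fetched once in IO and then handed to pure functions. `fetchStates` is a hypothetical stand-in for an action such as `runRegion $ rDef n` and returns fixed data so that the sketch runs without R; `transitionCounts` is an illustrative pure function, not the actual Markov Chain module.

```haskell
import qualified Data.Map.Strict as Map

-- Pure core: count transitions between consecutive states.
transitionCounts :: Ord a => [a] -> Map.Map (a, a) Int
transitionCounts xs =
  Map.fromListWith (+) [ (pair, 1) | pair <- zip xs (drop 1 xs) ]

-- Hypothetical stand-in for the R call; fixed data, no R involved.
fetchStates :: IO [Int]
fetchStates = return [1, 2, 1, 2, 3]

main :: IO ()
main = do
  states <- fetchStates                  -- IO boundary: data arrives here
  let counts = transitionCounts states   -- pure from here on
  print (Map.toList counts)
  -- prints [((1,2),2),((2,1),1),((2,3),1)]
```

The pure function never mentions IO in its type; only the thin outer layer that fetches the data does.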

I feel that the example you've provided for rsafe would fit into the flow of my program (unless there are complications that I haven't foreseen). I'm definitely interested to hear your insight into how the nature of R calls can be best handled.

I also read in #294 that there's a new parser API, which could be an optimal solution for passing data between R and Haskell. However, the GitHub link for that API which was provided is a dead link and I've not run into anything else about that API. Has implementation and documentation of that API developed since your reply in the referenced issue?

Thanks again!
Oscar

There are several problems. The main one is that, in order to be safe, you need to guarantee that all data lives on the protection stack. This means that you will use a single Region (due to the NFData constraint in runRegion, you are safe from keeping references to unprotected data), and the side effect is that none of the temporary data that is available to Haskell will be freed. There are a few additional complexities. If your R code modifies the environment, then the order of modifications will not be predictable during "purified" evaluation, and you may get unexpected reordering. If your R code uses C functions or vector modifications, they may update a vector in place if they decide that the vector is not used by other code, and I have not tested whether inline-r marks vectors as used properly. This means that R code may not be referentially transparent, with no way to check that at compile time, and for lazy pure code that may lead to situations that are hard to debug.
But if you are using a clean environment and functions that do not modify the environment or vectors, then [rsafe| ... |] may be fine.

The parser was renamed to matcher; you can find the documentation at:

https://hackage.haskell.org/package/inline-r-0.9.1/docs/Language-R-Matcher.html

It may be missing some helpers, but it was enough for our use case, so feel free to request more helper functions there if needed.

matchOnly :: (MonadR m, NFData a) => Matcher s a -> SomeSEXP s -> m (Either (MatcherError s) a)

It lives in MonadR m, so it can't easily be mixed with pure code. MonadR is used because we are using IO functions and R protection inside, and I'm not sure whether it's possible to make that totally pure. Technically, though, such a parser is morally pure as long as no variable escapes the s scope.

Thanks very much again. I'll be exploring this in depth over the next few days.

Had a chance to explore these suggestions today. Have been working on other (pure) modules for a while.

Using the regular [r| ... |] quasi-quotation, I can represent a (toy example) call to R like this:

rep  :: Double -> Double -> IO [Double]
rep x n = 
  let rData x n = R.fromSomeSEXP <$> [r| rep(x_hs, n_hs) |]
   in runRegion $ rData x n

Which I can then call in Haskell IO code as such:

repTest = do
  x <- rep 3 3
  let y = fmap (+3) x
  return y

Which (correctly!) returns:
[6.0,6.0,6.0]

This is a workflow that I could work with for now. However, what I actually need to do is to bring a list of (varying length) atomic vectors into Haskell (it's actually fine for me to work inside IO in the above manner). I tried out this kind of code:

rList  :: Double -> Double -> IO [[Double]]
rList x n = 
  let rData x n = R.fromSomeSEXP <$> [r| list(rep(x_hs, n_hs),rep(x_hs, n_hs)) |]
   in runRegion $ rData x n

But this won't compile.

If I change the type signature to Double -> Double -> IO [Double] then it actually will compile, but returns the following runtime error:

*** Exception: cast: Dynamic type cast failed. Expected: Real. Actual: Vector.
CallStack (from HasCallStack):
  error, called at src/Foreign/R/Internal.hsc:165:5 in inline-r-0.9.1-4gmcSHsBjTo4pIYgG2XA6r:Foreign.R.Internal

This seems like an obvious error, but I'm not sure how I should represent a list of vectors to get it into Haskell. Ideally I'd like to represent it as a list of lists although for now any insight is valuable.
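The error message suggests that the R list arrives as a generic vector rather than as a vector of reals. One possible workaround, sketched here as an assumption and not as the library's recommended method, is to send two atomic vectors instead of one list (for example the results of base R's `unlist(xs)` and `lengths(xs)`) and rebuild the nested list on the Haskell side. The code below shows only that rebuilding step in plain Haskell; `unflatten` is a hypothetical helper name.

```haskell
-- Hypothetical helper: rebuild a list of lists from a flat list of
-- values and the length of each original sub-vector.
unflatten :: [Int] -> [a] -> [[a]]
unflatten []       _  = []
unflatten (n : ns) xs =
  let (chunk, rest) = splitAt n xs
  in  chunk : unflatten ns rest

main :: IO ()
main = print (unflatten [3, 2, 1] [1, 2, 3, 4, 5, 6 :: Double])
-- prints [[1.0,2.0,3.0],[4.0,5.0],[6.0]]
```

This keeps every transfer across the R boundary an atomic vector, which is the shape that already worked in the rep example above.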

Without a type signature, it will also compile but returns a different error:

<interactive>:47:1: error:
    • Could not deduce (R.Literal Integer a20)
        arising from a use of ‘rList’
      from the context: (deepseq-1.4.3.0:Control.DeepSeq.NFData a1,
                         R.Literal a1 form)
        bound by the inferred type of
                 it :: (deepseq-1.4.3.0:Control.DeepSeq.NFData a1,
                        R.Literal a1 form) =>
                       IO a1
        at <interactive>:47:1-9
      The type variable ‘a20’ is ambiguous
    • In the expression: rList 3 3
      In an equation for ‘it’: it = rList 3 3

I also tried to work with the [rsafe| ... |] quasi-quotation, but I didn't have any success with that.

I simplified the example given above into this:

repTestSafe = runRegion $ do
   let x = [rsafe| rep(3, 3) |]
   return x

Which compiles and runs fine (although returns a pointer value: 0x0000000003191248, which I'm not sure what to do with). However, if I then try to expand it into anything such as this:

repTestSafe' = runRegion $ do
   let x = [rsafe| rep(3, 3) |]
       y = fmap (+3) x
   return y

Then it just fails to compile. I didn't manage to get any further in this case.

Thanks again for the help and insight so far. It's been really valuable and I really appreciate the high quality of knowledge that you've shared. I love the scope of the library and would like to become more competent with it!

I also had a try and couldn't get to grips with the Data.Vector.SEXP module. Would it be possible to share a few examples of initiating objects and accessing them from both Haskell and R?

In fact, are there any resources in general featuring example code using the inline-r library in source code? I've read everything that I can find on the internet about HaskellR and read deeply into low level R mechanics so have a good theoretical grounding, but without practical examples to build from, I'm finding it hard to apply that theory. I managed to find examples here and there for IHaskell code, but very little specific to inline-r in source code. This is making it difficult to get to grips with the different parts of the library and how they fit together.

As I manage to get more competent at performing practical tasks later, I'll be happy to help create code examples!

Thanks again and best regards,
Oscar

Some examples can be found on @idontgetoutmuch's blog: https://idontgetoutmuch.wordpress.com/2018/05/19/cartography-in-haskell/ and https://idontgetoutmuch.wordpress.com/2018/02/25/reproducibility-and-old-faithful/ . I'll check if I can extract some of that code or other examples into the manual.

Thanks very much for those links. While I'd browsed that blog previously, in checking the specific links you sent just now I noticed that the method of extracting dataframe columns used there fits into the code I was working with very simply:

colTest  :: IO [Int32]
colTest = 
  let rData () = R.fromSomeSEXP <$> [r| df$col |]
   in runRegion $ rData ()

Since R dataframe columns are lists, this is bringing a list of 'length 1' atomic vectors in from R.

The specific column of data that I'd like to bring in looks like this:

[image]

(a list of atomic vectors of varying length)

[image]

Is it possible for me to get this data into Haskell from R?

I'd actually prefer to do this with the [rsafe| ... |] quasi-quotation code for now, but I'd value any insight into the preferable way to do this.

Thanks again for all the help,
Oscar

You can definitely do that with matcher; I'll try to come up with code today. I'll also check whether it can be done by simpler means.

I think you can try to use rsafe but be sure that your data is evaluated to NF before you exit runRegion.
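As an analogy only (this does not use inline-r), the same "force to normal form before the scope ends" idea can be seen with lazy file IO: contents read lazily must be fully evaluated before the handle is closed. The sketch assumes the deepseq package, which ships with GHC; `readAllLines` and the file name are hypothetical.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate)
import System.IO (IOMode (ReadMode), hGetContents, withFile)

-- The handle is closed when withFile returns, so the lazily read
-- contents are forced to normal form inside the callback.
readAllLines :: FilePath -> IO [String]
readAllLines path =
  withFile path ReadMode $ \h -> do
    contents <- hGetContents h
    evaluate (force (lines contents))

main :: IO ()
main = do
  writeFile "nf-demo.txt" "1\n2\n3\n"
  ls <- readAllLines "nf-demo.txt"
  print (map read ls :: [Int])
  -- prints [1,2,3]
```

Without the `evaluate (force ...)` step, the list would be an unevaluated thunk that still depends on a resource that no longer exists, which is the kind of hard-to-debug failure mentioned earlier in the thread.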

Thanks. It'll give me a starting point to explore matcher more deeply from too.

I've been working on other modules for a while so I've not really looked much at this. Today I came back to it and read through the Matcher documentation on Hackage.

The module seems quite simple and logical, and its with function seems to do exactly what I want, but I don't really have any context for how to apply it, and the example code given won't compile for me.

I can't find anything about it anywhere else on the internet. If you get a chance, any example of a use case that I could start from would be much appreciated.

Thanks,
Oscar

I actually managed to do what I needed by reducing the list column in R, using rbind as the function and, as the accumulator, an empty matrix with 0 rows and as many columns as the longest atomic vector in the list (shorter atomic vectors were recycled, which is OK as all elements represent sorted sets). Then I passed the result to Haskell as a number of atomic vector columns and zipped them back together. Finally, I converted each row to a set and back to a list to remove duplicate elements. Pretty hacky, but it works in this instance.

I'd still be interested to learn how to use Matcher though (and while this works, it'd definitely be a better solution) so any reference material would be appreciated.
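The Haskell half of the rbind workaround described above could be sketched roughly as follows, assuming the columns have already arrived as plain lists. `rowsFromColumns` is a hypothetical name, and the sample data imitates the recycled rows (1,2,3), (4,5,4) and (6,6,6).

```haskell
import Data.List (transpose)
import qualified Data.Set as Set

-- Zip the columns back into rows, then pass each row through a Set
-- to drop the duplicates introduced by recycling.
rowsFromColumns :: Ord a => [[a]] -> [[a]]
rowsFromColumns = map (Set.toAscList . Set.fromList) . transpose

main :: IO ()
main = print (rowsFromColumns [[1, 4, 6], [2, 5, 6], [3, 4, 6 :: Int]])
-- prints [[1,2,3],[4,5],[6]]
```

This only recovers the original vectors when, as stated in the post, each one is a sorted set with no repeated elements of its own.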