blindglobe / rho

Rho is yet another array/dataframe package with some typing magic in the access/setting.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Overview

This package was written by Marco Antoniotti in April 2014 as a response to a query on comp.lang.lisp from AJ (Tony) Rossini <blindglobe@gmail.com> regarding types/arrays and R/style dataframes.

Further packaging and repository work has been done by Tony.

This package consists of STRAND and DATA-FRAME classes and a REF$ access element.

STRANDs are named/(optionally typed) columns of data elements. The optional typing means that from a CL perspective, we use T for general columns, and a more specific type (either user specified or computed using upgrade-element-type-array (FIXME:AJR - not sure this is right).

If a STRAND is of a statistical type, then it should also have appropriate methods for summarization. We will need to give an API and examples for this. It isn’t clear if this should be in the same package or in a different one. For example, numbers can be summmarized by the standard 5-number summary if that is indeed what they are (eg if they are not integers or FIXNUMs masquerading for factors).

a DATA-FRAME is a VECTOR of STRANDs (or VECTORs). our example uses the following types: general string restricted integer range DEFSTRUCT’d simple point class DEFCLASS’d simple point class

For the summarization to work, we need to write the appropriate methods.

SUMMARIZE place

should summarize an observation as a vector (perhaps single) of numberrs. If a simple observation, the summary should be a number, if complex, it is an N-vector. This result should have annotations which can be used to describe the number(s).

SUMMARIZE strand|df

should return a vector based on SUMMARIZE values. Numbers can be described using standard 5-num summary (median/min/max/quartiles) plus mean.

Example use of this package

(ql:quickload :rho)

(taken from a response from Marco to Tony):

> Does this help, Marco?

I am sorry, but I still do not understand what you mean by “computing with types”?

This is what I cooked up.

(in-package :rho-user)
(defparameter df (make-data-frame '(foo #(1 2 3)) 
                         '(bar ("a" "s" "d") string) 
                         '(baz (100 102 97) (integer 90 110))))
df
#S(DATA-FRAME :COLUMNS #(#S(STRAND :NAME FOO :DATA #(1 2 3)
              :ELEMENT-TYPE T)
              #S(STRAND :NAME BAR :DATA #("a" "s" "d") 
                        :ELEMENT-TYPE STRING)
              #S(STRAND :NAME BAZ :DATA #(100 102 97)
              :ELEMENT-TYPE (INTEGER 90 110)))) 
(in-package :rho-user)
(pprint-data-frame df) 
        FOO       BAR       BAZ 
0         1       "a"       100 
1         2       "s"       102 
2         3       "d"        97 
NIL 
(in-package :rho-user)
(setf (ref$ df 'bar 2) 42) 

and it errors with:

Error: The assertion (TYPEP V COL-TYPE) failed. 
  1 (continue) Retry assertion. 
  2 (abort) Return to level 0. 
  3 Return to top loop level 0. 

Type :b for backtrace or :c <option number> to proceed. 
Type :bug-form "<subject>" for a bug report template or :? for other options. 
(in-package :rho-user)
(data-frame-column-types df) 
(T STRING (INTEGER 90 110)) 
(in-package :rho-user)
(setf (ref$ df 'bar 2) "Works!") 
"Works!" 
(in-package :rho-user)
(pprint-data-frame df) 
        FOO       BAR       BAZ 
0         1       "a"       100 
1         2       "s"       102 
2         3  "Works!"        97 
NIL 
(in-package :rho-user)
(typep (ref$ df 2 1) (ref$ (data-frame-column-types df) 2)) 
T

Note that the REF$ is your XREF. A STRAND is just a “named vector” with type info recorded in order to circumvent the implementation laziness.

The code is longish and I can send it to you or whoever else wants it. Most of the hairiness goes into REF$. More logic can go into a generalized JOIN operation (or CONCAT).

Of course, no real compile-time use of the types can be made, but that’s just the way it is.

API

Rejoinder

About

Rho is yet another array/dataframe package with some typing magic in the access/setting.

License:Other


Languages

Language:Common Lisp 100.0%