Both DataFrames and DataTables are coupled to their underlying array structures, DataArrays and NullableArrays respectively. TableFrames.jl provides a minimal interface for storing and manipulating tabular data that is agnostic to the storage method. Given the number of existing packages for representing tabular data, the name "TableFrames" is not intended to be taken seriously, and this package will never be registered. This is only a prototype!
`DataFrame`s simply specify a set of restrictions and operations around an `Associative{Symbol, <:AbstractVector}` (e.g., `Store{<:AbstractVector}`).

- All columns (value vectors) must be the same length.
- All column names are `Symbol`s.
- All column data must be of type `AbstractVector`.
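
The restrictions above could be enforced at construction time roughly as follows. This is a minimal sketch, not the package's implementation: the `SketchFrame` type is hypothetical, and the code assumes Julia 1.x, where `Associative` is spelled `AbstractDict`.

```julia
# Hypothetical sketch; `SketchFrame` is not part of TableFrames.jl.
# The type parameter enforces Symbol names and AbstractVector columns.
struct SketchFrame{D<:AbstractDict{Symbol,<:AbstractVector}}
    columns::D

    function SketchFrame(columns::D) where {D<:AbstractDict{Symbol,<:AbstractVector}}
        # All columns (value vectors) must be the same length.
        lengths = unique([length(v) for v in values(columns)])
        length(lengths) <= 1 ||
            throw(DimensionMismatch("columns have differing lengths: $lengths"))
        return new{D}(columns)
    end
end

# Valid: two columns of length 3.
df = SketchFrame(Dict(:a => [1, 2, 3], :b => ["x", "y", "z"]))
```

Under this sketch, passing a dictionary with `String` keys fails with a `MethodError`, and columns of unequal length raise a `DimensionMismatch`.
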
- Indexing by a single column name returns the `AbstractVector` for that column.
- Indexing by row number returns the `NamedTuple` for that row.
- Indexing by a column and multiple rows returns a `view` of the `AbstractVector` for that column.
- Indexing by multiple columns (and multiple rows) returns a new `DataFrame` with the specified columns and references to the original data.
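
The four indexing rules might map onto `getindex` methods along these lines. This is a sketch under assumptions: `IndexFrame` is a hypothetical stand-in type, the argument order (column first, then rows) is a guess, and the code targets Julia 1.x with built-in `NamedTuple`s.

```julia
# Hypothetical sketch; `IndexFrame` and these method signatures are illustrative only.
struct IndexFrame{D<:AbstractDict{Symbol,<:AbstractVector}}
    columns::D
end

# Single column name: return the stored vector itself (no copy).
Base.getindex(df::IndexFrame, name::Symbol) = df.columns[name]

# Row number: build a NamedTuple with one field per column.
function Base.getindex(df::IndexFrame, row::Integer)
    nms = Tuple(keys(df.columns))
    return NamedTuple{nms}(Tuple(df.columns[n][row] for n in nms))
end

# One column and multiple rows: a view into the column, not a copy.
Base.getindex(df::IndexFrame, name::Symbol, rows::AbstractVector{<:Integer}) =
    view(df.columns[name], rows)

# Multiple columns and rows: a new frame whose columns are views of the original data.
function Base.getindex(df::IndexFrame, names::AbstractVector{Symbol},
                       rows::AbstractVector{<:Integer})
    cols = Dict{Symbol,AbstractVector}(n => view(df.columns[n], rows) for n in names)
    return IndexFrame(cols)
end

df = IndexFrame(Dict(:a => [1, 2, 3], :b => ["x", "y", "z"]))
col = df[:a]           # the underlying vector
row = df[2]            # NamedTuple with fields a and b
part = df[:a, 2:3]     # view of rows 2 and 3 of column a
sub = df[[:b], 1:2]    # new frame referencing the original data
```

Because the multi-row results are views, writing through `part` would also change `df[:a]`. Field order in the row `NamedTuple` follows the dictionary's iteration order, so access fields by name rather than by position.
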
- Iterating over `keys` and `values` is supported to maintain an associative-like interface.
- Iterating over `names` should return the `Symbol` names for each column.
- Iterating over `columns` should return the vectors for each column.
- Iterating over `rows` should return `NamedTuple`s for each row.
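
One way the iteration interface could look is sketched below. The `IterFrame` type is hypothetical, and whether `names`, `columns`, and `rows` return collections or lazy iterators is an assumption; here `rows` is a lazy generator so that no row is materialized until requested.

```julia
# Hypothetical sketch; `IterFrame`, `columns`, and `rows` are illustrative only.
struct IterFrame{D<:AbstractDict{Symbol,<:AbstractVector}}
    columns::D
end

# Associative-like interface: delegate to the underlying dictionary.
Base.keys(df::IterFrame) = keys(df.columns)
Base.values(df::IterFrame) = values(df.columns)

# Column names as Symbols, and the column vectors themselves.
Base.names(df::IterFrame) = collect(keys(df.columns))
columns(df::IterFrame) = collect(values(df.columns))

# Lazily yield one NamedTuple per row.
function rows(df::IterFrame)
    nms = Tuple(names(df))
    nrows = isempty(nms) ? 0 : length(df.columns[first(nms)])
    return (NamedTuple{nms}(Tuple(df.columns[n][i] for n in nms)) for i in 1:nrows)
end

df = IterFrame(Dict(:a => [1, 2], :b => ["x", "y"]))
for r in rows(df)
    println(r.a, " ", r.b)
end
```
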
- Set columns, rows and elements
- `insert!` a new row into the table.
- `append!` rows from one `DataFrame` into another.
- `add` a column to a table (this will create a new table, as it changes the parameterization of the columns).
- `merge!` columns from one `DataFrame` into another.
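
The difference between mutating `insert!` and non-mutating `add` can be illustrated with a sketch in which the storage type encodes the column names and element types. Everything here is an assumption for illustration: `TypedFrame` is hypothetical, a `NamedTuple` of vectors stands in for the package's storage, and the `insert!` and `add` signatures are guesses.

```julia
# Hypothetical sketch; a NamedTuple of vectors stands in for typed column storage.
struct TypedFrame{C<:NamedTuple}
    columns::C
end

# Inserting a row mutates the existing column vectors in place; the type is unchanged.
function Base.insert!(df::TypedFrame, index::Integer, row::NamedTuple)
    keys(row) == keys(df.columns) ||
        throw(ArgumentError("row fields must match column names"))
    for name in keys(row)
        insert!(df.columns[name], index, row[name])
    end
    return df
end

# Adding a column returns a new frame: the new name and element type
# become part of the type parameter, so the old object cannot be reused.
function add(df::TypedFrame, name::Symbol, data::AbstractVector)
    all(length(col) == length(data) for col in df.columns) ||
        throw(DimensionMismatch("new column length must match existing columns"))
    return TypedFrame(merge(df.columns, NamedTuple{(name,)}((data,))))
end

df = TypedFrame((a = [1, 2], b = ["x", "y"]))
insert!(df, 2, (a = 10, b = "w"))       # same object, one more row
df2 = add(df, :c, [1.0, 2.0, 3.0])      # new object with a different type
```

In this sketch `df2` shares the `a` and `b` vectors with `df`, so only the wrapper is new, not the data.
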
- Handling missing data (e.g., `dropna`).
- Providing more complex methods (e.g., `join`, `sort`, `by`, `aggregate`).
- Conversion to other formats (table types or arrays).
- Exporting to different file formats.
- Table indexing (e.g., arbitrary row indexing and querying), although we might want to provide some API for extensions.
- Statistical Modeling