tested with python 2.7, & 3.4, 3.5
compatible with many existing dataframes: e.g. pandas
requirements: numpy, matplotlib (for plotting only) conversion to other formats require the appropriate library.
.. notes::
* tested with python 2.7, & 3.4
* requirements: numpy
* conversion to other formats require the appropriate library.
:author: Morgan Fouesneau
Documentation and API: link
I always found myself writing snippets around numpy
, matplotlib
, pandas and
other file readers. These are often the same things: read file foo
and plot
a
against b
where something is takes some values
.
It gets always very complex when you want to make something non-standard, for
instance, for each of the 10 classes given according to this selection, make a
scatter plot with these specific markers and color coded by another column.
I was basically tired of all the packages doing fancy things and not allowing basics or requiring a lot of dependencies.
In particular this package allows easy conversions to many common dataframe
containers: dict
, numpy.recarray
, pandas.DataFrame
, dask.DataFrame
,
astropy.Table
, xarray.Dataset
, vaex.DataSetArrays
.
Based on the most basic functions and in particular methods of dict
, I wrote
this package. This basically builds advance-ish access to column oriented data
through 3 main classes, 2 of which handle data. This may not fit all needs,
nor large data access.
-
dictdataframe
: an advanced dictionary object. A simple-ish dictionary like structure allowing usage as array on non constant multi-dimensional column data. The :class:DataFrame
container allows easier manipulations of the data but is basically a wrapper of many existing function around adictionary
object. -
simpletable
: a simplified version of ezTables The :class:SimpleTable
allows easier manipulations of the data but is basically a wrapper of many existing function around anumpy.recarray
object. It implements reading and writing ascii, FITS and HDF5 files. The :class:AstroTable
built on top of the latter class, adds-on astronomy related functions, such asconesearch
-
plotter
: In this package implements :class:Plotter
, which is a simple container to dictionary like structure (e.g. :class:dict
, :class:np.recarray
, :class:pandas.DataFrame
, :class:SimpleTable
). It allows the user to plot directly using keys of the data and also allows rapid group plotting routines (groupy
andfacets
). Note that is also allows expressions instead of keys. This interface should basically work on any dictionary like structure
Both data structures implements common ground base to line and column access in
the same transparent manner. These objects implement for instance array
slicing, shape, dtypes on top of which they implement functions such as:
sortby
, groupby
, where
, join
and evaluation of expressions as keys. (see
examples below). Both also have a direct access to a Plotter
attribute
These data classes allows easy conversions to many common dataframe
containers: numpy.recarray
, pandas.DataFrame
, dask.DataFrame
,
astropy.Table
, xarray.Dataset
, vaex.DataSetArrays
.
- Some data manipulation basics
>>> t = SimpleTable('path/mytable.csv')
# get a subset of columns only
>>> s = t.get('M_* logTe logLo U B V I J K')
# set some aliases
>>> t.set_alias('logT', 'logTe')
>>> t.set_alias('logL', 'logLLo')
# make a query on one or multiple column
>>> q = s.selectWhere('logT logL', '(J > 2) & (10 ** logT > 5000)')
# note that `q` is also a table object
# makes a simple plot (see :module:`plotter`)
>>> q.Plotter.plot('logT', 'logL', ',')
# export the initial subtable to a new file
>>> s.write('newtable.fits')
# or
>>> s.write('newtable.hd5')
- Convert to other dataframe structures
>>> t = SimpleTable('path/mytable.csv')
>>> t.to_pandas()
>>> t.to_dask(npartitions=5)
>>> d = DictDataFrame(t)
- Make a single plot of 'RA', 'DEC' on which each region 'BRK' is represented by a different color (colormap or other) and different marker.
>>> p = t.Plotter.groupby('BRK', markers='<^>v.oxs', colors='parula')
>>> p.plot('CRA', 'CDEC', 'o')
>>> import pylab as plt
>>> plt.legend(loc='best', numpoints=1)
>>> plt.xlim(plt.xlim()[::-1])
>>> plt.xlabel('RA')
>>> plt.ylabel('DEC')
- make a more complex plot: plot the histogram distribution of 'AV' per region given by 'BRK', with given color scheme per region value and individual plots with shared axis
>>> t.Plotter.groupby('BRK', facet=True, \
colors=plt.cm.parula, sharex=True, \
sharey=True).hist('AV',
bins=np.linspace(t.AV.min(),
t.AV.max(), 20), normed=True)
>>> for ax in plt.gcf().axes[-3:]: ax.set_xlabel('AV')