fletcher.read_csv?
dhirschfeld opened this issue · comments
Just wondering if a function fletcher.read_csv would be in scope which reads CSV data directly into arrow tables?
Whilst pd.read_csv is damn good and the workhorse of many data analytics pipelines, it suffers slightly from pandas' limited type system, which I'm hoping could be improved using native arrow types. Also, it would be nice to not have to go through pandas/python types at all and so avoid the serialization cost.
I'd rather consider this in the scope of arrow itself, since arrow also handles parsing of other file formats, e.g. feather, parquet, etc. Parsing the CSV file and putting it into an arrow table without any conversion in between is otherwise very difficult, if not impossible.
fletcher is rather a library to expose the arrow format to pandas as an ExtensionArray and provide some analytical functions leveraging numba.
Fair enough, I'll move the feature request to arrow!
Update: It's already on the roadmap - ARROW-25