fletcher.read_csv?

Question

dhirschfeld opened this issue 6 years ago

Just wondering whether a function fletcher.read_csv, which reads CSV data directly into Arrow tables, would be in scope?

Whilst pd.read_csv is damn good and the workhorse of many data analytics pipelines, it suffers slightly from pandas' limited type system, which I'm hoping could be improved using native Arrow types. It would also be nice not to have to go through pandas/Python types at all, and so avoid the serialization cost.

Florian Jetter · Answer 1 · Mon Jul 02 2018 21:40:04 GMT+0800 (China Standard Time)

I would rather consider this in the scope of Arrow itself, since Arrow also handles parsing of other file formats, e.g. Feather and Parquet. Parsing the CSV file and putting it into an Arrow table without any conversion in between is otherwise very difficult, if not impossible.
fletcher is instead a library that exposes the Arrow format to pandas as an ExtensionArray and provides some analytical functions leveraging numba.

Dave Hirschfeld · Answer 2 · Tue Jul 03 2018 06:02:23 GMT+0800 (China Standard Time)

Fair enough, I'll move the feature request to arrow!

Update: It's already on the roadmap - ARROW-25