xhochy / fletcher

Pandas ExtensionDType/Array backed by Apache Arrow

Home Page: https://fletcher.readthedocs.io/

fletcher.read_csv?

dhirschfeld opened this issue

Just wondering whether a function fletcher.read_csv, which reads CSV data directly into Arrow tables, would be in scope?

Whilst pd.read_csv is damn good and the workhorse of many data analytics pipelines, it suffers slightly from pandas' limited type system, which I'm hoping could be improved by using native Arrow types. It would also be nice not to have to go through pandas/Python types at all, and so avoid the conversion cost.

I would consider this to be in the scope of Arrow itself, since Arrow already handles parsing of other file formats, e.g. Feather and Parquet. Outside of Arrow, parsing a CSV file and putting it into an Arrow table without any conversion in between is very difficult, if not impossible.
fletcher is instead a library that exposes the Arrow format to pandas as an ExtensionArray and provides some analytical functions leveraging numba.

Fair enough, I'll move the feature request to arrow!

Update: It's already on the roadmap - ARROW-25