Exploring Snowflake Support via PyArrow & Pandas
kjam opened this issue
Is your feature request related to a problem? Please describe.
We would like to eventually support workflows that use Snowflake. One idea that has come up is to use the Snowflake Python connector's integration with PyArrow and Pandas, which would let us learn more about Arrow while also supporting Snowflake data: https://docs.snowflake.com/en/user-guide/python-connector-pandas.html
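A minimal sketch of the basic pull into Pandas, assuming the `snowflake-connector-python[pandas]` package. `connect()`, `cursor()`, `execute()` and `fetch_pandas_all()` are connector calls described on the linked docs page. The `SNOWFLAKE_*` environment variable names and the helper names `connect_from_env` / `query_to_dataframe` are hypothetical.

```python
import os


def connect_from_env():
    """Open a Snowflake connection from environment variables.

    The SNOWFLAKE_* variable names are hypothetical; adjust them to your setup.
    Requires: pip install "snowflake-connector-python[pandas]"
    """
    import snowflake.connector  # lazy import so this module loads without it

    return snowflake.connector.connect(
        user=os.environ["SNOWFLAKE_USER"],
        password=os.environ["SNOWFLAKE_PASSWORD"],
        account=os.environ["SNOWFLAKE_ACCOUNT"],
        warehouse=os.environ.get("SNOWFLAKE_WAREHOUSE"),
        database=os.environ.get("SNOWFLAKE_DATABASE"),
        schema=os.environ.get("SNOWFLAKE_SCHEMA"),
    )


def query_to_dataframe(conn, sql):
    """Run `sql` and return the whole result as a pandas DataFrame.

    fetch_pandas_all() is the connector's pandas integration; per the
    Snowflake docs it builds the DataFrame from Arrow-format results.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetch_pandas_all()
    finally:
        cur.close()  # always release the cursor, even if the query fails
```

Usage would look like `df = query_to_dataframe(connect_from_env(), "SELECT * FROM some_table")`, where the table name is a placeholder.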
Describe the solution you'd like
As a first step, it would be useful to test whether this workflow is feasible and to see what benchmarks we can create for pulling data into Pandas and then applying Cape policy to it. It might also be worth diving into the library internals to see how the Query -> Arrow -> DataFrame workflow works!
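One way to structure that benchmark is to time each stage separately, so the cost of the query, the Arrow fetch, the Arrow -> DataFrame conversion and the policy step can be compared. This is a sketch under assumptions: it uses the connector's `fetch_arrow_all()` (which may return `None` for an empty result) followed by `pyarrow.Table.to_pandas()`, and `policy` is a hypothetical stand-in callable, not Cape's actual API.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class BenchmarkResult:
    # Seconds spent in each stage: execute, fetch_arrow, to_pandas, policy.
    timings: Dict[str, float] = field(default_factory=dict)
    # DataFrame after the policy was applied (None for an empty result).
    frame: Optional[Any] = None


def benchmark_query(conn, sql: str, policy: Callable[[Any], Any]) -> BenchmarkResult:
    """Time Query -> Arrow -> DataFrame -> policy, one stage at a time."""
    result = BenchmarkResult()
    cur = conn.cursor()
    try:
        start = time.perf_counter()
        cur.execute(sql)
        result.timings["execute"] = time.perf_counter() - start

        start = time.perf_counter()
        table = cur.fetch_arrow_all()  # pyarrow.Table, possibly None if no rows
        result.timings["fetch_arrow"] = time.perf_counter() - start
    finally:
        cur.close()

    if table is None:  # nothing to convert or apply policy to
        return result

    start = time.perf_counter()
    df = table.to_pandas()
    result.timings["to_pandas"] = time.perf_counter() - start

    start = time.perf_counter()
    result.frame = policy(df)
    result.timings["policy"] = time.perf_counter() - start
    return result
```

A placeholder policy could be something like `lambda df: df.drop(columns=["EMAIL"])`, where `EMAIL` is a hypothetical column name.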
Describe alternatives you've considered
We have explored the idea of an ODBC or JDBC layer as another way of solving this issue.
We could pick an interesting example use case and try it out! I would also be excited to hear about the architecture choices they made here and to explore how we might apply policy in or to Arrow?