capeprivacy / cape-python

Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark

Exploring Snowflake Support via PyArrow & Pandas

kjam opened this issue

Is your feature request related to a problem? Please describe.
We would like to eventually support workflows that use Snowflake. One idea that has come up is to use Snowflake's integration with PyArrow and Pandas, both to learn more about Arrow and to support Snowflake data.

Describe the solution you'd like
An initial test of whether this workflow is feasible would be useful. It would show what benchmarks we can create for pulling data into Pandas and then applying Cape policy to it. It might also be worthwhile diving into the library internals to see how the Query -> Arrow -> DataFrame workflow works!
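As a starting point, the feasibility test could look something like the sketch below. It is only a rough outline, not a proposed implementation: `fetch_from_snowflake` and `benchmark` are hypothetical helper names, the connection arguments are placeholders, and the policy step is left as a caller-supplied function because this issue does not say how Cape policy would be wired in. It assumes `snowflake-connector-python` is installed with its pandas extra, which provides `fetch_pandas_all()` on the cursor.

```python
import time
from typing import Any, Callable, Dict


def fetch_from_snowflake(query: str, **connect_kwargs: Any) -> Any:
    """Run a query and return the result as a pandas DataFrame.

    Hypothetical helper. The import is done lazily so the timing
    harness below can be used without the connector installed.
    """
    import snowflake.connector  # assumes snowflake-connector-python[pandas]

    conn = snowflake.connector.connect(**connect_kwargs)
    try:
        cur = conn.cursor()
        cur.execute(query)
        # The connector fetches Arrow result batches and converts
        # them into a single pandas DataFrame.
        return cur.fetch_pandas_all()
    finally:
        conn.close()


def benchmark(
    fetch: Callable[[], Any],
    apply_policy: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Time the fetch step and the policy step separately.

    Both steps are plain callables, so the same harness works for
    Snowflake, a local file, or synthetic data.
    """
    t0 = time.perf_counter()
    data = fetch()
    t1 = time.perf_counter()
    result = apply_policy(data)
    t2 = time.perf_counter()
    return {
        "fetch_seconds": t1 - t0,
        "policy_seconds": t2 - t1,
        "result": result,
    }
```

A run against Snowflake would pass something like `lambda: fetch_from_snowflake("SELECT ...", user=..., password=..., account=...)` as `fetch`, and a function that applies a Cape policy to the DataFrame as `apply_policy`. Keeping the two timings separate should make it easier to tell whether the cost sits in the Query -> Arrow -> DataFrame path or in the policy step.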

Describe alternatives you've considered
We have explored the idea of an ODBC or JDBC layer as another way of solving this issue.

Additional context
We could pick an interesting example use case and try it out! It would also be exciting to hear about the architecture choices made in this integration, and to explore how we might apply policy in or to Arrow.