long2ice / asynch

An asyncio ClickHouse Python Driver with native (TCP) interface support.

Home Page: https://github.com/long2ice/asynch

pandas.DataFrame support

ghuname opened this issue · comments

@long2ice
Do you plan to support pandas DataFrames, which are heavily used by data scientists?
It would be very nice to be able to select directly into a DataFrame.
Is such a feature on your roadmap?

No. What is the relation between asynch and pandas? What do you mean by selecting directly into a DataFrame?

Well, we are talking about the ClickHouse database and how to access it asynchronously.
Most of the time, we will be selecting data from the database.
When you select data from a database, you need some structure to hold the result.
In Python, the pandas DataFrame is the usual choice for that purpose if you need to do further work with the data (data wrangling, machine learning...).

At the moment I am using clickhouse_driver with a DB API connection and the pandas.read_sql function (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html).

As aioch doesn't have a DB API connection, I am kind of stuck with that approach.
I will try to do the same with your asynch driver. I hope it will work.
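
If the synchronous clickhouse_driver plus pandas.read_sql combination has to be called from asyncio code, one option is to push the blocking call onto a worker thread so the event loop stays free. This is a minimal sketch, assuming Python 3.9+ for asyncio.to_thread. The function blocking_read is a hypothetical placeholder for the real pandas.read_sql call and returns dummy rows so the example runs without a database:

```python
import asyncio
import time


def blocking_read(query):
    """Hypothetical placeholder for a blocking call such as pandas.read_sql."""
    time.sleep(0.05)  # simulate waiting on the database
    return [(1,)]


async def main():
    # to_thread runs the blocking function in a worker thread, so other
    # tasks on the event loop keep running while the query is in flight.
    query_task = asyncio.create_task(asyncio.to_thread(blocking_read, "SELECT 1"))
    ticks = 0
    while not query_task.done():
        await asyncio.sleep(0.01)
        ticks += 1  # counts how often the loop ran while waiting
    return query_task.result(), ticks


result, ticks = asyncio.run(main())
print(result, ticks)
```

This does not make the query itself asynchronous; it only keeps the blocking call from stalling other coroutines.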

Well, did you try it? I have never used pandas, and asynch also supports the DB API.

I tried this:

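import asyncio
from asynch import connect

async def main():
    # Connect to a local ClickHouse server over the native (TCP) interface
    conn = await connect(
        host="127.0.0.1",
        port=9001,
        database="default",
    )

    async with conn.cursor() as cursor:
        await cursor.execute("SELECT 1")
        ret = cursor.fetchone()
        print(ret)

asyncio.run(main())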

I got (1,) as a result, but where are the column types? As far as I can see, you pass with_column_types=True in response = await execute(query, args=args, with_column_types=True, **execute_kwargs), but the column types are not returned to the caller.

Anyway, I hoped that the following would work, but it doesn't:

import asyncio
from asynch import connect
import pandas as pd
from jinjasql import JinjaSql

async def main():
    conn = await connect(
        host="127.0.0.1",
        port=9001,
        database="default",
    )

    jsql = JinjaSql(param_style='pyformat')

    sql_templ = 'select 1'
    params = {}

    query, bind_params = jsql.prepare_query(sql_templ, params)
    df = pd.read_sql_query(query, conn, params=bind_params) # I also tried df = await pd.read_sql(...), but that didn't work either

    print(df)

asyncio.run(main())

Output:

RuntimeWarning: coroutine 'Cursor.execute' was never awaited
  cur.execute(*args, **kwargs)

It looks like you would need to create and close the cursor on the fly in the background for this kind of usage.

@long2ice can you please comment on my findings?

Does pandas support asyncio?

No
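
Because pandas is synchronous, pd.read_sql_query calls cur.execute(...) without awaiting it, which is what the RuntimeWarning above reports. One possible workaround is to run the query through the async cursor yourself and hand the fetched rows to pandas afterwards. The following is only a sketch. The helpers fetch_table and rows_to_dataframe are hypothetical names, and FakeCursor is a stand-in so the example runs without a ClickHouse server. It also assumes the cursor exposes a DB API style description attribute holding the column names, which may not hold for every asynch version.

```python
import asyncio
import inspect


async def fetch_table(cursor, query):
    """Run `query` on an async DB API style cursor; return (column_names, rows)."""
    await cursor.execute(query)
    rows = cursor.fetchall()
    # fetchall() is a plain method in some driver versions and a coroutine
    # in others, so await the result only when it is awaitable.
    if inspect.isawaitable(rows):
        rows = await rows
    # Assumption: the cursor fills in a PEP 249 `description` after execute().
    description = getattr(cursor, "description", None) or []
    names = [column[0] for column in description]
    return names, list(rows)


def rows_to_dataframe(names, rows):
    """Build a DataFrame from already-fetched rows (needs pandas installed)."""
    import pandas as pd  # imported here so only this step depends on pandas

    return pd.DataFrame.from_records(rows, columns=names or None)


class FakeCursor:
    """Hypothetical stand-in for a real async cursor, for demonstration only."""

    description = None

    async def execute(self, query):
        self.description = [("x", "UInt8")]
        self._rows = [(1,)]

    def fetchall(self):
        return self._rows


async def demo():
    return await fetch_table(FakeCursor(), "SELECT 1 AS x")


names, rows = asyncio.run(demo())
print(names, rows)
```

With a real connection, the cursor obtained from async with conn.cursor() as cursor: would take the place of FakeCursor, and rows_to_dataframe(names, rows) would produce the DataFrame once the rows have been fetched.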