This library helps you read and write data from most common data sources. It accelerates ML and ETL workflows by removing the need to manage multiple data connectors yourself.
pip install -U dataligo
Install from source
Alternatively, you can clone the latest version from the repository and install it directly from the source code:
pip install -e .
>>> from dataligo import Ligo
>>> from transformers import pipeline
>>> ligo = Ligo('./ligo_config.yaml') # Check the sample_ligo_config.yaml for reference
>>> print(ligo.get_supported_data_sources_list())
['s3', 'gcs', 'azureblob', 'bigquery', 'snowflake', 'redshift', 'starrocks', 'postgresql', 'mysql', 'oracle', 'mssql', 'mariadb', 'sqlite', 'elasticsearch', 'mongodb']
>>> mongodb = ligo.connect('mongodb')
>>> df = mongodb.read_as_dataframe(database='reviewdb',collection='reviews')
>>> df.head()
_id Review
0 64272bb06a14f52787e0a09e good and interesting
1 64272bb06a14f52787e0a09f This class is very helpful to me. Currently, I...
2 64272bb06a14f52787e0a0a0 like!Prof and TAs are helpful and the discussi...
3 64272bb06a14f52787e0a0a1 Easy to follow and includes a lot basic and im...
4 64272bb06a14f52787e0a0a2 Really nice teacher!I could got the point eazl...
>>> classifier = pipeline("sentiment-analysis")
>>> reviews = df.Review.tolist()
>>> results = classifier(reviews,truncation=True)
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.9997
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.999
label: POSITIVE, with score: 0.9967
>>> df['predicted_label'] = [result['label'] for result in results]
>>> df['predicted_score'] = [round(result['score'], 4) for result in results]
# Write the results to the MongoDB
>>> mongodb.write_dataframe(df,'reviewdb','review_sentiments')
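For contrast, here is roughly the boilerplate that a single hand-written connector replaces — a plain standard-library sketch of opening a connection, running a query, and collecting rows (no dataligo involved; SQLite is used only because it ships with Python):

```python
import sqlite3

# A ligo-free sketch of one hand-rolled connector: open a connection,
# run a query, collect rows -- repeated for every database and
# warehouse a project touches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER PRIMARY KEY, review TEXT)")
conn.executemany(
    "INSERT INTO reviews (review) VALUES (?)",
    [("good and interesting",), ("Really nice teacher!",)],
)
rows = conn.execute("SELECT id, review FROM reviews").fetchall()
conn.close()
print(rows)  # [(1, 'good and interesting'), (2, 'Really nice teacher!')]
```

Multiply this by every source in the table below, plus credential handling and DataFrame conversion, and the appeal of a single `ligo.connect(...)` entry point is clear.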
| Data Sources | Type | pandas | polars | dask |
|---|---|---|---|---|
| S3 | datalake | ✅ | ✅ | ✅ |
| GCS | datalake | ✅ | ✅ | ✅ |
| Azure Blob Storage | datalake | ✅ | ✅ | ✅ |
| Snowflake | datawarehouse | ✅ | ✅ | ✅ |
| BigQuery | datawarehouse | ✅ | ✅ | ✅ |
| StarRocks | datawarehouse | ✅ | ✅ | ✅ |
| Redshift | datawarehouse | ✅ | ✅ | ✅ |
| PostgreSQL | database | ✅ | ✅ | ✅ |
| MySQL | database | ✅ | ✅ | ✅ |
| MariaDB | database | ✅ | ✅ | ✅ |
| MsSQL | database | ✅ | ✅ | ✅ |
| Oracle | database | ✅ | ✅ | ✅ |
| SQLite | database | ✅ | ✅ | ✅ |
| MongoDB | nosql | ✅ | ✅ | ✅ |
| ElasticSearch | nosql | ✅ | ✅ | ✅ |
| DynamoDB | nosql | ✅ | ✅ | ✅ |
| Redis | nosql | ✅ | ✅ | ✅ |
Some functionalities of DataLigo are inspired by the following packages:

- ConnectorX: DataLigo uses ConnectorX to read data from most RDBMS databases for its performance benefits, and the `return_type` parameter is inspired by it.