DanielFatkic / datto

Data Tools (Dat To) is a package with various data tools to help in data analysis and data science work, such as natural language processing and machine learning techniques.

Installation

pip install datto

Overview

datto is a package with various data tools to help in data analysis and data science work.

You can find the documentation here.

Some examples of what you can do:

  • Remove links from some text
  • Extract body of an email only (no greeting or signature)
  • Easily load/save data from S3
  • Run SQL from Python
  • Explore data - check for mistyped data, find correlated data
  • Assign a given user to an experimental condition
  • Create an HTML dropdown from a DataFrame
  • Find the most common phrases by a category
  • Classify free text responses into any number of meaningful groups (e.g. find survey themes)
  • Make a simple Python logger with default options
  • Take some data and test a bunch of machine learning models on it
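To give a feel for what two of the items above involve (removing links and assigning a user to an experimental condition), here is a minimal standalone sketch using only the Python standard library. It does not call datto and is not datto's API: the names `remove_links`, `assign_condition`, and `URL_PATTERN` are hypothetical, and the hash-based assignment is one common approach, assumed here for illustration.

```python
import hashlib
import re
from typing import Sequence

# Hypothetical helpers for illustration only; these are NOT datto functions.
# Assumption: a "link" is any token starting with http://, https://, or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")


def remove_links(text: str) -> str:
    """Drop URL-like tokens and collapse the leftover whitespace."""
    without_links = URL_PATTERN.sub("", text)
    return " ".join(without_links.split())


def assign_condition(user_id: str, conditions: Sequence[str]) -> str:
    """Map a user ID to one condition; the same ID always gets the same one."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return conditions[int(digest, 16) % len(conditions)]


if __name__ == "__main__":
    print(remove_links("Read the docs at https://example.com/docs before you start"))
    print(assign_condition("user-42", ["control", "treatment"]))
```

Hashing the user ID, rather than drawing a random number, keeps the assignment stable across runs without storing any state. The package's own helpers may work differently, so check the documentation for the actual function names and behavior.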

For detailed examples of how you can use it, check out this Jupyter notebook.

Other Templates

Check out the templates folder for files and code snippets that automate certain tasks, but don't fit within the realm of a Python package.

Recommended: You can easily reuse some of these by copying the file contents into a text expander app.

Contributing

Create virtualenv (specify version of Python you want):

pyenv virtualenv 3.6 datto

Activate virtualenv:

pyenv activate datto

Install dependencies (specified in pyproject.toml file) in virtualenv:

poetry install

To add a new dependency with Poetry, run:

poetry add PACKAGE_NAME

Run tests (make sure they all pass):

make test

Submitting a change:

Create a PR with your desired change(s), and request review from the code owner!

About

License: MIT License

