hiral72/auto-ml

Need

Not everyone has knowledge of ML and their models to recognize certain types of patterns within a huge dataset. To an ordinary user, it would be very difficult training a model over a set of data, providing it with an algorithm that it can use to reason over and learn from that data and also perform operations like preprocessing, imputation, visualization etc. A system to input data and get appropriate diagnosis along with an auto selected model would come in handy especially to the modern industrial sectors. The platform allows users who don't have any background knowledge of ML or its models and operations to predict and analyze data with ease.

Description

A platform that eases the work of data and business analysts in generating inferences from data without having knowledge of the coding side of things. It will provide complete data handling capabilities from ETL(Extract, Transform and Load) pipelines needed to build the dataset, automated data profiling and cleaning, analyzing how the data changes over time, improving the data quality. As the next step, neural networks would be automatically generated to perform classification or regression tasks on any target variable from the dataset. The platform will also create and suggest beautiful visualizations for the given dataset that can help drive decisions and understand the data at hand better.

Features

Allow the user to input a dataset in the form of CSV format.
Perform basic operations on the input dataset such as identification of columns, their data types, statistics like mean, min, max, grouping etc.
Perform data imputation for missing data for a given dataset.
Prepare the input dataset by applying various preprocessing techniques like handling outlier, one hot encoding, feature scaling etc
Develop an algorithm for Automatic Model Selection, using a genetic approach that automatically and efficiently finds the most suitable neural network model for a given dataset.
Develop an auto data visualization algorithm to show top k data visualization for a given dataset.

Application Screenshots

Home Page

Signup

Login

Dataset Input

Displaying all the datasets created by the user

Data Imputation

Dataset Details

Edit Dataset Details

Dataset Catalog

Data Visualisation

Model Selection

After the model selection task is completed:

How To Use

Software Requirements

VSCode
MongoDB

Installation

Clone the repo

git clone https://github.com/deepanshu2506/auto-ml.git

Install the dependencies by running:

pip3 install -r requirements.txt

yarn install

Run using Command Prompt

flask run

cd ./frontend
yarn start

Tech stack

Frontend : React
Backend : Flask(Python)
Database : MongoDB

hiral72 / auto-ml