achuthasubhash / Tips

Tips

credit: all corresponding resources

FROM VARIOUS RESOURCES

  1. H2O.ai

  2. TPOT

  3. autopandas

  4. AutoGluon

  5. autosklearn

  6. autoviml

  7. AutoViz

  8. sweetviz (EDA purpose)

  9. pandas-profiling (displays a full EDA report)

  10. autokeras

  11. pycaret

12. Auto_Timeseries by auto_ts

13. AutoNLP_Sentiment_Analysis by autoviml

14. AutoML with Lazy Predict

15. bamboolib (Python package for easy data exploration and transformation)

16. CuPy (parallel array processing on the GPU)

17. Dabl has a built-in function that automatically detects data types and quality issues and applies appropriate preprocessing to prepare a dataset for machine learning.

18. Dask (parallel computation)
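
As a rough illustration of what parallel computation looks like with Dask, the sketch below builds a small task graph with `dask.delayed` and then runs it. The helper names `square` and `parallel_sum_of_squares` are made up for this example, and Dask is imported inside the function only so the snippet still loads where the package is not installed.

```python
def square(x):
    """Toy unit of work; stands in for an expensive computation."""
    return x * x


def parallel_sum_of_squares(numbers):
    """Build a lazy task graph with dask.delayed, then execute it."""
    from dask import delayed  # pip install dask

    # Nothing runs yet: each call only records a task in the graph.
    tasks = [delayed(square)(n) for n in numbers]
    total = delayed(sum)(tasks)

    # compute() hands the graph to a scheduler, which is free to run
    # independent tasks in parallel.
    return total.compute()
```

Calling `parallel_sum_of_squares([1, 2, 3])` should give the same result as the plain Python sum of squares. The benefit only shows up when each task is expensive enough to outweigh the scheduling overhead.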

19. dataprep (understand your data with a few lines of code in seconds)

20. Dora is another data analysis library designed to simplify exploratory data analysis.

21. FastAPI is a modern, fast (high-performance) web framework for building APIs.
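
A minimal sketch of a FastAPI application with one endpoint, to show what building an API with it looks like. The route `/square/{number}`, the app title, and the factory function `create_app` are assumptions chosen for this example. The import sits inside the function so the snippet loads even where FastAPI is not installed.

```python
def create_app():
    """Return a FastAPI app with a single typed GET endpoint."""
    from fastapi import FastAPI  # pip install fastapi

    app = FastAPI(title="Tips demo")

    @app.get("/square/{number}")
    def read_square(number: int):
        # FastAPI parses and validates `number` from the URL path
        # using the type hint, and serializes the dict as JSON.
        return {"number": number, "square": number * number}

    return app
```

One way to serve it is with an ASGI server such as uvicorn, for example `uvicorn main:create_app --factory` if the code is saved as `main.py` (a hypothetical file name).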

22. Faster hyperparameter tuning (sklearn-nature-inspired-algorithms)

23. FlashText (a library faster than regular expressions for NLP tasks)
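
A short sketch of keyword extraction with FlashText's `KeywordProcessor`, the kind of task where it is typically used in place of many regular expressions. The function name `find_places` and the sample keywords are chosen for illustration. The import is placed inside the function so the snippet loads without the package.

```python
def find_places(text):
    """Return the clean names of all registered keywords found in text."""
    from flashtext import KeywordProcessor  # pip install flashtext

    processor = KeywordProcessor()  # matching is case-insensitive by default

    # Map a surface form to the clean name that should be reported.
    processor.add_keyword("Big Apple", "New York")
    # With one argument, the keyword is reported as-is.
    processor.add_keyword("Bay Area")

    return processor.extract_keywords(text)
```

With these keywords, `find_places("I love Big Apple and Bay Area.")` should report `New York` and `Bay Area`, in that order.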

24. Guietta (a tool that makes simple GUIs simple)

25. Hummingbird (compiles trained ML models into tensor computations for faster inference)

26. memory-profiler (reports memory consumption line by line)
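
A small sketch of line-by-line memory reporting with memory-profiler's `profile` wrapper. The function names `build_and_sum` and `profiled_build_and_sum` are invented for this example, and the import is kept inside the function so the snippet loads without the package.

```python
def build_and_sum(n):
    """Toy workload: the list allocation is what the profiler should surface."""
    squares = [i * i for i in range(n)]
    return sum(squares)


def profiled_build_and_sum(n):
    """Run build_and_sum under memory-profiler and return its result."""
    from memory_profiler import profile  # pip install memory-profiler

    # profile() wraps the function; calling the wrapper prints a per-line
    # memory report and then returns the function's normal result.
    return profile(build_and_sum)(n)
```

The more common form is to put `@profile` directly above the function definition. The explicit wrapping here only keeps the undecorated function available for comparison.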

27. numexpr (speeds up evaluation of NumPy array expressions)
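
A minimal sketch of how numexpr is typically used: the array expression is passed as a string to `numexpr.evaluate`. The function name `weighted_sum` and the expression are assumptions for illustration, and the import is placed inside the function so the snippet loads without the package.

```python
def weighted_sum(a, b):
    """Compute 2*a + 3*b element-wise on NumPy arrays using numexpr."""
    import numexpr as ne  # pip install numexpr (requires NumPy)

    # evaluate() compiles the string expression and processes the arrays
    # in chunks, avoiding the temporary arrays that plain NumPy would
    # allocate for 2*a and 3*b.
    return ne.evaluate("2 * a + 3 * b", local_dict={"a": a, "b": b})
```

Any speedup depends on array size and the expression. For small arrays, plain NumPy is usually just as fast.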

28. pandarallel (simple and efficient tool to parallelize your pandas computation on all your CPUs)

29. PDFTableExtract (built on PyPDF2)

30. PyImpuyte (Python package that simplifies the task of imputing missing values in big datasets)

31. libra (automates the end-to-end machine learning process in just one line of code)

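32. Debug code with python -m pdb -c continue followed by the script name (the script runs normally and drops into the debugger only when an uncaught exception occurs)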
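
The same "run until something breaks, then inspect" behavior can be sketched inside a program with `pdb.post_mortem`. The helper `run_with_post_mortem` and its `debugger` parameter are assumptions made for this example, not part of pdb. The parameter exists only so the debugger can be swapped out in non-interactive runs.

```python
import pdb
import sys


def run_with_post_mortem(func, *args, debugger=pdb.post_mortem):
    """Call func(*args); on an uncaught exception, open a post-mortem debugger."""
    try:
        return func(*args)
    except Exception:
        traceback = sys.exc_info()[2]
        # pdb.post_mortem(traceback) opens an interactive prompt at the
        # frame where the exception was raised.
        debugger(traceback)
        raise
```

With the default `debugger`, a failing call stops at an interactive `(Pdb)` prompt. After the prompt is closed, the original exception is re-raised.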

33. cURL (a useful tool for obtaining data from any server via a variety of protocols, including HTTP)

34. csvkit is a set of command-line tools for working with CSV files. The tasks it can perform fall into three areas: input, processing, and output.
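
csvkit itself is driven from the command line, with tools such as in2csv, csvsort, and csvlook. The sketch below is not csvkit. It is a stand-in that uses only Python's standard `csv` module to show the same three stages (input, processing, output) in one function. The function name `top_rows` and its parameters are made up for this example.

```python
import csv
import io


def top_rows(csv_text, column, limit):
    """Parse CSV text, keep the rows with the largest values, return CSV text."""
    # Input: read rows into dictionaries keyed by the header line.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)

    # Processing: keep the `limit` rows with the largest value in `column`.
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    selected = rows[:limit]

    # Output: write the result back out as CSV.
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(selected)
    return buffer.getvalue()
```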

35. IPython gives access to enhanced interactive Python from the shell. In essence, you can do most of what you can do in a Jupyter Notebook from the command line.

36. Faker, installed with pip install faker (create your own dataset)
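
A minimal sketch of generating a small synthetic dataset with Faker. The function name `make_people` and the chosen fields are assumptions for illustration, and the import is kept inside the function so the snippet loads without the package.

```python
def make_people(count, seed=0):
    """Return `count` fake person records as a list of dictionaries."""
    from faker import Faker  # pip install faker

    Faker.seed(seed)  # seeding makes the generated rows reproducible
    fake = Faker()

    return [
        {"name": fake.name(), "email": fake.email(), "city": fake.city()}
        for _ in range(count)
    ]
```

The resulting list of dictionaries can be passed to `pandas.DataFrame` or written out with the `csv` module to get a dataset file.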

37. Python debugger: the %pdb magic in IPython

38. voila: from notebooks to standalone web applications and dashboards https://voila.readthedocs.io/en/stable/ https://github.com/voila-dashboards/voila

39. tslearn for time series data

40. texthero (work with text-based datasets in a pandas DataFrame quickly and effortlessly) https://github.com/jbesomi/texthero

41. kaleido (static image export for web-based visualization libraries, with zero dependencies)

42. Vaex: reading and processing huge datasets in seconds

43. Uber’s Ludwig is an open-source framework for low-code machine learning

44. Google's TAPAS, a BERT-based model for querying tables using natural language

45. RAPIDS: open GPU data science

46. pyforest: lazy import of all popular Python data science libraries. Stop writing the same imports over and over again.

47. Modin: get faster pandas with Modin

48. Faster hyperparameter tuning with NatureInspiredSearchCV (from sklearn-nature-inspired-algorithms)
