kirill-gerasimov / ds

Data stuff

Data Stuff

See the notebooks in the nb directory. Some extras:

  1. Intro
  2. Iris dataset exploration
  3. Credit fraud detection
  4. Text classification on the Reuters-21578 news dataset
  5. Pandas and Numpy Intro / Cheatsheets

6, 7. Logistic regression, decision trees, boosting, hyperparameter tuning -- discussed offline

8a. SVM -- a brief comparison of kernels

8b. Topic modeling applied to get extra features

9a. Some experiments with word2vec, including a word-analogy demo and text classification

9b. Links to some interesting resources on neural networks
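
To give a feel for what a kernel comparison (item 8a) involves, here is a minimal sketch of three commonly used kernel functions in plain Python. It is an illustration only, not the code from the notebook; the function names and the default values of `degree`, `coef0` and `gamma` are assumptions chosen for the example. In practice one would more likely pass `kernel="linear"`, `"poly"` or `"rbf"` to scikit-learn's `SVC` and compare the resulting accuracy.

```python
import math


def linear_kernel(x, y):
    """Plain dot product: corresponds to a linear decision boundary."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """(x . y + coef0) ** degree: allows polynomial decision boundaries."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2): similarity decays with squared distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two toy points (hypothetical data, for illustration only).
a = [1.0, 2.0]
b = [2.0, 0.0]

for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("rbf", rbf_kernel),
]:
    print(f"{name:>10}: k(a, a) = {kernel(a, a):.4f}, k(a, b) = {kernel(a, b):.4f}")
```

The RBF kernel always returns 1 for identical points and decays toward 0 as they move apart, whereas the linear and polynomial kernels grow with the magnitude of the inputs. This is one reason feature scaling tends to matter when comparing them.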

Resources

  1. Vorontsov / Yandex / MIPT
  2. Natural language processing overview course
  3. Andrew Ng
  4. ODS community course

Methods

  1. classifiers: logreg, MLP, kNN, SVM, decision trees, RF, gradient boosting, naive Bayes
  2. dimensionality reduction, visualization: t-SNE (visualization only), PCA, UMAP
  3. text pre-processing: stemming / lemmatization, bag of words approach, TF-IDF
  4. topic modeling (SVD, LDA)
  5. word vector representations (word2vec, GloVe, fastText)
  6. sentence / document embeddings (SIF, doc2vec, StarSpace)
  7. advanced models: LSTM, GRU, CNN, ELMo, ULMFiT, Transformer
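
As a small illustration of the bag-of-words and TF-IDF items above, the following sketch computes TF-IDF weights from scratch using only the standard library. It uses the plain textbook weighting, term frequency times log(N / document frequency), with a whitespace tokenizer; both choices, the helper names `tokenize` and `tf_idf`, and the toy sentences are assumptions made for the example. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF and vector normalization by default, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (no stemming or lemmatization)."""
    return text.lower().split()


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Unsmoothed inverse document frequency.
    idf = {term: math.log(n_docs / count) for term, count in doc_freq.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # the bag-of-words representation
        total = len(tokens)
        vectors.append(
            {term: (count / total) * idf[term] for term, count in counts.items()}
        )
    return vectors


# Toy corpus (made up for illustration).
docs = ["oil prices rise", "oil exports fall", "wheat prices fall"]
vectors = tf_idf(docs)

for doc, vec in zip(docs, vectors):
    print(doc, "->", {term: round(weight, 3) for term, weight in vec.items()})
```

Terms that occur in only one document get the highest weight, while a term that occurs in every document gets a weight of zero under this unsmoothed formula.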

Tools

  1. python, numpy, scipy, pandas, matplotlib, seaborn, jupyter -- all the basics
  2. scikit-learn -- classical algorithms
  3. tensorflow -- neural networks and more
  4. gensim -- topic modeling
  5. nltk -- text processing
  6. spaCy -- advanced natural language processing

Languages

Jupyter Notebook 90.7%, HTML 9.3%