lclarete / data_pipeline

Next steps:

Data engineering tasks:

  • Write tests for each function
  • Create a python package map
  • Include optional arguments in the stopwords functions -- I found a good list to use as the default, but still want the option to choose another one
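
A minimal sketch of what the optional stopwords argument could look like. The function name `remove_stopwords`, the `stopwords` parameter, and the short `DEFAULT_STOPWORDS` set are all assumptions made for illustration; the list actually chosen for this project is not shown here.

```python
import re

# Hypothetical default list, for illustration only; the project's real list is not shown.
DEFAULT_STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is"})


def remove_stopwords(text, stopwords=None):
    """Return the lowercase tokens of `text` that are not stopwords.

    If `stopwords` is None, the default list is used; otherwise the
    caller's list replaces it entirely.
    """
    if stopwords is None:
        active = DEFAULT_STOPWORDS
    else:
        active = frozenset(word.lower() for word in stopwords)
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in active]
```

Calling `remove_stopwords(text)` uses the default list, while `remove_stopwords(text, stopwords=my_list)` swaps in another one without changing the function.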

Modeling the data:

  • Topic modeling
  • Interpret topic modeling
  • Coursera course: apply logistic regression
  • Bayesian classification
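
As a starting point for the Bayesian classification item, here is a small multinomial Naive Bayes sketch using only the standard library. It is an illustration under assumptions, not the method planned for this repository: the function names, the `(tokens, label)` input format, and the use of Laplace (add-one) smoothing are all choices made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count labels and words from a list of (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(tokens, model):
    """Return the label with the highest log posterior for `tokens`."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        # Log prior of the label.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Laplace smoothing keeps unseen words from zeroing the score.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For real data, a library implementation is likely a better fit; the sketch only shows the counting and scoring steps.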

Rule-based functions:

  • Themes based on regular expressions -- I already have a function that does this and am currently refactoring it
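
A minimal sketch of regex-based theme tagging. The existing function is not shown in this README, so `tag_themes`, the `THEME_PATTERNS` dictionary, and the two example themes are hypothetical names and patterns chosen for illustration.

```python
import re

# Hypothetical themes and patterns, for illustration only.
THEME_PATTERNS = {
    "health": re.compile(r"\b(covid|vaccin\w*|hospital\w*)\b", re.IGNORECASE),
    "price": re.compile(r"\b(price\w*|expensive|cheap\w*)\b", re.IGNORECASE),
}


def tag_themes(text, patterns=THEME_PATTERNS):
    """Return the sorted names of all themes whose pattern matches `text`."""
    return sorted(name for name, pattern in patterns.items() if pattern.search(text))
```

Keeping the patterns in a dictionary that is passed as an argument lets each project supply its own themes without editing the function.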

Plans for this work

  • Using in consultancy projects (AME, Unilever, find new partners)

  • Write books and articles: Covid, sexuality

  • Apply to the Recurse Center: set for Dec/2020 https://www.recurse.com/apply/retreat

  • Lectures and courses: PUC-SP

  • Master's project Spring 2021: Jan/2021 - Dec/2022

Data Pipeline

Build data processing and modeling pipelines. So far, I have built a pipeline and a package to release. The goal is to use this format to gather the code related to each of the following steps.

Levels:

  • Get data: from APIs, scrapers, databases, files
  • EDA: exploratory data analysis
  • Preprocessing: cleaning, normalizing and vectorizing
  • Modeling: select, test, validate
  • Plot
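
The preprocessing level above (cleaning, normalizing, vectorizing) can be sketched as a chain of small functions. This is an illustration under assumptions: the function names, the specific cleaning rules (dropping HTML-like tags, removing accents), and the bag-of-words vectorizer are not taken from the package itself.

```python
import re
import unicodedata
from collections import Counter


def clean(text):
    """Cleaning: drop HTML-like tags and collapse repeated whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()


def normalize(text):
    """Normalizing: lowercase the text and remove accents."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def vectorize(text):
    """Vectorizing: count word occurrences (a simple bag of words)."""
    return Counter(re.findall(r"\w+", text))


def preprocess(text, steps=(clean, normalize, vectorize)):
    """Apply each step in order, feeding the output of one into the next."""
    result = text
    for step in steps:
        result = step(result)
    return result
```

Because `steps` is just a tuple of callables, a level can be reordered or extended by passing a different tuple, for example `preprocess(text, steps=(clean, normalize))` to stop before vectorizing.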

In the future:

  • Deployment

Structural concept: https://miro.com/app/board/o9J_koNsYBo=/

Spreadsheet with the diagnosis, normalization, and cleaning parts: https://docs.google.com/spreadsheets/d/1hG2RJgBUjyTUhxUg6ATG8_y9RH81P1bWFoBfUmuZWFY/edit#gid=0

Theory: Linguistics and NLP

This is a repository for studying NLP -- and also my way of organizing what I learn about coding.

Theory readings:

  • On Chomsky and the Two Cultures of Statistical Learning http://norvig.com/chomsky.html
  • Michael Silverstein: helped define the field of sociolinguistics
  • Language in Culture: The Semiotics of Interaction (Masterclass)

Current resources:

Tools to get used to:

Master's degree

Benefits of a master's degree

  • Learn about Linguistics, NLP and speech recognition
  • Pursue an academic career
  • Opportunity to live in the US
  • Opportunity to use OPT to work in the US

Method

  • Define a goal
  • Write a project describing the goal, expected results, methods to achieve the results, and schedule (with milestones)
  • Collaborate with the academic community
  • Attend classes

Expected result

  • Articles
  • Books
  • New consultancy projects
  • New company (?)

Languages

  • Jupyter Notebook: 90.0%
  • HTML: 8.9%
  • Python: 1.2%