Data engineering tasks:
- Write tests for each function
- Create a Python package map
- Include optional arguments in stopwords functions -- I found a good list to use as the default, but still want the option to choose another one
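A minimal sketch of the optional-argument idea above. The function name `remove_stopwords` and the tiny `DEFAULT_STOPWORDS` set are hypothetical placeholders, not the list mentioned in the task; the real default would be whichever list the package ships with.

```python
from typing import Iterable, List, Optional

# Placeholder default list, for illustration only (an assumption,
# not the stopword list referred to in the notes).
DEFAULT_STOPWORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "in"})


def remove_stopwords(
    tokens: Iterable[str],
    stopwords: Optional[Iterable[str]] = None,
) -> List[str]:
    """Drop stopwords from tokens, comparing case-insensitively.

    If `stopwords` is None the default list is used; otherwise the
    caller's list replaces the default entirely.
    """
    if stopwords is None:
        active = DEFAULT_STOPWORDS
    else:
        active = {word.lower() for word in stopwords}
    return [tok for tok in tokens if tok.lower() not in active]
```

Using `None` as the sentinel keeps "no argument given" distinct from "an empty list given", so a caller can switch stopword removal off by passing `stopwords=[]`. The same signature is easy to cover in a unit test, which fits the "write tests for each function" task.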
Modeling the data:
- Topic modeling
- Interpret topic modeling
- Coursera course: apply logistic regression
- Bayesian classification
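As a starting point for the Bayesian classification item, here is a rough sketch of a multinomial naive Bayes text classifier with add-one smoothing, using only the standard library. The function names and the expected input shape (a list of `(tokens, label)` pairs) are assumptions made for illustration; a library implementation would likely replace this in the real pipeline.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def train_naive_bayes(
    docs: Sequence[Tuple[List[str], str]],
) -> Tuple[Dict[str, float], Dict[str, Counter], Set[str]]:
    """Return log priors, per-class word counts, and the vocabulary."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(class_counts.values())
    log_priors = {c: math.log(n / total) for c, n in class_counts.items()}
    return log_priors, dict(word_counts), vocab


def predict(
    tokens: List[str],
    log_priors: Dict[str, float],
    word_counts: Dict[str, Counter],
    vocab: Set[str],
) -> str:
    """Return the class with the highest log posterior."""
    best_label, best_score = "", -math.inf
    for label, log_prior in log_priors.items():
        counts = word_counts[label]
        # Add-one (Laplace) smoothing: every vocabulary word gets +1.
        denom = sum(counts.values()) + len(vocab)
        score = log_prior
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working in log space avoids underflow when many small probabilities are multiplied, and the smoothing keeps a single unseen word from zeroing out a class.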
Rule-based functions:
- Themes based on regular expressions -- I already have a function that does this and am refactoring it
-
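One possible shape for regex-based theme tagging, sketched under assumptions: this is not the existing function mentioned above, and the theme names, patterns, and the name `tag_themes` are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical themes and patterns, for illustration only.
THEME_PATTERNS: Dict[str, re.Pattern] = {
    "health": re.compile(r"\b(covid|vaccin\w*|hospital\w*)\b", re.IGNORECASE),
    "work": re.compile(r"\b(job|office|remote work)\b", re.IGNORECASE),
}


def tag_themes(text: str, patterns: Dict[str, re.Pattern] = THEME_PATTERNS) -> List[str]:
    """Return the names of all themes whose pattern matches the text."""
    return [name for name, pattern in patterns.items() if pattern.search(text)]
```

Keeping the patterns in a dictionary that is passed as an argument separates the rules from the matching logic, so new themes can be added or swapped without touching the function.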
Using in consultancy projects (AME, Unilever, find new partners)
-
Write books and articles: Covid, sexuality
-
Apply to Recurse Center: set for Dec/2020 https://www.recurse.com/apply/retreat
-
Lectures and courses: PUC-SP
-
Master's project Spring 2021: Jan/2021 - Dec/2022
Build data processing and modeling pipelines, then package them for release. The goal is to use this format to gather code related to each of the following steps.
Levels:
- Get data: from APIs, scrapers, databases, files
- EDA: exploratory data analysis
- Preprocessing: cleaning, normalizing and vectorizing
- Modeling: select, test, validate
- Plot
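To make the Preprocessing level concrete, here is a small sketch of the clean, normalize, and vectorize steps as composable functions. All function names are hypothetical, and the vectorizer is a plain bag-of-words count, chosen only because it needs no external library.

```python
import re
import unicodedata
from collections import Counter
from typing import List, Tuple


def clean(text: str) -> str:
    """Remove URLs and collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text: str) -> List[str]:
    """Lowercase, strip accents, and split into alphanumeric tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.findall(r"[a-z0-9]+", no_accents)


def vectorize(docs: List[List[str]]) -> Tuple[List[str], List[List[int]]]:
    """Build a sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, count in Counter(doc).items():
            row[index[tok]] = count
        vectors.append(row)
    return vocab, vectors


def preprocess(raw_docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Chain the three steps: clean -> normalize -> vectorize."""
    return vectorize([normalize(clean(doc)) for doc in raw_docs])
```

Each step takes the previous step's output, so a stage can be tested or replaced on its own, for example swapping the count vectors for TF-IDF later.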
In the future:
- Deployment
Structural concept: https://miro.com/app/board/o9J_koNsYBo=/
Spreadsheet with the diagnosis, normalization, and cleaning parts: https://docs.google.com/spreadsheets/d/1hG2RJgBUjyTUhxUg6ATG8_y9RH81P1bWFoBfUmuZWFY/edit#gid=0
This is a repository to study NLP -- also my way to organize my coding learning.
Theory readings:
- On Chomsky and the Two Cultures of Statistical Learning http://norvig.com/chomsky.html
- Michael Silverstein: helped define the field of sociolinguistics
- Language in Culture: The Semiotics of Interaction (Masterclass)
Current resources:
- NLP Coursera course by DeepLearning.AI (Natural Language Processing with Classification and Vector Spaces): https://www.coursera.org/learn/classification-vector-spaces-in-nlp/ungradedLab/TXtyC/natural-language-preprocessing
- fast.ai GitHub repo: https://github.com/fastai/course-nlp
Tools to get used to:
- http://www.nltk.org/
- https://developer.twitter.com/en/docs
- https://developer.twitter.com/en/developer-terms/policy
- https://brightplanet.com/2013/06/25/twitter-firehose-vs-twitter-api-whats-the-difference-and-why-should-you-care/
- Learn about Linguistics, NLP and speech recognition
- Follow up with academic career
- Opportunity to live in the US
- Opportunity to use OPT to work in the US
- Define a goal
- Write a project describing the goal, expected results, methods to achieve the results, and schedule (with milestones)
- Collaborating with academic community
- Going to classes
- Articles
- Books
- New consultancy projects
- New company (?)