chrispaulca / nlp-course

University of San Francisco's Natural Language Processing course

University of San Francisco's Natural Language Processing (NLP) course
MSAN 631 04 Summer 2018

The mystery lies in the use of language to express human life.
– Eudora Welty


Logistics

Instructor: Brian Spiering | Slack DM (more preferred) | bspiering@usfca.edu (less preferred)
Office hours: By appointment
Grader: Jaime Almeida
Office hours: By appointment
Website: github.com/brianspiering/nlp-course
Communication: Slack #nlp-2018
Location: 101 Howard, room 154 - 156
Time: Tuesdays & Thursdays, 8:55am - 10:50am


Course Description

This course covers the fundamental concepts and algorithms in Natural Language Processing (NLP). The goal of the course is to understand text using computational statistics.

This course will start with basic text processing techniques (such as regular expressions) and then cover advanced techniques (text classification and topic modeling). The emphasis will be on contemporary best practices in industry, including Deep Learning and text embeddings. Along the way we will touch upon text mining, information retrieval, and computational linguistics.
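
As a rough illustration of the "basic text processing" end of that range, here is a minimal sketch using Python's standard `re` module. The token pattern, the `tokenize` helper, and the sample sentence are assumptions made for this example, not course material.

```python
import re
from collections import Counter

# Hypothetical pattern: a run of letters, optionally followed by an
# apostrophe and more letters (so "don't" stays one token).
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return the word-like tokens the pattern finds."""
    return TOKEN_PATTERN.findall(text.lower())


sample = "The mystery lies in the use of language to express human life."
tokens = tokenize(sample)
counts = Counter(tokens)  # word frequencies, a common first step in text analysis
print(counts.most_common(3))
```

A regular expression like this is a blunt instrument: it drops digits and punctuation entirely, which is one reason dedicated tokenizers exist.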

This course is a "buffet": a sample of many things, but you will not get "full" on any one topic. People get a PhD in each of these individual topics.

Remember - A little bit of knowledge and a lot of "how to" goes a long way in Data Science.

Prerequisites

  • Working knowledge of probability (e.g., calculate conditional probability and apply Bayes' Theorem)
  • Basic statistics (e.g., the difference between pmf and pdf)
  • One course in machine learning
  • Intermediate Python (e.g., the ability to create classes). Based on previous classes, the more Python a student knows, the more NLP they learn during the course.
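
To gauge the level of probability and Python the list above assumes, here is a minimal sketch: a small class that applies Bayes' Theorem. The class name, the method names, and the spam-filter numbers are all made up for illustration.

```python
class BayesRule:
    """Compute the posterior P(H | E) from a prior and two likelihoods."""

    def __init__(self, prior, likelihood, false_positive_rate):
        self.prior = prior                              # P(H)
        self.likelihood = likelihood                    # P(E | H)
        self.false_positive_rate = false_positive_rate  # P(E | not H)

    def evidence(self):
        """P(E), by the law of total probability."""
        return (self.likelihood * self.prior
                + self.false_positive_rate * (1 - self.prior))

    def posterior(self):
        """P(H | E) = P(E | H) * P(H) / P(E)."""
        return self.likelihood * self.prior / self.evidence()


# Made-up numbers: 20% of messages are spam; the word "free" appears
# in 60% of spam messages and 5% of non-spam messages.
rule = BayesRule(prior=0.2, likelihood=0.6, false_positive_rate=0.05)
print(round(rule.posterior(), 3))
```

If following this code and checking the arithmetic by hand feels comfortable, the probability and Python prerequisites are probably met.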

Learning Outcomes

By the end of the course, you should be able to:

  1. Apply fundamental NLP concepts and algorithms to solve real-world problems
  2. Write efficient code to process and model text data
  3. Classify and cluster text data
  4. Create and use vector representations of words and documents
  5. Build an end-to-end system to model meaning in text

Course Schedule

  1. [05/22] Welcome & NLP Overview
  2. [05/24] Regular Expressions
  3. [05/29] Segmenting, Tokenizing, & Stemming
  4. [05/31] Language Modeling
  5. [06/05] Text Embeddings: Words
  6. [06/07] Text Embeddings: Documents et al.
  7. [06/12] Word Tagging: POS (part of speech) and NER (named entity recognition)
  8. [06/14] Text Classification / Sentiment Analysis with Naive Bayes
  9. [06/19] Text Classification with Deep Learning
  10. [06/21] Information Retrieval / Search Engineering
  11. [06/26] Topic Modeling with Latent Dirichlet allocation (LDA)
  12. [06/28] Final Project Poster Presentations

Topics Not Covered

  • Theory. We are only going to cover applied parts of NLP, aka tips n' tricks for getting stuff done.
  • Grammar. Grammar kinda sucks but it is a very powerful method for understanding language.
  • Non-English languages. I ❤️ other languages, and they are very important to understanding NLP. There is just not enough time!
  • Machine Translation. Again very important and incredible breakthroughs have been made. There is not enough time to adequately cover it.
  • Natural Language Understanding (NLU). Finding "meaning" in text. We'll spend most of our time focused on lower levels of processing.
  • Natural Language Generation (NLG). We're only going to briefly touch on how to programmatically create text during the Language Modeling section.
  • Speech Recognition. For this class, we'll assume audio waves have been digitized into text. In the last couple of years, speech-based language processing has been revolutionized and it is well worth looking into.

Grading

Item | Weight
Participation | 30%
Labs | 30%
Final Project | 40%

Course grades range from “A” to “F.” The MSDS program considers a grade of "A" to represent exceptional work with respect to both the instructor's expectations and peer student achievements. A grade of "B" represents the expected outcome, what is called "competence" in a business setting. A "C" grade represents achievements lower than the instructor's expectations for competence in the subject. A grade of "F" represents little or no work in the course.

Participation

You must show up to each session prepared. Each person is important to the dynamic of the class, and therefore students are required to participate in class activities. Expect to be "cold called". I call on students at random not to put you on the spot but to keep you engaged in the material at all times.

Attendance is mandatory. It is the responsibility of the student to attend all classes. If you have to miss class, due to sickness or other circumstances, please notify your instructor by Slack in advance. Supporting documents (e.g., doctor’s notes) should accompany absences due to sickness.

Labs

The labs will be hands-on activities. They will require a combination of coding and writing. The coding sections will be implementing algorithms from scratch or applying common libraries (e.g., scikit-learn, nltk, and keras). The writing sections will focus on communication to technical and nontechnical audiences.

Final Project

Details in Final Project Folder.


Useful Stuff To Know About

Course Structure

This course will be partly "flipped": basic lectures will be videos watched before class. In-class lectures will cover complex topics in an active-learning style. You'll be writing a lot of code and completing many projects during class time.

Textbooks

There are no required textbooks for this course. Preparation materials (e.g., videos, articles, and blog posts) will be assigned for each session.

Academic Integrity

As a Jesuit institution committed to cura personalis — the care and education of the whole person — USF has an obligation to embody and foster the values of honesty and integrity. USF upholds the standards of honesty and integrity from all members of the academic community. All students are expected to know and adhere to the University's Honor Code. You can find the full text of the code online at usfca.edu/academic_integrity. The policy covers:

  • Plagiarism — intentionally or unintentionally representing the words or ideas of another person as your own; failure to properly cite references; manufacturing references.
  • Working with another person when independent work is required.
  • Submission of the same paper in more than one course without the specific permission of each instructor.
  • Submitting a paper written by another person or obtained from the internet.
  • The penalties for violation of the policy may include a failing grade on the assignment, a failing grade in the course, and/or a referral to the Academic Integrity Committee.

Students with disabilities

If you are a student with a disability or disabling condition, or if you think you may have a disability, please contact USF Student Disability Services (SDS) at (415) 422-2613 within the first week of class, or immediately upon onset of disability, to speak with a disability specialist.

If you are determined eligible for reasonable accommodations, please meet with your disability specialist so they can arrange to have your accommodation letter sent to me, and we will discuss your needs for this course. For more information, please visit: http://www.usfca.edu/sds or call (415) 422-2613.

Behavioral Expectations

All students are expected to behave in accordance with the Student Conduct Code and other University policies (see http://www.usfca.edu/fogcutter/). Open discussion and disagreement are encouraged when done respectfully and in the spirit of academic discourse. There are also a variety of behaviors that, while not against a specific University policy, may create disruption in this course. Students whose behavior is disruptive or who fail to comply with the instructor may be dismissed from the class for the remainder of the class period and may need to meet with the instructor or Dean prior to returning to the next class period. If necessary, referrals may also be made to the Student Conduct process for violations of the Student Conduct Code.

Learning & Writing Center

The Learning & Writing Center provides assistance to all USF students in pursuit of academic success. Peer tutors provide regular review and practice of course materials in the subjects of Math, Science, Business, Economics, Nursing and Languages. Other content areas can be made available by student request. To schedule an appointment, log on to TutorTrac at https://tutortrac.usfca.edu. Students may also take advantage of writing support provided by Rhetoric and Language Department instructors and academic study skills support provided by Learning Center professional staff. For more information about these services contact the Learning & Writing Center at 415.422.6713, lwc@usfca.edu or stop by Cowell 215. Information may also be found at www.usfca.edu/lwc.

Counseling and Psychological Services

Our diverse staff offers individual, couple, and group counseling to student members of our community. Services are confidential and free of charge. Call 415.422.6352 for an initial consultation appointment. Having a crisis at 3 AM? We are still here for you. Telephone consultation after hours is available between 5:00 PM and 8:30 AM; call the above number and press 2.

Confidentiality, Mandatory Reporting and Sexual Assault

As an instructor, one of my responsibilities is to help create a safe learning environment on our campus. I also have a mandatory reporting responsibility related to my role as a faculty member. I am required to share information regarding sexual misconduct or information about a crime that may have occurred on USF's campus with the University. Here are other resources:

  • To report any sexual misconduct, students may visit Anna Bartkowski (UC 5th floor) or see many other options by visiting our website: www.usfca.edu/student_life/safer.
  • Students may speak to someone confidentially, or report a sexual assault confidentially by contacting Counseling and Psychological Services at 415-422-6352.
  • To find out more about reporting a sexual assault at USF, visit USF's Callisto website at: www.usfca.callistocampus.org.
  • For an off-campus resource, contact San Francisco Women Against Rape (SFWAR) (415) 647-7273 (www.sfwar.org).
