kwchurch / JSALT_Better_Together

Better Together Team Github

JSALT (Jelinek Summer Workshop on Speech and Language Technology):

Useful links

  1. Final Report (YouTube) and JSALT-2023 Team Page
  2. Web Page
  3. Slides
  4. Large datasets from Globus, and What's Where (on Globus)
  5. Deliverables
  6. Reading List
  7. Notation
  8. SciRepEval Baselines
  9. Zoom Link and Meeting Notes
  10. Status of Production Runs
  11. Similar Venues

Installation

git clone https://github.com/kwchurch/JSALT_Better_Together
cd JSALT_Better_Together
pip install -r requirements.txt

Some useful environment variables are listed below. You may need to set them differently, depending on where you put things: JSALTsrc should point to the src directory in this repo, and JSALTdir should point to the data downloaded from Globus.

Some examples below depend on JSALTdir and some do not. If you cannot download the JSALTdir data, try the examples that do not require it.

export JSALTdir=/work/k.church/JSALT-2023/
export JSALTsrc=/work/k.church/githubs/JSALT_Better_Together/src

export specter=$JSALTdir/semantic_scholar/embeddings/specter
export specter2=$JSALTdir/semantic_scholar/embeddings/specter2
export proposed=$JSALTdir/semantic_scholar/embeddings/proposed
export scincl=$JSALTdir/semantic_scholar/embeddings/scincl
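Before running the examples that need JSALTdir, it can help to confirm that the variables point somewhere real. The following is a minimal Python sketch, not a script from this repo. It assumes only the layout implied by the exports above (specter, specter2, proposed and scincl under $JSALTdir/semantic_scholar/embeddings), and the helper name check_embedding_dirs is hypothetical.

```python
import os

# Embedding subdirectories taken from the exports above.
EMBEDDING_NAMES = ["specter", "specter2", "proposed", "scincl"]


def check_embedding_dirs(env=None):
    """Return ({name: path}, [missing names]) for the embedding directories.

    Hypothetical helper: assumes $JSALTdir/semantic_scholar/embeddings/<name>.
    Raises RuntimeError if JSALTdir is not set at all.
    """
    env = os.environ if env is None else env
    jsalt_dir = env.get("JSALTdir")
    if not jsalt_dir:
        raise RuntimeError("JSALTdir is not set")
    base = os.path.join(jsalt_dir, "semantic_scholar", "embeddings")
    paths = {name: os.path.join(base, name) for name in EMBEDDING_NAMES}
    missing = [name for name, path in paths.items() if not os.path.isdir(path)]
    return paths, missing


if __name__ == "__main__":
    try:
        paths, missing = check_embedding_dirs()
    except RuntimeError as err:
        # Without JSALTdir, only the examples that need JSALTsrc will work.
        print(f"error: {err}; try the examples that only need JSALTsrc")
    else:
        for name, path in paths.items():
            status = "missing" if name in missing else "ok"
            print(f"{name:9s} {status:8s} {path}")
```

Passing an explicit dictionary instead of os.environ makes the check easy to try against a different data location without changing your shell settings.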

If you have an account on the Northeastern Discovery Cluster, you can use my settings for these environment variables. To get an account, request access to the cluster by filling out a ticket here. You should also ask to be added to the group nlp.

Reading List (and Pre-computed Output)

See here, and especially this. The last example starts with papers we should all be reading and finds documents similar to them.
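Finding documents similar to a set of seed papers amounts to ranking candidates by how close their embeddings are to the seeds. The exact files and scoring behind the precomputed output are not described here, so this is only a toy Python sketch of the general idea. It assumes each paper is a dense vector (like the SPECTER-style embeddings above) and that closeness is measured by cosine similarity; the paper IDs and vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, corpus, k=2):
    """Return the k (doc_id, score) pairs in corpus closest to the query vector."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
corpus = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
seed = [1.0, 0.05, 0.0]
print(most_similar(seed, corpus, k=2))
```

A brute-force scan like this is fine for a handful of vectors; for millions of papers, an approximate nearest-neighbor index is the usual choice.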

Examples

  1. Depend on $JSALTsrc, but not $JSALTdir
    1. Scripts for using Semantic Scholar API
    2. Scripts for using Models from HuggingFace
  2. Depend on both $JSALTsrc and $JSALTdir
    1. Find similar documents
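The Semantic Scholar scripts live in the repo and are not reproduced here. As a hedged illustration of the kind of request they involve, here is a Python sketch against the public Semantic Scholar Graph API (the paper endpoint at https://api.semanticscholar.org/graph/v1/paper/ with a fields query parameter). The helper names paper_url and fetch_paper are hypothetical, and the default field list is just an example.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.semanticscholar.org/graph/v1/paper/"


def paper_url(paper_id, fields=("title", "year", "externalIds")):
    """Build a Graph API URL for one paper.

    paper_id may be a Semantic Scholar paper ID or a prefixed ID such as
    "CorpusId:..." or "DOI:..."; ':' and '/' are left unescaped.
    """
    quoted_id = urllib.parse.quote(paper_id, safe=":/")
    query = urllib.parse.urlencode({"fields": ",".join(fields)}, safe=",")
    return f"{API_ROOT}{quoted_id}?{query}"


def fetch_paper(paper_id, fields=("title", "year", "externalIds"), api_key=None):
    """Fetch paper metadata as a dict (needs network access).

    An API key is optional; if given, it is sent in the x-api-key header.
    """
    headers = {"x-api-key": api_key} if api_key else {}
    request = urllib.request.Request(paper_url(paper_id, fields), headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Building the URL separately from fetching keeps the request logic easy to inspect offline; unauthenticated requests are rate-limited, so long batch runs may need a key.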

Exercises for JSALT-2023 summer school: here.

About

License: MIT License


Languages

Python 40.4%, C 30.0%, Jupyter Notebook 23.7%, Roff 2.7%, Shell 2.3%, Cython 0.5%, Makefile 0.4%