Repository: https://github.com/rbroc/sbert-align

sbert-align

Compute parent-child alignment using SentenceBERT

Usage

  1. Create a virtual environment (optional).

You can do so by typing:

python3 -m venv PATH_TO_ENV
source PATH_TO_ENV/bin/activate

Replace PATH_TO_ENV with the path to your virtual environment.

  2. Install the requirements: pip install -r requirements.txt

  3. Run the align.py script:

python3 align.py --lag 1 --model all-mpnet-base-v2

Arguments are customizable.

Note that the script looks for a transcripts.txt or surrogates.txt file in the data folder; outputs are saved to an outputs folder.

  4. Deactivate the environment once you're done by running deactivate.
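The core quantity the script computes is the cosine similarity between embeddings of a turn and the turn a given lag earlier. A minimal sketch of that computation, using toy vectors in place of real SentenceBERT encodings (the actual script encodes turns with the checkpoint passed via --model; the function names here are ours, not the repository's):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def lagged_alignment(embeddings, lag):
    """Alignment of each turn with the turn `lag` positions earlier.

    Turns with no turn `lag` positions back get None.
    """
    return [
        cosine_similarity(embeddings[i], embeddings[i - lag]) if i >= lag else None
        for i in range(len(embeddings))
    ]

# Toy 3-dimensional "embeddings" standing in for SentenceBERT encodings
turns = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
print(lagged_alignment(turns, lag=1))
```

With a real model, each list in `turns` would be the embedding of one turn's text, in conversation order.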

Output columns

  • Turn metadata: (ChildID|ID, Visit, Turn)
  • Lag: 1 if alignment is computed with the previous turn, 2 if with the turn two back. Note that even lags compute alignment with a previous turn by the same speaker;
  • ModelId: Which SentenceBERT checkpoint we are using, see https://www.sbert.net/docs/pretrained_models.html for available models;
  • SemanticAlignment: cosine similarity between sequence encodings;
  • AlignmentType: 'child2caregiver' or 'caregiver2child'.
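In a strictly alternating caregiver-child conversation, the parity of the lag determines whose turn the comparison targets: odd lags pair a turn with the other speaker's turn, even lags with the same speaker's own earlier turn. A quick illustration (the speaker sequence and helper are hypothetical, for exposition only):

```python
def comparison_speaker(speakers, i, lag):
    """Return the speaker of the turn `lag` positions before turn i, or None."""
    return speakers[i - lag] if i >= lag else None

# An alternating two-speaker conversation
speakers = ["caregiver", "child", "caregiver", "child"]

# Turn 3 is a child turn: lag 1 points at a caregiver turn
# (child2caregiver alignment), lag 2 at the child's own earlier turn.
print(comparison_speaker(speakers, 3, lag=1))
print(comparison_speaker(speakers, 3, lag=2))
```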

Notes on study 2

  • When a Turn ID is repeated, we keep the second iteration of the conversation;
  • Turns whose previous index is missing are not coded for current-to-1back alignment, nor for 1back-to-2back alignment;
  • Turns whose preceding turn does not follow its own predecessor are not coded for 1back-to-2back alignment.
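The exclusion rules above can be sketched over the turn indices. This is a hypothetical helper, assuming integer Turn IDs where consecutive turns differ by 1 (names and representation are ours, not the repository's):

```python
def codeable(turn_ids, i):
    """Return (ok_1back, ok_2back): whether turn i can be coded for
    current-to-1back and 1back-to-2back alignment."""
    # Does the previous turn immediately precede this one?
    has_prev = i >= 1 and turn_ids[i - 1] == turn_ids[i] - 1
    # Does the preceding turn itself follow its own predecessor?
    prev_has_prev = i >= 2 and turn_ids[i - 2] == turn_ids[i] - 2
    # Missing previous index: code neither alignment
    ok_1back = has_prev
    # 1back-to-2back additionally requires an unbroken two-turn chain
    ok_2back = has_prev and prev_has_prev
    return ok_1back, ok_2back

# Turn 3 is missing, so turn 4 has no predecessor and turn 5 has no 2-back chain
turn_ids = [1, 2, 4, 5, 6]
print([codeable(turn_ids, i) for i in range(len(turn_ids))])
```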

Potential expansion:

  • Make synthetic raw data for better reproducibility
