Compute parent-child alignment using SentenceBERT
- Create a virtual environment (optional). You can do so by typing:

  ```
  python3 -m venv PATH_TO_ENV
  source PATH_TO_ENV/bin/activate
  ```

  Replace `PATH_TO_ENV` with the path to your virtual environment.
- Install the requirements:

  ```
  pip install -r requirements.txt
  ```
- Run the `align.py` script:

  ```
  python3 align.py --lag 1 --model all-mpnet-base-v2
  ```

  Arguments are customizable. Note that the script looks for a `transcripts.txt` or `surrogates.txt` file in the `data` folder, and saves its outputs in an `outputs` folder.
- Deactivate the environment once you're done by running `deactivate`.
- Turn metadata:
  - `ChildID|ID`, `Visit`, `Turn`
  - `Lag`: 1 if alignment is computed with the previous turn, 2 if with the turn two back. Note that even lags compute alignment with a previous turn from the same speaker.
  - `ModelId`: which SentenceBERT checkpoint is used; see https://www.sbert.net/docs/pretrained_models.html for available models.
  - `SemanticAlignment`: cosine similarity between the sequence encodings (see the sketch below).
  - `AlignmentType`: `child2caregiver` or `caregiver2child`
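  A minimal sketch of how such a lagged cosine similarity could be computed with `sentence-transformers`; the function name and example turns are illustrative, not taken from `align.py`:

  ```python
  # Minimal sketch, assuming sentence-transformers is installed; the
  # function name and example turns are illustrative, not from align.py.
  from sentence_transformers import SentenceTransformer
  from sentence_transformers.util import cos_sim

  def lagged_alignment(turns, lag=1, model_name="all-mpnet-base-v2"):
      """Cosine similarity between each turn and the turn `lag` positions back."""
      model = SentenceTransformer(model_name)
      embeddings = model.encode(turns, convert_to_tensor=True)
      scores = [None] * lag  # the first `lag` turns have no earlier partner
      for i in range(lag, len(embeddings)):
          scores.append(float(cos_sim(embeddings[i], embeddings[i - lag])))
      return scores

  turns = ["look at the doggy", "a big doggy!", "yes, a very big dog"]
  print(lagged_alignment(turns, lag=1))  # alignment with the previous turn
  ```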
- When a Turn ID is repeated, we keep the second iteration of the conversation.
- Turns whose previous index is missing are not coded for current-to-1-back alignment, nor for 1-back-to-2-back alignment.
- Turns whose preceding turn does not directly follow the turn before it are not coded for 1-back-to-2-back alignment. These rules are sketched below.
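  A hedged sketch of these coding rules, assuming the turns live in a pandas DataFrame with `ChildID`, `Visit`, and `Turn` columns (names taken from the metadata above; the actual logic in `align.py` may differ):

  ```python
  # Hedged sketch of the coding rules above; column names are assumed from
  # the metadata description, not read from align.py.
  import pandas as pd

  def flag_codable_turns(df: pd.DataFrame) -> pd.DataFrame:
      # Keep the second iteration when a Turn ID is repeated.
      df = df.drop_duplicates(subset=["ChildID", "Visit", "Turn"], keep="last")
      flagged = []
      for _, group in df.groupby(["ChildID", "Visit"]):
          turns = set(group["Turn"])
          # Current-to-1-back alignment needs the immediately preceding turn.
          has_1back = group["Turn"].map(lambda t: t - 1 in turns)
          # 1-back-to-2-back alignment also needs the turn before that one.
          has_2back = group["Turn"].map(lambda t: t - 1 in turns and t - 2 in turns)
          flagged.append(group.assign(has_1back=has_1back, has_2back=has_2back))
      return pd.concat(flagged, ignore_index=True)
  ```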
- Make synthetic raw data for better reproducibility
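  One way this could look: a toy generator for a synthetic `data/transcripts.txt`. The tab-separated ChildID/Visit/Turn/Speaker/Utterance layout here is a hypothetical format for illustration, not the project's actual schema:

  ```python
  # Toy synthetic-transcript generator; the tab-separated layout is a
  # hypothetical format for illustration, not the project's actual schema.
  import random
  from pathlib import Path

  UTTERANCES = ["look at the dog", "a big dog!", "yes, a very big dog",
                "where did it go?", "it ran away", "can you say dog?"]

  def make_synthetic(path="data/transcripts.txt", n_turns=10, seed=0):
      rng = random.Random(seed)
      Path(path).parent.mkdir(parents=True, exist_ok=True)
      with open(path, "w") as f:
          for turn in range(1, n_turns + 1):
              speaker = "CHI" if turn % 2 else "CAR"  # alternating speakers
              f.write(f"child01\t1\t{turn}\t{speaker}\t{rng.choice(UTTERANCES)}\n")

  make_synthetic()
  ```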