XinyiXiang / ChildhoodArchive

Research on Linguistics Importance in Cognitive Development of Post-War Taiwanese Children

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

ChildhoodArchive

GitHub Desktop

  • Click on the green Code button
  • Select open with GitHub desktop
  • See below

ZIP file

  • Click on the green Code button
  • Select download zip files from the dropdown menu
  • Unzip in Downloads
  • Open the project with VS Code Studio
  • Press control with ~ key to summon up a new terminal
  • Run
python3 run_classifier.py --task_name=cola --do_train=true --do_eval=true --do_predict=true --data_dir=/Users/vic/Documents/GitHub/bert/datasets --vocab_file=/Users/vic/Downloads/uncased_L-12_H-768_A-12/vocab.txt --bert_config_file=/Users/vic/Downloads/uncased_L-12_H-768_A-12/bert_config.json --init_checkpoint=/Users/vic/Downloads/uncased_L-12_H-768_A-12/bert_model.ckpt.index --max_seq_length=400 --train_batch_size=8 --learning_rate=2e-5 --num_train_epochs=3.0 --output_dir=/Users/vic/Documents/GitHub/bert/bert_output --do_lower_case=True

Please use absolute path as above indicated

  • Install any missing module

Convert docx file to txt

-Download and move the docx2txt.sh shell script to the same directory as all .docx files -Then type in command line/terminalmake docx2txt

You should see the following lines :

cat docx2txt.sh >docx2txt

chmod a+x docx2txt

-Convert .docx files into .txt

docx2txt TAT11P46.docx

Replace the file names as needed, a .txt file with the same file name should apppear in the directory

-Extract interview question texts

grep "Q" < TAT11P46.txt > Q_TAT11P46.txt

Replace the file names as needed -Extract interview answer texts

grep "A" < TAT11P46.txt > A_TAT11P46.txt

Replace the file names as needed

Sorting through TAT interview sections

vim TAT11P46.txt
  • Then in normal mode (i.e. not inserting), type :set numbers for line numbers to appear.
  • You can always search for a section, for example, section (2) with /(2).
  • Finally, extract target lines and direct output into the same file with two wakkas ('>').
head -n "$last" TAT11P46.txt | tail -n +"$first" TAT11P46.txt >> section_two.txt

About

Research on Linguistics Importance in Cognitive Development of Post-War Taiwanese Children

License:MIT License


Languages

Language:Jupyter Notebook 46.3%Language:Python 46.0%Language:Perl 4.5%Language:Batchfile 2.0%Language:Shell 0.8%Language:Makefile 0.3%