NLP Homework 1

Submission skeleton

- code
  |__ predict.py # Use the skeleton provided in predict.py as the entry point to your code
  |__ score.py # The word segmentation scorer. DO NOT MODIFY IT
- sample_files
  |__ sample_icwb2.utf8 # what a file from the icwb2 dataset looks like
  |__ sample_lines.txt  # what the file we will use for prediction will look like. This is the format score.py accepts.
  |__ sample_labels.txt # the correct BIES tags for sample_lines.txt. This is the format score.py accepts.
- resources # any additional resources that predict.py uses must be placed here
- README.md # this file
- Homework_1_nlp.pdf # the slides presenting this homework
- report.pdf # your report
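
The relationship between a segmented icwb2 line, an unsegmented input line, and its BIES tags can be sketched as follows. This is a minimal illustration, not the code used in this repository: the function name `to_bies` is hypothetical, and it assumes the standard BIES convention (B = begin, I = inside, E = end of a multi-character word, S = single-character word) with whitespace-separated words.

```python
def to_bies(segmented_line):
    """Map a whitespace-segmented sentence to (unsegmented text, BIES tags).

    Hypothetical helper for illustration only.
    """
    words = segmented_line.split()  # handles single or double spaces
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first character B, last character E, everything between I
            tags.append("B" + "I" * (len(word) - 2) + "E")
    return "".join(words), "".join(tags)


text, tags = to_bies("我 喜欢 自然语言处理")
print(text)  # 我喜欢自然语言处理
print(tags)  # SBEBIIIIE
```

Each output line carries exactly one tag per input character, so the text and tag strings always have the same length.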

Instructions

Place all your code in the code folder. You can create other files. Place any additional resources needed for running predict.py (such as the weights of your trained model) in the resources folder. Place your report as report.pdf in the root folder. Follow the slides for any additional information.

Preprocessing

python3 preprocess.py

Training

python3 network.py

Predict example usage

python3 predict.py dataset_new/predict/predict.utf8 dataset_new/predict/output.txt ./resources/
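
The command above passes three positional arguments: the input file, the output file, and the resources folder. A minimal sketch of an entry point that accepts them is shown below. The names `parse_args`, `predict`, `main`, and the argument names are assumptions, not the contents of the provided predict.py, and the tagging step is a placeholder that labels every character `S` where a trained model would normally be loaded from the resources folder.

```python
from argparse import ArgumentParser


def parse_args(argv=None):
    """Parse the three positional arguments (names are assumptions)."""
    parser = ArgumentParser(description="Predict BIES tags for each input line")
    parser.add_argument("input_path", help="file with one unsegmented sentence per line")
    parser.add_argument("output_path", help="file to write one BIES tag string per line")
    parser.add_argument("resources_path", help="folder with model weights and other resources")
    return parser.parse_args(argv)


def predict(input_path, output_path, resources_path):
    """Write one tag string per input line.

    Placeholder logic: every character is tagged 'S'. A real implementation
    would load its model from resources_path and predict the tags.
    """
    with open(input_path, encoding="utf-8") as fin, \
            open(output_path, "w", encoding="utf-8") as fout:
        for line in fin:
            sentence = line.rstrip("\n")
            fout.write("S" * len(sentence) + "\n")


def main(argv=None):
    args = parse_args(argv)
    predict(args.input_path, args.output_path, args.resources_path)
```

Calling `main()` at the bottom of the script would wire it to the command line; passing an explicit list such as `main(["in.utf8", "out.txt", "./resources/"])` makes the same code easy to exercise from a test.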

Jean-Pierre-Richa / NLP-Chinese-Word-Segmenter