Jean-Pierre-Richa / NLP-Chinese-Word-Segmenter

Chinese-Word-Segmentator

NLP Homework 1

Submission skeleton

- code
  |__ predict.py # Use the skeleton provided in predict.py as entry point to your code
  |__ score.py # The word segmentation scorer. DO NOT MODIFY IT
- sample_files
  |__ sample_icwb2.utf8 # what a file from the icwb2 dataset looks like
  |__ sample_lines.txt  # what the file used for prediction will look like. This is the input format score.py accepts.
  |__ sample_labels.txt # the correct BIES tags for sample_lines.txt. This is the label format score.py accepts.
- resources 	# any additional resource that predict.py should use must be placed in here
- README.md # this file
- Homework_1_nlp.pdf # the slides presenting this homework
- report.pdf	# your report
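
The relationship between the three sample files can be illustrated with a small conversion sketch. It assumes that icwb2 lines are words separated by whitespace, and that the BIES scheme tags each character as B (beginning of a multi-character word), I (inside), E (end), or S (single-character word). The function names `words_to_bies` and `convert_line` are hypothetical and are not part of the provided skeleton.

```python
def words_to_bies(words):
    """Return the BIES tag string for a list of already segmented words."""
    tags = []
    for word in words:
        if len(word) == 1:
            # A one-character word gets the single tag S.
            tags.append("S")
        else:
            # Longer words: B, then I for each inner character, then E.
            tags.append("B" + "I" * (len(word) - 2) + "E")
    return "".join(tags)


def convert_line(segmented_line):
    """Split a whitespace-segmented line into (unsegmented text, BIES tags)."""
    words = segmented_line.split()  # assumption: words are whitespace-separated
    return "".join(words), words_to_bies(words)


text, tags = convert_line("共同  创造  美好  的  新  世纪")
print(text)  # 共同创造美好的新世纪
print(tags)  # BEBEBESSBE
```

Under these assumptions, the first returned value corresponds to a line in sample_lines.txt and the second to the matching line in sample_labels.txt, so the two always have the same length.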

Instructions

Place all your code in the code folder. You can create other files. Place any additional resources needed for running predict.py (such as the weights of your trained model) in the resources folder. Place your report as report.pdf in the root folder. Follow the slides for any additional information.

Preprocessing

  • python3 preprocess.py

Training

  • python3 network.py

Predict example usage

  • python3 predict.py dataset_new/predict/predict.utf8 dataset_new/predict/output.txt ./resources/
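
The command above passes three positional arguments: the input file, the output file, and the resources folder. A minimal sketch of an entry point with that interface follows. It is not the repository's actual predict.py: the names `parse_args`, `tag_line`, and `main` are hypothetical, and `tag_line` is a placeholder that tags every character as S where a trained model loaded from the resources folder would normally be used.

```python
import argparse


def parse_args(argv=None):
    """Parse the three positional arguments used in the example command."""
    parser = argparse.ArgumentParser(description="Write BIES tags for each input line.")
    parser.add_argument("input_path", help="UTF-8 file with one unsegmented sentence per line")
    parser.add_argument("output_path", help="file that will receive one BIES tag string per line")
    parser.add_argument("resources_path", help="folder holding model weights and vocabularies")
    return parser.parse_args(argv)


def tag_line(line):
    # Placeholder only: a real implementation would run the trained model here.
    return "S" * len(line)


def predict(input_path, output_path, resources_path):
    """Read sentences from input_path and write one tag string per line."""
    # resources_path is unused by the placeholder tagger; a real model would load from it.
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(tag_line(line.rstrip("\n")) + "\n")


def main(argv=None):
    args = parse_args(argv)
    predict(args.input_path, args.output_path, args.resources_path)
```

In a script, `main()` would be called under an `if __name__ == "__main__":` guard. Each output line has exactly as many tags as the corresponding input line has characters, which matches the label format described for sample_labels.txt.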

Languages

Python 100.0%