CybCom / NLP

Project files for the NLP project of the course Fundamentals of Data Science (Spring 2022), NJU.

Repository on GitHub: https://github.com/CybCom/NLP

Course completed; repository archived.


NLP


This project is released under the GNU GPL v3 license.

WARNING: Remove the files in the "output" directory before committing; otherwise the repository will exceed GitHub's size limit.
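One way to avoid committing those files by accident is a Git pre-commit hook. The sketch below is hypothetical and not part of this repository: it asks Git for the staged paths (git diff --cached --name-only) and reports any that lie under output/. The directory name comes from the warning above; the function names are illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit check: refuse commits that stage files under output/."""
import subprocess
import sys

BLOCKED_DIR = "output/"  # directory named in the warning above


def blocked_paths(paths):
    """Return the staged paths that live inside the blocked directory."""
    return [p for p in paths if p.startswith(BLOCKED_DIR)]


def staged_paths():
    """List the paths staged for the next commit, relative to the repo root."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def main():
    """Return an exit status: 1 if output/ files are staged, else 0."""
    offending = blocked_paths(staged_paths())
    if offending:
        print("Refusing to commit files under output/:", file=sys.stderr)
        for path in offending:
            print("  " + path, file=sys.stderr)
        return 1  # a non-zero status from a pre-commit hook aborts the commit
    return 0
```

To use it, one could save the script as .git/hooks/pre-commit, make it executable, and append a final line sys.exit(main()). Adding output/ to .gitignore is a simpler alternative if the directory never needs to be tracked.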

Project Authors

CybCom & Zhou

Preparation

ML&DL

Coursera: Machine Learning, for the fundamentals: https://www.coursera.org/learn/machine-learning

National Taiwan University: Hung-yi Lee (李宏毅), Machine Learning (Spring 2021), for BERT: https://speech.ee.ntu.edu.tw/~hylee/ml/2021-spring.php

Stanford CS224n: Natural Language Processing with Deep Learning, including word2vec: http://web.stanford.edu/class/cs224n/index.html

Web Crawler

https://www.zhihu.com/question/20899988

http://c.biancheng.net/python_spider/what-is-spider.html

https://zhuanlan.zhihu.com/p/73742321

Structure

Data Source

The provided data sheet, used for training.

Pages collected by a web crawler from a cluster of government websites.
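The README does not describe how the crawler works, so the following is only a minimal breadth-first sketch using the Python standard library. The seed URLs, the allowed host set, the UTF-8 decoding and the page limit are all assumptions; the fetch_page parameter exists so the crawl logic can be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url, timeout=10):
    """Download a page and decode it as UTF-8 (an assumed encoding)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(seeds, allowed_hosts, max_pages=50, fetch_page=fetch):
    """Breadth-first crawl that never leaves allowed_hosts.

    Returns a dict mapping each visited URL to its HTML text.
    """
    queue = deque(seeds)
    seen = set(seeds)
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # unreachable page; URLError and timeouts subclass OSError
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).hostname in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Restricting links to a fixed host set keeps the crawl inside the target website cluster; a real crawler would also need politeness delays and robots.txt handling, which this sketch omits.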

Data Process

Preliminary filtering using rule-based (logical) checks and string similarity.
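Neither the rules nor the similarity measure are specified, so this sketch assumes two simple logical checks (minimum length, optional required keywords) and uses difflib.SequenceMatcher from the standard library as the string-similarity metric. The 0.6 threshold and all function names are illustrative, not values taken from the project.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1]: 2 * matched characters / total length."""
    return SequenceMatcher(None, a, b).ratio()


def passes_rules(text, min_len=5, required_keywords=()):
    """Hypothetical logical checks: minimum length and an optional keyword."""
    if len(text.strip()) < min_len:
        return False
    if required_keywords and not any(k in text for k in required_keywords):
        return False
    return True


def preliminary_filter(texts, references, threshold=0.6, **rule_kwargs):
    """Keep texts that pass the rules and resemble at least one reference text."""
    kept = []
    for text in texts:
        if not passes_rules(text, **rule_kwargs):
            continue
        best = max((similarity(text, ref) for ref in references), default=0.0)
        if best >= threshold:
            kept.append(text)
    return kept
```

Running the cheap rule checks first means the quadratic-time similarity comparison only runs on texts that survive them.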

word2vec embeddings fed into a CNN perform the second-stage classification.
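The README does not give the model architecture. The sketch below assumes a Kim-style TextCNN and shows only its forward pass in plain Python: each token is mapped to its word2vec vector, filters of several widths slide over the sentence, ReLU and max-over-time pooling produce one feature per filter, and a linear layer plus softmax gives class probabilities. The toy embedding dictionary stands in for trained word2vec vectors, the weights are random and untrained, and the TextCNN class, filter widths and class count are all hypothetical.

```python
import math
import random


def embed(tokens, vectors, dim):
    """Look up each token's word2vec vector; unknown tokens map to zeros."""
    return [vectors.get(t, [0.0] * dim) for t in tokens]


def conv_max_pool(x, weight, bias):
    """Slide one k x d filter over the n x d sentence, apply ReLU, keep the max."""
    k = len(weight)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
    for i in range(len(x) - k + 1):  # no windows if the sentence is shorter than k
        s = bias
        for j in range(k):
            s += sum(w * v for w, v in zip(weight[j], x[i + j]))
        best = max(best, s)
    return best


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TextCNN:
    """Untrained TextCNN classifier with randomly initialised parameters."""

    def __init__(self, dim, widths=(2, 3, 4), filters_per_width=4, num_classes=2, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # One (weight, bias) pair per filter; each weight is width x dim.
        self.filters = [
            ([[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(k)], 0.0)
            for k in widths for _ in range(filters_per_width)
        ]
        n_features = len(self.filters)
        self.out_w = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(num_classes)]
        self.out_b = [0.0] * num_classes

    def predict_proba(self, tokens, vectors):
        x = embed(tokens, vectors, self.dim)
        features = [conv_max_pool(x, w, b) for w, b in self.filters]
        logits = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(self.out_w, self.out_b)]
        return softmax(logits)


# Toy embeddings standing in for trained word2vec vectors (hypothetical data).
_rng = random.Random(1)
toy_vectors = {w: [_rng.uniform(-1, 1) for _ in range(8)] for w in ["road", "repair", "request"]}
probs = TextCNN(dim=8).predict_proba(["road", "repair", "request"], toy_vectors)
```

In practice the embeddings would come from a trained word2vec model and the filter and output weights would be learned with a deep-learning framework; max-over-time pooling is what lets sentences of any length map to a fixed-size feature vector.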

About


License: GNU General Public License v3.0


Languages

Python 51.7%, OpenEdge ABL 48.3%