PRF-BERT NTU Course Query System
Background
The original NTU course system searches for courses by string matching against the query. As a result, students do not get courses that are truly related to what they want; they only get courses whose names contain the same characters. We therefore implement a new search engine that integrates what we learned in DSP.
Implementation
1. Embed course names (BERT/ELMo)
First, we embed the course names with a language model.
We ran experiments to decide which model to use. We clustered the courses by their embeddings; in the resulting plot, dots with the same color come from the same department. We observe that BERT performs better than ELMo.
2. Compute cosine similarity with the query
Compute the cosine similarity between each course embedding and the query embedding, then sort the courses from the largest similarity to the smallest.
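The ranking step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names `cosine_similarity` and `rank_courses`, the English course names, and the 3-dimensional toy vectors are all assumptions standing in for real BERT embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_courses(query_vec, course_vecs):
    """Return (course_name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in course_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real ones would come from the language model.
course_vecs = {
    "Machine Learning": [0.9, 0.1, 0.0],
    "Deep Learning": [0.8, 0.3, 0.1],
    "Calculus": [0.1, 0.9, 0.2],
}
query_vec = [1.0, 0.2, 0.0]

ranking = rank_courses(query_vec, course_vecs)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

Because cosine similarity depends only on the direction of the vectors, courses whose embeddings differ in length are still compared fairly.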
3. Optimize with PRF (pseudo relevance feedback)
- Sort the course names by their cosine similarity with the query.
- Select the top K courses and assume they are relevant to the query; treat the bottom K courses as irrelevant.
- Perform relevance feedback using the Rocchio algorithm.
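The three steps above can be sketched as one function. This is a hedged illustration under stated assumptions: it uses the standard Rocchio update (new query = alpha * query + beta * centroid of the relevant set - gamma * centroid of the irrelevant set), and the weights `alpha=1.0`, `beta=0.75`, `gamma=0.15` are common textbook defaults, not values taken from this project. The names `rocchio_prf` and `centroid` and the toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rocchio_prf(query_vec, course_vecs, k=1, alpha=1.0, beta=0.75, gamma=0.15):
    """One round of pseudo relevance feedback (assumes 2 * k <= number of courses)."""
    # Step 1: rank courses by cosine similarity with the original query.
    ranked = sorted(
        course_vecs,
        key=lambda name: cosine(query_vec, course_vecs[name]),
        reverse=True,
    )
    # Step 2: assume the top k are relevant and the bottom k are irrelevant.
    relevant = centroid([course_vecs[name] for name in ranked[:k]])
    irrelevant = centroid([course_vecs[name] for name in ranked[-k:]])
    # Step 3: Rocchio update, moving the query toward the relevant centroid
    # and away from the irrelevant one.
    new_query = [
        alpha * q + beta * r - gamma * s
        for q, r, s in zip(query_vec, relevant, irrelevant)
    ]
    # Re-rank all courses with the updated query.
    reranked = sorted(
        course_vecs,
        key=lambda name: cosine(new_query, course_vecs[name]),
        reverse=True,
    )
    return new_query, reranked


# Hypothetical toy embeddings standing in for real course-name embeddings.
course_vecs = {
    "Machine Learning": [0.9, 0.1, 0.0],
    "Deep Learning": [0.8, 0.3, 0.1],
    "Calculus": [0.1, 0.9, 0.2],
}
new_query, reranked = rocchio_prf([1.0, 0.2, 0.0], course_vecs, k=1)
print(new_query)
print(reranked)
```

No user labels are needed: the feedback is "pseudo" because relevance is inferred from the first ranking itself, so the result is only as good as that initial ranking.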
Result
query = "人工智慧" ("artificial intelligence")