wangrui6 / Zhihu-KOL

Add example comment dataset aggregation notebook

MLMonkATGY opened this issue

Add notebook for previewing the Zhihu comment dataset.

Comments are divided into two groups: root comments and the comment tree.

  1. Root comments are the top-level comments on an answer.
  2. The comment tree holds the nested comments under any root comment.

There are multiple types of ids.

  1. Answer_id refers to a specific answer.
  2. id_root_comment refers to a specific top-level comment.
  3. id_comment_tree refers to a specific nested comment, which replies either to a top-level comment or to another nested comment.
  4. reply_root_comment_id (on a nested comment) refers to the top-level comment that this nested comment belongs to.
  5. reply_comment_id (on a nested comment) refers to the other nested comment (under the same root comment) that this nested comment replies to.
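
To make the relationship between these ids concrete, here is a minimal sketch in plain Python. It is not taken from the notebook. The sample rows, the `content` field, and the helper names `group_by_root` and `parent_of` are hypothetical, and it assumes that `reply_comment_id` is empty (`None`) when a nested comment replies directly to its root comment.

```python
from collections import defaultdict

# Hypothetical sample rows. The id field names follow the list above;
# the values and the "content" field are made up for illustration.
root_comments = [
    {"Answer_id": 1, "id_root_comment": 10, "content": "root A"},
    {"Answer_id": 1, "id_root_comment": 11, "content": "root B"},
]
comment_tree = [
    {"id_comment_tree": 100, "reply_root_comment_id": 10,
     "reply_comment_id": None, "content": "reply to root A"},
    {"id_comment_tree": 101, "reply_root_comment_id": 10,
     "reply_comment_id": 100, "content": "reply to comment 100"},
    {"id_comment_tree": 102, "reply_root_comment_id": 11,
     "reply_comment_id": None, "content": "reply to root B"},
]


def group_by_root(roots, nested):
    """Map each id_root_comment to the nested comments that belong to it."""
    nested_by_root = defaultdict(list)
    for comment in nested:
        nested_by_root[comment["reply_root_comment_id"]].append(comment)
    # Root comments without replies map to an empty list.
    return {r["id_root_comment"]: nested_by_root.get(r["id_root_comment"], [])
            for r in roots}


def parent_of(comment):
    """Return (kind, id) of the comment that a nested comment replies to.

    Assumption: reply_comment_id is None when the reply targets the root
    comment itself rather than another nested comment.
    """
    if comment["reply_comment_id"] is None:
        return ("root_comment", comment["reply_root_comment_id"])
    return ("comment_tree", comment["reply_comment_id"])


grouped = group_by_root(root_comments, comment_tree)
for root_id, replies in grouped.items():
    print(root_id, [(c["id_comment_tree"], parent_of(c)) for c in replies])
```

Grouping on `reply_root_comment_id` gives the full thread under each root comment, while `reply_comment_id` recovers the direct parent inside that thread.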

A Jupyter notebook for previewing the aggregated comment dataset can be found at
https://github.com/wangrui6/Zhihu-KOL/blob/MLMonkATGY/issue19/comments/relate_comment.ipynb