kalaidin / stackoverflow

https://www.kaggle.com/c/predict-closed-questions-on-stack-overflow

Identify features

kalaidin opened this issue · comments

commented

[
"PostId",
"PostCreationDate",
"OwnerUserId",
"OwnerCreationDate",
"ReputationAtPostCreation",
"OwnerUndeletedAnswerCountAtPostTime",
"Title",
"BodyMarkdown",
"Tag1",
"Tag2",
"Tag3",
"Tag4",
"Tag5",
"PostClosedDate",
"OpenStatus",
]

commented

Current:
TF-IDF table on "BodyMarkdown"

Merge "Title" with "BodyMarkdown" and then create a TF table

TF table on the tags
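
A rough sketch of what these three tables could look like, using only the standard library. The tokenizer, the unsmoothed `log(N / df)` weighting and the helper names (`tf_table`, `tfidf_table`, `tag_tf_table`) are assumptions made for illustration, not the code in this repo; only the column names come from the list above.

```python
import math
import re
from collections import Counter

TAG_COLUMNS = ["Tag1", "Tag2", "Tag3", "Tag4", "Tag5"]

def tokenize(text):
    # Assumption: lowercase alphanumeric runs are good enough as terms.
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_table(docs):
    # One Counter of raw term counts per document.
    return [Counter(tokenize(doc)) for doc in docs]

def tfidf_table(docs):
    # Raw count times log(N / document frequency), with no smoothing.
    tf = tf_table(docs)
    doc_freq = Counter(term for counts in tf for term in counts)
    n_docs = len(docs)
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in counts.items()}
        for counts in tf
    ]

def tag_tf_table(rows):
    # Count only the tag columns that are actually filled in.
    return [Counter(row[c] for c in TAG_COLUMNS if row.get(c)) for row in rows]

# Made-up rows, only to show the shape of the data.
rows = [
    {"Title": "Sort a list", "BodyMarkdown": "How do I sort a list in python?",
     "Tag1": "python", "Tag2": "list", "Tag3": "", "Tag4": "", "Tag5": ""},
    {"Title": "Best IDE", "BodyMarkdown": "What is the best IDE for python?",
     "Tag1": "python", "Tag2": "ide", "Tag3": "", "Tag4": "", "Tag5": ""},
]

body_tfidf = tfidf_table([r["BodyMarkdown"] for r in rows])
merged_tf = tf_table([r["Title"] + " " + r["BodyMarkdown"] for r in rows])
tag_tf = tag_tf_table(rows)
```

A term that occurs in every document gets a weight of zero under this weighting, so a smoothed variant may be preferable on the real data.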

commented

Suggested features:
"ReputationAtPostCreation"
"OwnerUndeletedAnswerCountAtPostTime"
"PostCreationDate" - "OwnerCreationDate"

number of keywords supplied
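
Something like the following could compute these per row. This is a sketch under two assumptions: that the dates are formatted as `MM/DD/YYYY HH:MM:SS` (worth checking against the actual CSV), and that "keywords supplied" means the number of non-empty `Tag1`..`Tag5` columns. The function name `numeric_features` and the output keys are made up here.

```python
from datetime import datetime

# Assumption: both date columns use this format; adjust if the CSV differs.
DATE_FORMAT = "%m/%d/%Y %H:%M:%S"

def numeric_features(row):
    posted = datetime.strptime(row["PostCreationDate"], DATE_FORMAT)
    joined = datetime.strptime(row["OwnerCreationDate"], DATE_FORMAT)
    tags = [row.get("Tag%d" % i) for i in range(1, 6)]
    return {
        "reputation": int(row["ReputationAtPostCreation"]),
        "answer_count": int(row["OwnerUndeletedAnswerCountAtPostTime"]),
        # "PostCreationDate" - "OwnerCreationDate", expressed in days.
        "account_age_days": (posted - joined).total_seconds() / 86400.0,
        # Assumption: "keywords" are the tags that were filled in.
        "num_tags": sum(1 for tag in tags if tag),
    }

# Made-up row, only to show the expected input shape.
example = {
    "PostCreationDate": "08/30/2011 12:00:00",
    "OwnerCreationDate": "08/28/2011 12:00:00",
    "ReputationAtPostCreation": "15",
    "OwnerUndeletedAnswerCountAtPostTime": "3",
    "Tag1": "python", "Tag2": "list", "Tag3": "", "Tag4": "", "Tag5": "",
}
features = numeric_features(example)
```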

commented

len("BodyMarkdown")

  • number of words in title
  • number of words in bodymarkdown
  • is code supplied in bodymarkdown
  • proportion of body to code
  • time (day or night for example)
  • number of code blocks
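
A possible sketch of the text features in this list (time of day left out). It assumes that code in "BodyMarkdown" is marked by lines indented with four spaces or a tab, and that a code block is a run of consecutive such lines; inline backtick code and blank lines inside a block are not handled. The proportion is computed as code characters over total body characters, which is one reading of "body to code". The names `is_code_line` and `text_features` are hypothetical.

```python
def is_code_line(line):
    # Assumption: Markdown code is a non-blank line indented by 4 spaces or a tab.
    return bool(line.strip()) and (line.startswith("    ") or line.startswith("\t"))

def text_features(title, body):
    lines = body.splitlines()
    flags = [is_code_line(line) for line in lines]
    code_chars = sum(len(line) for line, flag in zip(lines, flags) if flag)
    # A block starts wherever a code line follows a non-code line (or the start).
    num_blocks = sum(
        1 for i, flag in enumerate(flags)
        if flag and (i == 0 or not flags[i - 1])
    )
    return {
        "body_length": len(body),
        "title_words": len(title.split()),
        "body_words": len(body.split()),
        "has_code": num_blocks > 0,
        "code_ratio": code_chars / len(body) if body else 0.0,
        "num_code_blocks": num_blocks,
    }

# Made-up post, only to exercise the function.
body = (
    "How do I sort this?\n"
    "\n"
    "    xs = [3, 1, 2]\n"
    "    xs.sort()\n"
    "\n"
    "It prints None:\n"
    "\n"
    "    print(xs.sort())\n"
)
features = text_features("Why does sort return None", body)
```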

commented

All done except "time (day or night for example)", which does not seem interesting.