kaustubhhiware / facebook-archive

Just some fun you can have with facebook's archive data

Wordcloud of user's comments and posts

kaustubhhiware opened this issue

This task could be a bit harder, since you have to process both posts and comments, and then aggregate words from both, to form a wordcloud.

  • What hashtags do you frequently use?
  • What language do you often use?
  • Skip proper nouns, especially your friends' names.

@kaustubhhiware Can I please move to this issue after finishing my current one, which is almost done (I hope so :p)? This seems really interesting. I will share the implementation details and to-do list soon.
Thanks!

Sure!
Your PR has been approved by two mentors. Let the other mentors have a look at it, but there's mostly no work left there.
Assigning this issue to you.

Hi. For the wordcloud, there would be two parts.

  1. Preparing the compiled text from posts and comments. This should not be too tough.
  2. Preparing the wordcloud image. I looked this up, and there is a Python library with the obvious name wordcloud. I tried running it and it worked perfectly, so I would use that to prepare the image.
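
The two parts above could be sketched roughly as follows. This is only an illustration under assumptions: the function names, the tiny stop-word set, and the idea that posts and comments are already available as lists of strings are all hypothetical, since the thread does not describe the archive's layout. The rendering step uses the wordcloud library mentioned above and is imported lazily so the counting part runs without it.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word set; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it", "i", "you"}


def word_frequencies(posts, comments):
    """Part 1: merge posts and comments, then count lowercase words."""
    text = " ".join(list(posts) + list(comments)).lower()
    # Remove hashtags so they do not mix with ordinary words.
    text = re.sub(r"#\w+", " ", text)
    words = re.findall(r"[a-z']+", text)
    return Counter(w for w in words if w not in STOPWORDS and len(w) > 1)


def render_wordcloud(freqs, path="wordcloud.png"):
    """Part 2: draw the image (requires `pip install wordcloud`)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(path)
```

Calling `render_wordcloud(word_frequencies(posts, comments))` would then write the image to `wordcloud.png`.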

For hashtags, I would maintain a count and plot the top 5 or top 10.
I don't currently know how to do language detection, so I will have to look into that later.
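
The hashtag count could look something like this minimal sketch. The function name `top_hashtags` and the choice to treat hashtags case-insensitively are assumptions made for illustration, not something decided in this thread.

```python
import re
from collections import Counter


def top_hashtags(texts, n=10):
    """Count hashtags across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:
        # Capture the word after '#' and fold case so #Python == #python.
        counts.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return counts.most_common(n)
```

The returned list of `(hashtag, count)` pairs can be passed to any plotting library as a bar chart.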

Would this be fine? Thanks!

Hey @parimatrix you might want to take a look at TextBlob. It will help you with the language detection and removal of proper nouns! Best of luck.
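
As a rough sketch of the proper-noun part of that suggestion: TextBlob's `tags` property yields `(word, part-of-speech)` pairs, and the Penn Treebank tags `NNP` and `NNPS` mark proper nouns. The helper names below are hypothetical, and the TextBlob import is kept inside the wrapper because it needs the package and its NLTK corpora installed. Language detection is not covered here.

```python
PROPER_NOUN_TAGS = {"NNP", "NNPS"}  # Penn Treebank tags for proper nouns


def drop_proper_nouns(tagged_words):
    """Keep only words whose part-of-speech tag is not a proper noun."""
    return [word for word, tag in tagged_words if tag not in PROPER_NOUN_TAGS]


def words_without_proper_nouns(text):
    """Tag text with TextBlob, then filter (needs textblob + its corpora)."""
    from textblob import TextBlob

    return drop_proper_nouns(TextBlob(text).tags)
```

Tagging is statistical, so some names will slip through and some ordinary words may be dropped.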

Here's an idea (using this is up to you):
We often tag our friends in memes, right?
Do you think it would be interesting to see which friends you have tagged the most frequently?
Once you have extracted all the text from posts and comments, get a list of names (should be easy), and retain only the words in that list. Plot a wordcloud.

Up to you.
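
The friend-tagging idea might be sketched like this, assuming the friends' names are already available as a list of strings (how to obtain that list from the archive is not specified here). The function name `friend_mentions` is hypothetical.

```python
import re
from collections import Counter


def friend_mentions(texts, friend_names):
    """Count how often each full friend name appears in the texts."""
    counts = Counter()
    for name in friend_names:
        # Match the whole name, ignoring case, on word boundaries.
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        counts[name] = sum(len(pattern.findall(t)) for t in texts)
    return +counts  # unary plus drops names that were never mentioned
```

Because the result maps names to counts, it can be given directly to the wordcloud library's `generate_from_frequencies` method to plot the most-tagged friends.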