snoop2head / instagram_hashtag_analysis

📷 Crawl and Analyze Instagram Hashtag Data: KoNLPy to gensim Word2Vec & scikit-learn TF-IDF

Home Page: https://gaemin.tistory.com/category/Project%20Based%20Learning/%EC%9A%B4%EB%8F%99%20%EC%B6%94%EC%B2%9C%20%EC%9B%B9%EC%84%9C%EB%B9%84%EC%8A%A4%20-%20FitCuration

instagram_hashtag_analysis

Crawl and Analyze Instagram Hashtag Data

Number prefixes on file names

  • 0: Crawl Instagram posts from the search results for a #keyword
  • 1: Create and wrangle the dataset with pandas
  • 2: Tag Korean nouns and predicates (verbs, adjectives) with KoNLPy
  • 3: Build Word2Vec models with gensim and extract similar documents
  • 4: TF-IDF implemented without the scikit-learn library
  • 5: Extract similar documents with scikit-learn's TfidfVectorizer
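
To make the wrangling step (1) more concrete, here is a minimal sketch of pulling hashtags out of crawled captions and counting them. It is not taken from the repository's files: it assumes each post is available as a plain caption string, and `extract_hashtags`, `captions`, and `tag_counts` are hypothetical names chosen for illustration. Only the Python standard library is used.

```python
import re
from collections import Counter

# \w matches Unicode word characters in Python 3, so Korean hashtags are captured too.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(caption):
    """Return the hashtags in a caption, lowercased and without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(caption)]


# Hypothetical captions standing in for crawled posts.
captions = [
    "Morning flow #yoga #운동",
    "New studio day #Yoga #pilates",
    "No tags here",
]

# Count how often each hashtag appears across all captions.
tag_counts = Counter(tag for c in captions for tag in extract_hashtags(c))
top_tags = tag_counts.most_common(1)
```

In a pandas-based workflow the same function could be applied to a caption column, but the column name and layout would depend on how the crawler stores its output.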

The number at the front of each file name means the following:

  • 0: Search for a #keyword and crawl Instagram posts by hashtag
  • 1: Merge and manipulate the Instagram data using the pandas module
  • 2: KoNLPy morphological analysis to derive the most frequent substantives (nouns) and predicates (verbs, adjectives)
  • 3: Build a Word2Vec model with gensim and extract similar documents
  • 4: A TF-IDF example written in vanilla Python, without the scikit-learn module
  • 5: Find similar documents using scikit-learn's TF-IDF vectorizer (TfidfVectorizer)
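
Step 4 describes TF-IDF written without scikit-learn. The sketch below shows one plain-Python way to do that, using the textbook definitions tf = count / document length and idf = log(N / df). It is an illustration under assumptions, not the repository's code: `tf_idf` and the sample `docs` are hypothetical, and the input is assumed to be already tokenized (for example, nouns kept after morphological tagging).

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(tokenized_docs)
    # Document frequency: count each term once per document.
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical pre-tokenized posts.
docs = [
    ["yoga", "morning", "stretch"],
    ["yoga", "pilates", "studio"],
    ["running", "morning", "park"],
]
weights = tf_idf(docs)
```

Terms that occur in fewer documents get higher weights, so "stretch" outranks "yoga" in the first post. Note that scikit-learn's TfidfVectorizer uses a smoothed IDF and L2-normalizes each row by default, so its values will differ from this unsmoothed version.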