Does emoji usage differ by borough or neighborhood?
Are there emojis that occur in certain areas more than others?
Are there neighborhoods which overuse or underuse emojis?
If you take a topic (e.g., soccer)
Objects/sentiment from image. Emoji w.r.t. Points of Interest. Create an alternative map of Manhattan based on emoji-driven neighborhoods.
-
test.py
script uses sample.txt
-
emoji.py
will try to fetch the latest emoji definitions from unicode.org and store them in a file called codepoints_latest.json
pid2info, nycemoji.csv https://drive.google.com/drive/u/1/folders/0Bw7JqtQBdsZSSDdWN1VlMHVNN1E
Files chinese.tsv and english.tsv contain the extracted short canonical sequences, emojis, canonical emojis, skin tones and variations from each caption, with Chinese and English pre-processing respectively.
Schema is (tab-separated): post_id <tab> comma_separated_sequences <tab> comma_separated_emojis <tab> comma_separated_canonical_emojis <tab> comma_separated_skin_tones <tab> comma_separated_variations
Example:
977726107265881718_452803412 ππ«π¬πͺππΈπ π,π«,π¬,π©βπ©βπ§,π©ββ€οΈβπ©,πΈ,ππΌ π,π«,π¬,πͺ,π,πΈ,π -1,-1,-1,-1,-1,-1,2 -1,-1,-1,20,15,-1,-1
These files are meant to compute the counts.
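As a rough illustration of how the counts could be computed from these files, here is a minimal sketch. It assumes the six fields are tab-separated (as the .tsv extension suggests) and that skin tones and variations are integers, as in the example row. The names EmojiRecord, parse_record and count_canonical are hypothetical and are not part of the repository, and the sample row is made up for the demonstration.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class EmojiRecord(NamedTuple):
    """One row of chinese.tsv / english.tsv (hypothetical container)."""
    post_id: str
    sequences: List[str]
    emojis: List[str]
    canonical_emojis: List[str]
    skin_tones: List[int]
    variations: List[int]


def _split(field: str) -> List[str]:
    # An empty field yields an empty list rather than [""].
    return field.split(",") if field else []


def parse_record(line: str) -> EmojiRecord:
    """Parse one row, assuming exactly six tab-separated fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(parts)}")
    post_id, seqs, emojis, canon, tones, variations = parts
    return EmojiRecord(
        post_id,
        _split(seqs),
        _split(emojis),
        _split(canon),
        [int(t) for t in _split(tones)],
        [int(v) for v in _split(variations)],
    )


def count_canonical(lines: Iterable[str]) -> Counter:
    """Count canonical emojis over all non-blank rows."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts.update(parse_record(line).canonical_emojis)
    return counts


# Made-up sample row (not taken from the data), written with escapes.
sample = "\t".join([
    "1_2",
    "\U0001F600\U0001F44D\U0001F3FD",
    "\U0001F600,\U0001F44D\U0001F3FD",
    "\U0001F600,\U0001F44D",
    "-1,3",
    "-1,-1",
])
sample_counts = count_canonical([sample, sample])
```

To run this on a real file, something like `count_canonical(open("english.tsv", encoding="utf-8"))` should work under the same assumptions. Counting the canonical column rather than the raw emoji column groups skin-tone variants together, which may or may not be what a given analysis wants.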
Files chinese-tokens.tsv and english-tokens.tsv contain the Chinese and English pre-processed captions tokenized as we discussed (i.e., isolating the short canonical emoji sequences in context).
Schema is (tab-separated): post_id <tab> <T/E> <tab> text/emoji_sequence <tab> <T/E> <tab> text/emoji_sequence ..., where T marks a text segment and E marks an emoji sequence.
Example:
492186542997358343_11986392 T Pretty sure that's a smile! E πΎπΊ T #shibatatum E ππΎ
These files are meant to compute the embeddings.
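Before training embeddings, each row has to be turned back into a token stream. The sketch below shows one way to do that, assuming the row is tab-separated and that every T or E marker is followed by exactly one content field, as in the example above. The names parse_tokens and to_tokens are hypothetical, and splitting text segments on whitespace is an assumption of this sketch, not necessarily the tokenization used in the project.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (marker, content), marker is "T" or "E"


def parse_tokens(line: str) -> Tuple[str, List[Segment]]:
    """Split one row into its post id and (marker, content) segments."""
    fields = line.rstrip("\n").split("\t")
    post_id, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("markers and contents must come in pairs")
    segments: List[Segment] = []
    for marker, content in zip(rest[0::2], rest[1::2]):
        if marker not in ("T", "E"):
            raise ValueError(f"unexpected marker {marker!r}")
        segments.append((marker, content))
    return post_id, segments


def to_tokens(segments: List[Segment]) -> List[str]:
    """Flatten segments into tokens: text is split on whitespace
    (an assumption), each emoji sequence stays a single token."""
    tokens: List[str] = []
    for marker, content in segments:
        if marker == "T":
            tokens.extend(content.split())
        else:
            tokens.append(content)
    return tokens


# Row modeled on the example above; the emoji are placeholders
# written with escapes, not the original characters.
row = "\t".join([
    "492186542997358343_11986392",
    "T", "Pretty sure that's a smile!",
    "E", "\U0001F43E\U0001F43A",
    "T", "#shibatatum",
    "E", "\U0001F60D",
])
row_id, row_segments = parse_tokens(row)
row_tokens = to_tokens(row_segments)
```

Keeping each emoji sequence as one token means an embedding model sees it in the same context window as the surrounding words, which matches the stated goal of isolating the sequences in context.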