ndrwchn / WeChat

Roughly 1 billion private messages a day are selected and routed to the closest "operator" based on geolocation in China.

WeChat

| Type | Value |
|------|-------|
| Source | 211.159.163.137 (City: Beijing, Country: China, Organization: Tencent cloud computing) |
| Description | 1,081,231,257 captured WeChat dialogues containing 3,784,309,399 messages, dated 18 March 2019, were automatically selected for review based on a keyword trigger. Not all of the dialogues were in Chinese, and not all of them had GPS coordinates only in China. |

Project files:

Natural Language Processing tools

Research files

Update

  • January 2020: The Google Translator Toolkit service has been shut down. The translated data can still be retrieved via Google Takeout. We have started refactoring all the salvaged components into a new research project.

  • December 2019: We gave up brute-forcing the VeraCrypt container and moved on to the data saved in screenshots and the data uploaded to Google Translator Toolkit.

  • June 2019: We obtained a small Ubuntu 18.04.2 desktop disk image, which had been used to build the Jupyter Notebook files. After many attempts to find evidence on the system image, it became clear that nothing was stored there. We found a symlink to a VeraCrypt container, but we did not find a password that would open a hidden container. The simple password 'password' did open an empty container.

  • May 2019: We have lost access to the original data source, and the server holding the research data is no longer accessible. Even the Chinese student who was helping to build the Jupyter Notebook files (creating stop word lists, tokenization, lemmatization, and phrase matching) based on the WeChat dialogues is no longer reachable.
