MPI programming project in Python to analyse a Twitter dataset.
This project implements a simple parallelized application that runs on SPARTAN, the University of Melbourne HPC facility. The application identifies Twitter usage from a large geocoded Twitter dataset.
The project uses Python with mpi4py and follows a master-worker model. After the master node reads a batch of data, a scatter operation divides the batch into pieces and sends them to the worker nodes. Each worker node counts the tweets and hashtags in its piece. A gather operation (MPI_Gather) then collects the workers' partial counts on the master node, where they are combined into one result.
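The scatter, count, and gather steps described above could look roughly like the sketch below. It is not the project's actual code: the helper names (`split_into_chunks`, `count_batch`, `merge_results`, `run_batch`) are hypothetical, and it assumes each input line is one JSON object with the tweet body in a `text` field. Note that with mpi4py's `comm.scatter` the root rank also receives one piece, so the master counts a share of the batch too.

```python
import json
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#\w+")


def split_into_chunks(items, n_parts):
    """Split items into exactly n_parts lists (scatter needs one per rank)."""
    return [items[i::n_parts] for i in range(n_parts)]


def count_batch(lines):
    """Count tweets and hashtags in a list of JSON lines."""
    n_tweets = 0
    hashtags = Counter()
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(tweet, dict):
            continue
        n_tweets += 1
        text = tweet.get("text", "")  # assumption: body is stored under "text"
        hashtags.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return n_tweets, hashtags


def merge_results(partials):
    """Combine the (tweet_count, hashtag_counter) pairs gathered on the master."""
    total = 0
    tags = Counter()
    for n_tweets, counter in partials:
        total += n_tweets
        tags.update(counter)
    return total, tags


def run_batch(comm, batch):
    """Scatter one batch from rank 0, count locally, gather back on rank 0."""
    rank = comm.Get_rank()
    size = comm.Get_size()
    chunks = split_into_chunks(batch, size) if rank == 0 else None
    local_lines = comm.scatter(chunks, root=0)   # every rank gets one piece
    partial = count_batch(local_lines)
    gathered = comm.gather(partial, root=0)      # list on rank 0, None elsewhere
    return merge_results(gathered) if rank == 0 else None


if __name__ == "__main__":
    try:
        from mpi4py import MPI
    except ImportError:
        MPI = None  # mpi4py not installed; skip the demo
    if MPI is not None:
        # Made-up sample lines standing in for a batch read by the master.
        sample = ['{"text": "Hello #Melbourne"}', '{"text": "#melbourne again"}']
        comm = MPI.COMM_WORLD
        result = run_batch(comm, sample if comm.Get_rank() == 0 else None)
        if result is not None:
            print(result)
```

Saved under a hypothetical name such as `count_tweets.py`, the sketch could be launched with `mpiexec -n 4 python count_tweets.py`. Returning a `Counter` from each worker keeps the merge step on the master to a simple sum of partial counts.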
- Yiming Zhang
This project is licensed under the MIT License.