TwitterDataAnalysisOnHPC-COMP90024

Analysing Twitter data to obtain sentiment of different blocks in Melbourne

Motivation

This project was created to compare the overall sentiment of people in different parts of Melbourne. Because the data obtained from Twitter was 15 GB, we used Spartan, the University of Melbourne's high-performance computing (HPC) system, and parallelized our program.

Results

| Total number of nodes | Number of threads on each node | Execution time in seconds |
|---|---|---|
| 1 | 1 | 991.4 |
| 1 | 8 | 189.6 |
| 2 | 4 | 193.1 |

Output

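| Area | Sentiment | Tweets |
|---|---|---|
| `A1` | 763 | 2752 |
| `A2` | 4116 | 4904 |
| `A3` | 2679 | 5824 |
| `A4` | 54 | 381 |
| `B1` | 11614 | 21232 |
| `B2` | 32061 | 107386 |
| `B3` | 20211 | 34494 |
| `B4` | 5733 | 6643 |
| `C1` | 7551 | 10530 |
| `C2` | 191791 | 246828 |
| `C3` | 41434 | 69901 |
| `C4` | 19537 | 26097 |
| `C5` | 7551 | 5581 |
| `D3` | 7777 | 16220 |
| `D4` | 9698 | 16536 |
| `D5` | 3757 | 4705 |
| Total | 361428 | 580014 |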
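As a rough illustration of how per-area figures like these could be produced, the sketch below scores each tweet against a word-to-score lexicon and assigns it to a grid cell by its coordinates. It is not taken from main.py: the tiny lexicon, the two grid cells, the bounding-box field names (`xmin`, `xmax`, `ymin`, `ymax`) and the function names are all assumptions made for the example.

```python
import re

# Hypothetical miniature lexicon in the AFINN style (word -> integer score).
AFINN_SAMPLE = {"good": 3, "great": 3, "bad": -3, "awful": -3}

# Hypothetical grid cells: an id plus a longitude/latitude bounding box.
GRID_SAMPLE = [
    {"id": "A1", "xmin": 144.70, "xmax": 144.85, "ymin": -37.65, "ymax": -37.50},
    {"id": "A2", "xmin": 144.85, "xmax": 145.00, "ymin": -37.65, "ymax": -37.50},
]


def score_text(text, lexicon):
    """Sum the lexicon scores of every word in the text (unknown words score 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def find_cell(lon, lat, grid):
    """Return the id of the first cell whose bounding box contains the point."""
    for cell in grid:
        if cell["xmin"] <= lon <= cell["xmax"] and cell["ymin"] <= lat <= cell["ymax"]:
            return cell["id"]
    return None  # the point lies outside every cell


def summarise(tweets, lexicon, grid):
    """Map each area id to a (total sentiment, tweet count) pair."""
    totals = {}
    for lon, lat, text in tweets:
        cell = find_cell(lon, lat, grid)
        if cell is None:
            continue  # skip tweets that fall outside the grid
        sentiment, count = totals.get(cell, (0, 0))
        totals[cell] = (sentiment + score_text(text, lexicon), count + 1)
    return totals


if __name__ == "__main__":
    sample_tweets = [
        (144.75, -37.60, "Great coffee, good day"),
        (144.90, -37.55, "Awful traffic"),
        (150.00, -33.00, "Good morning"),  # outside the sample grid
    ]
    print(summarise(sample_tweets, AFINN_SAMPLE, GRID_SAMPLE))
```

A point that sits exactly on a shared cell border matches the first cell in the list here; a real implementation would need an explicit rule for such ties.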

Observation

Running the code in parallel on 8 threads on one node was about 5.2 times faster than running on a single thread (189.6 s versus 991.4 s). Splitting the same 8 threads across 2 nodes gave a similar time (193.1 s).
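The speedup can be checked directly from the execution times in the Results table. The short sketch below also computes parallel efficiency (speedup divided by the total number of threads), which is an extra metric not reported in the original results; the function name is made up for the example.

```python
# Execution times in seconds from the Results table, keyed by (nodes, threads per node).
TIMINGS = {(1, 1): 991.4, (1, 8): 189.6, (2, 4): 193.1}


def speedup_and_efficiency(nodes, threads_per_node, timings=TIMINGS):
    """Return (speedup over the single-thread run, efficiency per thread)."""
    baseline = timings[(1, 1)]
    workers = nodes * threads_per_node
    speedup = baseline / timings[(nodes, threads_per_node)]
    return speedup, speedup / workers


if __name__ == "__main__":
    for nodes, threads in [(1, 8), (2, 4)]:
        speedup, efficiency = speedup_and_efficiency(nodes, threads)
        print(f"{nodes} node(s) x {threads} threads: "
              f"speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

An efficiency well below 1 means the run time did not shrink in proportion to the number of threads, which is common when part of the work (such as reading the input or merging results) cannot be split evenly.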

How to use?

Clone

Clone this repo to your local machine or an HPC system using https://github.com/arnavgarg123/TwitterDataAnalysisOnHPC-COMP90024.git

Setup

Make sure you have Python 3 installed on your system.
To run the script on Windows, install Microsoft MPI from this link.
To run the script on Linux, run the following commands:
sudo apt update
sudo apt install python3-mpi4py

Using a terminal (or cmd on Windows), navigate to the folder containing the files of this repo and run the command below, where `<number_of_threads>` is the number of MPI processes to launch:

mpiexec -n <number_of_threads> python main.py <data_file_name> <area_file_name> <sentiment_analysis_keywords_with_score>

Example

mpiexec -n 4 python main.py ./Data/smallTwitter.json ./Data/melbGrid.json ./Data/AFINN.txt
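To give a feel for what each of the processes started by `mpiexec -n` might do, here is a minimal sketch of one common way to divide an input file among them: each rank takes every `size`-th line, counts locally, and the partial counts are merged at the end. This is an assumption about the general approach, not a description of main.py. The ranks are simulated in a plain loop so the example runs without MPI, and the `area,text` record format and all function names are hypothetical.

```python
from collections import Counter


def lines_for_rank(lines, rank, size):
    """Round-robin split: rank r handles lines r, r + size, r + 2 * size, ..."""
    return [line for index, line in enumerate(lines) if index % size == rank]


def count_areas(lines):
    """Count records per area, assuming a hypothetical 'area,text' line format."""
    return Counter(line.split(",", 1)[0] for line in lines)


def run_simulated(lines, size):
    """Simulate `size` ranks in one process and merge their partial counts.

    With mpi4py, rank and size would instead come from
    MPI.COMM_WORLD.Get_rank() and MPI.COMM_WORLD.Get_size(), and the
    partial counters could be collected on rank 0 with comm.gather().
    """
    partials = [count_areas(lines_for_rank(lines, rank, size)) for rank in range(size)]
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    records = ["A1,hello", "B2,nice day", "A1,so tired", "C2,great", "B2,meh"]
    print(run_simulated(records, size=4))
```

Because every line is handled by exactly one rank, the merged result is the same for any number of ranks; only the amount of work per rank changes.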

Contributors

Contributing

Step 1

Clone this repo to your local machine using https://github.com/arnavgarg123/TwitterDataAnalysisOnHPC-COMP90024.git

Step 2

HACK AWAY!

Step 3

Create a new pull request

License

This project is licensed under the MIT License - see the LICENSE.md file for details.
