This project applies a number of techniques I've learned to my own Strava data. Strava is a fitness website geared primarily toward cycling, though it tracks a wide range of activity types (e.g., running, swimming). The goal of my research is to gain statistical insight into my performance from more than 2,000 activities recorded over several years (starting in 2012). You can adapt the project for your own use without much difficulty, provided you've downloaded your Strava data.
I included histograms for at least five key features, along with some other descriptive characteristics.
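As a rough illustration of the histogram step, here is a minimal sketch that draws one histogram per feature from a pandas DataFrame. The helper name `plot_feature_histograms` and the column names `distance_km` and `moving_time_min` are hypothetical, and the sample values are made up; the notebook's own features and column names may differ.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_histograms(df, columns, bins=20):
    """Draw one histogram per requested column and return the figure (hypothetical helper)."""
    fig, axes = plt.subplots(1, len(columns), figsize=(4 * len(columns), 3), squeeze=False)
    for ax, col in zip(axes[0], columns):
        # Coerce to numbers and drop anything that cannot be parsed
        values = pd.to_numeric(df[col], errors="coerce").dropna()
        ax.hist(values, bins=bins)
        ax.set_title(col)
        ax.set_xlabel(col)
        ax.set_ylabel("count")
    fig.tight_layout()
    return fig


# Tiny made-up sample standing in for real activity data
sample = pd.DataFrame({
    "distance_km": [10.2, 25.0, 40.5, 12.3, 8.8],
    "moving_time_min": [30, 62, 110, 35, 28],
})
fig = plot_feature_histograms(sample, ["distance_km", "moving_time_min"])
fig.savefig("feature_histograms.png")
```

With real data, you would replace `sample` with a DataFrame loaded from your own Strava export (for example via `pd.read_csv`) and pass in the column names that export actually contains.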
This project was built using PyCharm, although I suspect any IDE will work. You'll need to install several packages using pip or whatever Python package manager you use:
- pandas
- matplotlib
- seaborn
- thinkplot
- thinkstats2
- numpy
- statsmodels
The thinkplot and thinkstats2 modules come from the book Think Stats by Allen B. Downey, available from Green Tea Press. You can't use a package manager to install these libraries; you need to get them from the author's GitHub repository: https://github.com/AllenDowney/ThinkStats2.
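Before opening the notebook, it can help to confirm that everything is importable. The following is a minimal sketch using only the standard library; the helper name `find_missing` is hypothetical, and the module list assumes the two Think Stats modules are `thinkplot` and `thinkstats2` (the files `thinkplot.py` and `thinkstats2.py` in the ThinkStats2 repository), placed somewhere on your Python path.

```python
import importlib.util

# Top-level modules the project is assumed to import
REQUIRED_MODULES = [
    "pandas",
    "matplotlib",
    "seaborn",
    "thinkplot",
    "thinkstats2",
    "numpy",
    "statsmodels",
]


def find_missing(modules):
    """Return the names in `modules` that Python cannot locate on its import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Anything reported as missing, apart from the two Think Stats modules, can usually be installed with a command such as `pip install pandas matplotlib seaborn numpy statsmodels`.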
The working_with_strava_data.ipynb Jupyter Notebook is the main source code for the project, and it also contains more documentation on what's going on.