A look at how different data storage types in Python can affect an application's performance.
The project has followed these standards from the start:
- Test-driven development
- Sphinx-style docstrings
- Use of a common logger
- Pushing as much logic into functions as possible, keeping main decluttered
- Error handling baked in from the beginning
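As a rough illustration of how these standards can combine in one function, here is a minimal sketch of a data generator with a Sphinx-style docstring, a shared logger, and up-front input validation. The name `generate_sample_data` appears in the test list of this README, but its signature, the `seed` parameter, the logger name, and the value range are assumptions made for the example, not the project's actual code.

```python
import logging
import random

# Shared logger setup. The format string is an assumption chosen to resemble
# the log lines shown in this README.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("sample_data")  # hypothetical logger name


def generate_sample_data(rows, cols, seed=None):
    """Generate a table of random integers as a list of rows.

    :param rows: Number of rows to generate.
    :type rows: int
    :param cols: Number of columns in each row.
    :type cols: int
    :param seed: Optional seed for reproducible output.
    :type seed: int or None
    :raises ValueError: If ``rows`` or ``cols`` is not a positive integer.
    :return: A list of ``rows`` lists, each holding ``cols`` integers.
    :rtype: list[list[int]]
    """
    # Validate early so bad input is logged and rejected before any work is done.
    for name, value in (("rows", rows), ("cols", cols)):
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            logger.error("Invalid %s: %r", name, value)
            raise ValueError(f"{name} must be a positive integer, got {value!r}")

    rng = random.Random(seed)
    return [[rng.randint(0, 100) for _ in range(cols)] for _ in range(rows)]
```

Keeping validation and logging inside the function is one way to leave `main` free of clutter, since the caller only has to pass the dimensions.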
main.py kicks off all of the processes. It logs just a few lines reporting the average execution time for Python dictionaries, NumPy arrays, pandas DataFrames, and Polars DataFrames.
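The timing step could be sketched roughly as follows. This is not the project's `speed_tests` module; the helper name `average_execution_time`, the `repeats` parameter, and the stand-in conversion function are all hypothetical. Only the log message layout is modelled on the output shown in this README.

```python
import logging
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logger = logging.getLogger("speed_tests")


def average_execution_time(func, data, repeats=10):
    """Return the mean run time of ``func(data)`` in whole microseconds."""
    if repeats <= 0:
        raise ValueError("repeats must be positive")
    total_ns = 0
    for _ in range(repeats):
        start = time.perf_counter_ns()  # high-resolution monotonic clock
        func(data)
        total_ns += time.perf_counter_ns() - start
    average_us = total_ns // repeats // 1_000  # nanoseconds to microseconds
    logger.info(
        "Function %s Average execution time (microseconds): %d", func, average_us
    )
    return average_us


def convert_to_dictionary(data):
    """Hypothetical stand-in: map each row index to a copy of its row."""
    return {index: list(row) for index, row in enumerate(data)}


if __name__ == "__main__":
    sample = [[r * 10 + c for c in range(10)] for r in range(10)]
    average_execution_time(convert_to_dictionary, sample)
```

`time.perf_counter_ns` is used here because very short runs, such as a 10x10 conversion, can be rounded heavily by a coarser clock.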
Tests are made for:
- generate_sample_data
- convert_to_dictionary
- convert_to_numpy_array
- convert_to_pandas_df
- convert_to_polars_df
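A test for one of these functions might look like the sketch below, written so that pytest could collect it. The body of `convert_to_dictionary` is a hypothetical stand-in (rows keyed by index), since the real implementation is not shown here. The tests only illustrate the test-first shape.

```python
def convert_to_dictionary(data):
    """Hypothetical stand-in: map each row index to a copy of its row."""
    return {index: list(row) for index, row in enumerate(data)}


def test_convert_to_dictionary_keeps_rows_and_values():
    data = [[1, 2], [3, 4]]
    assert convert_to_dictionary(data) == {0: [1, 2], 1: [3, 4]}


def test_convert_to_dictionary_handles_empty_input():
    assert convert_to_dictionary([]) == {}


def test_convert_to_dictionary_does_not_alias_input_rows():
    data = [[1, 2]]
    result = convert_to_dictionary(data)
    result[0].append(99)  # mutating the result must not change the input
    assert data == [[1, 2]]
```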
Still to do:
- logger
- speed_tests
- main
For 10x10 datasets:
2024-01-07 12:31:54,685 - speed_tests - INFO - Function <function convert_to_dictionary at 0x000002A64FDCF1A0> Average execution time (microseconds): 100
2024-01-07 12:31:54,688 - speed_tests - INFO - Function <function convert_to_numpy_array at 0x000002A65026DEE0> Average execution time (microseconds): 200
2024-01-07 12:31:54,720 - speed_tests - INFO - Function <function convert_to_pandas_df at 0x000002A65034F420> Average execution time (microseconds): 1401
2024-01-07 12:31:54,737 - speed_tests - INFO - Function <function convert_to_polars_df at 0x000002A65034F4C0> Average execution time (microseconds): 1000
Ranking for 10x10 (fastest first)
- Dictionary
- NumPy
- Polars
- Pandas
For 100x100 datasets:
2024-01-07 12:32:23,448 - speed_tests - INFO - Function <function convert_to_dictionary at 0x000001C0050DF1A0> Average execution time (microseconds): 400
2024-01-07 12:32:23,466 - speed_tests - INFO - Function <function convert_to_numpy_array at 0x000001C00555DEE0> Average execution time (microseconds): 1800
2024-01-07 12:32:23,508 - speed_tests - INFO - Function <function convert_to_pandas_df at 0x000001C00563F420> Average execution time (microseconds): 2600
2024-01-07 12:32:23,543 - speed_tests - INFO - Function <function convert_to_polars_df at 0x000001C00563F4C0> Average execution time (microseconds): 2751
Ranking for 100x100 (fastest first)
- Dictionary
- NumPy
- Pandas
- Polars
For 10000x10000 datasets:
2024-01-07 12:36:16,594 - speed_tests - INFO - Function <function convert_to_dictionary at 0x000001D5F0A1F1A0> Average execution time (microseconds): 165484
2024-01-07 12:39:17,557 - speed_tests - INFO - Function <function convert_to_numpy_array at 0x000001D5F0EBDEE0> Average execution time (microseconds): 83271
2024-01-07 12:40:41,103 - speed_tests - INFO - Function <function convert_to_pandas_df at 0x000001D5F0F9F420> Average execution time (microseconds): 339842
2024-01-07 12:49:13,951 - speed_tests - INFO - Function <function convert_to_polars_df at 0x000001D5F0F9F4C0> Average execution time (microseconds): 275781
Ranking for 10000x10000 (fastest first)
- NumPy
- Dictionary
- Polars
- Pandas
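The rankings follow directly from the logged averages. As a small sketch, the log lines can be parsed and sorted instead of being ranked by hand. The helper `rank_by_time` and the regular expression are written for this example and are not part of the project. The input is the 10000x10000 log output copied from above.

```python
import re

LOG_LINES = """\
2024-01-07 12:36:16,594 - speed_tests - INFO - Function <function convert_to_dictionary at 0x000001D5F0A1F1A0> Average execution time (microseconds): 165484
2024-01-07 12:39:17,557 - speed_tests - INFO - Function <function convert_to_numpy_array at 0x000001D5F0EBDEE0> Average execution time (microseconds): 83271
2024-01-07 12:40:41,103 - speed_tests - INFO - Function <function convert_to_pandas_df at 0x000001D5F0F9F420> Average execution time (microseconds): 339842
2024-01-07 12:49:13,951 - speed_tests - INFO - Function <function convert_to_polars_df at 0x000001D5F0F9F4C0> Average execution time (microseconds): 275781
""".splitlines()

# Captures the function name and the reported average in microseconds.
PATTERN = re.compile(
    r"<function (\w+) at 0x[0-9A-Fa-f]+> "
    r"Average execution time \(microseconds\): (\d+)"
)


def rank_by_time(lines):
    """Return (function_name, microseconds) pairs sorted fastest first."""
    timings = []
    for line in lines:
        match = PATTERN.search(line)
        if match:  # skip lines that are not timing reports
            timings.append((match.group(1), int(match.group(2))))
    return sorted(timings, key=lambda item: item[1])


if __name__ == "__main__":
    ranking = rank_by_time(LOG_LINES)
    fastest = ranking[0][1]
    for name, micros in ranking:
        print(f"{name}: {micros} us ({micros / fastest:.2f}x the fastest)")
```

The same helper could be pointed at the 10x10 and 100x100 output to reproduce the other two rankings.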
Testing
- testing logger
- testing speed_tests
- testing main
Features
- Add an addition benchmark
- Add a sort benchmark
- Add a search benchmark
- Add graphs of performance difference
Random
- Which ones are accepted by ML frameworks?