refactor: fix XtraDB, no longer uses intermediate db
Fix execution on Percona XtraDB.
Remove the need for a temporary database.
Use temporary tables.
Prevent deadlocks on XtraDB by executing inside a read transaction.
Change execution command.

fccn/nau-technical#293
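The message above outlines the new approach: build intermediate results in session-scoped temporary tables instead of a separate database, and run everything inside one transaction. A minimal sketch of that pattern over a generic DB-API 2.0 connection follows. The function name `run_in_transaction`, its parameters and any SQL passed to it are hypothetical; the statement that opens the transaction is a parameter because this diff does not show which one the project issues. Note also that MySQL checks a separate `CREATE TEMPORARY TABLES` privilege, so a `SELECT`-only grant may not be enough for this approach.

```python
from typing import Any, Sequence


def run_in_transaction(
    conn: Any,
    setup_statements: Sequence[str],
    query: str,
    begin_sql: str = "BEGIN",
) -> list[tuple]:
    """Hypothetical helper: run setup statements (e.g. CREATE TEMPORARY TABLE)
    and a final query inside one transaction on a single connection.

    Temporary tables are visible only to the connection (session) that
    created them, so they must be created and read on the same connection.
    """
    cursor = conn.cursor()
    try:
        # Assumption: the exact "read transaction" statement is project-specific.
        cursor.execute(begin_sql)
        # Many DB-API drivers accept only one statement per execute() call.
        for statement in setup_statements:
            cursor.execute(statement)
        cursor.execute(query)
        rows = cursor.fetchall()
        conn.commit()
        return rows
    except Exception:
        # Undo the open transaction before propagating the error.
        conn.rollback()
        raise
    finally:
        cursor.close()
```

The helper can be exercised without a MySQL or XtraDB server, for example with Python's built-in `sqlite3` opened with `isolation_level=None`, since SQLite also accepts `CREATE TEMPORARY TABLE`; its locking behaviour is, however, different from InnoDB/XtraDB.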
igobranco committed Oct 16, 2024
1 parent b846927 commit 7778315
Showing 8 changed files with 124 additions and 192 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -3,4 +3,4 @@
__pycache__/
venv/
*.xlsx
config.ini
config.ini*
27 changes: 9 additions & 18 deletions README.md
@@ -6,25 +6,23 @@ update a Google Sheet file. The NAU project uses the second option.
The NAU dashboard, based on Google Data Studio, uses that Google Sheet as one of
its data sources.

This project requires an intermediate database on the same engine of the `edxapp`
openedx database. The mysql database user needs a read grant for the `edxapp` and
all grants for its own database. It produces precalculated tables/materialized views
that are 1 to 1 with the xlsx file sheets or each sheet of the google spreadsheet
file, each relevant table is prefixed with the `DATA_` string.
The mysql database user needs a read grant for the `edxapp` database.

Those scripts should be run at least once a day, preferably after midnight, so your
Google Sheet file always contain yesterday's data in full.
Google Sheet file always contains yesterday's data in full.

The queries don't have any reference to individual users, and don't have specific
identification numbers, like user id, emails or similar data.
identification numbers, like user id, emails or similar data, so it's GDPR compliant.

# Usage

- Set up a virtual environment.
- Set the `config.ini` file based on the `config.ini.sample`.
- Execute `report_xlsx.py` or `report_google.py`.


### Activate virtual environment and install its dependencies

```bash
virtualenv venv --python=python3
source venv/bin/activate
@@ -37,19 +35,12 @@ cp config.ini.sample config.ini
vim config.ini
```

### Update precalculated data
To update the precalculated data run:

```bash
python update_data.py
```

### Export has xlsx file
### Export data to an xlsx file
```bash
python report_xlsx.py
python export.py --config config.ini --export xlsx
```

### Update a Google Sheet
### Export data to a Google Sheet
```bash
python report_google.py
python export.py --config config.ini --export google_sheets
```
16 changes: 4 additions & 12 deletions config.ini.sample
@@ -7,7 +7,7 @@ password = password
database = edxapp

[sheets]
# progress = True
progress = True

[google_service_account]
type = service_account
Expand Down Expand Up @@ -40,14 +40,6 @@ distinct_users_by_day = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
distinct_users_by_month = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

[xlsx]
# File name
file = nau_reports.xlsx
default_date_format = yyyy-mm-dd
export = organizations,course_runs,course_run_by_date,enrollments_with_profile_info,users,distinct_users_by_day,distinct_users_by_month,final_summary

# Configuration when the `update_data.py` is run.
[data]
# Configure which data should be synchronized / updated
synchronize = organizations,course_runs,course_run_by_date,enrollments_with_profile_info,enrollments_year_of_birth,enrollments_gender,enrollments_level_of_education,enrollments_country,enrollments_employment_situation,users,registered_users_by_day,distinct_users_by_day,distinct_users_by_month
# number of seconds between each query /update so the database can breathe and we don't too much stress.
seconds_between_updates=120
; file = nau_reports.xlsx
; default_date_format = yyyy-mm-dd
; export = organizations,course_runs,course_run_by_date,enrollments_with_profile_info,enrollments_year_of_birth,enrollments_gender,enrollments_level_of_education,enrollments_country,enrollments_employment_situation,users,registered_users_by_day,distinct_users_by_day,distinct_users_by_month
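After this change `[sheets] progress` is enabled and every key in `[xlsx]` is commented out with `;`, so code reading a `config.ini` copied from the sample has to cope with absent keys. A minimal sketch using the standard `configparser` module follows; the function name `read_export_settings` and its fallback values are hypothetical and are not taken from `report_xlsx.py`, which this diff does not show.

```python
import configparser


def read_export_settings(config: configparser.ConfigParser) -> dict:
    """Hypothetical reader for the [xlsx] and [sheets] options.

    Missing sections or keys (e.g. lines commented out with ';') fall back
    to illustrative defaults instead of raising.
    """
    export = config.get("xlsx", "export", fallback="")
    return {
        "file": config.get("xlsx", "file", fallback="nau_reports.xlsx"),
        "default_date_format": config.get(
            "xlsx", "default_date_format", fallback="yyyy-mm-dd"
        ),
        # Comma-separated list -> Python list, skipping empty items.
        "export": [name.strip() for name in export.split(",") if name.strip()],
        # getboolean() accepts True/False, yes/no, on/off and 1/0.
        "progress": config.getboolean("sheets", "progress", fallback=False),
    }
```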
36 changes: 36 additions & 0 deletions export.py
@@ -0,0 +1,36 @@
"""
Script that exports data to xlsx or to a Google Sheet.
"""
import argparse
import configparser

from nau import Reports


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        prog='NAU Open edX Database exporter',
        description='Exports to xlsx or to Google Sheet with Open edX DB',
        epilog='This program exports information from the Open edX database to an xlsx file or directly to a Google Sheet, so it can be analyzed or integrated with dashboard applications.',
    )
    parser.add_argument('--config', type=argparse.FileType('r'), required=True, help='The path to a config.ini with the required configurations.')
    parser.add_argument('--export', required=True, choices=['xlsx', 'google_sheets'], help='The export mode selected.')
    args = parser.parse_args()

    config_file = args.config
    config_file_content = config_file.read()

    config = configparser.ConfigParser()
    config.read_string(config_file_content)
    reports: Reports = Reports(config)
    export_mode = args.export

    match export_mode:
        case 'xlsx':
            from report_xlsx import export_to_xlsx
            export_to_xlsx(config, reports)
        case 'google_sheets':
            from report_google import export_queries_to_google
            export_queries_to_google(config, reports)
        case _:
            raise ValueError(f"Invalid export mode selected {export_mode}")
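`export.py` dispatches with a `match` statement, which requires Python 3.10 or newer. Because `--export` is declared with `choices`, argparse already rejects any other value (exiting with status 2) before the `match` runs, so `case _:` acts only as a safeguard. The sketch below shows a dictionary dispatch as one hypothetical alternative that also runs on older interpreters and derives `choices` from the same table; the stub exporters stand in for `export_to_xlsx` and `export_queries_to_google`, and `--config` is omitted for brevity.

```python
import argparse
from typing import Callable, Dict


def export_to_xlsx_stub(config, reports) -> str:
    # Hypothetical stand-in for report_xlsx.export_to_xlsx
    return "xlsx"


def export_to_google_stub(config, reports) -> str:
    # Hypothetical stand-in for report_google.export_queries_to_google
    return "google_sheets"


# Single source of truth for the supported export modes.
EXPORTERS: Dict[str, Callable] = {
    "xlsx": export_to_xlsx_stub,
    "google_sheets": export_to_google_stub,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="export.py")
    # choices come from the dispatch table, so the two cannot drift apart.
    parser.add_argument("--export", required=True, choices=sorted(EXPORTERS))
    return parser


def dispatch(argv, config=None, reports=None):
    args = build_parser().parse_args(argv)
    return EXPORTERS[args.export](config, reports)
```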