import script that uses json config #612

Open
wants to merge 8 commits into
base: main

Conversation

@struan struan commented Sep 19, 2024

This should make it possible to add simple imports by adding an entry to the JSON config rather than writing a new script.

Also includes some examples from ONS and Commons Library plus a utility command to pivot CSVs to help with ONS imports.

@struan struan force-pushed the imports_as_config branch 2 times, most recently from ed857c8 to ffe90b8 Compare November 26, 2024 16:56

struan commented Nov 28, 2024

Note that once the ONS data is fetched (one file per variable), you need to pivot it to get one row per constituency, like so:

./manage.py pivot_csv --infile 2024_travel_to_work_method.csv --outfile 2024_travel_to_work_method_pivoted.csv --column "Method used to travel to workplace (12 categories)" --value_column Observation --index "Post-2019 Westminster Parliamentary constituencies Code"

@struan struan changed the title from "rough version of imported that uses CSV config for imports" to "rough version of imported that uses json config for imports" Nov 28, 2024

codecov bot commented Nov 28, 2024

Codecov Report

Attention: Patch coverage is 35.00000% with 13 lines in your changes missing coverage. Please review.

Project coverage is 78.70%. Comparing base (620f61f) to head (c9890bc).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| hub/management/commands/base_importers.py | 35.00% | 8 Missing and 5 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #612      +/-   ##
==========================================
- Coverage   78.93%   78.70%   -0.23%     
==========================================
  Files         115      115              
  Lines        3921     3940      +19     
  Branches      410      415       +5     
==========================================
+ Hits         3095     3101       +6     
- Misses        720      728       +8     
- Partials      106      111       +5     

@zarino zarino self-requested a review December 10, 2024 10:30
@struan struan changed the title from "rough version of imported that uses json config for imports" to "import script that uses json config" Dec 10, 2024
@struan struan marked this pull request as ready for review December 10, 2024 10:31
Enable different names and labels for the dataset and type, to make
creating range datasets easier.

Previously the only way was to override the delete method, which is
slightly overkill when you can just set a property.

This is helpful if you have multiple sheets in an Excel file with
inconsistent column names, or if there is no name at all.

Do not always cast to int, which loses precision.

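To illustrate the precision point, here is a minimal sketch of parsing a value without always truncating it. The helper name `parse_number` is hypothetical and is not taken from this PR's code; it only shows the general idea.

```python
def parse_number(raw):
    """Return an int for whole numbers and a float otherwise.

    Hypothetical helper for illustration, not the importer's actual code.
    """
    value = float(raw)
    # Only collapse to int when nothing would be lost by doing so.
    return int(value) if value.is_integer() else value


whole = parse_number("12")        # stays an int
fractional = parse_number("12.7") # kept as a float
truncated = int(float("12.7"))    # what an unconditional int cast would give
```

With an unconditional cast the fractional part of `12.7` is dropped, which is the loss of precision the commit avoids.
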
This adds an `import_from_config` command that gets the details of the
imports from a JSON file. You supply the location of the file and the
name of the config you want and then it pulls in all the information
from the config.

At the moment it only works with CSV and Excel files where the data is
one row per constituency.
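
As a rough sketch of the idea, looking up a named import in a JSON file might look like the following. The config keys (`source_file`, `file_type`, `constituency_column`, `data_columns`) and the function `load_import_config` are assumptions made for illustration; the actual schema used by `import_from_config` may differ.

```python
import json

# Hypothetical config: every key name here is an assumption, not the PR's schema.
EXAMPLE_CONFIG = """
{
  "constituency_population": {
    "source_file": "population.csv",
    "file_type": "csv",
    "constituency_column": "cons",
    "data_columns": ["population"]
  }
}
"""


def load_import_config(text, name):
    """Parse the JSON text and return the import config stored under `name`."""
    configs = json.loads(text)
    if name not in configs:
        available = ", ".join(sorted(configs))
        raise KeyError(f"no import config named {name!r}; available: {available}")
    return configs[name]


config = load_import_config(EXAMPLE_CONFIG, "constituency_population")
```

Keeping each import as a named entry means a new dataset only needs a new block in the JSON file, not a new management command.
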

Adds the following for post-2024 constituencies:
* constituency age distribution (HoC)
* constituency housing tenure (HoC)
* constituency population (ONS)
* constituency commute method (ONS)

Mostly so that data with multiple rows per constituency can be
turned into one row per constituency with multiple columns of data, i.e.
turn

cons|data_name|data_value
CONS1|badgers|12
CONS1|snakes|3
CONS2|badgers|8
CONS2|snakes|4

to

cons|badgers|snakes
CONS1|12|3
CONS2|8|4

which you would do with

```
./manage.py pivot_csv --infile in.csv --outfile out.csv \
    --column data_name --value_column data_value --index cons
```

This is mostly because the import from config only copes with the second
of these formats, and ONS provides data in the first.
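
The reshaping described above can be sketched in plain Python. This is a minimal illustration using only the standard library `csv` module, not the `pivot_csv` command's actual implementation; `pivot_rows` and its parameter names are hypothetical, chosen to mirror the command's flags.

```python
import csv
import io


def pivot_rows(rows, index, column, value_column):
    """Turn long rows (one per index/name pair) into one wide row per index.

    Hypothetical helper for illustration only.
    """
    pivoted = {}  # index value -> wide row dict
    columns = []  # new column names, in first-seen order
    for row in rows:
        key = row[index]
        name = row[column]
        if name not in columns:
            columns.append(name)
        pivoted.setdefault(key, {index: key})[name] = row[value_column]
    return [index] + columns, list(pivoted.values())


LONG_CSV = (
    "cons|data_name|data_value\n"
    "CONS1|badgers|12\n"
    "CONS1|snakes|3\n"
    "CONS2|badgers|8\n"
    "CONS2|snakes|4\n"
)

reader = csv.DictReader(io.StringIO(LONG_CSV), delimiter="|")
header, wide_rows = pivot_rows(reader, "cons", "data_name", "data_value")

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header, delimiter="|", lineterminator="\n")
writer.writeheader()
writer.writerows(wide_rows)
print(out.getvalue())
```

Run on the first table from the commit message, this prints the second table. A constituency that lacks a value for some column gets an empty cell, since `DictWriter` fills missing keys with an empty string by default.
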