Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a materialized view of totals for HDX dataset endpoint #145

Open
2 of 3 tasks
krissy opened this issue Jan 22, 2017 · 1 comment
Open
2 of 3 tasks

Create a materialized view of totals for HDX dataset endpoint #145

krissy opened this issue Jan 22, 2017 · 1 comment

Comments

@krissy
Copy link
Member

krissy commented Jan 22, 2017

Blocked by: #146
Blocks: #147

What

Create a materialized view in Postgres and corresponding Rails model (see helpful tutorial here!) to hold data we will share with the Humanitarian Data Exchange (HDX).

This is the first step to setting up a public API endpoint that exposes a comma-separated table of statistics from our collected drawings that help visualise the "Number of displaced children and their conveyed moods in art therapy". (#147)

Note: Refer to #146 for confirmed data design and refer to the data design spreadsheet.

How

Reminder: this tutorial will come in handy if all this is new to you.

  • Get your SQL on and implement a Rails migration that creates a Materialized View (query table) that queries the drawing and organisation models, returning rows of totals corresponding to the sample spreadsheet. E.g. one column will be the sum of drawings in an Organisation's country that had a negative mood rating, another will be the same but for children who identify as male, etc.
  • Create a Rails model for this view, that includes a method to refresh the view data
  • Create a rake task that calls the refresh method, that can be used anytime we want to sync the view with the latest drawing data

Example view output

NB: This is a quickly drawn-up example to roughly help visualise the column and values in this view, based on the state of the spreadsheet at the time of writing (very prone to change), design/naming details are up to you!

View name: report_drawing_mood_stats

org | country | mood_rating | total | identify_male | identify_female | identify_other | age_5_12 | age_13_18 | stage_at_home | stage_in_shelter | ...

Terres Des Hommes | RS | NEGATIVE |  58 | 38 | 18 | 2 | 30 | 28 | 5 | 40 | ...
MSF | RS | NEUTRAL |  80 | 40 | 40 | 0 | 50 | 30 | 10 | 30 | ...

@leungant
Copy link

leungant commented Jun 1, 2017

got questions!

had a start at replicating the example csv and found the following hurdles:

  1. where is the "Region" keyword - can't seem to find it in the db tables!
  2. similar for "Emotional State".. was this what mood rating used to be?
  3. Do all the columns need to appear in the same csv? if so -
    4) did you want nested group by statements or
    5) would it make sense to aggregate the counts with non db code or a plsql stored procedure (a la http://www.postgresqltutorial.com/plpgsql-function-returns-a-table/ ) - and then to create a materialized view based on a cache table?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants