Whence

Tracking data quality

Purpose: determine how reliable individual country scores are, based on whether and how the values were gap-filled (and whether year mismatches should also count?).

The types of gap-filling we used:

- average
  - at what spatial level: neighbors, sovereign country, region, or world
- disaggregate model (i.e., a value split from the sovereign country using some form of weighting, e.g., the relative proportion of coastal population)
- modeled (e.g., trash as proportional to population density?)

Level of gap-filling:

- data set
- data layer
- score

Proposed process:

- generate a label for the data layer that describes the type and level of gap-filling
- generate an additional label if gap-filling occurs during further calculations; this needs to be embedded in the toolbox
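A minimal Python sketch of such a label, assuming one label per gap-filling step built from the types, spatial levels and levels listed above. The class GapfillLabel, the helper add_step and the "level:type-spatial" text format are hypothetical illustrations, not an existing toolbox API. Steps added during later calculations are appended with "|", following the delimiter suggested under Data Structure.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies taken from the lists above.
GAPFILL_TYPES = {"average", "disaggregate", "modeled"}
SPATIAL_LEVELS = {"neighbors", "sovereign", "region", "world"}
GAPFILL_LEVELS = {"dataset", "layer", "score"}


@dataclass(frozen=True)
class GapfillLabel:
    """One gap-filling step: what was done, at which stage, and at what spatial level."""
    gapfill_type: str                    # "average", "disaggregate" or "modeled"
    level: str                           # "dataset", "layer" or "score"
    spatial_level: Optional[str] = None  # mainly meaningful for averages

    def __post_init__(self) -> None:
        if self.gapfill_type not in GAPFILL_TYPES:
            raise ValueError(f"unknown gap-fill type: {self.gapfill_type}")
        if self.level not in GAPFILL_LEVELS:
            raise ValueError(f"unknown gap-fill level: {self.level}")
        if self.spatial_level is not None and self.spatial_level not in SPATIAL_LEVELS:
            raise ValueError(f"unknown spatial level: {self.spatial_level}")

    def code(self) -> str:
        """Render a compact text label, e.g. 'layer:average-region'."""
        kind = self.gapfill_type
        if self.spatial_level:
            kind += f"-{self.spatial_level}"
        return f"{self.level}:{kind}"


def add_step(existing: str, step: GapfillLabel) -> str:
    """Append a further step (e.g. one made inside the toolbox) to an existing label."""
    return step.code() if not existing else f"{existing}|{step.code()}"
```

For example, a layer value filled with a regional average and later modeled at the score stage would carry add_step(GapfillLabel("average", "layer", "region").code(), GapfillLabel("modeled", "score")), which is "layer:average-region|score:modeled".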

Year mismatches matter when:

- a score is a ratio of two data layers from different years (e.g., employees in the tourism sector / workforce size)
- a score uses a spatial reference point such as a global average or maximum (e.g., wages: countries with less recent data are at a disadvantage)
- scores are compared across countries whose data come from very different years

Are there other ways that year mismatches may affect interpretation of the results?
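For the first case, a ratio of two layers, a mismatch check could be as simple as comparing data years region by region. This is a minimal Python sketch, assuming each layer can be reduced to a hypothetical {region: data year} mapping. The function name, the max_gap threshold and the example years are illustrative only.

```python
from typing import Dict


def ratio_year_mismatches(
    numerator_years: Dict[str, int],
    denominator_years: Dict[str, int],
    max_gap: int = 0,
) -> Dict[str, int]:
    """Return {region: year gap} for regions where the two layers of a ratio
    come from years more than max_gap apart."""
    gaps: Dict[str, int] = {}
    # Only regions present in both layers can be compared.
    for region in sorted(numerator_years.keys() & denominator_years.keys()):
        gap = abs(numerator_years[region] - denominator_years[region])
        if gap > max_gap:
            gaps[region] = gap
    return gaps


# Hypothetical data years for tourism employees and workforce size.
tourism_jobs_year = {"Eritrea": 2008, "Neverland": 2011}
workforce_year = {"Eritrea": 2011, "Neverland": 2011}
print(ratio_year_mismatches(tourism_jobs_year, workforce_year))  # {'Eritrea': 3}
```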

Is there an easy way to know which year is associated with a country’s goal score? How can this be tracked? Perhaps only for certain goals, e.g., Livelihoods.
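One lightweight option is to report, for each country, the oldest and newest data year among the layers that feed a goal score, rather than a single year. The Python sketch below assumes each layer's years are available as a hypothetical {layer: {region: year}} mapping. The layer names "jobs" and "wages" are placeholders, not actual Livelihoods layer names.

```python
from typing import Dict, Tuple


def score_year_range(layer_years: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Given {layer: {region: year}} for all layers feeding one goal,
    return {region: (oldest_year, newest_year)}."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for years_by_region in layer_years.values():
        for region, year in years_by_region.items():
            oldest, newest = ranges.get(region, (year, year))
            ranges[region] = (min(oldest, year), max(newest, year))
    return ranges


# Hypothetical layer years for one goal.
print(score_year_range({
    "jobs": {"Eritrea": 2008, "Neverland": 2011},
    "wages": {"Eritrea": 2011},
}))  # {'Eritrea': (2008, 2011), 'Neverland': (2011, 2011)}
```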

Provenance

Data Structure

field: a value_whence_v01 column with basic data types, which the toolbox can tally. See the 2013 documentation, and record how these values are carried through later calculations. Because shapefile field names are limited to 10 characters, a small lookup table could translate back and forth between short and full names. Suggest separating procedures with a delimiter (such as "|") and associating each procedure with a function.
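A minimal Python sketch of the delimited field and the name lookup. It assumes procedure codes never contain "|". The short field names "whence_v01" and "uncert_v01" and the procedure codes "disagg" and "avg_rgn" are hypothetical examples.

```python
from collections import Counter
from typing import Dict, List

DELIMITER = "|"

# Hypothetical lookup between full field names and names short enough for a
# shapefile attribute table (field names there are capped at 10 characters).
FIELD_LOOKUP = {
    "value_whence_v01": "whence_v01",
    "uncertainty_v01": "uncert_v01",
}
REVERSE_LOOKUP = {short: full for full, short in FIELD_LOOKUP.items()}


def to_shapefile_name(field: str) -> str:
    short = FIELD_LOOKUP.get(field, field)
    if len(short) > 10:
        raise ValueError(f"{short!r} is longer than 10 characters")
    return short


def from_shapefile_name(short: str) -> str:
    return REVERSE_LOOKUP.get(short, short)


def join_whence(procedures: List[str]) -> str:
    """Encode an ordered list of procedure codes as one field value."""
    for code in procedures:
        if DELIMITER in code:
            raise ValueError(f"procedure code {code!r} contains {DELIMITER!r}")
    return DELIMITER.join(procedures)


def split_whence(value: str) -> List[str]:
    """Inverse of join_whence; an empty value means no gap-filling."""
    return value.split(DELIMITER) if value else []


def tally(values: List[str]) -> Dict[str, int]:
    """Count how often each procedure code appears across regions."""
    return dict(Counter(code for v in values for code in split_whence(v)))
```

Keeping the join and split logic in one place means the tallying code never has to know the delimiter.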

field: uncertainty_v01, with a free-text format such as "[measure]: [value]. [description]". Some datasets have only a point estimate, whereas others are gap-filled.
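If the free text follows the "[measure]: [value]. [description]" pattern, it stays machine-readable. The Python sketch below assumes the value is always a single number. The example measure "SE" and its description are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed pattern: "[measure]: [value]. [description]" with a numeric value.
_UNCERTAINTY_RE = re.compile(
    r"^\s*(?P<measure>[^:]+?)\s*:\s*"
    r"(?P<value>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\.\s*"
    r"(?P<description>.*?)\s*$"
)


class Uncertainty(NamedTuple):
    measure: str
    value: float
    description: str


def parse_uncertainty(text: str) -> Uncertainty:
    m = _UNCERTAINTY_RE.match(text)
    if m is None:
        raise ValueError(f"not in '[measure]: [value]. [description]' form: {text!r}")
    return Uncertainty(m["measure"], float(m["value"]), m["description"])


def format_uncertainty(u: Uncertainty) -> str:
    # rstrip drops the trailing space when there is no description
    return f"{u.measure}: {u.value}. {u.description}".rstrip()
```

For example, parse_uncertainty("SE: 0.12. Standard error of the linear model.") yields the measure "SE", the value 0.12 and the description text.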

file: layername_whence-v01.csv
- format for details and children
- incorporate into the gap-filling functions
- columns: spatial_id_output, whence procedure, whence procedure order, arguments, input vs. output, spatial_id_input, associated uncertainty measure
- similarity: ecological vs. political
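A minimal Python sketch of writing this details file with the standard csv module. The snake_case column names are a hypothetical rendering of the list above, and the example record (output region 7 filled from input region 3) is made up.

```python
import csv
import io
from typing import IO, Iterable, Mapping

# Hypothetical column names for layername_whence-v01.csv.
WHENCE_COLUMNS = [
    "spatial_id_output",
    "whence_procedure",
    "whence_procedure_order",
    "arguments",
    "input_output",
    "spatial_id_input",
    "uncertainty_measure",
]


def write_whence_details(f: IO[str], records: Iterable[Mapping[str, object]]) -> None:
    """Write one row per gap-filling procedure step to an open text file."""
    writer = csv.DictWriter(f, fieldnames=WHENCE_COLUMNS)  # unknown keys raise ValueError
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing columns are written as empty strings


def render_whence_details(records: Iterable[Mapping[str, object]]) -> str:
    """Same output as a string, handy for inspection and tests."""
    buf = io.StringIO()
    write_whence_details(buf, records)
    return buf.getvalue()
```

To write to disk, open the file with open("layername_whence-v01.csv", "w", newline="") and pass the handle to write_whence_details. The newline="" argument is what the csv module documentation recommends for files it writes.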

Toolbox report

Per reporting region

Examples

Uncertainty
- FIS: B/B_msy
- AO: average GDP based on a linear model

Document

See the 2013 documentation section.

KLo's notes

Questions:

- How should input data files be labeled when gap-filling occurred at that stage?
- How should this label be combined with labels from subsequent gap-filling operations? (e.g., Darren just had “raw”, “modeled”, “mixed”)
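One possible rule for combining three-level labels across all inputs that feed a value is sketched below in Python: all "raw" stays "raw", all "modeled" stays "modeled", and anything else becomes "mixed". This rule is an assumption for illustration and may not be how Darren's labels were actually combined.

```python
from typing import Iterable

LABELS = {"raw", "modeled", "mixed"}


def combine_labels(labels: Iterable[str]) -> str:
    """Collapse the labels of all inputs feeding one value into a single label.

    Assumed rule: if every input has the same label, keep it; otherwise
    (e.g. 'raw' with 'modeled', or 'mixed' with anything else) return 'mixed'.
    """
    seen = set(labels)
    unknown = seen - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if not seen:
        raise ValueError("no labels to combine")
    return seen.pop() if len(seen) == 1 else "mixed"
```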

Aspects of gap-filling:

- counting the number of steps (cumulative number of gap-filling steps)
- qualifying the type of step (using the acronyms)
- identifying the actual sources (perhaps later)

Run through some examples to test the procedure and the acronyms. Keep loose free text for statistical uncertainty. We can add a “whence” column that has a value per region.

| Country | DatasetFIS_1 | DatasetFIS_2 | DatalayerFIS | DatalayerMAR_1 | ... | Score |
|---|---|---|---|---|---|---|
| Eritrea | 0 | TG | SG1 | SG2 | | =0+1+1+2+… |
| Neverland | 0 | TG | SG1 | SG2 | | =0+1+1+2+… |

The table would have as many columns as the total number of individual datasets, data layers, and scores that we gap-fill.
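The Score column in the table sums the steps recorded in each row. The Python sketch below computes that sum under one reading of the example rows: "0" (or a blank) means no gap-filling, a bare acronym such as "TG" counts as one step, and a trailing number, as in "SG2", gives the number of steps. That reading is an assumption. The meanings of the acronyms themselves are not defined here.

```python
import re
from typing import Dict, List

# Assumed cell format: an acronym optionally followed by a step count.
_CELL_RE = re.compile(r"^(?P<acronym>[A-Za-z]+)(?P<steps>\d*)$")


def steps_in_cell(cell: str) -> int:
    cell = cell.strip()
    if cell in ("", "0"):
        return 0  # no gap-filling in this dataset, layer or score
    m = _CELL_RE.match(cell)
    if m is None:
        raise ValueError(f"unrecognised gap-fill code: {cell!r}")
    return int(m["steps"]) if m["steps"] else 1


def total_steps(table: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum gap-filling steps across all dataset, data layer and score columns."""
    return {country: sum(steps_in_cell(c) for c in cells) for country, cells in table.items()}


print(total_steps({"Eritrea": ["0", "TG", "SG1", "SG2", ""]}))  # {'Eritrea': 4}
```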

We can script the gap-filling functions to generate an independent file, giving a list of operations in the “basic” whence column that is then broken down in the details column.
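One way to script this is to wrap each gap-filling function so that it logs what it filled. The Python sketch below assumes a hypothetical calling convention: gap-filling functions take a {region: value} mapping with None for missing values, plus keyword arguments. A region counts as filled when it was None before the call and has a value after it. WhenceRecorder, fill_with_mean and the code "avg_world" are illustrative names only.

```python
import functools
from typing import Callable, Dict, List, Optional

Values = Dict[str, Optional[float]]  # region -> value, None where missing


class WhenceRecorder:
    """Logs which gap-filling procedure filled which region, in call order."""

    def __init__(self) -> None:
        self.details: List[dict] = []  # rows for the details file
        self._calls = 0

    def track(self, procedure: str) -> Callable:
        """Decorator for a gap-filling function of the form f(values, **kwargs)."""
        def decorator(fn: Callable[..., Values]) -> Callable[..., Values]:
            @functools.wraps(fn)
            def wrapper(values: Values, **kwargs) -> Values:
                self._calls += 1
                filled = fn(values, **kwargs)
                for region, before in values.items():
                    # Missing before the call and present afterwards = gap-filled here.
                    if before is None and filled.get(region) is not None:
                        self.details.append({
                            "region": region,
                            "procedure": procedure,
                            "order": self._calls,
                            "arguments": dict(kwargs),
                        })
                return filled
            return wrapper
        return decorator

    def basic_whence(self) -> Dict[str, str]:
        """Collapse the details into one '|'-delimited value per region."""
        basic: Dict[str, List[str]] = {}
        for row in self.details:  # rows are already in call order
            basic.setdefault(row["region"], []).append(row["procedure"])
        return {region: "|".join(procs) for region, procs in basic.items()}


# Hypothetical example: fill missing regions with the mean of known ones.
recorder = WhenceRecorder()


@recorder.track("avg_world")
def fill_with_mean(values: Values) -> Values:
    known = [v for v in values.values() if v is not None]
    mean = sum(known) / len(known)
    return {r: (mean if v is None else v) for r, v in values.items()}


filled = fill_with_mean({"A": 1.0, "B": None, "C": 3.0})
print(filled)                   # {'A': 1.0, 'B': 2.0, 'C': 3.0}
print(recorder.basic_whence())  # {'B': 'avg_world'}
```

In this design, the details list plays the role of the details file, and basic_whence produces the per-region “basic” column from it. The gap-filling functions themselves stay unaware of the bookkeeping.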
