This repository has been archived by the owner on Dec 13, 2024. It is now read-only.
\begin{table*}[ht!]
\centering
\caption{Examples of errors resulting in mirages along different stages of our analytics pipeline, sorted by the analytical step we believe is responsible for the resulting failure in the final visualization, and colored following \figref{fig:mirage-figure}. This list is not exhaustive, but presents examples of how decision-making at various stages of analysis can damage the credibility or reliability of the messages in charts. A longer version of this table with additional mirages is included in our supplemental materials.}
\ssmall
\begin{tabular}{>{\raggedright\arraybackslash}p{1.8cm}p{14.7cm}}
\normalsize{Error} & \normalsize{Mirage}\\ \hline
\rowcolor{colora}\multirow{4}{0em}{\hspace{-0.6cm}\rotatebox{90}{\normalsize{Curating}}}Missing or Repeated Records & We often assume that we have one and only one entry for each datum. However, errors in data entry or integration can result in missing or repeated values that may result in inaccurate aggregates or groupings (see \figref{fig:wrangling}). \cite{kim2003taxonomy} \\
\rowcolor{colora-opaque}Outliers & Many forms of analysis assume data have similar magnitudes and were generated by similar processes. Outliers, whether in the form of erroneous or unexpectedly extreme values, can greatly impact aggregation and discredit the assumptions behind many statistical tests and summaries. \cite{kim2003taxonomy} \\
\rowcolor{colora}Spelling Mistakes & Columns of strings are often interpreted as categorical data for the purposes of aggregation. If interpreted in this way, typos or inconsistent spelling and capitalization can create spurious categories, or remove important data from aggregate queries. \cite{wang2019uni}\\
\rowcolor{colora-opaque}Drill-down Bias & We assume that the order in which we investigate our data should not impact our conclusions. However, by filtering on less explanatory or relevant variables first, the full scope of the impact of later variables can be hidden. This results in insights that address only small parts of the data, when they might be true of the larger whole. \cite{lee2019avoiding}\\
\rowcolor{colorb}\multirow{4}{0em}{\hspace{-0.6cm}\rotatebox{90}{\normalsize{Wrangling}}}Differing Number of Records by Group & Certain summary statistics, including aggregates, are sensitive to sample size. However, the number of records aggregated into a single mark can vary dramatically. This mismatch can mask this sensitivity and problematize per-mark comparisons; when combined with differing levels of aggregation, it can produce counter-intuitive results such as Simpson's Paradox. \cite{guo2017you}\\
\rowcolor{colorb-opaque}Cherry Picking & Filtering and subsetting are meant to be tools to remove irrelevant data, or allow the analyst to focus on a particular area of interest. However, if this filtering is too aggressive, or if the analyst focuses on individual examples rather than the general trend, this cherry-picking can promote erroneous conclusions or biased views of the relationships between variables. Failing to keep the broader dataset in context can also result in the Texas Sharpshooter Fallacy or other forms of HARKing~\cite{cockburn2018hark}. \cite{few2019loom}\\
\rowcolor{colorb}Analyst Degrees of Freedom & Analysts have tremendous flexibility in how they analyze data. These ``researcher degrees of freedom''~\cite{gelman2013garden} can create conclusions that are highly idiosyncratic to the choices made by the analyst, or in a malicious sense promote ``p-hacking'' where the analyst searches through the parameter space in order to find the best support for a pre-ordained conclusion. A related issue is the ``multiple comparisons problem'' where the analyst makes \emph{so many} choices that at least one, just by happenstance, is likely to appear significant, even if there is no strong signal in the data. \cite{gelman2013garden,pu2018garden,zgraggen2018investigating}\\
\rowcolor{colorb-opaque}Confusing Imputation & There are many strategies for dealing with missing or incomplete data, including the imputation of new values. How values are imputed, and then how these imputed values are visualized in the context of the rest of the data, can impact how the data are perceived, in the worst case creating spurious trends or group differences that are merely artifacts of how missing values are handled prior to visualization. \cite{song2018s}\\
\rowcolor{colorc}\multirow{4}{0em}{\hspace{-0.6cm}\rotatebox{90}{\normalsize{Visualizing}}}Non-sequitur Visualizations & Readers expect graphics that appear to be charts to be a mapping between data and image. Visualizations being used as decoration (in which the marks are not related to data) present non-information that might be mistaken for real information. Even if the data are accurate, additional unjustified annotations could produce misleading impressions, such as decorating uncorrelated data with a spurious line of best fit. \cite{correll2017black}\\
\rowcolor{colorc-opaque}Overplotting & We expect to be able to clearly identify individual marks, and expect that one visual mark corresponds to a single value or aggregated value. Yet overlapping marks can hide internal structures in the distribution or disguise potential data quality issues, as in \figref{fig:opacity-permute}. \cite{correll2018looks,mayorga2013splatterplots,micallef2017towards}\\
\rowcolor{colorc}Concealed Uncertainty & Charts that do not indicate the uncertainty they contain risk giving a false impression, and may provoke extreme mistrust of the data if the reader realizes the information has not been presented clearly. There is also a tendency to incorrectly assume that data are high quality or complete, even without evidence of this veracity. \cite{song2018s, few2019loom, mayrTrust2019, sacha2015role}\\
\rowcolor{colorc-opaque}Manipulation of Scales & The axes and scales of a chart are presumed to straightforwardly represent quantitative information. However, manipulation of these scales (for instance, by flipping them from their commonly assumed directions, truncating or expanding them with respect to the range of the data~\cite{pandey2015deceptive, correll2017black, cleveland1982variables, ritchie2019lie, correll2019truncating}, using non-linear transforms, or employing dual axes~\cite{KindlmannAlgebraicVisPedagogyPDV2016, cairo2015graphics}) can cause viewers to misinterpret the data in a chart, for instance by exaggerating correlation~\cite{cleveland1982variables}, exaggerating effect size~\cite{correll2019truncating,pandey2015deceptive}, or misinterpreting the direction of effects~\cite{pandey2015deceptive}. \cite{cairo2015graphics,correll2017black,correll2019truncating,cleveland1982variables,KindlmannAlgebraicVisPedagogyPDV2016,pandey2015deceptive,ritchie2019lie}\\
\rowcolor{colord}\multirow{4}{0em}{\hspace{-0.6cm}\rotatebox{90}{\normalsize{Reading}}}Base Rate Bias & Readers assume unexpected values in a visualization are emblematic of reliable differences. However, readers may be unaware of relevant base rates: either the relative likelihood of what is seen as a surprising value or the false discovery rate of the entire analytic process. \cite{correll2016surprise,pu2018garden, zgraggen2018investigating}\\
\rowcolor{colord-opaque}Inaccessible Charts & As chart makers, we often assume that our readers form a homogeneous group. Yet, the way that people read charts is heterogeneous and dependent on underlying perceptual abilities and cognitive backgrounds that can be overlooked by the designer. Insufficient mindfulness of these differences can result in miscommunication. For instance, a viewer with color vision deficiency may interpret two colors as identical when the designer intended them to be separate. \cite{lundgard2019Sociotechnical, plaisant2005information}\\
\rowcolor{colord}Anchoring Effect & Initial framings of information tend to guide subsequent judgements. This can cause readers to place undue rhetorical weight on early observations, which may cause them to undervalue or distrust later observations. \cite{ritchie2019lie, hullman2011visualization}\\
\rowcolor{colord-opaque}Biases in Interpretation & Each viewer arrives at a visualization with their own preconceptions, biases, and epistemic frameworks. If these biases are not carefully considered, various cognitive biases such as the backfire effect or confirmation bias can cause viewers to anchor on only the data (or the reading of the data) that support their preconceived notions, reject data that do not accord with their views, and generally ignore a more holistic picture of the strength of the evidence. \cite{dignazio2019draft, d2016feminist, few2019loom,wall2017warning,valdez2017framework}\\
\end{tabular}
\label{table:mirage-table}
\end{table*}