`sample-output-type.md`: Clarifying questions and editing suggestions #169
I am having a hard time understanding the sample output type chapter. One disconnect is that some concepts are used before they are introduced (e.g. compound tasks and the implicit `compound_idx` column), and some sections that are pertinent only to compound modeling tasks are not subsections of that section.
Introduction
The introduction for the sample output type needs reworking. From what I've found in the historical documents, it seems that the text in the introduction was written before the sample schema was fleshed out:
hubDocs/docs/source/user-guide/sample-output-type.md
Lines 3 to 23 in a273815
When I read it, I wonder, "Why are we talking about the mean output type? This is the sample output type."
Individual modeling tasks
Why is the `compound_idx` column here? It appears to reiterate the grouping of the target column. Is this a column I should be worried about? The text says that it is implicit, but why does it have a name that suggests it is a column that actually exists?
Why is `compound_idx` not defined in the schema?
Compound modeling tasks
This description could be more specific to the example data presented, highlighting the columns in the text.
maybe:
What does "Base data: mean `output_type`" mean?
hubDocs/docs/source/user-guide/sample-output-type.md
Line 70 in a273815
Four submissions
NOTE: For each submission, use level 4 headers, not bold text.
Pain points:
Submission A
I am confused as to why the sample numbers keep increasing across the strata and why there are only two samples per stratum. Should each stratum contain at least 90 samples (according to the schema)?
Submission B
I think I understand now that this is showing two samples per stratum, but I'm still confused as to why the sample numbers continue to increase when the stratum changes.
Are the values for each sample identical?
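One reading that would explain the numbering in Submissions A and B: a sample number identifies a single draw and is never shared between strata, so the counter keeps running instead of restarting at each stratum. Below is a minimal sketch of that reading, not the chapter's actual rule. The `horizon` and `location` columns, their values, and the helper `assign_sample_ids` are hypothetical and only illustrate the idea.

```python
from itertools import product


def assign_sample_ids(strata, samples_per_stratum):
    """Give every draw an index that is unique across all strata.

    Assumption: a sample index is never reused in another stratum,
    so the counter is not reset when the stratum changes.
    """
    rows = []
    next_id = 0
    for stratum in strata:
        for _ in range(samples_per_stratum):
            rows.append({**stratum, "output_type_id": next_id})
            next_id += 1
    return rows


# Hypothetical strata: every horizon/location pair is its own stratum.
strata = [
    {"horizon": h, "location": loc}
    for h, loc in product([1, 2], ["MA", "NH"])
]

rows = assign_sample_ids(strata, samples_per_stratum=2)
for row in rows:
    print(row)
```

If this reading is right, stating it explicitly next to the Submission A table would remove the confusion.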
Submission C
"single compound modeling task" is confusing because there are two columns selected here. They do not vary, so it makes some sense in retrospect, but it is a roadblock on the first read.
Submission D
The phrase "plain language" is a bit demotivating for a sentence with six prepositions.
Configuration of `output_type_id`
This description is clear (but it could use some trimming to reduce complexity) and the table illustrates the validity question better, but I feel that two things need to happen:
compound_taskid_set
I feel like this sentence is in conflict with the meaning of the schema (emphasis mine).
Given that submission C is valid for all of the schema configurations, we should use "may" and not "must":
"columns that may be used to define"
Number of samples
As I indicated above, it is not clear why each task gets two samples. Also, I believe this belongs in the "Compound modeling tasks" section.
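To make the mismatch concrete, here is a minimal sketch that counts distinct sample numbers per stratum and compares the count with the minimum of 90 mentioned above. It assumes a stratum is defined by the columns passed as `compound_taskid_set`. The example rows, the column names, and both helper functions are hypothetical and do not come from the chapter or from any hub tooling.

```python
from collections import defaultdict


def samples_per_task(rows, compound_taskid_set):
    """Count distinct sample indices for each combination of the given columns."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in compound_taskid_set)
        seen[key].add(row["output_type_id"])
    return {key: len(ids) for key, ids in seen.items()}


def tasks_below_minimum(rows, compound_taskid_set, min_samples):
    """Return the strata that have fewer than `min_samples` samples."""
    counts = samples_per_task(rows, compound_taskid_set)
    return sorted(key for key, n in counts.items() if n < min_samples)


# Hypothetical rows shaped like the two-samples-per-stratum tables.
rows = [
    {"location": "MA", "horizon": 1, "output_type_id": 0},
    {"location": "MA", "horizon": 1, "output_type_id": 1},
    {"location": "NH", "horizon": 1, "output_type_id": 2},
    {"location": "NH", "horizon": 1, "output_type_id": 3},
]

counts = samples_per_task(rows, ["location", "horizon"])
short = tasks_below_minimum(rows, ["location", "horizon"], min_samples=90)
print(counts)
print(short)
```

With two samples per stratum, every stratum falls short of 90. So the examples either need a note that they are truncated for display or should be paired with a schema that allows two samples.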
Relationship to output_types
I think this needs to be a subsection of "Compound modeling tasks".