Handling of an explicit header #316

bkamins · 2018-10-03T12:29:35Z

Those two behaviors seem inconsistent (in one case we are not strict, while we are in the other):

julia> str1 = """a,b,c
       1,2,3,4
       5,6,7,8"""
"a,b,c\n1,2,3,4\n5,6,7,8"

julia> str2 = """1,2,3,4
       5,6,7,8"""
"1,2,3,4\n5,6,7,8"

julia> CSV.read(IOBuffer(str1))
2×3 DataFrames.DataFrame
│ Row │ a      │ b      │ c      │
│     │ Int64⍰ │ Int64⍰ │ Int64⍰ │
├─────┼────────┼────────┼────────┤
│ 1   │ 1      │ 2      │ 3      │
│ 2   │ 5      │ 6      │ 7      │

julia> CSV.read(IOBuffer(str2), header=["a","b","c"])
ERROR: ArgumentError: The length of provided header (3) doesn't match the number of columns at row 1 (4)

The text was updated successfully, but these errors were encountered:

quinnj · 2018-10-03T16:05:55Z

Kind of...........in the first case, it detected 3 columns, but the file has inconsistent rows that include additional columns. In the second case, we detect 4 columns and you only provide 3. The latter is a clear error; the former feels like ok behavior too since we generally try to not error on invalid rows.

bkamins · 2018-10-03T16:27:14Z

But passing explicit header is kind of "I pass an additional row of data that is missing in the file" so detection of number of columns should include it.

The reason for this issue is that this is a way to force dropping columns (i.e. keep only first few columns) - and this is something that recently was requested.

In general we could provide a separate kwarg that could be used to specify what columns to keep (not sure if it has been discussed earlier).

quinnj · 2018-10-03T16:36:35Z

Yes, see the discussion starting here: #154 (comment)

bkamins · 2018-10-03T16:42:22Z

Nice. I guess then that you can close this PR if you feel that your interpretation of detection of number of columns is more appropriate in CSV.jl context.

quinnj · 2018-10-03T16:47:20Z

Well, for the use-case of "select"ing columns, I do feel that having something external to CSV.File is the way to go: it's composable and there's no performance difference. Trying to do a select=... keyword argument would just be re-implementing the external functionality internally, which would be a shame for all those other table types out there. I'll try to clean up my select implementation this week. Maybe we can just have it live in CSV.jl for now until someone wants to start a proper TableOperations.jl package or something (we already have transform in CSV.jl anyway).

quinnj closed this as completed Oct 4, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling of an explicit header #316

Handling of an explicit header #316

bkamins commented Oct 3, 2018

quinnj commented Oct 3, 2018

bkamins commented Oct 3, 2018

quinnj commented Oct 3, 2018

bkamins commented Oct 3, 2018

quinnj commented Oct 3, 2018

Handling of an explicit header #316

Handling of an explicit header #316

Comments

bkamins commented Oct 3, 2018

quinnj commented Oct 3, 2018

bkamins commented Oct 3, 2018

quinnj commented Oct 3, 2018

bkamins commented Oct 3, 2018

quinnj commented Oct 3, 2018