Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue with duplicate column name #136

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

TomorrowMC
Copy link

@TomorrowMC TomorrowMC commented Apr 26, 2024

The problem of data loss due to duplicate column names has been resolved, now if there is a duplicate column name in the first row, a CSVParseError exception will be thrown indicating the name of the first duplicate column name encountered.

Copy link
Contributor

@DivineDominion DivineDominion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the improvement! Makes sense to produce an error instead of doing nothing if this is unexpected behavior. I left a couple of small hints in the review for touch-ups before we merge.

I wonder how we could help users to load CSV files with duplicate columns in the future.

  • should we add an option to force load a CSV with duplicate column names to get the old behavior?
  • should we offer to enumerate duplicate column names, like samename, samename1, samename2?

If we don't offer any resolution at all, users will not be able to resolve duplicate column names. If you get the error, what do you do then? Perform a string replace operation on the first line of text to disambiguate? SwiftCSV would have the power to offer this in a convenient spot since our code is "column" aware.

"""

XCTAssertThrowsError(try CSV<Named>(string: csvString)) { error in
XCTAssertEqual(error as? CSVParseError, CSVParseError.generic(message: "Duplicate column names found: name"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you try to replace the equality check with a switch-case for this error?

Suggested change
XCTAssertEqual(error as? CSVParseError, CSVParseError.generic(message: "Duplicate column names found: name"))
switch error as? CSVParseError {
case .generic(message: let message): XCTAssertEqual("Duplicate column names found: name", message)
default: XCTFail("Expected CSVParseError.generic")
}

If that works, please revert the Equatable conformance of the error afterwards.

Motivation: there's just this one place in tests where equatability is used, and the conformance affects public SwiftCSV API. We can't undo that easily after we ship it, and there's no long-term motivation to make the errors equatable in apps. switch-case is the best bet there, too.

// Check for duplicate column names
let duplicateColumns = header.duplicates()
if !duplicateColumns.isEmpty {
throw CSVParseError.generic(message: "Duplicate column names found: \(duplicateColumns.joined(separator: ", "))")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I think this warrants a new error case 👍

case duplicateColumns(columnNames: [String]) or something like it maybe.

@@ -0,0 +1,13 @@
//
// Extension.swift
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename the file to Array+duplicates.swift

@DivineDominion
Copy link
Contributor

@TomorrowMC I'm preparing a Privacy Manifest release soon. Do you want to finish changes for that?

@DivineDominion
Copy link
Contributor

@TomorrowMC Friendly ping to see if you want to follow up on this

@TomorrowMC
Copy link
Author

Sorry for delay, The problems above have solved and equatable conformance has reverted

@DivineDominion
Copy link
Contributor

@TomorrowMC Great, thank you! Please add a line to the CHANGELOG.md about your PR to wrap this up, then we can merge

Comment on lines 13 to 26


public static func == (lhs: CSVParseError, rhs: CSVParseError) -> Bool {
switch (lhs, rhs) {
case (.generic(let message1), .generic(let message2)):
return message1 == message2
case (.quotation(let message1), .quotation(let message2)):
return message1 == message2
case (.duplicateColumns(let columns1), .duplicateColumns(let columns2)):
return columns1 == columns2
default:
return false
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I believe this is dead code a.t.m. and we can remove this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, has fixed these problems

@TomorrowMC
Copy link
Author

TomorrowMC commented Jul 17, 2024

@DivineDominion Thank you, I'v done these things. It is my first time to contribute to a open source repository, so if there is any I need to do please feel free to reply :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants