-
Notifications
You must be signed in to change notification settings - Fork 24
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[boundary finder] add support for newlines in single quotes (Recidivi…
…z/recidiviz-data#34236) ## Description of the change Adds supporting for safely splitting files along row-safe boundaries when newlines are quoted. if we are splitting a quoted file, the control flow is (1) find the next single, unescaped quote in the block (2) classify that unescaped quote state (`UnescapedQuoteState`) (3) see if we can use that quote to determine our place in the csv. if we can't return to (1) and try again (4) walk the csv to find the next newline that is not in a quoted block ## Type of change > All pull requests must have at least one of the following labels applied (otherwise the PR will fail): | Label | Description | |----------------------------- |----------------------------------------------------------------------------------------------------------- | | Type: Bug | non-breaking change that fixes an issue | | Type: Feature | non-breaking change that adds functionality | | Type: Breaking Change | fix or feature that would cause existing functionality to not work as expected | | Type: Non-breaking refactor | change addresses some tech debt item or prepares for a later change, but does not change functionality | | Type: Configuration Change | adjusts configuration to achieve some end related to functionality, development, performance, or security | | Type: Dependency Upgrade | upgrades a project dependency - these changes are not included in release notes | ## Related issues Part of Recidiviz/recidiviz-data#30505 ## Checklists ### Development **This box MUST be checked by the submitter prior to merging**: - [x] **Double- and triple-checked that there is no Personally Identifiable Information (PII) being mistakenly added in this pull request** These boxes should be checked by the submitter prior to merging: - [x] Tests have been written to cover the code changed/added as part of this pull request ### Code review These boxes should be checked by reviewers prior to merging: - [ ] This pull request has a descriptive title and information useful to a reviewer - [ ] Potential security implications or infrastructural changes have been considered, if relevant GitOrigin-RevId: 2d56abe4a5943d11022dc2c34a1af26740334a39
- Loading branch information
Showing
11 changed files
with
953 additions
and
73 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
5 changes: 5 additions & 0 deletions
5
recidiviz/tests/cloud_storage/fixtures/example_file_structure_commas_carriage_return.csv
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
numerical_col,string_col_1,string_col_2 | ||
1234,"Hello, world",I'm Anna | ||
4567,"This line | ||
is split in two","This word is ""quoted""" | ||
7890,"""quoted value""", |
Oops, something went wrong.