Problematic .tsv processing #62
Comments
So it does sound like the data is corrupt somehow, have you managed to track down the duplicate …
There are no comma values for the id column in the source data.
Any chance that you could share the data file? Would be fine to obfuscate it as long as it reproduces the error...
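If obfuscation is the sticking point, one way to do it while keeping the file's exact structure (so a parsing bug should still reproduce) is to mask characters in place. A rough sketch only; the file and column names below are placeholders, not from the actual dataset:

```python
import csv

def obfuscate_tsv(src_path, dst_path, keep_columns):
    """Mask letters and digits in every column except keep_columns,
    preserving field lengths, tabs, and row structure."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        for row in reader:
            for name, value in row.items():
                if name not in keep_columns and value is not None:
                    # Keep punctuation (including any stray commas!) in place;
                    # replace letters/digits so the content is unreadable.
                    row[name] = "".join("x" if c.isalnum() else c for c in value)
            writer.writerow(row)

# Hypothetical usage: keep the id column intact, mask everything else.
obfuscate_tsv("works.tsv", "works_obfuscated.tsv", keep_columns={"id"})
```

Because punctuation is left untouched, any malformed separator or stray comma in the original survives into the shared copy.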
Sure, I will send you a link to the 2 files I was having trouble with.
When trying to ingest from .tsv files using TypeDB Loader 1.4.1 on Ubuntu 20.04, I receive an error indicating there are comma values in the id column. However, I've reviewed the .tsv and confirmed there are no comma values in this column; all values are OpenAlex identifiers, which are URLs starting with https.
In my TypeDB Loader config.json file, I have it set to expect tab separators, and it successfully ingests hundreds of thousands of rows.

```json
"separator": "\t",
```
Below is a screenshot confirming there are no commas in the `id` column, using Python and pandas.

[screenshot of the pandas check]

I considered it perhaps being an issue with the header, since it fails on the 2nd .tsv it goes through and there is one record in the database with a comma for an id.
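For anyone wanting to reproduce that check, it would look something like the following; the file name is an assumption, since the issue only tells us the separator is a tab and the column of interest is `id`:

```python
import pandas as pd

# "works.tsv" is a placeholder file name.
df = pd.read_csv("works.tsv", sep="\t", dtype=str)

bad = df[df["id"].str.contains(",", na=False)]
print(len(bad))        # the reporter's check found 0 rows with commas
print(df["id"].head()) # OpenAlex identifiers: URLs starting with https
```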
However, it doesn't fail until it has processed over 600,000 rows, according to the TypeDB Loader progress updates.
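Since the failure only appears deep into the file, it may be worth scanning the raw lines directly: a parser that honors quote characters can split a line differently than a plain split on tabs, which could explain a "comma" complaint surfacing only ~600,000 rows in even though pandas sees no commas. A debugging sketch, assuming the id is the first column and using a placeholder file name:

```python
# Flag rows whose field count differs from the header, and rows whose
# first (id) field contains a comma, using naive tab splitting.
with open("works.tsv", encoding="utf-8") as f:
    header = f.readline().rstrip("\n").split("\t")
    expected = len(header)
    for line_no, line in enumerate(f, start=2):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != expected:
            print(f"line {line_no}: {len(fields)} fields, expected {expected}")
        elif "," in fields[0]:
            print(f"line {line_no}: comma in id field: {fields[0]!r}")
```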