There is an issue when parsing large files. I tested with a 1.4 GB JSON file and it throws:
```
buffer.js:490
      throw new Error('toString failed');
      ^

Error: toString failed
    at Buffer.toString (buffer.js:490:11)
    at StringDecoder.write (string_decoder.js:130:21)
    at StripBOMWrapper.write (/home/user/.nvm/versions/node/v5.10.0/lib/node_modules/d3-dsv/node_modules/iconv-lite/lib/bom-handling.js:35:28)
    at Object.decode (/home/user/.nvm/versions/node/v5.10.0/lib/node_modules/d3-dsv/node_modules/iconv-lite/lib/index.js:38:23)
    at /home/user/.nvm/versions/node/v5.10.0/lib/node_modules/d3-dsv/bin/dsv2json:27:35
    at ReadStream.<anonymous> (/home/user/.nvm/versions/node/v5.10.0/lib/node_modules/d3-dsv/node_modules/rw/lib/rw/read-file.js:22:33)
    at emitNone (events.js:85:20)
    at ReadStream.emit (events.js:179:7)
    at endReadableNT (_stream_readable.js:913:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)
    at process._tickCallback (internal/process/next_tick.js:98:9)
```
I've found this link, which illustrates the same issue with big files.
You can test it with:

```
wget http://download.geonames.org/export/dump/allCountries.zip
unzip allCountries.zip
sed -i '1s/^/geonameid\tname\tasciiname\talternatenames\tlatitude\tlongitude\tfeature_class\tfeature_code\tcountry_code\tcc2\tadmin1_code\tadmin2_code\tadmin3_code\tadmin4_code\tpopulation\televation\tdem\ttimezone\tmodification_date\n/' allCountries.txt
time tsv2json < allCountries.txt > allCountries-pre.json
```
Do you have a recommended way to parse big files, either from the command line or via the API?
This is not a streaming parser, so it is subject to Node's buffer size limitations. This failure occurs before parsing even begins: it's just trying to decode the input file's bytes into a single string.
The way to fix this is to rewrite this library to be streaming. That's doable, but it requires a new API. (The CLI could remain unchanged, however.) This request has already been filed as #20. It'd be a nice improvement, but I have no immediate plans to work on it.
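In the meantime, you can work around the limit by streaming the file yourself and converting it row by row, outside d3-dsv. Below is a minimal sketch (my own workaround, not part of d3-dsv's API) using Node's built-in readline module. It assumes a plain TSV with no quoted fields containing tabs or newlines, which holds for the geonames dump above, and it emits newline-delimited JSON rather than a single JSON array:

```js
// stream-tsv2ndjson.js — hypothetical workaround, not part of d3-dsv.
// Reads a large TSV line by line and writes one JSON object per line,
// so the whole file is never decoded into a single string.
var fs = require("fs"),
    readline = require("readline");

var rl = readline.createInterface({
  input: fs.createReadStream(process.argv[2], {encoding: "utf8"})
});

var columns = null;

rl.on("line", function(line) {
  var fields = line.split("\t"); // assumes no tabs inside quoted fields
  if (!columns) { columns = fields; return; } // first line is the header
  var row = {};
  for (var i = 0; i < columns.length; ++i) row[columns[i]] = fields[i];
  process.stdout.write(JSON.stringify(row) + "\n");
});
```

Run it as `node stream-tsv2ndjson.js allCountries.txt > allCountries.ndjson`. Note the output is NDJSON, not the single JSON array that dsv2json produces, which is exactly what makes constant-memory streaming possible.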
Note that it works well with csv-parser:
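The original snippet wasn't preserved here, but a test along those lines would look roughly like this (my reconstruction, not the poster's code; csv-parser streams rows instead of buffering the whole file):

```js
// Hypothetical reconstruction of the csv-parser test: stream the geonames
// TSV and count rows without ever holding the file as one string.
var fs = require("fs"),
    csv = require("csv-parser"); // npm install csv-parser

var count = 0;

fs.createReadStream("allCountries.txt")
  .pipe(csv({separator: "\t"})) // tab-separated input
  .on("data", function(row) { ++count; })
  .on("end", function() { console.log(count + " rows parsed"); });
```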