From a2facce660bb4895b56c62a655d0f252efc3d99f Mon Sep 17 00:00:00 2001
From: Philippe Rivière
Date: Fri, 6 Oct 2023 17:02:43 +0200
Subject: [PATCH] restructure README (#100)

https://github.com/d3/d3/issues/3775
---
 README.md | 494 +-----------------------------------------------------
 1 file changed, 7 insertions(+), 487 deletions(-)

diff --git a/README.md b/README.md
index 437da1d..c3a6026 100644
--- a/README.md
+++ b/README.md
@@ -1,492 +1,12 @@
 # d3-dsv
 
-This module provides a parser and formatter for delimiter-separated values, most commonly [comma-](https://en.wikipedia.org/wiki/Comma-separated_values) (CSV) or tab-separated values (TSV). These tabular formats are popular with spreadsheet programs such as Microsoft Excel, and are often more space-efficient than JSON. This implementation is based on [RFC 4180](http://tools.ietf.org/html/rfc4180).
-
-Comma (CSV) and tab (TSV) delimiters are built-in. For example, to parse:
-
-```js
-d3.csvParse("foo,bar\n1,2"); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
-d3.tsvParse("foo\tbar\n1\t2"); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
-```
-
-Or to format:
-
-```js
-d3.csvFormat([{foo: "1", bar: "2"}]); // "foo,bar\n1,2"
-d3.tsvFormat([{foo: "1", bar: "2"}]); // "foo\tbar\n1\t2"
-```
-
-To use a different delimiter, such as “|” for pipe-separated values, use [d3.dsvFormat](#dsvFormat):
-
-```js
-const psv = d3.dsvFormat("|");
-
-console.log(psv.parse("foo|bar\n1|2")); // [{foo: "1", bar: "2"}, columns: ["foo", "bar"]]
-```
-
-For easy loading of DSV files in a browser, see [d3-fetch](https://github.com/d3/d3-fetch)’s [d3.csv](https://github.com/d3/d3-fetch/blob/master/README.md#csv), [d3.tsv](https://github.com/d3/d3-fetch/blob/master/README.md#tsv) and [d3.dsv](https://github.com/d3/d3-fetch/blob/master/README.md#dsv) methods.
-
-## Installing
-
-If you use npm, `npm install d3-dsv`. You can also download the [latest release on GitHub](https://github.com/d3/d3-dsv/releases/latest). For vanilla HTML in modern browsers, import d3-dsv from Skypack:
-
-```html
-
-```
-
-For legacy environments, you can load d3-dsv’s UMD bundle from an npm-based CDN such as jsDelivr; a `d3` global is exported:
-
-```html
-
-
-```
-
-## API Reference
-
-# d3.csvParse(string[, row]) [<>](https://github.com/d3/d3-dsv/blob/master/src/csv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)(",").[parse](#dsv_parse). Note: requires unsafe-eval [content security policy](#content-security-policy).
-
-# d3.csvParseRows(string[, row]) [<>](https://github.com/d3/d3-dsv/blob/master/src/csv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)(",").[parseRows](#dsv_parseRows).
-
-# d3.csvFormat(rows[, columns]) [<>](https://github.com/d3/d3-dsv/blob/master/src/csv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)(",").[format](#dsv_format).
-
-# d3.csvFormatBody(rows[, columns]) [<>](https://github.com/d3/d3-dsv/blob/master/src/csv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)(",").[formatBody](#dsv_formatBody).
-
-# d3.csvFormatRows(rows) [<>](https://github.com/d3/d3-dsv/blob/master/src/csv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)(",").[formatRows](#dsv_formatRows).
-
-# d3.csvFormatRow(row) [<>](https://github.com/d3/d3-dsv/blob/master/src/csv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)(",").[formatRow](#dsv_formatRow).
-
-# d3.csvFormatValue(value) [<>](https://github.com/d3/d3-dsv/blob/master/src/csv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)(",").[formatValue](#dsv_formatValue).
-
-# d3.tsvParse(string[, row]) [<>](https://github.com/d3/d3-dsv/blob/master/src/tsv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)("\t").[parse](#dsv_parse). Note: requires unsafe-eval [content security policy](#content-security-policy).
-
-# d3.tsvParseRows(string[, row]) [<>](https://github.com/d3/d3-dsv/blob/master/src/tsv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)("\t").[parseRows](#dsv_parseRows).
-
-# d3.tsvFormat(rows[, columns]) [<>](https://github.com/d3/d3-dsv/blob/master/src/tsv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)("\t").[format](#dsv_format).
-
-# d3.tsvFormatBody(rows[, columns]) [<>](https://github.com/d3/d3-dsv/blob/master/src/tsv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)("\t").[formatBody](#dsv_formatBody).
-
-# d3.tsvFormatRows(rows) [<>](https://github.com/d3/d3-dsv/blob/master/src/tsv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)("\t").[formatRows](#dsv_formatRows).
-
-# d3.tsvFormatRow(row) [<>](https://github.com/d3/d3-dsv/blob/master/src/tsv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)("\t").[formatRow](#dsv_formatRow).
-
-# d3.tsvFormatValue(value) [<>](https://github.com/d3/d3-dsv/blob/master/src/tsv.js "Source")
-
-Equivalent to [dsvFormat](#dsvFormat)("\t").[formatValue](#dsv_formatValue).
-
-# d3.dsvFormat(delimiter) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js)
-
-Constructs a new DSV parser and formatter for the specified *delimiter*. The *delimiter* must be a single character (*i.e.*, a single 16-bit code unit); so, ASCII delimiters are fine, but emoji delimiters are not.
-
-# *dsv*.parse(string[, row]) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js "Source")
-
-Parses the specified *string*, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of objects representing the parsed rows.
-
-Unlike [*dsv*.parseRows](#dsv_parseRows), this method requires that the first line of the DSV content contains a delimiter-separated list of column names; these column names become the attributes on the returned objects. For example, consider the following CSV file:
-
-```
-Year,Make,Model,Length
-1997,Ford,E350,2.34
-2000,Mercury,Cougar,2.38
-```
-
-The resulting JavaScript array is:
-
-```js
-[
-  {"Year": "1997", "Make": "Ford", "Model": "E350", "Length": "2.34"},
-  {"Year": "2000", "Make": "Mercury", "Model": "Cougar", "Length": "2.38"}
-]
-```
-
-The returned array also exposes a `columns` property containing the column names in input order (in contrast to Object.keys, whose iteration order is arbitrary). For example:
-
-```js
-data.columns; // ["Year", "Make", "Model", "Length"]
-```
-
-If the column names are not unique, only the last value is returned for each name; to access all values, use [*dsv*.parseRows](#dsv_parseRows) instead (see [example](https://observablehq.com/@d3/parse-csv-with-duplicate-column-names)).
-
-If a *row* conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the `+` operator), but better is to specify a *row* conversion function. See [d3.autoType](#autoType) for a convenient *row* conversion function that infers and coerces common types like numbers and strings.
-
-If a *row* conversion function is specified, the specified function is invoked for each row, being passed an object representing the current row (`d`), the index (`i`) starting at zero for the first non-header row, and the array of column names. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by *dsv*.parse; otherwise, the returned value defines the corresponding row object. For example:
-
-```js
-const data = d3.csvParse(string, (d) => {
-  return {
-    year: new Date(+d.Year, 0, 1), // lowercase and convert "Year" to Date
-    make: d.Make, // lowercase
-    model: d.Model, // lowercase
-    length: +d.Length // lowercase and convert "Length" to number
-  };
-});
-```
-
-Note: using `+` rather than [parseInt](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/parseInt) or [parseFloat](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/parseFloat) is typically faster, though more restrictive. For example, `"30px"` when coerced using `+` returns `NaN`, while parseInt and parseFloat return `30`.
-
-Note: requires unsafe-eval [content security policy](#content-security-policy).
-
-# dsv.parseRows(string[, row]) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js "Source")
-
-Parses the specified *string*, which must be in the delimiter-separated values format with the appropriate delimiter, returning an array of arrays representing the parsed rows.
-
-Unlike [*dsv*.parse](#dsv_parse), this method treats the header line as a standard row, and should be used whenever DSV content does not contain a header. Each row is represented as an array rather than an object. Rows may have variable length. For example, consider the following CSV file, which notably lacks a header line:
-
-```
-1997,Ford,E350,2.34
-2000,Mercury,Cougar,2.38
-```
-
-The resulting JavaScript array is:
-
-```js
-[
-  ["1997", "Ford", "E350", "2.34"],
-  ["2000", "Mercury", "Cougar", "2.38"]
-]
-```
-
-If a *row* conversion function is not specified, field values are strings. For safety, there is no automatic conversion to numbers, dates, or other types. In some cases, JavaScript may coerce strings to numbers for you automatically (for example, using the `+` operator), but better is to specify a *row* conversion function. See [d3.autoType](#autoType) for a convenient *row* conversion function that infers and coerces common types like numbers and strings.
-
-If a *row* conversion function is specified, the specified function is invoked for each row, being passed an array representing the current row (`d`), the index (`i`) starting at zero for the first row. If the returned value is null or undefined, the row is skipped and will be omitted from the array returned by *dsv*.parse; otherwise, the returned value defines the corresponding row object. For example:
-
-```js
-const data = d3.csvParseRows(string, (d, i) => {
-  return {
-    year: new Date(+d[0], 0, 1), // convert first column to Date
-    make: d[1],
-    model: d[2],
-    length: +d[3] // convert fourth column to number
-  };
-});
-```
-
-In effect, *row* is similar to applying a [map](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/map) and [filter](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Array/filter) operator to the returned rows.
-
-# dsv.format(rows[, columns]) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js "Source")
-
-Formats the specified array of object *rows* as delimiter-separated values, returning a string. This operation is the inverse of [*dsv*.parse](#dsv_parse). Each row will be separated by a newline (`\n`), and each column within each row will be separated by the delimiter (such as a comma, `,`). Values that contain either the delimiter, a double-quote (`"`) or a newline will be escaped using double-quotes.
-
-If *columns* is not specified, the list of column names that forms the header row is determined by the union of all properties on all objects in *rows*; the order of columns is nondeterministic. If *columns* is specified, it is an array of strings representing the column names. For example:
-
-```js
-const string = d3.csvFormat(data, ["year", "make", "model", "length"]);
-```
-
-All fields on each row object will be coerced to strings. If the field value is null or undefined, the empty string is used. If the field value is a Date, the [ECMAScript date-time string format](https://www.ecma-international.org/ecma-262/9.0/index.html#sec-date-time-string-format) (a subset of ISO 8601) is used: for example, dates at UTC midnight are formatted as `YYYY-MM-DD`. For more control over which and how fields are formatted, first map *rows* to an array of array of string, and then use [*dsv*.formatRows](#dsv_formatRows).
-
-# dsv.formatBody(rows[, columns]) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js "Source")
-
-Equivalent to [*dsv*.format](#dsv_format), but omits the header row. This is useful, for example, when appending rows to an existing file.
-
-# dsv.formatRows(rows) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js "Source")
-
-Formats the specified array of array of string *rows* as delimiter-separated values, returning a string. This operation is the reverse of [*dsv*.parseRows](#dsv_parseRows). Each row will be separated by a newline (`\n`), and each column within each row will be separated by the delimiter (such as a comma, `,`). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
-
-To convert an array of objects to an array of arrays while explicitly specifying the columns, use [*array*.map](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/map). For example:
-
-```js
-const string = d3.csvFormatRows(data.map((d, i) => {
-  return [
-    d.year.getFullYear(), // Assuming d.year is a Date object.
-    d.make,
-    d.model,
-    d.length
-  ];
-}));
-```
-
-If you like, you can also [*array*.concat](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/concat) this result with an array of column names to generate the first row:
-
-```js
-const string = d3.csvFormatRows([[
-    "year",
-    "make",
-    "model",
-    "length"
-  ]].concat(data.map((d, i) => {
-  return [
-    d.year.getFullYear(), // Assuming d.year is a Date object.
-    d.make,
-    d.model,
-    d.length
-  ];
-})));
-```
-
-# dsv.formatRow(row) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js "Source")
-
-Formats a single array *row* of strings as delimiter-separated values, returning a string. Each column within the row will be separated by the delimiter (such as a comma, `,`). Values that contain either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
-
-# dsv.formatValue(value) [<>](https://github.com/d3/d3-dsv/blob/master/src/dsv.js "Source")
-
-Format a single *value* or string as a delimiter-separated value, returning a string. A value that contains either the delimiter, a double-quote (") or a newline will be escaped using double-quotes.
-
-# d3.autoType(object) [<>](https://github.com/d3/d3-dsv/blob/master/src/autoType.js "Source")
+
 
-Given an *object* (or array) representing a parsed row, infers the types of values on the *object* and coerces them accordingly, returning the mutated *object*. This function is intended to be used as a *row* accessor function in conjunction with [*dsv*.parse](#dsv_parse) and [*dsv*.parseRows](#dsv_parseRow). For example, consider the following CSV file:
-
-```
-Year,Make,Model,Length
-1997,Ford,E350,2.34
-2000,Mercury,Cougar,2.38
-```
-
-When used with [d3.csvParse](#csvParse),
-
-```js
-d3.csvParse(string, d3.autoType)
-```
-
-the resulting JavaScript array is:
-
-```js
-[
-  {"Year": 1997, "Make": "Ford", "Model": "E350", "Length": 2.34},
-  {"Year": 2000, "Make": "Mercury", "Model": "Cougar", "Length": 2.38}
-]
-```
-
-Type inference works as follows. For each *value* in the given *object*, the [trimmed](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/Trim) value is computed; the value is then re-assigned as follows:
-
-1. If empty, then `null`.
-1. If exactly `"true"`, then `true`.
-1. If exactly `"false"`, then `false`.
-1. If exactly `"NaN"`, then `NaN`.
-1. Otherwise, if [coercible to a number](https://www.ecma-international.org/ecma-262/9.0/index.html#sec-tonumber-applied-to-the-string-type), then a number.
-1. Otherwise, if a [date-only or date-time string](https://www.ecma-international.org/ecma-262/9.0/index.html#sec-date-time-string-format), then a Date.
-1. Otherwise, a string (the original untrimmed value).
-
-Values with leading zeroes may be coerced to numbers; for example `"08904"` coerces to `8904`. However, extra characters such as commas or units (*e.g.*, `"$1.00"`, `"(123)"`, `"1,234"` or `"32px"`) will prevent number coercion, resulting in a string.
-
-Date strings must be in ECMAScript’s subset of the [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601). When a date-only string such as YYYY-MM-DD is specified, the inferred time is midnight UTC; however, if a date-time string such as YYYY-MM-DDTHH:MM is specified without a time zone, it is assumed to be local time.
-
-Automatic type inference is primarily intended to provide safe, predictable behavior in conjunction with [*dsv*.format](#dsv_format) and [*dsv*.formatRows](#dsv_formatRows) for common JavaScript types. If you need different behavior, you should implement your own row accessor function.
-
-For more, see [the d3.autoType notebook](https://observablehq.com/@d3/d3-autotype).
-
-### Content Security Policy
-
-If a [content security policy](http://www.w3.org/TR/CSP/) is in place, note that [*dsv*.parse](#dsv_parse) requires `unsafe-eval` in the `script-src` directive, due to the (safe) use of dynamic code generation for fast parsing. (See [source](https://github.com/d3/d3-dsv/blob/master/src/dsv.js).) Alternatively, use [*dsv*.parseRows](#dsv_parseRows).
-
-### Byte-Order Marks
-
-DSV files sometimes begin with a [byte order mark (BOM)](https://en.wikipedia.org/wiki/Byte_order_mark); saving a spreadsheet in CSV UTF-8 format from Microsoft Excel, for example, will include a BOM. On the web this is not usually a problem because the [UTF-8 decode algorithm](https://encoding.spec.whatwg.org/#utf-8-decode) specified in the Encoding standard removes the BOM. Node.js, on the other hand, [does not remove the BOM](https://github.com/nodejs/node-v0.x-archive/issues/1918) when decoding UTF-8.
-
-If the BOM is not removed, the first character of the text is a zero-width non-breaking space. So if a CSV file with a BOM is parsed by [d3.csvParse](#csvParse), the first column’s name will begin with a zero-width non-breaking space. This can be hard to spot since this character is usually invisible when printed.
-
-To remove the BOM before parsing, consider using [strip-bom](https://www.npmjs.com/package/strip-bom).
-
-## Command Line Reference
-
-### dsv2dsv
-
-# dsv2dsv [options…] [file]
-
-Converts the specified DSV input *file* to DSV (typically with a different delimiter or encoding). If *file* is not specified, defaults to reading from stdin. For example, to convert to CSV to TSV:
-
-```
-csv2tsv < example.csv > example.tsv
-```
-
-To convert windows-1252 CSV to utf-8 CSV:
-
-```
-dsv2dsv --input-encoding windows-1252 < latin1.csv > utf8.csv
-```
-
-# dsv2dsv -h
-# dsv2dsv --help
-
-Output usage information.
-
-# dsv2dsv -V
-# dsv2dsv --version
-
-Output the version number.
-
-# dsv2dsv -o file
-# dsv2dsv --out file
-
-Specify the output file name. Defaults to “-” for stdout.
-
-# dsv2dsv -r delimiter
-# dsv2dsv --input-delimiter delimiter
-
-Specify the input delimiter character. Defaults to “,” for reading CSV. (You can enter a tab on the command line by typing ⌃V.)
-
-# dsv2dsv --input-encoding encoding
-
-Specify the input character encoding. Defaults to “utf8”.
-
-# dsv2dsv -w delimiter
-# dsv2dsv --output-delimiter delimiter
-
-Specify the output delimiter character. Defaults to “,” for writing CSV. (You can enter a tab on the command line by typing ⌃V.)
-
-# dsv2dsv --output-encoding encoding
-
-Specify the output character encoding. Defaults to “utf8”.
-
-# csv2tsv [options…] [file]
-
-Equivalent to [dsv2dsv](#dsv2dsv), but the [output delimiter](#dsv2dsv_output_delimiter) defaults to the tab character (\t).
-
-# tsv2csv [options…] [file]
-
-Equivalent to [dsv2dsv](#dsv2dsv), but the [input delimiter](#dsv2dsv_output_delimiter) defaults to the tab character (\t).
-
-### dsv2json
-
-# dsv2json [options…] [file]
-
-Converts the specified DSV input *file* to JSON. If *file* is not specified, defaults to reading from stdin. For example, to convert to CSV to JSON:
-
-```
-csv2json < example.csv > example.json
-```
-
-Or to convert CSV to a newline-delimited JSON stream:
-
-```
-csv2json -n < example.csv > example.ndjson
-```
-
-# dsv2json -h
-# dsv2json --help
-
-Output usage information.
-
-# dsv2json -V
-# dsv2json --version
-
-Output the version number.
-
-# dsv2json -o file
-# dsv2json --out file
-
-Specify the output file name. Defaults to “-” for stdout.
-
-# dsv2json -a
-# dsv2json --auto-type
-
-Use type inference when parsing rows. See d3.autoType for how it works.
-
-# dsv2json -r delimiter
-# dsv2json --input-delimiter delimiter
-
-Specify the input delimiter character. Defaults to “,” for reading CSV. (You can enter a tab on the command line by typing ⌃V.)
-
-# dsv2json --input-encoding encoding
-
-Specify the input character encoding. Defaults to “utf8”.
-
-# dsv2json -r encoding
-# dsv2json --output-encoding encoding
-
-Specify the output character encoding. Defaults to “utf8”.
-
-# dsv2json -n
-# dsv2json --newline-delimited
-
-Output [newline-delimited JSON](https://github.com/mbostock/ndjson-cli) instead of a single JSON array.
-
-# csv2json [options…] [file]
-
-Equivalent to [dsv2json](#dsv2json).
-
-# tsv2json [options…] [file]
-
-Equivalent to [dsv2json](#dsv2json), but the [input delimiter](#dsv2json_input_delimiter) defaults to the tab character (\t).
-
-### json2dsv
-
-# json2dsv [options…] [file]
-
-Converts the specified JSON input *file* to DSV. If *file* is not specified, defaults to reading from stdin. For example, to convert to JSON to CSV:
-
-```
-json2csv < example.json > example.csv
-```
-
-Or to convert a newline-delimited JSON stream to CSV:
-
-```
-json2csv -n < example.ndjson > example.csv
-```
-
-# json2dsv -h
-# json2dsv --help
-
-Output usage information.
-
-# json2dsv -V
-# json2dsv --version
-
-Output the version number.
-
-# json2dsv -o file
-# json2dsv --out file
-
-Specify the output file name. Defaults to “-” for stdout.
-
-# json2dsv --input-encoding encoding
-
-Specify the input character encoding. Defaults to “utf8”.
-
-# json2dsv -w delimiter
-# json2dsv --output-delimiter delimiter
-
-Specify the output delimiter character. Defaults to “,” for writing CSV. (You can enter a tab on the command line by typing ⌃V.)
-
-# json2dsv --output-encoding encoding
-
-Specify the output character encoding. Defaults to “utf8”.
-
-# json2dsv -n
-# json2dsv --newline-delimited
-
-Read [newline-delimited JSON](https://github.com/mbostock/ndjson-cli) instead of a single JSON array.
-
-# json2csv [options…] [file]
-
-Equivalent to [json2dsv](#json2dsv).
+This module provides a parser and formatter for delimiter-separated values, most commonly [comma-](https://en.wikipedia.org/wiki/Comma-separated_values) (CSV) or tab-separated values (TSV). These tabular formats are popular with spreadsheet programs such as Microsoft Excel, and are often more space-efficient than JSON. This implementation is based on [RFC 4180](http://tools.ietf.org/html/rfc4180).
 
-# json2tsv [options…] [file]
+## Resources
 
-Equivalent to [json2dsv](#json2dsv), but the [output delimiter](#json2dsv_output_delimiter) defaults to the tab character (\t).
+- [Documentation](https://d3js.org/d3-dsv)
+- [Examples](https://observablehq.com/collection/@d3/d3-dsv)
+- [Releases](https://github.com/d3/d3-dsv/releases)
+- [Getting help](https://d3js.org/community)