This repository has been archived by the owner on Oct 28, 2024. It is now read-only.

[Draft] Clarify serialization and discoverability #40

Closed
wants to merge 13 commits into from
12 changes: 8 additions & 4 deletions content/docs/specifications/data-package.md
@@ -81,15 +81,19 @@ Several example data packages can be found in the [datasets organization on gith

### Descriptor

On the logical level, a Data Package descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Data Package descriptor is represented by a file. A data producer `MAY` use any suitable serialization format and `SHOULD` name the file `datapackage.json`. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

In other words, JSON is the only serialization format that `MUST` be used for publishing a Data Package, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanism, so a serialized Data Package can only be referenced directly by its URI. This means that technically the name of a Data Package file is irrelevant, although it is good practice to use `datapackage.json` as a public convention.
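
As an illustration of the consumer requirement above, here is a minimal sketch in Python. The function name `parse_descriptor` and its `fmt` parameter are hypothetical and not part of the specification. JSON is always handled, while YAML is attempted only if the optional third-party PyYAML package happens to be installed.

```python
import json


def parse_descriptor(text: str, fmt: str = "json") -> dict:
    """Parse serialized descriptor text into a JSON-serializable object.

    Hypothetical helper: JSON support is mandatory, YAML is optional.
    """
    if fmt == "json":
        data = json.loads(text)
    elif fmt == "yaml":
        try:
            import yaml  # PyYAML, an optional third-party dependency
        except ImportError as exc:
            raise ValueError("YAML support is not available") from exc
        data = yaml.safe_load(text)
    else:
        raise ValueError(f"unsupported serialization format: {fmt}")

    # The logical descriptor must be an object, not an array or a scalar.
    if not isinstance(data, dict):
        raise ValueError("descriptor must be a JSON object")

    # Raises TypeError if the structure cannot be serialized to JSON
    # (possible with YAML input, for example unquoted dates).
    json.dumps(data)
    return data
```

A consumer built this way accepts any descriptor published as JSON and treats every other format as an optional extra, which is one possible reading of the `MUST`/`MAY` split above.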

The descriptor is the central file in a Data Package. It provides:

- General metadata such as the package's title, license, publisher etc
- A list of the data "resources" that make up the package including their location on disk or online and other relevant information (including, possibly, schema information about these data resources in a structured form)

A Data Package descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627][]). When available as a file it `MUST` be named `datapackage.json` and it `MUST` be placed in the top-level directory (relative to any other resources provided as part of the data package).

[RFC 4627]: http://www.ietf.org/rfc/rfc4627.txt

The descriptor `MUST` contain a `resources` property describing the data resources.

All other properties are considered `metadata` properties. The descriptor `MAY` contain any number of other `metadata` properties. The following sections provide a description of required and optional metadata properties for a Data Package descriptor.
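
The split between `resources` and the remaining `metadata` properties could be sketched as follows. The helper name `split_descriptor` is hypothetical, and the check covers only the presence of `resources`, not the validity of its contents.

```python
def split_descriptor(descriptor: dict):
    """Return (resources, metadata) for a Data Package descriptor.

    Hypothetical helper: only checks that `resources` is present.
    """
    if "resources" not in descriptor:
        raise ValueError("descriptor MUST contain a `resources` property")
    # Every property other than `resources` is treated as metadata.
    metadata = {k: v for k, v in descriptor.items() if k != "resources"}
    return descriptor["resources"], metadata
```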
10 changes: 8 additions & 2 deletions content/docs/specifications/data-resource.md
@@ -81,9 +81,15 @@ A comprehensive Data Resource example with all required, recommended and optiona
}
```

### Descriptor
## Descriptor

A Data Resource descriptor `MUST` be a valid JSON `object`. (JSON is defined in [RFC 4627][]).
On the logical level, a Data Resource descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Data Resource descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

In other words, JSON is the only serialization format that `MUST` be used for publishing a Data Resource, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanism, so a serialized Data Resource can only be referenced directly by its URI.

Key properties of the descriptor are described below. A descriptor `MAY` include any number of properties in addition to those described below as required and optional properties.

10 changes: 9 additions & 1 deletion content/docs/specifications/table-dialect.md
@@ -37,7 +37,15 @@ CSV Dialect is useful for programmes which might have to deal with multiple dial

Some related work can be found in [this comparison of csv dialect support](https://docs.google.com/spreadsheet/ccc?key=0AmU3V2vcPKrIdEhoU1NQSWtoQmJwcUNCelJtdkx2bFE&usp=sharing), this [example of similar JSON format](http://panda.readthedocs.org/en/latest/api.html#data-uploads), and in Python's [PEP 305](http://www.python.org/dev/peps/pep-0305/).

## Specification
## Descriptor

On the logical level, a Table Dialect descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Table Dialect descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

In other words, JSON is the only serialization format that `MUST` be used for publishing a Table Dialect, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanism, so a serialized Table Dialect can only be referenced directly by its URI.

A CSV Dialect descriptor, `dialect`, `MUST` be a JSON `object` with the following properties:

12 changes: 11 additions & 1 deletion content/docs/specifications/table-schema.md
@@ -67,7 +67,17 @@ For example, `constraints` `SHOULD` be tested on the logical representation of d

## Descriptor

A Table Schema is represented by a descriptor. The descriptor `MUST` be a JSON `object` (JSON is defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt)).
On the logical level, a Table Schema descriptor is represented by a data structure. The data structure `MUST` be a JSON-serializable `object` as defined in [RFC 4627](http://www.ietf.org/rfc/rfc4627.txt).

On the physical level, a Table Schema descriptor is represented by a file. A data producer `MAY` use any suitable serialization format. A data consumer `MUST` support the JSON serialization format and `MAY` support other serialization formats such as YAML or TOML.

In other words, JSON is the only serialization format that `MUST` be used for publishing a Table Schema, while other serialization formats can be used internally in projects or systems if the corresponding implementations support them.

This specification does not define any discoverability mechanism, so a serialized Table Schema can only be referenced directly by its URI.

## Metadata

### Fields

The descriptor `MUST` contain a property `fields`. `fields` `MUST` be an array where each entry in the array is a field descriptor (as defined below). The order of elements in the `fields` array `SHOULD` be the order of fields in the CSV file. The number of elements in the `fields` array `SHOULD` be the same as the number of fields in the CSV file.
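
A rough sketch of how these rules might be checked against a CSV header row is shown below. The function name `check_fields` is hypothetical, and the sketch assumes each field descriptor carries a `name` property that matches a CSV column name, which is not stated in this section. The `MUST` rule raises an error, while the two `SHOULD` rules only produce warnings.

```python
def check_fields(schema: dict, csv_header: list) -> list:
    """Compare a Table Schema's `fields` with a CSV header row.

    Hypothetical helper; assumes each field descriptor has a `name`.
    Returns a list of warning strings for violated SHOULD rules.
    """
    fields = schema.get("fields")
    if not isinstance(fields, list):
        raise ValueError("`fields` MUST be an array")

    warnings = []
    if len(fields) != len(csv_header):
        warnings.append("number of fields differs from the number of CSV columns")

    # Compare names position by position to detect reordering.
    names = [f.get("name") for f in fields if isinstance(f, dict)]
    if names != list(csv_header):
        warnings.append("field order differs from the CSV column order")
    return warnings
```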
