Dialect map data

About

This repository contains static data used by the other Dialect Map components 💬.

Jargon terms are grouped to enable one-to-one comparison when they share the same meaning but the term used to describe it varies from one science to another. These groups are later used by several data-ingestion pipelines to generate NLP metrics on the arXiv papers dataset, so that the terms can be compared within the Dialect map UI.
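As an illustration of the grouping idea only, here is a minimal Python sketch. The `JargonGroup` class, its fields, and the placeholder terms are all hypothetical; they do not reflect the JSON layout actually used in this repository.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class JargonGroup:
    """Hypothetical container: one shared meaning, several field-specific terms."""

    meaning: str
    # Maps a field of science to the term that field uses (placeholder data).
    terms: dict[str, str] = field(default_factory=dict)

    def pairs(self) -> list[tuple[tuple[str, str], tuple[str, str]]]:
        """Return every one-to-one pairing of (field, term) entries in the group."""
        return list(combinations(sorted(self.terms.items()), 2))


# Placeholder values, not real entries from the jargon list.
group = JargonGroup(
    meaning="placeholder meaning",
    terms={"field-a": "term-a", "field-b": "term-b", "field-c": "term-c"},
)

for left, right in group.pairs():
    print(f"{left[1]} ({left[0]}) <-> {right[1]} ({right[0]})")
```

Each pairing corresponds to one comparison that a downstream component could draw between two sciences.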

Environment setup

The project uses AJV-CLI to validate both the JSON schemas and the jargon list. It can be installed by running:

npm install --no-optional

Syntax validation

To validate the JSON-Schema syntax:

make validate

Available data

Jargons

Initial

The initial set of jargon groups was collected through a Google form set up by Kyle Cranmer on Twitter. Responses from the scientific community were collected from December 1 to December 31, 2020.

⚠️ Disclaimer: no more terms will be collected this way.

New terms

New terms can be added by opening a Pull Request (PR). The Dialect map team will then review each PR to ensure that the resulting JSON is well formatted.

For information about how to add new terms, check the contributing documentation.
For information about how changes are propagated, check the computing documentation.
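Before opening a PR, it can help to confirm locally that an edited file still parses as JSON. The sketch below is a minimal syntax check using only the Python standard library. It does not replace the AJV-based `make validate` step, and the `is_well_formed` helper is a hypothetical name, not part of this repository.

```python
import json


def is_well_formed(text: str) -> tuple[bool, str]:
    """Report whether `text` parses as JSON, with the error position if not."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the position where parsing failed.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    return True, "ok"


# Placeholder snippets, not real repository content.
print(is_well_formed('{"terms": ["a", "b"]}'))
print(is_well_formed('{"terms": ["a", "b"}'))
```

To check a real file, read its contents and pass them in, for example `is_well_formed(open(path, encoding="utf-8").read())`, where `path` is whichever file was edited.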
