dialect-map/dialect-map-data

Dialect map data

About

This repository contains static data used by the rest of the Dialect Map components 💬.

Jargon terms are grouped to enable one-to-one comparison when they share the same meaning but the word used for it varies from one science to another. These groups are later used by a range of data-ingestion pipelines to generate NLP metrics on the ArXiv papers dataset, so that the terms can be compared within the Dialect Map UI.
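As a rough illustration of the grouping idea, the sketch below treats each group as a list of terms assumed to share one meaning and enumerates the one-to-one pairings a comparison could use. The in-memory layout, the example terms, and the `comparison_pairs` helper are all hypothetical; the actual JSON files in this repository may be structured differently.

```python
from itertools import combinations

# Hypothetical layout: each inner list is one jargon group, i.e. terms
# assumed to share a meaning across fields. Example terms are illustrative only.
jargon_groups = [
    ["covariate", "feature"],
    ["sigmoid", "logistic function"],
]


def comparison_pairs(groups):
    """Return every one-to-one term pairing that can be drawn inside each group."""
    pairs = []
    for group in groups:
        # combinations() yields each unordered pair of terms exactly once.
        pairs.extend(combinations(group, 2))
    return pairs


pairs = comparison_pairs(jargon_groups)
```

A group with three terms would yield three pairings, so larger groups grow the number of comparisons quickly.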

Environment setup

The project uses AJV-CLI to validate the JSON schemas and the jargon list. It can be installed by running:

npm install --no-optional

Syntax validation

To validate the JSON-Schema syntax:

make validate

Available data

Categories

The full corpus of ArXiv categories comprises both the categories currently in use and the legacy ones.

Jargons

Initial

The initial set of jargon groups was collected through a Google form set up by Kyle Cranmer on Twitter. Responses from the scientific community were collected from December 1 to December 31, 2020.

⚠️ Disclaimer: no more terms will be collected this way.

New terms

New terms can be added by creating a Pull Request (PR). These PRs are then reviewed by the Dialect Map team to ensure that the resulting JSON is well formatted.
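Before opening a PR, a contributor may want a quick local sanity check in addition to `make validate`. The sketch below is one possible way to do that with the Python standard library only. It assumes a hypothetical layout (a JSON array of groups, each a non-empty array of strings), which may not match the repository's actual schema, and `check_jargon_groups` is an illustrative name rather than part of the project.

```python
import json


def check_jargon_groups(text):
    """Return a list of problems found in a JSON document of jargon groups.

    Assumed (hypothetical) layout: a JSON array of groups, where each group
    is a non-empty array of non-empty strings.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} (line {err.lineno})"]

    if not isinstance(data, list):
        return ["top level must be an array of groups"]

    problems = []
    for index, group in enumerate(data):
        if not isinstance(group, list) or not group:
            problems.append(f"group {index}: must be a non-empty array")
            continue
        if not all(isinstance(term, str) and term.strip() for term in group):
            problems.append(f"group {index}: every term must be a non-empty string")
            continue
        # Compare case-insensitively so "Sigmoid" and "sigmoid" count as duplicates.
        lowered = [term.strip().lower() for term in group]
        if len(set(lowered)) != len(lowered):
            problems.append(f"group {index}: duplicate terms")
    return problems
```

An empty list means no problems were found under these assumptions. The schema check run by `make validate` remains the authoritative one.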

