This repository contains a series of notebooks that turn unstructured CRediT statements, and author contribution statements found in the body text of scholarly documents, into structured output that can be analyzed.
This is useful because, while CRediT statements are increasingly collected in a structured way, many documents carry this information only in a free-text field.
Moreover, a free-text statement often describes a wide range of activities undertaken within the project, some of which can be mapped to standard CRediT roles and some of which cannot.
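For illustration, here is a minimal sketch of the kind of transformation the pipeline targets. The statement format, the splitting logic, and the small role lexicon are assumptions for the example, not the repository's actual parsing code:

```python
# Illustrative sketch only: maps free-text activity phrases to standard
# CRediT roles where possible, keeping unmapped phrases verbatim.
CREDIT_ROLES = {
    "conceptualization": "Conceptualization",
    "data curation": "Data curation",
    "formal analysis": "Formal analysis",
    "writing - original draft": "Writing - original draft",
}

def map_phrases_to_roles(statement: str) -> dict:
    """Split a free-text statement into per-author clauses and map each
    activity phrase to a standard CRediT role where one exists."""
    structured = {}
    for clause in statement.split(";"):
        author, _, activities = clause.partition(":")
        roles = []
        for phrase in activities.split(","):
            phrase = phrase.strip().lower()
            # Phrases with no standard equivalent are kept as free text.
            roles.append(CREDIT_ROLES.get(phrase, phrase))
        structured[author.strip()] = roles
    return structured

statement = "A. Author: Conceptualization, Data curation; B. Author: Formal analysis, fieldwork"
print(map_phrases_to_roles(statement))
# {'A. Author': ['Conceptualization', 'Data curation'],
#  'B. Author': ['Formal analysis', 'fieldwork']}
```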
Some of this work was done in a Databricks environment, so some PySpark methods are used.
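As a flavor of the PySpark usage, the sketch below explodes one statement per document into one row per author clause. The column names and sample data are assumptions, not the repository's actual schema:

```python
# Illustrative only: person-level restructuring of a credit statement.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split, trim

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

df = spark.createDataFrame(
    [("doi:10.1000/example", "A. Author: Conceptualization; B. Author: Formal analysis")],
    ["doc_id", "credit_statement"],
)

# One row per author clause instead of one row per document.
person_level = df.withColumn(
    "author_clause", explode(split("credit_statement", ";"))
).withColumn("author_clause", trim("author_clause"))

person_level.show(truncate=False)
```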
It was initially applied to XML records from ScienceDirect. You can read more about the XML structure that this presumes here.
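The preprocessing step amounts to pulling the contribution statement and the author names out of the full-text XML. The element names below are hypothetical placeholders; the real ScienceDirect schema differs, so consult the linked documentation for the actual tags:

```python
# Sketch of XML preprocessing under assumed tag names
# (<contributor-statement>, <author>), not the real ScienceDirect schema.
import xml.etree.ElementTree as ET

xml = b"""<article>
  <author><given-name>Josh</given-name><surname>Fisher</surname></author>
  <contributor-statement>J. Fisher: Conceptualization.</contributor-statement>
</article>"""

root = ET.fromstring(xml)
statement = root.findtext("contributor-statement")
authors = [
    f"{a.findtext('given-name')} {a.findtext('surname')}"
    for a in root.findall("author")
]
print(statement, authors)
```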
You can also apply this software to unstructured author contribution statements from other sources by following steps 2-6 of this process.
The notebooks should be run in the following order:
1. `Preprocess_XML`: extract the credit statement and author information from full-text XML.
2. `CreditParser`: transform and structure the credit statements at a person level.
3. `Training_and_Inference/gpt_annotation`: generate a train/test set for multilabel classifier training.
4. `Training_and_Inference/bert_multilabel_training`: train a BERT model on the multilabel classification task (a minimal sketch follows this list).
5. `Training_and_Inference/credit_inference`: perform inference over the output from `CreditParser`.
6. `Training_and_Inference/credit_inference_cleanup`: clean up the resulting data.
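For orientation, here is a minimal sketch of the multilabel training and inference steps using Hugging Face Transformers. The base model, the role set, the multi-hot labels, and the 0.5 threshold are illustrative assumptions, not the repository's exact configuration:

```python
# Sketch of BERT multilabel fine-tuning: one sigmoid output per CRediT role.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

ROLES = ["Conceptualization", "Data curation", "Formal analysis"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(ROLES),
    problem_type="multi_label_classification",  # uses BCE loss over logits
)

texts = ["designed the study and curated the data"]
labels = torch.tensor([[1.0, 1.0, 0.0]])  # multi-hot targets over ROLES

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # a real training loop would step an optimizer here

# Inference: independent sigmoid per label, thresholded at 0.5.
# (Predictions from an untrained head are random; shown for shape only.)
probs = torch.sigmoid(outputs.logits)
predicted = [ROLES[i] for i in range(len(ROLES)) if probs[0, i] > 0.5]
print(predicted)
```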
Please use the following citation in your publications when referencing this repository.
```bibtex
@software{Elsevier_Credit_Contribution_Parser,
  author = {Elsevier},
  license = {MIT},
  title = {{Credit Contribution Parser}},
  url = {https://github.com/elsevierlabs-os/credit-contribution-parser}
}
```
- Josh Fisher: [email protected]
- Kristy James: [email protected]