Docs for citation functionality (#146)
* Write user guide for citation tool and add placeholder pages otherwise

* Add dev docs for cite-brainglobe

* Mention CITATION.cff files

* Update wording in help

* Update with brainglobe- long form example

* Apply suggestions from code review

Co-authored-by: Adam Tyson <[email protected]>

willGraham01 and adamltyson authored Feb 16, 2024
1 parent 5753fb1 commit abc2a80
Showing 3 changed files with 230 additions and 0 deletions.
@@ -0,0 +1,107 @@
# brainglobe-utils

`brainglobe-utils` is a collection of functions that can be re-used across multiple repositories: for BrainGlobe tools, plugins, or analyses (in the case of `brainglobe-workflows`).
It underpins a number of our user-facing tools like `brainreg` and `cellfinder`, as well as our common interfaces to napari in `brainglobe-napari-io`, for example.
The main feature it provides to users is the [`cite-brainglobe`](#cite-brainglobe-and-the-citation-submodule) command-line program.

## `cite-brainglobe` and the `citation` submodule

The `cite-brainglobe` command-line program is provided by the `citation` submodule, with the command-line wrapper and main function themselves being available in `citation.cite:cli` and `citation.cite:cite` respectively.

At the top level, the workflow for the citation tool works as follows:

- Parse the tool names that the user provides on the command-line. The repositories known to the package, and the names that they use, are defined in the `citation.repositories` module.
- Fetch the citation data from each `repository` by reading its CITATION.cff file.
- Read the CITATION.cff metadata into the appropriate citation format. This creates an instance of (a subclass of) `citation.format:Format`.
- Invoke the `generate_ref_string()` method of the `Format` instance to obtain the citation string.
- Append the citation string to those already created, and repeat for each tool requested.
- Write the output text to a file, or dump to `stdout` if not provided with an output location.
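
The steps above can be sketched roughly as follows. This is only an outline under stated assumptions: `cite_sketch` and all of its parameters are hypothetical stand-ins rather than the real `citation.cite:cite` signature, and the fetching and formatting steps are passed in as plain callables so that the example runs without network access.

```python
from typing import Callable, Dict, Iterable, Optional


def cite_sketch(
    tools: Iterable[str],
    known_repos: Dict[str, str],
    fetch_metadata: Callable[[str], dict],
    make_reference: Callable[[dict], str],
    output_file: Optional[str] = None,
) -> str:
    """Hypothetical outline of the citation workflow (not the real API)."""
    # Resolve the user-supplied names to repositories, skipping repeats.
    repos = []
    for name in tools:
        repo = known_repos[name.lower()]
        if repo not in repos:
            repos.append(repo)

    # Fetch the metadata for each repository, format it, and collect
    # the resulting citation strings.
    references = [make_reference(fetch_metadata(repo)) for repo in repos]
    text = "\n".join(references)

    # Write to a file if one was given, otherwise print to stdout.
    if output_file is None:
        print(text)
    else:
        with open(output_file, "w") as handle:
            handle.write(text)
    return text
```

In this sketch two aliases of the same tool resolve to one repository, so the tool is cited only once.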

The `citation` submodule itself breaks down further to accommodate the steps of the workflow above.

- As mentioned, the `citation.cite` module contains the high-level command-line interface and backend function.
- The `citation.fetch` module contains helper functions for quickly retrieving the files we want from their GitHub repositories.
- The `citation.repositories` submodule contains the `Repository` class, a convenient wrapper for storing information about the BrainGlobe tools that we want `cite-brainglobe` to be aware of and whose names it should recognise.
- The `citation.format` submodule contains the `Format` base class for reading and generating citation strings.
- The additional modules that match `citation.*_fmt` contain classes that derive from `Format`. This allows us to accommodate different citation formats needing different metadata, amongst other things.

### Adding new repositories or tools

`cite-brainglobe` is only aware of the repositories that we tell it about - and any such repository must have a CITATION.cff (or equivalent metadata file) present in it that we can fetch.

To make `cite-brainglobe` aware of a new BrainGlobe tool, add a static `Repository` instance to the `citation.repositories` submodule [as detailed here](#citationrepositories), specifying the required information.

### Adding a new supported citation format

Create a new submodule in the `citation` submodule, named after your format with an `_fmt` suffix (for example `newformat_fmt`; the name must be a valid Python module name, so it cannot contain hyphens).
Then write a class that inherits from `citation.format.Format` and defines the `required` and `optional` keys that your citation type needs from the `.cff` files we read from GitHub.
Finally, your class needs to override the `generate_ref_string` method to produce the citation string from the metadata you specified.

Then, head to the `citation.cite` module and add your new format as an option to the command-line interface (`citation.cite.cli`) and backend function (`citation.cite.cite`).

### `citation.repositories`

This module contains the `Repository` class, a function to find the repositories that are referred to in a list of tool names, and the static instances of all the BrainGlobe repositories that we provide a reference for via `cite-brainglobe`.

The `Repository` class is just a convenient container for all the information pertaining to one particular BrainGlobe tool or repository.
The class itself just holds any information we need and some useful actions on that information, such as providing the URL for the GitHub repo, storing the alternative names for the tool that is held there, etc.
The static instances of this class are instantiated in this module too - by convention, the variable name should match the repository's name on GitHub.
Each instance is created using the syntax:

```python
brainglobe_tool = Repository("brainglobe-tool", ["list", "of", "alternative", "or", "informal", "names"])
```

This defines a repository that `cite-brainglobe` expects to be called `brainglobe-tool`, under the `brainglobe` organisation on GitHub.
It also expects the CITATION.cff file to be on the `main` branch of this repository - though this location (and the organisation if we really need to point to non-BrainGlobe tools) can be changed when calling the constructor.
The second argument defines the (case-insensitive) names (in addition to the repository name itself) that `cite-brainglobe` will match to this repository; some of these names are automatically generated from the repository name, by the following rules:

- The characters `"_"`, `"-"`, and `" "` are considered interchangeable. Providing `brainglobe-utils` as a name will automatically cause `brainglobe_utils` and `brainglobe utils` to be alternative names that the repository can be referred to by, for example.
- The `brainglobe` prefix can be dropped from the name automatically. `brainglobe-utils` being provided as the repository name will mean that the repository can also be referred to as `utils`, for example.

To illustrate,

```python
brainglobe_utils = Repository(
    "brainglobe-utils",
    [
        "utilities",
    ],
)
```

creates a `Repository` object that the program will recognise as the `brainglobe-utils` package whenever the user asks `cite-brainglobe` to cite any one of `"brainglobe-utils"`, `"brainglobe_utils"`, `"brainglobe utils"`, `"utils"`, or `"utilities"`.
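
The two naming rules can be illustrated with a small standalone function. This is a minimal sketch, not the logic used inside `Repository`: `generate_aliases` is a hypothetical helper written only to show how the separator and prefix rules combine.

```python
from typing import Iterable, Set


def generate_aliases(repo_name: str, extra_names: Iterable[str] = ()) -> Set[str]:
    """Hypothetical helper: all lower-case names that match a repository."""
    names = {repo_name.lower()} | {name.lower() for name in extra_names}

    # The "brainglobe" prefix may be dropped from the repository name.
    normalised = repo_name.lower().replace("_", "-").replace(" ", "-")
    prefix = "brainglobe-"
    if normalised.startswith(prefix) and len(normalised) > len(prefix):
        names.add(normalised[len(prefix):])

    # The characters "_", "-" and " " are interchangeable, so emit every
    # name once per separator.
    aliases = set()
    for name in names:
        for sep in ("-", "_", " "):
            aliases.add(name.replace("_", sep).replace("-", sep).replace(" ", sep))
    return aliases
```

Calling `generate_aliases("brainglobe-utils", ["utilities"])` yields the five names listed above.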

The `all_citable_repositories` function is intended for imports in other areas of the `citation` submodule. It automatically detects the static `Repository` instances that we define in the `repositories` module and returns them as a set.

The `unique_repositories_from_tools` function takes in a list of tool names or aliases (in particular, the list provided by the user on the command-line) and returns the unique repositories that need to be cited given this list.

### `citation.fetch`

This submodule contains two helper functions.
The simpler of the two is the `yaml_str_to_dict` function, which takes a string containing the YAML-formatted content of a file and parses it into a Python `dict`. This is for use when retrieving files from the internet as opposed to loading them from disk.

The `fetch_from_github` function fetches the content of a file from a GitHub repository and validates that the request was successful.
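
As a rough illustration of what such helpers involve, the sketch below builds the raw-content URL for a file and downloads it using only the standard library. The names `raw_github_url` and `fetch_text`, and their parameters, are assumptions made for this example and are not the signatures of the functions in `citation.fetch`; the defaults simply mirror the `brainglobe` organisation and `main` branch mentioned earlier.

```python
from urllib.request import urlopen


def raw_github_url(
    repo: str,
    path: str = "CITATION.cff",
    org: str = "brainglobe",
    branch: str = "main",
) -> str:
    """Build the raw-content URL for a file in a GitHub repository."""
    return f"https://raw.githubusercontent.com/{org}/{repo}/{branch}/{path}"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a text file, raising an error if the request fails."""
    # urlopen itself raises HTTPError for 4xx/5xx responses; the explicit
    # check guards against any other non-200 status.
    with urlopen(url, timeout=timeout) as response:
        if response.status != 200:
            raise RuntimeError(f"Request to {url} returned {response.status}")
        return response.read().decode("utf-8")
```

The returned string could then be handed to a YAML parser (for instance `yaml.safe_load` from PyYAML, if that library is available) to obtain a `dict`.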

### `citation.format` and the `citation.*_fmt` submodules

The `Format` class resides in the `citation.format` submodule.
This class is the intended base class for any citation format that we want to generate, and provides the functionality that all formats share.
The various `*_fmt` submodules then provide the derived classes for each of the citation formats that we want to support - `bibtex_fmt` provides the `BibTexEntry` class, and `text_fmt` supports writing citations as human-readable strings.

The key properties of the `Format` class are the `required` and `optional` class variables - these are not set in the base class and are intended to be overridden by classes that inherit from `Format`.
Both of these class variables should be lists of strings, corresponding to the keys in a `CITATION.cff` file that the citation (described by the inheriting class) needs (respectively can make use of) when being produced.
The existence of these keys in the CITATION.cff information is checked in the `Format.__init__` constructor, to avoid repetition across submodules and to catch errors early.
Optional keys are allowed to be missing, although the constructor can be asked to report any that are.
The `Format` class also contains a helper method shared by all formats for parsing the `authors` key information into a string, `Format._prepare_authors_field`, which is again invoked on instantiation.
Finally, `Format` provides a placeholder `generate_ref_string` method that should be overridden by any inheriting class. This is the method that parses the data read in and produces the citation string.

For example, the `TextCitation` format inherits from `Format` and requires the keys "author", "title" and "year" to produce a reference.
It also provides a list of optional keys that the format can make use of when producing the citation, but does not require.
`TextCitation` itself does not need to define a constructor, since the constructor inherited from `Format` suffices once `TextCitation.required` and `TextCitation.optional` have been overridden.
`TextCitation` does implement `generate_ref_string`, overriding the placeholder defined by `Format`. This method dictates how the citation string is formatted and assembled.
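
The pattern described above can be sketched in a few lines. The classes below are simplified stand-ins written for illustration, not the real `Format` and `TextCitation`: the names `FormatSketch` and `TextSketch`, the `warn_on_missing` argument, the choice of `doi` as an optional key, and the layout of the output string are all assumptions.

```python
class FormatSketch:
    """Stand-in for a citation-format base class (not the real `Format`)."""

    required: list = []  # keys that must be present in the metadata
    optional: list = []  # keys that are used if present

    def __init__(self, cff_data: dict, warn_on_missing: bool = False):
        # Fail early if any required key is absent.
        missing = [key for key in self.required if key not in cff_data]
        if missing:
            raise KeyError(f"Missing required keys: {missing}")
        # Optional keys may be absent; report them only on request.
        for key in self.optional:
            if key not in cff_data and warn_on_missing:
                print(f"Optional key not found: {key}")
        wanted = self.required + self.optional
        self.data = {key: cff_data[key] for key in wanted if key in cff_data}

    def generate_ref_string(self) -> str:
        raise NotImplementedError("Subclasses must implement this method.")


class TextSketch(FormatSketch):
    """Illustrative plain-text format; inherits the constructor unchanged."""

    required = ["author", "title", "year"]
    optional = ["doi"]

    def generate_ref_string(self) -> str:
        ref = f"{self.data['author']}. ({self.data['year']}). {self.data['title']}."
        if "doi" in self.data:
            ref += f" doi:{self.data['doi']}"
        return ref
```

Because the key checks live in the base constructor, the subclass only has to declare its keys and say how the string is assembled.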

The `BibTexEntry` class is slightly more subtle: each entry type in a bibtex file may require different keys, _and_ the syntax for the author field in a bibtex reference differs from the default format implemented in `Format._prepare_authors_field`.
As such, `BibTexEntry` overrides `_prepare_authors_field` _and_ performs some preliminary steps in `BibTexEntry.__init__` before invoking the base constructor, `Format.__init__`.
`BibTexEntry` _does_ however implement the `generate_ref_string` method, since this is the same for all bibtex citation types.
Each bibtex citation type, such as `Software` or `Article`, then inherits from the `BibTexEntry` class and defines the `required` and `optional` fields as usual.
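
To show why the author field needs special handling, here is a minimal sketch of turning a CFF-style `authors` list into a bibtex author string, in which individual authors are separated by the word `and`. The function name `bibtex_authors` is hypothetical, and the sketch assumes the usual CFF layout in which people carry `given-names` and `family-names` while entities carry a single `name`.

```python
def bibtex_authors(authors: list) -> str:
    """Join CFF-style author entries into a bibtex author string."""
    parts = []
    for author in authors:
        if "name" in author:
            # Entities (organisations, teams) have a single name.
            parts.append(author["name"])
        else:
            given = author.get("given-names", "")
            family = author.get("family-names", "")
            parts.append(f"{given} {family}".strip())
    # bibtex separates individual authors with " and ".
    return " and ".join(parts)
```
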
122 changes: 122 additions & 0 deletions docs/source/documentation/brainglobe-utils/citation-module.md
@@ -0,0 +1,122 @@
# Generating Citations for BrainGlobe tools

If you have `brainglobe-utils` installed, you can use the `cite-brainglobe` command-line tool it provides to generate citations or acknowledgement sentences.
`brainglobe-utils` comes with the one-line BrainGlobe install (`pip install brainglobe`), but will also be fetched by most of our tools if you decide to install them as standalone.
You can check whether or not you have the `cite-brainglobe` program available by running

```bash
cite-brainglobe --help
```

in a terminal in your BrainGlobe environment.
If `cite-brainglobe` is installed, you should see the help and information about it printed to your terminal.
If you get a "no such file or directory" or "no program found" message back, it means that you don't have `cite-brainglobe` installed - you might need to update your BrainGlobe installation, or activate your BrainGlobe environment.

## Using the tool

You can tell `cite-brainglobe` which BrainGlobe tools you are using, and it will return to you a citation for each of those tools that you can copy into your work.
You don't need to use the exact tool name either, though `cite-brainglobe` is not perfect so it helps if you can be as close to the name as possible!
Letter case is ignored, but you will have to use quotation marks if you refer to a tool that has a space in its name.

For example, if you want to cite BrainGlobe's AtlasAPI tool:

```bash
$ cite-brainglobe "brainglobe atlasapi"
@article{bg-atlasapi,
authors = "Federico Claudi and Luigi Petrucco and Adam Tyson and Tiago Branco and Troy Margrie and Ruben Portugues",
title = "BrainGlobe Atlas API: a common interface for neuroanatomical atlases",
journal = "Journal of Open Source Software",
year = "2020",
volume = "5",
month = "10",
doi = "10.21105/joss.02668",
}
```

You can also ask for citations for multiple tools at once, and leave out the "brainglobe" prefix for most tools.
Asking for a citation for "brainglobe" will give you a citation that points to the BrainGlobe project webpage.
The following command, for example, asks for a citation for `atlasapi` (which is interpreted as "BrainGlobe AtlasAPI"), and for `brainglobe` - the BrainGlobe tool suite:

```bash
$ cite-brainglobe atlasapi brainglobe
@article{bg-atlasapi,
authors = "Federico Claudi and Luigi Petrucco and Adam Tyson and Tiago Branco and Troy Margrie and Ruben Portugues",
title = "BrainGlobe Atlas API: a common interface for neuroanatomical atlases",
journal = "Journal of Open Source Software",
year = "2020",
volume = "5",
month = "10",
doi = "10.21105/joss.02668",
}

@software{brainglobe-meta,
authors = "BrainGlobe Developers and Wiliam Michael Graham",
title = "BrainGlobe",
url = "https://brainglobe.info/",
year = "2024",
abstract = "The BrainGlobe Initiative exists to facilitate the development of interoperable Python-based tools for computational neuroanatomy.",
license = "BSD-3-Clause",
}
```

By default, citations are printed to the terminal screen (`stdout`), which you can then copy where you need.
The default output format for citations is `bibtex`.
These can be changed by providing the appropriate flags as detailed in the [usage pattern](#usage-pattern).

### Usage Pattern

`cite-brainglobe`'s usage pattern is available by passing the `-h` or `--help` flags:

```bash
$ cite-brainglobe --help
usage: cite-brainglobe [-h] [-l] [-s] [-w] [-o OUTPUT_FILE] [-f FORMAT] [tools ...]

Citation generation for BrainGlobe tools.

positional arguments:
  tools                 BrainGlobe tools to be cited.

options:
  -h, --help            show this help message and exit
  -l, --list            List citable BrainGlobe tools, and formats, then exit.
  -s, --software-citations
                        Explicitly cite software source code over academic papers.
  -w, --warn-unused     Print out when citation information is omitted by the chosen citation
                        format.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output file to write citations to.
  -f FORMAT, --format FORMAT
                        Citation format to write. Will be overwritten by the inferred format if
                        the output file argument is also provided. Valid formats can be listed
                        with the -l, --list option.
```

The `-l` (`--list`) option will provide you with a list of citation formats that the tool supports, and the list of tools that the program is aware of.
Currently supported citation formats are:

- Bibtex (`*.tex`), use `--format bibtex` to request this citation type.
- Text (`*.txt`), use `--format text` to request this citation type. This option is mainly for when you want to generate a citation you can copy/paste into a bibliography, or an acknowledgements section.

The `-s` (`--software-citations`) option will prioritise citing BrainGlobe tool _software_ (that is, the source code or program) rather than the article or journal entry that provides the theoretical basis for the tool or algorithm.
By default we expect users to prefer citing the article. However, if you specifically want to credit the software or tool implementation (for example, where you have contributed to the source code), you can use this option.
Keep in mind that this option is set for _all_ tools that you ask to be cited.
If you want to cite some tools by software, and others by article, you will need to run `cite-brainglobe` twice, once with the `-s` flag and once without.

Some citation formats do not make use of all the metadata that we make available.
If you want to be aware of cases where metadata has not been used, you can pass the `-w` or `--warn-unused` flag.
This flag primarily exists for developers when debugging the tool.

You can redirect the citation output to a file of your choice by providing the `-o` (`--output-file`) option, followed by a valid file path.
The file will be overwritten if it exists already, or created if it does not exist.
If you do not provide a format via the `-f` flag, `cite-brainglobe` will attempt to infer the citation format you want from your output file's extension.
In the event this is impossible, the tool will report a failure.

The `-f` or `--format` option can be used to select the format of the citation that is produced, so long as it is one of the supported format types given by the `--list` option.
For example,

```bash
cite-brainglobe -f text
```

will write text (human-readable) citations as opposed to the default bibtex style.
The `-f` flag will be ignored if you provide an output file _with a supported extension_ via the `-o` option.
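
The interaction between `-o` and `-f` described above can be summarised as a small decision function. This is a sketch of the documented behaviour only, not the tool's code: `choose_format` and `EXTENSION_TO_FORMAT` are hypothetical names, and the extension table simply restates the two supported formats listed earlier.

```python
from pathlib import Path
from typing import Optional

# Supported extensions and the format each one implies.
EXTENSION_TO_FORMAT = {".tex": "bibtex", ".txt": "text"}


def choose_format(output_file: Optional[str], requested: Optional[str]) -> str:
    """Pick a citation format from the output file and/or the -f value."""
    # A supported output-file extension takes priority over -f.
    if output_file is not None:
        inferred = EXTENSION_TO_FORMAT.get(Path(output_file).suffix.lower())
        if inferred is not None:
            return inferred
    # Otherwise fall back to the explicitly requested format.
    if requested is not None:
        return requested
    # An output file with an unknown extension and no -f is an error.
    if output_file is not None:
        raise ValueError(f"Cannot infer a citation format from {output_file!r}")
    # With neither option given, bibtex is the default.
    return "bibtex"
```
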
1 change: 1 addition & 0 deletions docs/source/documentation/index.md
@@ -55,6 +55,7 @@ Once you have installed `brainglobe`, or [installed an individual tool](#install
```{toctree}
:maxdepth: 1
setting-up/index
brainglobe-utils/citation-module
bg-atlasapi/index
brainglobe-space/index
brainreg/index