ALA Name Matching Service

This provides a set of web services for name matching, using the ala-name-matching library. It consists of three components, all with Maven groupId au.org.ala.names:

ala-namematching-core A core library containing common objects
ala-namematching-client A client library that can be linked into other applications and which accesses the web services
ala-namematching-server A server application that can be used for name searches

Client Library

To include the client library in an application, add the following dependency:

<dependency>
    <groupId>au.org.ala.names</groupId>
    <version>1.8.1</version>
    <artifactId>ala-namematching-client</artifactId>
</dependency>

or, for Gradle:

compile "au.org.ala.names:ala-namematching-client:1.8.1"

To access the client library, create a configuration and then create a client based on the configuration. The client implements the name matching API. You can do this either programmatically, using the client configuration builder:

ClientConfiguration configuration = ClientConfiguration.builder()
    .baseUrl(new URL("https://namematching-ws.arg.au"))
    .timeOut(300000)
    .cacheSize(200000)
    .build();
this.client = new ALANameUsageMatchServiceClient(configuration);

The possible configuration parameters are

parameter	default	description
baseUrl		The base URL of the name matching service
timeOut	30000	The connection timeout in milliseconds
cache	true	Cache server requests and responses (see below for data caching)
cacheDir		The cache directory (defaults to a temporary directory)
cacheSize	52428800 (50 MB)	The cache size in bytes

Or you can read a configuration from a JSON or YAML document, via Jackson. For example:

{
  "baseUrl": "https://namematching-ws.arg.au",
  "timeOut": 3000,
  "cache": false
}

ObjectMapper mapper = new ObjectMapper();
ClientConfiguration configuration = mapper.readValue(new File("config.json"), ClientConfiguration.class);
this.client = new ALANameUsageMatchServiceClient(configuration);

Data caching

As well as a web service cache, the application can configure a data cache that holds the results of name searches. The data cache can be used to improve the performance of the match(NameSearch) and matchAll(List<NameSearch>) calls by caching responses. In the case of the matchAll call, partial matches result in a partial request to the server, with the already cached items filled from the cache.
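
The partial-request behaviour of matchAll can be pictured with the following sketch. It is not the client library's implementation: PartialCacheSketch, its matchAll method and the server function are hypothetical stand-ins, and plain strings take the place of NameSearch requests and match results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; the real client works with NameSearch objects
// and a cache2k-backed data cache.
class PartialCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<List<String>, List<String>> server;
    int serverItems = 0; // number of items actually sent to the (stand-in) server

    PartialCacheSketch(Function<List<String>, List<String>> server) {
        this.server = server;
    }

    List<String> matchAll(List<String> searches) {
        // Collect only the searches that are not already cached
        List<String> missing = new ArrayList<>();
        for (String search : searches) {
            if (!cache.containsKey(search) && !missing.contains(search)) {
                missing.add(search);
            }
        }
        // Send a partial request for the missing items and cache the responses
        if (!missing.isEmpty()) {
            List<String> found = server.apply(missing);
            serverItems += missing.size();
            for (int i = 0; i < missing.size(); i++) {
                cache.put(missing.get(i), found.get(i));
            }
        }
        // Assemble the full result, in request order, from the cache
        List<String> results = new ArrayList<>();
        for (String search : searches) {
            results.add(cache.get(search));
        }
        return results;
    }
}
```

In this sketch, a second call that repeats one already-seen search and adds one new search sends only the new search to the server.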

The client library has data caching disabled by default. If you intend to use a data cache, you will need to include a cache2k implementation in your dependencies. For example:

<dependency>   
  <groupId>org.cache2k</groupId>
  <artifactId>cache2k-jcache</artifactId>
  <version>1.2.0.Final</version>
</dependency>

or, for Gradle:

runtime "org.cache2k:cache2k-jcache:1.2.0.Final"

To build a data-cached client, you need to build a data cache configuration and add it to the client configuration.

DataCacheConfiguration dataCache = DataCacheConfiguration.builder()
        .enableJmx(false)
        .build();
ClientConfiguration configuration = ClientConfiguration.builder()
        .baseUrl(new URL("https://namematching-ws.arg.au"))
        .dataCache(dataCache)
        .build();
this.client = new ALANameUsageMatchServiceClient(configuration);

or

{
  "baseUrl": "https://namematching-ws.arg.au",
  "dataCache": {
    "enableJmx": false
  }
}

The possible data cache configuration parameters are:

parameter	default	description
enableJmx	true	Enable Java Management Extensions (JMX) monitoring of the cache. This allows a running application to be queried about cache performance via applications such as `jconsole`
entryCapacity	100000	The number of entries to cache
eternal	true	If true, do not expire old entries
keepDataAfterExpired	false	Keep data in the cache after expiry
permitNullValues	true	Allow caching of nulls
suppressExceptions	false	Suppress, rather than propagate exceptions

How to start the ALANameMatchingService application

1. Run mvn clean install to build the application.
2. Download a pre-built name matching index (e.g. https://archives.ala.org.au/archives/nameindexes/20210811-3/namematching-20210811-3.tgz) and untar it in /data/lucene. This will create a /data/lucene/namematching-20210811 directory.
3. cd to the server subdirectory.
4. Start the application with java -jar target/ala-name-matching-server-1.8.1.jar server config-local.yml
5. To check that the application is running, open http://localhost:9180
6. Test with http://localhost:9179/api/search?q=Acacia. The response should look similar to:

{
    "success": true,
    "scientificName": "Acacia",
    "scientificNameAuthorship": "Mill.",
    "taxonConceptID": "http://id.biodiversity.org.au/node/apni/6719673",
    "rank": "genus",
    "rankID": 6000,
    "lft": 590410,
    "rgt": 593264,
    "matchType": "exactMatch",
    "nameType": "SCIENTIFIC",
    "synonymType": null,
    "kingdom": "Plantae",
    "kingdomID": "http://id.biodiversity.org.au/node/apni/9443092",
    "phylum": "Charophyta",
    "phylumID": "http://id.biodiversity.org.au/node/apni/9443091",
    "classs": "Equisetopsida",
    "classID": "http://id.biodiversity.org.au/node/apni/9443090",
    "order": "Fabales",
    "orderID": "http://id.biodiversity.org.au/node/apni/9443087",
    "family": "Fabaceae",
    "familyID": "http://id.biodiversity.org.au/node/apni/9443086",
    "genus": "Acacia",
    "genusID": "http://id.biodiversity.org.au/node/apni/6719673",
    "species": null,
    "speciesID": null,
    "vernacularName": "Acacia",
    "speciesGroup": [
        "Plants"
    ],
    "speciesSubgroup": [],
    "issues": [
       "noIssue"
    ]
}
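
The same test request can be issued from code. The following is a minimal sketch using the JDK's own HTTP client (Java 11 or later) rather than the client library; the class SearchRequestSketch and its method names are hypothetical, and it assumes a server started as described above and listening on port 9179.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the ala-namematching-client library.
class SearchRequestSketch {
    // Build the search URL, form-encoding the name (spaces become '+')
    static URI searchUri(String baseUrl, String name) {
        String q = URLEncoder.encode(name, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/search?q=" + q);
    }

    // Send the GET request and return the raw JSON response body
    static String search(String baseUrl, String name) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(searchUri(baseUrl, name))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, SearchRequestSketch.search("http://localhost:9179", "Acacia") requests the URL used in the test step above.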

Web Services

To see the complete documentation of the available web services, open http://localhost:9179

Hinting

Search requests may contain hints. These are lists of possible values for un-specified elements of the search. An example search with hints is:

{
  "scientificName": "Acacia dealbata",
  "family": "Fabaceae",
  "hints": {
    "kingdom": [ "Plantae", "Fungi" ],
    "family": [ "Fabaceae", "Chenopodiaceae" ]
  }
}

Hints are used in two ways, if the server is configured to use them (see below):

Hints are used to fill out the search if the corresponding term is absent from the search. In the above example, the service will try to match against copies of the search where the kingdom is null, Plantae or Fungi. The family hint is not used, since a family has been supplied. Searches proceed from the least specific match, using the fewest hints (none), to the most specific match, using the largest number of hints, stopping when something is found.
Hints are also used to sanity-check the resulting match. If hints are available, the resulting match is checked against the list of hints and flagged with a hintMismatch issue if the match does not correspond to the hints.
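
The two uses of hints can be sketched as follows. This is an illustration of the behaviour described above, not the server's code: the class HintSketch and its methods are hypothetical, searches and matches are reduced to maps from term name to value, unset terms are assumed to be absent from the map, and the check is assumed to be an exact string comparison.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of hint handling; not the server implementation.
class HintSketch {
    // Expand a search into candidate searches, ordered from fewest hints used to most
    static List<Map<String, String>> candidates(Map<String, String> search,
                                                Map<String, List<String>> hints) {
        List<Map<String, String>> result = new ArrayList<>();
        result.add(new LinkedHashMap<>(search)); // the search with no hints applied
        for (Map.Entry<String, List<String>> hint : hints.entrySet()) {
            if (search.get(hint.getKey()) != null) {
                continue; // term supplied by the caller, so the hint is not used
            }
            List<Map<String, String>> expanded = new ArrayList<>();
            for (Map<String, String> base : result) {
                expanded.add(base); // variant that leaves the term unset
                for (String value : hint.getValue()) {
                    Map<String, String> copy = new LinkedHashMap<>(base);
                    copy.put(hint.getKey(), value);
                    expanded.add(copy);
                }
            }
            result = expanded;
        }
        // Stable sort by the number of hint values that were filled in
        result.sort(Comparator.comparingInt(c -> c.size() - search.size()));
        return result;
    }

    // True if any hinted term of the match has a value outside its hint list
    static boolean hintMismatch(Map<String, String> match, Map<String, List<String>> hints) {
        for (Map.Entry<String, List<String>> hint : hints.entrySet()) {
            String actual = match.get(hint.getKey());
            if (actual != null && !hint.getValue().contains(actual)) {
                return true;
            }
        }
        return false;
    }
}
```

With the example request above, candidates produces three searches: one with no kingdom, one with kingdom Plantae and one with kingdom Fungi. A caller would try them in that order and stop at the first one that matches.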

Loose matches

Generally, the scientificName attribute is assumed to be a scientific name. However, some sources of information may also provide a name that is either a vernacular name or the taxon identifier associated with a specific taxon. This sort of search is termed a loose search.

Search requests may contain a "loose": true value. Loose searches will check whether the supplied scientificName value is actually a vernacular name or taxon identifier, as well as treating it as a normal scientific name.

The loose parameter is only used by the search POST request, where it can be specified as part of the request body. Search requests that use GET requests and URL parameters are always loose.

A server must be configured to honour loose requests. See below.

Health Check

To see your application's health, open http://localhost:9180/healthcheck

Test

http://localhost:9179/search?q=macropus+rufus

Configuration

The name matching service uses a YAML configuration file with a number of possible entries. Most of these entries have suitable defaults.

		Description	Default
logging		Logging configuration, see https://www.dropwizard.io/en/latest/manual/configuration.html for documentation
server		Server configuration, see https://www.dropwizard.io/en/latest/manual/configuration.html for documentation
search		Search configuration
	index	The path of the index directory	`/data/lucene/namematching`
	groups	URL of the groups configuration	`file:///data/ala-namematching-service/config/groups.json`
	subgroups	URL of the subgroups configuration	`file:///data/ala-namematching-service/config/subgroups.json`
	useHints	Use hints supplied by the request to aid matching	true
	checkHints	Check the resulting match against the supplied hints as a sanity check	true
	allowLoose	Allow loose searches	true
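
As an illustration, a configuration file that sets the search entries explicitly to the defaults listed above might look like the following. This is a sketch assembled from the table, not a copy of a shipped file, and the logging and server sections are omitted.

```yaml
# Sketch only: every value below is the default from the table above
search:
  index: /data/lucene/namematching
  groups: file:///data/ala-namematching-service/config/groups.json
  subgroups: file:///data/ala-namematching-service/config/subgroups.json
  useHints: true
  checkHints: true
  allowLoose: true
```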

The groups.json file is a list of common names for taxa, e.g.

[
...
  {
    "name": "Molluscs",
    "rank": "phylum",
    "included": ["Mollusca"],
    "excluded": [],
    "parent": "Animals"
  }
...
]

Where name is the descriptive group name, usually a common name, rank is the rank of the associated taxa, included and excluded provide lists of taxonomic names for the group and parent is the name of immediate parent group. The taxa are matched against the taxonomic index, specified by search.index in the configuration.

The subgroups.json file is a list of further, more detailed descriptions for taxa. The subgroups are attached to parent groups. For example:

[
...
  {
    "speciesGroup": "Birds",
    "taxonRank": "order",
    "taxa": [
      {
        "name": "ANSERIFORMES",
        "common": "Ducks, Geese, Swans"
      },
      {
        "name": "APODIFORMES",
        "common": "Hummingbirds, Swifts"
      },
      ...
    ]
  },
...
]

The speciesGroup refers to the group name, from the file above, the taxonRank gives the rank of the names in the list and taxa is a list of mappings from scientific names to common descriptions. As with the groups list, the subgroups are matched using the current name index.

Building the docker image

Change directory to the server module.

docker build -f docker/Dockerfile . -t ala-namematching-service:v20210811-3

The command above builds an image that uses the ALA name matching index. To use the GBIF backbone instead:

docker build -f docker/Dockerfile . -t ala-namematching-service:v20210811-3 --build-arg ENV=gbif-backbone

If you want a quick'n'easy docker instance for testing, use

docker build -f docker/Dockerfile-test . -t ala-namematching-service:test
