The Armadillo suite can be used by data stewards to share datasets on a server. Researchers can then analyse these datasets, together with datasets shared on other servers, using the DataSHIELD analysis tools. Researchers can only access aggregate information and cannot see individual rows.
Armadillo uses the DataSHIELD platform to facilitate analysis. DataSHIELD offers a variety of statistical packages for different research areas: there are packages for standard statistical analysis, exposome studies, survival studies, microbiome studies, and for studies that use large genetic datasets. All of these packages can be installed in the Armadillo suite.
How does it work? A researcher connects from an R client to one or more Armadillo servers. On each server, the data is loaded into an R session that is created specifically for that researcher. Analysis requests are sent to the R session on each Armadillo server, where the analysis is performed; only aggregated results are sent back to the client.
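To make the aggregation idea concrete, here is a conceptual Python sketch (not DataSHIELD or Armadillo code) of a pooled mean computed across servers when each server only returns a row count and a sum. The `MIN_COUNT` threshold is a hypothetical stand-in for DataSHIELD's own disclosure controls.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical disclosure threshold for illustration only; DataSHIELD has its
# own configurable disclosure settings on each server.
MIN_COUNT = 3


@dataclass
class Aggregate:
    """What a server returns to the client: a row count and a sum, never rows."""
    count: int
    total: float


def server_side_summary(values: List[float]) -> Aggregate:
    """Runs 'on the server': reduces private rows to an aggregate."""
    if len(values) < MIN_COUNT:
        # Refuse to release an aggregate over a very small group.
        raise ValueError("too few rows to release an aggregate")
    return Aggregate(count=len(values), total=float(sum(values)))


def pooled_mean(aggregates: List[Aggregate]) -> float:
    """Runs 'on the client': combines the aggregates from all servers."""
    n = sum(a.count for a in aggregates)
    return sum(a.total for a in aggregates) / n
```

For example, servers holding `[1, 2, 3]` and `[4, 5, 6, 7]` return `(3, 6.0)` and `(4, 22.0)`, and the client computes 28 / 7 = 4.0 without seeing a single row.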
Data stewards can use the Armadillo web user interface or the MolgenisArmadillo R client to manage their data on the Armadillo file server. Data is stored in Parquet format, which supports fast selection of only the columns (variables) needed for an analysis. The data can be stored encrypted on the Armadillo file server. When using the web user interface, you must first convert your data to Parquet (CSV uploads will be supported in the near future).
Everybody logs in via single sign-on using a central OIDC authentication server, such as Keycloak or FusionAuth, that federates to the authentication systems of the connected institutions, ideally through a federated AAI such as LifeScience AAI.
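Armadillo delegates login to the OIDC server (configured through the `spring.security.oauth2.client` settings in `application.yml`). As a rough illustration of what happens when a user is redirected there, this sketch builds a standard OIDC authorization-code request URL. The endpoint, client ID and redirect URI are hypothetical; the redirect path assumes Spring Security's default `/login/oauth2/code/{registrationId}` pattern.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_authorization_url(
    authorize_endpoint: str,
    client_id: str,
    redirect_uri: str,
    state: Optional[str] = None,
) -> Tuple[str, str]:
    """Build an OIDC authorization-code request URL and return it with its state."""
    # A random, unguessable state value lets the client reject forged callbacks.
    state = state or secrets.token_urlsafe(16)
    params = {
        "response_type": "code",  # authorization code flow
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid email profile",  # "openid" marks this as an OIDC request
        "state": state,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state
```

After the user authenticates, the OIDC server redirects back to `redirect_uri` with a `code` and the same `state`, which the application then exchanges for tokens.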
To spin up your own server on a laptop, run `java -jar armadillo-3.x.x.jar`.
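Startup can take a little while. If you script against a local server, a small helper like the hypothetical `wait_for_port` below (not part of Armadillo) can wait until the default port 8080 accepts connections before you open the user interface.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts TCP connections, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: try again shortly.
            time.sleep(interval)
    return False
```

For example, start the jar in one terminal and call `wait_for_port("localhost", 8080)` from a script. A `True` result only means the port is open; it does not by itself guarantee the application has finished initialising.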
For Armadillo 2.x, follow these instructions:
- For testing, we use Docker Compose: https://github.com/molgenis/molgenis-service-armadillo/tree/armadillo-service-2.2.3
- For production, we use Ansible: https://galaxy.ansible.com/molgenis/armadillo
You can explore the user interface at http://localhost:8080/ui
Here you will find user interfaces for:
- defining projects and their data
- defining users and their project authorizations
- defining and managing DataSHIELD profiles
You can also explore the API endpoints at localhost:8080/swagger-ui/index.html
Finally, you can download the R client.
The next step is to connect to Armadillo from a DataSHIELD client for analysis.
This repository uses pre-commit to manage commit hooks. An installation guide is available on the pre-commit website. To install the hooks, run `pre-commit install` once in the root folder of this repository. From then on, your code will be formatted automatically whenever you commit.
For local storage, you don't need to do anything. Data is automatically stored in the `data/` folder in this repository. You can choose another location in `application.yml` by changing the `storage.root-dir` setting.
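In YAML, a dotted property such as `storage.root-dir` becomes nested keys. The hypothetical helper below sketches that conversion so you can produce a snippet for `application.yml` (or a local override file); the example path is an assumption. Spring Boot also accepts the same property as a program argument, e.g. `--storage.root-dir=/srv/armadillo-data`.

```python
def property_to_yaml(dotted_key: str, value: str) -> str:
    """Convert a dotted Spring property such as storage.root-dir into nested YAML."""
    parts = dotted_key.split(".")
    lines = []
    # Every parent key gets its own line, indented two spaces per level.
    for depth, part in enumerate(parts[:-1]):
        lines.append("  " * depth + f"{part}:")
    # The leaf key carries the value at the deepest level; the value is written
    # verbatim, so quote it yourself if it contains YAML special characters.
    lines.append("  " * (len(parts) - 1) + f"{parts[-1]}: {value}")
    return "\n".join(lines) + "\n"
```

`property_to_yaml("storage.root-dir", "/srv/armadillo-data")` yields a `storage:` block with an indented `root-dir: /srv/armadillo-data` line.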
If you want to use MinIO as storage (including the test data), do the following:
- Start the container with `docker-compose --profile minio up`
- In your browser, go to http://localhost:9090
- Log in with molgenis / molgenis
- Add a bucket `shared-lifecycle`
- Copy the folders in `data/shared-lifecycle` in this repository to the bucket
- In `application.yml`, uncomment the `minio` section
- Armadillo will now automatically connect to MinIO at startup.
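Copying the folders into the bucket keeps their relative paths as object keys. This stdlib-only sketch computes that upload plan for a local folder; the helper name is hypothetical. Each entry could then be uploaded, for instance, with the MinIO Python client's `fput_object(bucket_name, object_name, file_path)`, which is left out here so the sketch has no third-party dependencies. Copying through the MinIO console as described above works just as well.

```python
from pathlib import Path
from typing import List, Tuple


def plan_bucket_upload(local_root: str, bucket: str) -> List[Tuple[str, str, str]]:
    """List (bucket, object key, local file path) for every file below local_root."""
    root = Path(local_root)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Object keys keep the folder structure, using "/" as separator.
            key = path.relative_to(root).as_posix()
            plan.append((bucket, key, str(path)))
    return plan
```

For example, `plan_bucket_upload("data/shared-lifecycle", "shared-lifecycle")` lists every file under that folder with the key it would get in the bucket.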
Note: When you run Armadillo locally for the first time, the `lifecycle` project has not been added to the system metadata yet. To add it automatically, see Application properties. Alternatively, add it manually:
- Go to the Swagger UI (http://localhost:8080/swagger-ui/index.html)
- Go to the `PUT /access/projects` endpoint
- Add the project `lifecycle`

Now you're all set!
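If you prefer scripting the manual step instead of clicking through the Swagger UI, this sketch builds the same `PUT /access/projects` call with `urllib`. The JSON body `{"name": ...}` and the bearer-token header are assumptions; check the request schema and authentication shown in the Swagger UI for your deployment.

```python
import json
import urllib.request
from typing import Optional


def build_add_project_request(
    base_url: str, project_name: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a PUT /access/projects request."""
    # Assumed request body; verify it against the schema in the Swagger UI.
    body = json.dumps({"name": project_name}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Hypothetical: a bearer token for a user with admin rights.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/access/projects", data=body, headers=headers, method="PUT"
    )
```

Passing the result to `urllib.request.urlopen(...)` sends it; an unauthorised call raises `urllib.error.HTTPError`.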
You can configure the application in `application.yml`. During development, however, it is more convenient to override these settings in a local .yml file that you do not commit to git. Here's how to set that up:
- Next to `application.yml`, create a file `application-local.yml` (this file is ignored by git)
- Give it the following content:
```yaml
armadillo:
  oidc-permission-enabled: false
  docker-management-enabled: true
  oidc-admin-user: <your OIDC email>
spring:
  security:
    oauth2:
      client:
        registration:
          molgenis:
            client-id: <OIDC client ID>
            client-secret: <OIDC client secret>
```
Note: If you can't configure an OAuth2 client for any reason, just remove the `spring` section.
- Now, in the Run Configuration for the `DatashieldServiceApplication`, add the following program argument:
`--spring.config.additional-location=file:armadillo/src/main/resources/application-local.yml`
Now the lifecycle test project (including its data) will work out of the box, and you will be able to log in with your OIDC account immediately.
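Outside the IDE, the same program argument can be passed when starting the jar. This sketch builds the command line; the helper name is hypothetical, the jar name keeps the placeholder version, and the config path is resolved relative to the directory you start Java from.

```python
from pathlib import Path
from typing import List


def armadillo_command(jar: str, local_config: str) -> List[str]:
    """Build the argv for starting Armadillo with an extra local config file."""
    config = Path(local_config).as_posix()
    # Spring Boot reads --key=value program arguments as configuration properties.
    return ["java", "-jar", jar, f"--spring.config.additional-location=file:{config}"]
```

The resulting list can be passed to `subprocess.run(...)`, for example `subprocess.run(armadillo_command("armadillo-3.x.x.jar", "application-local.yml"))`.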