-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Germplasm Accession Importer #6
Conversation
…ganismID function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I will do this review in stages.
Review of the associated documentation can be seen here: TripalCultivate/docs#2
Review of the import form:
- It would be good to include a link on the import page to your more detailed documentation! Something like "For more detailed information on this format including links to lookup various codes, please see our official documentation."
- Instructions block feels a bit like placeholder text. Suggestion in separate review comment.
trpcultivate_germplasm/src/Plugin/TripalImporter/GermplasmAccessionImporter.php
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next I tried a functional test. I followed the steps outlined for setup and then tested as follows:
- Put in my own simple test file with altered names / organism: simple_example.txt
✅ This worked beautifully. Importer ran with no errors and the records in the stock and dbxref tables were as expected.
- Imported the sample file as in 1 a second time.
✅ This also worked beautifully. Importer ran with no errors and detected that the records already existed (did not say "inserting" in the job output.
✅ Records in database still as expected.
- Imported my version of the properties example file and included the same two accessions as in the first file but with more details:
props_syns_example.txt
❌ This unfortunately failed. The job produced the following output which shows it was unable to determine that the same germplasm already existed 🙈
docker exec trpcultivate drush trp-run-jobs --username=drupaladmin --root=/var/www/drupal/web
2023-12-12 22:18:34
Tripal Job Launcher
Running as user 'drupaladmin'
-------------------
2023-12-12 22:18:34: Job ID 3.
2023-12-12 22:18:34: Calling: tripal_run_importer(36)
Running 'Tripal Cultivate: Germplasm Accessions' importer
NOTE: Loading of this file is performed using a database transaction. If it fails or is terminated prematurely then all insertions and updates are rolled back and will not be found in the database
Percent complete: 0%. Memory: 33,996,184 bytes.
Percent complete: 1.12 %. Memory: 34,005,008 bytes.
Percent complete: 3.21 %. Memory: 34,005,080 bytes.
Percent complete: 4.33 %. Memory: 34,005,144 bytes.
Percent complete: 32.96 %. Memory: 34,005,424 bytes.
Percent complete: 56.56 %. Memory: 34,005,552 bytes.
Percent complete: 74.58 %. Memory: 34,014,168 bytes.
Percent complete: 100.00 %. Memory: 34,014,288 bytes.
Percent complete: 100.00 %. Memory: 34,014,648 bytes.
Inserting "Test1".
ERROR: SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "stock_c1"
DETAIL: Key (organism_id, uniquename, type_id)=(1, GERM:00001, 9) already exists.: INSERT INTO "testchado"."stock" ("organism_id", "name", "uniquename", "type_id") VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3) RETURNING stock_id; Array
(
)
[site http://default] [TRIPAL] ERROR: SQLSTATE[23505]: Unique violation: 7 ERROR: duplicate key value violates unique constraint "stock_c1"DETAIL: Key (organism_id, uniquename, type_id)=(1, GERM:00001, 9) already exists.: INSERT INTO "testchado"."stock" ("organism_id", "name", "uniquename", "type_id") VALUES (:db_insert_placeholder_0, :db_insert_placeholder_1, :db_insert_placeholder_2, :db_insert_placeholder_3) RETURNING stock_id; Array()
❌ I also ended up with partially inserted data rather then everything being rolled back. Interestingly the output says Test1 (end of file) was inserted but it was NOT. However, the stockprop table does have a bunch of records that should have been rolled back 🤔
…ate germplasm accessions
…d better handling of exceptions in general.
Thank you for your review Lacey! I made the following two changes to address the error that you saw:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested the same file I used last time on the new changes:
✅ I did not getting an error "ERROR: A stock already exists for accession "GERM:00001" but with a germplasm name of "CDC Ch@d" which does not match the input file."
❌ It still didn't actually roll back the inserted records 😕
$connection = \Drupal::service('tripal_chado.database');
$stock_query = $connection->select('1:stock', 's')->fields('s', ['organism_id', 'name', 'uniquename', 'type_id']);
$stock_record = $stock_query->execute()->fetchAll();
= [
{#6212
+"organism_id": "1",
+"name": "CDC Ch@d",
+"uniquename": "GERM:00001",
+"type_id": "9",
},
{#6211
+"organism_id": "1",
+"name": "USASK Ken",
+"uniquename": "GERM:00002",
+"type_id": "9",
},
]
To replicate, I used a fresh docker image/container on this branch, ran the following in drush php and then submitted an import job with the same file I used before.
$config_factory = \Drupal::configFactory();
$germplasm_config = $config_factory->getEditable('trpcultivate_germplasm.settings');
$germplasm_config->set('terms.accession', 9);
$germplasm_config->set('terms.institute_code', 10);
$germplasm_config->set('terms.institute_name', 11);
$germplasm_config->set('terms.country_of_origin_code',12);
$germplasm_config->set('terms.biological_status_of_accession_code', 13);
$germplasm_config->set('terms.breeding_method_DbId', 14);
$germplasm_config->set('terms.pedigree', 15);
$germplasm_config->set('terms.synonym', 16);
$germplasm_config->set('terms.stock_relationship_type_synonym', 17);
$germplasm_config->set('terms.subtaxa', 18);
$germplasm_config->save();
$connection = \Drupal::service('tripal_chado.database');
$subspecies_cvterm_query = $connection->select('1:cvterm','cvt')
->fields('cvt', ['cvterm_id', 'name', 'definition'])
->condition('name', 'subspecies', '=');
$subspecies_cvterm_record = $subspecies_cvterm_query->execute()->fetchAll();
$subspecies_cvterm_id = $subspecies_cvterm_record[0]->cvterm_id;
$organism_id = $connection->insert('1:organism')
->fields([
'genus' => 'Tripalus',
'species' => 'postgresquelus',
'infraspecific_name' => 'chadoii',
'type_id' => $subspecies_cvterm_id,
])
->execute();
$db_id = $connection->insert('1:db')
->fields([
'name' => 'Germplasm Database',
])
->execute();
...s/src/Kernel/TripalImporter/GermplasmAccessionImporter/GermplasmAccessionImporterRunTest.php
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working through the code review :-) I'm going to list all the functions in this comment and as I review them I'm going to check them off. All comments/suggestions I'll do inline in the code.
Functions to review in GermplasmAccessionImporter class:
- Dependency injection (create + _construct)
- describeUploadFileFormat()
- setCVterm() + getCVterm()
- form() + formValidate()
- setUpCVterms()
- run()
- getOrganismID()
- getStockID()
- getDbxrefID()
- loadStockProperties()
- loadSynonyms()
trpcultivate_germplasm/src/Plugin/TripalImporter/GermplasmAccessionImporter.php
Show resolved
Hide resolved
trpcultivate_germplasm/src/Plugin/TripalImporter/GermplasmAccessionImporter.php
Outdated
Show resolved
Hide resolved
trpcultivate_germplasm/src/Plugin/TripalImporter/GermplasmAccessionImporter.php
Show resolved
Hide resolved
continue 2; | ||
} | ||
} | ||
$germplasm_name = $germplasm_columns[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll want to add trim() around all of these ;-p
$germplasm_name = $germplasm_columns[0]; | |
$germplasm_name = trim($germplasm_columns[0]); |
trpcultivate_germplasm/src/Plugin/TripalImporter/GermplasmAccessionImporter.php
Outdated
Show resolved
Hide resolved
trpcultivate_germplasm/src/Plugin/TripalImporter/GermplasmAccessionImporter.php
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review complete 🎉
Overall, this importer is looking really solid :-) I have a few inline comments but for the most part everything is looking good and these suggestions are relatively minor. I'll create an issue for the discussion regarding stock lookup. Any updates you get added in based on review are a bonus but if you run out of time before vacation, I feel comfortable merging this as is and moving the suggestions into another issue.
For prioritizing, I would say the biggest priority is getting the test updated so it fails demonstrating the bug in core regarding transactions.
...ests/src/Kernel/TripalImporter/GermplasmAccessionImporter/GermplasmAccessionImporterTest.php
Outdated
Show resolved
Hide resolved
...ests/src/Kernel/TripalImporter/GermplasmAccessionImporter/GermplasmAccessionImporterTest.php
Outdated
Show resolved
Hide resolved
...ests/src/Kernel/TripalImporter/GermplasmAccessionImporter/GermplasmAccessionImporterTest.php
Show resolved
Hide resolved
…de more information
…g compared if a test fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All changes addressed or moved to issues for later.
✅ READY TO MERGE 🎉
Issue #2
Motivation
To create a Tripal 4 Importer instance for germplasm accessions which accommodates all the metadata required to meet BrAPI standards. Essentially, to upgrade this Tripal 3 code to Tripal 4: https://github.com/UofS-Pulse-Binfo/uofspb_germplasm/blob/7.x-3.x/includes/TripalImporter/GermplasmAccessionImporter.inc
User documentation for this importer, for which I am also open to feedback, has been updated here: https://tripalcultivate.github.io/docs/docs/curation/germplasm-data
What does this PR do?
Sets up CV term configuration in YML format to store CV term IDs that are relevant to processing germplasm. Also implements the getCVterm() and setCVterm() function for grabbing and setting throughout the form as well as in kernel tests. In the future, the curator will be able to set these terms through the UI. This will be implemented at a later time since configuration will be shared among all of the TripalCultivate Germplasm modules.
Divides the run() function into multiple smaller functions that each take care of a specific chunk of the importer. Kernel tests for each function are also given their own testing function in the format
testGermplasmAccessionImporterFunctionName()
. The following have been implemented:Testing
Automated Testing
The test is marked complete if implemented
testGermplasmAccessionImporterForm()
testGermplasmAccessionImporterRun()
testGermplasmAccessionImporterGetOrganismID()
testGermplasmAccessionImporterGetStockID()
testGermplasmAccessionImporterGetDbxrefID()
testGermplasmAccessionImporterloadStockProperties()
testGermplasmAccessionImporterloadSynonyms()
Manual Testing
At localhost in your browser, login to the site and navigate in your Admin navigation bar to Tripal -> Data Loaders and make sure
Tripal Cultivate: Germplasm Accessions
is in the listing. If not, try re-runningdocker exec trpcultivate drush cr
a couple more times and refresh the page.You will need to configure CVterms through the command line since this module does not yet have an interface for it. You will also need an organism and an external DB to already exist. You can either use
docker exec -it trpcultivate drush php:cli
or PHP Execute through the interface (you may have to configure the Devel bar to show this). Copy and paste the following PHP code:TripalCultivate-Germplasm/trpcultivate_germplasm/tests/src/Fixtures/simple_example.txt
TripalCultivate-Germplasm/trpcultivate_germplasm/tests/src/Fixtures/props_syns_example.txt
Submit the form, then you will be prompted to run a Tripal job. Just ensure the job command is prefixed with
docker exec trpcultivate
to run it within the docker container. Hopefully, you will see output such asReached end of file without encountering any errors. Transaction will be committed to the database.
on your terminal, indicating that the import worked!You can verify that the stocks in your database match the ones in your file of choice by going back to the php CLI or Execute PHP and running the following select query:
If testing with the file
props_syns_example.txt
, it is also good to check thestockprop
andstock_synonym
for values as well: