- 2024-11-02/dmichaels - Fix for unexpected 'sid' indexing problem.
2024-09-03/dmichaels - Fix in snovault/tests/elasticsearch_fixture.py (used only for local/dev deploys) for a strange (new as of 2024-09-02) behavior where startup hung during ElasticSearch index mapping creation. The hang was related to ElasticSearch logging output and to the way we were using subprocess.Popen and reading the subprocess output; the more correct way is to inherit the stdout/stderr of the parent.
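The Popen change described above can be illustrated with a minimal sketch. It is not the fixture's actual code: the function name and the child command are hypothetical stand-ins for the ElasticSearch server process. The point is only that leaving stdout and stderr at None makes the child inherit the parent's streams, so there is no pipe that can fill up while nobody reads it.

```python
import subprocess
import sys


def start_child_inheriting_output(args):
    """Start a child process whose stdout/stderr are inherited from the parent.

    With stdout=None and stderr=None (the defaults) no pipe is created,
    so the child cannot block on a full pipe that nobody is draining.
    """
    return subprocess.Popen(args, stdout=None, stderr=None)


# Hypothetical stand-in for the real server command.
child_command = [sys.executable, "-c", "print('child writes directly to the parent stdout')"]
process = start_child_inheriting_output(child_command)
exit_code = process.wait()
```

Because no pipes are created, process.stdout and process.stderr are None and the parent has nothing to read.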
Minor changes to allow running (for example) both cgap-portal and smaht-portal simultaneously locally, for localhost/dev purposes only:
- Minor updates to dev_servers.py and tests/elasticsearch_fixture.py to allow defining transport_port for elasticsearch.
- Minor updates to dev_servers.py and tests/postgresql_fixture.py to allow parsing sqlalchemy.url in the ini file (e.g. development.ini) for the postgres port and temporary directory path.
- Fix in indexing_views.py for frame=raw not including the uuid.
- Bug fix: use loadxl_order() in staggered reindexing
- Add B-tree index to rid column in propsheets to optimize revision history retrieval
- Fix for revision history - deepcopy history so as not to modify props in place
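A minimal sketch of why the deep copy matters, assuming revisions are stored as nested dictionaries. The function name, the "annotated" key, and the sample data are hypothetical and are not taken from snovault.

```python
import copy


def get_revision_history(stored_revisions):
    """Return an annotated copy of the history without touching the stored props."""
    history = copy.deepcopy(stored_revisions)
    for revision in history:
        # This mutation lands on the copy only; the stored props stay as they were.
        revision["props"]["annotated"] = True
    return history


stored = [
    {"sid": 1, "props": {"title": "first"}},
    {"sid": 2, "props": {"title": "second"}},
]
history = get_revision_history(stored)
```

A shallow copy (list(stored) or copy.copy) would share the inner props dictionaries, so the annotation would leak back into the stored data.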
Dropped support for Python 3.8.
Updates related to Python 3.12.
- Had to update venusian (from 1.2.0) to 3.1.0.
- Had to update pyramid (from 1.10.4) to 1.10.8 (for imp import not found).
- Had to add pmdarima (no module pyramid.compat).
- Had to define/update numpy (to 1.26.4) for this, as it was implicitly (due to something else) using 1.24.4, which failed to build with Python 3.12.
- And had to update lower bound of Python (from 3.8.1) to 3.9 for this.
- Had to update dcicutils (from 8.11.0) to 8.13.0 (for pyramid update for imp import not found).
Minor change to dev_servers.py to facilitate running a local ElasticSearch proxy to observe traffic (requests/responses) between the portal and ElasticSearch with a tool like mitmproxy or mitmweb; see comments in dev_servers.py.
- Add /routes endpoint to return all routes and select item views in the application
- Update /submission-schemas/ to capture required prop via new key in property schema
- Update drs_download to not guard on Authentication, as this check is superfluous since @@drs as_user is evaluated
- Update drs primitive to only return JSON
- Fix update-inserts-from-server command to display --help option
- Fix update-inserts-from-server command to move away from direct ES interaction
  * Rewrite significantly
  * Add new options to allow for more flexible use
- Create constants for submission-schemas endpoint to share with downstream portals
- 2024-03-25
- Changes to loadxl to support tracking ingestion progress for smaht-submitr (via Redis).
- Changed dev_servers.py
- Fix in loadxl to PATCH on validate_only for items which already exist; discovered during smaht-submitr testing.
- Fix in loadxl.normalize_deleted_properties which was creating/returning a new (an_item) item, which was messing up determination of identifying path for patch (as second_round_items comes from store but we had set uuid in an_item which, without this fix, became a different object).
- Added skip_links feature to loadxl which will cause reference/link integrity checking to be skipped altogether; this is (currently) only set by smaht-portal/ingestion/loadxl_extensions.py for smaht-submitr, since that process already does thorough reference integrity checking anyway (via structured_data).
- Remove restricted permissions for AccessKey status to enable non-admins to delete access keys
- Changed ACCESSION_PREFIX in server_defaults.py to GET_ACCESSION_PREFIX() function; called only within snovault (and only from schema_formats.py); to get around app_project call at file scope (came up as circular import in smaht ingester).
- Gets total results from ES, then tries to get an exact count if the total hits the ES_MAX_HIT_TOTAL limit
- Repairs schema format validation
- Change the exception message for an unresolved object reference (linkTo) in schema_validation.normalize_links.
- Added instance info to ERROR in loadxl.load_all_gen.
- Both of the above in support of reference integrity validation code within smaht-submitr.
- Removes strip of "role." permissions so smaht-portal roles work
- Version updates to dcicutils. Changes to itemize SMaHT submission ingestion create/update/diff situation.
- Added support for an optional gitinfo.json file (deployed via portal buildspec.yml).
- Add submission-schemas api
- Updated dcicutils to 8.6.0 (with minor fixes related to structured_data and SMaHT ingestion).
- Updated dcicutils to 8.4.1 (with structured_data).
- Updated loadxl to pass "filename" in yields (for smaht-portal/ingester).
- More work related to SMaHT ingestion.
- RAS updates
- Broaden schema $merge regex to allow mixin and other references
- Another thug commit to add CHANGELOG for below.
- Thug commit to change dcicutils from 8.2.0 to ^8.2.0.
- Merging in Doug's drr_schema_updates branch with new types.
- Added limited support to loadxl for required properties within anyOf of data type schemas.
- Merged in load_data_fix branch.
- Update dcicutils to 8.2.0
- 2023-11-02
- Repair reference to load_data_by_type to resolve correctly when loadxl is absent entirely from the application repo
Upgrade to Python 3.11.
Fixed access of user in types/access_key.py in access_key_add WRT request.validated['user'].
Added identifyingProperties with just uuid in schemas/access_key.json.
Fix in setup_eb.py to handle jsonschema in pyproject.toml like {extras = ..., version = ...}.
Added snovault/commands/generate_local_access_key.py script; originally just for smaht-portal to create an access-key for local dev/testing because doing it via the UI is not yet fully supported; but generally convenient for cgap-portal and fourfront as well.
* Minor changes (e.g. create_testapp) to loadxl.py to help load data from a specified directory; called from dev_servers.py; for creating access-keys on the fly after startup for local dev/testing.
* Enhancement in load_data in loadxl.py to respect a fully qualified data directory path name, i.e. do not make it relative to the current working directory if it is fully qualified.
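The path rule in the load_data enhancement above can be sketched as follows. This is an illustration under assumptions, not the loadxl.py code: the function name and the base_directory parameter are hypothetical.

```python
import os


def resolve_data_directory(data_directory, base_directory=None):
    """Return data_directory unchanged if it is absolute, else resolve it against base_directory."""
    if os.path.isabs(data_directory):
        # A fully qualified path is respected as given.
        return data_directory
    if base_directory is None:
        base_directory = os.getcwd()
    return os.path.join(base_directory, data_directory)
```

For example, resolve_data_directory("inserts") yields a path under the current working directory, while an absolute path is returned as is.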
- Updates to load_all_gen to allow object create/update with no uuid.
Added snovault/commands/view_local_object.py script for dev/testing to retrieve and output a given object (uuid) from a locally running portal.
Added support for consortia and submission_centers in ingestion_listener.py.
Added unique_key to types/access_key.py (helps get rid of this in cgap-portal/fourfront).
- Bug fix in schema reference resolution when the schema is loaded from a file
- Bug fix in access key refresh to predicate on whether expiration is enabled
- Update drs primitive to resolve specific access types with preferential defaulting to https, http
- Repair bug in permission implementation involving restricted fields
- Repair bug in user registration, allowing customization through app_project definition
- Extend FormatChecker to ensure date and date-time validation
- Updates jsonschema version, removing dependency on jsonschema-serialize-fork and allowing us to use $merge refs.
  * Breaking Change: dependencies --> dependentRequired in schema
  * Breaking Change: object serialization in schema no longer valid
- Small fix for JWT Decode incompatible change
- Fix for MIME type ordering in renderers.py (differs between cgap and fourfront).
- Merge/unify ingestion and other code from cgap-portal and fourfront.
- Add several modules/commands from upstream portals that are generic enough to live in this repository (to reduce code/library maintenance overhead)
- Port support for make deploy1 from the portals:
  - In Makefile:
    - Support for make deploy1
    - Support for make psql-dev
    - Support for make psql-test
    - Support for make kibana-start (commented out for now, pending testing)
    - Support for make kibana-start-test (commented out)
    - Support for make kibana-stop (commented out)
  - In pyproject.toml:
    - Template file development.ini.template
    - Template file test.ini.template
    - Support for prepare-local-dev script, which creates development.ini from development.ini.template and test.ini from test.ini.template.
- Port the dev_servers.py support from CGAP.
  - In the scripts/ dir:
    - Add scripts/psql-start in support of make psql-dev and make psql-test.
- Fix some warnings from pytest
  - If a method has "test" in its name but isn't a test, it needs a prefix "_"
- Fix some warnings from sqlalchemy
  - session.connection() doesn't need to .connect()
  - .join(x, y, ...) should be .join(x).join(y)...
  - session.query(Foo).get(bar) should be session.get(Foo, bar)
- Redis support, adding /callback info to /auth0_config if a Redis server is configured
- Change pytest.yield_fixture to pytest.fixture. This is technically incompatible since it would break downstream portals if they were below pytest 6, but they are both at pytest 7 now, so they should be unaffected.
- Address some places involving .execute(raw_string) that should be .execute(text(raw_string)).
- In Makefile:
  - Make sure make test and make test-full also run make test-static.
- In snovault/storage.py:
  - Add POSTGRES_COMPATIBLE_MAJOR_VERSIONS (moved from snovault/tests/test_storage.py)
- In snovault/elasticsearch/create_mapping.py:
  - Per Will's direction, replace a call to run_index_data with a vapp creation and a call to an index post with given uuids.
- In snovault/elasticsearch/mpindexer.py:
  - Very minor syntactic refactor to make a use of global more clear.
- In snovault/tools.py:
  - Reimplement index_n_items_for_testing for better clarity and to fix a potential bug.
- In snovault/tests/test_indexing.py:
  - Various test optimizations using better synchronization for robustness.
- In Makefile:
  - New make target test-one.
  - Separate testing of indexing tests from other unit tests, renaming the "npm" tests to "indexing" tests.
- Make github workflow main.yml consistent with Makefile changes.
- In pyproject.toml:
  - Use pytest 7.2.2.
In Makefile:
- Add make test-full to test like make test but without the instafail option.
- Add make test-static to run static checks.
- Add make test-one TEST_NAME=<test_name_or_filename_base> so you can test a single file or test from make. This is not so important in snovault as in cgap-portal but I want the interface to be uniform.
- In all testing, added SQLALCHEMY_WARN_20=1 at start of command line to enable SQLAlchemy 2.0 compatibility warnings, since we're using SQLAlchemy 1.4, which has those warnings.
In pyproject.toml:
- Require dcicutils 6,7 for fixes to Eventually.
- Include pipdeptree as a dev dependency for debugging.
- Remove "backports.statistics", needed for Python 3.3 support and earlier.
- Bump python_magic foothold (no effective change, just faster locking)
- Update some comments.
In snovault/updater.py:
- Better error message for UUID integrity errors, noting that they might not be conflicts but could also be a missing UUID.
- Rearrange imports for clarity.
In new file snovault/tools.py:
- New functions make_testapp, make_htmltestapp, make_authenticated_testapp, make_submitter_testapp, make_indexer_testapp, and make_embed_testapp.
- New context managers being_nested and local_collections.
- New function index_n_items_for_testing.
These functions are potentially useful in the portal repos, so are not part of the test files.
In file snovault/tests/serverfixtures.py:
- New fixture engine
In file snovault/tests/test_indexing.py:
- Material changes to testing to use better storage synchronization (semaphore-style rather than sleep-style), hopefully achieving fewer intermittent errors in testing both locally and in GA.
- Bug fixes in a few tests that were assigning settings or other dictionary structures but not assuring an undo was done if the test failed.
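The difference between signal-based and sleep-based waiting mentioned above can be sketched with the standard library. This is not the test suite's code: it uses threading.Event as a stand-in for whatever synchronization primitive the tests actually use, and run_and_wait is a hypothetical helper.

```python
import threading


def run_and_wait(work, timeout_seconds=5.0):
    """Run work() in a thread and block until it signals completion."""
    done = threading.Event()
    results = []

    def worker():
        results.append(work())
        done.set()  # Signal completion instead of letting the caller guess with sleep().

    thread = threading.Thread(target=worker)
    thread.start()
    finished = done.wait(timeout=timeout_seconds)  # Returns as soon as the event is set.
    thread.join(timeout=timeout_seconds)
    if not finished:
        raise TimeoutError("work did not finish in time")
    return results[0]


indexed_count = run_and_wait(lambda: 3)
```

The waiter resumes as soon as the work completes, so a test neither sleeps longer than needed nor proceeds before the work is done.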
In files snovault/util.py, snovault/tests/test_embedding.py, snovault/tests/test_storage.py:
- Various changes for PEP8 or other readability reasons, including to satisfy PyCharm linters.
- Allow Postgres 14 to be used.
- In upgrader.py, default parse_version argument to '0', rather than '1' when None or the empty string is given.
- Remove the Python 3.7 classifier in pyproject.toml.
- Add make clear-poetry-cache in Makefile.
- Misc PEP8.
- Fix C4-984:
  - Add pip install wheel in make configure.
  - Remove dependency in pyproject.toml on futures library.
- Fix C4-985:
  - Make a wrapper for pkg_resources.parse_version in upgrader.py that parses the empty string as if '1' had been supplied.
- Fix C4-987:
  - Use in str(exc.value) rather than in str(exc) after with pytest.raises(....) as exc:
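The idea behind the C4-985 wrapper can be sketched as follows. This is an assumption-laden illustration, not the code in upgrader.py: the function name is hypothetical, and a simple tuple-of-integers parse stands in for pkg_resources.parse_version so that the example needs only the standard library. The default is a parameter because another entry in this changelog uses '0' rather than '1' and also covers None.

```python
def parse_version_or_default(version_string, default="1"):
    """Parse a dotted numeric version, treating None or '' as if `default` had been supplied."""
    if not version_string:
        version_string = default
    # Stand-in parser: handles plain dotted integers only (e.g. "2.10").
    return tuple(int(part) for part in version_string.split("."))
```

With this shape, parse_version_or_default("") behaves like parse_version_or_default("1"), so an item without a schema version still compares sensibly against upgrade targets.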
- Small fix/adjustment to snapshot related error handling when re-mapping
- Suppress log errors from skip_indexing
- Suppress errors from SQLAlchemy relationship overlap
- Add reindex_by_type capabilities
- Small changes to indexing tests to speed them up
- Upgrades ElasticSearch to version 7 (OpenSearch 1.3 in production)
- Upgrades SQLAlchemy to 1.4.41 (and other associated versions)
- Adds B-Tree index on max_sid to optimize retrieval of this value in indexing
- Drop support for Python 3.7
- Environment variable NO_SERVER_FIXTURES suppresses creation of server fixtures during testing.
- Miscellaneous PEP8.
- Evaluate KMS args as truthy for blob storage to avoid errors for empty string KMS key
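A minimal sketch of the truthiness check described above, assuming the KMS key ID arrives as a string or None from configuration. The function name is hypothetical; ServerSideEncryption and SSEKMSKeyId are the S3 upload argument names, but how snovault passes them is not shown in this changelog.

```python
def build_encryption_args(kms_key_id):
    """Return extra upload arguments only when a KMS key is actually configured."""
    if kms_key_id:  # Truthiness check: both None and '' mean "no key".
        return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}
    return {}
```

A check such as kms_key_id is not None would let an empty string through and pass an invalid key ID downstream, which is the kind of error this entry avoids.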
- Add a CHANGELOG.rst file.
- Add tests for consistency of version and changelog.
- Make dev dependency on docutils explicit, adding a constraint that gets rid of a deprecation warning.
PR 225 Genelist upload (C4-875)
Instrumentation added to help debug C4-875.
- Improved error messages for ValidationFailure in attachment.py.
Actual proposed fix:
- In attachment.py, replaced mimetypes.guess_type with new function guess_mime_type (adjusting the receipt of return value, since I adjusted that slightly to return the mime type, not a tuple of mime type and encoding).
- Make sure that we have useful return values for common file extensions.
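The shape of such a helper might look like the sketch below. It is not the implementation in attachment.py: the fallback table and its contents are assumptions made for illustration, and only the return convention (a MIME type rather than a tuple) comes from the entry above.

```python
import mimetypes
import os

# Hypothetical fallback table for extensions the platform's table may not know.
FALLBACK_MIME_TYPES = {
    ".csv": "text/csv",
    ".txt": "text/plain",
    ".json": "application/json",
}


def guess_mime_type(filename):
    """Return only the MIME type for filename (or None), not a (type, encoding) tuple."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        extension = os.path.splitext(filename)[1].lower()
        mime_type = FALLBACK_MIME_TYPES.get(extension)
    return mime_type
```

Callers that previously unpacked the tuple from mimetypes.guess_type would use the single return value directly.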
Opportunistic:
- Better .flake8 file excluding a bunch of whitespace-related issues we don't need to care about yet.
- Add a lint target to the Makefile.
- Suppress an annoying warning from the jose package (included by moto 1.3.7) about how it's not going to work in Python 3.9.
- Do keyword-calling of ValidationFailure in attachment.py just to clarify what the weird args are.
- Add an extra warning message in create_mapping.py for certain unusual argument combinations. (This had come up elsewhere in a discussion I had with Will and was just waiting for a PR to ride in on.)
- Retry delete_index in case of an error, likely related to a snapshot occurring at the same time as the delete operation. Give it two minutes (12 tries) to succeed.
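The retry behavior described above can be sketched generically. This is not the snovault code: the helper name and signature are hypothetical, and the 10-second wait is an assumption inferred from "two minutes (12 tries)". The sleep function is injectable so the example runs instantly.

```python
import time


def retry(operation, tries=12, wait_seconds=10, sleep=time.sleep):
    """Call operation() until it succeeds, making at most `tries` attempts."""
    for attempt in range(1, tries + 1):
        try:
            return operation()
        except Exception:
            if attempt == tries:
                raise  # Out of attempts: surface the last error.
            sleep(wait_seconds)


# Usage with a hypothetical operation that fails twice (e.g. during a snapshot) and then succeeds.
attempts = []


def flaky_delete():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("snapshot in progress")
    return "deleted"


result = retry(flaky_delete, sleep=lambda seconds: None)
```

With 12 tries and a 10-second wait between them, the total waiting time is a little under two minutes.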
NOTE: The breaking change here is the use of dcicutils 4.x.
- This accepts dcicutils 4.0.
- Minor change to .gitignore to add .python-cmd.
- Constrains boto3, botocore, boto3-stubs, and botocore-stubs.
PR 222 Invalidation Scope Fix (C4-854)
- Repairs several important cases in invalidation scope by revising the core algorithm, which is now described in the filter_invalidation_scope docstring.
- Should work correctly for object fields, links beyond depth 1 and *.
- Other small changes include repairing the test script and allowing indexer worker runs to re-use testapp for 100 iterations (thus preserving cache, probably speeding up indexing and reducing DB load)
PR 221 Remove embeds of unmappable properties
- Here, we remove embeds of properties that cannot be mapped within our system, namely those that fall under additionalProperties or patternProperties in our schema.
- As far as I understand things, since these fields cannot be mapped, adding them to an item's embedding list will not work regardless of the changes here, specifically the explicit removal of the properties from the default embeds in find_default_embeds_for_schema. Thus, no properties in the schema defined under additionalProperties or patternProperties can be embedded or used for invalidation scope with our current set-up, and significant refactoring would be required to make these work.
PR 220 Further upgrader version fix
The recent upgrader fix (in v.5.6.0) added the default version of 1 for upgrader calls, but not all calls to the upgrader were included in the fix. Specifically, the upgrader call within resources.py is still resulting in errors. We fix that here, as well as the call within the possibly defunct batchupgrade.py for good measure. (Grepping snovault for upgrader.upgrade didn't reveal any other instances of calls to the upgrader to fix.)
PR 218 Lock 3.8, Repair Upgraders
- Locks Python 3.8, which appears stable with no changes
- Default current_version in upgraders to 1 instead of '', so items that do not have a default schema_version will default to a sane value that should hit an upgrade target.
PR 217 Repair mirror health resolution
- Resolve IDENTITY so authenticated requests can be made with credentials
- Undo JSON serializer override, falling back to the pyramid default which appears to be ~10x more performant with waitress
PR 214 Type Specific Index Setting
- Implements type specific index settings, documenting the important settings
- Configurable by overriding the Collection.index_settings method to return a custom snovault.util.IndexSettings object
PR 213 Make pillow, wheel, and pyyaml be dev dependencies. If the portals wa...
- Make pillow, wheel, and pyyaml be dev dependencies. If the portals want them, they can make them be regular dependencies.
PR 212 Fix some dependencies to be a bit more flexible
- Various adjustments in pyproject.toml.
PR 211 Python 3.7 compatibility changes (C4-753)
This change intends to let Snovault work in Python 3.7.
- Update psycopg2 to use psycopg2-binary.
- Use matrix format testing and adjust the way indices are built in so they include Python version number. Needed to assure proper cleanup, but also to avoid these different processes colliding with one another.
- Adjusted GA testing to use 250 timeout instead of 200.
Opportunistic:
- Phase out use of TRAVIS_JOB_ID in favor of TEST_JOB_ID. A tiny bit of additional code is retained in case cgap-portal or fourfront still use any of this, but none of the calls in snovault try to use TRAVIS_JOB_ID any more.
- Rename the travis-test recipe to remote-test in Makefile.
- Implements encryption support for S3BlobStorage
- Adds tests for (encrypted) S3BlobStorage (previously untested) by repurposing and slightly modifying the existing tests for the RDB blob storage
PR 209 Changes to remove variable imports from env_utils (C4-700)
A record of older changes can be found in GitHub. To find the specific version numbers, see the version value in the poetry.app section of pyproject.toml for the corresponding change, as in:

    [poetry.app]
    name = "dcicutils"
    version = "100.200.300"
    ...etc.

This would correspond with dcicutils 100.200.300.