Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove expired ODL items #25

Merged
merged 1 commit into from
Aug 25, 2021
Merged

Conversation

vbessonov
Copy link
Contributor

@vbessonov vbessonov commented Aug 16, 2021

Description

This PR includes the following changes:

  • It adds additional logic skipping expired items during initial import process
  • It adds ODLExpiredItemsReaper and ODL2ExpiredItemsReaper classes responsible for clearing expired items on a scheduled basis.

NOTE: This PR depends on ThePalaceProject/circulation-core#19.

Motivation and Context

How Has This Been Tested?

Checklist:

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

Base automatically changed from python3 to main August 19, 2021 20:01
@vbessonov vbessonov force-pushed the feature/odl-dont-show-expired-items branch from d0cfcda to 06d508a Compare August 23, 2021 08:28
@vbessonov vbessonov changed the title [WIP] Remove expired ODL items Remove expired ODL items Aug 23, 2021
api/odl.py Outdated
Comment on lines 1051 to 1056
response_content = response.content

if isinstance(response_content, bytes):
response_content = response_content.decode("utf-8")
elif not isinstance(response_content, str):
raise ValueError("Response content must be a string or byte-encoded value")

feed = feedparser.parse(response_content)
Copy link
Contributor

@tdilauro tdilauro Aug 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think feedparser.parse can handle both bytes and str. If so, it may be possible to simplify this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I think I had some issue back in Python 2.x. Python 3.x handles both cases correctly.

Copy link
Contributor

@tdilauro tdilauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good. A couple issues with docstring cut/paste and some minor import order issues for you to have another look at.

api/odl.py Outdated Show resolved Hide resolved
api/odl2.py Outdated Show resolved Hide resolved
bin/odl2_reaper Outdated Show resolved Hide resolved
bin/odl_reaper Outdated Show resolved Hide resolved
api/odl.py Outdated
Comment on lines 1575 to 1573
licenses_reserved=0,
patrons_in_hold_queue=0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any issues here if there are still licenses owned and there were patrons in the hold queue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to leave those values as is. licenses_reserved is getting updated in ODLAPI.update_hold_queue each time patrons try to checkout or checkin.

As per removing licenses when there are active loans, TestODLExpiredItemsReaper.test_odl_reaper_removes_expired_licenses ensures that the reaper can hide licenses of a license pool with an active loan.

tests/test_odl.py Show resolved Hide resolved
tests/test_odl2.py Show resolved Hide resolved
Comment on lines +2020 to +2022
# 1.2. Ensure that the license pool was successfully created but it does not have any available licenses.
assert len(imported_pools) == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to ensure that the license pool is created in this circumstance?

Or is this more to help document the current behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, LicensePool is required by CM to function correctly. I just wanted to emphasise that even when CM skips expired licenses, it's still business as usual and it still creates a LicensePool object.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a situation like this it's best to either create Edition + LicensePool + Work (with the LicensePool accurately representing the fact that the book is not currently available) or to leave the whole thing out of the database. There are use cases for creating an Edition with no LicensePool or Work, but I think creating all three makes sense in this case.

I don't believe you can have an Edition and a Work with no LicensePool; the Work will get reaped.

tests/test_odl.py Outdated Show resolved Hide resolved
@vbessonov vbessonov force-pushed the feature/odl-dont-show-expired-items branch from 63b3926 to 5ebfdd3 Compare August 24, 2021 16:09
@vbessonov vbessonov force-pushed the feature/odl-dont-show-expired-items branch from 5ebfdd3 to 32100e7 Compare August 24, 2021 16:40
Copy link
Contributor

@tdilauro tdilauro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@vbessonov vbessonov merged commit 2a7e758 into main Aug 25, 2021
@vbessonov vbessonov deleted the feature/odl-dont-show-expired-items branch August 25, 2021 14:50
Copy link
Contributor

@leonardr leonardr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviewed this PR and will be merging it into NYPL-Simplified/circulation, with credit, with few if any changes. I'll put up a link to the corresponding NYPL-Simplified/circulation PR once I create it.

@@ -1518,3 +1539,38 @@ def _get(self, url, patron=None, headers=None, allowed_response_codes=None):
self.request_args.append((patron, headers, allowed_response_codes))
response = self.responses.pop()
return HTTP._process_response(url, response, allowed_response_codes=allowed_response_codes)


class ODLExpiredItemsReaper(IdentifierSweepMonitor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reviewing this PR for merging into the NYPL-Simplified/circulation repo. Hopefully I won't find any serious issues and my comments will be purely advisory.

I see what this is doing but the performance could be improved quite a bit. You can override SweepMonitor.item_query to put additional constraints on the query. The is_expired implementation can be represented as a set of SQLAlchemy constraints. So you should be able to make this Monitor query only the Identifiers that have a LicensePool that has expired but does not have licenses_owned and licenses_available set appropriately.

class BaseODLTest(object):
base_path = os.path.split(__file__)[0]
resource_path = os.path.join(base_path, "files", "odl")

@classmethod
def get_data(cls, filename):
path = os.path.join(cls.resource_path, filename)
return open(path, "rb").read()
return open(path, "r").read()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally read test data as binary to eliminate the chance of bytestring/string confusion, since real-world files of this type would come in as bytestrings. For JSON files it's not as much of an issue, but you've got both JSON and XML files in this test suite. Was this causing a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking at the core changes (specifically ThePalaceProject/circulation-core#16) I see that this works because the eventual consumer of this data has been changed to handle either bytes or strings. That's a good change but since real data from HTTP responses will come in as bytes I think the test was more realistic before.

Comment on lines +2020 to +2022
# 1.2. Ensure that the license pool was successfully created but it does not have any available licenses.
assert len(imported_pools) == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a situation like this it's best to either create Edition + LicensePool + Work (with the LicensePool accurately representing the fact that the book is not currently available) or to leave the whole thing out of the database. There are use cases for creating an Edition with no LicensePool or Work, but I think creating all three makes sense in this case.

I don't believe you can have an Edition and a Work with no LicensePool; the Work will get reaped.

@@ -58,12 +61,17 @@ def _get_delivery_mechanism_by_drm_scheme_and_content_type(

return None

def sample_opds(self, filename):
def sample_opds(self, filename, file_type="r"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file_type isn't used here but as I mentioned before I'd expect sample files to be read in as bytestrings to simulate actual data coming over the wire.


if licenses_owned != licensepool.licenses_owned or licenses_available != licensepool.licenses_available:
licenses_owned = max(licenses_owned, 0)
licenses_available = max(licenses_available, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about this logic. What happens if a LicensePool has multiple licenses, and one of those licenses has expired but the others have not? I think this is going to clear out an entire LicensePool if only one of the licenses has expired. Instead I'd expect the LicensePool's totals to be recalculated based on the licenses that haven't expired.

Unlike the other stuff I've mentioned on this PR, this seems serious enough that I'd like to hear from @vbessonov and/or @tdilauro before proceeding with NYPL-Simplified/circulation#1636.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wrong about this code clearing out an entire LicensePool (I was misreading -= 1 as = -1). However after talking about this we agreed that this isn't accurately doing the math necessary to add up License values to get LicensePool values.

We don't actually know what the math is -- other vendors take care of it for us -- but we have a pretty good idea.

jonathangreen referenced this pull request in jonathangreen/circulation Sep 8, 2021
…types (#25)

* Add ability to override the list of available for import identifiers
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants