-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove expired ODL items #25
Conversation
d0cfcda
to
06d508a
Compare
api/odl.py
Outdated
response_content = response.content | ||
|
||
if isinstance(response_content, bytes): | ||
response_content = response_content.decode("utf-8") | ||
elif not isinstance(response_content, str): | ||
raise ValueError("Response content must be a string or byte-encoded value") | ||
|
||
feed = feedparser.parse(response_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think feedparser.parse
can handle both bytes
and str
. If so, it may be possible to simplify this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. I think I had some issue back in Python 2.x. Python 3.x handles both cases correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking pretty good. A couple issues with docstring cut/paste and some minor import order issues for you to have another look at.
api/odl.py
Outdated
licenses_reserved=0, | ||
patrons_in_hold_queue=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any issues here if there are still licenses owned and there were patrons in the hold queue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to leave those values as is. licenses_reserved
is getting updated in ODLAPI.update_hold_queue
each time patrons try to checkout or checkin.
As per removing licenses when there are active loans, TestODLExpiredItemsReaper.test_odl_reaper_removes_expired_licenses
ensures that the reaper can hide licenses of a license pool with an active loan.
# 1.2. Ensure that the license pool was successfully created but it does not have any available licenses. | ||
assert len(imported_pools) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to ensure that the license pool is created in this circumstance?
Or is this more to help document the current behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, LicensePool
is required by CM to function correctly. I just wanted to emphasise that even when CM skips expired licenses, it's still business as usual and it still creates a LicensePool
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a situation like this it's best to either create Edition + LicensePool + Work (with the LicensePool accurately representing the fact that the book is not currently available) or to leave the whole thing out of the database. There are use cases for creating an Edition with no LicensePool or Work, but I think creating all three makes sense in this case.
I don't believe you can have an Edition and a Work with no LicensePool; the Work will get reaped.
63b3926
to
5ebfdd3
Compare
5ebfdd3
to
32100e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reviewed this PR and will be merging it into NYPL-Simplified/circulation, with credit, with few if any changes. I'll put up a link to the corresponding NYPL-Simplified/circulation PR once I create it.
@@ -1518,3 +1539,38 @@ def _get(self, url, patron=None, headers=None, allowed_response_codes=None): | |||
self.request_args.append((patron, headers, allowed_response_codes)) | |||
response = self.responses.pop() | |||
return HTTP._process_response(url, response, allowed_response_codes=allowed_response_codes) | |||
|
|||
|
|||
class ODLExpiredItemsReaper(IdentifierSweepMonitor): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reviewing this PR for merging into the NYPL-Simplified/circulation repo. Hopefully I won't find any serious issues and my comments will be purely advisory.
I see what this is doing but the performance could be improved quite a bit. You can override SweepMonitor.item_query
to put additional constraints on the query. The is_expired
implementation can be represented as a set of SQLAlchemy constraints. So you should be able to make this Monitor query only the Identifiers that have a LicensePool that has expired but does not have licenses_owned and licenses_available set appropriately.
class BaseODLTest(object): | ||
base_path = os.path.split(__file__)[0] | ||
resource_path = os.path.join(base_path, "files", "odl") | ||
|
||
@classmethod | ||
def get_data(cls, filename): | ||
path = os.path.join(cls.resource_path, filename) | ||
return open(path, "rb").read() | ||
return open(path, "r").read() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We generally read test data as binary to eliminate the chance of bytestring/string confusion, since real-world files of this type would come in as bytestrings. For JSON files it's not as much of an issue, but you've got both JSON and XML files in this test suite. Was this causing a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at the core changes (specifically ThePalaceProject/circulation-core#16) I see that this works because the eventual consumer of this data has been changed to handle either bytes or strings. That's a good change but since real data from HTTP responses will come in as bytes I think the test was more realistic before.
# 1.2. Ensure that the license pool was successfully created but it does not have any available licenses. | ||
assert len(imported_pools) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a situation like this it's best to either create Edition + LicensePool + Work (with the LicensePool accurately representing the fact that the book is not currently available) or to leave the whole thing out of the database. There are use cases for creating an Edition with no LicensePool or Work, but I think creating all three makes sense in this case.
I don't believe you can have an Edition and a Work with no LicensePool; the Work will get reaped.
@@ -58,12 +61,17 @@ def _get_delivery_mechanism_by_drm_scheme_and_content_type( | |||
|
|||
return None | |||
|
|||
def sample_opds(self, filename): | |||
def sample_opds(self, filename, file_type="r"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file_type isn't used here but as I mentioned before I'd expect sample files to be read in as bytestrings to simulate actual data coming over the wire.
|
||
if licenses_owned != licensepool.licenses_owned or licenses_available != licensepool.licenses_available: | ||
licenses_owned = max(licenses_owned, 0) | ||
licenses_available = max(licenses_available, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about this logic. What happens if a LicensePool has multiple licenses, and one of those licenses has expired but the others have not? I think this is going to clear out an entire LicensePool if only one of the licenses has expired. Instead I'd expect the LicensePool's totals to be recalculated based on the licenses that haven't expired.
Unlike the other stuff I've mentioned on this PR, this seems serious enough that I'd like to hear from @vbessonov and/or @tdilauro before proceeding with NYPL-Simplified/circulation#1636.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wrong about this code clearing out an entire LicensePool (I was misreading -= 1
as = -1
). However after talking about this we agreed that this isn't accurately doing the math necessary to add up License values to get LicensePool values.
We don't actually know what the math is -- other vendors take care of it for us -- but we have a pretty good idea.
…types (#25) * Add ability to override the list of available for import identifiers
Description
This PR includes the following changes:
ODLExpiredItemsReaper
andODL2ExpiredItemsReaper
classes responsible for clearing expired items on a scheduled basis.NOTE: This PR depends on ThePalaceProject/circulation-core#19.
Motivation and Context
How Has This Been Tested?
Checklist: