-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sha256 uniqueness to CollectionVersion #1815
Conversation
e97a863
to
775fb7c
Compare
pulp_ansible/tests/functional/api/role/test_crud_distribution.py
Outdated
Show resolved
Hide resolved
03bef99
to
baf54a1
Compare
try: | ||
await sync_to_async(artifact.save)() | ||
except IntegrityError: | ||
artifact = Artifact.objects.get(sha256=artifact.sha256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line would never have worked in an async context.
25ac028
to
d4e93f5
Compare
d4e93f5
to
d2949b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is based on my original work on this, but there was discussion back then on how the sha256 for the Collection should be the digest of the MANIFEST.json
file inside the tarball. I actually never got around to finishing all of the changes to my PR, so most of my comments here is me pointing out all the areas I didn't change it.
pulp_ansible/app/serializers.py
Outdated
) | ||
|
||
return col | ||
return CollectionVersion.objects.filter(sha256=validated_data["sha256"]).first() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This is so much more simple.
# There was an autogenerated validator rendering sha256 required. | ||
validators = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually a really annoying feature of DRF. It tripped me up a lot when I was originally adding domains.
except IntegrityError: | ||
artifact = Artifact.objects.get(sha256=artifact.sha256) | ||
if existing_artifact := await Artifact.objects.filter(sha256=artifact.sha256).afirst(): | ||
existing_artifact.touch() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be wrapped in a sync_to_async
or do we have a atouch
?
pulp_ansible/app/migrations/0057_collectionversion_sha256_migrate.py
Outdated
Show resolved
Hide resolved
for key in ["description", "documentation", "homepage", "issues", "repository"]: | ||
if collection_info[key] is None: | ||
collection_info.pop(key) | ||
collection, created = Collection.objects.get_or_create( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this a database operation? This breaks the comment about metadata_only=True
.
if artifact_file_name := d_artifact_files.get(d_artifact): | ||
artifact_file = open(artifact_file_name, mode="rb") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this logic. Can you help explain it? Edit Looking at the original PR I was the one who wrote this... Well present me can't remember what past me intended with this logic.
The reason raised for this idea was: A user can rebuild the collection and reupload it. After that they may be confused to find two content units with the same (namespace, name, version) but different artifacts. The whole idea of this pr is to allow having different artifacts claiming the same (namespace, name, version). I as a user would on the other hand be most suspicious if i built a collection, uploaded it downloaded it again and then got a different artifact. |
80667d2
to
b09eed7
Compare
dcd8b05
to
2733da5
Compare
d857fe1
to
b4ded81
Compare
b4ded81
to
1149713
Compare
70cafff
to
27c3a55
Compare
27c3a55
to
7ef8820
Compare
7ef8820
to
c44d205
Compare
7aead6d
to
834a3bf
Compare
10cfcb0
to
5fb7a0e
Compare
5fb7a0e
to
845c7e6
Compare
52973f6
to
58f25f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small suggestions, everything else looks good.
if count == 0: | ||
return | ||
|
||
for unit in unit_qs.iterator(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add theprefetch_related
here to reduce the number of queries being made?
relative_path=collection_version.relative_path, | ||
) | ||
collection_version, tags = await sync_to_async(create_collection_from_importer)(metadata) | ||
await _save_collection_version(collection_version, artifact, tags) | ||
except ValidationError as e: | ||
if "unique" in str(e): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to update the CollectionVersion.get below on line 148 to include the sha256, also can make it use aget.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think this is the place where we still have the old behaviour, and we should not change that.
Remember this is not fixing the issue, but preparing the database to fix the issue with proper lookup later.
collection_versions.append(cv) | ||
|
||
if build_artifacts: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build_artifacts
is no longer being used in this method, can we remove it from the signature?
This should be ZDU compatible. The transition will be completed with the next y-release. This commit only adds the field an starts to populate it. It comes with a data repair command to help transition to an actual uniqueness constraint in the next release. relates: pulp#1052 Co-authored-by: Matthias Dellweg <[email protected]>
58f25f4
to
566ebbf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hurray! One step closer.
No description provided.