Introduce addAlias method. #833
Codecov Report
@@            Coverage Diff             @@
##             main     #833      +/-   ##
==========================================
+ Coverage   57.57%   57.60%   +0.02%
==========================================
  Files          99       99
  Lines        4542     4590      +48
  Branches     1909     1922      +13
==========================================
+ Hits         2615     2644      +29
- Misses        668      677       +9
- Partials     1259     1269      +10
In the current implementation, the presence of clones is not recorded anywhere.
Have conscious decisions been made (and properly implemented) regarding the following questions:
- How should clones be accounted for in media type counts?
- How should full-text indexing work on cloned entries? If FT indexing associates a document only with the entry added via the zim::Creator::addItem() method, then the cloned dirents had better be named aliases, because of the resulting asymmetry.
- Shouldn't it be possible to iterate over items while skipping cloned entries (i.e. having each data item returned only once)?
- Shouldn't it be possible to find out all the clone-siblings of a given dirent?
It is somewhat intended. In fact, nothing in the specification (from the beginning) prevents two dirents from pointing to the same data; it is just that no implementation allowed it.
Indeed. For now, clones are handled "as redirects".
This is mainly a technical limitation, as we don't have access to the content (only the cluster/blob index) when we add the clone.
I'm not sure we need that. The only need I see for now is in zimcheck, and I handle it there. This new feature may evolve and be used for something not yet foreseen, but for now the use case is to "reference" binary content (such as YouTube videos) under different names in zimit files.
This is a good point, and I agree with you: maybe "alias" is a better name. I gave it a generic name because it is technically a generic feature, but the intended use case is pretty limited.
So I don't think we need to do more, at least for now. If other use cases appear, we may extend our API accordingly.
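The limitation mentioned above (only the cluster/blob index of the target is known when the clone is added) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not libzim code: `ToyCreator`, `ToyDirent` and their members are hypothetical names, and real dirents also carry other fields (for example a namespace and a MIME type). It only shows an alias dirent copying the target's cluster and blob indices without ever reading the content.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, simplified creator-side dirent (not libzim's Dirent class).
struct ToyDirent {
  std::string path;
  std::string title;
  std::uint32_t clusterIndex;  // cluster holding the data
  std::uint32_t blobIndex;     // blob inside that cluster
};

// Hypothetical toy creator; this is NOT the zim::Creator API.
class ToyCreator {
 public:
  // Regular item: its data becomes a new blob.
  // Toy simplification: every blob goes into cluster 0.
  void addItem(const std::string& path, const std::string& title) {
    dirents_[path] = ToyDirent{path, title, 0, nextBlob_++};
  }

  // Alias: a new dirent reusing the cluster/blob indices of an existing one.
  // Only the indices are copied and the content is never read, so nothing
  // content-derived (e.g. MIME type counts, full-text data) could be computed here.
  void addAlias(const std::string& path, const std::string& title,
                const std::string& targetPath) {
    const auto it = dirents_.find(targetPath);
    if (it == dirents_.end()) {
      throw std::runtime_error("alias target not found: " + targetPath);
    }
    const ToyDirent alias{path, title, it->second.clusterIndex, it->second.blobIndex};
    dirents_[path] = alias;
  }

  const ToyDirent& get(const std::string& path) const { return dirents_.at(path); }
  std::size_t size() const { return dirents_.size(); }

 private:
  std::map<std::string, ToyDirent> dirents_;
  std::uint32_t nextBlob_ = 0;
};
```

For example, after `addItem("videos/abc.mp4", "Video")`, calling `addAlias("youtube/abc", "Video alias", "videos/abc.mp4")` yields two dirents resolving to the same blob: a second name is added without adding any data, which is the zimit use case described above.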
Branch updated from f111b5f to 4d80e13.
Recovering the information that a dirent is an alias requires extra effort. Can't we preserve it easily? For example is
Dirent::parameter is indeed not used for now. But it is our only way to extend a dirent without changing the ZIM format, so if we start to use it, we have to define a way to use it for several things. If we use it "blindly" to store "isClone", we lose a way to store other values. The idea of this change was that it is creator-only and does not change anything on the reading side.
@mgautierfr The status is not clear to me: it seems you are done on your side, but a new review has not been requested.
Branch updated from 4d80e13 to fa724ba.
Branch updated from fa724ba to fcc3eec.
PR updated and ready for another review. What is left open is the question about extending the format to use … I would suggest that we go with this PR and discuss the extension of the format in an issue. As long as I have not finished kiwix/overview#95, there is no need to merge it, so maybe we can simply continue the review process until we reach agreement with @veloman-yunkan.
I'm not convinced of the necessity of doing that in general, and right now in particular; I would recommend opening a dedicated ticket to keep track of this idea. But it is pretty urgent to move forward with the code review process! That said, we should be able to clearly identify a ZIM file that potentially has such clones (even if this was implicitly allowed in the past, from now on it will be explicit). I would recommend to:
LGTM. Fixup commits must be eliminated.
This allows the user to add a new entry which is a `Clone` of a previously added one. The new entry is the "same" as the original one except for the path and title.
This is used in zimcheck so that clone entries are not counted as duplicate entries.
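The idea of not counting clones as duplicates can be sketched as follows. This is a hypothetical illustration, not zimcheck's actual code: `ToyEntry`, `BlobLocation` and `findDuplicates` are invented names, and the `content` string merely stands in for whatever comparison zimcheck really performs. Entries are grouped by content, then entries pointing at the same (cluster, blob) pair are collapsed, since they share storage and are aliases rather than duplicates.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry view for illustration; not zimcheck's data model.
struct ToyEntry {
  std::string path;
  std::uint32_t clusterIndex;
  std::uint32_t blobIndex;
  std::string content;  // stand-in for the entry's data
};

using BlobLocation = std::pair<std::uint32_t, std::uint32_t>;  // (cluster, blob)

// Returns groups of paths whose content is identical but stored in different
// blobs. Entries pointing at the same (cluster, blob) pair share storage
// (clones/aliases), so they collapse into one and are not reported.
std::vector<std::vector<std::string>> findDuplicates(const std::vector<ToyEntry>& entries) {
  // content -> blob location -> first path seen at that location
  std::map<std::string, std::map<BlobLocation, std::string>> byContent;
  for (const ToyEntry& e : entries) {
    // emplace does nothing if the location is already present, so later
    // aliases of an already-seen blob are ignored.
    byContent[e.content].emplace(BlobLocation(e.clusterIndex, e.blobIndex), e.path);
  }

  std::vector<std::vector<std::string>> duplicates;
  for (const auto& contentGroup : byContent) {
    const auto& locations = contentGroup.second;
    if (locations.size() < 2) continue;  // a single physical blob: not a duplicate
    std::vector<std::string> group;
    for (const auto& loc : locations) group.push_back(loc.second);
    duplicates.push_back(group);
  }
  return duplicates;
}
```

With an alias `youtube/a` pointing at the same blob as `videos/a.mp4`, only a pair such as `b.html`/`c.html` (same bytes, different blobs) would be reported.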
Branch updated from fcc3eec to 9a59af5.
This is the first PR for kiwix/overview#95
Fix #824