-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
indexing context: make it easier to log and report on indexing status #405
base: master
Are you sure you want to change the base?
Conversation
ee215ba
to
b4ec7b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wild!
I think we should probably test this on the same version of Perl pause is running on to be sure we haven't introduced more incompatibilities (like indented heredocs).
I mostly understand it but it's a lot to reason about.
I think this commit message needs rewording:
Code that wants to abort can $ctx->abort_indexing, which adds a
dist_error uses an exception to stop processing the dist further.
"which adds a dist_error uses an exception" I can't seem to parse
And the final commit regarding attempting to avoid partial indexing doesn't quite make sense. I'm reasonably sure that already happens inside of a transaction.
do_the_database work starts a transaction then does:
- examine_pms->index_by_files->examine_fio->examine_pkg->give_regdowner_perms
Or - examine_pms->index_by_meta->examine_pkg->give_regdowner_perms
while in the transaction.
A bunch of code can also be removed if you wish like things referencing INDEX_WARNINGS and has_index_warnings... etc |
I reworded the awkward commit. I also removed INDEX_WARNING stuff, which led me to find that the prelude-to-warnings text was not being included. |
af03add
to
ba36b83
Compare
5c2f1b0
to
39e973c
Compare
@wolfsage As discussed today, I think this is reviewable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checks clock 2h6m. whew.
There's.... a lot here. I've only looked at the code, not played with it. I've left a number of questions.
I think there's at least one legitimate bug which implies more tests could be written, and some code that was either lost in a rebase, or didn't make sense to keep in (lc_package in an UPDATE where it's probably not needed).
I question whether we've introduced mem leaks by not deleting $SELF->{DIO}
or FIO everywhere like we used to.
I'd like to play with this and test it on a running inabox, but I cannot do that tonight. Wednesay perhaps if you want to wait more.
Otherwise, seems like good improvements, for the bits I could understand the changes without my brain melting
lib/PAUSE/Indexer/Context.pm
Outdated
sub add_dist_error { | ||
my ($self, $error) = @_; | ||
|
||
$error = ref $error ? $error : { ident => $error, message => $error }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later code does:
my @dist_errors = $ctx->dist_errors;
for my $error (@dist_errors) {
my $header = $error->{header};
my $body = $error->{body};
But if you say add_dist_error("foobar")
, there will be no header or body. We should probably just drop the string support and always expect the object/hashref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're confused, but maybe I'm confused.
The ternary expression says "if $error was a reference, leave it one. otherwise, turn it into one where it's both ident and message are the string that was $error". So, whether you pass a string or reference to add_dist_error
, you get a reference stored in all_dist_errors
. The "later code" you cite will get those references and treat them as references.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't have to do with whether or not later code expects references, it's what that later code expects the references to contain.
As I said, later code wants a header
and a body
key, and doesn't care at all about ident
or message
. Here's the full snippet of what later code wants:
my @dist_errors = $ctx->all_dist_errors;
for my $error (@dist_errors) {
my $header = $error->{header};
my $body = $error->{body};
$body = $body->($self) if ref $body;
push @m, "## $header\n\n";
push @m, $tf->format($body), qq{\n\n};
$status_over_all = "Failed";
}
If add_dist_error is called with a string, and we ever somehow do a mail summary send, it seems to me we'll push:
## \n\n
(for header and body respectively, as they'll be undef)
and generate two warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relatedly, if we ever do send a public summary, it seems like we don't attempt to filter out non-public errors! We should?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've fixed this.
@@ -257,21 +241,13 @@ sub examine_pkg { | |||
|
|||
# should they be cought earlier? Maybe. | |||
# but as an ultimate sanity check suggested by Richard Soderberg | |||
if ($self->_pkg_name_insane) { | |||
$Logger->log("package[$package] name seems illegal"); | |||
delete $self->{FIO}; # circular reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lost a lot of circular reference cleanup in this PR. Is this going to cause leaks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, but it almost doesn't matter: mldistwatch does not make a lot of these, and it doesn't run persistently. Still I'll give it a think.
another) workaround against "Safe" limitations.)}, | ||
|
||
); | ||
# TODO: get $err->{openerr} back in here, I guess? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this todo something to worry about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to leave like this, but … I dug into this whole hunk of code and HOO BOY. This is a whole 'nother project for a whole 'nother time.
delete $self->{FIO}; # circular reference | ||
return; | ||
|
||
# TODO: get $err->{line} back in here, I guess? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this todo something to worry about?
This is largley to make the subroutine easier to think about.
The indexer, here, kept doing this: if ($ok) { ... } if ($ok) { ... } ...and any given block might set $ok to 0, but never return early. That meant that it was hard to know whether $ok might get set back later. It never would. Instead, the code now returns early, because $ok can only become true early on, and from then on is really a way of saying "abort, but keep running through code anyway". Better to just return.
This adds the code needed to have the context object track package indexing status. It isn't used yet, but it shows that we need to rename distribution-level errors, so they're unambiguous, and then does that!
Here, we replace all the package-level errors, logged with ->index_status, with ->abort_indexing_package. This lets us put all the error status text in one place and ensure it exists. It also means we can stop worrying about the flow control of package indexing: throw to stop.
In the previous commit, we'd replaced all use of ->index_status to set the current status, and instead used the context methods to record package status. Still, though, those methods set *both* the new status and the old status, and we only read things from the old status. In this commit, we switch to only reading things from the new status, and delete the attributes used to store the old status.
That way, your error can say "this was in previous file X" and not just "in a previous file, not named here".
It just looks better!
I had commented out some code so that we would only check in a package update if we were in an OK state. That is no longer needed, because if we are not in an OK state, we will have thrown an exception. Removing the commented-out code just makes things easier to skim.
We pass "package" and not "pack".
The old code would promote a string to a hashref, but it did it badly. Even when done correctly, the code in abort_indexing_dist would look for $error->{message}, but when passing a DISTERROR result, there was none. This was just general confusion. Now, ->abort_indexing_dist, and thus ->add_dist_error, expect a hashref with, at minimum, a "header" value.
Now that abort_indexing_dist must be passed a hashref, fix all the calling code to do so. This has the side-effect of making all these errors public. That's okay because: 1. we were ignoring "public" anyway! 2. they're all reasonable to show the user
This case doesn't come up yet, and wasn't handled properly. This code will better handle private errors, and also errors with a header and no body.
I've added commits from PAUSE::Indexer::Errors: fix keys from $old to PAUSE::dist: do not include private errors in report which I think address the last(???) outstanding thing. I'm going to look into coverage a bit more. |
This makes it easier to see that subs in this file are not part of the top-level package.
Right now, this has no effect, but it will give us a place to put extra diagnostic tooling for collecting mail from test runs.
...then I used this to get all the emails produced by the mldistwatch tests to inspect in my MUA!
I added code to let me get an mbox of all mail sent during testing, which I then inspected in my MUA. Looks pretty good. |
I added just a bit more test coverage for a notable but never-called-during-tests method. |
This branch provides a common interface (the context object) for collecting and reporting on errors or warnings found during indexing. It also replaces the "set skip or return early" logic with an exception system, to allow deep-down checks to declare that a bit of indexing can't succeed.
I have compared the messages generated by test runs, and believe that while there are some changes, they are either neutral or improvements.
Here's more detail:
index_packages
andmaybe_index_dist
methods have been updated to catch abort exceptions and handle them.