Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix number of inferers considered in the reward cycle #691

Open
wants to merge 1 commit into
base: dev
Choose a base branch
from

Conversation

fernandofcampos
Copy link
Contributor

@fernandofcampos fernandofcampos commented Dec 4, 2024

Purpose of Changes and their Description

k.lowestInfererScoreEma was not being updated on a new Epoch. Then, after the 32 slots get filled up, and when the 33rd inference arrives, it was checked against the value in k.lowestInfererScoreEma from previous Epoch.

If the new inferer has a greater score, it gets added and the old one gets removed. But since the old one was from the previous epoch (that is, the worker in the stored Score struct of lowest score), it does not exist anymore, and the function that removes the inferer won't return and error if it does not exist. So we don't remove any inferer, and we add a new one. That's how we end up with 33.

This PR effectively resets that stored lowest score (stored by topic) upon the start of a new worker submission window. Therefore, whenever one attempts to move a worker out of the active set, we can be assured the worker is actually a part of the topic's current active set.

Link(s) to Ticket(s) or Issue(s) resolved by this PR

https://linear.app/alloralabs/issue/PROTO-2851/fix-number-of-inferers-considered-in-the-reward-cycle

Are these changes tested and documented?

  • If tested, please describe how. If not, why tests are not needed.
    • Unit tests provided
  • If documented, please describe where. If not, describe why docs are not needed.
    • Evident in the code and its comments
  • Added to Unreleased section of CHANGELOG.md?
    • Added to stanza of next release

Still Left Todo

Fill this out if this is a Draft PR so others can help.

Copy link
Contributor

@xmariachi xmariachi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Contributor

@guilherme-brandao guilherme-brandao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated comment below

Copy link
Contributor

@guilherme-brandao guilherme-brandao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, I'm almost sure we need to also apply this to forecasters and reputers as well, since they follow the same logic. Can you double-check?

Comment on lines +1115 to 1119
// Get active inferers for topic
workerAddresses, err := k.GetActiveInferersForTopic(ctx, topic.Id)
if err != nil {
return errorsmod.Wrap(err, "error getting active inferers for topic")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid potential unnecessary queries, I'd move this below GetLowestInfererScoreEma since it's just used after getting the lowest score.

Comment on lines +1176 to +1184
// Check if the inferer with lowest score is active before removing it, because remove will not fail if the inferer is not active
isActive, err := k.IsActiveInferer(ctx, topic.Id, lowestEmaScore.Address)
if err != nil {
return errorsmod.Wrap(err, "error checking if inferer is active")
}
if !isActive {
return errors.New("inferer with lowest score is not active")
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it should be handled this way. If we are trying to remove an active inferer that doesn't exist, it's a bug on our end. Maybe it's just a sanity check to avoid disrupting the reward flow, but if this happens, we should be made aware somehow. Lmk your thoughts.

Copy link
Collaborator

@kpeluso kpeluso Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an extra check indeed. "Defense in depth" applies here imo. Could def argue it's excessive, or warranted. imo it's good to have, much like our other excessive checks

Comment on lines +1136 to 1143
// If there are no lowest inferer score ema, it means it is the first inference for the topic
if !found {
lowestEmaScore = previousEmaScore
err = k.SetLowestInfererScoreEma(ctx, topic.Id, lowestEmaScore)
if err != nil {
return errorsmod.Wrap(err, "error setting lowest inferer score ema")
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be removed; this condition is already checked in the code below.

It will avoid double persistence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please double-check if we still need UpdateLowestScoreFromInfererAddresses in this new implementation, I can see some cases that we persist the same lowest score 3 times.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this "not found" check applies for the very first time the topic begins an epoch

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still do need UpdateLowestScoreFromInfererAddresses I believe because if we remove an actor, it may be any actor => we don't know the new lowest actor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this "not found" check applies for the very first time the topic begins an epoch

the condition below is already covering this case: if uint64(len(workerAddresses)) == 0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants