Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull Precheck #45

Open
0x4007 opened this issue Sep 14, 2024 · 63 comments · May be fixed by ubiquity-os-marketplace/daemon-pull-review#1
Open

Pull Precheck #45

0x4007 opened this issue Sep 14, 2024 · 63 comments · May be fixed by ubiquity-os-marketplace/daemon-pull-review#1

Comments

@0x4007
Copy link
Member

0x4007 commented Sep 14, 2024

We have waves of contributors that open pulls sometimes keeping our review team busy.

We can save valuable man hours by having the bot preemptively check if the pull achieves the specification.

Pull Flow

  1. Assuming that the contributor follows directions, the pull must be initially opened up as a draft.
  2. When it is ready for review, they should turn it into a finalized pull request.
  3. The bot should consume the issue specification and then the pull diff into its context.
  4. The bot should return actionable feedback for what is missing from the specification. It should leave the review state as requested changes. It should also convert the pull back into a draft. However if it passes, it should leave as just "commented" (not approval).
  5. If a collaborator (reviewer on the team) converts the pull back from a draft into a finalized pull, then the bot should back off and stop intervening. Basically the inspection should only occur on pull.created and when it's converted from draft to finalized by the pull author.

Bonus but maybe can be handled in other tasks:

  1. Ensure CI is passing. I didn't include this here because sometimes there are problems out of the control of the pull author.
  2. Limits: some novice contributors might abuse the code review feature and burn out credits by continuously requesting reviews every time they make a tiny change that doesn't achieve the specification. We have seen folks do hundreds of commits for tasks that required a small amount of lines of code. We can limit the reviews to one per day for the LLM. However it isn't clear to me how we can automatically have it check again the next day if they request a review. Maybe that can be handled by some later task.

We should use Claude 3.5 Sonnet.

@ubadineke
Copy link

/start

Copy link

ubiquity-os bot commented Sep 18, 2024

! Please set your wallet address with the /wallet command first and try again.

Copy link

ubiquity-os bot commented Sep 18, 2024

! No wallet address found

@ubadineke
Copy link

/wallet 0x3Ea855E4D6440D937117c776501e7653a770b759

Copy link

ubiquity-os bot commented Sep 18, 2024

+ Successfully registered wallet address

@ubadineke
Copy link

/start

Copy link

ubiquity-os bot commented Sep 18, 2024

DeadlineWed, Oct 2, 9:41 PM UTC
Beneficiary 0x3Ea855E4D6440D937117c776501e7653a770b759
Tips:
  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

@ubadineke
Copy link

@Keyrxng should I implement this feature in a new file? and also what should happen if the pull is not initially opened as a draft?

@Keyrxng
Copy link
Member

Keyrxng commented Sep 19, 2024

@0x4007 I had gpt re-rewrite for brevity but it removed what I'd call context but this still makes sense.

Discussion:

Spec consideration > Embedding effectiveness > Bot intent & current model

https://chatgpt.com/share/66ec086f-7d70-8000-a914-991b77a819b0

Conclusion:
The current model (embedding-based) struggles to provide detailed, code-specific feedback. While embeddings are useful for high-level semantic comparisons, they lack the precision needed for effective PR reviews. You'll need additional tools to meet your goal.


Task Understanding:

  • Capture PR diffs (optimize by excluding files like yarn.lock)
  • Compare diff embedding with task spec embedding to identify issues.
  • If PR fails, convert it to draft and provide actionable feedback.
  • Stop intervention when a reviewer finalizes the draft.

This is very similar to /review, and separately, /gpt currently has the ability to process full PR diffs, spec, and conversation per query across multiple issues/repos. However, embeddings aren't ideal for this. Raw text analysis (task spec + diff) is more effective since we need precise feedback, not high-level similarity or semantic meaning.

The plugin's effectiveness must be high to avoid becoming annoying or useless. Therefore, it makes sense to re-spec or close this idea since embeddings aren’t suitable.


To be clear, the intention of this spec was to automate the initial review process and is feasible but better done without embeddings in my opinion. If the /review approach is being replaced by this then I think the following workflow works best:

Suggested Workflow:

  • Plugin triggers on pull_request.created (can run a review here or not and on whether it's opened as a draft or ready_for_review)
  • Plugin triggers on pull_request.ready_for_review.
  • Use the task spec and sanitized PR diff to prompt GPT for code review.
  • Post review comment and convert PR to draft if GPT identifies issues; add commented status otherwise.
  • Define a consistent comment format for GPT reviews, since all GPT tasks I've handled had pretty specific input/output formats/prompt structures introduced.
  • If a collaborator (not the assignee) finalizes the PR, stop intervening.
  • Limit bot reviews to once every 24 hours by detecting if a previous AI review comment exists or a timeline review event does.

Additionally the task is way overpriced, ubiquity/ubiquibot#746 is the /review command I authored and was priced at $150 and the /gpt command is priced at $200 lmaoo, and I spent days/weeks on these. $400 max, we are simply feeding two bodies of text to GPT with a custom prompt to have it perform a review with a couple of conditionals afterwards to add a comment and update a review status.


should I implement this feature in a new file? and also what should happen if the pull is not initially opened as a draft?

@ubadineke I'm unsure personally at this point as I did not write this task specification. I have left comments and once @0x4007 replies I'll have a much better idea of how this task should be implemented.

  1. Yes this feature will span multiple new files, try to keep things clean and tidy.
  2. Personally I'd account for this and just run a review on pull_request.created which will allow it through or kick it back into draft just like 0x4007 said. Future reviews would be tied to pull_request.ready_for_review distanced by at least 24hrs between each using the timeline or review comments.

@ubadineke
Copy link

ubadineke commented Sep 19, 2024

@Keyrxng @0x4007 What if the PR is created and the related issue is not immediately mentioned in the body. We just move it back to draft right and tell the user to edit the PR and specify issue?

The existing event payload style doesn't have a field for PR Number, maybe it was just designed for comments. Can modifications be made?
If the issue must be specified from the payload then the first paragraph in my comment is redundant.

@Keyrxng
Copy link
Member

Keyrxng commented Sep 19, 2024

I guess that's a good idea as we like to strongly enforce the linked issue although it's not a guarantee each PR has or needs this such as quick fixes for example.

I'd say don't do anything, our pull request template enforces this format and legit cases exist where it is not used.


The existing event payload style doesn't have a field for PR Number, maybe it was just designed for comments. Can modifications be made?
If the issue must be specified from the payload then the first paragraph in my comment is redundant.

I'm not sure I understand but issues and PRs are both considered as an issue by the GitHub api, there is no PR number only issue.number. You can get linked issues via the GraphQL API most reliably.

If a PR does not have an associated task specification then without having codebase embeddings to compare the diff against we don't have a lot of context other than what they have changed via the diff. Which can still be reviewed by GPT on it's own for logic errors and optimizations it'll just be more general review without the guard rail of the spec.

@0x4007
Copy link
Member Author

0x4007 commented Sep 20, 2024

I think it would be more appropriate for you to extend your /gpt command logic. You can handle it in a separate pull. Over on your old pull I said to merge and let's test in production. From there you can extend the logic as part of this task.

This task makes it more seamlessly integrated (but looks like your previous work set a lot of relevant ground work)

This requires more deeper integration into GitHub review workflow such as by setting review states and switching between draft and non draft etc.

I have extra time because the prompt engineering alone I'm sure will take a couple days to get the results we want.

Yes let's keep this plaintext no embeddings needed.

@0x4007 0x4007 assigned Keyrxng and unassigned ubadineke Sep 20, 2024
Copy link

ubiquity-os bot commented Sep 20, 2024

@Keyrxng the deadline is at Fri, Oct 4, 2:47 AM UTC

@0x4007 0x4007 transferred this issue from ubiquity-os-marketplace/text-vector-embeddings Sep 20, 2024
@Keyrxng
Copy link
Member

Keyrxng commented Nov 25, 2024

@Keyrxng is there absolutely no way to test with an openai api key? Mocking endpoints by using some other llm model

It's impossible to build this AI feature without interacting with an OpenAI style API that has tools/functions logic as that's what this feature depends on. If another provider like OpenRouter is what you mean then yes you can use those.

@henalolp
Copy link

/help

@ishowvel
Copy link

/start

@ishowvel
Copy link

/help

@ishowvel
Copy link

/start

@henalolp
Copy link

henalolp commented Dec 2, 2024

/help

@henalolp
Copy link

henalolp commented Dec 2, 2024

/rewrite

@ishowvel
Copy link

ishowvel commented Dec 3, 2024

/start

Copy link

Warning! This task was created over 79 days ago. Please confirm that this issue specification is accurate before starting.
Deadline Tue, Dec 10, 12:34 PM UTC
Beneficiary 0x340D8d2bd82dEb4f485623453c9F6ad307e6B027

Tip

  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

Copy link

@ishowvel, this task has been idle for a while. Please provide an update.

@ishowvel
Copy link

ishowvel commented Dec 5, 2024

Still working on the pr almost done

@ishowvel
Copy link

ishowvel commented Dec 6, 2024

can anyone please explain this error, i was setting up my local commits for testing

[
  {
    type: 45,
    schema: { anyOf: [Array], [Symbol(TypeBox.Kind)]: 'Union' },
    path: '/command',
    value: undefined,
    message: 'Expected required property',
    errors: []
  },
  {
    type: 62,
    schema: { anyOf: [Array], [Symbol(TypeBox.Kind)]: 'Union' },
    path: '/command',
    value: undefined,
    message: 'Expected union value',
    errors: [ [ValueErrorIterator], [ValueErrorIterator] ]
  }
] { depth: null }

@0x4007
Copy link
Member Author

0x4007 commented Dec 6, 2024

This looks like it's related to the manifest but @whilefoo and @gentlementlegen may know best.

I also want to highlight that I changed the spec. We should use Claude 3.5 Sonnet instead of ChatGPT

@gentlementlegen
Copy link
Member

You need to give a command: null at least in your payload to make the network call correct (I do not know how you're testing it).

@ishowvel
Copy link

ishowvel commented Dec 7, 2024

i am running the worker locally to test

from the worker
[wrangler:inf] POST / 400 Bad Request (13ms)

from the kernel

✘ [ERROR] An error occurred while processing the plugin chain, will skip plugin "http://localhost:4003" SyntaxError: Unexpected token 'I', "Invalid signature" is not valid JSON

      at async handleEvent
  (file:///home/aj/ubiquity-os-kernel/src/github/handlers/index.ts:104:9)
      at null.<anonymous> (async
  file:///home/aj/ubiquity-os-kernel/.wrangler/tmp/dev-uFF4dq/cloudflare-worker.js:38383:7)
      at async Promise.all (index 2)

@gentlementlegen
Copy link
Member

You need to either provide the good set of public and private keys or disable the signature checks on your local setup.

@ishowvel
Copy link

ishowvel commented Dec 7, 2024

Done

  • subscrining to pull request events
  • parsing pr to get issue data
  • consuming pull diff and issue spec
  • prompting ai

TODO

  • Use result to make review and convert to draft if necessary

@ishowvel
Copy link

ishowvel commented Dec 7, 2024

@gentlementlegen
image
image
may you know why the events arent being delievered?

@gentlementlegen
Copy link
Member

Sorry it's a big of a vague question. What kind of event? Not delivered you mean the kernel not sending them or the plugin not picking them up?

@ishowvel
Copy link

ishowvel commented Dec 8, 2024

what should be done for when claude returns a invalid json output, ie "lorem ipsum" or '{"lorem":1}'
the expected output should have been '{"confidenceThreshold":1}'

for now ill just throw an error which says Invalid json object found, Aborting

@gentlementlegen
Copy link
Member

That is the current behavior we have when gpt doesn't return proper json.

@ishowvel
Copy link

ishowvel commented Dec 8, 2024

Copy link

@ishowvel, this task has been idle for a while. Please provide an update.

@gentlementlegen
Copy link
Member

@ishowvel you got a comment on your PR.

@ishowvel
Copy link

Yeah I just realised 😓

@ishowvel
Copy link

@0x4007 how do i submit a whole repository for review?
the repository is https://github.com/ishowvel/pull-precheck
here is the status of the tests https://github.com/ishowvel/pull-precheck/actions/runs/12315354584
here is the qa
ishowvel-org/.ubiquity-os#57
ishowvel-org/.ubiquity-os#54

@0x4007
Copy link
Member Author

0x4007 commented Dec 13, 2024

Open a pull against https://github.com/ubiquity-os-marketplace/daemon-pull-review and resolve this issue, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment