feat: adds scraper script #59

sshivaditya2019 · 2024-12-08T07:25:05Z

Resolves #56

Scrapes issues based on the username passed in.
Reads the token either from user input or from the CLI.
Updates issues in the repo, even when an issue with the same node_id already exists.
Issue Dedup and Matchmaking Results.
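
As a rough illustration of the two behaviors described above (token from the CLI or from user input, and updating an issue whose node_id is already stored), here is a minimal in-memory sketch. It is not the code in this PR: the `--token` flag, `resolveToken`, `IssueStore`, and the `ScrapedIssue` fields are all hypothetical names chosen for the example, and a real run would write to the database rather than a `Map`.

```typescript
// Hypothetical shape of a scraped issue; only node_id is taken from the PR description.
interface ScrapedIssue {
  node_id: string;
  title: string;
  body: string;
}

// Assumption: the token may arrive as `--token <value>`; otherwise we fall back to asking the user.
function resolveToken(argv: string[], askUser: () => string): string {
  const flagIndex = argv.indexOf("--token");
  const fromCli = flagIndex >= 0 ? argv[flagIndex + 1] : undefined;
  const token = (fromCli ?? askUser()).trim();
  if (!token) {
    throw new Error("A GitHub token is required");
  }
  return token;
}

// In-memory stand-in for the issues table, keyed by node_id.
class IssueStore {
  private readonly rows = new Map<string, ScrapedIssue>();

  // Inserts a new issue, or overwrites the stored one when the node_id already exists.
  upsert(issue: ScrapedIssue): "inserted" | "updated" {
    const outcome = this.rows.has(issue.node_id) ? "updated" : "inserted";
    this.rows.set(issue.node_id, issue);
    return outcome;
  }

  get(nodeId: string): ScrapedIssue | undefined {
    return this.rows.get(nodeId);
  }
}
```

Keying on node_id means re-running the scraper for the same user refreshes existing rows instead of creating duplicates.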

github-actions · 2024-12-08T07:25:52Z

Unused files (1)

src/handlers/issue-scraper.ts

ubiquity-os-beta · 2024-12-10T02:00:56Z

@sshivaditya2019, this task has been idle for a while. Please provide an update.

sshivaditya2019 · 2024-12-12T09:56:03Z

@0x4007, this is the base scraper logic. Should I write a script that adds the issues for all the users listed in auth.users.json?

0x4007 · 2024-12-12T09:58:15Z

Yes, and please update the database with it. You can QA with some task matchmaking scoring improvements and, as a second goal, some issue dedupe improvements, right?

sshivaditya2019 · 2024-12-12T21:04:16Z

QA:

Issue matching after running the issue scraper:

Previously 1%

Previously 1% & 0%

Previously 0%

This will not impact issue deduplication since it is restricted to issues within the same organization and repository.

0x4007 · 2024-12-13T00:29:21Z

Is there some type of bias in the algorithm to make 75% the peak of the bell curve?

talent referrals

I looked through a few of the results and I think above 80% seems actually relevant. What are your thoughts?

It might make sense to exclude showing matches below 80%?

And still always recommend at least two contributors.

issue deduplication

The markup seems very noisy, and from a quick look, anything below 80% seems largely irrelevant. What are your thoughts on this?

For near-term testing I think we should leave all the markup on, but I can see us needing to reduce the noise and hide anything below a certain threshold, such as that same 80%.

sshivaditya2019 · 2024-12-13T00:31:38Z

I looked through a few of the results and I think above 80% seems actually relevant. What are your thoughts?

It might make sense to exclude showing matches below 80%?

I think we should include matches below 80%, as this would allow for a larger pool of contributors. We can always exclude them by removing alwaysRecommend and setting the jobMatchingThreshold to 0.8.
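
To make the trade-off concrete, here is a small sketch of how a threshold and a minimum number of recommendations could interact. The setting names `jobMatchingThreshold` and `alwaysRecommend` come from this thread, but their exact semantics are assumed here: `alwaysRecommend` is treated as a minimum count of contributors to show, and `ContributorMatch` and `selectMatches` are hypothetical names for illustration only.

```typescript
// Hypothetical match record; similarity is expressed in the range 0..1.
interface ContributorMatch {
  login: string;
  similarity: number;
}

interface MatchSettings {
  jobMatchingThreshold: number;
  // Assumption: minimum number of contributors to show; 0 disables the fallback.
  alwaysRecommend: number;
}

function selectMatches(matches: ContributorMatch[], settings: MatchSettings): ContributorMatch[] {
  // Rank best match first without mutating the caller's array.
  const ranked = [...matches].sort((a, b) => b.similarity - a.similarity);
  const aboveThreshold = ranked.filter((m) => m.similarity >= settings.jobMatchingThreshold);
  if (aboveThreshold.length >= settings.alwaysRecommend) {
    return aboveThreshold;
  }
  // Too few strong matches: pad with the next best ones up to the minimum count.
  return ranked.slice(0, settings.alwaysRecommend);
}
```

Under these assumptions, a threshold of 0.8 with `alwaysRecommend` set to 2 hides weak matches but still shows two contributors, while setting `alwaysRecommend` to 0 shows only matches at or above 80%.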

Is there some type of bias in the algorithm to make 75% the peak of the bell curve?

The current similarity search uses a weighted sum of cosine distance (0.8) and L2 distance (0.2). Without this weighting, that is, using cosine distance alone, the results tend to cluster around 90% similarity¹. With the weighted sum, they are more likely to cluster around 75%. This helps make the results more varied and accurate.
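
A minimal sketch of the weighting described above, to show why mixing in L2 distance pulls scores down relative to cosine alone. Only the 0.8 and 0.2 weights come from this comment; the function names are hypothetical, and the conversion to a similarity as 1 minus the weighted distance is an assumption rather than the query actually used.

```typescript
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, value, i) => sum + value * (b[i] ?? 0), 0);
}

// Cosine distance: 1 - cos(angle between a and b).
function cosineDistance(a: number[], b: number[]): number {
  const norm = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  return 1 - dot(a, b) / norm;
}

// Euclidean (L2) distance between a and b.
function l2Distance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, value, i) => sum + (value - (b[i] ?? 0)) ** 2, 0));
}

// Assumption: similarity is reported as 1 minus the weighted distance.
function weightedSimilarity(a: number[], b: number[], cosineWeight = 0.8, l2Weight = 0.2): number {
  const distance = cosineWeight * cosineDistance(a, b) + l2Weight * l2Distance(a, b);
  return 1 - distance;
}
```

For unit-length embeddings the L2 distance equals the square root of twice the cosine distance, so it grows faster than cosine distance for close vectors; blending it in lowers and spreads out scores that would otherwise bunch up near the top.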

¹ https://docs.voyageai.com/discuss/660499a8c27dbb000f201a40

feat: adds scraper script

b53e2d8

sshivaditya added 2 commits December 12, 2024 02:11

fix: adpaters

408ac43

fix: org issue

1f656b7

sshivaditya2019 marked this pull request as ready for review December 12, 2024 21:06

0x4007 requested review from rndquu and whilefoo December 13, 2024 00:23

gentlementlegen mentioned this pull request Dec 13, 2024

Scraper: Populate "Closed As Complete" Issue Specifications #56

Open
