Support GitHub users in orgs command #105

Open
marnovo opened this issue Jun 26, 2019 · 7 comments
Labels
enhancement (New feature or request), triage/needs-product-input (This needs input from product)

Comments

@marnovo
Member

marnovo commented Jun 26, 2019

Right now the orgs command seems to support only, well, GitHub orgs. But a common use case (most people don't own orgs) and an interesting one (you might want to inspect your own GitHub "profile" in depth) is to run it on your own GitHub user. Is it easy enough to extend orgs to also cover individual users?

@marnovo added the enhancement and help wanted labels Jun 26, 2019
@dpordomingo
Contributor

As for whether it's possible, I'd say yes.
I'd maybe replace "org" with "owner", which could be either a "user" or an "org"; that way we would also avoid problems if a user becomes an org at some point.

But:
with "org", we fetch metadata from its members.
with "user", we won't fetch that metadata.
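
To make the "owner" idea concrete, here is a minimal sketch, not the project's actual code. It assumes the account payload comes from the GitHub REST API's `GET /users/{name}` response, whose `type` field is "User" or "Organization". The function names and the list of metadata steps are hypothetical and only illustrate that members would be fetched for orgs alone.

```python
from typing import Any, Dict, List


def owner_kind(profile: Dict[str, Any]) -> str:
    """Classify an account payload as 'org' or 'user'."""
    # GET /users/{name} reports "type": "User" or "Organization".
    kind = profile.get("type")
    if kind == "Organization":
        return "org"
    if kind == "User":
        return "user"
    raise ValueError(f"unsupported owner type: {kind!r}")


def metadata_to_fetch(profile: Dict[str, Any]) -> List[str]:
    """Hypothetical fetch plan: members are only fetched for orgs."""
    # The step names below are assumptions made for illustration.
    steps = ["repositories", "issues", "pull_requests"]
    if owner_kind(profile) == "org":
        steps.append("members")
    return steps
```

Checking the type on every run, instead of storing it once, would also cover the case of a user that later becomes an org.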

But I'm not sure what the purpose of getting the org members is.
If the purpose is to attribute the activity in the repos to its members, then some activity won't be attributed: it will belong to GitHub users who aren't members of that org and so won't be imported. For example, an issue opened in bblfsh by a non-bblfsh member won't be assigned to any user in our DB.

If we need to get the info about all the users contributing in a repo (like the example above), we should also:

  1. fetch all GitHub users who have activity in that repo but are not members of the repo's org,
  2. try to find GitHub users from the repo commits (to be able to assign commits to users, not only GitHub activity).

If we also import repos from users, as this issue suggests, the activity in their repos won't be assigned to any user other than the imported one, unless we also do (1) and (2).
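
The gap described above can be sketched as a set difference. This is only an illustration: the `author` key and the shape of the activity records are assumptions, not the real DB schema.

```python
from typing import Dict, Iterable, List


def unassigned_authors(activity: Iterable[Dict[str, str]],
                       imported_users: Iterable[str]) -> List[str]:
    """Return logins that have activity but were never imported."""
    imported = set(imported_users)
    authors = {item["author"] for item in activity}
    # Anything left after removing imported users cannot be attributed.
    return sorted(authors - imported)


# Illustrative data: one issue comes from a non-member.
activity = [
    {"kind": "issue", "author": "outsider"},
    {"kind": "pull_request", "author": "member-a"},
]
missing = unassigned_authors(activity, ["member-a", "member-b"])
```

With only org members (or only the single imported user) in the DB, every login returned here is activity that stays unassigned until step (1) is done.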

@smacker
Contributor

smacker commented Jun 27, 2019

@marnovo technically it's not that different from org, but the results might be very unexpected for users and we should do something about it. Problems I see:

  • half (or more) of the repos owned by me or any other dev at src-d are forks, and the same goes for external devs. The problem with forks: nobody updates master. Most of our charts rely on HEAD, so those repos would only produce results up to the moment they were forked
  • there are no issues or pull requests in forks, so all metadata charts will become useless

As a solution for a user command, I would propose resolving forks and downloading the code/metadata of the original repo. In some cases (example) it would make more sense to download the fork, but such cases are exceptions.
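
A rough sketch of what resolving a fork could look like, assuming the repository payload comes from the GitHub REST API's `GET /repos/{owner}/{repo}` response, which carries a boolean `fork` field and, for forks, a `parent` object. The function name and the sample repository names are made up for illustration.

```python
from typing import Any, Dict


def resolve_fork(repo: Dict[str, Any]) -> str:
    """Return the full name of the repo whose code/metadata to download."""
    # "parent" is only present in the single-repository response,
    # so a payload taken from a list endpoint falls through to the fork.
    parent = repo.get("parent")
    if repo.get("fork") and parent:
        return parent["full_name"]
    return repo["full_name"]


fork_payload = {
    "full_name": "someuser/project",
    "fork": True,
    "parent": {"full_name": "upstream/project"},
}
target = resolve_fork(fork_payload)
```

Because list endpoints omit `parent`, this approach would need one extra request per fork.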

@dpordomingo
Contributor

dpordomingo commented Jun 27, 2019

I wouldn't do it automatically, but maybe with options: --use-parent to use the parent repo instead, or --add-parent to fetch both the fork and its parent; or even ignore forks entirely with --no-forks, as requested by @warenlg at #109.
Or also --exclude, passing a list of repos to be ignored (in case some repos cause known failures, or for whatever other reason).
This way everything would be more explicit, which I think would be better, and more flexible.
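
To show how these options could interact, here is a small Python sketch. The flags are the ones proposed in this comment and do not exist in the tool today; the `owner-import` program name, the comma-separated format for --exclude, and the repo payload shape (`full_name`, `fork`, `parent`) are assumptions.

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="owner-import")  # hypothetical name
    forks = parser.add_mutually_exclusive_group()
    forks.add_argument("--use-parent", action="store_true")
    forks.add_argument("--add-parent", action="store_true")
    forks.add_argument("--no-forks", action="store_true")
    parser.add_argument("--exclude", default="",
                        help="comma-separated full repo names to skip")
    parser.add_argument("owner")
    return parser


def select_repos(repos: List[Dict[str, Any]],
                 args: argparse.Namespace) -> List[str]:
    excluded = {name for name in args.exclude.split(",") if name}
    selected: List[str] = []
    for repo in repos:
        name = repo["full_name"]
        if name in excluded:
            continue
        parent = (repo.get("parent") or {}).get("full_name")
        if not repo.get("fork"):
            selected.append(name)
        elif args.no_forks:
            continue
        elif args.use_parent and parent:
            selected.append(parent)
        elif args.add_parent and parent:
            selected.extend([name, parent])
        else:
            # Default: keep the fork as-is, nothing happens implicitly.
            selected.append(name)
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(selected))
```

Making the three fork flags mutually exclusive keeps the behavior explicit: with no flag given, forks are imported unchanged.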

@se7entyse7en
Contributor

I'd love to have this feature, and I also think it would greatly increase the chances of people trying it out.

BTW, regarding forks, I agree that there could be different needs depending on the user. But in general I think it's either --ignore-forks or not. If the user is interested in resolving forks to the original repo, then maybe it's more straightforward to just initialize sourced-ce with the owner (whether an org or a user) of that original repo, and maybe provide some filtering capabilities such as init orgs apache --repositories=incubator-superset.

Also, the repositories most likely to be forked are popular ones, and I think including popular repos together with mine would just add a lot of noise and hide a lot of insights.

@smacker
Contributor

smacker commented Jun 27, 2019

Agree with Marvin on most of the points. Though I would want to remind that many GitHub users (I don't have numbers, but most probably it's a majority) don't have real repositories that aren't forks or dumps of some code (for a school or workshop or something like that). So analyzing only the profile doesn't make sense for them at all. Exploring the information about the repositories they contributed to, on the other hand, can be interesting.

@se7entyse7en
Contributor

Though I would want to remind that many GitHub users (I don't have numbers, but most probably it's a majority) don't have real repositories that aren't forks or dumps of some code (for a school or workshop or something like that).

I don't know whether it's the majority of users, but you're absolutely right about this type of user; I didn't think about it. I'm just wondering how likely this type of user is to use a tool like this on their forked repos, but that's a different point.

@marnovo
Member Author

marnovo commented Jun 27, 2019

All very good points. In the end, the underlying use case and technical questions for personal users may be quite different from those for orgs, rather than just a matter of conforming to the API…

@se7entyse7en added the triage/needs-product-input label and removed the help wanted label Oct 18, 2019

4 participants