-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exclude company's own projects filter #67
Comments
The idea is pretty interesting. There are also a number of primary questions that arise before the implementation of this idea. Main question: How to identify the repositories in relation to the company (a company's own repository or not)? There is an option to use information about the organization (see OrgId). However, this is connected with the fact that you need to have a list of compliance of the company and the organization that belongs to it. It turns out that it is necessary to create such a list by hand for each company and constantly keep it up to date. And again, there is no certainty that this criterion is 100% valid. Do you have any ideas on this? |
|
I agree that at first glance, maintaining a list of repositories does not differ much from maintaining a list of companies. But the question arises about a significantly larger volume of repositories than companies and about a greater dynamics of the list of repositories than domains.
I didn't quite understand what it meant. Could you explain a little more broadly? You are suggested to think that the company's own repository is those repositories in which commits are only from the company, right? Is this a necessary and/or sufficient condition? |
It could happen that the amount of non-owned repositories that companies are committing to is non-significant.
The repo where all commits are from corporate emails are definitely owned by the company. That's a sufficient condition for a filter. ) |
Sorry for taking a while to respond, I simply don't have enough information on the workflow that OSCI uses (my bad) to elaborate further than what @abitrolly said. I would only ever consider a contribution to be in the company's full self-interest if the contribution landed on a repository that was owned by the company itself. Is this a trivial task? Very unlikely, I think a "repo where all commits are from corporate emails" is too specific of a scenario and wouldn't affect the dataset very much (especially for the top dogs which is where my interest lies the most)... We'd need a way to filter out contributions made from the organisation's own authors into the organisation's own repositories. |
I agree. That would be sufficient. |
... or at least you could start small and list at least the number of the repositories collaborators of the organizations contribute to. |
I suggest the way to move forward on this issue is:
It's important to understand that a perfect algorithm for this does not exist, just different directions to go, each with pros and cons. An empirical approach (if that's the right term) like I suggest above is necessary rather than defining a theoretical approach. Your goal has to be to iterate until you reach a logic which is "good enough" to show a general picture of activity across organizations. This was our experience defining the logic for OSCI itself. What looks easy at a high level gets very challenging when one tries to define the detail and algorithmize it. |
def outside_contributions():
employees_committed
contractors_committed
robots_committed
total_committed
if (total_committed - employees_committed - contractors_committed - robots_committed > 0):
return True |
Let's take company ACME. It creates and runs project X. This project is not under the ACME org on github, so programmatically not directly connectable to the company. The project has 100 contributors, 99 who work at ACME and 1 who is outside (perhaps it is an ex-employee who worked on this before leaving the company and continued after... I have seen such examples). Is this a company project? |
What could be the simplest and probably not the most accurate insight? While getting perfect stats sounds sweet, most likely we will not get there right away. So... what could be done right now to make the index by 1% better? |
Maybe the priority should be to publish the data that could make different kind of filters possible. Right now the site https://opensourceindex.io/ just links to this repo with no diagrams of the DB schema are no information if the Big Query datasets are being public. |
At our company, internally we gather public data on GitHub activity from employees who choose to opt-in regarding their GitHub activity and contributions, with the goal of identifying trends in contributions to projects outside of Microsoft's governance. Our data is skewed differently than this index, however, since we have an internal indicator of who our employees are on GitHub once they opt-in to tell us, vs having to determine it from profiles. Our numbers for December 2021, for example, are significantly higher for 'total community' and other figures as a result of so many people being e-mail private on GitHub... but of that specific month's contributions, I tried pulling equivalent data, and around a third of our actively-open-contributing employees contributed to projects not governed by our company, yielding a number higher than the index but not majorly larger. While the data is interesting, our key reason for differentiating "is it controlled by Microsoft or not" is to help encourage our employees' participation in communities to become eligible in our FOSS Fund and to evolve the culture. I agree slicing off a company's controlled projects is an interesting pivot, but a murky gray area, especially given foundations and cross-industry collaborations and so on. |
I think it would be pertinent to include a filter that excludes contributions to the company's own open source projects.
As much as I enjoy seeing the numbers I feel like it would be amazing to see which companies contribute outside of their own circle of influence the most, this could shift the rankings somewhat and showcase a bit more of the open source community on the top lists.
Throwing this out there as an idea, absolutely understand if this is not relevant to this project but maybe something worth thinking about!
The text was updated successfully, but these errors were encountered: