phosphor-networkd repo refactor has broken many basic usecases #25
I don't really see a "desired action" out of this. Is there something you had in mind for us to take action on?
I agree it would be helpful for the maintainer(s) to inform the community when they make major changes. It appears as though two employees of IBM (you and Adriana) are on the reviewers list, so you should have already been informed of issues.
I want to be clear on one thing: Maintainers are volunteer roles. They are not required to provide support for the world. They are absolutely not expected to fix every single issue reported on the repo themselves. Some of the better maintainers are more responsive to Github issues than others. We do not provide any sort of SLA requirement to maintainers on this. Ultimately, problems you encounter are on you to fix (if it doesn't catch the maintainer's interest). Discord is fine for "I'm running into this and do you have ideas on what might be wrong" kind of discussion. It is not useful for tracking issues longer than a 48 hour period. If you have known issues that aren't logged in Github for others to find, please do so.
I'm not sure what the concern is here. There is certainly no "wait 24 hours between each commit". If a series is deemed ready to go, the maintainer has the authority to merge it.
I looked through the commits in question and spot-checked a good percentage of them. All of the ones I looked at were in Gerrit for a minimum of a week. Three peers were automatically added to all of them, and none of these peers gave input. I see absolutely nothing wrong here. I maintain a very widely used repository (openbmc/sdbusplus) and there are 8 people who are listed as owners or reviewers. I can regularly put up code, leave it for a week, and get no feedback. I'm going to merge that stuff too. You had the opportunity to give feedback. Please be more proactive about doing so in the future if you have a concern.
"Tested" statements are optional. They are really more of hints to the maintainer as to what was done, so that the maintainer can properly dispose of the commit. Ideally, if all code is well-covered by unit tests, there really isn't any point in a "Tested" statement because the "Tested" is "CI runs all the unit tests anyhow and they have 100% coverage". Contributors are never expected to have tested on your platform with your configuration. That goes for a maintainer or anyone else. Again, if you had concerns with a contributor/contribution to a repository you were a reviewer on, you should have raised that during review.
I agree this appears to be an issue. There is already work underway to better define what an "inactive maintainer" looks like and what we do about it. See #20. Query of current commits: Half of these commits appear to be from the 2 current maintainers and haven't been reviewed by you. I would propose that if you are interested in helping this repository along that you volunteer to assist in the maintenance here.
It doesn't look like you've really made any contributions to the repository, though. I think you should start by regularly reviewing and helping to sort out the issues that you feel the refactoring has caused. It seems like it is plenty ripe for contributions.
+1. What action are you looking for here? The intent of this repository is to be able to hold votes, and I see nothing to vote on in terms of action in the above. I don't believe it resolves all of your issues, but I've been trying to put a framework around this problem for some time. If you could take a look at https://gerrit.openbmc.org/c/openbmc/docs/+/60215 and see if it helps to resolve any of your comments above, it would be appreciated.
I didn't know this was a requirement on maintainers. Lots of refactors happen without mailings to the mailing list. What are you hoping for here? That every refactor is pushed both to Gerrit and to the mailing list? That seems onerous. Maybe there's some intermediate where we could tag refactors?
Are there details on this somewhere? Has this been discussed with the other maintainers? Has any resolution been attempted? Can you point to the specific reviews in question that broke things? Have you proposed reverting them? Were there tests present in the autobump to verify this functionality?
There are no requirements that maintainers make themselves available through either of these avenues, despite it being very common. GitHub issues specifically take significant time to triage, given the amount of noise and lack of responses they see. I see 11 open issues on phosphor-networkd. Which ones are you talking about specifically? I see:
Why is this a problem? In many cases, breaking up commits into many smaller commits is desired. The upshot of this is that many commits will be merged in batches.
This seems worrying, but there's no requirement that maintainers peer review their own code. Have you brought this up with the maintainers that did the merge? How much time was given for community input? Did you or your team give any input on the patches?
So long as the last commit in a series includes test details, they're not required on every commit. The goal is to make sure that the committer tested their code, not to impose extra work upon them for testing every commit along the way. If there were missing tested statements, were those mentioned in code review?
This is a constant problem on all repositories, not just networkd. This is about the only thing I see in which we could propose something actionable, like review minimums for contributors. This unfortunately conflicts with your desire that all commits be reviewed. Most of the project maintains a negative review ratio of commits to reviews, and most reviews get put on a small community of core maintainers. Overall, the actions we need to consider here are:
Micro:
Thanks @williamspatrick for sharing your views.
Yes. This issue was created after I discussed it with my team at IBM and one of the TOF members, Andrew Jeffery (@amboar). After our internal discussion, it seemed reasonable to bring this issue to the TOF's attention: we have taken a hit since there were too many things happening very fast on this repo, and basic functional use cases are broken.
Yes, Adriana and I are the reviewers, and we are on the CC list. But too many commits in such a short timeframe makes it really difficult to review, especially when the commit message does not clearly state what the change is for. I agree I should have given a -1 vote and asked for details as you mentioned, but I assumed that a commit would not get merged until there was at least a single +1 vote and a peer review. The commits were merged before I got back to them. Usually, our commits stay in review for at least some months.
I too agree. My team has pushed fixes for some of these refactor related bugs already.
Basic functions are expected to be tested before the code goes in. When basic functionality is broken, we seek the maintainers' assistance. We have pushed some fixes - no reviews yet.
Here are the issues created in the repo by different people. These cover the majority of the issues known so far. There are also pings in Discord asking for design clarifications that have not been concluded.
I was not aware of this. If a change does not have a big impact, I think it should be OK. But for major changes, like making dynamic and static IPv4 co-exist, I think it is better to let people know explicitly that a big change is coming. Unfortunately, these changes introduced many bugs as well. If there had been no breakages, they too would have been fine, like other changes that keep happening in upstream repos.
I agree, Patrick. I will do so.
But it is not possible to catch these bugs until we test the commits. Some issues that got in with this refactor are not system-specific. I believe other community members have also observed issues.
Sure, I will read up on this.
Sure. I have been reviewing a considerable number of commits upstream. I have also been involved in design discussions on various feature commits, like IPv6. I have been added as a reviewer recently, and I will work on reviewing more commits and contribute to sharing the maintainers' load.
Yes. I have fewer networkd commits. I am guiding my team (Ravi Teja, Asmitha Karunanithi) in networkd contributions. We have also been involved in implementing and reviewing many of the network features.
As mentioned above, we are actively working on fixing the refactor-related issues. But we would appreciate it if the maintainers get involved in getting the critical fixes in faster and in discussing the design changes.
This doesn't answer my question. WHAT actions are you requesting? Please be specific. If your goal is to facilitate a discussion, the mailing list will reach more people than a TOF review. I STILL don't see any reference to which commits you're referring to with this refactor. In my own quick look at it, I see commits spanning the last 6 months, all done progressively and not pushed "quickly", and all of which seem to be pretty good cleanups that, had I reviewed them, I would've approved. Pulling out a random commit from the pile: https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/57674 was in review for 20 days, almost 3 weeks, with no input. What are your expectations for the maintainer to do in this scenario? It should be noted that this situation of a maintainer merging their own commits is not unique, so @wak-google has really done nothing wrong here that I can see, even if the result is less than optimal for some people's development model. There are many repositories where maintainers commit their own code, which, I agree, is unfortunate.
This is one action, but getting individuals to review code is not something under the TOF's control, so this should be discussed elsewhere.
The problem with your statement here is that any change can cause unexpected impact, including well-reviewed and well-tested commits. There's no way to know until we merge and test all use cases, and we can't expect every use case to be tested on every commit.
The steps in this bug literally start by modifying an internal file. I don't see how we can expect committers to assume that users will mess with the filesystem from underneath them. It also doesn't list what platform it was submitted against. I see no indication that this one is a regression. Sure, it's a bug, but I don't see any report of what commit caused the issue or any attempt to bisect, so there's no good way to revert the changes. This bug ideally should've been closed a long time ago, but the bug submitter didn't retest. Yes, this was a regression, but it wasn't expected by the code submitter and, frankly, was something I personally would've missed in code review as well. The fix was submitted quickly, and the code is now functioning as expected. Mistakes will happen, and subtle changes to a dbus path structure might have unintended consequences. So far as I'm aware, the fix for this was in review before the issue was submitted. This has one sentence of description, which is not nearly enough to triage effectively, which is likely why the maintainer is ignoring it.
It's really difficult to expect a maintainer to get involved in fixes when the folks in question didn't get involved when the maintainer's code needed review. There's some hypocrisy here that we need to address, because as I'm reading it, there are some who expect maintainers to quickly review their own commits, but when a maintainer's commits need review, letting them sit for a significant amount of time is acceptable? That doesn't scale beyond a single case, and doesn't generalize to any sort of "rule" that this committee can make. It makes it seem as if you're asking the TOF to intervene on every commit, which I don't think we have the time to do.
Reiterating: we resorted to opening an issue with the TOF based on the advice of a TOF member. a) We had concerns about the way a lot of commits were pushed and merged without enough time for review. b) This wasn't a one-off slow response/no response from the maintainers; there is enough evidence we can point to in the Discord conversations and on the commits. We do understand this is voluntary work and will leave it to the TOF's wisdom and experience to define how we have to operate as a community. Our recommendation is for the TOF to define the social contract and ensure it gets applied to every sub-repository under the OpenBMC project.
No doubt, William has made significant contributions by pushing these changes to the community, and we have also benefited from them. However, a number of basic things broke after the refactor. What we expect is some design details when requested, responses to the questions/issues on Discord, and reviews on the commits where we're pushing fixes for the identified problems. Some of our fixes =>
Right. That is our problem as well. Please note we're talking about ~100+ commits with not much info in the commit messages. We are unable to find the exact commits that broke the basic use cases so that we could pick and revert them.
Yes, this bmcweb fix https://gerrit.openbmc.org/c/openbmc/bmcweb/+/61189 is from my team member Ravi Teja.
That was never the ask nor the intention. We were informed that the right process is to raise an issue with the TOF for discussion/deliberation. I understand we're still not on the same page. I will let @amboar chime in, as we shared all the evidence and concerns well before raising this issue. He could articulate the views better in your next TOF meeting. That said, we really appreciate all the excellent work all of you are doing for the OpenBMC community, and we'll also strive to give back with meaningful contributions and participation.
That's fine, and I'm glad people are evangelizing the TOF, but unless someone proposes an actual process change, there's very little we can do in the context of this bug.
What would you like to be done about them? Going out on a limb for a second, I'm guessing that William assumed that given nobody had reviewed the last 60 commits, nobody was going to review the next 60 commits, and didn't see a reason to slow down to wait for reviews that would never come. Having people regularly review the code for your use cases would've prevented this, but I think we already agree that's a change that needs to happen.
Considering that bisecting is a well-trodden strategy for identifying bugs, regardless of the number of commits, why hasn't this been done? Hundreds of commits were done over months that nobody showed an interest in. Would more automated test coverage help here? Despite several asks, I don't see anything actionable being requested. While I do agree there are things here that can be discussed, I don't see anything to be voted on in terms of a process change, or requests specific to phosphor-networkd, so I will be voting this as incomplete. My personal recommendation would be for @sunharis and team to get more involved in code reviews, and work with the networkd maintainers to come to an agreement about how code reviews are performed, and how changes to that repository are tested. If there are patchsets that need to be reverted because they broke previously functional things, let's get those identified.
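As an aside for readers unfamiliar with the workflow being suggested here: `git bisect run` can automate the hunt for the offending commit once any scriptable reproducer exists. The sketch below is purely illustrative; the script name, the meson/ninja build commands, and the test command are assumptions about the repository's setup, not something prescribed in this thread.

```python
#!/usr/bin/env python3
"""Hypothetical reproducer script for `git bisect run` (illustrative only).

git bisect run interprets exit codes as:
  0        -> this commit is good
  1..124   -> this commit is bad
  125      -> this commit cannot be tested, skip it
  126+     -> abort the bisect session
"""
import subprocess
import sys


def sh(cmd: str) -> int:
    """Run a shell command and return its exit code."""
    return subprocess.run(cmd, shell=True).returncode


# If the tree doesn't even configure or build, skip the commit rather than
# marking it bad -- the regression being hunted is a functional one.
# (meson/ninja are an assumption about the build system in use.)
if sh("rm -rf build && meson setup build > /dev/null 2>&1") != 0:
    sys.exit(125)
if sh("ninja -C build > /dev/null 2>&1") != 0:
    sys.exit(125)

# Replace this with whatever reproduces the breakage: a unit test,
# a QEMU boot check, or an on-target functional probe.
sys.exit(0 if sh("meson test -C build > /dev/null 2>&1") == 0 else 1)
```

With something like that in place, `git bisect start <known-bad> <known-good>` followed by `git bisect run ./reproduce.py` walks the history automatically; the 125 "skip" code is what lets it step over intermediate commits in a tightly coupled series that don't build or can't be tested on their own.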
I'm going to try to address some broad points as others have started to tackle the details. Firstly, @sunharis: I want to reiterate that while I work for IBM, I do not represent IBM on the TOF. I represent myself and advocate for what I personally think is the best path forward on issues to maintain or improve project health. If the best path forward for the project is not well aligned with IBM's hopes or goals, IBM needs to work out how they adjust their direction in the face of any decisions made (often I help IBM work out how we adjust our direction too, but that is a separate and entirely internal responsibility). Particularly:
The wording here reads to me as if I have agreed to personally take responsibility for advocating your position to the TOF. While I'm sympathetic to some of the problems outlined this is not a responsibility to which I've agreed. These are issues that you have raised. You must continue the discussion through to its conclusion, and present all the specific evidence required to substantiate your position. In my role on the TOF I will discuss any provided evidence for the purpose of working towards consensus on an outcome, but I'm not here to bring new facts to light of my own accord. Second, while I had spoken with @sunharis previously, the recommendations I made were:
Others have already outlined why there might be immediate difficulties with 1. Open source is entirely driven by active engagement in technical contributions of features and fixes, and social contributions by developing consensus on technical direction in the community. Status in the community (such as becoming a maintainer for a repository) is a function of identity in the community, which is itself a function of trust in you among community members. Trust in you among community members is developed through technical and social contribution to the community. Addressing 2, the issue as opened does not concentrate on substantiating inactivity of the phosphor-networkd maintainers. While some evidence of problems in the repository has been drawn out in subsequent posts, the fundamental description makes several claims that lack support and doesn't request any particular action be taken. TOF issues need to be more succinct than a collection of complaints. As others have said, one specific, actionable request is required, and even then, only once there has been a failure to achieve consensus in the community. Again broadly, my current take on it all is:
This is an open source project; everyone must scratch their own itch. The license explicitly disclaims any warranty of fitness for a particular purpose. The software is what it is, and if it isn't appropriate, we have the capacity to change it such that it suits our needs. Together, I think this leaves just the third point, about poor responsiveness, as the only significant issue that needs to be substantiated and addressed.
Thank you @amboar and @edtanous. I know there is no company-specific advocate while working as a community. I mentioned that Andrew J. could pitch in since we had already talked about this. Things are better explained orally in a meeting than by typing big messages, so I just wanted to say that the concerns could be explained better if there is a TOF meeting happening. Ed, we did try a bisect and failed, since the commits were tightly coupled and we could not find logical commit sets to bisect. It was a lot of effort for us, so we went ahead, picked up all of the refactor, and started fixing the issues.
I agree. We have 2 maintainers for networkd, and William has made a ton of contributions. But it would be appreciated if there were enough documentation for other contributors to understand the changes well, such as a design doc, commit message/description, or answers on Discord/email when asked.
We are consuming the latest networkd, found the bugs, and are working on fixing the use cases which are failing. My main take-away here is that I have to be more involved in reviews and in giving feedback on commits. I will surely try my best to do that in the future. My team is contributing a considerable amount of code to the community across repos, and that will continue and improve in the future.
This may work within your company's culture, but this is not how open-source communities typically work. We need to quell the notion that "meetings are how things get solved" because you're, in effect, trying to project your company culture onto everyone else. Consider it this way. We have 7 TOF members. You want to raise a topic that is going to take at least 15 minutes of time, which is 2 person-hours (7 members + you). You could spend a lot of time writing a clear, succinct message and still come out ahead.
Sure! I am OK with writing it up.
While I agree with the overall gist of what others have written above, regarding this:
I think I'd tend to agree with @sunharis that patches going from initial posting to merging in 15 minutes (which in a globally-distributed project is essentially zero) seems a bit aggressive. Yes, to @edtanous's point I can see why @wak-google could easily (and not unreasonably) come to the conclusion that no one else is reviewing his patches and thus accelerate the timeline on waiting around for any response before proceeding with a self-merge, but I think we should strive to always allow at least some opportunity for other reviewers to take a look.
So I did try to read the thread thoroughly and digest it, but maybe I misinterpreted. Pushing for review and then merging inside 15 minutes is too aggressive, I agree. What I had understood from the thread was that a large number of patches were merged within 15 minutes of each other, but had been in Gerrit for review for a reasonable period of time (weeks?). However, that was based on what I read here, and not on actually investigating the situation in Gerrit. Let me look there.
So I went back over the last 100-odd changes from @wak-google to phosphor-networkd in Gerrit. For the most part changes were published in Gerrit for a period that provided opportunity for review. 1 change was published and merged within 10 minutes:
2 further changes were published and merged within 20 minutes:
and one other change was published and merged within 8 hours:
A further 8 patches were published for review for 1-3 days before merging. The remainder were available for between a week and a month before merging. So 4 out of approximately 100 patches were plausibly controversial in terms of whether enough time was provided to allow for review. I don't think the evidence supports the argument for a pervasive pattern of problematic behaviour. The limited number of cases IMO falls under maintainer discretion.
Agreed with the general conclusion, though in fairness to @sunharis: after checking the repo, I see they fell slightly past the ~100-commit window, but expanding that window slightly to include the 8 patches linked above would put it at about 10% of the last 120, FWIW. I do think we should be careful about doing much of that if we want to foster increased participation and review, since it seems (as perhaps evidenced somewhat by the existence of this TOF issue) that even a relatively small amount of it can create a feeling of "patches get merged too fast for me to have a chance to review them" among others, which then tends to turn the lack of code review into a self-perpetuating problem.
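For anyone who wants to reproduce this kind of review-window audit, Gerrit's REST API exposes the creation and submission timestamps of merged changes. The sketch below is a rough, unofficial illustration: the query string and result limit are assumptions you would tune (for example, adding an `owner:` filter), while the field names follow Gerrit's documented ChangeInfo schema.

```python
#!/usr/bin/env python3
"""Rough sketch: how long did merged phosphor-networkd changes sit in review?"""
import json
import urllib.request
from datetime import datetime

# Query and limit are illustrative; adjust (e.g. add owner:...) as needed.
URL = ("https://gerrit.openbmc.org/changes/"
       "?q=project:openbmc/phosphor-networkd+status:merged&n=100")

with urllib.request.urlopen(URL) as resp:
    body = resp.read().decode()

# Gerrit prefixes JSON responses with ")]}'" to defeat XSSI; drop that line.
changes = json.loads(body.split("\n", 1)[1])

FMT = "%Y-%m-%d %H:%M:%S.%f"
for change in changes:
    created = datetime.strptime(change["created"][:26], FMT)
    merged = datetime.strptime(change["submitted"][:26], FMT)
    days = (merged - created).total_seconds() / 86400
    print(f"{change['_number']:>6}  {days:6.1f} days  {change['subject'][:60]}")
```

Sorting or bucketing that output makes the "how many changes merged within N hours of being pushed" question above easy to answer without manually paging through Gerrit.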
Hello! It's a bit late, but I would like to add my 15 cents to your issue. sunharis, thank you for raising this.
@Kitsok What would better “community involvement” look like to you? |
Is that something you'd be willing to contribute? I suspect the community would find high value in such an email that looked at the commits, and picked out the breaking changes into a separate category.
Given that even the slow patches, open for several months, were not reviewed, it seems unlikely that the timeframe was the impediment to providing review comments, but I can't state that for certain. @sunharis, if those 8 commits had been in review for longer, do you believe they would've been reviewed by you or your team?
Can you be a little more specific about what you mean here? Are we talking about breaking changes to user-facing interfaces on upstream platforms? Of the things reported on phosphor-networkd, I think only one fell into that category (the dbus path change), and it was an oversight in understanding combined with missing test cases, which I don't believe is the norm. The vast majority of the phosphor-networkd patches went through without any impact.
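Stepping back from the thread for a moment: a D-Bus path regression like the one mentioned above is the kind of thing even a very small smoke test can catch. The sketch below is illustrative only and would run on the BMC itself; the service name and the expected object path are assumptions based on the stock OpenBMC network namespace and would need adapting per platform.

```python
#!/usr/bin/env python3
"""Tiny D-Bus object-path smoke test (illustrative sketch, not project code)."""
import subprocess
import sys

# Assumed names: adjust the service and expected paths for your platform.
SERVICE = "xyz.openbmc_project.Network"
EXPECTED_PATHS = [
    "/xyz/openbmc_project/network/eth0",
]

# Ask busctl for a flat list of object paths exported by the service.
result = subprocess.run(
    ["busctl", "tree", "--list", SERVICE],
    capture_output=True, text=True,
)
if result.returncode != 0:
    print(f"could not query {SERVICE}: {result.stderr.strip()}")
    sys.exit(2)

tree = [line.strip() for line in result.stdout.splitlines()]
missing = [p for p in EXPECTED_PATHS
           if not any(line.startswith(p) for line in tree)]
if missing:
    print("missing object paths:", ", ".join(missing))
    sys.exit(1)
print("all expected object paths present")
```

Wired into CI or run after an autobump, a check along these lines could have flagged the path change before it reached consumers such as bmcweb.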
It looks like the discussion has died down a bit here. @sunharis, can you enumerate the specific changes you would like to see in community policy or behaviour? Ideally this would be through patches to openbmc/docs, but a list here in the comments might also work. With that information we can formulate a TOF vote on whether those outcomes are worth pursuing. Alternatively, it would be great to see the issue closed and further efforts taken to address the identified problems in the community, through discussions and contributions in Gerrit, on the mailing list, or in Discord.
I see the below points as worth pursuing.
Everyone who was following this issue is welcome to add any points to this list.
Thanks @sunharis. Regarding 2, I've started keeping a changelog for obmc-console as defined by https://keepachangelog.com/en/1.0.0/. It's a bit of an experiment, and I'm happy to report back on how it goes, but I think it can work. I'm hoping it will also encourage tagging and semantic versioning with how I intend to use it (but without disrupting our current auto-bump strategy in the bitbake metadata). I plan to do the same with libpldm as I have a bunch of API/ABI breaking changes in mind for that too. My feeling is the remainder (points 1 and 3-5) can be dealt with by some vigilance in code review. I hope that we can achieve the regular and active engagement required there to help drive these changes.
Yes. But with the minimum 24-hour review window, I think it's tough to achieve. It should be followed as a ground rule by all contributors to the repo.
I think the issues have been addressed, on the basis that the conversation has come to a stop. I'm closing this as a result, but please re-open it if you feel otherwise.
Yes @amboar, thank you.
IBM consumed the latest refactored phosphor-networkd code, and we have hit many regression issues. While a refactor is a good thing to do to improve code quality and performance, I am creating this issue to address our concerns regarding the points below.