Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phosphor-networkd repo refactor has broken many basic usecases #25

Closed
sunharis opened this issue Apr 4, 2023 · 25 comments
Closed

phosphor-networkd repo refactor has broken many basic usecases #25

sunharis opened this issue Apr 4, 2023 · 25 comments

Comments

@sunharis
Copy link

sunharis commented Apr 4, 2023

IBM consumed the latest refactored phosphor-networkd code, and we have hit many regression issues. While refactor is a good thing to do to improve the code quality and performance, I am creating this issue to address the concern regarding below points.

  1. Refactor the basic framework without alerting the community.
  2. Refactor has broken the basic framework of handling the dynamic and static network.
  3. No response/very slow response from the maintainers via discord/github issues.
  4. Many commits are merged in a large bunch.
  5. Many commits are merged without peer review.
  6. Many commits are merged without a test details and description about why this change is done, and what are the impacts.
  7. No reviews for many upstream commits which are under review.
@williamspatrick
Copy link
Member

I don't really see a "desired action" out of this. Is there something you had in mind for us to take action on?

Refactor the basic framework without alerting the community.

I agree it would be helpful for the maintainer(s) to inform the community when they make major changes. It appears as though two employees of IBM (you and Adriana) are on the reviewers list, so you should have already been informed of issues.

Refactor has broken the basic framework of handling the dynamic and static network.
No response/very slow response from the maintainers via discord/github issues.

I want to be clear on one thing: Maintainers are volunteer roles. They are not required to provide support for the world. They are absolutely not expected to fix every single issue reported on the repo themselves.

Some of the better maintainers are more responsive to Github issues than others. We do not provide any sort of SLA requirement to maintainers on this. Ultimately, problems you encounter are on you to fix (if it doesn't catch the maintainer's interest).

Discord is fine for "I'm running into this and do you have ideas on what might be wrong" kind of discussion. It is not useful for tracking issues longer than a 48 hour period. If you have known issues that aren't logged in Github for others to find, please do so.

Many commits are merged in a large bunch.

I'm not sure what the concern is here. There is certainly no "wait 24 hours between each commit". If a series is deemed ready to go, the maintainer has the authority to merge it.

Many commits are merged without peer review.

I looked through the commits-in-question and spot checked a good percentage of them. All of them I looked at were in Gerrit for minimum a week. 3 peers were automatically added to all of them and none of these peers gave input. I see absolutely nothing wrong here.

I maintain a very widely used repository (openbmc/sdbusplus) and there are 8 people who are listed as owners or reviewers. I regularly can put up code, leave it for a week, and get no feedback. I'm going to merge that stuff too.

You had opportunity to give feedback. Please be more proactive at doing so in the future if you have a concern.

Many commits are merged without a test details and description about why this change is done, and what are the impacts.

"Tested" statements are optional. They are really more of hints to the maintainer as to what was done, so that the maintainer can properly dispose of the commit. Ideally, if all code is well-covered by unit tests, there really isn't any point in a "Tested" statement because the "Tested" is "CI runs all the unit tests anyhow and they have 100% coverage". phosphor-networkd has pretty decent unit test coverage (that is not to say that you can't have 100% code coverage and still have bugs).

Contributors are never expected to have tested on your platform with your configuration. That goes for a maintainer or anyone else.

Again, if you had concerns with a contributor/contribution to a repository you were a reviewer on, you should have raised that during review.

No reviews for many upstream commits which are under review.

I agree this appears to be an issue. There is already work underway to better define what an "inactive maintainer" looks like and what we do about it. See #20.

Query of current commits:
https://gerrit.openbmc.org/q/status:open+project:openbmc/phosphor-networkd+after:2022-12-1

Half of these commits appear to be from the 2 current maintainers and haven't been reviewed by you. I would propose that if you are interested in helping this repository along that you volunteer to assist in the maintenance here.

$ git log --author="Sunitha Harish" --oneline
c9cc1a9 Reviewers: Remove Nagaraju Goruganti
56f6aed Reviewers: Add Sunitha Harish

It doesn't look like you've really done any contributions to the repository though. I think you should start by regularly reviewing and helping to sort out the issues that you feel the refactoring has done. It seems like it is plenty ripe for contributions.

@edtanous
Copy link
Contributor

edtanous commented Apr 4, 2023

I don't really see a "desired action" out of this.

+1. What action are you looking for here? The intent of this repository is to be able to hold votes, and I see nothing to vote on in terms of action in your above.

I don't believe it resolves all of your issues, but I've been attempting to try to put a framework around this problem for some time. If you could take a look at: https://gerrit.openbmc.org/c/openbmc/docs/+/60215 and see if it helps to resolve any of your comments above, it would be appreciated.

Refactor the basic framework without alerting the community.

I didn't know this was a requirement on maintainers. Lots of refactors happen without mailings to the mailing list. What are you hoping for here? That every refactor is pushed both to Gerrit and to the mailing list? That seems onerous. Maybe there's some intermediate where we could tag refactors?

Refactor has broken the basic framework of handling the dynamic and static network.

Are there details on this somewhere? Has this been discussed with the other maintainers? Has any resolution been attempted? Can you point to the specific reviews in question that broke things? Have you proposed reverting them? Were there tests present in the autobump to verify this functionality?

No response/very slow response from the maintainers via discord/github issues.

There are no requirements that maintainers make themselves available through either of these avenues, despite it being very common. github issues specifically take significant time to triage, given the amount of noise, and lack of responses they see. I see 11 open issues on phosphor-networkd. Which ones are you talking about specifically? I see:
openbmc/phosphor-networkd#63 which has pinged wak a couple times, but includes basically no detail on what system it was observed on, what behavior was observed, and is a single sentence of actionable data for maintainers to triage.

Many commits are merged in a large bunch.

Why is this a problem? In many cases, breaking up commits into many smaller commits is desired. The upshot of this is that many commits will be merged in batches.

Many commits are merged without peer review.

This seems worrying, but there's no requirement that maintainers peer review their own code. Have you brought this up with the maintainers that did the merge? How much time was given for community input? Did you or your team give any input on the patches?

Many commits are merged without a test details and description about why this change is done, and what are the impacts.

So long as the last commit in a series includes test details, they're not required on every commit. The goal is to make sure that the committer tested their code, not to impose extra work upon them for testing every commit along the way. If there were missing tested statements, were those mentioned in code review?

No reviews for many upstream commits which are under review.

This is a constant problem on all repositories, not just networkd. This is about the only thing I see in which we could propose something actionable, like review minimums for contributors. This unfortunately conflicts with your desire that all commits be reviewed. Most of the project maintains a negative review ratio of commits to reviews, and most reviews get put on a small community of core maintainers.

Overall, The actions we need to consider here are:
Macro:

  • Do we impose any kind of new requirements on maintainers (not reviewing their own code, testing, community input, ect)?
  • How do we handle regressions? Is there any kind of timeline between a regression being submitted and a revert/fix decision being made that we can impose?
  • How do we encourage more community feedback?

Micro:

  • Do we revert the networkd repo back to its old state, and push the commits back to review for more input?
  • Do we have avenues to discourage regressions here?

@sunharis
Copy link
Author

sunharis commented Apr 4, 2023

Thanks @williamspatrick for sharing the views.

I don't really see a "desired action" out of this. Is there something you had in mind for us to take action on?

Yes. This issue is created after i discussed with my team at IBM, and one of the TOF member Andrew Jeffery @amboar . After our internal discussion, it seemed to be reasonable to bring this issue to the TOF's attention, that we have had a hit since there were too many things happening very fast on this repo. And basic functional use cases are broken.

I agree it would be helpful for the maintainer(s) to inform the community when they make major changes. It appears as though two employees of IBM (you and Adriana) are on the reviewers list, so you should have already been informed of issues.

Yes, Adriana and I are the reviewers. We are in the cc list. But too many commits in such a short timeframe makes it really difficult to look, when the commit message is not clearly telling what the change is for. I agree i should have given a -1 vote and asked for the details as you mentioned, but i assumed that the commit will not get merged, until there is at least a single +1 vote and a peer review. And the commits were merged before i got back to them. Usually, our commits stay at least for some months in the review.

I want to be clear on one thing: Maintainers are volunteer roles. They are not required to provide support for the world. They are absolutely not expected to fix every single issue reported on the repo themselves.

I too agree. My team has pushed fixes for some of these refactor related bugs already.
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/61997
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/62033
Apart from this we also have been pushing commits to fix other issues which are found in our testing.

Ultimately, problems you encounter are on you to fix (if it doesn't catch the maintainer's interest).

Basic functions are expected to be tested before the code goes in. When basic functionalities are broken, we seek maintainers assistence. We have pushed some fixes - no reviews yet.

If you have known issues that aren't logged in Github for others to find, please do so.

Here are the Issues created in the repo by different people. These cover major number of issues known so far.
openbmc/phosphor-networkd#60
openbmc/phosphor-networkd#61
openbmc/phosphor-networkd#62
openbmc/phosphor-networkd#63

There are pings in the discord which are not concluded on some design clarifications.

I'm not sure what the concern is here. There is certainly no "wait 24 hours between each commit". If a series is deemed ready to go, the maintainer has the authority to merge it.

I was not aware of this. But, if the change does not make a big impact, i think it should be OK. But, for major changes like making dynamic & static IPv4 co-exist, i think it is better to let the people explicitly that there is a big change coming. Unfortunately these had introduced many bugs as well. If there were no breakages, then it also should have been OK as other changes which keep happening at upstream repos.

You had opportunity to give feedback. Please be more proactive at doing so in the future if you have a concern.

I agree Pattrick. I will do so.

"Tested" statements are optional.

But it is not possible to catch these bugs until we test the commit. Some issues which got in with this refactor are not system specific. I believe other community members are also have observed issues.

I agree this appears to be an issue. There is already work underway to better define what an "inactive maintainer" looks like and what we do about it. See #20.

Sure i will read this up.

Half of these commits appear to be from the 2 current maintainers and haven't been reviewed by you. I would propose that if you are interested in helping this repository along that you volunteer to assist in the maintenance here.

Sure. I have been reviewing considerable commits upstream. I have also been involved in design discussions on various feature commits like IPv6. I have been added as reviewer recently, and I will work on reviewing more commits and contribute to share the maintainers load.

It doesn't look like you've really done any contributions to the repository though.

Yes. I have less networkd commits. I am guiding my team(Ravi Teja, Asmitha Karunanithi) in networkd contributions. We also have been involved in implementing & reviewing many of the network features.

I think you should start by regularly reviewing and helping to sort out the issues that you feel the refactoring has done. It seems like it is plenty ripe for contributions.

As mentioned above we are actively working on fixing the refactor related issues. But appriciate if maintainers get involved in getting the critical fixes in faster and discussing the design changes.

@edtanous
Copy link
Contributor

edtanous commented Apr 4, 2023

Yes.

This doesn't answer my question. WHAT actions are you requesting? Please be specific. If your goal is to facilitate a discussion, the mailing list will reach more people than a TOF review.

I STILL don't see any reference to which commits you're referring to with this refactor. In my own quick look at it, I see commits ranging from the last 6 months, all done progressively, and not pushed "quickly", and all of which seem to be pretty good cleanups that, had I reviewed them, I would've approved. Pulling out a random commit from the pile: https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/57674 This was in review for 20 days, almost 3 weeks with no input. What is your expectations for the maintainer to do in this scenario?

It should be noted that this situation of a maintainer merging their own commits is not unique, so @wak-google has really done nothing wrong here that I can see, even if the result is less than optimal for some peoples development model. There are many repositories where maintainers commit their own code, which I agree, is unfortunate.

I agree i should have given a -1 vote

This is one action, but not something under the control of the TOF to get individuals to review code, so this should be discussed elsewhere.

if the change does not make a big impact, i think it should be OK.

The problem with your statement here is that any change can cause unexpected impact, including well reviewed and well tested commits. There's no way to know until we merge and test all use cases, which we can't expect every use case to be tested on every commit.

Here are the Issues created in the repo by different people. These cover major number of issues known so far.
openbmc/phosphor-networkd#60

This bug steps literally starts by modifying an internal file. I don't see how we can expect committers to assume that users will mess with the filesystem from underneath them. It also doesn't list what platform it was submitted against.

openbmc/phosphor-networkd#61

I see no indication that this one is a regression. Sure, it's a bug, but I don't see any report of what commit caused the issue or any attempt to bisect, so there's no good way to revert the changes.

openbmc/phosphor-networkd#62

This bug ideally should've been closed a long time ago, but the bug submitter didn't retest. Yes, this was a regression, but it wasn't expected by the code submitter, and frankly, was something I personally would've missed in code review as well. The fix was submitted quickly, and the code is now functioning as expected. Mistakes will happen, and subtle changes to a dbus path structure might have unintended consequences. So far as I'm aware, the fix for this was in review before the issue was submitted.

openbmc/phosphor-networkd#63

This has one sentence of description which is not nearly enough to triage effectively, which is likely why the maintainer is ignoring it.

if maintainers get involved in getting the critical fixes in faster

It's really difficult to expect a maintainer to get involved in fixes, when the folks in question didn't get involved when the maintainers code needed reviewed. There's some hypocrisy here that we need to address, because as I'm reading it, there's some that expect maintainers to quickly review their own commits, but when a maintainers commits need review, letting them sit a significant amount of time is acceptable? That doesn't scale beyond a single case, and doesn't generalize to any sort of "rule" that this committee can make. It makes it seem as if you're asking the TOF to intervene on every commit, which, I don't think we have the time to do.

@sunharis
Copy link
Author

sunharis commented Apr 5, 2023

WHAT actions are you requesting? Please be specific. If your goal is to facilitate a discussion, the mailing list will reach more people than a TOF review.

Reiterating, we resorted to opening an issue in TOF based on the advice from the TOF member.

a) We had concerns in the way a lot of commits were pushed & merged without enough time for review.
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58180
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58181
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58182
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58159
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58158
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58157
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58156
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58155
I understand we did miss reviewing some of the commits on time - but does that justify merging 8 commits in 15 mins?

b) This wasn't a one off slow response/no response from maintainers. There are enough evidences which we can point to in the discord conversations and on the commits. We do understand this is a voluntary work and will leave it to the TOF's wisdom and experience to define how we have to operate as a community. Our recommendation is for TOF to define the social contract and ensure the same gets applied to every other sub repository under OpenBMC project.

@wak-google has really done nothing wrong here that I can see, even if the result is less than optimal for some people's development model.

No doubt, William has made significant contributions by pushing these changes to the community, and we're also benefitted by the same. However, there were a number of basic things which broke after the refactor. What we expect is getting some design details when requested, and responses to the questions/issues on discord and reviews on the commits where we're pushing fixes for the identified problems.
For eg: When the Static & DHCP IPv4 was made to be co-existent in the refactor - a mention of that in a commit message would have been desirable as its a major change. Has the equivalent changes being made to the interface's default gateway to handle their co-existence(currently it's at interface level and is single value) - answer is NO. So there are misses, but we do agree everything cannot be perfect. A better communication both ways would have helped here.

Some of our fixes =>
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/61997
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/62033
https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/61791
https://gerrit.openbmc.org/c/openbmc/bmcweb/+/53959

I don't see any report of what commit caused the issue or any attempt to bisect, so there's no good way to revert the changes.

Right. That is our problem as well. Please note we're talking ~100+ commits with not much info in the commit messages. We are not finding the exact commits which broke the basic usecases, such that we can pick them and revert them.
However, we will continue working on fixing the issues and pushing the fixes upstream.

So far as I'm aware, the fix for this was in review before the issue was submitted.
openbmc/phosphor-networkd#62

Yes, this bmcweb fix https://gerrit.openbmc.org/c/openbmc/bmcweb/+/61189 is from my team member Ravi Teja.

It makes it seem as if you're asking the TOF to intervene on every commit, which, I don't think we have the time to do.

That was never the ask nor the intention. We were informed that the right process is to raise an issue with TOF for discussion/deliberation.

I understand we're still not in the same page. I will let @amboar chime in as we've shared all the evidence and concerns much prior to raising this issue. He could articulate the views better in your next TOF meeting.

That said, we really appreciate all the excellent work all of you are doing for the OpenBMC community and we'll also strive to give back with meaningful contributions and participation.

@edtanous
Copy link
Contributor

edtanous commented Apr 5, 2023

Reiterating, we resorted to opening an issue in TOF based on the advice from the TOF member.

That's fine, and I'm glad people are evangelizing the TOF, but unless someone proposes an actual process change, there's very little we can do in the context of this bug.

We had concerns in the way a lot of commits were pushed & merged without enough time for review.

What would you like to be done about them? Going out on a limb for a second, I'm guessing that William assumed that given nobody had reviewed the last 60 commits, nobody was going to review the next 60 commits, and didn't see a reason to slow down to wait for reviews that would never come. Having people regularly review the code for your use cases would've prevented this, but I think we already agree that's a change that needs to happen.

We are not finding the exact commits which broke the basic usecases

Considering that bisecting is a well-trodden strategy for identifying bugs, regardless of the number of commits, why hasn't this been done? 100s of commits were done over months that nobody showed an interest in. Would more automated test coverage help here?

Despite several asks, I don't see anything actionable being requested. While I do agree there are things here that can be discussed, I don't see anything to be voted on in terms of a process change, or requests specific to phosphor-networkd, so I will be voting this as incomplete.

My personal recommendation would be for @sunharis and team to get more involved in code reviews, and work with the networkd maintainers to come to an agreement about how code reviews are performed, and how changes to that repository are tested. If there are patchsets that need to be reverted because they broke previously functional things, lets get those identified.

@amboar
Copy link
Member

amboar commented Apr 6, 2023

I'm going to try address some broad points as others have started to tackle the details.

Firstly, @sunharis: I want to reiterate that while I work for IBM I do not represent IBM on the TOF. I represent myself and advocate for what I personally think is the best path forward on issues to maintain or improve project health. If the best path forward for the project is not well aligned with IBM's hopes or goals, IBM needs to work out how they adjust their direction in the face of any decisions made (often I help IBM work out how we adjust our direction too, but that is a separate and entirely internal responsibility).

Particularly:

I understand we're still not in the same page. I will let @amboar chime in as we've shared all the evidence and concerns much prior to raising this issue. He could articulate the views better in your next TOF meeting.

The wording here reads to me as if I have agreed to personally take responsibility for advocating your position to the TOF. While I'm sympathetic to some of the problems outlined this is not a responsibility to which I've agreed. These are issues that you have raised. You must continue the discussion through to its conclusion, and present all the specific evidence required to substantiate your position. In my role on the TOF I will discuss any provided evidence for the purpose of working towards consensus on an outcome, but I'm not here to bring new facts to light of my own accord.

Second, while I had spoken with @sunharis previously, the recommendations I made were:

  1. Continue to directly engage the maintainers of phosphor-networkd and build trust, possibly gaining maintainership to help out with the workload and direction of the repository
  2. If there has been a regular pattern of unresponsiveness from the maintainers of phosphor-networkd, consider opening an issue against TOF repository to trigger discussion, specifically with respect to Unresponsive maintainers and unclear code review SLO #20 and the resulting documentation

Others have already outlined why there might be immediate difficulties with 1. Open source is entirely driven by active engagement in technical contributions of features and fixes, and social contributions by developing consensus on technical direction in the community. Status in the community (such as becoming a maintainer for a repository) is a function of identity in the community, which is itself a function of trust in you among community members. Trust in you among community members is developed through technical and social contribution to the community.

Addressing 2, the issue as opened does not concentrate on substantiating inactivity of the phosphor-networkd maintainers. While some evidence of problems in the repository has been drawn out in subsequent posts, the fundamental description makes several claims that lack support and doesn't request any particular action be taken. TOF issues need to be more succinct than a collection of complaints. As others have said, one specific, actionable request is required, and even then, only once there has been a failure to achieve consensus in the community.

Again broadly, my current take on it all is:

  1. Is it unfortunate that there are bugs? Yes.
  2. Could the maintainers invested more effort in testing? Maybe.
  3. Was there adequate time to review the patches? It seems so.
  4. Submitting many patches in a short space of time when those patches have been waiting for review over a reasonable period of time isn't something we should be concerned about.
  5. Should the current maintainers improve their responsiveness? Maybe.
  6. Are maintainers required to immediately address arbitrary concerns? Absolutely not. As @williamspatrick said, maintainers are volunteers and you can only do what you can to motivate them to work on your issues, not direct them.

This is an open source project, everyone must scratch their own itch. The license explicitly disclaims any warranty of fitness for a particular purpose. The software It is what it is, and if it isn't appropriate, we have the capacity to change it such that it suits our needs.

Together, I think this leaves just the third point about poor responsiveness as the only significant issue that needs to be substantiated and addressed.

@sunharis
Copy link
Author

sunharis commented Apr 6, 2023

Thank you @amboar and @edtanous.

I know there is no company specific advocate while working as a community. I mentioned Andrew J can pitch in, since we had talked already about this. Things are better explained orally in a meeting, than typing big messages. So just wanted to say that concerns can be explained better if there is any TOF meeting happening.

Ed, we did try a bisect and failed, since the commits were tightly coupled, and we could not find a logical bisect commit-sets to pick. It was a lot of effort for us, so we went ahead and picked all the refactor and started fixing the issues.

maintainers are volunteers, and you can only do what you can to motivate them to work on your issues, not direct them.

I agree. We have 2 maintainers for networkd, and William has done a ton of contributions. But, appreciate if there is enough document for other contributors to understand the changes well. Such as design doc/commit message/description/answers in the discord/email when asked.
I believe maintainers are at a higher responsible position than the contributors. They are the domain experts, and contributors will definitely look up to them for any help/expert solutions. They need not put this at top of their priority list and address all commits/questions immediately. But at least an acknowledgement within some days/weeks is appreciated.

This is an open source project, everyone must scratch their own itch. The license explicitly disclaims any warranty of fitness for a particular purpose. The software It is what it is, and if it isn't appropriate, we have the capacity to change it such that it suits our needs

We are consuming the latest networkd, found the bugs and we are working on fixing the use cases which are failing.

My main take-away for me here is that i have to be more involved in reviews and giving feedback at commits. I will surely try my best to do that in future. My team is contributing considerable codebase to the community across repos, and that will continue and will improve in future.

@williamspatrick
Copy link
Member

Things are better explained orally in a meeting, than typing big messages. So just wanted to say that concerns can be explained better if there is any TOF meeting happening.

This may work within your company's culture, but this is not how open-source communities typically work. We need to quell the notion that "meetings are how things get solved" because you're, in effect, trying to project your company culture onto everyone else.

Consider it this way. We have 7 TOF members. You want to raise a topic that is going to take, at least, 15 minutes of time, which is 2 people-hours (7 members + you). You could spend a lot of time writing a clear, succinct message and still come out ahead.

@sunharis
Copy link
Author

sunharis commented Apr 6, 2023

Things are better explained orally in a meeting, than typing big messages. So just wanted to say that concerns can be explained better if there is any TOF meeting happening.

This may work within your company's culture, but this is not how open-source communities typically work. We need to quell the notion that "meetings are how things get solved" because you're, in effect, trying to project your company culture onto everyone else.

Consider it this way. We have 7 TOF members. You want to raise a topic that is going to take, at least, 15 minutes of time, which is 2 people-hours (7 members + you). You could spend a lot of time writing a clear, succinct message and still come out ahead.

Sure! I am OK with writing up.

@zevweiss
Copy link

zevweiss commented Apr 6, 2023

While I agree with the overall gist of what others have written above, regarding this:

  1. Was there adequate time to review the patches? It seems so.
  2. Submitting many patches in a short space of time when those patches have been waiting for review over a reasonable period of time isn't something we should be concerned about.

I think I'd tend to agree with @sunharis that patches going from initial posting to merging in 15 minutes (which in a globally-distributed project is essentially zero) seems a bit aggressive. Yes, to @edtanous's point I can see why @wak-google could easily (and not unreasonably) come to the conclusion that no one else is reviewing his patches and thus accelerate the timeline on waiting around for any response before proceeding with a self-merge, but I think we should strive to always allow at least some opportunity for other reviewers to take a look.

@amboar
Copy link
Member

amboar commented Apr 17, 2023

I think I'd tend to agree with @sunharis that patches going from initial posting to merging in 15 minutes (which in a globally-distributed project is essentially zero) seems a bit aggressive.

So I did try to read the thread thoroughly and digest it, but maybe I misinterpreted. Pushing for review and then merging inside 15 minutes is too aggressive, I agree. What I had understood from the thread was that a large number of patches were merged within 15 minutes of each other, but had been in Gerrit for review for a reasonable period of time (weeks?). However, that was based on what I read here, and not on actually investigating the situation in Gerrit. Let me look there.

@amboar
Copy link
Member

amboar commented Apr 17, 2023

So I went back over the last 100-odd changes from @wak-google to phosphor-networkd in Gerrit. For the most part changes were published in Gerrit for a period that provided opportunity for review.

1 change was published and merged within 10 minutes:

  1. 58555: rtnetlink_server: Fix edge triggering | https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58555

2 further changes were published and merged within 20 minutes:

  1. 58476: ethernet_interface: Reduce error logspam | https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58476
  2. 58475: util: Remove unused strings | https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58475

and one other change was published and merged within 8 hours:

  1. 58519: network_manager: Make it possible to reload without refresh | https://gerrit.openbmc.org/c/openbmc/phosphor-networkd/+/58519

A further 8 patches were published for review for 1-3 days before merging. The remainder were available for between a week and a month before merging.

So 4 out of approximately 100 patches were plausibly controversial in terms of whether enough time was provided to allow for review. I don't think the evidence supports the argument for a pervasive pattern of problematic behaviour. The limited number of cases IMO falls under maintainer discretion.

@zevweiss
Copy link

So 4 out of approximately 100 patches were plausibly controversial in terms of whether enough time was provided to allow for review. I don't think the evidence supports the argument for a pervasive pattern of problematic behaviour.

Agreed with the general conclusion, though in fairness to @sunharis, while after checking the repo I see they fell slightly past the ~100-commit window, expanding that window slightly to include the 8 patches linked above would put it at about 10% of the last 120, FWIW. I do think we should be careful about doing much of that if we want to foster increased participation and review, since it seems (as perhaps evidenced somewhat by the existence of this TOF issue) that even a relatively small amount of it can create a feeling of "patches get merged too fast for me to have a chance to review them" among others, which then tends to turn the lack of code review into a self-perpetuating problem.

@Kitsok
Copy link

Kitsok commented Apr 18, 2023

IBM consumed the latest refactored phosphor-networkd code, and we have hit many regression issues.

Hello!

It's a bit late, but I would like to add my 15 cents to your issue.
I had no time to report on all issues in different parts of OpenBMC (at least bmcweb, dbus-sensors, phosphor-network, can't remember all), but the overall opinion is the same: too much breaking refactoring without community involvement and clear documenting of such a change.

sunharis, thank you for raising this up.

@williamspatrick
Copy link
Member

@Kitsok What would better “community involvement” look like to you?

@Kitsok
Copy link

Kitsok commented Apr 18, 2023

@Kitsok What would better “community involvement” look like to you?
I think it would be nice to have a digest mail to the list with the description of the breaking commit.

@edtanous
Copy link
Contributor

I think it would be nice to have a digest mail to the list with the description of the breaking commit.

Is that something you'd be willing to contribute? I suspect the community would find high value in such an email that looked at the commits, and picked out the breaking changes into a separate category.

"patches get merged too fast for me to have a chance to review them"

Given that even the slow, open for several month patches were not reviewed, it seems unlikely that the timeframe was the impediment to providing review comments, but I can't state that for certain. @sunharis, if those 8 commits had been in review for longer, do you believe they would've been reviewed by you or your team?

@edtanous
Copy link
Contributor

too much breaking refactoring

Can you be a little more specific about what you mean here? Are we talking about breaking changes to user-facing interfaces on upstream platforms? Of the things reported on phosphor-networkd, I think only one fell into that category (the dbus path change), and it was an oversight in understanding combined with missing test cases, and I don't believe is the norm. The vast majority of the phosphor-networkd patches went through without any impact.

@amboar
Copy link
Member

amboar commented May 2, 2023

It looks like the discussion has died down a bit here. @sunharis can you enumerate the specific changes you would like to see in community policy or behaviour? Ideally this would be through patches to openbmc/docs, but a list here in the comments might also work. With that information we can formulate a TOF vote on whether those are outcomes are worth pursuing.

Alternatively, it would be great to see the issue closed and further efforts taken to address the identified problems in the community, through discussions and contributions in Gerrit, on the mailing list, or in Discord.

@sunharis
Copy link
Author

sunharis commented May 5, 2023

@sunharis can you enumerate the specific changes you would like to see in community policy or behaviour?

I see below points are worth pursuing.

  1. Networkd repo must start following commit message template for each commits pushed. https://github.com/openbmc/docs/blob/master/CONTRIBUTING.md#formatting-commit-messages
  2. A change log document can be added in the repo similar to the README.md - to summarize if any major functionalities are removed/added/modified. Major functional changes can have this document updated in the respective commits.
  3. There are no inline comments in the code. Appreciate if high-level comments are added to major functions -or- a design document at the docs https://github.com/openbmc/phosphor-networkd/tree/master/docs
  4. A heads-up to be sent in the community (email/discord/design doc for review) while merging major refactors. eg: Making static & dynamic IPv4 co-exist.
  5. CI automation to add more tests covering static and dynamic network settings

Everyone who was following this issue are welcome to add any points to this list.

@amboar
Copy link
Member

amboar commented May 5, 2023

Thanks @sunharis. Regarding 2 I've started keeping a changelog for obmc-console as defined by https://keepachangelog.com/en/1.0.0/. It's a bit of an experiment, and I'm happy to report back on how it goes, but I think it can work. I'm hoping it will also encourage tagging and semantic versioning with how I intend to use it (but without disrupting our current auto-bump strategy in the bitbake metadata). I plan to do the same with libpldm as I have a bunch of API/ABI breaking changes in mind for that too.

My feeling is the remainder (points 1 and 3-5) can be dealt with by some vigilance in code review. I hope that we can achieve the regular and active engagement required there to help drive these changes.

@sunharis
Copy link
Author

sunharis commented May 7, 2023

My feeling is the remainder (points 1 and 3-5) can be dealt with by some vigilance in code review. I hope that we can achieve the regular and active engagement required there to help drive these changes.

Yes. But with the minimum 24 hours review window, I think it's tough to achieve. It should be followed as a ground rule by all contributors to the repo.

@amboar
Copy link
Member

amboar commented Oct 11, 2023

I think the issues have been addressed on the basis that the conversation has come to a stop. I'm closing this as a result, but please re-open it if you feel otherwise.

@amboar amboar closed this as completed Oct 11, 2023
@sunharis
Copy link
Author

Yes @amboar Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants