Weekly Meeting 2016 12 07
Kristen Carlson Accardi edited this page Dec 8, 2016
- roll call (1 minute)
- opens (5 minutes)
- bugs (10 minutes)
- triage
- scrub
## Minutes
- obedmr has started a project for a container-based ciao development environment: https://github.com/obedmr/docker-ciao
- We are behind on bug triage and never reach the scrub section of the agenda.
- It was agreed that the project lead (kristenc) will become the bug triager and triage offline, freeing our meetings to discuss other things.
- We will have no meetings for the rest of the year due to holidays.
### Full IRC Log
<kristenc> ok meeting is started.
<kristenc> #topic Roll call
<jvillalo> o/
* anunez9 (86868b46@gateway/web/freenode/ip.134.134.139.70) has joined
<tpepper1> o/
<btwarden> o/
<mcastelino> o/
<markusry> o/
<rbradford> o/
<mrkz> o/
<kristenc> #topic Opens
<kristenc> anyone with opens please chime in.
* p4tux has quit (Remote host closed the connection)
<obedmr> a quick one, if someone wants to take a look on ciao on top of docker containers, I'm creating this initial project https://github.com/obedmr/docker-ciao
<kristenc> obedmr, this is your developer mode setup?
<obedmr> kristenc: yes
<kristenc> ok, we have gotten behind in our bug triaging.
<kristenc> here's this week's issues.
<kristenc> https://github.com/01org/ciao/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aopen%20created%3A%3E%3D2016-12-01
<kristenc> we can do last weeks if we finish these.
* pixelgeek (cnorman@nat/intel/x-xbcbiwzvzngngycj) has joined
<kristenc> #910 - need bat tests for external IPs. I was thinking this was a p2. I'm not sure how to do this outside of singlevm.
<kristenc> it is tested currently in verify.sh
<tpepper1> it'd be useful to figure out, since that will inform us how an operator would use the feature in the real world
<markusry> So we couldn't run them on the release cluster
<markusry> right now
<markusry> ?
<mcastelino> kristenc, we should be able to test with BAT on single VM
<kristenc> you need to have control over the pool I think.
<mcastelino> but if we run on a real cluster... there will be cluster specific setup for routing
<kristenc> mcastelino, so - we could potentially add them in, but disable them
<tpepper1> to test in single VM, what IP pool do you use?
<mcastelino> we use a virtual pool... we do that today
<mcastelino> and setup routes for that pool
<mcastelino> that pool is contained to be within the machine
<mcastelino> as the machine hosts the whole cluster
<tpepper1> i see
<kristenc> if our release cluster were moved to a private network, then we could add the tests.
<kristenc> and document how to do it for others.
<kristenc> we could as a first step add the tests and only run in singlevm.
* piet ([email protected]) has joined
<kristenc> ok if nobody objects, P2?
<kristenc> #912
<kristenc> this is true - but my feeling is it's a p3 for now.
<mcastelino> tpepper1, for reference https://github.com/01org/ciao/blob/master/testutil/singlevm/verify.sh#L258
<mcastelino> this is how we setup the pool routes in single VM
<tpepper1> makes sense
<kristenc> #916 - I agree, I hate the error handling in ciao-cli.
<markusry> agree.
<kristenc> the stack trace is useful if you are a programmer.
<markusry> It's not even that useful
<markusry> as ciao-cli is so small
<tpepper1> in this case I don't think it's useful
<tpepper1> the printf of the error says what's wrong. the trace doesn't contribute
<markusry> each command corresponds to a function
<btwarden> The bigger issue is that the underlying service is returning 500 in that case, so there's not enough information for ciao-cli to formulate a reasonable response, right?
<markusry> so it's easy to figure out where the problem must be
<kristenc> well - you could very easily change the code to dump a stack trace with -v=2 or something, but otherwise just do nicer errors.
<tpepper1> btwarden: yes. 500 should lead to a message like "instance does not exist"
<kristenc> btwarden, our error response codes from the compute api leave much to be desired.
<kristenc> however, cli can respond more gracefully.
<markusry> I don't think the stack trace is useful in ciao-cli
<markusry> If you got the stack trace of controller that would be interesting
<btwarden> Seems like it should either be a 400- or 200- response in that case.
<markusry> But you don't get that
<mcastelino> markusry, we do concatenate errors when we generate errors...
<rbradford> we can shove a message into the 500 code
<kristenc> a not found should be a 400.
<rbradford> or whatever code
<kristenc> 40x
<kristenc> it's a separate bug though.
<btwarden> 404 not found, 410 gone...
<tpepper1> is 500 the not auth'd?
<kristenc> the error response should already contain an error code.
<kristenc> tpepper1, no 500 is the generic server error code.
<rbradford> tpepper1, "Internal service error"
<mcastelino> markusry, can we report the concatenated error as an event
<kristenc> i think the instance code in cli doesn't report the error message right either.
<mcastelino> is there a way to co-relate the API call to event
<btwarden> 500 usually suggests there's a problem internal to the underlying service
<tpepper1> hm. so does the compute api return 500 when the instance is not found?
<kristenc> i'd have to check.
<rbradford> https://golang.org/pkg/net/http/#Error
<rbradford> we should shove our error message into the message
<tpepper1> well. in this case I'm less concerned with the specific error than the ugly trace
<kristenc> rbradford, we do already.
<tpepper1> there may be 29 other issues in error handling besides the trace
<markusry> There are multiple issues here
<rbradford> kristenc, oh, i've seen some places where that's not the case, but in this case I see that it is there.
<kristenc> so - this issue specifically is about how unhelpful the stack trace is.
<tpepper1> yes that was my intent
<markusry> 916 is about the traces output from ciao-cli that don't have any useful purpose
<markusry> Right
<kristenc> rbradford, I noticed recently there's a bug in cli where they don't print the returned error message.
<tpepper1> i retitled the issues
<tpepper1> issue
* kristenc forgot to file though
<kristenc> thanks - so for this issue, I would vote for P2 or P3.
<markusry> At least a P2 for me
<markusry> It's quite a bad user experience
<kristenc> ok - P2.
<kristenc> #917
<kristenc> singlevm runs all services as root.
<kristenc> we used to be able to run most ciao services without root.
<kristenc> then we had to change this - it was because of needing to access /var, right?
<mcastelino> kristenc, we need root for networking netlink calls and qemu
<mcastelino> the qemu issue is easier to solve
<kristenc> for launcher.
<markusry> kristenc: we could make it work by just chown /var/lib/ciao
<kristenc> ?
<markusry> in single VM
<markusry> setup.sh I mean
<markusry> I entered a separate bug for launcher
<markusry> So for me we should be able to fix this by creating /var/lib/ciao with the correct permissions
<tpepper1> isn't there an older issue for having ciao components run as a ciao user/group with them owning /var/lib/ciao?
<markusry> and then running controller and scheduler as non-root
<btwarden> Or can we even create it under ~/local instead?
<markusry> That's the other option
<tpepper1> doing /var/lib/ciao right is good for real usage and singlevm
<tpepper1> I'd prefer that route
<tpepper1> otherwise you "fix" just singlevm
<markusry> controller now allows you to select your prefix
<kristenc> it'd be nice to fix the real install to not need to run the services as root either.
<markusry> We could do this for the other components
<tpepper1> also if the perms are wrong and glog fails to open the specified file it falls back to /tmp
<tpepper1> which might constitute a leak of info depending on the +r mode bits there
* p4tux ([email protected]) has joined
<kristenc> ok - so for priority - p2?
<markusry> okay for me
<kristenc> #918 then would be the same priority?
<markusry> Sounds good to me
<kristenc> #920 - I don't think we'll be moving away from sqlite3 in the near term (next 3 months).
<kristenc> rbradford, what is the current code coverage for sqlite code?
<tpepper1> 72 ish
<rbradford> yup
<kristenc> I think it's at a sufficient level then to not prioritize this highly, as long as any new code comes with unit tests.
<rbradford> there is one area that has no coverage
<rbradford> which is instances
<rbradford> i mean, i don't think they're an important part....
<rbradford> :-)
<kristenc> heh. well- the good news on that is we do get a lot of code coverage with bat tests on that.
<kristenc> so a breakage would likely be caught.
<rbradford> sure
<kristenc> i think we should just be careful if we touch any of the instances related code in sqlite3 to add a unit test to make sure we don't regress.
<kristenc> so I am thinking of a p3 for this.
<tpepper1> I'd label janitorial and p2 it
<tpepper1> would be a nice area to bulk up coverage and get the nice line by line views the tools give you
<kristenc> we'd need to establish a criteria for closing it.
<tpepper1> could guide / prioritize further additions or pragmatic skips
<kristenc> 100% coverage probably isn't realistic.
<tpepper1> if the issue says pragmatically add coverage for key mutations of instances, I'm happy
<tpepper1> not sure if you consider that "criteria"
<kristenc> ok - let's make this one about adding unit tests specifically for instances?
<kristenc> well - we can leave it, I don't want to spend our time on it now.
<kristenc> #921
* Patifa ([email protected]) has joined
<kristenc> this issue is pretty broad, but I would make it a p3 janitorial.
<kristenc> parts should probably be broken out into separate issues when someone has the bandwidth to work on it.
<rbradford> good plan
* Patifa has quit (Client Quit)
<kristenc> #924
<kristenc> albertom, would you say this is a p1 bug for you?
<obedmr> I'd say yes
<albertom> its a dup
<albertom> of
<albertom> https://github.com/01org/ciao/issues/943
<albertom> which already has a fix
<kristenc> ok - we'll leave it as a p1, when you confirm the fix for 943 fixes this problem you can close it.
<albertom> ok
<kristenc> #928 - markusry can we close this? I've not had any problems lately.
<kristenc> although sadly I can't just go check the status and see. I will tomorrow.
<markusry> Yes, let's close it
<kristenc> ok.
<markusry> I'll re-open it if it re-appears
<kristenc> #929
<kristenc> this is a continuation of a feature.
<mrkz> obedmr: ^
<obedmr> yesp
<obedmr> yes, next sprint task
<kristenc> yes - that is what I was thinking.
<kristenc> so I would say it's a p1, but a feature not a bug, and we'll schedule it into the next sprint.
<obedmr> kristenc: sure
<obedmr> agree
<kristenc> #940 this is a dup of one we used to have that we closed because we all agreed storage should be bat tests only.
* mrkz has to run
<tpepper1> I must've forgotten we agreed that
<kristenc> ok - sorry, we are running long. let's wrap up #940 then end this meeting.
<markusry> I try to figure out if I can compute the coverage of the BAT tests
* anunez9 has quit (Quit: Page closed)
<markusry> I'll I mean
<tpepper1> if we can figure that I bet we get a nice boost in computed coverage
* p4tux has quit (Remote host closed the connection)
<tpepper1> versus the coveralls reported number
<kristenc> so - do we need to revisit the decision to have ciao-storage be unit tested? or should we close #940 wontfix?
<rbradford> tpepper1 adds some unit tests in his PR
* p4tux ([email protected]) has joined
<tpepper1> I suppose things that require a server to function are either bat-only or need testutil fakery
<tpepper1> the noop driver is trivially easy to move to 100% coverage :)
<kristenc> yeah - that was why we opted for bat only.
<kristenc> I stand by the original decision :). it would be nice to confirm that our test coverage is good with bat though.
<tpepper1> the only issue with that is we don't get a gate on it
<tpepper1> add broken code, don't test it in bat, nothing notices
<kristenc> something notices eventually. release would fail.
<kristenc> the storage bats are run nightly.
<tpepper1> that ensures we don't regress for things covered by bat
<tpepper1> we don't have a gate to ensure bat coverage isn't small and shrinking
<kristenc> that part is true - if mark can get coveralls to report on our bat coverage then code coverage % can be enforced with travis.
<tpepper1> yes
<tpepper1> so close it and we'll see where we gert
<tpepper1> get
<kristenc> ok.
<kristenc> 3 things before we quit.
<kristenc> 1) no meetings for dec 22,29
<kristenc> will most of the team be here next week?
<kristenc> or should we cancel that too?
* p4tux has quit (Ping timeout: 260 seconds)
<kristenc> also - we are really not keeping up with the bug triage.
<kristenc> I think we need to schedule a special meeting to get caught up.
<tpepper1> I'm here next week. and am up for an extra triage meeting
<kristenc> ok - perhaps next week we could go 2 hours?
<rbradford> no mark and I
<kristenc> ah.
<kristenc> that's going to make it hard.
<kristenc> so - how about in Jan we dedicate a chunk of time to get caught up.
<kristenc> there will likely be weeks worth of issues to triage anyway due to the no meetings.
<kristenc> i wonder also if this is sustainable long term.
<kristenc> to try to have us triage as a group.
<kristenc> it was easy when there were a few issues per week.
* p4tux ([email protected]) has joined
<tpepper1> it would be nice to have time to discuss architecture and not just bugs
<kristenc> it becomes harder and harder with more issues piling up. it'd be simpler to have a single person triage all the stuff.
<tpepper1> i think we need two meetings
<tpepper1> conversation around these things is important
<tpepper1> but we could also have a bug master like we have a merge master
<kristenc> personally, I'd rather just delegate bug triage to a person.
<kristenc> and let people complain later if they disagree.
<kristenc> i like the idea of a bug master.
<kristenc> does anyone think this is a bad idea?
<tpepper1> some portion of the bug triage is prioritizing work assignments. should the team lead always be one of the bug masters?
<kristenc> yes - I can do it by myself for now. this is fine and makes sense to me if everyone is ok with it.
<kristenc> people will need to review the priorities at some point though to make sure I'm not off.
<tpepper1> would that happen if we did buddies like the merging? lead and a rotating buddy do triages?
<tpepper1> still implies a meeting if there's a 2nd person
<kristenc> tpepper1, yes - that is my hesitation. I want to make this easier and easier means less needing to coordinate with everyone.
<kristenc> if I did it myself, I could do it in 5 minutes a day.
<tpepper1> then let's stop talking and get doing :)
<kristenc> thanks. I am assuming silence from the rest of the crew means agreement.
<kristenc> so I will make it so.
<kristenc> that's actually very exciting. our meetings will now become utterly different.
<rbradford> cool
<kristenc> so next week - should we cancel since UK is out?
* Teresita-Warrior has quit (Quit: Leaving.)
<kristenc> I think we should.
<rbradford> i would recommend that
<kristenc> so unless people object, let's cancel next week as well. I'll update the wiki.
<kristenc> ok, this meeting is over.
<kristenc> I'll have to cut and paste the logs into the minutes later. sorry for the confusion with meetingbot, I wish I'd noticed it in time to fix it.