15:28:43 <gmcharlt> #startmeeting Evergreen Development Meeting, 17 February 2016
15:28:43 <pinesol_green> Meeting started Wed Feb 17 15:28:43 2016 US/Eastern.  The chair is gmcharlt. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:28:43 <pinesol_green> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:28:43 <pinesol_green> The meeting name has been set to 'evergreen_development_meeting__17_february_2016'
15:29:09 <berick> gmcharlt++
15:29:13 <gmcharlt> #link http://wiki.evergreen-ils.org/doku.php?id=dev:meetings:2016-02-17
15:29:25 <gmcharlt> #topic Introductions
15:29:32 <gmcharlt> #info Galen Charlton, ESI, release manager for 2.10
15:29:47 <dbwells> #info dbwells = Dan Wells, Hekman Library (Calvin College)
15:29:59 <DPearl> #info dpearl = Dan Pearl, C/W MARS Inc.
15:30:19 <berick> #info berick = Bill Erickson, King County Library System
15:30:32 <jlitrell> #info jlitrell = Jake Litrell, MassLNC
15:30:41 <kmlussier> #info kmlussier = Kathy Lussier, MassLNC
15:31:38 <gmcharlt> #topic OpenSRF release
15:32:23 <gmcharlt> #info release of 2.4.2 to occur later in February; main purpose will be fixing bug 1350457
15:32:23 <pinesol_green> Launchpad bug 1350457 in OpenSRF "method_lookup drops session info" [High,Fix committed] https://launchpad.net/bugs/1350457
15:33:03 <csharp> #info csharp = Chris Sharp, GPLS
15:33:30 <gmcharlt> #info a 2.5 series has been started; main proposed enhancements are support for passing client timezone and example websockets proxy configs
15:34:51 <gmcharlt> and in particular, if the Evergreen client time zone enhancement makes it in (bug 1485374), that will mean that OpenSRF 2.5.0 will become the minimum required version for Evergreen 2.10
15:34:51 <pinesol_green> Launchpad bug 1485374 in Evergreen "Use client TZ in the database when supplied to the server" [Wishlist,New] https://launchpad.net/bugs/1485374
15:34:55 <jeff> #info jeff == Jeff Godin, Traverse Area District Library (TADL)
15:35:12 <gmcharlt> so, that's that for my OpenSRF update; any comments or questions
15:35:25 <gmcharlt> by the way, jeff++ for sleuthing the failed fork issue
15:35:35 * gmcharlt awards jeff a virtual golden spork
15:35:41 <Dyrcona> :)
15:36:05 <jeff> :-)
15:36:10 <Dyrcona> #info Dyrcona = Jason Stephenson, MVLC
15:36:20 <gmcharlt> actually, we might discuss it now: specifically, the question of whether to fail fast or attempt to retry the fork
15:36:58 <Dyrcona> I vote for fail fast, because when you can't fork(), you're typically, um, forked.
15:37:31 <jeff> I'd like to limit the scope of the bug to fixing the issue of "the listener becomes the drone" -- this breaks an otherwise working system.
15:39:00 <Dyrcona> Well, I think you could possibly achieve what berick suggested on the bug by limiting the check to == 0 for the child pid.
15:39:03 <jeff> And given the choice of "client gets an error" or "client waits a little longer for a response", I opt for the latter.
15:39:22 <Dyrcona> Then if it was undefined, the listener would go on and do nothing, but that could make things worse.
15:40:21 <jeff> I agree with JBoyer that once the OOM killer has fired once, the machine is first in line for a reboot, but we don't currently try to detect or handle that kind of thing within OpenSRF/Evergreen at present, so that's probably left for tools better suited for the task.
15:40:29 <Dyrcona> I'm in favor of die(), but would live with wait().
15:40:37 <Dyrcona> Err, sleep() rather.
15:41:40 <jeff> Current behavior is the entire service is dead, and client requests time out.
15:42:16 <Dyrcona> Well, that and the listener turns into a drone. ;)
15:42:17 <gmcharlt> I'm also in favor of die(), but maybe we can compromise? e.g., to handle the case where the OOM is just caused by a bad Clark report, let it retry once, then die()
15:43:04 <jeff> I think (roughly) "detect failed fork, log error, sleep, defer to the next available drone" leaves us with good flexibility for recovery.
15:43:13 <Dyrcona> Did berick suggest a retry count and/or configurable sleep interval? Those could work.
15:43:47 <berick> no, I suggested basically what jeff said a line up.
15:44:17 <jeff> Overall, the bug has existed for a while, and likely isn't causing frequent problems. I was going to mark it a Low priority before adding my next comment on it. :-)
15:44:45 <jeff> I do like that it has us thinking a bit more about how to handle failures.
15:45:27 <gmcharlt> OK, I think we can move on
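The fork-failure handling discussed above (detect the failed fork, log the error, sleep, retry once, then give up) could look roughly like the sketch below. It is written in Python for illustration only; the listener code itself is not shown in this log, and `try_fork`, its parameters, and the logger name are hypothetical. Note that Python's `os.fork` raises `OSError` on failure, whereas Perl's `fork` returns undef, which is why the discussion distinguishes a return of 0 (the child) from an undefined one (a failed fork).

```python
import logging
import os
import time

log = logging.getLogger("listener")  # hypothetical logger name


def try_fork(fork_fn=None, retries=1, delay=1.0, sleep_fn=time.sleep):
    """Attempt to fork, retrying a limited number of times.

    Returns the fork result (0 in the child, the child's pid in the
    parent), or None if every attempt failed. The caller decides
    whether None means "die" or "defer to the next available drone".
    """
    if fork_fn is None:
        fork_fn = os.fork  # resolved lazily; os.fork exists only on Unix
    attempts = retries + 1
    for attempt in range(1, attempts + 1):
        try:
            return fork_fn()
        except OSError as exc:
            # A failed fork must never be mistaken for "I am the child".
            log.error("fork failed (attempt %d of %d): %s", attempt, attempts, exc)
            if attempt < attempts:
                sleep_fn(delay)
    return None
```

A caller would branch three ways: `None` means no drone could be spawned (fail fast or defer the request), `0` means the process is the new drone, and any other value is the drone's pid as seen by the listener. With `retries=1` this matches the "retry once, then die()" compromise.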
15:45:33 <gmcharlt> #topic Evergreen 2.10 update
15:46:40 <gmcharlt> #info Feature slush is end of day on 19 February; any enhancements should have a pullrequest on the bug by then (and be plausibly ready to merge, although there'll be leeway for further work the following week)
15:47:14 <gmcharlt> I'm leaving the definition of "end of day" intentionally squishy, but it won't stretch further than start of business on Monday, 22 February
15:47:55 <kmlussier> I'm sure late Monday is start of business somewhere in the world. :)
15:48:06 <gmcharlt> heh
15:48:14 <gmcharlt> start of business for MEEEEE
15:48:21 <kmlussier> How are things looking as far as having the web client ready for some production use? Does it still seem doable?
15:48:40 <berick> i'll have a pull request on the patron editor stuff soon
15:48:48 <kmlussier> berick++
15:50:25 <berick> before start of business monday in Hawaii
15:50:27 <gmcharlt> kmlussier: other than that... I think it will amount to a more solid beta for circ desk; I think some more Hatch work would be necessary
15:50:38 <gmcharlt> to get it more to fully production-ready
15:51:27 <gmcharlt> another update: I've started using the rm-to-write-notes tag in LP to signify that I'll provide release notes entries when I cut 2.10
15:51:47 <gmcharlt> these are meant for minor enhancements that can be described in one line
15:52:11 <gmcharlt> so, moving on to specific bugs
15:52:23 <gmcharlt> #topic Password management bug 1468422
15:52:23 <pinesol_green> Launchpad bug 1468422 in Evergreen "Improve Password Management and Authentication" [Undecided,New] https://launchpad.net/bugs/1468422
15:52:43 <gmcharlt> dbwells: berick: do you think it's basically ready for a pullrequest tag?
15:53:13 <berick> i believe so, yes, maybe w/ some light code cleanup/squashing
15:53:13 <gmcharlt> the other question, per the agenda, is dealing with the additional time it will take to log in
15:53:25 <berick> right..
15:53:35 <dbwells> I think so.  We've been running it in production for a while through various iterations, and no problems in the last few weeks.
15:53:41 <gmcharlt> berick: (and it occurs to me that you're in a great position to quickly check how high-volume SIP2 clients would deal with that)
15:53:56 <berick> gmcharlt: that's exactly my concern :(
15:54:09 * gmcharlt had a feeling
15:54:09 <berick> and how staff will feel about unhappy patrons, etc.
15:54:22 <dbwells> Nobody here has mentioned the additional login time, but we've got no automation of that sort.
15:54:51 <berick> thinking out of the gate, we lower the iteration count some.  we still get the benefits of better encryption and better data protection.
15:54:53 <gmcharlt> I think the patron OPAC login wait can be dealt with by putting up a spinner if need be
15:55:04 <gmcharlt> berick: what's the current iteration count?
15:55:10 <berick> 10, I think
15:56:22 <berick> actually, no, 14
15:56:43 <gmcharlt> berick: Koha uses a bcrypt variant with 8 iterations
15:56:55 <jeff> In your and dbwells' testing, what is the wallclock difference in time required to log in?
15:57:26 <berick> jeff: quick test shows .1 seconds to 1 second (roughly)
15:57:34 <berick> for just the auth calls combined
15:57:37 <kmlussier> berick: Is that an OPAC login?
15:57:40 <berick> i.e. via srfsh
15:57:47 <berick> opac will take a little longer w/ API overhead
15:57:57 <dbwells> maybe 1 second for us, I'd say
15:58:10 <berick> gmcharlt: good to know...
15:58:17 <Dyrcona> Does this require any client changes for logging in?
15:58:31 <berick> Dyrcona: no, it's all backwards compat.
15:58:48 <Dyrcona> berick: Thanks. I wasn't sure.
15:59:06 <gmcharlt> berick: dbwells: would it be relatively straightforward to add a way to tweak the iterations per password? in particular, to make it possible to lower it a bit for SIP2 accounts?
15:59:20 <gmcharlt> (if need be)
15:59:51 <berick> gmcharlt: hmm, as it stands, the iter count is configured by password type, not by individual password.
16:01:09 <dbwells> Not totally sure how the SIP2 workflow works, but one possibility would be an AuthProxy.pm plugin (or similar) to bypass the internal password check entirely.
16:01:31 <berick> we could look at moving it to the password.  not sure what that would take OTTOMY
16:01:58 <gmcharlt> right
16:02:30 <dbwells> i.e. There are new methods to get you logged in without actually logging in the client-y way.
16:02:54 <gmcharlt> so, a suggestion: we put a pullrequest on it; the number of iterations can be tuned based on benchmarking before we cut 2.10
16:02:55 <jeff> i've been meaning to dust off that "stop requiring that the ILS password be the SIP2 password" branch, too. might relate.
16:03:21 <gmcharlt> and I'll be willing to accept a late PR for a bug to tweak things for SIP2 authentication
16:03:28 <berick> gmcharlt: sounds good.  I'll start by trying 8 and see how that feels.
16:03:36 <berick> and I'll add a pullrequest
16:03:53 <gmcharlt> OK
16:04:13 <jeff> berick: since your environment was cited as having "high-volume SIP2 clients", are they really logging in often?
16:04:55 <jeff> berick: i.e., once logged in with an increased ~1s delay, aren't all subsequent SIP2 messages unaffected?
16:05:06 <jeff> (per SIP2 client-server session)
16:05:44 <gmcharlt> patron requests with PINs would get checked (though that's not relevant to the particular workflow that I think berick and I have in mind)
16:05:51 <berick> jeff: we do have some sip clients that log in and out w/ every auth check.  (not many, but at least 2 I can think of).  I'm actually more concerned about the patron auth checks that occur via sip
16:06:01 <jeff> ah.
16:06:04 <jeff> kill them with fire.
16:06:12 * jeff grins
16:06:18 <jeff> (i know, out of scope)
16:06:50 * gmcharlt sings the stunnel/SSH port-forwarding/VPN song, since SIP2 is immortal!
16:06:53 * gmcharlt then weeps
16:07:01 <gmcharlt> OK, I think we can move on
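To put the iteration counts above in perspective: if the count is a bcrypt-style work factor (an assumption; the log only says that Koha uses a bcrypt variant), it is the base-2 logarithm of the number of rounds, so lowering it from 14 to 8 reduces the hashing work by a factor of 2^6 = 64. The sketch below uses PBKDF2 from the Python standard library purely as a stand-in for timing, not the hashing used by the branch under discussion; `rounds` and `time_hash` are hypothetical names.

```python
import hashlib
import os
import time


def rounds(cost):
    """Number of rounds implied by a log2-style work factor."""
    return 2 ** cost


def time_hash(cost, password=b"demo-password"):
    """Return the wall-clock seconds needed for one hash at the given cost.

    PBKDF2-HMAC-SHA256 is a stand-in here; absolute timings will differ
    from bcrypt, but the scaling with the work factor is the point.
    """
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", password, salt, rounds(cost))
    return time.perf_counter() - start


if __name__ == "__main__":
    for cost in (8, 10, 14):
        print(f"cost {cost}: {rounds(cost)} rounds, {time_hash(cost):.4f}s")
```

Running a loop like this on the production hardware, against the real hashing function, is one way to do the benchmarking proposed before 2.10 is cut, since the acceptable delay for high-volume SIP2 checks depends on the machine.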
16:07:11 <gmcharlt> #topic Sqitch (bug 1521693)
16:07:11 <pinesol_green> Launchpad bug 1521693 in Evergreen "Investigate using Sqitch for database change management" [Wishlist,New] https://launchpad.net/bugs/1521693 - Assigned to Bill Erickson (berick)
16:07:38 <gmcharlt> so, I think where this stands is that berick presented this at the hackfest
16:07:46 <gmcharlt> got a lot of nods that this looks useful
16:07:55 <gmcharlt> and then... that's that
16:08:17 <gmcharlt> have folks played around with the branch at all?
16:08:41 <jeff> alas, not i.
16:09:07 * JBoyer finally can pay some attention.
16:09:18 <Dyrcona> No time. :(
16:09:49 <berick> and to clarify, the reason I'm pushing on this is that it will require a lot of effort to maintain the branch in limbo.  my strong pref. would be to move forward or kill it (for now).
16:10:15 <JBoyer> Hm. I haven't had a chance to look at the branch either. :( I do very much like the idea though; dependency based db building is great.
16:10:23 <berick> each DB upgrade has to be cross-ported, which I was doing for a while.  but now i've set it aside.
16:10:34 <berick> pending this discussion
16:10:47 <gmcharlt> my inclination (at the moment) would be to put it aside for 2.10, but merge it into master immediately after rel_2_10 is branched
16:11:04 <berick> agreed I was also assuming this would be post-2.10
16:11:22 <gmcharlt> that would then give us plenty of time at the beginning of the 2.11 cycle to see if we like it during dev
16:11:23 <JBoyer> I'd say let the existing branch sit as a proof of concept, unless it's so close to ready as to go in as gmcharlt mentioned.
16:11:47 <JBoyer> Talking it up a lot at the conference may get more exposure and opinions.
16:11:52 <jeff> +1 to post-2.10 sqitch merge
16:11:52 <berick> it could be made ready in a day or 2 if needed.  (can't say the same in 6 months, though)
16:12:18 <gmcharlt> and much easier to make ready immediately after 2.10 is cut, if I'm understanding things correctly
16:13:47 * berick should probably do another demo/discussion somewhere at the conf.
16:14:11 <jeff> berick: i'll drink to that.
16:14:13 <kmlussier> berick: I would appreciate seeing another demo
16:14:18 <berick> OK, I'll revisit after 2.10, spruce it up and slap a pullrequest on it then
16:14:28 <berick> jeff: kmlussier: OK.
16:14:31 <JBoyer> It would be a good state of evergreen thing too, "This is what we're thinking, speak now or etc."
16:14:37 <gmcharlt> OK, so one way or another, we have candidates for a set point in time for introducing Sqitch into master (either 2.10 release or right after the EG conference); I encourage folks to try out the proof-of-concept branch in the meantime
16:15:16 <gmcharlt> that brings us to the end of the agenda; are there any other topics that folks want to (briefly) raise before I end the meeting?
16:15:57 <gmcharlt> #action berick planning on making a pullrequest for sqitch support after 2.10; will also be a discussion item at conference
16:17:56 <gmcharlt> hearing no clamor for discussing another topic... I hereby
16:17:58 <gmcharlt> #endmeeting