10:00:59 #startmeeting 2016-03-08 - Evergreen focus group discussion on search
10:00:59 Meeting started Tue Mar 8 10:00:59 2016 US/Eastern. The chair is kmlussier. Information about MeetBot at http://wiki.debian.org/MeetBot.
10:00:59 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
10:00:59 The meeting name has been set to '2016_03_08___evergreen_focus_group_discussion_on_search'
10:01:24 #info Please read the ground rules for this discussion at http://wiki.evergreen-ils.org/doku.php?id=scratchpad:search:focus_groups
10:01:33 #topic Introductions
10:01:40 Please introduce yourselves, as follows:
10:01:48 #info kmlussier is Kathy Lussier, MassLNC
10:02:06 #info jeff is Jeff Godin, Traverse Area District Library (TADL)
10:02:16 #info ScottThomas is Scott Thomas, Pennsylvania Integrated Library System
10:02:32 #info alynn26 is Lynn Floyd, Anderson County Library, SCLENDS
10:02:42 #info rhamby is Rogan Hamby
10:03:01 #info jlundgren is Jeanette Lundgren, C/W MARS
10:03:55 #info JBoyer is Jason Boyer, Evergreen Indiana
10:04:06 OK, there are a couple of people who signed up for this one who haven't introduced themselves yet. If anyone else arrives, please feel free to introduce yourselves at any time.
10:04:52 To start, I just want to explain that the goal of this discussion is mainly to have a high-level discussion on what you all think an ideal search system looks like for Evergreen.
10:05:42 It really is just a time to brainstorm at this point. What I would like to do from these sessions is to pull all of the input together so that we're better able to paint a picture of what we ultimately want to see in Evergreen.
10:06:15 #info Yamil Suarez - Berklee College of Music (mostly lurking) yboston
10:06:17 If we can eventually gain community consensus around a direction, my hope is that we can ultimately do some development to make it happen. But that step is further down the road.
10:06:27 So I'm going to start with my first question.
10:06:36 #topic Strengths of current search
10:06:43 I want to start by asking people to identify what they like about the current Evergreen search. What are its strengths?
10:07:48 * kmlussier finds it easier to facilitate focus group discussions in person because she can stare people down. :)
10:08:18 * jeff laughs
10:08:18 Talking is also faster than typing (for better or worse, depending on the speaker ; ))
10:08:20 EG search can handle diacritics
10:08:27 faceted searching
10:08:36 yboston++ #Awesome, thanks for getting us started!
10:08:58 There is an attempt at auto complete.
10:09:26 EG can handle left-anchored and right-anchored search
10:09:26 ScottThomas: The autosuggest that tries to predict what you're typing?
10:10:06 left-anchored and right-anchored search: that means it can do partial matching on either side of a word
10:10:12 I believe the more correct term is autosuggest. Sorry.
10:10:34 ScottThomas: No, I think you can use either term. I just wanted to make sure we were talking about the same things. :)
10:11:08 EG can sort results in various ways: like date created
10:11:09 The reason I ask this question is because, if we make major changes to search, we want to make sure we don't lose the things that are already working.
10:11:31 So think of the question in terms of things you don't want to lose. I think all of the thoughts raised so far fall within that category.
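A minimal PostgreSQL sketch of the left-anchored and right-anchored matching described above. The title_entry table and column are hypothetical stand-ins for Evergreen's real index tables; the technique (a text_pattern_ops btree for prefixes, plus an index on the reversed string for suffixes) is standard PostgreSQL.

    -- Hypothetical index table standing in for Evergreen's metabib entry tables.
    CREATE TABLE title_entry (
        id    BIGSERIAL PRIMARY KEY,
        value TEXT NOT NULL
    );

    -- Left-anchored ("starts with"): a btree index with text_pattern_ops
    -- lets LIKE 'prefix%' use the index even in non-C locales.
    CREATE INDEX title_entry_left_idx ON title_entry (value text_pattern_ops);
    SELECT id, value FROM title_entry WHERE value LIKE 'symphon%';

    -- Right-anchored ("ends with"): index the reversed string, turning a
    -- suffix match into an indexable prefix match.
    CREATE INDEX title_entry_right_idx ON title_entry (reverse(value) text_pattern_ops);
    SELECT id, value FROM title_entry WHERE reverse(value) LIKE reverse('phony') || '%';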
10:12:01 EG allowed me to create a custom index on a bib 9xx tag (945) that my library uses for song titles
10:12:49 I was able to include "song titles" as one of the choices on top of keyword/title/author, etc.
10:12:56 yboston: So it sounds as if you like the ability to create your own indexes, even if those indexes don't fall in line with larger community needs?
10:13:06 kmlussier: yes
10:13:18 * kmlussier agrees that this is a very powerful feature of the current system.
10:13:41 Are there any other elements of search we want to make sure we keep before we move on to our next topic?
10:13:52 EG allowed me to specify a custom "format" of "electronic score" using the Multi Valued Field feature
10:14:12 It makes use of the MARC fixed fields when determining format. Our old ILS did not do that.
10:14:14 I'm hearing "flexibility" :-)
10:14:32 jeff: Yes, I think that's right. A lot of flexibility.
10:15:09 I would say depth also. It allows access to searching a lot of the MARC data that other ILSes often hide. Its usefulness depends on the quality of cataloging, but that will vary. :)
10:15:40 rhamby: Great, thanks for throwing that out there.
10:16:01 I'm going to move on to the next topic now, but if you think of anything else while we continue the discussion, feel free to put it out there.
10:16:13 #topic Areas of improvement
10:16:22 Where are the areas where you would like to see improvement in Evergreen's search?
10:16:58 I think there's been a consensus among larger groups around speed.
10:17:28 as powerful and useful as "stemming" can be, currently the implementation can be useless for certain words like: music, musician, musical
10:17:30 Yes, speed was a big focus of the discussion the developers had at the hack-a-way.
10:17:36 (As in more of it, that was kind of a partial thought)
10:17:43 since they all stem to "music"
10:18:03 (is stemming the right word in EG/postgres? can't remember)
10:18:14 yboston: yes.
10:18:26 One of the things we hear from patrons and staff is "unforgiving".
10:18:38 yboston: So it sounds like you're acknowledging that fuzziness is a desired feature, but that you would like to see more precision within that context?
10:18:48 Parallelization would also be helpful for this, and may allow other interesting improvements.
10:18:50 jeff: hmmm... can you elaborate on that?
10:19:09 (note: stemming is extremely configurable at the PG level, we just don't surface the PG features through the staff client)
10:19:18 kmlussier: yes, I kinda want it both ways to a degree
10:20:13 kmlussier: What, you want more than a single word? ;-)
10:20:21 JBoyer: Also, can you explain more what you mean by parallelization and where it could result in improvements?
10:20:32 jeff: How about 10? ;)
10:21:04 miker: thanks for bringing this up, I looked at the PG options but was not sure what to try
10:21:33 kmlussier: "unforgiving" meaning that if you provide a search string that is not found in a record, barring stemming and possibly some other things, you won't get the results you're looking for.
10:22:05 kmlussier: it sounds silly when you say "i want to see matches for things i didn't search for", but... that's the user expectation.
10:22:19 kmlussier: "find what i mean, not what i searched for"
10:22:23 jeff: that is one thing I hear from around here.
10:22:25 jeff: And these are things that if they searched in Amazon or Google, they would probably get the match they're looking for, right?
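A sketch of the PostgreSQL-level stemming configurability miker alludes to above. The english_nostem configuration name is made up; to_tsvector shows which lexemes a given configuration would index, and the custom configuration copies 'english' but maps plain words through the non-stemming 'simple' dictionary.

    -- Show the lexemes the stock English stemmer would index for the
    -- terms raised in the discussion.
    SELECT to_tsvector('english', 'music musician musical');

    -- A custom configuration (name made up) that copies 'english' but
    -- swaps the stemming dictionary for the non-stemming 'simple' one.
    CREATE TEXT SEARCH CONFIGURATION public.english_nostem (COPY = pg_catalog.english);
    ALTER TEXT SEARCH CONFIGURATION public.english_nostem
        ALTER MAPPING FOR asciiword, word WITH simple;

    -- Re-run the same probe against the custom configuration to compare.
    SELECT to_tsvector('public.english_nostem', 'music musician musical');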
10:22:28 kmlussier: or, in two words: "be fuzzier"
10:23:00 What I'm hearing, in conjunction with yboston's comments, is: smarter about the fuzziness.
10:23:02 Currently the speed of a search is limited to how fast a single core can handle the results of a query limited to 10000 (configurable) records. If instead we were running 10 cores against the results of 1000 records each, it would be difficult not to be somewhat faster. The other interesting improvements part I hadn't fully formed; I think I mean that in a larger backend context than just search.
10:23:03 kmlussier: pretty much.
10:23:34 JBoyer: Thanks for the further explanation.
10:24:10 jeff, kmlussier: for better or worse the best example I can think of for that is searching for au:Meyer,stephanie (that's not how she spells it) and the catalog saying that we don't have any of the Twilight books.
10:24:19 Since we're talking about the fuzziness vs. precision question, are there other ways the system could assist the user in finding what they need?
10:24:21 would fuzziness also improve search results for one-word titles and alternate author names?
10:24:21 re: jeff's point, that is.
10:24:42 I've always wondered how Amazon does their indexing and stemming because it is so much friendlier than any OPAC I've ever tried.
10:25:20 Here we get hurt by document density(?): we have great bib records with lots of extra 7xx and 945 (song title) tags for a musical album, but the top choices are tiny, horrid MARC records with only a 245 and a 100 tag for electronic records
10:25:25 ScottThomas: I could be wrong, but I think some of it is based on user behavior.
10:25:54 ScottThomas: If a title is offered and users click on it a lot, it's seen as a better match.
10:26:11 ScottThomas: a big part of amazon is human curation, and another is the fact that they link their stuff
10:26:28 (and what kmlussier said)
10:26:33 yboston: Yes, I hear that too. Shorter records seem to get higher relevance.
10:26:44 also, money. :-)
10:26:52 jlundgren: I don't know if fuzziness would address it, but could you describe the problem you're seeing?
10:26:53 jeff: also, data ;)
10:27:05 miker: also, data. agreed. :-)
10:27:15 jeff: Amazon has more money, but then those who don't have the money can learn from their lessons, right? :)
10:27:27 with one-word titles, they are often not at the top of search results, especially with common words (ex. The Blue)
10:28:18 yboston / kmlussier: re record quality, we already try to track that, and could easily make it more important. record size (by field count) is a huge component of that already...
10:28:21 OK, so better relevance for one-word titles (I'm guessing 245a)
10:28:47 For the author names, our libraries get frustrated when the name on the book doesn't match the MARC and results aren't returned when they type, for example, Judith Jance vs. JA Jance
10:29:08 Just a note: as we review the notes for these focus groups, we will follow up with what's doable and how to do it, so that we don't have to go into all the dirty details of what miker is mentioning now.
10:29:33 jlundgren: OK, thank you for the further clarification.
10:29:44 kmlussier: the algos that amazon/google use are 1) based on linked data 2) hard (as in, complicated, and cpu-expensive) ... ISTM that every other retailer would have good search if it was just a matter of learning from amazon
10:30:01 miker: gotcha
10:30:49 also, the search interfaces of large companies are not without error or fault. we shouldn't get hung up on imitating them.
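One hedged way to "be fuzzier" on author names like the au:Meyer,stephanie example above: PostgreSQL's pg_trgm extension supports trigram similarity matching. The author_entry table below is hypothetical; the extension, the % operator, and the similarity() function are real PostgreSQL features.

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Hypothetical author-heading table; a trigram GIN index supports the
    -- % similarity operator used below.
    CREATE TABLE author_entry (
        id    BIGSERIAL PRIMARY KEY,
        value TEXT NOT NULL
    );
    CREATE INDEX author_entry_trgm_idx ON author_entry USING gin (value gin_trgm_ops);

    -- A search for the common misspelling can still surface the real
    -- heading ("Meyer, Stephenie"), ranked by trigram similarity.
    SELECT value, similarity(value, 'meyer, stephanie') AS sim
      FROM author_entry
     WHERE value % 'meyer, stephanie'
     ORDER BY sim DESC
     LIMIT 5;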
10:30:51 So I'm hearing comments in terms of better speed, better balance between fuzziness and precision, some relevance issues.
10:30:55 kmlussier: gotcha (re details) ... I'll hold off, except where there may be a misunderstanding about what does exist (assuming I'm in the room -- which I'm not officially right now ;) )
10:31:16 (It's a ghost!)
10:31:21 miker: You do a good job of not being in the room
10:31:21 BOO
10:31:50 kmlussier: ...
10:32:00 ScottThomas: In the "things we like" part, you mentioned an attempt at autocomplete. The way you worded it made me think there was also a "room for improvement" piece to it.
10:32:02 i think that summarizing (at least the points i mentioned) as "be more like {google|amazon|whatever}" would be doing ourselves a disservice.
10:32:22 * dbs simul-attending a conf call, but wants to respond to kmlussier's "what do you want" with "speed - particularly with formats, copy locations, other options enabled"
10:32:25 jeff: No, I have no intention of summarizing it that way.
10:32:34 jeff++
10:32:37 dbs: Great, thanks!
10:33:03 * csharp notes that the original PINES focus group flipchart sheets from 2004/2005 are full of "search more like google/amazon"-type items
10:33:12 jeff: I was just using it as a reference point as to why users are expecting to find records when the search terms don't match.
10:33:37 Let's remember that CPU power is less of an issue for Amazon to throw at problems than it is for libraries, since they own what might be the largest server farms in the world. :)
10:33:58 "nice to haves" include: "show me EVERYTHING with copies in these locations at these libraries"
10:34:04 Autocomplete is a nice feature, but it seems wonky. When I type in "Harry" in the Keyword index, "Harriet" (Title) is the first thing that appears.
10:34:07 Autosuggest needs improvements: 1) work for screen readers (though it worked fine recently with a blind Berklee professor) 2) be able to "highlight" in red letters with diacritics
10:34:15 as opposed to "search for all records then filter them by a shelving location / etc"
10:35:10 jeff: use-specific API? IOW, does that need to be "search"?
10:35:16 rhamby: also, remember that there are ways to avoid needing prohibitive amounts of CPU. we're running most in-building search on a pretty modest VM or two.
10:35:18 it sounds like "browse" to me
10:36:19 Just a reminder that this is brainstorming for now, to throw ideas out there.
10:36:20 miker: I take that to mean "Here are some additional constraints, I should use these to tighten up the core query rather than checking for them ATF."
10:36:25 yboston: that's great news (autosuggest working recently with a screen reader)
10:36:36 I attended a session on Blacklight yesterday, and one of the presenters afterwards made a point about the goals of search vs. the goals of data storage, and that using the same mechanism for both can be problematic
10:36:45 One thing I find missing is item searching, completely. I am thinking I want all items (not titles) that have a certain shelving location and a certain circ modifier at a certain library.
10:36:56 miker: yup. no objection here. it might be nice if it could be somewhat transparent -- the backend decides to use the use-specific API based on the search input.
10:37:20 JBoyer: I took it as "what's on this shelf (and maybe let me order by my choice of attribute/field)" ...
10:37:21 alynn26: OK, so you would like to do more searching at the item level.
10:37:50 yes, rather than having to run reports every time.
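A sketch of the item-level query alynn26 describes above, written against asset.copy and asset.copy_location, which are real Evergreen tables; the specific shelving location name, circ modifier, and org unit id are made-up placeholders.

    -- All items (not titles) with a given shelving location and circ
    -- modifier at one library.
    SELECT acp.barcode,
           acp.circ_modifier,
           acl.name AS shelving_location
      FROM asset.copy acp
      JOIN asset.copy_location acl ON acl.id = acp.location
     WHERE acl.name = 'New Books'      -- assumed shelving location
       AND acp.circ_modifier = 'dvd'   -- assumed circ modifier
       AND acp.circ_lib = 101          -- assumed org unit id
       AND NOT acp.deleted;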
10:37:54 alynn26: that suggests a reporting interface or use case to me -- do you think the item-level view is something mostly for staff, or patrons?
10:37:57 Since we're about 35 minutes in, I would like to move on to the next question.
10:38:06 alynn26: is that something you want to expose to patrons?
10:38:16 I meant to mention this earlier: regularly there are requests to search copy notes. I would assume that could be staff-only, unless you want to look up the items your grandfather left to the library or something similar.
10:38:17 jeff: staff mostly
10:38:23 * miker just lets jeff read his mind from now on ;)
10:38:59 alynn26: yeah, we have nice reports for those kinds of things. sometimes people then click through to the opac view for a given bib from the report, which is handy...
10:39:16 #topic Features from other search tools / features that make Evergreen unique
10:39:22 miker: True, re: the "EVERYTHING" part, but in a broader sense, if I'm limiting to a specific library, the CQ could in theory be limited to items they hold.
10:39:26 Are there features you see in other search tools, particularly those that, like Evergreen, are primarily searching metadata, that you would like to see implemented in Evergreen? Conversely, is there something Evergreen search has that makes it unique?
10:40:04 (This is fresh on my mind recently because there's a lib with a common keyword that won't return results until you up the core limit to 50K+...)
10:40:29 For clarification, I'm highlighting search tools that search metadata because I think tools like Google are an entirely different animal. Full-text searching is different than what we're doing.
10:42:19 kmlussier: while realizing that you don't want to steer/influence the conversation too much, do you have any examples of such tools in mind? it might help to understand where you're coming from.
10:42:55 In a previous ILS, I again had created a local index for bib tag 945 (for song titles). Unlike EG, when I searched for a song it only matched words in a single 945 tag. I believe EG matches on words that may appear in adjoining 945 tags?
10:42:57 Yeah, I have lots of ideas, but I don't like to steer the conversation. An example of something that comes up all the time here is "Did you mean?"
10:42:58 I'll also admit that 99%+ of my searching happens in a FTS context (Google, Bing). I'm actually a poor library/journal user.
10:43:14 facet for publication date to limit result set by year of publication
10:43:22 I think this "adjoining tag words" issue can happen with repeated tags???
10:43:27 (in EG)
10:45:39 jlundgren: OK, I want to summarize that as facets that aren't based on data in MARC fields, but, in this case, the data is in the MARC fields. You would just want that facet to be treated differently than we currently do facets?
10:45:40 (yboston: yes, you can search across all 945s in any given record, if you choose)
10:45:53 miker: I want the opposite
10:45:54 jlundgren: IOW, I'm thinking you don't just want a facet with a long list of dates, but maybe a way to specify a range of dates.
10:46:11 yboston: you can have the opposite, too :)
10:46:17 miker: not sure how to do it
10:46:29 yboston: let's discuss later ;)
10:46:34 * miker dodges kmlussier
10:46:39 miker: thanks!!!
10:46:49 JBoyer: If there are features from those searches you think would be useful in Evergreen, go with it. Sorry, I didn't mean to keep you from bringing one up. :)
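A sketch of the range-style publication date facet jlundgren and kmlussier describe above, assuming a hypothetical flattened record/pub_year table rather than Evergreen's actual record-attribute storage.

    -- Hypothetical flattened attribute table: one publication year per record.
    CREATE TABLE record_pubyear (
        record   BIGINT PRIMARY KEY,
        pub_year INT
    );
    CREATE INDEX record_pubyear_year_idx ON record_pubyear (pub_year);

    -- A "typed" date facet: constrain results to a year range instead of
    -- presenting one facet value per distinct year.
    SELECT record FROM record_pubyear WHERE pub_year BETWEEN 2010 AND 2016;

    -- Or bucket by decade for a short, ordered facet list.
    SELECT (pub_year / 10) * 10 AS decade, count(*) AS hits
      FROM record_pubyear
     GROUP BY 1
     ORDER BY 1 DESC;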
10:46:54 If you had one in mind.
10:46:57 kmlussier: yes, that is correct
10:48:04 I'm going to give this question a minute longer before I move on.
10:48:13 Maybe everything has already been covered by the previous topic.
10:48:29 jlundgren / kmlussier: "typed limiters"
10:48:36 yes?
10:48:44 kmlussier: That was me seconding jeff's request that you maybe give an example of another metadata search tool. :) My primary desire would be a "Did you mean", which would be most easily implemented by storing every Q in the db and looking for "similar" ones. Difficult.
10:49:04 or, "typed facets" I guess ... so that they're data-type or context aware
10:49:12 "did you mean" can be "here, we tried your terms against some authority records"
10:49:19 "it's a year" ... "it's a size" ... etc
10:49:21 or lots of other options.
10:49:26 Yeah, I
10:49:42 jeff: that's actually what the jspac did, fwiw
10:49:48 authority records, I mean
10:49:49 All 3 are possible ways to implement it.
10:50:14 I'm not particularly interested right now in how the "did you mean?" gets implemented, but just that it's an example of a feature. Other examples of metadata search tools would be other library catalogs or even Amazon, which isn't searching the full text of the book.
10:50:29 But I also don't want to re-enter the discussion of "don't just make it like Amazon."
10:51:00 * kmlussier is moving on for now
10:51:09 #topic Factors used in relevance
10:51:23 I'd like to see a "users that did this also did this" type of feature, but it's not necessarily search-specific (and patron privacy concerns may lurk thereabouts)
10:51:46 We already addressed this a bit above, but the question comes from conversations I've had where we say "we want to improve relevance," and the first question I get back is "what do you mean by relevance?"
10:52:05 My question, then, is what factors do you think are important when ranking search results by relevance?
10:52:41 JBoyer: Thanks for putting that out there. It might not be primarily search, but I do think it's related because it helps people find what they're looking for.
10:53:40 Another way to phrase the question: when you type a search, what should determine which record comes up first?
10:53:48 Sorry to jump in late, but I think our biggest desire is also generally in the speed camp, but more specifically, consistency and predictability of that speed. People generally think (and rightly) that smaller result sets mean quicker turnaround, but then get flummoxed when trying to reduce the set size by choosing a location or org_unit, and this actually makes their search take longer.
10:53:55 The obvious one is recency. Patrons seem to want to see new stuff first.
10:54:13 ScottThomas: So recency should play a role. Good.
10:54:14 Date published, holdings count (more = better), circ count in the last year (possibly as a percentage of circs across a Dewey range or subject group; that's very fuzzy)
10:54:28 ScottThomas: But I assume recency isn't the only thing if those search words are buried in the notes, right?
10:54:51 as a counterpoint, I don't think for academic libraries there is such a strong desire for recency
10:54:53 JBoyer: Interesting, so you want to make sure activity/popularity is included.
10:54:59 "recency" can really bury relevant results for some users/searches.
10:55:23 we once had a request for "the default search sort should be pubdate newest to oldest" -- it was withdrawn rather quickly.
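A sketch of the stored-query "Did you mean?" approach JBoyer outlines above. The query_log table is hypothetical; pg_trgm supplies the "similar ones" comparison.

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    -- Hypothetical log of patron searches and how many hits each produced.
    CREATE TABLE query_log (
        terms     TEXT NOT NULL,
        hit_count INT  NOT NULL,
        logged_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
    CREATE INDEX query_log_trgm_idx ON query_log USING gin (terms gin_trgm_ops);

    -- On a zero-hit search, suggest the most similar logged query that
    -- actually returned results.
    SELECT terms
      FROM query_log
     WHERE hit_count > 0
       AND terms % 'twilite'                      -- the failed search string
     ORDER BY similarity(terms, 'twilite') DESC
     LIMIT 1;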
10:55:38 Of course, any one factor used can make one search better and another search worse. Search is hard.
10:55:43 * jeff nods
10:56:23 i've drawn the comparison to at least one children's toy -- where you push down on this peg, and this other peg pops up... :-)
10:56:32 From the discussion up above, I know shorter records were seen as an issue with relevance; I'll read the logs to see if there are any others that came up.
10:56:43 jeff++
10:56:51 Our demographics skew older, so our patrons seem to like sorting results newest to oldest because they are used to pre-Google OPACs.
10:57:16 I already mentioned earlier that cover density has frustrated us, when really short (and sloppy) records always float to the top. but apparently there are ways that can be addressed
10:57:30 ScottThomas: When you say newer, are you referring to publication date or when the record was added?
10:57:50 * kmlussier will need to leave more time for the relevance question next time.
10:58:31 We have two minutes left, so I want to ask my last question.
10:58:40 also, we would want more relevance given to certain tags like 1xx or 245, but currently there is a big performance hit to do that(?)
10:59:28 yboston: Relevance for specific MARC fields. I can answer your question regarding performance after we wrap up.
10:59:44 But I'm noting it down as something that is seen as important for relevance.
10:59:52 Last question
10:59:55 We talked about lots of improvements here, but if there was just one improvement that you would make to search, what would it be?
11:00:14 The one and only thing you would do to make search better.
11:00:23 speed
11:00:25 speed.
11:00:49 same.
11:01:28 (and hopefully it goes without saying, but speed without degradation of quality)
11:01:52 jeff: I think it should be stated.
11:01:58 Anyone else?
11:02:33 A certain speed threshold is a magic elixir for search. Users are much more willing to accept a bad search result if trying again is super fast.
11:02:53 OK, I'm going to end the meeting, but if anyone else who was actively participating wants to chime in on that question, feel free to let me know. I'll be here all day.
11:02:56 #endmeeting
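A sketch of the 1xx/245 field weighting yboston raises near the end, using standard PostgreSQL full-text primitives: setweight tags lexemes by field group, and ts_rank_cd (PostgreSQL's cover-density ranking, the behavior yboston mentions) accepts per-weight multipliers. The field-to-weight mapping shown is assumed for illustration, not Evergreen's actual index configuration.

    -- Build a weighted vector: a title-ish field as weight A, an author
    -- field as B, a note field as D (assumed mapping).
    WITH rec AS (
        SELECT setweight(to_tsvector('english', 'The Blue'), 'A')
            || setweight(to_tsvector('english', 'Jance, Judith A.'), 'B')
            || setweight(to_tsvector('english', 'includes bibliographical references'), 'D')
               AS vec
    )
    -- The array gives multipliers for weights {D, C, B, A}, so lexemes
    -- from the 245/1xx slots count far more than note-field matches.
    SELECT ts_rank_cd('{0.05, 0.2, 0.6, 1.0}', vec, to_tsquery('english', 'blue')) AS rank
      FROM rec;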