13:01:23 <kmlussier> #startmeeting 2016-03-11 - Evergreen focus group discussion on search
13:01:23 <pinesol_green> Meeting started Fri Mar 11 13:01:23 2016 US/Eastern. The chair is kmlussier. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:23 <pinesol_green> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:23 <pinesol_green> The meeting name has been set to '2016_03_11___evergreen_focus_group_discussion_on_search'
13:01:30 <kmlussier> #topic Introductions
13:01:44 <kmlussier> Please introduce yourselves, as follows
13:01:50 <kmlussier> #info kmlussier is Kathy Lussier, MassLNC
13:02:04 <tspindler> #info tspindler is Tim Spindler, C/W MARS
13:02:14 <Elaine> #info Elaine is Elaine Hardy PINES/GPLS
13:02:30 <Cybrarian> #info Cybrarian is Jennifer Walz from Asbury University.
13:03:40 <kmlussier> While we wait for more people to come in, I'm posting a link to some ground rules I wrote up. http://wiki.evergreen-ils.org/doku.php?id=scratchpad:search:focus_groups
13:03:59 <kmlussier> Basically, what they say is this is primarily a brainstorming discussion to discuss what we would like to see in search.
13:04:26 <berick> #info berick is Bill Erickson / KCLS
13:04:34 <Cybrarian> Will we talk about specifics of the software? Or in general about opacs and book searching?
13:04:41 <kmlussier> So it's really an opportunity to talk about ideas. Although it's tempting, especially for technical folks, to get in the nitty gritty of how to make those ideas happen, we don't want to get bogged down in the details today.
13:04:50 <kmlussier> Cybrarian: We're talking specifically about Evergreen.
13:04:55 <tspindler> kmlussier: the ground rules link returned a 404 error
13:05:04 <kmlussier> bah
13:05:13 <Elaine> I got the 404 error as well
13:05:17 <kmlussier> Oh, the line broke
13:05:33 <kmlussier> #link http://wiki.evergreen-ils.org/doku.php?id=scratchpad:search:focus_groups
13:05:37 <kmlussier> That should work.
13:05:51 <tspindler> kmlussier ++
13:05:54 <Elaine> It does work, thanks!
13:06:08 <kmlussier> Also, as is the case with any brainstorming, there are no stupid ideas. Feel free to share whatever ideas you have. And please don't be critical of others' ideas.
13:06:19 <kmlussier> However, I encourage you to build on each other's ideas.
13:06:28 <kmlussier> At this point, I think I've repeated everything that's in the link. :)
13:06:52 <kmlussier> I'm going to kick off the discussion, but if people wander in late, please feel free to introduce yourselves.
13:07:02 <kmlussier> #topic Strengths of current search
13:07:03 <Cybrarian> With regards to the evergreen interface for the public, I would really like a way to let them search by collection, not just media type.
13:07:17 <kmlussier> OK, we'll get to improvements soon :)
13:07:27 <kmlussier> I want to start by asking people to identify what they like about the current Evergreen search. What are its strengths? Is there anything that Evergreen search has that makes it unique?
13:07:43 <Cybrarian> It is really quick.
13:08:07 <Cybrarian> The interface is uncluttered for the most part.
13:08:24 <Elaine> Retrieval is comprehensive
13:08:25 <tspindler> I'm not sure this is related to search but the ability to customize the interface is a strong point
13:08:54 <Elaine> Good filtering
13:09:01 <kmlussier> tspindler: Well, there is also the ability to customize what you search, so in that respect, I would say it's related to search.
13:09:03 <tspindler> Filtering ++
13:09:26 <kmlussier> Elaine: And is that the pre-set filters from the advanced search screen, the facets, or all of the above?
13:09:32 <Cybrarian> I like the filtering, but also would like additional options to customize further.
13:09:38 <berick> agreed on filtering, specifically item level (and thereabouts) filtering is strong/flexible
13:09:44 <Cybrarian> I like the ability to customize the interface - to a point.
13:09:52 <Elaine> all of the above
13:09:52 <tspindler> filtering on advanced screen is very good
13:10:13 <Cybrarian> I like the browse by shelf feature - though it does confuse people some.
13:10:39 <kmlussier> Yes, the browse shelf feature is nice.
13:11:03 <Cybrarian> Advanced ++
13:11:03 <kmlussier> The reason I ask this question is because, if we decide to make changes to search, we want to make sure we don't lose the things that are already working well for us.
13:11:38 <Elaine> I don't think the problem is the search but rather how it displays the results to you.
13:11:52 <Cybrarian> I always say that you should never change any functional parts, instead add new options for people to choose.
13:12:08 <Cybrarian> I will agree with Elaine on the display.
13:12:16 <kmlussier> Cybrarian: Yes, true. I agree. But sometimes, it's easier said than done. :)
13:12:36 <kmlussier> And if you're rebuilding (not necessarily saying that we are) it's easy to forget some things if they aren't explicitly stated.
13:12:50 <Elaine> I should say the main problem. I do have some problems when we get to that part of the discussion
13:13:05 <Cybrarian> Indeed. Because the function has become part of the language of what you do, you forget that it is essential.
13:13:17 <kmlussier> Are there other strengths people would like to bring up? Or anything that makes Evergreen's search unique when compared to other systems?
13:13:50 <phasefx> not that we'd ever go in the opposite direction, but it's nice having "sessionless" searches
13:13:59 <kmlussier> phasefx: YES!
13:13:59 <phasefx> so URLs are shareable
13:14:02 <Cybrarian> One feature? is how the subjects are searched.
13:14:15 <kmlussier> Cybrarian: Can you expand upon that?
13:14:30 <Cybrarian> Well, we also just did a re-index so now it is better.
13:14:55 <Cybrarian> But I still am not completely satisfied with how the LC subject headings get searched.
13:15:05 <Cybrarian> But I am still not sure that I understand why.
13:15:26 <Cybrarian> I think our re-index has helped some.
13:15:44 <kmlussier> OK, I think I'm going to move to the next topic, and maybe we can get more info on that.
13:15:54 <kmlussier> But if you think of anything else, feel free to shout it out later on.
13:15:57 <Cybrarian> I'm also a little "old school" when it comes to LCSH. :-)
13:16:06 <kmlussier> #topic Areas for improvement
13:16:14 <kmlussier> Where are the areas where you would like to see improvement?
13:16:24 <kmlussier> And we've already touched upon a few of these.
13:16:26 <Elaine> Authority links should be in advanced search
13:16:30 <tspindler> kmlussier has heard this from us but it is speed and relevance
13:16:42 <berick> 1. search speed. 2. indexing/ingest speed
13:16:56 <tspindler> ++ to indexing/ingest speed
13:17:20 <Elaine> When you search author should be keyword -- when I search for au Mary Jones I don't want to see titles with Mary smith and David Jones
13:17:27 <Cybrarian> More browse options?
13:17:36 <jihpringle> a "did you mean..." would be awesome (especially for the KPAC)
13:17:38 <Cybrarian> Searching by collection
13:18:09 <tspindler> improvements to failed searches, in particular I would like to see a 0 results search at least on title and author drop you into browse search
13:18:37 <Cybrarian> More hyperlinking
13:18:47 <kmlussier> Elaine: Are you saying that when you do an author search, the hits retrieved are based in words in other fields?
13:19:05 <Elaine> I would like to see an intermediate result screen so that if you searched for au mary jones you would get a list of authors not a list of titles.
13:19:06 <kmlussier> Cybrarian: What kind of browse options would you like to see?
13:19:07 <tspindler> including see references in author search index and subject search index
13:19:36 <Cybrarian> ++ for Elaine suggestion!
13:19:47 <Elaine> Kmlussier -- hits are based on words in all author fields for an author search, for example
13:20:08 <phasefx> * UI for hand-tweaking metarecords
13:20:09 <abneiman> from the lurk-gallery +1 on search by collection -- we have a sorta-kludge in place for this thanks to ESI but would be nice to have it native. Also +1 on "did you mean".
13:20:11 <kmlussier> tspindler: Can I rephrase that one in a different way? Maybe...finding good ways to make use of cross-references when keyword searches? It might be adding them to indexes, but maybe there are other ways to make use of them.
13:20:24 <Cybrarian> Browse - call number, publisher, date?
13:20:28 <tspindler> kmlussier: that's good
13:20:29 <kmlussier> Elaine: Ah, gotcha. Thanks for explaining.
13:20:34 <phasefx> #info phasefx is Jason Etheridge, ESI
13:20:54 <Elaine> tspindler -- that is what I mean by tying to authority file
13:21:15 <kmlussier> phasefx: I've just been delving into metarecords recently. Do you mean handtweaking for when there are bad groupings? Or do you envision other tweaking?
13:21:23 <tspindler> including prefixes in call number searches
13:21:42 <phasefx> kmlussier: putting records in and out of groupings for whatever reason
13:21:50 <kmlussier> phasefx: Excellent! Thanks!
13:22:11 <tspindler> phasefx: what kinds of groupings, not sure I understand
13:22:18 <kmlussier> Cybrarian: I'm also interested in what you mean when you say searching collections? Is this something outside of using copy locations?
13:22:30 <kmlussier> tspindler: For the group formats and editions searches
13:22:35 <Cybrarian> Browse by journal title would also be nice. :-)
13:22:42 <phasefx> * ability to display canned information for specific searches
13:23:12 <Cybrarian> I probably do mean copy locations, but right now we can't seem to do even that.
13:23:26 <Cybrarian> And I have not seen an implementation of it that is what we really need.
13:23:42 <tspindler> I know our cataloging head has issues with LC searches and subheadings but trying to remember details
13:23:47 <berick> phasefx: can you expand (on canned)?
13:23:48 <Elaine> Cybrarian: copy locations should appear at branch level searching
13:24:01 <Cybrarian> Maybe a way to do a combination of copy location and circ mod? I think we want to be too creative. :-)
13:24:19 <kmlussier> Cybrarian: There is no too creative for today. Too creative comes up later.
13:24:25 <phasefx> berick: well, not just canned, but perhaps non-bib linked data as well. For example, how Google shows non-search results for celebrities, etc. Wikipedia links, images
13:24:44 <kmlussier> Cybrarian: As I said, we don't want to dive too far into details, but I wonder if copy location groups might help. I would be willing to follow up with you later on that.
13:24:45 <Elaine> phasefx: linked data!!
13:24:53 <tspindler> phasefx++
13:25:00 <Cybrarian> Thanks Kathy!
13:25:05 <berick> phasefx: ah, gotcha. getting into linked data (woohoo)
13:25:19 <Elaine> and BIBFRAME
13:25:23 <kmlussier> That's a great idea phasefx
13:25:25 <berick> Elaine: yep
13:25:25 <phasefx> could also be a way to promote certain things when on-topic
13:25:48 <phasefx> the opening of a new library feature, etc. that is pertinent to the search
13:25:49 <tspindler> better normalization also
13:25:50 <Elaine> That is where cataloging is heading but not there yet
13:26:48 <kmlussier> OK, we have a lot of improvements mentioned here. Are there features you've seen in other search systems that you would like to see implemented in Evergreen?
13:27:09 <kmlussier> A few have been mentioned so far. Like phasefx's canned searches idea and the did you mean? suggestion.
13:27:31 <Cybrarian> For our students, they would like to have more features like creating list of books and then saving / keeping the list.
13:27:37 <Elaine> No one else does this anymore -- cross references within an authority record should be implemented
13:27:38 <Cybrarian> But without having to login.
13:27:47 <dbs> I think it's important to separate canned searches from "related info cards" (which may or may not be based on linked data)
13:28:02 <phasefx> Cybrarian: I've always wanted a better way to "consume" and share bookbags/booklists
13:28:05 <Cybrarian> Or emailing. :-)
13:28:08 <phasefx> dbs++
13:28:18 <dbs> Focus on what you want to have as a result, versus how it's accomplished under the covers
13:28:19 <tspindler> if we are talking pie in the sky, maybe discoverability so that instead of loading Overdrive we link to their database ala z39.50
13:28:31 <kmlussier> dbs: Yes, I think that's a very good point.
13:29:07 <Elaine> tspindler ++ I would rather not put all those bib records for a e-resource collection that might change next month
13:29:08 <tspindler> ..or link some other way
13:29:25 <dbs> Cybrarian: phasefx: it would be cool for searches to also turn up "Here are some lists that people put together on that topic" for example?
13:29:37 <Cybrarian> For the output, it would also be nice to be able to select the data you want to "keep". Not just the default stuff.
13:30:08 <kmlussier> Cybrarian: So you're saying people would be able to define which fields are kept when you save titles to a list?
13:30:18 <kmlussier> Cybrarian: Sorry, that was output. When printing, emailing, etc.
13:30:20 <phasefx> dbs: that would be awesome
13:30:23 <Cybrarian> kmlussier - yes.
13:30:28 <kmlussier> gotcha
13:30:47 <tspindler> better my lists management, ability to select multiple titles and add at once
13:31:11 <Elaine> When search caps out as 10,000, system should tell you
13:31:17 <Cybrarian> tspindler - yes!
13:32:14 <Elaine> Not have lists reload every time you delete a title....
13:32:19 <kmlussier> Elaine: OK, I think I saw something on that in an email too. So you're talking about the system not retrieving all search results with very broad searches?
13:33:11 <Elaine> kmlussier: yes -- user should no not all bib records retrieved
13:33:19 <Elaine> know!!
13:33:39 <kmlussier> Elaine: no worries, I understood. It happens in here all the time. :)
13:33:48 <kmlussier> Too difficult to type quickly.
13:34:13 <Elaine> And spell at the same time
13:34:15 <dbs> Something like "You searched for 'the', and that's not really cool, can you try adding some more specific keywords please?"
13:34:21 <kmlussier> I think I'm about ready to move on to the next topic. But I just want to give another minute in case anyone else has ideas on something they think would really bring search to the next level.
13:34:41 * phasefx was thinking canned info could help with broad searches as well
13:34:56 <Cybrarian> Just a question: can you do phrase searching?
13:35:05 <Cybrarian> I don't really know...
13:35:06 <kmlussier> yes
13:35:07 <Elaine> Not necessarily more keywords -- try filtering
13:35:09 <dbs> (right now we just immediately return 200 OK and pretend a search never happened based on some local Apache rewrites for broooad searches)
13:35:26 <Cybrarian> Does it work with quotes?
13:35:41 <Elaine> Cybrarian -- yes with quotes
13:35:41 <kmlussier> Cybrarian: Yes, if you wrap it in quotes, the search terms are searched as a phrase.
13:35:49 <Cybrarian> Thanks!
13:35:55 <kmlussier> Also, quoting also forces the system to search the exact terms, not a stemmed variation.
13:36:23 <kmlussier> I like these ideas of having more user assistance when we have overly broad searches. Excellent!
13:36:53 <kmlussier> OK, I'm going to move on to our next topic, which is specific to relevance ranking.
13:36:56 <kmlussier> #topic Defining relevance
13:37:27 <kmlussier> As tspindler mentioned above, relevance is something that comes up as an area for improvement. But relevance for one person might not be so relevant for another.
13:37:38 <kmlussier> When ranking search results, which factors should play a strong role in relevancy?
13:37:47 <kmlussier> Are there specific places where you find Evergreen is falling short on returning relevant results?
13:38:11 <Elaine> Relevance is much better than it was
13:38:13 <tspindler> proximity ranking (isn't this non existent right now?)
13:38:15 <Cybrarian> I try to avoid relevancy at all costs.
13:38:54 <kmlussier> tspindler: No, I'm pretty sure that it looks at proximity.
13:39:29 <tspindler> i don't have examples but it seemed that some search results i have had suggested it wasn't paying attention but I could be wrong
13:39:53 <Elaine> It does seem like proximity is not always adhered to
13:39:54 <kmlussier> But if you're looking at search results and don't think records with word proximity are ranking higher, it doesn't mean the system isn't paying attention to it. It might not be doing it at a level you expect it.
13:39:55 <dbs> proximity is part of the density scoring in postgresql's full text search
13:40:08 <phasefx> I haven't thought this through, but a pie in the sky feature may be to let the patron give a hint for what is relevant, e.g. "for a paper", "for leisure", "from a fever dream"
13:40:25 <dbs> but how that plays out in practice may differ, so examples of where expectations are not met are welcome
13:40:28 <kmlussier> "fever dream" I like it!
13:40:30 <Cybrarian> Word count?
13:41:05 <kmlussier> Cybrarian: So you're saying the number of times a word appears should influence its relevance?
13:41:05 <dbs> http://www.sai.msu.su/~megera/postgres/fts/fts.pdf for a classic, dated, but still relevant (hah) 77-page intro to full-text search that Evergreen currently relies on :)
13:41:10 <tspindler> kmlussier: i think my staff have provided better examples than I can come up with right now, I know you have them
13:41:11 <Elaine> phasefx: but even if fever dream, user still wants life of a crazy cat lady to be near the top when they search
13:41:21 <kmlussier> tspindler: Yes, I do. And probably more. :)
13:41:43 <Cybrarian> Read their minds?
13:41:44 <phasefx> Elaine: we can put that up near the top for _every_ search :)
13:42:00 <kmlussier> Cybrarian: I'm working on that technology, but it's still a few years away. :)
13:42:01 <Elaine> phasefx: works for me.
13:42:04 <Cybrarian> yes, word count means the number of times the keyword entered appears in the record.
13:42:14 <kmlussier> gotcha
13:42:15 <dbs> Cybrarian: yes that's there
13:42:21 * Dyrcona is still working on the "read user's mind patch." Should be ready any day now. :)
13:42:34 <tspindler> d
13:42:35 <kmlussier> Dyrcona obviously works more quickly than I do.
13:42:41 <tspindler> Dyrcona++
13:42:42 <phasefx> Cybrarian: word count that is, not mind reading :)
13:43:00 <tspindler> Dyrcona: the question is, do the spell correctly in their mind ;)
13:43:25 <kmlussier> Yes, and so the question is what should be a factor in relevance. So many of those things are probably available in Evergreen.
13:43:41 <kmlussier> And then the followup (which maybe should have been asked later), is if you see Evergreen falling short.
13:44:37 <kmlussier> Things that have been mentioned thus far, then, is that word count, and proximity should play a part in relevance. Did I miss anything?
13:45:04 <tspindler> i think popularity has a role also
13:45:12 <tspindler> i know it's coming
13:45:32 <phasefx> and maybe self-fulfilling :)
13:45:46 <Elaine> I don't necessarily see the value of word count -- words in a title might only appear once in a record
13:45:50 <kmlussier> OK, so the amount of use a title gets should also play into relevance.
13:45:55 <dbwells> As many here know, the primary driver of relevance in current Evergreen is "cover density". It's a fairly complex algorithm which accounts for the most typical factors in "average" sets of text. I think any improvements we could make would involve knowledge of how our data is not average.
13:46:06 <kmlussier> Elaine: Well, I think that's a key element, then. Where the words are located, right?
13:46:35 <Cybrarian> Funny story - I just had to help a student find a book in the catalog. :-)
13:46:38 <kmlussier> dbwells: That's interesting. In what ways is our data not average?
13:46:49 <kmlussier> Cybrarian: Were your search results relevant?
13:46:51 <Elaine> kmlussier: yes -- 245 for title should always retrieve first, for example
13:47:06 <dbwells> As others suggest, if we could easily weight title/author matches higher, that would be a win. We know those words are more important than average.
13:47:14 <kmlussier> Elaine: OK, good, so we not only have word count, but more importantly, where they should appear.
13:47:23 <kmlussier> And we do have that ability in Evergreen, but it's good to note that it's important.
13:48:08 <kmlussier> dbwells: I'm going to pick up on something you said there. 'easily' Because we can weight author/title, but is it easy to do so? Especially for those who are new to Evergreen?
13:48:12 <linuxhiker> Did the search discussion already end?
13:48:15 <dbwells> It's not a new idea, of course, so the harder question is how. Not sure if we are supposed to talk about that part yet.
13:48:22 <kmlussier> linuxhiker: It will end in about 10 minutes
13:48:23 <Elaine> I also think not having to first navigate a long list of titles would be beneficial to most users regardless of the type of search
13:48:25 <dbs> Easily and without impacting speed, of course :)
13:48:28 <Cybrarian> Relevance to me is "did the words appear in the subject heading".
13:48:32 <kmlussier> dbwells: No, I'm trying to avoid hows for now.
13:49:16 <kmlussier> Cybrarian: I think it's important for each library to define which fields they want to be relevant. Because what's relevant in an academic environment may differ from a public or a k-12
13:50:06 <kmlussier> OK, any other thoughts on where Evergreen may fall short on relevance? Because I'm about ready to ask my final question.
13:50:48 <kmlussier> #topic Highest priority for improvement
13:51:04 <kmlussier> If you could only improve two things in Evergreen search, what would those 2 things be?
13:51:10 <linuxhiker> speed
13:51:12 <kmlussier> And be sure to focus on search for this question.
13:51:17 <Cybrarian> true - relevance factors should be set for a location.
13:51:26 <Elaine> Better cross references and having that intermediate returns screen
13:51:33 <tspindler> speed and relevance
13:51:44 <tspindler> speed might be higher than relevance
13:51:55 <Cybrarian> I'm with Elaine
13:52:21 <kmlussier> Elaine: I'm having trouble keeping up. Can you remind me what you mean by intermediate returns screen. I'm sure it's up higher in the discussion.
13:52:25 <Elaine> If I have a list of authors names Jones, Mary, rather than a list of several thousands of titles, I could drill down to what I want more readily
13:52:30 <Cybrarian> I'd vote for speed first though.
13:53:13 <kmlussier> Anyone else? There were way more people talking earlier who haven't answered this question.
13:54:09 <dbwells> We wrote a custom search engine in our pre-EG days. One thing we had then was a way to *lower* relevancy based on certain factors. For example, one loss of relevancy came from being in a certain special collection. Another decrement came from having titles greater than 200 characters (or some very long length). I should go back and look up if we had any novel ideas back then :)
13:54:45 <linuxhiker> I would note that due to speed AND relevancy, I know of at least one major library system that now outsources their evergreen search to a different technology
13:54:46 <tspindler> dbwells: were the long titles special collections also?
13:54:52 <kmlussier> Well, there are two more chats coming up with which to share those ideas if you find them.
13:56:01 <dbwells> tspindler: we just did a broad survey of search results, and found a lot of stuff from, for example, the 1800s which had multi-sentence titles, and were getting to the top of many lists based on title "matching".
13:56:21 <kmlussier> OK, I'm going to wrap things up then. But feel free to let me know if you get any other ideas.
13:56:24 <kmlussier> #endmeeting