#evergreen log

13:01:23 <kmlussier> #startmeeting 2016-03-11 - Evergreen focus group discussion on search
13:01:23 <pinesol_green> Meeting started Fri Mar 11 13:01:23 2016 US/Eastern.  The chair is kmlussier. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:23 <pinesol_green> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:23 <pinesol_green> The meeting name has been set to '2016_03_11___evergreen_focus_group_discussion_on_search'
13:01:30 <kmlussier> #topic Introductions
13:01:44 <kmlussier> Please introduce yourselves, as follows
13:01:50 <kmlussier> #info kmlussier is Kathy Lussier, MassLNC
13:02:04 <tspindler> #info tspindler is Tim Spindler, C/W MARS
13:02:14 <Elaine> #info Elaine is Elaine Hardy PINES/GPLS
13:02:30 <Cybrarian> #info Cybrarian is Jennifer Walz from Asbury University.
13:03:40 <kmlussier> While we wait for more people to come in, I'm posting a link to some ground rules I wrote up. http://wiki.evergreen-ils.org/doku.php?id=scratchpad:search:focus_groups
13:03:59 <kmlussier> Basically, what they say is this is primarily a brainstorming discussion to discuss what we would like to see in search.
13:04:26 <berick> #info berick is Bill Erickson / KCLS
13:04:34 <Cybrarian> Will we talk about specifics of the software?   Or in general about opacs and book searching?
13:04:41 <kmlussier> So it's really an opportunity to talk about ideas. Although it's tempting, especially for technical folks, to get in the nitty gritty of how to make those ideas happen, we don't want to get bogged down in the details today.
13:04:50 <kmlussier> Cybrarian: We're talking specifically about Evergreen.
13:04:55 <tspindler> kmlussier: the ground rules link returned a 404 error
13:05:04 <kmlussier> bah
13:05:13 <Elaine> I got the 404 error as well
13:05:17 <kmlussier> Oh, the line broke
13:05:33 <kmlussier> #link http://wiki.evergreen-ils.org/doku.php?id=scratchpad:search:focus_groups
13:05:37 <kmlussier> That should work.
13:05:51 <tspindler> kmlussier ++
13:05:54 <Elaine> It does work, thanks!
13:06:08 <kmlussier> Also, as is the case with any brainstorming, there are no stupid ideas. Feel free to share whatever ideas you have. And please don't be critical of others' ideas.
13:06:19 <kmlussier> However, I encourage you to build on each other's ideas.
13:06:28 <kmlussier> At this point, I think I've repeated everything that's in the link. :)
13:06:52 <kmlussier> I'm going to kick off the discussion, but if people wander in late, please feel free to introduce yourselves.
13:07:02 <kmlussier> #topic Strengths of current search
13:07:03 <Cybrarian> With regards to the evergreen interface for the public, I would really like a way to let them search by collection, not just media type.
13:07:17 <kmlussier> OK, we'll get to improvements soon :)
13:07:27 <kmlussier> I want to start by asking people to identify what they like about the current Evergreen search. What are its strengths? Is there anything that Evergreen search has that makes it unique?
13:07:43 <Cybrarian> It is really quick.
13:08:07 <Cybrarian> The interface is uncluttered for the most part.
13:08:24 <Elaine> Retrieval is comprehensive
13:08:25 <tspindler> I'm not sure this is related to search but the ability to customize the interface is a strong point
13:08:54 <Elaine> Good filtering
13:09:01 <kmlussier> tspindler: Well, there is also the ability to customize what you search, so in that respect, I would say it's related to search.
13:09:03 <tspindler> Filtering ++
13:09:26 <kmlussier> Elaine: And is that the pre-set filters from the advanced search screen, the facets, or all of the above?
13:09:32 <Cybrarian> I like the filtering, but also would like additional options to customize further.
13:09:38 <berick> agreed on filtering, specifically item level (and thereabouts) filtering is strong/flexible
13:09:44 <Cybrarian> I like the ability to customize the interface - to a point.
13:09:52 <Elaine> all of the above
13:09:52 <tspindler> filtering on advanced screen is very good
13:10:13 <Cybrarian> I like the browse by shelf feature - though it does confuse people some.
13:10:39 <kmlussier> Yes, the browse shelf feature is nice.
13:11:03 <Cybrarian> Advanced ++
13:11:03 <kmlussier> The reason I ask this question is because, if we decide to make changes to search, we want to make sure we don't lose the things that are already working well for us.
13:11:38 <Elaine> I don't think the problem is the search but rather how it displays the results to you.
13:11:52 <Cybrarian> I always say that you should never change any functional parts, instead add new options for people to choose.
13:12:08 <Cybrarian> I will agree with Elaine on the display.
13:12:16 <kmlussier> Cybrarian: Yes, true. I agree. But sometimes, it's easier said than done. :)
13:12:36 <kmlussier> And if you're rebuilding (not necessarily saying that we are) it's easy to forget some things if they aren't explicitly stated.
13:12:50 <Elaine> I should say the main problem. I do have some problems when we get to that part of the discussion
13:13:05 <Cybrarian> Indeed.  Because the function has become part of the language of what you do, you forget that it is essential.
13:13:17 <kmlussier> Are there other strengths people would like to bring up? Or anything that makes Evergreen's search unique when compared to other systems?
13:13:50 <phasefx> not that we'd ever go in the opposite direction, but it's nice having "sessionless" searches
13:13:59 <kmlussier> phasefx: YES!
13:13:59 <phasefx> so URLs are shareable
13:14:02 <Cybrarian> One feature? is how the subjects are searched.
13:14:15 <kmlussier> Cybrarian: Can you expand upon that?
13:14:30 <Cybrarian> Well, we also just did a re-index so now it is better.
13:14:55 <Cybrarian> But I still am not completely satisfied with how the LC subject headings get searched.
13:15:05 <Cybrarian> But I am still not sure that I understand why.
13:15:26 <Cybrarian> I think our re-index has helped some.
13:15:44 <kmlussier> OK, I think I'm going to move to the next topic, and maybe we can get more info on that.
13:15:54 <kmlussier> But if you think of anything else, feel free to shout it out later on.
13:15:57 <Cybrarian> I'm also a little "old school" when it comes to LCSH.  :-)
13:16:06 <kmlussier> #topic Areas for improvement
13:16:14 <kmlussier> Where are the areas where you would like to see improvement?
13:16:24 <kmlussier> And we've already touched upon a few of these.
13:16:26 <Elaine> Authority links should be in advanced search
13:16:30 <tspindler> kmlussier has heard this from us but it is speed and relevance
13:16:42 <berick> 1. search speed.  2. indexing/ingest speed
13:16:56 <tspindler> ++ to indexing/ingest speed
13:17:20 <Elaine> When you search author should be keyword -- when I search for au Mary Jones I don't want to see titles with Mary Smith and David Jones
13:17:27 <Cybrarian> More browse options?
13:17:36 <jihpringle> a "did you mean..." would be awesome (especially for the KPAC)
13:17:38 <Cybrarian> Searching by collection
13:18:09 <tspindler> improvements to failed searches, in particular I would like to see a 0 results search at least on title and author drop you into browse search
13:18:37 <Cybrarian> More hyperlinking
13:18:47 <kmlussier> Elaine: Are you saying that when you do an author search, the hits retrieved are based on words in other fields?
13:19:05 <Elaine> I would like to see an intermediate result screen so that if you searched for au mary jones you would get a list of authors not a list of titles.
13:19:06 <kmlussier> Cybrarian: What kind of browse options would you like to see?
13:19:07 <tspindler> including see references in author search index and subject search index
13:19:36 <Cybrarian> ++ for Elaine suggestion!
13:19:47 <Elaine> Kmlussier -- hits are based on words in all author fields for an author search, for example
13:20:08 <phasefx> * UI for hand-tweaking metarecords
13:20:09 <abneiman> from the lurk-gallery +1 on search by collection -- we have a sorta-kludge in place for this thanks to ESI but would be nice to have it native.  Also +1 on "did you mean".
13:20:11 <kmlussier> tspindler: Can I rephrase that one in a different way? Maybe...finding good ways to make use of cross-references when keyword searches? It might be adding them to indexes, but maybe there are other ways to make use of them.
13:20:24 <Cybrarian> Browse - call number, publisher, date?
13:20:28 <tspindler> kmlussier: that's good
13:20:29 <kmlussier> Elaine: Ah, gotcha. Thanks for explaining.
13:20:34 <phasefx> #info phasefx is Jason Etheridge, ESI
13:20:54 <Elaine> tspindler --  that is what I mean by tying to authority file
13:21:15 <kmlussier> phasefx: I've just been delving into metarecords recently. Do you mean handtweaking for when there are bad groupings? Or do you envision other tweaking?
13:21:23 <tspindler> including prefixes in call number searches
13:21:42 <phasefx> kmlussier: putting records in and out of groupings for whatever reason
13:21:50 <kmlussier> phasefx: Excellent! Thanks!
13:22:11 <tspindler> phasefx: what kinds of groupings, not sure I understand
13:22:18 <kmlussier> Cybrarian: I'm also interested in what you mean when you say searching collections? Is this something outside of using copy locations?
13:22:30 <kmlussier> tspindler: For the group formats and editions searches
13:22:35 <Cybrarian> Browse by journal title would also be nice.  :-)
13:22:42 <phasefx> * ability to display canned information for specific searches
13:23:12 <Cybrarian> I probably do mean copy locations, but right now we can't seem to do even that.
13:23:26 <Cybrarian> And I have not seen an implementation of it that is what we really need.
13:23:42 <tspindler> I know our cataloging head has issues with LC searches and subheadings but trying to remember details
13:23:47 <berick> phasefx: can you expand (on canned)?
13:23:48 <Elaine> Cybrarian: copy locations should appear at branch level searching
13:24:01 <Cybrarian> Maybe a way to do a combination of copy location and circ mod?   I think we want to be too creative.  :-)
13:24:19 <kmlussier> Cybrarian: There is no too creative for today. Too creative comes up later.
13:24:25 <phasefx> berick: well, not just canned, but perhaps non-bib linked data as well.  For example, how Google shows non-search results for celebrities, etc.  Wikipedia links, images
13:24:44 <kmlussier> Cybrarian: As I said, we don't want to dive too far into details, but I wonder if copy location groups might help. I would be willing to follow up with you later on that.
13:24:45 <Elaine> phasefx: linked data!!
13:24:53 <tspindler> phasefx++
13:25:00 <Cybrarian> Thanks Kathy!
13:25:05 <berick> phasefx: ah, gotcha.  getting into linked data (woohoo)
13:25:19 <Elaine> and BIBFRAME
13:25:23 <kmlussier> That's a great idea phasefx
13:25:25 <berick> Elaine: yep
13:25:25 <phasefx> could also be a way to promote certain things when on-topic
13:25:48 <phasefx> the opening of a new library feature, etc. that is pertinent to the search
13:25:49 <tspindler> better normalization also
13:25:50 <Elaine> That is where cataloging is heading but not there yet
13:26:48 <kmlussier> OK, we have a lot of improvements mentioned here. Are there features you've seen in other search systems that you would like to see implemented in Evergreen?
13:27:09 <kmlussier> A few have been mentioned so far. Like phasefx's canned searches idea and the did you mean? suggestion.
13:27:31 <Cybrarian> For our students, they would like to have more features like creating list of books and then saving / keeping the list.
13:27:37 <Elaine> No one else does this anymore -- cross references within an authority record should be implemented
13:27:38 <Cybrarian> But without having to login.
13:27:47 <dbs> I think it's important to separate canned searches from "related info cards" (which may or may not be based on linked data)
13:28:02 <phasefx> Cybrarian: I've always wanted a better way to "consume" and share bookbags/booklists
13:28:05 <Cybrarian> Or emailing.  :-)
13:28:08 <phasefx> dbs++
13:28:18 <dbs> Focus on what you want to have as a result, versus how it's accomplished under the covers
13:28:19 <tspindler> if we are talking pie in the sky, maybe discoverability so that instead of loading Overdrive we link to their database ala z39.50
13:28:31 <kmlussier> dbs: Yes, I think that's a very good point.
13:29:07 <Elaine> tspindler ++ I would rather not put all those bib records for a e-resource collection that might change next month
13:29:08 <tspindler> ..or link some other way
13:29:25 <dbs> Cybrarian: phasefx: it would be cool for searches to also turn up "Here are some lists that people put together on that topic" for example?
13:29:37 <Cybrarian> For the output, it would also be nice to be able to select the data you want to "keep".   Not just the default stuff.
13:30:08 <kmlussier> Cybrarian: So you're saying people would be able to define which fields are kept when you save titles to a list?
13:30:18 <kmlussier> Cybrarian: Sorry, that was output. When printing, emailing, etc.
13:30:20 <phasefx> dbs: that would be awesome
13:30:23 <Cybrarian> kmlussier - yes.
13:30:28 <kmlussier> gotcha
13:30:47 <tspindler> better my lists management, ability to select multiple titles and add at once
13:31:11 <Elaine> When search caps out at 10,000, system should tell you
13:31:17 <Cybrarian> tspindler - yes!
13:32:14 <Elaine> Not have lists reload every time you delete a title....
13:32:19 <kmlussier> Elaine: OK, I think I saw something on that in an email too. So you're talking about the system not retrieving all search results with very broad searches?
13:33:11 <Elaine> kmlussier: yes -- user should no not all bib records retrieved
13:33:19 <Elaine> know!!
13:33:39 <kmlussier> Elaine: no worries, I understood. It happens in here all the time. :)
13:33:48 <kmlussier> Too difficult to type quickly.
13:34:13 <Elaine> And spell at the same time
13:34:15 <dbs> Something like "You searched for 'the', and that's not really cool, can you try adding some more specific keywords please?"
13:34:21 <kmlussier> I think I'm about ready to move on to the next topic. But I just want to give another minute in case anyone else has ideas on something they think would really bring search to the next level.
13:34:41 * phasefx was thinking canned info could help with broad searches as well
13:34:56 <Cybrarian> Just a question:   can you do phrase searching?
13:35:05 <Cybrarian> I don't really know...
13:35:06 <kmlussier> yes
13:35:07 <Elaine> Not necessarily more keywords -- try filtering
13:35:09 <dbs> (right now we just immediately return 200 OK and pretend a search never happened based on some local Apache rewrites for broooad searches)
13:35:26 <Cybrarian> Does it work with quotes?
13:35:41 <Elaine> Cybrarian -- yes with quotes
13:35:41 <kmlussier> Cybrarian: Yes, if you wrap it in quotes, the search terms are searched as a phrase.
13:35:49 <Cybrarian> Thanks!
13:35:55 <kmlussier> Quoting also forces the system to search the exact terms, not a stemmed variation.
13:36:23 <kmlussier> I like these ideas of having more user assistance when we have overly broad searches. Excellent!
13:36:53 <kmlussier> OK, I'm going to move on to our next topic, which is specific to relevance ranking.
13:36:56 <kmlussier> #topic Defining relevance
13:37:27 <kmlussier> As tspindler mentioned above, relevance is something that comes up as an area for improvement. But relevance for one person might not be so relevant for another.
13:37:38 <kmlussier> When ranking search results, which factors should play a strong role in relevancy?
13:37:47 <kmlussier> Are there specific places where you find Evergreen is falling short on returning relevant results?
13:38:11 <Elaine> Relevance is much better than it was
13:38:13 <tspindler> proximity ranking (isn't this non existent right now?)
13:38:15 <Cybrarian> I try to avoid relevancy at all costs.
13:38:54 <kmlussier> tspindler: No, I'm pretty sure that it looks at proximity.
13:39:29 <tspindler> i don't have examples but it seemed that some search results i have had suggested it wasn't paying attention but I could be wrong
13:39:53 <Elaine> It does seem like proximity is not always adhered to
13:39:54 <kmlussier> But if you're looking at search results and don't think records with word proximity are ranking higher, it doesn't mean the system isn't paying attention to it. It might not be doing it at a level you expect it.
13:39:55 <dbs> proximity is part of the density scoring in postgresql's full text search
13:40:08 <phasefx> I haven't thought this through, but a pie in the sky feature may be to let the patron give a hint for what is relevant, e.g. "for a paper", "for leisure", "from a fever dream"
13:40:25 <dbs> but how that plays out in practice may differ, so examples of where expectations are not met are welcome
13:40:28 <kmlussier> "fever dream" I like it!
13:40:30 <Cybrarian> Word count?
13:41:05 <kmlussier> Cybrarian: So you're saying the number of times a word appears should influence its relevance?
13:41:05 <dbs> http://www.sai.msu.su/~megera/postgres/fts/fts.pdf for a classic, dated, but still relevant (hah) 77-page intro to full-text search that Evergreen currently relies on :)
13:41:10 <tspindler> kmlussier: i think my staff have provided better examples than I can come up with right now, I know you have them
13:41:11 <Elaine> phasefx: but even if fever dream, user still wants life of a crazy cat lady to be near the top when they search
13:41:21 <kmlussier> tspindler: Yes, I do. And probably more. :)
13:41:43 <Cybrarian> Read their minds?
13:41:44 <phasefx> Elaine: we can put that up near the top for _every_ search :)
13:42:00 <kmlussier> Cybrarian: I'm working on that technology, but it's still a few years away. :)
13:42:01 <Elaine> phasefx: works for me.
13:42:04 <Cybrarian> yes, word count means the number of times the keyword entered appears in the record.
13:42:14 <kmlussier> gotcha
13:42:15 <dbs> Cybrarian: yes that's there
13:42:21 * Dyrcona is still working on the "read user's mind patch." Should be ready any day now. :)
13:42:34 <tspindler> d
13:42:35 <kmlussier> Dyrcona obviously works more quickly than I do.
13:42:41 <tspindler> Dyrcona++
13:42:42 <phasefx> Cybrarian: word count that is, not mind reading :)
13:43:00 <tspindler> Dyrcona: the question is, do they spell correctly in their mind ;)
13:43:25 <kmlussier> Yes, and so the question is what should be a factor in relevance. So many of those things are probably available in Evergreen.
13:43:41 <kmlussier> And then the followup (which maybe should have been asked later), is if you see Evergreen falling short.
13:44:37 <kmlussier> Things that have been mentioned thus far, then, are that word count and proximity should play a part in relevance. Did I miss anything?
13:45:04 <tspindler> i think popularity has a role also
13:45:12 <tspindler> i know it's coming
13:45:32 <phasefx> and maybe self-fulfilling :)
13:45:46 <Elaine> I don't necessarily see the value of word count -- words in a title might only appear once in a record
13:45:50 <kmlussier> OK, so the amount of use a title gets should also play into relevance.
13:45:55 <dbwells> As many here know, the primary driver of relevance in current Evergreen is "cover density".  It's a fairly complex algorithm which accounts for the most typical factors in "average" sets of text.  I think any improvements we could make would involve knowledge of how our data is not average.
13:46:06 <kmlussier> Elaine: Well, I think that's a key element, then. Where the words are located, right?
13:46:35 <Cybrarian> Funny story - I just had to help a student find a book in the catalog. :-)
13:46:38 <kmlussier> dbwells: That's interesting. In what ways is our data not average?
13:46:49 <kmlussier> Cybrarian: Were your search results relevant?
13:46:51 <Elaine> kmlussier: yes -- 245 for title should always retrieve first, for example
13:47:06 <dbwells> As others suggest, if we could easily weight title/author matches higher, that would be a win.  We know those words are more important than average.
13:47:14 <kmlussier> Elaine: OK, good, so we not only have word count, but more importantly, where they should appear.
13:47:23 <kmlussier> And we do have that ability in Evergreen, but it's good to note that it's important.
13:48:08 <kmlussier> dbwells: I'm going to pick up on something you said there. 'easily' Because we can weight author/title, but is it easy to do so? Especially for those who are new to Evergreen?
13:48:12 <linuxhiker> Did the search discussion already end?
13:48:15 <dbwells> It's not a new idea, of course, so the harder question is how.  Not sure if we are supposed to talk about that part yet.
13:48:22 <kmlussier> linuxhiker: It will end in about 10 minutes
13:48:23 <Elaine> I also think not having to first navigate a long list of titles would be beneficial to most users regardless of the type of search
13:48:25 <dbs> Easily and without impacting speed, of course :)
13:48:28 <Cybrarian> Relevance to me is "did the words appear in the subject heading".
13:48:32 <kmlussier> dbwells: No, I'm trying to avoid hows for now.
13:49:16 <kmlussier> Cybrarian: I think it's important for each library to define which fields they want to be relevant. Because what's relevant in an academic environment may differ from a public or a k-12
13:50:06 <kmlussier> OK, any other thoughts on where Evergreen may fall short on relevance? Because I'm about ready to ask my final question.
13:50:48 <kmlussier> #topic Highest priority for improvement
13:51:04 <kmlussier> If you could only improve two things in Evergreen search, what would those 2 things be?
13:51:10 <linuxhiker> speed
13:51:12 <kmlussier> And be sure to focus on search for this question.
13:51:17 <Cybrarian> true - relevance factors should be set for a location.
13:51:26 <Elaine> Better cross references and having that intermediate returns screen
13:51:33 <tspindler> speed and relevance
13:51:44 <tspindler> speed might be higher than relevance
13:51:55 <Cybrarian> I'm with Elaine
13:52:21 <kmlussier> Elaine: I'm having trouble keeping up. Can you remind me what you mean by intermediate returns screen? I'm sure it's up higher in the discussion.
13:52:25 <Elaine> If I have a list of authors named Jones, Mary, rather than a list of several thousands of titles, I could drill down to what I want more readily
13:52:30 <Cybrarian> I'd vote for speed first though.
13:53:13 <kmlussier> Anyone else? There were way more people talking earlier who haven't answered this question.
13:54:09 <dbwells> We wrote a custom search engine in our pre-EG days.  One thing we had then was a way to *lower* relevancy based on certain factors.  For example, one loss of relevancy came from being in a certain special collection.  Another decrement came from having titles greater than 200 characters (or some very long length).  I should go back and look up if we had any novel ideas back then :)
13:54:45 <linuxhiker> I would note that due to speed AND relevancy, I know of at least one major library system that now outsources their evergreen search to a different technology
13:54:46 <tspindler> dbwells: were the long titles special collections also?
13:54:52 <kmlussier> Well, there are two more chats coming up with which to share those ideas if you find them.
13:56:01 <dbwells> tspindler: we just did a broad survey of search results, and found a lot of stuff from, for example, the 1800s which had multi-sentence titles, and were getting to the top of many lists based on title "matching".
13:56:21 <kmlussier> OK, I'm going to wrap things up then. But feel free to let me know if you get any other ideas.
13:56:24 <kmlussier> #endmeeting