User Tools

Site Tools


documentation:facet_idx_def_example

Example: Adding a Material Type (947$t) facet index

First, every database script should perform its work inside a transaction. This is to protect the consistency of the data and avoid partial implementation of desired changes in the case of an error.

BEGIN;

Now we need to add the indexing definition:

INSERT INTO config.metabib_field (field_class, name, xpath, format, label, search_field, facet_field) VALUES (
  'identifier',                                                -- The class to which this definition will belong
  'mattype',                                                   -- The symbolic name of the definition within the class
  $$//marc:datafield[@tag="947"]/marc:subfield[@code="t"]$$,   -- The XPath expression which will extract the desired data from the MARCXML
  'marcxml',                                                   -- The format in which we will process the XPath for this definition
  'Material Type',                                             -- The translatable label used for this definition in user interfaces
  FALSE,                                                       -- Mark this definition as NOT search-oriented
  TRUE                                                         -- Mark this definition as a facet-oriented index
);

Allowing the default id value to be set by the database gives us the ability to use the CURRVAL() function for the auto-incrementing sequence that fills in the id. We can use this when populating the normalization map and the index table with data from the MARC record:

INSERT INTO config.metabib_field_index_norm_map (FIELD, norm, pos)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          cin.id,                                            -- The ids of the normalizers we care about
          -3                                                 -- Ordering of normalizer application -- this one first
    FROM  config.index_normalizer AS cin  -- Table defining normalizer functions
    WHERE cin.name = 'Up-case';           -- The normalizers we care about for this field
 
INSERT INTO config.metabib_field_index_norm_map (FIELD, norm, params, pos)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          cin.id,                                            -- The ids of the normalizers we care about
          '["[",""]',                                        -- Parameters for the Replace function -- replace "[" with the empty string
          -2                                                 -- Ordering of normalizer application -- this one second
    FROM  config.index_normalizer AS cin  -- Table defining normalizer functions
    WHERE cin.name = 'Replace';           -- The normalizers we care about for this field
 
INSERT INTO config.metabib_field_index_norm_map (FIELD, norm, params, pos)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          cin.id,                                            -- The ids of the normalizers we care about
          '["]",""]',                                        -- Parameters for the Replace function -- replace "]" with the empty string
          -1                                                 -- Ordering of normalizer application -- this one third
    FROM  config.index_normalizer AS cin  -- Table defining normalizer functions
    WHERE cin.name = 'Replace';           -- The normalizers we care about for this field

In order to avoid reindexing the entire bibliographic data set for all index definitions when simply adding a new definition, it is best to backfill the index table for the new definition. This is done by pulling data from the existing, flattened MARC data:

INSERT INTO metabib.facet_entry (FIELD, SOURCE, VALUE)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          mfr.record,                                        -- The source MARC record from which the data will be extracted
          UPPERCASE(BTRIM(VALUE,'[]'))                       -- Normalize 947$t data using the same logic as the defined above for initial index population
    FROM  metabib.real_full_rec AS mfr        -- The internal table holding a flattened version of the bibliographic record
    WHERE mfr.tag = '947' AND subfield = 't'; -- Include only subfield t from 947 tags

If an alias is desired for targeting exactly this field in SRU and Z39.50 searches, that can be added as well:

INSERT INTO config.metabib_search_alias (alias, field_class, FIELD) VALUES (
  'eg.mattype',                                     -- Alias to accept in SRU/CQL queries, and to map in Z39.50 settings
  'identifier',                                     -- The search class to which the definition belongs
  CURRVAL('config.metabib_field_id_seq'::regclass)  -- Returns the id of the indexing definition inserted above
);

The only thing left to do is have the database commit the transaction wrapping all of these commands. The final script thus ends up as:

BEGIN;
 
INSERT INTO config.metabib_field (field_class, name, xpath, format, label, search_field, facet_field) VALUES (
  'identifier',                                                -- The class to which this definition will belong
  'mattype',                                                   -- The symbolic name of the definition within the class
  $$//marc:datafield[@tag="947"]/marc:subfield[@code="t"]$$,   -- The XPath expression which will extract the desired data from the MARCXML
  'marcxml',                                                   -- The format in which we will process the XPath for this definition
  'Material Type',                                             -- The translatable label used for this definition in user interfaces
  FALSE,                                                       -- Mark this definition as NOT search-oriented
  TRUE                                                         -- Mark this definition as a facet-oriented index
);
 
INSERT INTO config.metabib_field_index_norm_map (FIELD, norm, pos)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          cin.id,                                            -- The ids of the normalizers we care about
          -3                                                 -- Ordering of normalizer application -- this one first
    FROM  config.index_normalizer AS cin  -- Table defining normalizer functions
    WHERE cin.name = 'Up-case';           -- The normalizers we care about for this field
 
INSERT INTO config.metabib_field_index_norm_map (FIELD, norm, params, pos)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          cin.id,                                            -- The ids of the normalizers we care about
          '["[",""]',                                        -- Parameters for the Replace function -- replace "[" with the empty string
          -2                                                 -- Ordering of normalizer application -- this one second
    FROM  config.index_normalizer AS cin  -- Table defining normalizer functions
    WHERE cin.name = 'Replace';           -- The normalizers we care about for this field
 
INSERT INTO config.metabib_field_index_norm_map (FIELD, norm, params, pos)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          cin.id,                                            -- The ids of the normalizers we care about
          '["]",""]',                                        -- Parameters for the Replace function -- replace "]" with the empty string
          -1                                                 -- Ordering of normalizer application -- this one third
    FROM  config.index_normalizer AS cin  -- Table defining normalizer functions
    WHERE cin.name = 'Replace';           -- The normalizers we care about for this field
 
INSERT INTO metabib.facet_entry (FIELD, SOURCE, VALUE)
  SELECT  CURRVAL('config.metabib_field_id_seq'::regclass),  -- Returns the id of the indexing definition inserted above
          mfr.record,                                        -- The source MARC record from which the data will be extracted
          UPPERCASE(BTRIM(VALUE,'[]'))                       -- Normalize 947$t data using the same logic as the defined above for initial index population
    FROM  metabib.real_full_rec AS mfr        -- The internal table holding a flattened version of the bibliographic record
    WHERE mfr.tag = '947' AND subfield = 't'; -- Include only subfield t from 947 tags
 
INSERT INTO config.metabib_search_alias (alias, field_class, FIELD) VALUES (
  'eg.mattype',                                     -- Alias to accept in SRU/CQL queries, and to map in Z39.50 settings
  'identifier',                                     -- The search class to which the definition belongs
  CURRVAL('config.metabib_field_id_seq'::regclass)  -- Returns the id of the indexing definition inserted above
);
 
COMMIT;
documentation/facet_idx_def_example.txt · Last modified: 2022/02/10 13:34 by 127.0.0.1

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International
CC Attribution-Share Alike 4.0 International Donate Powered by PHP Valid HTML5 Valid CSS Driven by DokuWiki

© 2008-2022 GPLS and others. Evergreen is open source software, freely licensed under GNU GPLv2 or later.
The Evergreen Project is a U.S. 501(c)3 non-profit organization.