Appendix A. MARC import extension

By default, the import of attribute values from the MARC 21 communication format is based on a built-in configuration. It is also possible to use an external configuration defined in text files. These text files are simply property files.

A.1. marcImport.properties configuration file format

Title=245:${a} ${b} ${n};130;210;222;240;246;730;740;
Creator=100;110;111;
Subject=
Description=6XX;
Publisher=260a;260b;260f;
Contributor=700;710;711;
Date=260c;
Type=
Identifier=920;856u;
Source=
Language=041;546;008/35-37
Relation=250;534;440;490;800;810;811;830;
Coverage=
Rights=506;540;

An example configuration file is presented above. It defines the assignment of MARC elements to dLibra attributes.

Every line in the configuration file configures a single attribute. A line consists of the attribute's RDF name, the equal sign and a list of MARC elements to assign to the attribute. The RDF name of an attribute can be found in the administrator application (in the attribute's editing panel). MARC elements that may be imported include, among others, subfield values and characters from control fields. If an attribute's RDF name is not specified, no values will be assigned to the attribute.

A record naming the MARC field whose value is to be imported into an attribute value has the following basic syntax: AAAb;, where AAA is a three-digit field number and b is a subfield code. It is also possible to combine MARC subfields or to extract a range of characters from control fields. Note that the ; (semicolon) character is part of this syntax and is required for a proper configuration.
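The line format described above can be sketched in a few lines of Python. This is only an illustration, not dLibra's own parser: the function names are hypothetical, and two points are assumptions rather than documented behaviour, namely that a semicolon directly preceded by a backslash does not end a record, and that a subfield code is a single lowercase letter or digit.

```python
import re

# Basic record form: three-digit field number plus an optional subfield code.
# Assumption: subfield codes are one lowercase letter or one digit.
BASIC_RECORD = re.compile(r"^(\d{3})([a-z0-9])?$")


def parse_mapping_line(line):
    """Split 'Name=rec1;rec2;' into the RDF name and a list of records."""
    name, _, value = line.partition("=")
    # Assumption: ';' directly preceded by a backslash is escaped,
    # so it does not terminate a record.
    records = re.split(r"(?<!\\);", value.strip())
    # Drop the empty piece left after the trailing semicolon.
    return name.strip(), [r for r in records if r]


def split_basic_record(record):
    """Return (field, subfield or None) for records such as '260c' or '100'.

    Returns None for other forms (multi-value codes, templates, ranges).
    """
    match = BASIC_RECORD.match(record)
    return (match.group(1), match.group(2)) if match else None


name, records = parse_mapping_line("Publisher=260a;260b;260f;")
print(name, records)
print([split_basic_record(r) for r in records])
```

A line with an empty right-hand side, such as Subject=, yields an empty list here, which corresponds to an attribute that receives no values.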

It is possible to omit the subfield code, as well as to use a multi-value code. Details and examples are presented below.

  • 100; - an example of a field number.

    Such a record will either import the value of the field (note that some fields in the MARC format, for instance control fields, whose numbers are smaller than 010, never have subfields) or import the values of all subfields of this field into an attribute value. Every subfield value will be imported as a separate attribute value.

  • 260c; - an example of a field number with a subfield code.

    Such a record will import only the value of the given subfield into the attribute value.

  • 6XX; - an example of a multi-value code.

    Such a record will import the values of all fields and subfields in the range 600 - 699. Specific subfield codes cannot be given in this form. It is also possible to define a narrower record, for instance 65X;, which will analogously import values from fields in the range 650 - 659.

  • 245:${a} ${b} ${n}; - an example of combining MARC subfields into one value.

    This entry can be split into two parts separated by the “:” (colon) character:

    1. 245 - the number of the field whose subfields will be combined into a value

    2. ${a} ${b} ${n} - the template which defines how to combine the subfields.

    The entry ${a} means that the value of the “a” subfield should be put in its place. The subfield belongs to the field whose number is placed before the “:” character, in this case field 245. So the 245:${a} ${b} ${n}; template will combine the subfields a, b and n of field 245 into one value. These subfields will be separated with a space (as specified in the template). For example, if subfield 245a has the value “first value”, subfield 245b has the value “second value” and subfield 245n has the value “third value”, then the result will be “first value second value third value”. To separate the values with other characters instead of a space, put those characters in place of the space in the template (e.g. 245:${a}-${b} subfield n: ${n};). There are a few exceptions: the characters “;” (semicolon), “\” (backslash) and “$”. For the application to interpret these characters correctly, each of them has to be preceded by two additional backslashes (e.g. 245:${a} ${b}\\;${n};).

  • 008/35-37 - applies only to control fields; it extracts a range of characters from the control field.

    This entry consists of two parts separated by the slash character (“/”):

    1. 008 - the number of the control field from which the range of characters will be extracted

    2. 35-37 - the range of characters which will be extracted from the field whose number is placed before the slash (“/”).

    The entry means that the characters at positions 35, 36 and 37 of the 008 control field make up the value. If the 008 control field has an “e” character at position 35, an “n” character at position 36 and a “g” character at position 37, then the value of such an entry will be “eng”. To extract only one character from a given position, simply specify the character position after the slash character, e.g. 008/30.
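The multi-value, template and character-range forms from the list above can be illustrated with a short Python sketch. It is not dLibra's implementation, and all function names are hypothetical. It rests on three assumptions: each X in a multi-value code stands for exactly one digit; a subfield missing from the record is rendered as an empty string; and character positions are zero-based and inclusive, following the usual MARC 21 convention for field 008. The escape sequences for “;”, “\” and “$” are left out.

```python
import re


def field_matches(pattern, tag):
    """Check a field number against a multi-value code such as '6XX'.

    Assumption: each 'X' stands for exactly one digit.
    """
    return len(pattern) == len(tag) and all(
        p == t or (p == "X" and t.isdigit()) for p, t in zip(pattern, tag)
    )


def render_template(template, subfields):
    """Replace every ${code} placeholder with the value of that subfield.

    Assumption: a subfield absent from the record becomes an empty string.
    Escaped characters are not handled in this sketch.
    """
    return re.sub(
        r"\$\{(\w)\}",
        lambda match: subfields.get(match.group(1), ""),
        template,
    )


def extract_range(control_field, positions):
    """Return the characters named by '35-37' or by a single position '30'.

    Assumption: positions are zero-based and the range is inclusive.
    """
    start, _, end = positions.partition("-")
    first = int(start)
    last = int(end) if end else first
    return control_field[first:last + 1]


# Hypothetical sample data for illustration only.
title = {"a": "first value", "b": "second value", "n": "third value"}
field_008 = " " * 35 + "eng" + "  "  # 40 characters, language code at 35-37

print(field_matches("6XX", "650"))
print(render_template("${a} ${b} ${n}", title))
print(extract_range(field_008, "35-37"))
```

With the sample data, the template call reproduces the “first value second value third value” result described for field 245, and the range call returns “eng”.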