Appendix A. MARC Import

In the dLibra application, importing attribute values from MARC records is based on a default configuration built into the application. This built-in configuration cannot be changed, but the application can use an alternative configuration defined in text files outside the application (see Section 3.4.6, “Editor's application configuration”). These files use a simple properties file format. This appendix describes the format and shows how to create such files in order to replace the application's default import configuration.

A.1. marcImport.properties configuration file format

Title=245:${a} ${b} ${n};130;210;222;240;246;730;740;
Creator=100;110;111;
Subject=
Description=6XX;
Publisher=260a;260b;260f;
Contributor=700;710;711;
Date=260c;
Type=
Identifier=920;856u;
Source=
Language=041;546;008/35-37;
Relation=250;534;440;490;800;810;811;830;
Coverage=
Rights=506;540;

The listing above shows sample contents of a MARC import configuration file.

Each line of the configuration file consists of an attribute name and a value, separated by an equals sign (=). Every attribute whose values are to be imported from a MARC record must be listed on a separate line under its RDF name. The value after the equals sign specifies which MARC fields are to be imported into that attribute. It is a list of field numbers written in the syntax described later in this section. All field numbers for one attribute must be written on the same line, with no separator other than the semicolon that ends each entry. Omitting an attribute name has the same effect as leaving its list of field numbers empty (as for Subject= in the example above): no values are imported into that attribute.
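To make the line format concrete, the following Python sketch reads such a file into a mapping from attribute names to lists of entries. It is only an illustration under simplifying assumptions, not dLibra's own parser: the function name `parse_marc_import_config` is hypothetical, lines are treated as plain `name=value` pairs (full Java properties escaping and line continuations are not handled), and entries are split on semicolons that are not directly preceded by a backslash.

```python
import re

# Split on ";" unless the semicolon is escaped with a backslash
# (simplified: a literal backslash directly before a terminating ";" is not handled).
_ENTRY_SEPARATOR = re.compile(r"(?<!\\);")


def parse_marc_import_config(text: str) -> dict[str, list[str]]:
    """Hypothetical helper: map each attribute name to its list of MARC entries."""
    config: dict[str, list[str]] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and properties-style comments.
        if not line or line.startswith(("#", "!")):
            continue
        name, _, value = line.partition("=")
        # Drop the empty piece left after the terminating ";".
        entries = [e for e in _ENTRY_SEPARATOR.split(value.strip()) if e]
        config[name.strip()] = entries
    return config


sample = """\
Title=245:${a} ${b} ${n};130;210;222;240;246;730;740;
Subject=
Publisher=260a;260b;260f;
"""
config = parse_marc_import_config(sample)
print(config["Title"][:2])         # ['245:${a} ${b} ${n}', '130']
print(config["Subject"])           # []
# An attribute missing from the file behaves like one with no entries.
print(config.get("Coverage", []))  # []
```

Keeping each entry as a raw string leaves its interpretation (plain field, subfield, template, character range) to a separate step.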

An entry that specifies the MARC field whose value is to be imported into an attribute has the basic syntax AAAb;, where AAA is a three-digit field number and b is a subfield code. It is also possible to combine several MARC subfields into one value or to extract a range of characters from a control field. Note that the semicolon (;) is part of this syntax and is required for a valid configuration.
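The entry forms described in this section can be told apart by their shape. The Python sketch below classifies a single entry, given without its terminating semicolon. The `Entry` type, the `parse_entry` function and the exact regular expressions are assumptions made for illustration (for instance, subfield codes are assumed to be a single lowercase letter or digit); dLibra's own parser may accept or reject other inputs.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    kind: str                       # "field", "multi", "template" or "range"
    field: str                      # e.g. "245", "008", or a pattern such as "6XX"
    subfield: Optional[str] = None  # e.g. "c" for 260c
    template: Optional[str] = None  # text after ":" in 245:${a} ${b} ${n}
    start: Optional[int] = None     # first character position for 008/35-37
    end: Optional[int] = None       # last character position (inclusive)


_TEMPLATE = re.compile(r"(\d{3}):(.+)")
_RANGE = re.compile(r"(\d{3})/(\d+)(?:-(\d+))?")
_MULTI = re.compile(r"\d[\dX]X")
_FIELD = re.compile(r"(\d{3})([a-z0-9])?")


def parse_entry(entry: str) -> Entry:
    """Hypothetical helper: classify one entry (without its trailing ';')."""
    if m := _TEMPLATE.fullmatch(entry):
        return Entry("template", m.group(1), template=m.group(2))
    if m := _RANGE.fullmatch(entry):
        start = int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start  # 008/30 is a single position
        return Entry("range", m.group(1), start=start, end=end)
    if _MULTI.fullmatch(entry):
        return Entry("multi", entry)
    if m := _FIELD.fullmatch(entry):
        return Entry("field", m.group(1), subfield=m.group(2))
    raise ValueError(f"unrecognised MARC entry: {entry!r}")


print(parse_entry("260c").subfield)     # c
print(parse_entry("6XX").kind)          # multi
print(parse_entry("008/35-37").end)     # 37
```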

The subfield code may be omitted, and a multi-value code may be used in place of an exact field number. Details and examples are given below.

  • 100; - an example of a field number.

    Such an entry imports either the value of the field itself (some MARC fields, for instance control fields numbered below 010, never have subfields) or, if the field has subfields, the values of all its subfields. Each subfield value is imported as a separate attribute value.

  • 260c; - an example of a field number with a subfield code.

    Such an entry imports only the value of the given subfield (here subfield c of field 260) into the attribute.

  • 6XX; - an example of a multi-value code.

    Such an entry imports the values of all fields in the range 600-699, together with all their subfields. With this form you cannot select particular subfield codes. Similarly, an entry such as 65X; imports the values of the fields in the range 650-659.

  • 245:${a} ${b} ${n}; - an example of combining MARC subfields into one value.

    This entry consists of two parts, separated by the “:” (colon) character:

    1. 245 - the number of the field whose subfields will be combined into a value

    2. ${a} ${b} ${n} - the template that defines how the subfields are combined.

    The entry ${a} is replaced by the value of subfield “a” of the field whose number precedes the “:” character, in this case field 245. The template 245:${a} ${b} ${n}; therefore combines subfields a, b and n of field 245 into one value, separated by spaces as specified in the template. For example, if subfield 245a contains “first value”, subfield 245b contains “second value” and subfield 245n contains “third value”, the result is “first value second value third value”. To separate the values with something other than a space, put the desired characters in place of the space in the template (e.g. 245:${a}-${b} subfield n: ${n};). There are a few exceptions: for the characters “;” (semicolon), “\” (backslash) and “$” (dollar sign) to be interpreted correctly by the application, each must be preceded by two additional backslashes (e.g. 245:${a} ${b}\\;${n};).

  • 008/35-37; - an example of extracting a range of characters from a control field (this form applies to control fields only).

    This entry consists of two parts, separated by the slash character (“/”):

    1. 008 - the number of the control field from which the characters will be extracted

    2. 35-37 - the range of character positions to extract from the field whose number precedes the slash (“/”).

    This entry means that the characters at positions 35, 36 and 37 of control field 008 form the value. If control field 008 has an “e” at position 35, an “n” at position 36 and a “g” at position 37, the value of the entry is “eng”. To extract a single character, give only its position after the slash (e.g. 008/30;).
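The multi-value codes can be read as patterns in which each X stands for any digit. A minimal Python sketch of that reading follows; the `matches` helper is hypothetical and not part of dLibra. It shows that 6XX covers fields 600-699, while 65X covers only 650-659.

```python
def matches(pattern: str, tag: str) -> bool:
    """Hypothetical helper: does a 3-digit field tag match a pattern such as 6XX or 65X?"""
    return len(pattern) == len(tag) == 3 and all(
        p == "X" or p == t for p, t in zip(pattern, tag)
    )


tags = ["245", "600", "650", "655", "659", "660", "700"]
print([t for t in tags if matches("6XX", t)])  # ['600', '650', '655', '659', '660']
print([t for t in tags if matches("65X", t)])  # ['650', '655', '659']
```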
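The subfield-combining templates can be sketched in Python as follows. This is a minimal illustration, not dLibra's implementation: `expand_template` is a hypothetical helper that receives the part of the entry after the colon and a dictionary of subfield values. It assumes single-character subfield codes, assumes a missing subfield expands to an empty string, and applies the escaping rule exactly as stated (two backslashes followed by ;, \ or $ yield that character), without modelling how the properties loader itself treats backslashes.

```python
import re

# Either a ${x} placeholder, or two backslashes escaping ";", "\" or "$".
_TOKEN = re.compile(r"\$\{([a-z0-9])\}|\\\\([;\\$])")


def expand_template(template: str, subfields: dict[str, str]) -> str:
    """Hypothetical helper: fill a template such as '${a} ${b} ${n}'."""
    def replace(m: re.Match) -> str:
        if m.group(1) is not None:
            # Assumption: a subfield missing from the record expands to "".
            return subfields.get(m.group(1), "")
        return m.group(2)  # escaped literal character
    return _TOKEN.sub(replace, template)


subfields = {"a": "first value", "b": "second value", "n": "third value"}
print(expand_template("${a} ${b} ${n}", subfields))
# first value second value third value
print(expand_template("${a}-${b} subfield n: ${n}", subfields))
# first value-second value subfield n: third value
print(expand_template(r"${a} ${b}\\;${n}", subfields))
# first value second value;third value
```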
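Character-range extraction from a control field can be sketched in Python as below. The `extract_range` helper is hypothetical, and the sketch assumes that positions are counted from 0 (as in the MARC 21 documentation, where field 008 has positions 00-39 and positions 35-37 hold the language code) and that both ends of the range are inclusive. The 008 value in the example is synthetic: only positions 30 and 35-37 are filled.

```python
from typing import Optional


def extract_range(control_field: str, start: int, end: Optional[int] = None) -> str:
    """Hypothetical helper: characters start..end (inclusive) of a control field."""
    if end is None:
        end = start  # e.g. 008/30 extracts a single character
    # Assumption: positions are counted from 0, as in MARC 21 documentation.
    return control_field[start:end + 1]


chars = [" "] * 40      # synthetic 008 value, positions 00-39
chars[35:38] = "eng"    # language code at positions 35-37
chars[30] = "x"         # arbitrary marker at position 30
field_008 = "".join(chars)

print(extract_range(field_008, 35, 37))  # eng
print(extract_range(field_008, 30))      # x
```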