OCLC Support

3. Data requirements for bibliographic collections

Find the data requirements for a bibliographic data sync collection in WorldShare Collection Manager.

Data synchronization can process multiple file formats. To ensure your files are processed correctly, follow the data requirements below.

 Note: MARC data is required for all bibliographic collections.

Leader and Directory

MARC records must contain valid Leader and Directory information in order for a file to be recognized and processed correctly. Please consult the Library of Congress MARC 21 Format for Bibliographic Data for valid Leader and Directory information.

Include

 Note: Certain Leader values will differ depending on whether the submitted data is MARC Bibliographic or MARC Holdings format:

| Leader offset | Leader element in MARC bibliographic format | Valid values in MARC bibliographic format | Leader element in MARC holdings format | Valid values in MARC holdings format |
| --- | --- | --- | --- | --- |
| 00-04 | Record length | Computer-generated, five-character number equal to the length of the entire record, including itself and the record terminator. The number is right justified and each unused position contains a zero. | Record length | Computer-generated, five-character number equal to the length of the entire record, including itself and the record terminator. The number is right justified and each unused position contains a zero. |
| 05 | Record status | a, c, d, n, p | Record status | c, d, n |
| 06 | Type of record | a, c, d, e, f, g, i, j, k, m, o, p, r, t | Type of record | u, v, x, y |
| 07 | Bibliographic level | a, b, c, d, i, m, s | Undefined character position | blank space |
| 08 | Type of control | blank space, a | Undefined character position | blank space |
| 09 | Character coding scheme | blank space, a | Character coding scheme | blank space, a |
| 10 | Indicator count | 2 | Indicator count | 2 |
| 11 | Subfield code length | 2 | Subfield code length | 2 |
| 12-16 | Base address of data | Computer-generated, five-character numeric string that indicates the first character position of the first variable control field in a record. The number is right justified and each unused position contains a zero. | Base address of data | Computer-generated, five-character numeric string that indicates the first character position of the first variable control field in a record. The number is right justified and each unused position contains a zero. |
| 17 | Encoding level | blank space, 1, 2, 3, 4, 5, 7, 8, u, z | Encoding level | 1, 2, 3, 4, 5, m, u, z |
| 18 | Descriptive cataloging form | blank space, a, c, i, n, u | Item information in record | i, n |
| 19 | Multipart resource record level | blank space, a, b, c | Undefined character position | blank space |
| 20 | Length of the length-of-field portion | 4 | Length of the length-of-field portion | 4 |
| 21 | Length of the starting-character-position portion | 5 | Length of the starting-character-position portion | 5 |
| 22 | Length of the implementation-defined portion | 0 | Length of the implementation-defined portion | 0 |
| 23 | Undefined | 0 | Undefined | 0 |

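As an illustration, the bibliographic column of the Leader table can be turned into a simple pre-submission check. The following is a minimal sketch, not an OCLC tool: the names `BIB_LEADER_VALUES` and `check_bib_leader` are hypothetical, the Leader is assumed to be available as a 24-character string, and only the MARC bibliographic values are covered (a holdings check would need its own table).

```python
# Allowed characters per Leader position for MARC bibliographic records,
# transcribed from the table above. A space stands for "blank space".
BIB_LEADER_VALUES = {
    5: "acdnp",            # Record status
    6: "acdefgijkmoprt",   # Type of record
    7: "abcdims",          # Bibliographic level
    8: " a",               # Type of control
    9: " a",               # Character coding scheme
    10: "2",               # Indicator count
    11: "2",               # Subfield code length
    17: " 1234578uz",      # Encoding level
    18: " acinu",          # Descriptive cataloging form
    19: " abc",            # Multipart resource record level
    20: "4",               # Length of the length-of-field portion
    21: "5",               # Length of the starting-character-position portion
    22: "0",               # Length of the implementation-defined portion
    23: "0",               # Undefined
}


def check_bib_leader(leader: str) -> list:
    """Return a list of problems found in a bibliographic Leader (hypothetical helper)."""
    if len(leader) != 24:
        return [f"leader is {len(leader)} characters, expected 24"]
    problems = []
    # Positions 00-04 and 12-16 must be zero-padded five-digit numbers.
    for start, end, label in ((0, 5, "record length"), (12, 17, "base address of data")):
        value = leader[start:end]
        if not (value.isascii() and value.isdigit()):
            problems.append(f"{label} {value!r} is not a five-digit number")
    # Every other position must hold one of the values listed in the table.
    for pos, allowed in BIB_LEADER_VALUES.items():
        if leader[pos] not in allowed:
            problems.append(f"position {pos:02d}: {leader[pos]!r} is not a valid value")
    return problems


print(check_bib_leader("00714cam a2200205 a 4500"))
```

The check only compares characters against the table; it does not confirm that the record length and base address match the actual record.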
Additional MARC Leader information

For complete information about the MARC Leader, see:

Exclude

Please ensure that your submitted UTF-8 MARC data files do not begin with a byte order mark (BOM). MARC data files received with byte order marks will be rejected.

  • A byte order mark (BOM) is a sequence of bytes placed at the start of some Unicode files to help make sure they are read correctly by some web browsers and server environments. The Unicode Standard permits the BOM in UTF-8 (hexadecimal values EF BB BF) but neither requires nor recommends its use. Byte order has no meaning in UTF-8, and OCLC does not accept UTF-8 encoded files containing BOMs.
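A file can be checked for a leading BOM before submission by comparing its first three bytes with EF BB BF. The sketch below uses Python's standard `codecs.BOM_UTF8` constant; the function names `starts_with_bom` and `strip_bom` are hypothetical and are not part of any OCLC tooling.

```python
import codecs

# codecs.BOM_UTF8 is the byte sequence EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def starts_with_bom(data: bytes) -> bool:
    """Return True if the data begins with a UTF-8 byte order mark."""
    return data.startswith(UTF8_BOM)


def strip_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 byte order mark, if one is present."""
    return data[len(UTF8_BOM):] if starts_with_bom(data) else data


sample = b"\xef\xbb\xbf00714cam a2200205 a 4500"
print(starts_with_bom(sample))             # True
print(starts_with_bom(strip_bom(sample)))  # False
```

For a file on disk, reading it in binary mode (`open(path, "rb")`) and passing the first three bytes to `starts_with_bom` is enough to detect the problem.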

Fields and subfields

MARC records must contain valid field and subfield information in order for a file to be recognized and processed correctly. Please consult the Library of Congress MARC 21 Format for Bibliographic Data for valid field and subfield information.

Include
| Field/Subfield | Field/Subfield name | Requirement |
| --- | --- | --- |
| 035 | System Control Number | Required if available. If available, include an OCLC control number, with valid prefix, in every record. |
| 040 b | Cataloging Source: Language of cataloging | Include a language code if any cataloging data is in a language other than English. If this is not coded, our system will assume the item is cataloged in English. |
| 040 e | Cataloging Source: Description conventions | Include a cataloging description MARC code for rare and archival materials only. |
| 066 | Character Sets Present | Where this field exists, include 880 fields. |
| 245 | Title Statement | This tag is mandatory. Include the title proper. |
| 5xx | Note Fields | See ** below |
| 880 | Alternate Graphic Representation | Where this field exists, include field 066. |

**Use UTF-8 Unicode or MARC-8 character encoding. 
 Caution: For all tags, but especially when entering text into text fields (e.g., Comment, Description):

  • Do not add new lines. All text needs to be in one paragraph.
  • Do not copy and paste text from Microsoft Word, email, a web browser, or other sources. Pasted text can contain hidden formatting codes that can generate errors.
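Some of the field requirements listed in this section lend themselves to an automated check: field 245 is mandatory, and fields 066 and 880 must appear together. The following is a minimal sketch under the assumption that each record has already been reduced to a list of its three-character tags; the function name `check_required_fields` is hypothetical.

```python
def check_required_fields(tags):
    """Return a list of problems for one record, given its field tags as strings."""
    present = set(tags)
    problems = []
    # Field 245 (Title Statement) is mandatory.
    if "245" not in present:
        problems.append("245 (Title Statement) is missing")
    # Fields 066 and 880 must accompany each other.
    if "066" in present and "880" not in present:
        problems.append("066 is present without any 880 field")
    if "880" in present and "066" not in present:
        problems.append("880 is present without field 066")
    return problems


print(check_required_fields(["001", "035", "066", "245", "880"]))
print(check_required_fields(["001", "880"]))
```

The sketch checks only for the presence of tags. It does not inspect subfields such as 040 b or 040 e, and it does not verify the OCLC control number prefix in field 035.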

Tag order

Fields 001, 003, 005, and 035 should be in tag order.
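One way to test this is to pull out the 001, 003, 005, and 035 tags in the order they occur in a record and confirm that the sequence is already sorted. This sketch assumes that "tag order" means ascending numeric order of these fields relative to one another, and that a record's tags are available as a list of strings; `ORDERED_TAGS` and `control_tags_in_order` are hypothetical names.

```python
ORDERED_TAGS = ("001", "003", "005", "035")


def control_tags_in_order(tags):
    """Return True if fields 001, 003, 005, and 035 occur in ascending tag order."""
    # Keep only the tags of interest, preserving their order in the record.
    seen = [tag for tag in tags if tag in ORDERED_TAGS]
    # Three-digit tags sort the same way as strings and as numbers.
    return seen == sorted(seen)


print(control_tags_in_order(["001", "003", "005", "035", "035", "245"]))  # True
print(control_tags_in_order(["003", "001", "005", "245"]))                # False
```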

Duplicate records

Each record sent for a single symbol should have a unique local system number in a consistent location. Although this list is not exhaustive, local system numbers are typically found in the following tags: 001, 049, 035, 9xx, and 852.

 Caution: When the system identifies duplicate records or multiple matches in your submitted data sync file, the file will be rerun in an attempt to resolve the issue. Multiple reports and files may be generated for each retry.
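
Repeated local system numbers can be found before a file is submitted. The sketch below assumes the local system numbers have already been extracted from whichever tag your system uses (for example, 001) into a list of strings, one per record; the function name `find_duplicate_ids` is hypothetical.

```python
from collections import Counter


def find_duplicate_ids(local_ids):
    """Return the local system numbers that occur more than once, sorted."""
    counts = Counter(local_ids)
    return sorted(value for value, count in counts.items() if count > 1)


print(find_duplicate_ids(["b1001", "b1002", "b1001", "b1003"]))  # ['b1001']
```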