OCLC Support

Catalog using Devanagari script

Discover how to catalog using Devanagari script in Connexion client.

Overview

Use Devanagari script data for cataloging items in languages that use the Devanagari script (for example, Hindi, Marathi, Sanskrit, Nepali, Sherpa). Use Devanagari script data the same way you use other non-Latin script data in the client.

See Work with international records and Guidelines for contributing non-Latin script bibliographic records to WorldCat for details specific to non-Latin scripts. See also general procedures describing how to:

Tools for using non-Latin scripts

  • Link/unlink (Edit > Linking Fields > Link [or Unlink]) - Visually link or unlink non-Latin script data fields with equivalent Latin script (romanized) data fields (bibliographic records only) 
  • Export options for data fields (Tools > Options > International) - Determine (for bibliographic records only): 
    • Whether to export both equivalent Latin script (romanized) data and non-Latin script data or only one or the other 
    • Position of data if you export both Latin and non-Latin script data 
    • Sort order

 Caution: MARC-8 character verification (Edit > MARC-8 Characters > Verify) is not appropriate for verifying Devanagari characters, because there is no MARC-8 character set for Devanagari. Running this command on Devanagari data marks every Devanagari character as invalid. Instead, the OCLC system validates Devanagari characters when you validate a record. 

UTF-8 Unicode export and import required for Devanagari records

Because Devanagari script is not included in the MARC-8 character sets, you must export and import records in the UTF-8 Unicode character set. For export, set this in Tools > Options > Export (click Record Characteristics); for import, set it in File > Import Records (click Record Characteristics). If you export or import using the MARC-8 character set, non-MARC-8 characters (which include all Devanagari characters) are retained only in Numeric Character Reference (NCR) notation.
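To show what the NCR fallback looks like, here is a minimal Python sketch. It assumes the hexadecimal form &#xHHHH; used for lossless MARC-8 conversion and simplifies "representable in MARC-8" to plain ASCII. The real MARC-8 repertoire is larger, but it contains no Devanagari, so every Devanagari character becomes a reference either way. The function names to_ncr and from_ncr are hypothetical and are not part of Connexion client.

```python
import re

# Assumption: treat only ASCII (U+0000-U+007F) as representable in MARC-8.
# The real MARC-8 repertoire is larger, but it has no Devanagari characters.
_NCR_PATTERN = re.compile(r"&#x([0-9A-Fa-f]+);")


def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal NCR (&#xHHHH;)."""
    return "".join(ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};" for ch in text)


def from_ncr(text: str) -> str:
    """Turn hexadecimal NCRs back into the characters they stand for."""
    return _NCR_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


if __name__ == "__main__":
    hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"  # हिन्दी ("Hindi")
    encoded = to_ncr(hindi)
    print(encoded)  # &#x0939;&#x093F;&#x0928;&#x094D;&#x0926;&#x0940;
    assert from_ncr(encoded) == hindi
```

The round trip recovers the text, but in a MARC-8 file it is stored as escape sequences rather than readable script, which is why UTF-8 is the setting to use for Devanagari records.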

About Unicode

Unicode is the universal character encoding scheme for written characters and text. It defines a consistent way of encoding multi-script text that enables the exchange of text data internationally.

Unicode provides for three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8, designed for use with ASCII-based systems).
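As a quick illustration of the three encoding forms, the following Python sketch counts the bytes each one needs for the same text. Devanagari code points lie in the Basic Multilingual Plane above U+07FF, so each takes 3 bytes in UTF-8, 2 in UTF-16, and 4 in UTF-32, while ASCII characters keep their single-byte values in UTF-8. The big-endian codecs are used only to keep byte-order marks out of the counts; encoded_sizes is a hypothetical helper.

```python
def encoded_sizes(text: str) -> dict:
    """Return the length in bytes of text in each Unicode encoding form."""
    # The -be codecs write no byte-order mark, so the counts are data only.
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-be")),
        "UTF-32": len(text.encode("utf-32-be")),
    }


if __name__ == "__main__":
    print(encoded_sizes("A"))             # {'UTF-8': 1, 'UTF-16': 2, 'UTF-32': 4}
    print(encoded_sizes("\u0939\u093F"))  # {'UTF-8': 6, 'UTF-16': 4, 'UTF-32': 8}
    # UTF-8 leaves ASCII bytes unchanged, which is why it suits ASCII-based systems.
    assert "A".encode("utf-8") == b"A"
```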

Connexion client began supporting Devanagari script with Unicode version 4.0.0.

Devanagari script entry and character set

Script entry method

If your system's default language does not use Devanagari script, you can add a language that does (for example, Hindi) in Windows. Windows then provides an input keyboard for entering Devanagari script. See more about input methods for languages that use non-Latin scripts.

Character set supported

Devanagari characters are defined in Unicode 4.0 (coded in the range U+0900 to U+097F).

Script identifier in records

The client adds the following data to ‡c of field 066 in Devanagari records to indicate the presence of Devanagari characters:

  • Deva
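The client adds this identifier for you. To illustrate the check it implies, here is a minimal Python sketch that scans a record's field data for code points in the Devanagari range U+0900 to U+097F and, if it finds any, returns Deva as the 066 ‡c value. The record model (a list of tag/data pairs), the function names, and the return convention are simplifying assumptions for illustration, not the client's internal logic.

```python
from typing import List, Optional, Tuple

DEVANAGARI_FIRST = 0x0900  # start of the Unicode Devanagari block
DEVANAGARI_LAST = 0x097F   # end of the Unicode Devanagari block


def is_devanagari(ch: str) -> bool:
    """True if the character's code point is in U+0900-U+097F."""
    return DEVANAGARI_FIRST <= ord(ch) <= DEVANAGARI_LAST


def field_066_c(fields: List[Tuple[str, str]]) -> Optional[str]:
    """Return the 066 subfield c value for Devanagari, or None if absent.

    Hypothetical helper: `fields` is a simplified record of (tag, data) pairs.
    """
    for _tag, data in fields:
        if any(is_devanagari(ch) for ch in data):
            return "Deva"
    return None


if __name__ == "__main__":
    record = [
        ("245", "Hindi"),               # romanized field
        ("245", "\u0939\u093F\u0928\u094D\u0926\u0940"),  # parallel Devanagari field
    ]
    print(field_066_c(record))          # Deva
    print(field_066_c(record[:1]))      # None
```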

Romanized data

See the ALA-LC Romanization Table on the Library of Congress website for Sanskrit, Hindi, and Marathi romanization rules.

Indexing for Devanagari script searches

Notes on searching

  • Use word or phrase search indexes and browse indexes. 
  • Word searches find the data string you enter anywhere in the indexed field. Phrase searches find the data string starting with the first character in a field or subfield and including each character in exact order. Browsing scans an index for the closest match to the character string followed by any other data. 
  • If you use qualifiers to limit a search, type them in Latin script. 
  • Do not use derived searching. 
  • You can truncate searches by adding an asterisk (*) at the end of a search term, or use browsing for automatic truncation (enter only as many characters as needed for a match, without an asterisk). 
  • If you want to retrieve all Devanagari script records or see sample records, use the "character sets present" WorldCat search index (label vp:) with the assigned code dev.
    • To find all Devanagari script records, enter vp:dev as a command line search in the Search WorldCat window (Cataloging > Search > WorldCat).
       Note: If a search for all Devanagari script records retrieves too many WorldCat records (the limit is 1,500 records), qualify the search or combine it with another search and try again (for example, vp:dev/1991-; vp:dev and mt:bks; etc.).
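The differences between word, phrase, and browse searching, and the effect of a trailing asterisk, can be modeled with a short Python sketch. This is a deliberately simplified model built on assumptions, not how WorldCat indexes work: word search is treated as matching whitespace-separated words anywhere in a field, phrase search as matching the whole field from its first character (a trailing * turns it into a prefix match), and browse as finding the closest entry in a sorted index plus the entries after it. The sort here is by raw code point, which differs from WorldCat's Devanagari sort order. All function names are hypothetical.

```python
import bisect
from typing import List


def _term_matches(term: str, word: str) -> bool:
    # A trailing asterisk truncates the term: match any word that starts with it.
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def word_search(query: str, fields: List[str]) -> List[str]:
    """Fields containing every query term as a word, in any position."""
    terms = query.split()
    return [
        f for f in fields
        if all(any(_term_matches(t, w) for w in f.split()) for t in terms)
    ]


def phrase_search(query: str, fields: List[str]) -> List[str]:
    """Fields matching the query from their first character, in exact order."""
    if query.endswith("*"):
        return [f for f in fields if f.startswith(query[:-1])]
    return [f for f in fields if f == query]


def browse(query: str, fields: List[str], count: int = 3) -> List[str]:
    """The closest index entry at or after the query, plus the entries that follow."""
    index = sorted(fields)  # code-point order: a simplification
    start = bisect.bisect_left(index, query)
    return index[start:start + count]


if __name__ == "__main__":
    bharat = "\u092D\u093E\u0930\u0924"  # भारत
    fields = [
        bharat + " \u0915\u093E \u0907\u0924\u093F\u0939\u093E\u0938",  # भारत का इतिहास
        bharat + "\u0940\u092F",                                        # भारतीय
        "\u0939\u093F\u0928\u094D\u0926\u0940",                         # हिन्दी
    ]
    print(len(word_search(bharat, fields)))        # 1: only the exact word
    print(len(word_search(bharat + "*", fields)))  # 2: truncation adds भारतीय
    print(len(phrase_search(bharat, fields)))      # 0: no field is exactly भारत
    print(len(browse(bharat, fields, 2)))          # 2
```

Qualifiers, which the notes say to type in Latin script, are left out of this model.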

See general procedures and search techniques for searching WorldCat.

Devanagari character indexing specifics

  • The following Devanagari signs are indexed: Candrabindu, Anusvara, Visarga, Nukta, Avagraha, Virama (Halant), and OM (sacred Hindu syllable). 
  • The Devanagari signs Anudatta (stress sign), Grave accent, and Acute accent are ignored for indexing. 
  • Independent vowels, dependent vowels, additional consonants, and generic or Devanagari-specific character additions are all indexed as is.
     Note: The character for Devanagari vowel short A is unavailable in the Arial Unicode MS font (Connexion client default font).
  • Consonants attached to a Virama (Halant) are indexed as is. Otherwise, consonants are indexed together with their dependent vowel. 
  • Both Devanagari and Latin numbers are indexed (either may appear in Devanagari text). 
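The rule that a few signs are ignored for indexing while everything else is kept as is can be sketched as a normalization step applied before terms go into an index. The Python below assumes the three ignored signs correspond to U+0952 (DEVANAGARI STRESS SIGN ANUDATTA), U+0953 (DEVANAGARI GRAVE ACCENT), and U+0954 (DEVANAGARI ACUTE ACCENT). index_key is a hypothetical name, and the sketch does not attempt the consonant and dependent-vowel handling listed in this section.

```python
import unicodedata

# Signs ignored for indexing (assumed code points for Anudatta, Grave, Acute).
IGNORED_FOR_INDEXING = frozenset({
    "\u0952",  # DEVANAGARI STRESS SIGN ANUDATTA
    "\u0953",  # DEVANAGARI GRAVE ACCENT
    "\u0954",  # DEVANAGARI ACUTE ACCENT
})


def index_key(text: str) -> str:
    """Drop ignored signs; keep every other character (signs, vowels, digits) as is."""
    return "".join(ch for ch in text if ch not in IGNORED_FOR_INDEXING)


if __name__ == "__main__":
    for ch in sorted(IGNORED_FOR_INDEXING):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # OM (U+0950) is indexed; the Anudatta after it is not.
    print(index_key("\u0950\u0952") == "\u0950")  # True
    # Devanagari digits (U+0966-U+096F) and Latin digits are both kept.
    print(index_key("\u0967\u0969 13"))  # १३ 13
```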

Notes on sorting WorldCat search results

  • Terms containing syllables with candrabindu or anusvara (nasalization signs) precede terms without those signs. 
  • Non-conjunct forms of a consonant precede conjunct forms. 
  • The default sort order for search results (alphabetical by Latin script) is recommended when the record includes romanized (Latin script equivalent) data. Set the sort order in Tools > Options > International. 
