OCLC Support

Catalog using Bengali script

Discover how to catalog using Bengali script in Connexion client.

Overview

Use Bengali script data for cataloging items in languages that use the Bengali script (for example, Bangla and Assamese). Use Bengali script data the same way you use other non-Latin script data in the client.

See "Work with international records" and "Guidelines for contributing non-Latin script bibliographic records to WorldCat" for details specific to non-Latin scripts. See also the general procedures.

Tools for using non-Latin scripts

The client provides the following general tools to help you catalog using non-Latin scripts:

  • Link/unlink fields (Edit > Linking Fields > Link [or Unlink]) - visually link non-Latin script data fields with equivalent romanized data fields.
  • Export options for data fields (Tools > Options > International) - determine:
    • Whether to export both Latin-script-equivalent (romanized) data and non-Latin script data, or only one of the two
    • Position of each type of data if you export both
    • Sort order

     Caution: MARC-8 character verification (Edit > MARC-8 Characters > Verify) is not appropriate for verifying Bengali characters. There is no MARC-8 character set for Bengali. Using this command on Bengali data marks all Bengali characters as invalid. The OCLC system validates Bengali characters when you validate a record.

UTF-8 Unicode export and import required for Bengali records

Because Bengali script is not included in the MARC-8 character sets, you must export and import records using the UTF-8 Unicode character set. Settings for export are in Tools > Options > Export (click Record Characteristics). Settings for import are in File > Import Records (click Record Characteristics). If you export or import using the MARC-8 character set, non-MARC-8 characters are retained in Numeric Character Reference (NCR) notation only.
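
To illustrate what NCR notation looks like, the following is a minimal Python sketch, not the client's own conversion. It assumes the hexadecimal form &#xXXXX; and, as a simplification, treats every non-ASCII character as outside MARC-8 (real MARC-8 covers more than ASCII). The function names to_ncr and from_ncr are hypothetical.

```python
import re


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal NCR (&#xXXXX;).

    Simplification: real MARC-8 covers more than ASCII, so this only
    approximates which characters would be written as NCRs.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):04X};" for ch in text
    )


def from_ncr(text: str) -> str:
    """Turn hexadecimal NCRs back into the characters they stand for."""
    return re.sub(
        r"&#x([0-9A-Fa-f]+);",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# "Bangla" written in Bengali script, spelled out as code points
title = "\u09ac\u09be\u0982\u09b2\u09be"
ncr_title = to_ncr(title)

print(ncr_title)                     # the NCR form, plain ASCII
print(from_ncr(ncr_title) == title)  # round trip restores the text
```

An NCR string of this kind is readable only by software that decodes it, which is why UTF-8 export and import is the setting to use for Bengali records.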

About Unicode

Unicode is the universal character encoding scheme for written characters and text. It defines a consistent way of encoding multi-script text that enables the exchange of text data internationally.

Unicode provides for three encoding forms: a 32-bit form (UTF-32), a 16-bit form (UTF-16), and an 8-bit form (UTF-8, designed for use with ASCII-based systems).
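
As a small illustration of the three encoding forms, this Python sketch encodes one Bengali letter (U+0995, BENGALI LETTER KA) in each form and reports how many bytes each uses. It relies only on Python's built-in codecs; the big-endian variants are chosen here simply to avoid a byte order mark.

```python
ka = "\u0995"  # BENGALI LETTER KA

# Encode the same character in each Unicode encoding form.
encoded = {name: ka.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}

for name, data in encoded.items():
    print(f"{name}: {len(data)} bytes ({data.hex(' ')})")
```

The same character occupies three bytes in UTF-8, two in UTF-16, and four in UTF-32.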

Connexion client began supporting Bengali script with Unicode version 4.0.0.

Bengali script entry and character set

Script entry method

If your system default language is not Bengali, you can install the Bengali language in Windows. When you install Bengali, Windows provides an input keyboard for entering Bengali script. See more about input methods for languages that use non-Latin scripts.

Character set supported

Bengali characters are defined in Unicode 4.0 (coded in the range U+0981 to U+09FA).
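
If you need to check text outside the client (for example, before importing records), a range test like the following Python sketch may help. It uses only the range stated above; the names is_bengali and contains_bengali are hypothetical, and the test does not distinguish assigned from unassigned code points within the range.

```python
BENGALI_FIRST = 0x0981  # first code point in the supported range
BENGALI_LAST = 0x09FA   # last code point in the supported range


def is_bengali(ch: str) -> bool:
    """Return True if a single character falls in U+0981..U+09FA."""
    return BENGALI_FIRST <= ord(ch) <= BENGALI_LAST


def contains_bengali(text: str) -> bool:
    """Return True if any character in the text is in the Bengali range."""
    return any(is_bengali(ch) for ch in text)


print(contains_bengali("\u09ac\u09be\u0982\u09b2\u09be"))  # Bengali script
print(contains_bengali("Bangla"))                          # Latin script
```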

Script identifier in records

The client adds the following data to ‡c of field 066 in Bengali records to indicate the presence of Bengali characters:

  • Beng

Romanized data

See the ALA-LC Romanization Table for Bengali on the Library of Congress website.

Indexing for Bengali script searches

Notes on searching WorldCat using Bengali script search terms

  • Use word or phrase search indexes and browse indexes. 
  • Word searches find the data string you enter anywhere in the indexed field. Phrase searches find the data string starting with the first character in a field or subfield and including each character in exact order. Browsing scans an index for the closest match to the character string followed by any other data. 
  • If you use qualifiers to limit a search, type them in Latin script. 
  • Do not use derived searching. 
  • You can truncate searches (asterisk (*) at the end of a search term) or use browsing for automatic truncation (enter only as many characters as needed for a match without using an asterisk). 
  • If you want to retrieve all Bengali script records or see sample records, use the "character sets present" WorldCat search index (label vp:) with the assigned code ben.
    • To find all Bengali script records, enter vp:ben as a command line search in the Search WorldCat window (Cataloging > Search > WorldCat).

       Note: If a search for all Bengali script records alone retrieves too many WorldCat records (limit 1,500 records), you must limit the search and try again (e.g., vp:ben/1991-; vp:ben and mt:bks; etc.).
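
If you assemble command line searches in a script before pasting them into the Search WorldCat window, a helper along these lines can keep them consistent. This is only a sketch: the function name bengali_records_search is hypothetical, and it uses nothing beyond the vp:ben term and the "and" combination shown above.

```python
def bengali_records_search(*extra_terms: str) -> str:
    """Build a command line search that starts with vp:ben.

    Any extra terms (for example "mt:bks") are joined with "and"
    to narrow the result set.
    """
    return " and ".join(("vp:ben",) + extra_terms)


print(bengali_records_search())          # all Bengali script records
print(bengali_records_search("mt:bks"))  # narrowed with a second term
```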

See general procedures and search techniques for searching WorldCat.

Bengali character indexing specifics

  • Bengali signs (Candrabindu, Anusvara, Visarga, Nukta, and Avagraha) are indexed as is. 
  • Independent vowels, dependent vowels, two-part dependent vowels, and generic or Bengali-specific character additions are all indexed as is. 
  • Consonants are indexed as is, if attached with Virama (Hasant); otherwise, they are indexed with the dependent vowel or consonant. 
  • Both Bengali and Latin numbers are indexed (either may appear in Bengali text). 
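
Because either digit form may appear in Bengali text, a local script that compares dates or numbers may need to treat the two forms as equal. The Python sketch below shows one way to do that with the standard unicodedata module. It does not describe how WorldCat builds its indexes, and to_latin_digits is a hypothetical name.

```python
import unicodedata


def to_latin_digits(text: str) -> str:
    """Replace Bengali digits (U+09E6..U+09EF) with Latin digits 0-9.

    All other characters are passed through unchanged.
    """
    return "".join(
        str(unicodedata.digit(ch)) if "\u09e6" <= ch <= "\u09ef" else ch
        for ch in text
    )


bengali_year = "\u09e7\u09ef\u09ef\u09e7"  # the year 1991 in Bengali digits
print(to_latin_digits(bengali_year))
print(to_latin_digits(bengali_year) == to_latin_digits("1991"))
```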

Notes on sorting search results

  • Bengali syllables with candrabindu or anusvara (nasalization signs) precede terms without those syllables. 
  • Non-conjunct forms of a consonant precede conjunct forms. 
  • The default sort order for search results (alphabetical sorting by Latin script) is recommended if the record includes romanized (Latin-equivalent) data. The sort order option is in Tools > Options > International. 

 
