OCLC Support

Catalog using Thai script

Discover how to catalog using Thai script in Connexion client.

Overview

Use Thai script data for cataloging items in the Thai language. Use Thai script data the same way you use other non-Latin script data in the client.

See Work with international records and Guidelines for contributing non-Latin script bibliographic records to WorldCat for details specific to non-Latin scripts. See also general procedures describing how to:

Tools for using non-Latin scripts

  • Link/unlink (Edit > Linking Fields > Link [or Unlink]) - Visually link or unlink non-Latin script data fields with equivalent Latin script (romanized) data fields (bibliographic records only) 
  • Export options for data fields (Tools > Options > International) - Determine (for bibliographic records only): 
    • Whether to export both equivalent Latin script (romanized) data and non-Latin script data or only one or the other 
    • Position of data if you export both Latin and non-Latin script data 
    • Sort order 

Caution: MARC-8 character verification (Edit > MARC-8 Characters > Verify) is not appropriate for verifying Thai characters. There is no MARC-8 character set for Thai, so running this command on Thai text marks every Thai character as invalid. The OCLC system validates Thai characters when you validate a record.

UTF-8 Unicode export and import required for Thai records

Because Thai script is not included in the MARC-8 character sets, you must export and import records in the UTF-8 Unicode character set. For export, go to Tools > Options > Export and click Record Characteristics. For import, go to File > Import Records and click Record Characteristics. If you export or import using the MARC-8 character set, non-MARC-8 characters are retained in Numeric Character Reference (NCR) notation only.
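
To illustrate what NCR notation looks like, the following Python sketch replaces characters with hexadecimal references of the form &#xXXXX;. It is an illustration only, not the client's conversion routine. The helper name to_ncr is hypothetical, and treating every non-ASCII character as a non-MARC-8 character is a simplifying assumption (MARC-8 does cover many non-ASCII characters).

```python
def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal NCR (&#xXXXX;).

    Assumption for illustration: "non-ASCII" stands in for "not in MARC-8".
    """
    return "".join(
        ch if ord(ch) < 0x80 else f"&#x{ord(ch):04X};"
        for ch in text
    )


# U+0E44, U+0E17, U+0E22 spell the word for "Thai" in Thai script.
print(to_ncr("\u0e44\u0e17\u0e22"))  # prints &#x0E44;&#x0E17;&#x0E22;
```

A record exported this way keeps the Thai data recoverable, but the field content is no longer readable as Thai script, which is why UTF-8 is required.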

About Unicode

Unicode is the universal character encoding scheme for written characters and text. It defines a consistent way of encoding multi-script text that enables the exchange of text data internationally.

Connexion client began supporting Thai script with Unicode version 4.0.0.

Thai script entry and character set

Script entry method

If your system default language is not Thai, you can install the Thai language in Windows, which provides an input keyboard for entering Thai script. See more about input methods for languages that use non-Latin scripts.

Character set supported

Thai characters are defined in Unicode 4.0 (coded in the range U+0E00 to U+0E7F).
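
As a quick way to check whether a string contains characters from this range, a minimal Python sketch follows. The function names is_thai and contains_thai are hypothetical helpers, not part of the client.

```python
# The Thai block runs from U+0E00 through U+0E7F inclusive.
THAI_BLOCK = range(0x0E00, 0x0E80)


def is_thai(ch: str) -> bool:
    """Return True if a single character falls in the Thai block."""
    return ord(ch) in THAI_BLOCK


def contains_thai(text: str) -> bool:
    """Return True if any character in the text is in the Thai block."""
    return any(is_thai(ch) for ch in text)
```

For example, contains_thai("Bangkok \u0e01") is true, while contains_thai("Bangkok") is false.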

Script identifier in records

The client adds the following data to subfield c of field 066 in Thai records to indicate the presence of Thai characters:

  • Thai

Romanized data

See the ALA-LC Romanization Table for Thai on the Library of Congress website.

Indexing for Thai script searches

For Thai script searches, the system treats the entire data string you enter as both a word and a phrase, since Thai text has no spaces between words. Search for Thai terms using word or phrase search indexes and word or phrase browse indexes.

To search a phrase index without entering the entire data string exactly as it appears in a field or subfield, you must use truncation. Truncation retrieves records in which the data string you enter is followed by any other data.

Truncate Thai search terms to avoid entering the entire string in a field:

  • To truncate, enter an asterisk (*) at the end of the character string. You must enter a minimum of three Thai characters before truncating.
  • Alternatively, use browsing for automatic truncation. Enter only as many characters as needed for a match, without an asterisk at the end.
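
The truncation rule can be sketched in Python as follows. This is a hypothetical helper for composing a search string, not a client feature. It assumes that "three Thai characters" means three code points in the range U+0E00 to U+0E7F, which also counts vowel signs and tone marks; the system may count differently.

```python
MIN_THAI_CHARS = 3  # minimum stated above; counted here as code points


def truncate_term(index_label: str, term: str) -> str:
    """Append the truncation asterisk to a Thai search term.

    index_label is an index label with its punctuation, such as "ti=".
    Raises ValueError if the term has fewer than three Thai code points.
    """
    thai_count = sum(1 for ch in term if 0x0E00 <= ord(ch) <= 0x0E7F)
    if thai_count < MIN_THAI_CHARS:
        raise ValueError("enter at least three Thai characters before truncating")
    return f"{index_label}{term}*"


# Three Thai code points (U+0E1B, U+0E23, U+0E30) followed by the asterisk.
print(truncate_term("ti=", "\u0e1b\u0e23\u0e30"))
```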

Notes on searching

  • If you use qualifiers to limit a search, type them using Latin script. 
  • Do not use derived searching. 
  • For Thai, the type of index you use determines how the system matches the search term:
    • Word indexes (for example, ti:[search string]) find the term as a word: a data string occurring anywhere in a field.
    • Phrase indexes (for example, ti=[search string]) find the term as a phrase: a complete data string that starts with the first character in a field or subfield and includes each character in the string in exact order.
  • Browsing scans an index for the closest match. For word indexes (for example, sca ti:[search string]), the match can occur anywhere in an indexed field. For phrase indexes (for example, sca ti=[search string]), the match must be at the beginning of an indexed field and can be followed by any data.
  • If you want to retrieve all Thai script records or see sample records, use the "character sets present" WorldCat search index (label vp:) with the assigned code tha.
    • To find all Thai script records, enter vp:tha as a command line search in the Search WorldCat window (Cataloging > Search > WorldCat).
       Note: If a search for all Thai script records retrieves too many WorldCat records (limit 1,500 records), limit the search and try again. For example, vp:tha/1991- qualifies the search by years of publication, and vp:tha and mt:bks limits it to records in the Books format.
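
The command line forms mentioned in these notes can be summarized with a small Python sketch. The function names are hypothetical and only assemble the strings shown above; they do not send anything to WorldCat.

```python
def word_search(label: str, term: str) -> str:
    """Word index search: label, colon, term (for example, ti:...)."""
    return f"{label}:{term}"


def phrase_search(label: str, term: str) -> str:
    """Phrase index search: label, equal sign, term (for example, ti=...)."""
    return f"{label}={term}"


def browse(search: str) -> str:
    """Browse (scan) an index: prefix the search with sca."""
    return f"sca {search}"


def thai_records_from_year(year: int) -> str:
    """All Thai script records qualified by publication year onward."""
    return f"vp:tha/{year}-"


def thai_records_by_material_type(code: str) -> str:
    """All Thai script records limited by an mt: code, as in the note above."""
    return f"vp:tha and mt:{code}"


print(browse(phrase_search("ti", "\u0e1b\u0e23\u0e30")))
print(thai_records_from_year(1991))
print(thai_records_by_material_type("bks"))
```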

See general procedures and search techniques for searching WorldCat.

Thai character indexing specifics

The following table shows Thai characters indexed more than one way or indexed as a space:

Character | Character name | How character is indexed in OCLC system
ล (U+0E25) | Lo Ling | Indexed, except: if it appears between two Paiyannoi characters (ฯ), the entire three-character string is indexed as a space.
ฯ (U+0E2F) | Paiyannoi | Indexed as a space.
ะ (U+0E30) | Sara A | Indexed, except: not indexed if preceded by the character Angkhankhu (๚).
฿ (U+0E3F) | Baht | Indexed as a space.
ๆ (U+0E46) | Maiyamok | Indexed as a space.
๎ (U+0E4E) | Yamakkan | Indexed as a space.
๏ (U+0E4F) | Fongman | Indexed as a space.
๚ (U+0E5A) | Angkhankhu | Indexed as a space.
๛ (U+0E5B) | Khomut | Indexed as a space.
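
To make the rules in the table concrete, the following Python sketch applies them to a string. It is an approximation for illustration, not the OCLC indexing code. The function name index_form is hypothetical, the characters are identified by their Unicode code points, and the order in which the rules are applied is an assumption.

```python
PAIYANNOI = "\u0e2f"
LO_LING = "\u0e25"
SARA_A = "\u0e30"
ANGKHANKHU = "\u0e5a"

# Paiyannoi, Baht, Maiyamok, Yamakkan, Fongman, Angkhankhu, Khomut
SPACE_CHARS = "\u0e2f\u0e3f\u0e46\u0e4e\u0e4f\u0e5a\u0e5b"
TO_SPACE = {ord(ch): " " for ch in SPACE_CHARS}


def index_form(text: str) -> str:
    """Approximate how the table's rules change a Thai string for indexing."""
    # Lo Ling between two Paiyannoi: the whole three-character string
    # becomes one space. Handle this before the single-character rules.
    text = text.replace(PAIYANNOI + LO_LING + PAIYANNOI, " ")
    # Sara A is not indexed when preceded by Angkhankhu, so drop it.
    text = text.replace(ANGKHANKHU + SARA_A, ANGKHANKHU)
    # Every remaining listed character is indexed as a space.
    return text.translate(TO_SPACE)
```

For example, index_form("\u0e01\u0e2f\u0e25\u0e2f\u0e02") returns the first and last characters separated by a single space, while a Sara A that does not follow Angkhankhu is left unchanged.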