Searching WorldCat indexes guidelines and requirements
Browsing
Browsing scans an index with the intent of finding a matched term or the closest matching term, rather than retrieving records. Selecting a term in a browse results list then retrieves the relevant record(s).
Each index description notes whether the index supports searching only or both searching and browsing.
Browse WorldCat using either:
- A word that appears anywhere in indexed fields and subfields.
- An exact phrase (complete subfield) or whole phrase (complete field), starting with the first word and including all words (but excluding initial articles in titles). The phrase you enter is matched character by character, from left to right, against the characters of the phrase in the index you specify.
The system returns a list of terms showing a match or the closest match, along with terms that precede and follow the matching term. When you open an entry on the list, you see the record or a list of records retrieved for that term.
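The left-to-right phrase matching described above can be sketched as a binary search over a sorted term list. This is only an illustration: the function name `browse`, the sample terms, and the window sizes are assumptions, not part of any OCLC interface.

```python
from bisect import bisect_left

def browse(index_terms, phrase, before=1, after=1):
    """Return the closest matching term plus its neighbours (hypothetical helper)."""
    terms = sorted(t.lower() for t in index_terms)
    # First term that is >= the phrase when compared character by character.
    pos = bisect_left(terms, phrase.lower())
    start = max(0, pos - before)
    end = min(len(terms), pos + after + 1)
    return terms[start:end]

# Sample browse index (made-up terms).
sample_terms = ["Hamlet", "Gone girl", "Good omens", "Gone with the wind"]
window = browse(sample_terms, "gone with")
```

Here `window` holds the closest match ("gone with the wind") together with the terms that precede and follow it.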
Capitalization
Index labels and search terms can be upper- or lowercase or a combination.
Default index
If you do not include an index label, the system uses the Keyword index (kw:) as the default.
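A minimal sketch of the default-index and capitalization rules, assuming a hypothetical `parse_search` helper and a small set of labels taken from this guide (`kw`, `ti`, `se`, `mt`). It is not how any OCLC interface parses queries internally.

```python
# Labels mentioned in this guide; the real list of index labels is longer.
KNOWN_LABELS = {"kw", "ti", "se", "mt"}

def parse_search(query):
    """Split a search into (index label, term), defaulting to the Keyword index."""
    text = query.strip()
    label, sep, term = text.partition(":")
    # Labels are case-insensitive, so compare in lowercase.
    if sep and label.lower() in KNOWN_LABELS:
        return label.lower(), term
    return "kw", text

unlabeled = parse_search("software")
labeled = parse_search("TI:Hamlet")
```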
Derived searches
Derived searching reduces the number of keystrokes you enter.
A derived search uses a specific number of initial characters from sequential words in a name or title.
- The "derived" segments of the words are separated by commas.
- A word is defined as it is for keyword searching: any characters between two blank spaces.
- The number and pattern of letters and commas tells the system which derived index to search.
- In a Connexion command line search or a FirstSearch expert search, using the derived search index label and punctuation is optional if it is the first or only element of the search. Always use index labels and punctuation when combining a derived search with a search in a different index.
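The way a derived search key is assembled can be sketched as follows. The helper name `derived_key` is hypothetical, the pattern `(3, 2, 2, 1)` is used purely as an illustration (this section does not list the patterns), and the handling of missing words is an assumption.

```python
def derived_key(words, pattern):
    """Take the first n characters of each sequential word and join with commas."""
    segments = [word[:n].lower() for word, n in zip(words, pattern)]
    # Assumption: when the name or title has fewer words than the pattern,
    # the remaining segments are left empty but the commas are kept.
    segments += [""] * (len(pattern) - len(segments))
    return ",".join(segments)

key = derived_key("Gone with the wind".split(), (3, 2, 2, 1))
```

The number and position of the commas in the result is what would tell the system which derived index to search.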
Initial articles (such as "a", "an", and "the")
Internet URLs and the 856 field
When searching for Internet URLs in relation to the 856 field, use mt:url to see if the 856 field exists in an Internet-only resource. It is also possible to search mt:web to retrieve any record with any kind of 856 field. Also, see Access Method for information about searching URLs and the 856 field.
Levels of searching
To give flexibility in search strategy and control over the results, OCLC provides various levels of searching, from simplified to complex.
Examples of searches in this guide are given in full search syntax (most complex format). From full syntax examples, you can extrapolate the parts of a search you would enter or select in boxes and lists to construct a basic or guided form of the search.
Non-Latin/non-Roman scripts
The Connexion client and FirstSearch interfaces support all UTF-8 Unicode characters for non-Latin script search terms. This includes the non-Latin MARC-8 scripts (Arabic, Chinese, Cyrillic, Greek, Hebrew, Japanese, and Korean) as well as all other Unicode character sets. For a complete list of supported scripts, see the Unicode character code charts.
Modern Standard Arabic
WorldCat Discovery allows you to search and sort using Modern Standard Arabic.
Diacritics
WorldCat Discovery returns the same results regardless of whether users enter search terms with or without diacritics such as hamza “ء” or madda “آ”. For example, we treat the following as equivalent:
- “ى” and “ي”
- “ه” and “ة”
- “آ”, “ا”, “إ”, and “أ”
Definite articles and prefixes
For title searches, results are the same with or without preceding definite articles or prefixes. For example:
- Definite articles are dropped: “ال”
- Prefixes are ignored: “ب”, “ك”, “ف”, “لل”
Kashida
WorldCat Discovery treats characters the same whether elongated or not. For example:
- With kashida: “مـديـنــــــــــــــــة”
- Without kashida: “مدينة”
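The character-level equivalences described for Modern Standard Arabic can be approximated with a simple folding table. This is a sketch under assumptions: the function name `fold_arabic` and the choice of which form to fold to are illustrative, and the sketch does not cover the title-specific handling of definite articles and prefixes.

```python
# Map each variant to one representative form; the direction is an assumption.
FOLD_TABLE = str.maketrans({
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0649": "\u064A",  # alef maksura -> yeh
    "\u0629": "\u0647",  # teh marbuta -> heh
    "\u0640": None,      # kashida (tatweel) is dropped
})

def fold_arabic(text):
    """Fold equivalent Arabic characters so variants compare equal."""
    return text.translate(FOLD_TABLE)

plain = "\u0645\u062F\u064A\u0646\u0629"                          # "madina" without kashida
elongated = "\u0645\u0640\u062F\u0640\u064A\u0640\u0646\u0640\u0629"  # same word with kashida
same = fold_arabic(plain) == fold_arabic(elongated)
```

Applying the same folding to both the indexed text and the search term is one way to get identical results for either spelling.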
Standard Thai
WorldCat Discovery allows you to search and sort using Standard Thai.
Normalization
Normalization is not applied to any Thai indexes. All Thai vowel symbols and diacritics are treated as significant and never ignored.
- Exception: characters with tone marks that exist in both composed and decomposed forms. Both forms of these characters are indexed.
For example, a character entered in composed form in a search term matches both the composed and decomposed forms in the index, and a character entered in decomposed form does the same.
Tokenization
Thai script is written without spaces between words. To recognize and index individual Thai words, we apply tokenization when building all Thai word indexes and when parsing word index queries. We do not apply tokenization to phrase indexes or phrase index queries.
Indexing individual Thai words enables word index searching, which retrieves records that contain the query terms anywhere in the appropriate indexed fields.
| | Thai | English translation |
|---|---|---|
| Query | ti:สนทนาภาษาจีน | Chinese conversation |
| Tokenized query | · สนทนา · ภาษา · จีน | · talk/converse · language · China |
| Matching record title | สนทนา 3 ภาษา ไทย-อังกฤษ-จีน โต้ตอบอย่างมั่นใจ พิชิตงานบริการในโรงแรม | Conversation in three languages: Thai-English-Chinese. Respond confidently and conquer service jobs in hotels. |
| Tokenized record title | สนทนา 3 ภาษา ไทย อังกฤษ จีน โต้ตอบ อย่าง มั่นใจ พิชิต งาน บริการ บริการ_ใน ใน ใน_โรงแรม โรงแรม | Talk/Converse 3 language Thai England China respond at/manner confident conquer work service service_in in in_hotel hotel |
Common words
Rather than treating the common Standard Thai words listed below as stop words, which would be ignored for indexing and matching, we combine them with adjacent words when we build word indexes and parse word index queries.
When a common Thai word is ignored (treated as a stop word), the adjacent word that remains can have a different meaning than it has in combination with the common word, and this can lead to the retrieval of irrelevant records. Where removing the common word would change the meaning of the adjacent word, combining the two disambiguates the meaning and improves search precision.
We apply the above processing of common words when building and searching the following indexes:
- se: Series
- ti: Title
- kw: Keyword
Common word treated as a stop word
- Query ti:มาตราการ (measures/procedure)
- Tokenized into มาตรา and การ
- The common word การ is removed as a stop word leaving only มาตรา
- มาตรา has a different meaning (section or clause of law) from มาตราการ (measures/procedure) and therefore retrieves records that are not relevant to the query ti:มาตราการ
Common word combined with an adjacent word
- Query ti:มาตราการ (measures/procedure)
- Tokenized into มาตรา_การ (because การ is defined as a common word)
- Records with title fields containing มาตราการ
- Titles are tokenized into มาตรา มาตรา_การ การ
- Only records containing มาตราการ are retrieved.
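The two cases above can be sketched in a few lines. The sketch assumes the text has already been split into words by some Thai tokenizer (not shown), uses a hypothetical `index_tokens` helper, and only combines a common word with neighbours that are not themselves common words, which is an assumption.

```python
# Two entries from the Thai common word list, enough for the examples here.
COMMON_WORDS = {"การ", "ใน"}

def index_tokens(tokens, common=COMMON_WORDS):
    """Emit each word plus combined forms around common words (illustrative)."""
    out = []
    for i, tok in enumerate(tokens):
        is_common = tok in common
        # Combine the common word with the word before it.
        if is_common and i > 0 and tokens[i - 1] not in common:
            out.append(tokens[i - 1] + "_" + tok)
        out.append(tok)
        # Combine the common word with the word after it.
        if is_common and i + 1 < len(tokens) and tokens[i + 1] not in common:
            out.append(tok + "_" + tokens[i + 1])
    return out

title_with_common = index_tokens(["มาตรา", "การ"])   # title containing มาตราการ
title_without = index_tokens(["มาตรา"])              # title containing only มาตรา
query_token = "มาตรา_การ"                            # tokenized query from the example
```

Because the query is the combined token, it matches the first title and not the second, which is the precision gain described above.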
Thai common word list:
กว่า กับ การ ก็ ขณะ ของ ความ คือ จะ จึง
ซึ่ง ด้วย ตั้งแต่ ต่างๆ ถึง ถ้า ทั้ง ทั้งนี้ ที่ นั้น
นี้ ว่า หรือ หาก อะไร อาจ อีก เช่น เนื่องจาก เป็นการ
เพื่อ เมื่อ เลย เอง แต่ และ แล้ว โดย ใน ไว้
Sort
We sort Standard Thai author, title, and call number fields using the default collation order of the Unicode collation algorithm that we apply for all scripts and languages.
Alphabetical sorting is available in WorldCat Discovery when using the following features:
Sort search results:
- Author (A-Z)
- Title (A-Z)
The Author search filter expanded to show more:
- The Author search filter initially displays authors sorted by matching record count, highest first. Selecting the Show More option to expand the filter sorts the authors alphabetically.
- If the expanded and alphabetically sorted view includes author names in multiple scripts, names in Latin script are presented first followed by those in other scripts.
Browse the Shelf from the item details page:
- Browse the Shelf uses sorting of call numbers. Call number sorting commonly differentiates items with the same call number using an alphabetical suffix. Thus, QV772 ร451 would sort before QV772 ล148ย because ร sorts before ล.
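The call number example can be reproduced with a rough stand-in for collation. Note the assumption: the sketch uses Python's plain code-point comparison, not the Unicode collation algorithm that the system applies, and it agrees with the example only because ร (U+0E23) precedes ล (U+0E25) in both orderings.

```python
# Two call numbers that share the class number and differ in the Thai suffix.
call_numbers = ["QV772 ล148ย", "QV772 ร451"]

# Code-point order as an approximation of alphabetical order for this pair.
shelf_order = sorted(call_numbers)
```

A production sort would use a real collation library so that all scripts and languages order correctly.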
Search for local holdings record (LHR) data
The following table outlines the data elements and fields available to search for LHR data in Connexion, FirstSearch, WorldShare, and WorldCat Discovery.
Spacing
In all searches, do not enter spaces between the index label and punctuation or between punctuation and the search term. Example: kw:software
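As a small illustration of the spacing rule, the following sketch removes stray spaces around the label punctuation before a search is submitted. The helper name `fix_spacing` is hypothetical, and only the colon form shown in this section is handled.

```python
import re

def fix_spacing(query):
    """Remove spaces between index label, colon, and search term."""
    # Matches a leading alphabetic label, optional spaces, a colon, optional spaces.
    return re.sub(r"^\s*([A-Za-z]+)\s*:\s*", r"\1:", query)

cleaned = fix_spacing("kw : software")
```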
Special characters in Latin script searches
The following table of punctuation, diacritics, and special characters describes how to treat each character when you construct WorldCat search or browse terms.
Stemming
Note:
- This feature is not enabled in Connexion, FirstSearch, or WorldShare Collection Manager query collections.
- This feature works with keyword searches only.
With stemming, each term in a query is treated as a logical OR of the various word forms of the term, so that all records containing the alternate word forms are included in the result set. For example, the query acceleration is treated as if it were acceleration OR accelerations OR accelerating.
Search in WorldCat Discovery | Search type | Stemming Applied | Equivalent to Searching |
---|---|---|---|
acceleration | keyword | Yes | acceleration OR accelerations OR accelerating |
kw:acceleration | keyword | Yes | acceleration OR accelerations OR accelerating |
"acceleration" | exact phrase | No | acceleration |
kw:"acceleration" | exact phrase | No | acceleration |
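The behavior in the table can be mimicked with a small expansion function. This is a sketch only: the `WORD_FORMS` table stands in for whatever stemmer the system uses, the function names are hypothetical, and parenthesizing multi-term queries is an assumption.

```python
# Hypothetical word-form table; a real system would derive forms with a stemmer.
WORD_FORMS = {"acceleration": ["acceleration", "accelerations", "accelerating"]}

def expand(query):
    """Return one group of alternative word forms per query term."""
    text = query.strip()
    if text.lower().startswith("kw:"):
        text = text[3:].strip()
    # Quoted text is an exact phrase, so no stemming is applied.
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return [[text[1:-1]]]
    return [WORD_FORMS.get(term.lower(), [term]) for term in text.split()]

def render(groups):
    """Format the groups as an OR expression."""
    if len(groups) == 1:
        return " OR ".join(groups[0])
    return " ".join("(" + " OR ".join(g) + ")" for g in groups)

stemmed = render(expand("kw:acceleration"))
exact = render(expand('"acceleration"'))
```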
Stop words
Stop words (also called Common word exclusions) are common words that the system ignores in some types of searches. You can omit them from search items. To use any of these words as search terms, enclose them in quotation marks.
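A minimal sketch of stop word handling, including the quotation mark rule. The stop word set below is a placeholder for illustration, not one of the actual OCLC lists, and `strip_stop_words` is a hypothetical helper.

```python
import re

# Placeholder list for illustration; the real lists are index-specific.
STOP_WORDS = {"a", "an", "the", "of"}

def strip_stop_words(query):
    """Drop stop words unless they are enclosed in quotation marks."""
    # A token is either a quoted string or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOP_WORDS]
    return " ".join(kept)

result = strip_stop_words('the history of "the" jazz')
```

The unquoted "the" and "of" are dropped, while the quoted "the" is kept as a search term.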
Stop words in WorldShare and WorldCat Discovery
The lists of stop words in WorldShare and WorldCat Discovery are specific to the following indexes:
Stop words in Connexion, FirstSearch, and WorldShare Collection Manager query collections
The list of stop words in Connexion, FirstSearch, and WorldShare Collection Manager query collections is specific to the following text-rich indexes: