OCLC Support

WorldCat Discovery release notes, Arabic language search and sort

 

Release Date: November 2023

Introduction

These release notes describe Arabic language search and sort support in WorldCat Discovery, completed in November 2023.

WorldCat Discovery now includes the following enhancements to Modern Standard Arabic searching and sorting, designed to meet the expectations of native speakers:

Search

  • We normalize terms so they are treated the same with and without diacritics, definite articles, prefixes and kashida.
  • We maintain lists of protected words (words exempt from normalization), stop words, and word stems (for automatic, default matching on variant forms of a word).

Sort

  • We remove leading articles and normalize hamza (especially on the letter alef “ا”) before applying Unicode collation.

These Arabic language searching and sorting improvements complement WorldCat Discovery’s Arabic user interface, introduced in May 2021.

Modern Standard Arabic search and sort features 

Search

Your library’s users searching for words or phrases in Modern Standard Arabic now get search results that meet the expectations of native Arabic speakers. These include:

Diacritics

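WorldCat Discovery returns the same results whether or not users enter search terms with diacritics such as hamza “ء”, madda “آ”, or short-vowel marks, and it also treats common letter variants as equivalent. For example, the following are treated as equivalent:

  • “ى” (alef maqsura) and “ي” (yaa)
  • “ه” (haa) and “ة” (taa marbuta)
  • “آ”, “ا”, “إ”, and “أ” (alef with madda, bare alef, alef with hamza below, and alef with hamza above)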
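A minimal Python sketch of this kind of equivalence, for illustration only: it strips short-vowel marks and folds each set of variant letters to a single canonical form. These notes do not say which form OCLC maps to, so the choices in `LETTER_MAP` and the helper name `normalize_letters` are assumptions.

```python
import re

# Hypothetical canonical forms: OCLC does not publish which member of each
# equivalence set it maps to, so this sketch simply picks one.
LETTER_MAP = str.maketrans({
    "آ": "ا",  # alef with madda above -> bare alef
    "أ": "ا",  # alef with hamza above -> bare alef
    "إ": "ا",  # alef with hamza below -> bare alef
    "ى": "ي",  # alef maqsura -> yaa
    "ة": "ه",  # taa marbuta -> haa
})

# Short-vowel marks, shadda, and sukun (U+064B..U+0652) plus superscript alef (U+0670).
HARAKAT = re.compile("[\u064B-\u0652\u0670]")

def normalize_letters(text: str) -> str:
    """Strip short-vowel diacritics, then fold common letter variants."""
    return HARAKAT.sub("", text).translate(LETTER_MAP)
```

For matching to work, the same function would have to be applied both to indexed record text and to incoming queries.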

Definite articles and prefixes

For title searches, results are the same with or without preceding definite articles or prefixes. For example:

  • Definite articles are dropped: “ال”
  • Prefixes are ignored: “ب”, “ك”, “ف”, and “لل”
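The sketch below shows one way to drop the article, or an article-plus-prefix combination, from a single word. The `PREFIXES` tuple is built only from the forms shown in these notes. As an assumption, a one-letter prefix such as “ب” is removed only when it is attached to “ال”, which avoids damaging words that simply begin with that letter. `strip_prefix` and `MIN_REMAINDER` are hypothetical names.

```python
# Prefix patterns drawn from the examples in these notes (not OCLC's rule set).
# Longer patterns come first so "وال" wins over "ال".
PREFIXES = ("وال", "فال", "بال", "كال", "لل", "ال")
MIN_REMAINDER = 2  # hypothetical guard: leave very short words alone

def strip_prefix(word: str) -> str:
    """Remove at most one leading article or article+prefix combination."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_REMAINDER:
            return word[len(prefix):]
    return word
```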

Kashida

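Kashida (also called tatweel) is a stretching character inserted between joined letters to elongate a word. WorldCat Discovery treats words the same whether or not they are elongated with kashida. For example:

  • With kashida: “مـديـنــــــــــــــــة”
  • Without kashida: “مدينة”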
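Kashida is the Unicode character U+0640 (ARABIC TATWEEL), so ignoring it can be as simple as deleting that character before matching. Here is a minimal sketch, with `remove_kashida` as a hypothetical helper name:

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL, the kashida elongation character

def remove_kashida(text: str) -> str:
    """Drop kashida so elongated and plain spellings compare equal."""
    return text.replace(TATWEEL, "")
```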

The following examples show groups of queries that return the same output:

Human (4 queries with the same output)

  • انسان: word without diacritics or leading article (human)
  • إنسان: word with diacritics and no leading article (human)
  • الانسان: word without diacritics and with leading article (the human)
  • الإنسان: word with diacritics and leading article (the human)

City, the same word written 3 ways (3 queries with the same output)

  • مدينة (city)
  • مَدِينَة: with short-vowel marks (city)
  • مدينه: with final “ه” instead of “ة” (city)

City, with and without kashida (2 queries with the same output)

  • مـديـنــــــــــــــــة: with kashida (city)
  • مدينة: without kashida (city)

Ms., with and without madda on the initial alef (2 queries with the same output)

  • آنسة: with madda (Ms.)
  • انسة: without madda (Ms.)

The city, with the leading article alone and with “و” (and) added (2 queries with the same output)

  • المدينة: word with leading article (the city)
  • والمدينة: word with leading article plus “و” (and the city)

City with 3 different prefixes (3 queries with the same output)

  • بالمدينة (in the city)
  • فالمدينة (then the city)
  • للمدينة (for the city)
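To check the example groups above end to end, the sketch below chains the normalization steps described in this section (kashida removal, diacritic stripping, article and prefix removal, then letter folding) into one hypothetical `match_key` function. The mappings, prefix list, and step order are assumptions drawn from these notes, not OCLC's implementation. Protected words are not handled here.

```python
import re

TATWEEL = "\u0640"                             # kashida
HARAKAT = re.compile("[\u064B-\u0652\u0670]")  # short vowels, shadda, sukun, superscript alef
LETTER_MAP = str.maketrans({"آ": "ا", "أ": "ا", "إ": "ا", "ى": "ي", "ة": "ه"})
PREFIXES = ("وال", "فال", "بال", "كال", "لل", "ال")  # longest first

def match_key(word: str) -> str:
    """Reduce a single word to a hypothetical matching key."""
    word = word.replace(TATWEEL, "")
    word = HARAKAT.sub("", word)
    for prefix in PREFIXES:
        # Strip at most one prefix, and only if at least two letters remain.
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    return word.translate(LETTER_MAP)

# Each inner list is one group of equivalent queries from the examples.
EXAMPLE_GROUPS = [
    ["انسان", "إنسان", "الانسان", "الإنسان"],
    ["مدينة", "مَدِينَة", "مدينه"],
    ["مـديـنــــــــــــــــة", "مدينة"],
    ["آنسة", "انسة"],
    ["المدينة", "والمدينة"],
    ["بالمدينة", "فالمدينة", "للمدينة"],
]

for group in EXAMPLE_GROUPS:
    keys = {match_key(word) for word in group}
    print(len(keys) == 1, keys)  # one distinct key means the group matches
```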

Protected words

OCLC maintains a list of Modern Standard Arabic words that are protected from the above normalization processes and are preserved unchanged. Normalization, such as removing definite articles or prefixes, would either change the meaning of these words or render them meaningless. 

Example:

  • المانيا (Germany): without the leading article, the word مانيا has no meaning.
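A protected-word check can sit in front of article removal, as in this minimal sketch. `PROTECTED_WORDS` holds only the example above. OCLC's real list is not given in these notes, and `strip_article` is a hypothetical name.

```python
ARTICLE = "ال"
# Illustrative entry only; OCLC maintains the actual protected-word list.
PROTECTED_WORDS = frozenset({"المانيا"})

def strip_article(word: str) -> str:
    """Drop a leading article unless the word is protected."""
    if word in PROTECTED_WORDS:
        return word  # normalizing would change or destroy the meaning
    if word.startswith(ARTICLE) and len(word) > len(ARTICLE) + 1:
        return word[len(ARTICLE):]
    return word
```

A fuller version would probably apply diacritic and letter folding before the lookup, so that variant spellings of a protected word are also caught.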

Stop words

OCLC maintains a list of Modern Standard Arabic stop words that WorldCat Discovery ignores for search matching because they occur so commonly across records that they do not help users select or distinguish between records.
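As an illustration, stop-word removal can be a set lookup applied to each query term before matching. The words in `STOP_WORDS` are common Arabic function words chosen for this sketch, not OCLC's list. `content_terms` is a hypothetical helper.

```python
# Illustrative sample only: في (in), من (from), على (on), إلى (to), عن (about).
STOP_WORDS = frozenset({"في", "من", "على", "إلى", "عن"})

def content_terms(query: str) -> list[str]:
    """Split a query on whitespace and drop stop words before matching."""
    return [term for term in query.split() if term not in STOP_WORDS]
```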

Stemming

OCLC maintains a list of Modern Standard Arabic word stems that assist with automatic, default matching on variant forms of a word. 
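One simple way to use a stem list is a dictionary from surface forms to a shared stem, so that variant forms compare equal. The entries below (مقال, مقالة, and مقالات, all forms of “article”) are illustrative assumptions. These notes do not describe OCLC's list or how it is applied, and `STEMS`, `stem`, and `terms_match` are hypothetical names.

```python
# Hypothetical stem table mapping variant surface forms to a shared stem.
STEMS = {
    "مقال": "مقال",    # article
    "مقالة": "مقال",   # article (feminine form)
    "مقالات": "مقال",  # articles
}

def stem(term: str) -> str:
    """Look up a term's stem, falling back to the term itself."""
    return STEMS.get(term, term)

def terms_match(a: str, b: str) -> bool:
    """Treat two terms as matching when they share a stem."""
    return stem(a) == stem(b)
```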

Sort

Alphabetic sorting of Modern Standard Arabic search results now meets native Arabic speakers’ expectations.

Alphabetic sorting is used on WorldCat Discovery search results for:

Sort

  • Author (A-Z)
  • Title (A-Z)

Facet

  • Author/Creator

Alphanumeric sorting of call numbers is used on the item details page for:

  • Browse the Shelf

Before sorting Modern Standard Arabic, we remove leading articles and normalize hamza, especially on the letter alef “ا”. We then sort Modern Standard Arabic author, title, and call number fields using the default collation order of the Unicode Collation Algorithm, the same order we apply to all scripts and languages.
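This is a minimal sketch of the pre-sort normalization. It assumes that the leading article is a prefix of the whole title string and that normalizing hamza means folding آ, أ, and إ to bare alef. Plain code-point order stands in for the Unicode Collation Algorithm because Python's standard library has no UCA implementation. A production system would use an ICU binding such as PyICU. `arabic_sort_key` is a hypothetical name, and the titles are shortened from the Title (A-Z) sorting example.

```python
ARTICLE = "ال"
# Alef with madda, hamza above, or hamza below sorts as bare alef in this sketch.
HAMZA_ALEF = str.maketrans({"آ": "ا", "أ": "ا", "إ": "ا"})

def arabic_sort_key(title: str) -> str:
    """Drop a leading article, then fold hamza forms of alef."""
    title = title.strip()
    if title.startswith(ARTICLE):
        title = title[len(ARTICLE):]
    return title.translate(HAMZA_ALEF)

# Titles shortened from the Title (A-Z) example, in the order shown there.
EXPECTED_ORDER = [
    "الاجتماعي وعالمه الممزق",
    "أدب الحياة",
    "الأدبيات العصرية",
    "استنطاق النص",
    "الإسلام والغرب",
    "التنبيهات والحقيقة",
    "حاء",
    "حماة الوطن",
    "الخواص الأسلوبية",
    "عرض كتاب",
    "العلاقات الدولية",
    "فن النثر الحديث",
]

# Sort a reversed copy; code-point order stands in for UCA collation here.
for title in sorted(reversed(EXPECTED_ORDER), key=arabic_sort_key):
    print(title)
```

With these titles, the normalized keys reproduce the published order even under plain code-point comparison. Full UCA weighting can differ from code-point order for other inputs, such as punctuation or mixed scripts.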

Title (A-Z) sorting example

Searching for the term المقالات (the articles) in the title (ti:) index and sorting by Title (A-Z) returns the following records (record number and title). The records are sorted after dropping the leading article and the hamza.

  1. الاجتماعي وعالمه الممزق: مقالات في فلسفة اجتماعية
  2. الاجتماعي وعالمه الممزق: مقالات في فلسفة اجتماعية .
  3. أدب الحياة
  4. الأدبيات العصرية في سبيل التاج: مقالات في الأدب والثقافة والحياة
  5. استنطاق النص: مقالات في السرد العربي
  6. الإسلام والغرب: مقالات ودراسات مختارة
  7. التنبيهات والحقيقة مقالات إضافية حول الفلسفة والديموقراطيّة
  8. التنبيهات والحقيقة مقالات إضافية حول الفلسفة والديموقراطيّة
  9. التنبيهات والحقيقة مقالات إضافية حول الفلسفة والديموقراطيّة
  10. حاء: مقالات
  11. حماة الوطن: مقالات مختارة 2002
  12. الخواص الأسلوبية في مقالات أحمد بهاء الدين
  13. عرض كتاب: التنبيهات والحقيقة؛ مقالات إضافية حول الفلسفة و الديمقراطية
  14. العلاقات الدولية وجائحة كورونا: قصة قصيرة وأربع مقالات
  15. فن النثر الحديث: تحليل مقالات وقصص قصيرة

Important links

Product website

More product information can be found on the WorldCat Discovery product website.

Support websites

Support information for this product and related products can be found at:

 
If you have additional questions, please contact OCLC Customer Service by calling 1-800-848-5800 or 1-614-793-8682, Monday through Friday, 7 a.m. to 9 p.m. ET, or by emailing support@oclc.org. For support enquiries in the UK and Ireland, please contact the Support Desk by calling +44-(0)114-281 60 42 or emailing support-uk@oclc.org. Support is available between 09:00 and 17:30 (UK time).

Include Request ID with problem reports

When reporting an issue with WorldCat Discovery, please include the Request ID, which appears at the bottom of the screen on which the issue occurred. Including it allows us to trace exactly what happened with the request we are troubleshooting.
