OCLC Support

An ingested PDF does not have transcript metadata even though the collection has a Full Text Search metadata field.

Applies to
  • CONTENTdm
  • Project Client
Answer

This happens because the PDF being ingested contains no embedded text. PDF files created by scanning physical documents often behave this way: scanning produces only an image of each page and does not automatically run Optical Character Recognition (OCR) on the scanned images.
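One rough way to check whether a given PDF is affected is to look for font references in the file, since text in a PDF has to be drawn with a font and an image-only scan normally has none. The sketch below is a heuristic written for illustration, not part of CONTENTdm or the Project Client. The helper name `pdf_mentions_fonts` is hypothetical, and the check assumes the PDF is unencrypted and uses only Flate compression for its object streams.

```python
import re
import zlib

# Matches the raw bytes between "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def pdf_mentions_fonts(data: bytes) -> bool:
    """Heuristic: return True if the PDF bytes appear to reference a font.

    A False result suggests an image-only PDF with no embedded text.
    """
    # Uncompressed dictionaries can be searched directly.
    if b"/Font" in data:
        return True
    # Newer PDFs may keep dictionaries inside Flate-compressed streams,
    # so try to inflate each stream and search the result as well.
    for match in STREAM_RE.finditer(data):
        try:
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example, JPEG image bytes)
        if b"/Font" in inflated:
            return True
    return False
```

To try it on a file, read the bytes first, for example `pdf_mentions_fonts(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder file name. A result of `False` suggests the file needs OCR before ingest; a result of `True` only shows that fonts are present, so opening the PDF and trying to select text remains the more reliable check.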

To embed text in the PDF, process it with OCR before ingesting the file into the Project Client. Alternatively, export the PDF as a series of image files and ingest those images as a compound object, so that they are processed with the Project Client's integrated OCR functionality.

For more on adding PDF files in CONTENTdm, see Work with PDF files.
