Can I use MarcEdit to dedupe records in a delivered file?
Symptom
- Occasionally, WorldCat Cataloging Partners (WCP) vendors send an invoice more than once, which results in two or more copies of the same records within a file delivered for download. OCLC’s system does not dedupe the files received from vendors; it is working as designed, and the duplicates originate with the provider. The user will need to dedupe the files before loading them into their ILS.
Applies to
- WorldCat Collection Manager
Resolution
Many OCLC member libraries use MarcEdit*, a free utility that can be downloaded and used for deduping and many other purposes. OCLC staff do not support MarcEdit, but the steps below may help a user through this process.
Deduping MARC files
Windows machines:
MarcEdit Record DeDuplication with a MARC File for Windows
- Open MarcEdit.
- Open the Tools drop-down menu in the toolbar at the top of the window.
- Select MARC Processing Tools.
- Select Find Duplicate Records.
- Specify the file you need to deduplicate.
- Specify the path to the file where you want to save the duplicates.
- Click on the Save button.
- Click on the Next button.
- Select the OCLC Number for the Control Field.
- Leave Dedup Keeping set to the default of "First Record".
- Click on the Process button.
When processing finishes, the save file contains the deduplicated records.
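For libraries that prefer a script, the same idea (match on the control number, keep the first record) can be sketched in Python. This is an illustrative sketch, not MarcEdit's own implementation. It assumes the delivered file is binary MARC (ISO 2709) and that the OCLC number is in field 001. The function names and file paths are hypothetical.

```python
from pathlib import Path

RECORD_TERMINATOR = b"\x1d"  # ends every MARC record
FIELD_TERMINATOR = b"\x1e"   # ends the directory and every field


def split_records(data: bytes) -> list[bytes]:
    """Split raw MARC data into individual records, terminator included."""
    return [
        chunk + RECORD_TERMINATOR
        for chunk in data.split(RECORD_TERMINATOR)
        if chunk.strip()
    ]


def control_number(record: bytes) -> str:
    """Return the 001 value of one record, or '' if it has no 001 field."""
    base = int(record[12:17])        # leader 12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries after the 24-byte leader
    for i in range(0, len(directory) - 11, 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[0:3], int(entry[3:7]), int(entry[7:12])
        if tag == b"001":
            field = record[base + start:base + start + length]
            text = field.rstrip(FIELD_TERMINATOR).decode("utf-8", errors="replace")
            return text.strip()
    return ""


def dedupe(data: bytes) -> tuple[bytes, bytes]:
    """Keep the first record for each 001 value; collect later copies separately."""
    seen: set[str] = set()
    kept: list[bytes] = []
    dupes: list[bytes] = []
    for record in split_records(data):
        key = control_number(record)
        if key and key in seen:
            dupes.append(record)
        else:
            seen.add(key)
            kept.append(record)  # records without an 001 are always kept
    return b"".join(kept), b"".join(dupes)


def dedupe_file(source: str, kept_path: str, dupes_path: str) -> None:
    """Read one MARC file and write the kept and duplicate records to two files."""
    kept, dupes = dedupe(Path(source).read_bytes())
    Path(kept_path).write_bytes(kept)
    Path(dupes_path).write_bytes(dupes)
```

A call such as `dedupe_file("delivered.mrc", "deduped.mrc", "duplicates.mrc")` (hypothetical file names) would write the unique records and the removed copies to separate files, so both can be reviewed before loading into the ILS.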
Macs:
This workflow will help libraries dedupe their WorldCat Cataloging Partners record files on a Mac.
- Use MarcEdit to export OCLC numbers (MARC field 001) into a tab-delimited text file that can be opened in Excel. You can optionally include title field (245 $a) so the title is in the list.
- Use MARCsplit to separate the records into individual files.
- Have the records split into their own folder on the local computer.
- Set “Records per file” to 1.
- Leave the “Number of files” box unchecked.
- Review the list of OCNs; duplicates should be grouped together. If they are not grouped, use conditional formatting in Excel to highlight duplicate values.
- Move one of the copies of each record into a new folder
- Use MARCjoin to merge the single copies into one file for easy loading, or load the records one at a time.
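As an alternative to reviewing the list in Excel, the duplicate OCNs in the exported text file can be found with a short Python script. This is a minimal sketch: it assumes the export is tab-delimited, has no header row, and carries the 001 value in the first column (with the optional 245 $a title in the second). The function name is hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_ocns(tsv_text: str) -> dict[str, int]:
    """Map each OCN that appears more than once to the number of times it appears."""
    # QUOTE_NONE keeps quotation marks inside titles from confusing the parser.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter(
        row[0].strip()
        for row in reader
        if row and row[0].strip()  # skip blank lines and empty first columns
    )
    return {ocn: n for ocn, n in counts.items() if n > 1}
```

Passing the contents of the export to `duplicate_ocns` returns only the OCNs that need attention, which can then guide which split files to move into the new folder.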
Download MarcEdit
*MarcEdit is not owned by OCLC; it is a free utility used by many libraries.