Digital Preservation Processing

Use this workflow for item-level Community Archives digital preservation processing on Digital Preservation Lab Workstations. It covers local processing, checksums, virus scanning, JP2 conversion, verification of existing VText and ArchivesSpace records, BagIt packaging, ZIP verification, and final transfer to DPLAB storage.

All scripted workflow functions are available in FreeCommander under Favorite Tools. The Favorite Tools button uses the star icon.

Warning

Work from the local processing copy. Do not delete local processing files until the completed ZIP has been copied to dplab_storage/_completed_bags and the copy verification reports PASS.

Workflow Overview

Step | Section | Purpose
Setup | Workflow Location | Confirm the local processing path before starting.
1 | Copy Files to Local Processing | Copy the source item folder from dplab_storage to C:\DPLAB\processing.
2 | Verify the Local Copy | Run copy verification and continue only if it reports PASS.
3 | Run the Virus Scan | Scan the local item folder and save clam.log.
4 | Create the SIP Manifest | Document the copied source-state files before conversion or reorganization.
5 | Create Original File Technical Metadata | Capture original file EXIF metadata before moving or converting files.
6 | Verify Item Contents | Confirm required files, filenames, cover page, PDF metadata, and scan quality.
7 | Convert TIFF and PNG Page Scans to JP2 | Convert TIFF or PNG scans to JP2 preservation files when required.
8 | Create the Item Finding Aid | Create and fill in finding-aid.md for the item.
9 | Verify VText and ArchivesSpace Records | Confirm the VText item, ArchivesSpace record, UUID, and UUID folder already exist before preservation packaging.
10 | Download VText Metadata XML | Download the VText Dublin Core XML for the item metadata folder.
11 | Build the Bag Directory Structure | Create the UUID-level objects and metadata directories.
12 | Move Files to the Correct Directories | Place preservation, access, and metadata files in their final pre-bag locations.
13 | Create the Final Manifest | Generate manifest.csv after files are in their final directories.
14 | Generate the UUID MD5 File | Create the UUID-level MD5 checksum file before BagIt packaging.
15 | Create the BagIt Bag | Create and validate the BagIt package.
16 | Zip the Completed Bag | Zip the validated _bag folder.
17 | Update ArchivesSpace Extent with Zipped Bag Details | Add the completed ZIP details to the ArchivesSpace extent record.
18 | Verify the ZIP Opens Correctly | Confirm the ZIP contains the expected top-level bag structure.
19 | Copy the Completed ZIP to DPLAB Storage | Copy and verify the completed ZIP in dplab_storage/_completed_bags.
20 | Mark the Source Copy Complete | Rename the source folder with _complete after successful ZIP verification.
21 | Delete Local Processing Files | Delete local processing files only after the completed ZIP copy is verified.
Check | Completion Checklist | Confirm the final bag includes all required files and documentation.
Output | Expected Output | Review the expected final ZIP and bag structure.

Workflow Location

Copy the source item folder from dplab_storage to:

C:\DPLAB\processing

Example local item folder:

C:\DPLAB\processing\ca009-007-001_quitman-high-reunions_1996

1. Copy Files to Local Processing

Copy the item folder from dplab_storage to C:\DPLAB\processing.

Work only from the local processing copy. The source copy in dplab_storage remains the fallback if processing needs to be restarted.

2. Verify the Local Copy

In FreeCommander, select the copied local item folder.

Run:

Favorite Tools > 00 Verify Copy

This produces:

copy-verification.log

Continue only if the script reports PASS. This step verifies the source files copied fully and without errors.

3. Run the Virus Scan

Select the local item folder.

Run:

Favorite Tools > 01 Virus Scan

This scans the folder and produces:

clam.log

Continue only if the scan reports no infected files.

4. Create the SIP Manifest

Select the local item folder.

Run:

Favorite Tools > 02 SIP Manifest

This produces:

sip-manifest.csv

The SIP manifest documents the copied source-state files before conversion, final organization, or bag creation.

5. Create Original File Technical Metadata

Select the local item folder.

Run:

Favorite Tools > 02b Capture Original Exif Metadata

This produces:

exiftool-original-metadata.csv

Create this file before converting scans or moving files. It records technical metadata from the original scan files or original source image files while they are still in their source-state folder structure.

Keep exiftool-original-metadata.csv with the item metadata.

6. Verify Item Contents

Before processing, confirm that the expected files are present.

The local item folder should usually include:

  • a folder of page scans
  • one reduced-size access PDF
  • one master PDF/A
  • ocr.txt (rename the OCR text file if necessary)
  • metadata or supporting files, if present

For in-house Community Archives digitization, each item needs preservation and access copies in the required formats:

  • JP2 preservation master files for TIFF or PNG page scans
  • original JPG or JPEG page scans retained as-is when those are the source scan files
  • one reduced-size access PDF
  • one master PDF/A

TIFF and PNG source scan files may be deleted after the JP2 files have been created and checked, because the source copy remains in dplab_storage. Do not convert original JPG or JPEG scans to JP2; retain the JPG or JPEG files as the page scan preservation files for that item. If there are more than two PDFs, keep only the required two PDF files unless staff identify a project-specific reason to retain another copy.

Also verify:

  • filenames use the correct item ID
  • page and order numbering are accurate
  • page scans are cropped appropriately
  • the access PDF cover page is attached and accurate
  • ocr.txt is present
  • both required PDF files are present
  • both PDF files have embedded metadata
  • the master PDF/A opens successfully in Adobe Acrobat
  • Adobe Acrobat confirms the master PDF/A is a PDF/A file
  • the master PDF/A has embedded metadata
  • no pages are missing
  • no unrelated files are present

If any expected VText, ArchivesSpace, cover page, or PDF metadata work is missing, return to the earlier workflow guide and complete that work before continuing.

Check the Master PDF in Acrobat

Open the master PDF in Adobe Acrobat before continuing.

Confirm:

  • the master PDF opens successfully
  • Acrobat shows the PDF/A compliance banner
  • the file has embedded metadata in File > Properties

Figure: The Acrobat PDF/A banner confirms that the master PDF claims compliance with the PDF/A standard.

Check Page Scan Cropping

Review page scans before converting TIFF or PNG files to JP2. Look for wasted white space around the item content, especially small newspaper clippings, loose notes, or other small items surrounded by a large blank scan area.

If a TIFF or PNG scan has excessive blank space, crop it in Photoshop before JP2 conversion. Keep the full item visible and leave about 1/4 inch of margin around the content so it is clear the whole item is present and no edges have been cropped out.

Crop only when there is a lot of blank space. Cropping is time-consuming, so do not spend time trimming scans that already have a reasonable margin. The goal is to remove obvious waste that would make preservation bags unnecessarily large over time.

Verify Filename Patterns

See the FreeCommander rename instructions if you need help applying filename patterns or counters.

PDF filenames should begin with the item ID, followed by a short item title, followed by the date when known. File and folder names should be ALL LOWERCASE and contain no spaces or special characters. See File/Folder Naming Guidelines.

The item ID string identifies the collection, series, subseries when applicable, and item number. The prefix identifies the collection area: ca means Community Archives, ms means Manuscripts for non-VSU records, and ua means University Archives for Valdosta State records. The first prefix-number pair, such as ca-013, is the collection number. The next three-digit string is the series number. The next three-digit string is the subseries number when applicable. The final number is the item number.

Example item ID:

ca-013-001-002-001

Use all lowercase. Use only letters, numbers, dashes, underscores, and the period before the file extension. Use dashes inside a string of words, such as the item title, and use underscores to separate information types.

Use this pattern:

item-id_short-title_date.pdf

The master PDF/A uses the base filename. The reduced-size access PDF usually uses the same filename with the letter a added after the date.

Examples:

ca-013-001-002-001_hahira-gold-leaf_1973-01-04.pdf
ca-013-001-002-001_hahira-gold-leaf_1973-01-04a.pdf

If there is no date, use _nd for the master PDF/A and _nd_a for the access PDF.

Examples:

ca-013-001-002-001_hahira-gold-leaf_nd.pdf
ca-013-001-002-001_hahira-gold-leaf_nd_a.pdf

For dates:

  • use ISO format YYYY-MM-DD when the full date is known
  • use YYYY-MM when only the month and year are known
  • use YYYY when only the year is known
  • use cYYYY for circa dates, such as c1976
  • use YYYY-YYYY for date spans
  • use cYYYY-YYYY for circa date spans
  • use _nd when there is no date at all
  • for no-date access PDFs, use _nd_a.pdf

Truncate long titles enough to keep the filename readable while preserving the main identifying words. Separate title words with dashes.
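
The PDF naming rules above can be checked mechanically. The following is a minimal sketch, not part of the scripted Favorite Tools: the function name classify_pdf_name and the regular expression are assumptions written for illustration, based only on the patterns and examples in this section. It checks the shape of a date but does not validate that the month or day is a real calendar value.

```python
import re

# Hypothetical pattern assembled from the naming rules in this section.
PDF_NAME = re.compile(
    r"""
    (?:ca|ms|ua)-?\d{3}(?:-\d{3}){1,3}      # item ID, e.g. ca-013-001-002-001 or ca009-007-001
    _[a-z0-9]+(?:-[a-z0-9]+)*               # short title, words separated by dashes
    _(?:
        c?\d{4}(?:-\d{4}|-\d{2}(?:-\d{2})?)?   # YYYY, YYYY-MM, YYYY-MM-DD, YYYY-YYYY, optional c
        (?P<access>a)?                          # trailing a marks the access PDF
      |
        nd(?P<nd_access>_a)?                    # no date: _nd or _nd_a
    )
    \.pdf
    """,
    re.VERBOSE,
)


def classify_pdf_name(filename: str):
    """Return 'master', 'access', or None when the name does not fit the pattern."""
    match = PDF_NAME.fullmatch(filename)
    if match is None:
        return None
    if match.group("access") or match.group("nd_access"):
        return "access"
    return "master"
```

For example, classify_pdf_name("ca-013-001-002-001_hahira-gold-leaf_nd_a.pdf") returns "access", while a name containing uppercase letters or spaces returns None.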

Example bag filename:

ca009-007-001_quitman-high-reunion_1996-06-08_bag.zip

Page scan filenames should use the item ID plus a page number.

Use this pattern:

item-id_p000.ext

Use .jp2 for converted TIFF or PNG page scans. Keep the original .jpg or .jpeg extension for JPG or JPEG page scans that are not converted.

Example page scan filenames:

ca009-007-001_p001.jp2
ca009-007-001_p001.jpg
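
As an illustration of the page scan pattern, the sketch below builds item-id_p000.ext names from a list of existing filenames. The helper names page_scan_name and plan_page_renames are hypothetical, and the sketch assumes that sorting the current filenames alphabetically gives the correct page order, which should be confirmed by eye before renaming anything. It only plans the renames and does not touch any files.

```python
from pathlib import PurePath


def page_scan_name(item_id: str, page_number: int, extension: str) -> str:
    """Build a page scan filename with a three-digit page counter."""
    ext = extension.lower().lstrip(".")
    return f"{item_id}_p{page_number:03d}.{ext}"


def plan_page_renames(item_id: str, filenames: list[str]) -> list[tuple[str, str]]:
    """Pair each existing filename with its new name, numbering from 1 in sorted order."""
    plan = []
    for number, old_name in enumerate(sorted(filenames), start=1):
        new_name = page_scan_name(item_id, number, PurePath(old_name).suffix)
        plan.append((old_name, new_name))
    return plan
```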

7. Convert TIFF and PNG Page Scans to JP2

Convert page scans to JP2 only when the original scan files are TIFF or PNG.

Do not convert original JPG or JPEG scans to JP2. Leave JPG and JPEG page scans as-is and keep them as the page scan preservation files.

For TIFF and PNG source scans, JP2 files are the preservation master files for this workflow.

In FreeCommander:

  1. Select the TIFF or PNG page scan image files.
  2. Go to Tools > Image Converter.
  3. Set output format to JP2 - JPEG 2000 Format.
  4. Use the Uncompressed setting.
  5. Make sure these boxes are checked:
    • Keep original date/time attributes
    • Preserve Metadata
    • Preserve color profile
  6. Make sure Delete original is not checked.
  7. Run the conversion.
  8. Confirm the JP2 files were created successfully.
  9. Open 2-3 random JP2 files and verify they open correctly.

Do not delete the original TIFF or PNG working files during conversion. They may be deleted only after the full conversion process is complete, the JP2 files have been created, and the random JP2 file check confirms they open correctly. Retain the JP2 files as the preservation master image files.

Save the XnConvert Log

After conversion, if JP2 files were created:

  1. Right-click in the XnConvert log or output window.
  2. Choose Save log as...
  3. Save the log to the item directory as:
migration.log

8. Create the Item Finding Aid

Select the local item folder.

Run:

Favorite Tools > 03a Create Item Finding Aid

This produces:

finding-aid.md

Begin filling out the finding aid and continue updating it as the rest of the workflow is completed.

Fill in the following fields:

  • Title (at the top and under Description)
  • Identifiers
    • UUID
    • Collection ID: the CA number
    • Acquisition ID: type "N/A" for not applicable
    • Item ID: use the full item ID
    • Bag Filename: the final zipped folder filename. This should autopopulate on creation; verify that it is correct.
  • Online Access
    • VText URL: add the VText item Handle URL
    • ArchivesSpace URL: add the URL of the Archival Object item page
    • ArchivesSpace Digital Object: add the URL of the Digital Object page
    • Access PDF Download Link: copy the PDF download URL from VText and paste it here
  • Description
    • Title
    • Creator
    • Date
    • Scope and contents
  • Extent
    • Total Files: count the page scans plus the two PDF files
    • Total Size: total file size of the two PDF files and the scans folder. See Record File Sizes.
    • Formats: list the formats. The default is PDF, PDF/A, JP2.
    • Extent Statement: 1 electronic record (PDF), and X page scans, followed by the total file size. E.g. 1 electronic record (PDF), and 8 page scans (JP2). 1.30 GB (1,403,526,758 bytes).
  • Subjects: list the subject headings as a bullet list
  • Rights and Access: rights statement in the form Creator/Donor, © YEAR of item. E.g. Hahira Historical Society, © 1972
  • Citation (APA Bibliography): add the citation
  • Processing notes: fill in the processing notes

Example APA-style bibliography citation:

Hahira Historical Society. (1973, January 4). The Hahira Gold Leaf (No. 51). Hahira Gold Leaf (newspapers) series, Hahira Historical Society Collection (CA-013). Valdosta State University Archives and Special Collections. https://hdl.handle.net/10428/7661

Record File Sizes

Record file sizes in the finding aid for the scan folder, access PDF, and master PDF/A.

In Windows Explorer or FreeCommander:

  1. Select the scan folder, access PDF, and master PDF/A.
  2. Right-click the selection and choose Properties.
  3. Copy and paste the displayed size information into finding-aid.md.

Use the full Windows file size statement when available.

Example:

17.1 MB (17,994,854 bytes)
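
If the Properties dialog is not convenient, the same kind of statement can be approximated in a script. This is a rough sketch under stated assumptions: the function names total_size and windows_style_size are hypothetical, and the formatting rule (three significant digits, truncated rather than rounded, with 1 KB = 1,024 bytes) is an approximation of what Windows displays, chosen because it reproduces the examples in this guide. It may differ from Windows in edge cases.

```python
from pathlib import Path

UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def total_size(paths) -> int:
    """Add up the sizes of the given files and of every file inside the given folders."""
    total = 0
    for path in map(Path, paths):
        if path.is_dir():
            total += sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
        else:
            total += path.stat().st_size
    return total


def windows_style_size(size_bytes: int) -> str:
    """Format a byte count like '17.1 MB (17,994,854 bytes)' using integer arithmetic."""
    if size_bytes < 1024:
        return f"{size_bytes} bytes"
    unit = 0
    while unit < len(UNITS) - 1 and size_bytes >= 1024 ** (unit + 1):
        unit += 1
    divisor = 1024 ** unit
    whole = size_bytes // divisor
    decimals = 2 if whole < 10 else 1 if whole < 100 else 0
    scaled = size_bytes * 10 ** decimals // divisor  # truncates instead of rounding
    if decimals:
        factor = 10 ** decimals
        number = f"{scaled // factor}.{scaled % factor:0{decimals}d}"
    else:
        number = str(scaled)
    return f"{number} {UNITS[unit]} ({size_bytes:,} bytes)"
```

Calling windows_style_size(total_size([scan_folder, access_pdf, master_pdf])) gives a single combined statement for the three selected entries, where the three arguments are placeholder paths.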

Finding Aid Markdown Notes

The finding aid is a Markdown file. See the Markdown Guide basic syntax tutorial for a quick reference.

For this workflow:

  • wrap filenames and paths in backticks, such as `finding-aid.md` or `UUID/metadata/manifest.csv`
  • if you add a new subsection, use a third-level heading with three number signs, such as ### Processing Notes
  • keep notes concise and readable for staff who may need to review the file later

9. Verify VText and ArchivesSpace Records

Digital preservation processing assumes the earlier Community Archives workflows are already complete. Do not create the VText item, upload the access PDF, attach the VText cover page, embed PDF metadata, or create the ArchivesSpace record in this workflow.

Before packaging, verify:

  • the access PDF has the VText cover page attached
  • the access PDF has embedded PDF metadata
  • the access PDF has been uploaded to VText and set as the primary bitstream
  • ocr.txt has been uploaded to the VText metadata bundle
  • the VText Handle URL is recorded in finding-aid.md
  • the ArchivesSpace item record and digital object already exist
  • the ArchivesSpace URL is recorded in finding-aid.md
  • the item UUID has been generated and recorded on the VText cover page
  • the item folder contains a folder named with the UUID

If any of these are missing, return to Add to VText, especially Generate UUID and Create UUID Folder, or Add to ArchivesSpace and complete those workflows before continuing.

10. Download VText Metadata XML

Select the item folder, UUID folder, or metadata folder.

Run:

Favorite Tools > 04 Download Vtext Metadata

Paste either the VText item URL or the Handle URL.

Example:

https://hdl.handle.net/10428/7696

The script downloads Dublin Core XML and saves it using the handle number.

Example output:

10428-7696.xml

Keep this file with the item metadata.
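
The mapping from Handle URL to output filename can be illustrated with a short sketch. The function name handle_xml_filename is hypothetical, and the sketch only covers URLs whose path ends in a numeric prefix followed by a suffix, as in the example above. It does not download anything and does not represent how the Favorite Tools script resolves a VText item URL.

```python
import re
from urllib.parse import urlparse


def handle_xml_filename(handle_url: str) -> str:
    """Turn a Handle URL such as https://hdl.handle.net/10428/7696 into 10428-7696.xml."""
    path = urlparse(handle_url).path.rstrip("/")
    match = re.search(r"(\d+)/([^/]+)$", path)
    if match is None:
        raise ValueError(f"No handle prefix/suffix found in: {handle_url}")
    prefix, suffix = match.groups()
    return f"{prefix}-{suffix}.xml"
```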

11. Build the Bag Directory Structure

Run:

Favorite Tools > 05 Build Bag Directory

This script moves all files into the UUID directory, builds the bag structure, and moves files to the appropriate directories:

UUID/metadata/bag-documentation.txt

Expected pre-BagIt structure:

item-folder/
+-- UUID/
    +-- objects/
    |   +-- nearline/
    |   +-- online/
    +-- metadata/

The UUID folder should already exist from the Add to VText UUID step. If it does not, return to that workflow and create it before continuing. If the active folder is already a UUID folder, the script creates the required subfolders inside it.
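
For reference, the expected subfolder layout can be expressed in a few lines. This is a sketch of the directory structure only, with a hypothetical function name build_bag_directories. It is not the 05 Build Bag Directory script, and it does not move any files. In line with the note above, it refuses to run if the UUID folder does not already exist.

```python
from pathlib import Path

# Subfolders shown in the expected pre-BagIt structure.
SUBFOLDERS = ("objects/nearline", "objects/online", "metadata")


def build_bag_directories(uuid_folder: Path) -> list[Path]:
    """Create the objects and metadata subfolders inside an existing UUID folder."""
    if not uuid_folder.is_dir():
        raise FileNotFoundError(f"UUID folder not found: {uuid_folder}")
    created = []
    for relative in SUBFOLDERS:
        target = uuid_folder / relative
        target.mkdir(parents=True, exist_ok=True)  # safe to run more than once
        created.append(target)
    return created
```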

12. Move Files to the Correct Directories

Move files into the UUID folder structure.

Preservation master files:

  • place JP2 page scans in UUID/objects/nearline
  • place original JPG or JPEG page scans in UUID/objects/nearline if they were not converted to JP2
  • place the master PDF or PDF/A, if applicable, in UUID/objects/nearline

Access files:

  • place access PDFs in UUID/objects/online

Metadata files:

  • place metadata and documentation files in UUID/metadata

Metadata files may include:

  • copy-verification.log
  • clam.log
  • sip-manifest.csv
  • exiftool-original-metadata.csv
  • migration.log
  • finding-aid.md
  • vtext-cover.docx
  • final cover PDF
  • ocr.txt
  • VText XML, such as 10428-7696.xml
  • any other supporting metadata files

13. Create the Final Manifest

Open the UUID folder in FreeCommander and confirm it contains:

objects/
metadata/

Run:

Favorite Tools > 06 Create Manifest

This creates the final Siegfried manifest:

UUID/metadata/manifest.csv

Create this manifest only after files are in their final locations.

Verify the Master PDF/A Format

Open UUID/metadata/manifest.csv and find the row for the master PDF/A file in UUID/objects/nearline.

Confirm that the master PDF/A row has this PRONOM ID:

fmt/477

fmt/477 identifies PDF/A-2b. If the master PDF/A is missing from the final manifest, or if its PRONOM ID is not fmt/477, stop and correct the PDF/A file before continuing.
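
The manifest check can also be scripted. The sketch below is an illustration under assumptions: it expects manifest.csv to have columns named filename and id, which is how Siegfried's CSV output is commonly laid out, and it treats every PDF under objects/nearline as a master PDF/A. The function names are hypothetical. Confirm the column names against an actual manifest.csv before relying on it.

```python
import csv
from pathlib import Path


def master_pdf_puids(manifest_csv: Path) -> dict[str, str]:
    """Map each PDF under objects/nearline to the PRONOM ID recorded for it."""
    found = {}
    with manifest_csv.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Assumed column names: "filename" and "id".
            name = row["filename"].replace("\\", "/")
            if "objects/nearline/" in name and name.lower().endswith(".pdf"):
                found[name] = row["id"]
    return found


def master_pdf_ok(manifest_csv: Path, expected: str = "fmt/477") -> bool:
    """True only if at least one master PDF is listed and all of them match expected."""
    puids = master_pdf_puids(manifest_csv)
    return bool(puids) and all(puid == expected for puid in puids.values())
```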

14. Generate the UUID MD5 File

In FreeCommander, select the UUID folder and press Ctrl + K.

Use these settings:

  • Checksum Method: MD5
  • Store Checksums: checked

Run the checksum tool.

This produces a file named:

UUID.md5

Example:

34AF668A-2745-4EBC-AEDE-C6D7A862BB10.md5

Before BagIt runs, the .md5 file should be beside the UUID folder:

item-folder/
+-- UUID/
+-- UUID.md5

When BagIt runs, both are moved into the data folder:

item-folder/
+-- data/
    +-- UUID/
    +-- UUID.md5
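
To show what a UUID-level checksum file records, here is a minimal sketch that writes one MD5 line per file beside the UUID folder. The function names are hypothetical, and the line format (hash, space, asterisk, relative path) follows the common md5sum convention, which is an assumption. The file FreeCommander writes may be laid out differently, so use the FreeCommander tool for the actual workflow.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large scans are not loaded into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_file(uuid_folder: Path) -> Path:
    """Write UUID.md5 next to the UUID folder and return its path."""
    output = uuid_folder.parent / f"{uuid_folder.name}.md5"
    lines = []
    for path in sorted(p for p in uuid_folder.rglob("*") if p.is_file()):
        # Paths are relative to the folder that holds both UUID/ and UUID.md5.
        relative = path.relative_to(uuid_folder.parent).as_posix()
        lines.append(f"{md5_of(path)} *{relative}")
    output.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return output
```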

15. Create the BagIt Bag

Open the top-level item folder, the UUID folder, or the UUID metadata folder in FreeCommander.

Run:

Favorite Tools > Bagit > 07a Create Bag

Do not manually add _bag to the top-level folder name before running the script. The script creates the BagIt structure, validates the bag, and appends _bag to the top-level folder name only if validation passes.

Bag-info autofill

The BagIt script looks for a VText XML file in the bag metadata folder. If a VText XML file is present, the script can autopopulate bag-info.txt with the correct values. If no VText XML file is present, enter the values manually when prompted. Optional fields can be skipped. At minimum, the description should include the source or collection, item name, date, and VText URL.

When prompted for the Bag Description, use the item citation.

Example description:

Valdosta State University. "Title," Valdosta State University Archives and Special Collections. Valdosta, Georgia. United States. Retrieved from https://hdl.handle.net/10428/####

The script creates the BagIt structure in place.

Expected final bag structure:

item-id_bag/
+-- bagit.txt
+-- bag-info.txt
+-- manifest-sha256.txt
+-- tagmanifest-sha256.txt
+-- data/
    +-- UUID.md5
    +-- UUID/
        +-- objects/
        |   +-- nearline/
        |   +-- online/
        +-- metadata/

Run:

Favorite Tools > Bagit > 07b Verify Bag

Verify the bag after it is created and before zipping. Continue only if BagIt validation reports PASS.

If you update metadata or add or remove files after bag creation, run:

Favorite Tools > Bagit > 07c Update Bag Manifests

Run 07c Update Bag Manifests after every change to the bag. Then run 07b Verify Bag again and continue only if validation reports PASS.
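
To make clear why the manifests must be updated after every change, the sketch below checks a bag's payload against manifest-sha256.txt: any edited, added, or removed file under data/ shows up as a problem. This is a simplified illustration with hypothetical function names. It does not check bagit.txt, bag-info.txt, or tagmanifest-sha256.txt, it does not handle percent-encoded paths, and it is not a substitute for 07b Verify Bag.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_payload_manifest(bag_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the payload matches the manifest."""
    problems = []
    listed = set()
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, relative = line.split(maxsplit=1)  # checksum, then path
        listed.add(relative)
        target = bag_dir / relative
        if not target.is_file():
            problems.append(f"missing: {relative}")
        elif sha256_of(target) != expected.lower():
            problems.append(f"checksum mismatch: {relative}")
    # Payload files that were added after the manifest was written.
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            relative = path.relative_to(bag_dir).as_posix()
            if relative not in listed:
                problems.append(f"not in manifest: {relative}")
    return problems
```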

16. Zip the Completed Bag

Select the completed top-level _bag folder.

Run:

Favorite Tools > 08 Zip Bag

The ZIP should contain the top-level bag folder.

Append _bag to the end of the ZIP filename.

Example:

name-of-bag_bag.zip

Expected ZIP structure:

item-id_bag.zip
+-- item-id_bag/
    +-- bagit.txt
    +-- bag-info.txt
    +-- manifest-sha256.txt
    +-- tagmanifest-sha256.txt
    +-- data/

17. Update ArchivesSpace Extent with Zipped Bag Details

After the bag ZIP has been created, return to the existing ArchivesSpace item record and update the extent information with the completed ZIP package details in the "Physical Details" field.

Record the final ZIP details from:

  • the completed ZIP filename
  • the completed ZIP file size
  • any required bag details from bag-info.txt

In FreeCommander:

  1. Select the completed bag ZIP file.
  2. Click the red file path icon drop-down in the FreeCommander path button.
  3. Select Copy details to clipboard.
  4. Paste the copied ZIP file details into the ArchivesSpace extent record.
  5. Update the ArchivesSpace file size and extent information so it matches the completed bag ZIP. Use the ZIP file size in bytes and select bytes as the extent type.

Figure: Use Copy details to clipboard to capture the completed bag ZIP filename and file size for ArchivesSpace.

Figure: Update the extent number and type with the ZIP size in bytes, then paste the completed bag ZIP details into Physical Details.

18. Verify the ZIP Opens Correctly

Open the ZIP file and confirm it contains:

  • the top-level _bag folder
  • bagit.txt
  • bag-info.txt
  • manifest-sha256.txt
  • tagmanifest-sha256.txt
  • data/
  • data/UUID/
  • data/UUID.md5

If the ZIP opens incorrectly or does not contain the top-level bag folder, recreate the ZIP.
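
The checklist above can be approximated with Python's standard zipfile module. The following sketch uses a hypothetical function name, check_bag_zip, and looks only at entry names and archive integrity. It assumes the ZIP holds exactly one top-level folder ending in _bag and does not validate the bag's checksums.

```python
import zipfile

REQUIRED_TAG_FILES = ("bagit.txt", "bag-info.txt", "manifest-sha256.txt", "tagmanifest-sha256.txt")


def check_bag_zip(zip_path) -> list[str]:
    """Return a list of problems; an empty list means the expected entries are present."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        corrupt = archive.testzip()  # name of the first unreadable member, or None
        if corrupt is not None:
            problems.append(f"corrupt entry: {corrupt}")
        names = archive.namelist()
    top_levels = {name.split("/", 1)[0] for name in names}
    if len(top_levels) != 1:
        return problems + ["expected exactly one top-level folder"]
    top = top_levels.pop()
    if not top.endswith("_bag"):
        problems.append(f"top-level folder does not end with _bag: {top}")
    for required in REQUIRED_TAG_FILES:
        if f"{top}/{required}" not in names:
            problems.append(f"missing: {required}")
    data_entries = [n for n in names if n.startswith(f"{top}/data/")]
    if not any(n.endswith(".md5") and n.count("/") == 2 for n in data_entries):
        problems.append("missing: data/UUID.md5")
    if not any(n.count("/") >= 3 for n in data_entries):
        problems.append("missing: data/UUID/")
    return problems
```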

19. Copy the Completed ZIP to DPLAB Storage

Select the final bag ZIP file.

Run:

Favorite Tools > 09 Copy Completed Bag to DPLAB Storage

This copies the ZIP file to:

dplab_storage/_completed_bags

The script verifies the copy by file size and SHA-256 checksum.

Continue only if the script reports PASS.
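
A copy check by file size and SHA-256 checksum can be sketched as follows. The function names and the destination handling are assumptions for illustration. This is not the 09 Copy Completed Bag to DPLAB Storage script, and the destination folder must already exist.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a large ZIP is not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, target: Path) -> bool:
    """Compare size first (cheap), then the SHA-256 checksum."""
    if source.stat().st_size != target.stat().st_size:
        return False
    return sha256_of(source) == sha256_of(target)


def copy_and_verify(source: Path, destination_dir: Path) -> bool:
    """Copy the ZIP into destination_dir and report whether the copy matches."""
    target = destination_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps where possible
    return verify_copy(source, target)
```

A True result corresponds to PASS. On False, keep the local ZIP and repeat the copy.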

Expected Output

The finished ZIP should be copied and verified in:

dplab_storage/_completed_bags

The final bag should include:

item-id_bag/
+-- bagit.txt
+-- bag-info.txt
+-- manifest-sha256.txt
+-- tagmanifest-sha256.txt
+-- data/
    +-- UUID.md5
    +-- UUID/
        +-- objects/
        |   +-- nearline/
        |   +-- online/
        +-- metadata/
            +-- manifest.csv
            +-- finding-aid.md
            +-- clam.log

Other metadata files may also be present, including copy-verification.log, sip-manifest.csv, exiftool-original-metadata.csv, migration.log, ocr.txt, vtext-cover.docx, and the downloaded VText XML file.

20. Mark the Source Copy Complete

After the final ZIP has been copied to dplab_storage/_completed_bags and copy verification reports PASS, rename the original source item folder in dplab_storage so it ends with:

_complete

Example:

ca009-007-001_quitman-high-reunions_1996_complete

This marks the source copy as processed and ready for staff quality control review. After staff confirm the final bag is complete and backed up, the _complete source copy may be deleted by staff.

Do not rename the source copy if:

  • the completed ZIP copy verification failed
  • the completed ZIP is missing from dplab_storage/_completed_bags
  • the final bag still needs quality control review or correction
  • staff have not confirmed that the source copy can be removed

21. Delete Local Processing Files

If the completed ZIP copy verification reports PASS, the local processing files may be deleted.

Do not delete local files if:

  • the ZIP copy verification failed
  • the completed ZIP is missing from dplab_storage/_completed_bags
  • additional edits are still needed

Staff-Only Final Preservation Storage

This section is for Archives staff only. Student workers should stop after the completed ZIP has been copied and verified in dplab_storage/_completed_bags, unless staff give different instructions.

After staff quality control is complete, the completed bag ZIP must be copied or uploaded to:

  • Amazon Glacier
  • ArchBak external hard drive
  • Dark Archives storage

After the storage copies are complete, add the completed bag to the Microsoft Access Bag Database:

V:\librarydata\archives_store\admin\Databases and Indexes\BagDatabase\BagDatabase.accdb

After the completed bag is backed up to all required preservation storage locations and recorded in the Bag Database, remove duplicate or unprocessed copies that no longer need to remain on local or working storage.

Delete only after confirming the preservation storage copies and Bag Database entry are complete:

  • the completed ZIP from dplab_storage/_completed_bags
  • duplicate bag ZIP copies in temporary or working locations
  • duplicate source or working folders in dplab_storage
  • unprocessed or leftover local processing files

Do not delete anything if staff quality control is incomplete, if any preservation storage copy is missing, or if the Bag Database entry has not been created.

Completion Checklist

Before considering the item complete, confirm the final bag includes:

  • bagit.txt
  • bag-info.txt
  • manifest-sha256.txt
  • tagmanifest-sha256.txt
  • data/
  • data/UUID/
  • data/UUID.md5
  • data/UUID/objects/nearline/
  • data/UUID/objects/online/
  • data/UUID/metadata/
  • data/UUID/metadata/manifest.csv
  • data/UUID/metadata/finding-aid.md
  • data/UUID/metadata/clam.log
  • the master PDF/A appears in data/UUID/metadata/manifest.csv with PRONOM ID fmt/477
  • ArchivesSpace extent record has been updated with the completed ZIP details

Also confirm, if applicable:

  • data/UUID/metadata/copy-verification.log
  • data/UUID/metadata/sip-manifest.csv
  • data/UUID/metadata/exiftool-original-metadata.csv
  • data/UUID/metadata/migration.log
  • data/UUID/metadata/ocr.txt
  • data/UUID/metadata/vtext-cover.docx
  • data/UUID/metadata/[handle].xml

The final ZIP must be copied and verified in dplab_storage/_completed_bags.

Staff-only final storage:

  • completed bag ZIP has been copied or uploaded to Amazon Glacier
  • completed bag ZIP has been copied to the ArchBak external hard drive
  • completed bag ZIP has been copied to Dark Archives storage
  • completed bag has been added to the Microsoft Access Bag Database
  • duplicate completed bag ZIPs and unprocessed working copies have been removed after preservation storage backup is complete