Digitization Guidelines
The Massachusetts Archives offers procedures and best practices for digitizing textual documents, graphics, and photographs using a scanner. These recommendations do not cover digitizing bound volumes or rare materials, or reformatting audiovisual media.
For assistance digitizing such materials, please contact the Digital Archives staff. This page conforms to current standards established by the Federal Agencies Digital Guidelines Initiative (FADGI) for image capture and storage.
1. Survey Documents Designated for Scanning
Examine the physical characteristics of the documents, especially the following elements:
- Font size
- Whether or not any handwriting is present
- Whether or not any color is present
- The presence of staining, fading or discoloration
- The presence of images, graphs, or tables
Also assess the fragility of the document. Some documents may have rips, tears, creases, or crumbling edges that prevent certain types of scanning and require specialized handling. Feed scanners are unsuitable for fragile items, which may need to be scanned on a flatbed scanner instead.
If you have concerns about the state of your records, and whether they can withstand the scanning process, reach out to your scanning service representative or the Digital Archives team.
2. Determine the Type of Digital Record to be Created
Digital records are created for various purposes. The most common examples of digitized materials include:
- Record Copy – The single copy of a document, often the original but sometimes a digital surrogate, which is designated as the official copy for reference and preservation. If you plan to scan the record and destroy the physical original, the digital surrogate would become the record copy and must be Archival/Preservation/Master quality.
- Archival/Preservation/Master Copy – These are the highest quality files by resolution. They represent, as closely as possible, the information contained in the original. They are unedited and either uncompressed or stored using lossless compression. They serve as the long term, sustainable copy, and can also serve as a digital surrogate for a non-digital object. Machines tend to take longer to produce preservation copies due to their large file sizes.
- Access Copy – These are lower quality copies of the original scan, intended for daily use, web access, and sharing. They are smaller images that fit more easily on a monitor, and can be downloaded quickly without a fast network connection. Access copies should be created after archival copies are made.
3. Establish Suitable Image Capture Mode
After the initial survey of documents designated for scanning, decide which image capture mode (bit depth) to use:
- Black-and-White (bi-tonal or monochrome): Best suited for high-contrast documents, such as printed black-and-white text with no fading or discoloration.
- Grayscale: Best suited for older documents with poor legibility, diffuse characters (faded, discolored, carbon copies, etc.), handwritten text, low contrast between text and background, and halftone illustrations or photographs that accompany any text. Grayscale images typically use 8-bit depth.
- Color: Best suited for documents already in color, or where color information is important to the understanding of the text. Color images typically use 16-bit through 36-bit depth.
Black-and-white images use the least amount of storage, while color images use the most. Color scanning also takes longer for machines to process, so scan in color only when the color carries information. Bi-tonal mode is acceptable when it causes no loss of information and a higher bit depth offers no unique benefit.
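The choice among the three modes can be sketched as a small decision function. This is only an illustration of the guidance above, not an official rule set: the function name and the boolean survey flags are hypothetical, and borderline documents still call for human judgment.

```python
def choose_capture_mode(
    has_meaningful_color: bool,
    is_faded_or_discolored: bool = False,
    has_handwriting: bool = False,
    is_low_contrast: bool = False,
    has_halftones_or_photos: bool = False,
) -> str:
    """Suggest 'color', 'grayscale', or 'bitonal' from survey findings.

    The flags are hypothetical names for the characteristics noted
    during the document survey (step 1).
    """
    # Color wins whenever color carries information.
    if has_meaningful_color:
        return "color"
    # Any legibility problem or tonal content calls for grayscale.
    if any((is_faded_or_discolored, has_handwriting,
            is_low_contrast, has_halftones_or_photos)):
        return "grayscale"
    # Clean, high-contrast printed text can be scanned bi-tonal.
    return "bitonal"


print(choose_capture_mode(False))                        # clean printed text
print(choose_capture_mode(False, has_handwriting=True))  # handwritten letter
print(choose_capture_mode(True))                         # color map
```

Checking color first reflects the idea that color information, when it matters, should not be discarded even if the document is otherwise clean.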
4. Determine Resolution Settings
Determine an appropriate resolution, or PPI/DPI (pixels per inch/dots per inch) for the image. The higher the resolution or PPI/DPI, the clearer the image and the larger the file size will be. An appropriate resolution should depend on the type of record (B & W printed text, handwritten text, color image, etc.) and the type of copy being created.
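To see how resolution and bit depth drive file size, the following sketch estimates the uncompressed size of a scanned page. It is a rough approximation only: the function name is hypothetical, 24-bit color is an assumed bit depth within the range given above, and real files add headers and may be compressed.

```python
import math


def uncompressed_size_bytes(width_in: float, height_in: float,
                            ppi: int, bits_per_pixel: int) -> int:
    """Estimate raw image data size, ignoring headers and compression."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bits_per_pixel
    return math.ceil(total_bits / 8)


# A letter-size page (8.5 x 11 inches) at different settings.
for label, ppi, bits in [
    ("bi-tonal, 300 PPI", 300, 1),
    ("grayscale, 300 PPI", 300, 8),
    ("24-bit color, 300 PPI", 300, 24),
    ("24-bit color, 600 PPI", 600, 24),
]:
    size_mb = uncompressed_size_bytes(8.5, 11, ppi, bits) / 1_000_000
    print(f"{label}: about {size_mb:.1f} MB")
```

Doubling the resolution quadruples the pixel count, which is why a 600 PPI color scan is roughly four times the size of the same page at 300 PPI.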
Ideal resolutions for digital surrogates depend on several factors, including:
- Font size (smaller fonts may require higher resolutions)
- Faded text
- Handwriting
Higher resolutions can also offer increased accuracy for Optical Character Recognition (OCR) processing. The Spatial Resolutions table below lists a range of recommended settings.
Archival Record | Spatial Resolution | Bit-Depth | Optimal Format | Acceptable Format |
---|---|---|---|---|
B & W printed text documents | 300-400 PPI | B & W | PDF/A | |
Handwritten text documents | 300-400 PPI | Grayscale | PDF/A | |
Damaged text documents | 300-600 PPI | Grayscale | PDF/A | |
B & W photographs | 300-600 PPI | Grayscale | TIFF | JPEG |
Color photographs | 300-600 PPI | Color | TIFF | JPEG |
Oversized materials | 300-600 PPI | Grayscale or Color | TIFF | JPEG |
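For teams that script their scanning workflow, the table above could be encoded as a lookup so that settings are checked consistently. This is a minimal sketch: the dictionary keys, the `ScanSpec` type, and the helper function are hypothetical names, while the values are transcribed from the table.

```python
from typing import NamedTuple, Optional


class ScanSpec(NamedTuple):
    min_ppi: int
    max_ppi: int
    bit_depth: str
    optimal_format: str
    acceptable_format: Optional[str]  # None where the table lists none


# Values transcribed from the Spatial Resolutions table.
RECOMMENDED_SPECS = {
    "bw_printed_text": ScanSpec(300, 400, "B & W", "PDF/A", None),
    "handwritten_text": ScanSpec(300, 400, "Grayscale", "PDF/A", None),
    "damaged_text": ScanSpec(300, 600, "Grayscale", "PDF/A", None),
    "bw_photograph": ScanSpec(300, 600, "Grayscale", "TIFF", "JPEG"),
    "color_photograph": ScanSpec(300, 600, "Color", "TIFF", "JPEG"),
    "oversized": ScanSpec(300, 600, "Grayscale or Color", "TIFF", "JPEG"),
}


def ppi_is_recommended(record_type: str, ppi: int) -> bool:
    """Return True if ppi falls inside the recommended range."""
    spec = RECOMMENDED_SPECS[record_type]
    return spec.min_ppi <= ppi <= spec.max_ppi


print(ppi_is_recommended("bw_printed_text", 600))   # outside 300-400
print(ppi_is_recommended("color_photograph", 600))  # inside 300-600
```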
5. Choose a Final Format
The table above lists the optimal and acceptable formats for scanned materials. Optimal formats create larger files but are less likely to degrade. They also follow standards that support archiving and make files easier to transfer and open across devices.
PDF vs PDF/A
PDF/A sets stricter standards than ordinary PDF for embedding, referencing, and encrypting content. These standards ensure that a document will be accessible and unchanged long into the future. Because PDF/A forbids dynamic content, whatever exists in the document today will persist for years to come. PDF/A is itself defined by an ISO standard (ISO 19005).
TIFF vs JPEG
While both are digital image formats, TIFFs are typically stored uncompressed or with lossless compression, so they show no image degradation or compression artifacts. TIFFs can also preserve higher bit depths than JPEGs. However, due to their large size, TIFFs can't always be displayed on websites, and they require more storage space than JPEG files. While TIFFs can be converted to JPEGs for access, JPEGs converted to TIFFs gain nothing in terms of quality or resolution.
6. Quality Check
Perform a quality check to ensure that post-conversion documents meet acceptable standards. Scanning contractors typically perform this step themselves, but it is good practice to sample a number of items from their work to confirm successful conversions. For smaller batches, a minimum of 10 images or 10%, whichever is higher, should be inspected. Images should be viewed at 100% magnification on the monitor.
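The sampling rule above (at least 10 images or 10% of the batch, whichever is higher) can be computed as follows. This is a small sketch with a hypothetical function name; it rounds the 10% figure up and caps the sample at the batch size, both of which are assumptions not stated in the rule.

```python
def qc_sample_size(batch_size: int) -> int:
    """Number of images to inspect: max(10, 10% of batch), capped at batch size."""
    if batch_size < 0:
        raise ValueError("batch_size must not be negative")
    # Integer ceiling of batch_size / 10 avoids floating-point rounding.
    ten_percent = -(-batch_size // 10)
    return min(batch_size, max(10, ten_percent))


for n in (6, 50, 101, 250):
    print(n, "images in batch ->", qc_sample_size(n), "to inspect")
```

For batches of 100 images or fewer, the 10-image minimum governs; above that, the 10% figure takes over.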
Inspect all images for the following defects:
- Non-compliance with original digital imaging specifications
- Incorrect size
- Incorrect resolution
- Incorrect file format/name
- Incorrect color/bit depth (color image in grayscale, grayscale in B & W, etc.)
- Loss of detail
- Contrast too low or too high
- Blurriness or lack of sharpness
- Improper image orientation (upside down, sideways, backwards, etc.) or skewed images
- Noise on portions of the image, scanning artifacts, banding, etc.
- Incomplete or cropped images
- Missing pixels
Rescan any images with errors. If more than 1% of the images have errors, rescan the entire batch.
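The rescan rule can be expressed as a simple check. This sketch assumes the 1% threshold is measured against the images actually inspected, which the rule does not state explicitly; the function and parameter names are hypothetical.

```python
def needs_full_rescan(images_inspected: int, images_with_errors: int,
                      threshold_percent: int = 1) -> bool:
    """Return True if the error rate is strictly over the threshold."""
    if images_inspected <= 0:
        raise ValueError("images_inspected must be positive")
    if not 0 <= images_with_errors <= images_inspected:
        raise ValueError("images_with_errors out of range")
    # Integer comparison: errors / inspected > threshold_percent / 100.
    return images_with_errors * 100 > images_inspected * threshold_percent


print(needs_full_rescan(200, 2))  # exactly 1%: rescan only the flawed images
print(needs_full_rescan(200, 3))  # 1.5%: rescan the whole batch
```

Comparing integers rather than dividing keeps the boundary case (exactly 1%) from being affected by floating-point rounding.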