Digitization Guidelines

The Massachusetts Archives offers procedures and best practices for digitizing textual documents, graphics, and photographs using a scanner. These recommendations do not cover digitizing bound volumes or rare materials, or reformatting audiovisual media.

For assistance digitizing such materials, please contact the Digital Archives staff. This page conforms to current standards established by the Federal Agencies Digital Guidelines Initiative (FADGI) for image capture and storage.

1. Survey Documents Designated for Scanning

Examine the physical characteristics of the documents, especially the following elements:

  • Font size
  • Whether or not any handwriting is present
  • Whether or not any color is present
  • The presence of staining, fading or discoloration
  • The presence of images, graphs, or tables

Also assess the fragility of the document. Some documents have rips, tears, creases, or crumbling edges that rule out certain types of scanning and call for specialized handling. Feed scanners are unsuitable for fragile items, which may need to be scanned on a flatbed scanner instead.

If you have concerns about the state of your records, and whether they can withstand the scanning process, reach out to your scanning service representative or the Digital Archives team.

2. Determine the Type of Digital Record to be Created

Digital records are created for various purposes. The most common examples of digitized materials include:

  • Record Copy – The single copy of a document, often the original but sometimes a digital surrogate, which is designated as the official copy for reference and preservation. If you plan to scan the record and destroy the physical original, the digital surrogate would become the record copy and must be Archival/Preservation/Master quality.
  • Archival/Preservation/Master Copy – These are the highest quality files by resolution. They represent, as closely as possible, the information contained in the original. They are unedited and either uncompressed or stored using lossless compression. They serve as the long-term, sustainable copy, and can also serve as a digital surrogate for a non-digital object. Preservation copies take longer to produce because of their large file sizes.
  • Access Copy – These are lower quality copies of the original scan, intended for daily use, web access, and sharing. They are smaller images that fit more easily on a monitor, and can be downloaded quickly without a fast network connection. Access copies should be created after archival copies are made.

3. Establish Suitable Image Capture Mode

After the initial survey of documents designated for scanning, decide which image capture mode (bit depth) to use:

  • Black-and-White (bi-tonal or monochrome): Best suited for high-contrast documents, such as printed black-and-white text with no fading or discoloration.
  • Grayscale: Best suited for older documents with poor legibility, diffuse characters (faded, discolored, carbon copies, etc.), handwritten text, low contrast between text and background, and halftone illustrations or photographs that accompany any text. Grayscale is typically captured at 8-bit depth.
  • Color: Best suited for documents already in color, or where color information is important to the understanding of the text. Color is typically captured at 16-bit through 36-bit depth.
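
The choice among the three modes can be sketched as a small decision function. This is only an illustration of the list above, not an official rule set: the `SurveyResult` fields and the mode labels are hypothetical names introduced here, and a real decision may weigh other factors from the Step 1 survey.

```python
from dataclasses import dataclass


@dataclass
class SurveyResult:
    """Hypothetical summary of the Step 1 survey for one document."""
    has_color: bool = False               # any color present in the document
    color_is_meaningful: bool = False     # color needed to understand the text
    has_handwriting: bool = False
    is_faded_or_discolored: bool = False  # includes low contrast, carbon copies
    has_halftones_or_photos: bool = False


def choose_capture_mode(survey: SurveyResult) -> str:
    """Return 'color', 'grayscale', or 'bitonal' following the list above."""
    if survey.has_color or survey.color_is_meaningful:
        return "color"
    if (survey.has_handwriting
            or survey.is_faded_or_discolored
            or survey.has_halftones_or_photos):
        return "grayscale"
    # High-contrast printed text with no fading or discoloration.
    return "bitonal"


print(choose_capture_mode(SurveyResult(has_handwriting=True)))
```

Checking color first reflects the assumption that color information, once present, should not be discarded by a lower bit depth.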

Black-and-white images use the least storage, while color images use the most. Color scans also take longer to process, so scan in color only when it adds information. Bi-tonal mode is acceptable when it causes no loss of information and a higher bit depth offers no benefit.

4. Determine Resolution Settings

Determine an appropriate resolution, expressed in PPI/DPI (pixels per inch/dots per inch), for the image. The higher the resolution, the clearer the image and the larger the file. The appropriate resolution depends on the type of record (B & W printed text, handwritten text, color image, etc.) and the type of copy being created.
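
To make the effect of resolution and bit depth on storage concrete, the following sketch estimates the uncompressed size of a single scanned page. The function name is hypothetical, the 24-bit color depth is an assumed example value, and the estimate ignores file headers, row padding, and compression, so actual files will differ.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Estimate raw image size: pixel count times bits per pixel, in bytes."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# A letter-size page (8.5 x 11 inches) at 300 PPI in three capture modes.
for label, bits in [("bitonal", 1), ("grayscale", 8), ("color (assumed 24-bit)", 24)]:
    size_mb = uncompressed_size_bytes(8.5, 11, 300, bits) / 1_000_000
    print(f"{label}: about {size_mb:.1f} MB")
```

Because the pixel count grows with the square of the resolution, doubling the PPI from 300 to 600 roughly quadruples the file size at the same bit depth.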

Ideal resolutions for digital surrogates depend on several factors, including:

  • Font size (smaller fonts may require higher resolutions)
  • Faded text
  • Handwriting

Higher resolutions can also offer increased accuracy for Optical Character Recognition (OCR) processing. The Spatial Resolutions table below lists a range of recommended settings.

Archival Record              | Spatial Resolution | Bit-Depth          | Optimal Format | Acceptable Format
B & W printed text documents | 300-400 PPI        | B & W              | PDF/A          | PDF
Handwritten text documents   | 300-400 PPI        | Grayscale          | PDF/A          | PDF
Damaged text documents       | 300-600 PPI        | Grayscale          | PDF/A          | PDF
B & W photographs            | 300-600 PPI        | Grayscale          | TIFF           | JPEG
Color photographs            | 300-600 PPI        | Color              | TIFF           | JPEG
Oversized materials          | 300-600 PPI        | Grayscale or Color | TIFF           | JPEG
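
The table above can also be held as a lookup structure so that scan settings are checked automatically. The sketch below is one possible approach, not part of the guidelines: the dictionary keys, mode labels, and the `check_scan` function are hypothetical names, and treating only resolutions below the listed minimum as a problem is an assumption made here.

```python
# Values transcribed from the Spatial Resolutions table. The keys and the
# mode labels ("bitonal", "grayscale", "color") are hypothetical names.
RECOMMENDED = {
    "bw_printed_text":  {"ppi": (300, 400), "modes": {"bitonal"},
                         "formats": ("PDF/A", "PDF")},
    "handwritten_text": {"ppi": (300, 400), "modes": {"grayscale"},
                         "formats": ("PDF/A", "PDF")},
    "damaged_text":     {"ppi": (300, 600), "modes": {"grayscale"},
                         "formats": ("PDF/A", "PDF")},
    "bw_photograph":    {"ppi": (300, 600), "modes": {"grayscale"},
                         "formats": ("TIFF", "JPEG")},
    "color_photograph": {"ppi": (300, 600), "modes": {"color"},
                         "formats": ("TIFF", "JPEG")},
    "oversized":        {"ppi": (300, 600), "modes": {"grayscale", "color"},
                         "formats": ("TIFF", "JPEG")},
}


def check_scan(record_type, ppi, mode, file_format):
    """Return a list of mismatches between a scan and the table (empty if none)."""
    spec = RECOMMENDED[record_type]
    problems = []
    min_ppi, _max_ppi = spec["ppi"]
    if ppi < min_ppi:  # assumption: only too-low resolution is flagged
        problems.append(f"resolution {ppi} PPI is below the {min_ppi} PPI minimum")
    if mode not in spec["modes"]:
        expected = " or ".join(sorted(spec["modes"]))
        problems.append(f"capture mode {mode!r} should be {expected}")
    if file_format not in spec["formats"]:
        optimal, acceptable = spec["formats"]
        problems.append(f"format {file_format!r} should be {optimal} or {acceptable}")
    return problems


for problem in check_scan("bw_printed_text", 200, "color", "JPEG"):
    print(problem)
```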

5. Choose a Final Format

The table above lists the optimal and acceptable formats for scanned materials. Optimal formats create larger files but are less likely to degrade. They also follow standards that support archiving, and they are easier to transfer and open across devices.

PDF vs PDF/A

PDF/A sets stricter rules than standard PDF: fonts and other resources must be embedded in the file, references to external content are not allowed, and encryption is prohibited. These rules help ensure that a document remains accessible and unchanged long into the future. Because PDF/A also forbids dynamic content such as scripts, whatever exists in the document today will persist for years to come. PDF/A is itself an ISO standard (ISO 19005) for the long-term archiving of electronic documents.

TIFF vs JPEG

While both are digital image formats, TIFFs are typically stored uncompressed or with lossless compression, so they show no image degradation or compression artifacts. TIFFs can also preserve higher bit depths than JPEGs. However, TIFFs require more storage space than JPEG files and, because of their large size, cannot always be displayed on websites. While TIFFs can be converted to JPEGs for access, JPEGs converted to TIFFs gain nothing in quality or resolution, because detail discarded by JPEG compression cannot be recovered.

6. Quality Check

Perform a quality check to ensure that post-conversion documents meet acceptable standards. Scanning contractors typically perform this step themselves, but it is good practice to sample a number of items from their work to confirm successful conversions. For smaller batches, a minimum of 10 images or 10%, whichever is higher, should be inspected. Images should be viewed at 100% magnification on the monitor.
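
The sampling rule can be expressed as a short calculation. This is a minimal sketch with hypothetical function names; it assumes the "10 images or 10%" rule is applied to whatever batch size is passed in and that items to inspect are drawn at random, neither of which the guidance above specifies.

```python
import random


def qc_sample_size(batch_size: int) -> int:
    """Images to inspect: the larger of 10 or 10% of the batch, capped at the batch."""
    if batch_size <= 0:
        return 0
    ten_percent = (batch_size + 9) // 10  # 10% rounded up, integer arithmetic
    return min(batch_size, max(10, ten_percent))


def pick_qc_sample(filenames, seed=None):
    """Draw a random inspection sample (random selection is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(list(filenames), qc_sample_size(len(filenames)))


batch = [f"scan_{i:04d}.tif" for i in range(1, 251)]  # hypothetical file names
print(len(pick_qc_sample(batch, seed=1)))
```

For batches of 100 images or fewer the fixed minimum of 10 applies; above that, the 10% figure takes over.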

Inspect all images for the following defects:

  • Non-compliance with original digital imaging specifications
  • Incorrect size
  • Incorrect resolution
  • Incorrect file format/name
  • Incorrect color/bit depth (color image in grayscale, grayscale in B & W, etc.)
  • Loss of detail
  • Contrast too low or too high
  • Blurriness or lack of sharpness
  • Improper image orientation (upside down, sideways, backwards, etc.) or skewed images
  • Noise on portions of the image, scanning artifacts, banding, etc.
  • Incomplete or cropped images
  • Missing pixels

Rescan any images with errors. If many images (over 1%) have errors, the entire batch should be rescanned.
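
The rescan rule can be sketched as follows. The function name and return labels are hypothetical, and the sketch assumes the 1% threshold is measured against the number of images inspected, which the guidance above does not state explicitly.

```python
def rescan_plan(inspected_count, failed_ids):
    """Decide between accepting, rescanning failed images, or rescanning the batch."""
    if inspected_count <= 0:
        raise ValueError("inspected_count must be positive")
    failed = set(failed_ids)  # ignore duplicate reports of the same image
    if not failed:
        return "accept"
    # Integer comparison for "over 1%" avoids floating-point rounding.
    if len(failed) * 100 > inspected_count:
        return "rescan_batch"
    return "rescan_images"


print(rescan_plan(200, ["scan_0007.tif"]))
```

Note that exactly 1% of images failing still falls under rescanning the individual images, since the rule applies to error rates over 1%.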