Using Checksum Manifests


Checksum manifests are text files that list the file name and corresponding checksum of each file in a group of files. They can be a helpful digital preservation tool when manually checking fixity on a large group of files. For example, one way to ensure that a corpus of files is successfully copied from one location to another is to create a checksum manifest of the files before transfer, and then compare this manifest to the checksums of the new copies in their new location. If all has gone well, the copies will be identical and therefore the checksum manifests will also be identical. That same manifest can then be stored and used to ensure that no changes have occurred to the files since the manifest was created. Note that standalone applications exist for automating checksum comparison. For regularly scheduled fixity checks, consider the application Fixity. Checksum manifests offer a more manual method for intermittent checks, or for incorporation into automation scripts.

Creating Checksum Manifests

There are a variety of tools that can be used for creating checksum manifests. In this article the command line application md5deep is used as an example. Md5deep has a recursive function built in, which allows the user to generate checksums for all of the files within a particular directory or volume. For more information on using md5deep's built-in recursive function, view the application's "man page" hosted here:

However, when using md5deep's built-in recursive function the application processes multiple files simultaneously. On smaller files stored on hard drives (either HDD or solid state) this is not an issue, in this author's experience. When creating a checksum manifest for a group of large files (approximately 1 GB or larger), or for a group of files stored on magnetic tape (e.g. LTO), it is recommended not to use the application's built-in recursive function, but rather to run the command on one file at a time from a script. This can be accomplished in a variety of ways. Below is an example for a Windows batch file (when typing the command directly into the command prompt, use %A in place of %%A):

for /r %%A in ("*") do md5deep64 -b -e "%%A" >> "checksum_manifest.txt"

Here is an example of a bash script for working in a Mac environment:

while IFS= read -r INPUTFILE ; do md5deep -b -e "${INPUTFILE}" >> "checksum_manifest.txt" ; done < <(find . -type f -iname "*.*")

Note that the ">" operator writes the output of a command to a text file, replacing any existing contents, whereas the ">>" operator appends new output to the end of the same text file, so the manifest is not overwritten with each new checksum. The "-b" and "-e" flags are recommended when creating checksum manifests. The "-b" flag lists only the file name, and not the full file path, which is essential when comparing checksum manifests for copies in two different locations. The "-e" flag displays a progress indicator and an estimate of the time remaining for each file.
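The difference between the two redirection operators can be seen in a small sketch. It uses a hypothetical scratch file named demo_manifest.txt and placeholder lines rather than real md5deep output:

```shell
# Placeholder file name, used only for this illustration.
manifest="demo_manifest.txt"

# ">" creates the file, or overwrites it if it already exists.
printf '%s\n' "first line" > "$manifest"
printf '%s\n' "second line" > "$manifest"   # the file now holds only "second line"

# ">>" appends, keeping what is already in the file.
printf '%s\n' "third line" >> "$manifest"

# Prints "second line" followed by "third line".
cat "$manifest"
```

Because ">>" appends, a manifest left over from an earlier run should be renamed or removed before the loop is run again, otherwise the new checksums are added below the old ones.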

The md5deep application does not necessarily process a group of files in the same order each time. In order to be able to compare a set of checksum manifests, first sort each manifest alphanumerically. The sort can be added to the Windows batch example listed above, like so:

for /r %%A in ("*") do md5deep64 -b -e "%%A" >> "checksum_manifest.txt"
sort "checksum_manifest.txt" /O "checksum_manifest_sorted.txt"

Or this process can simply be performed on its own, after the fact:

sort checksum_manifest.txt > checksum_manifest_sorted.txt
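As a variation for bash, the sketch below sorts a manifest by file name rather than by the whole line, which can make the result easier to read. It assumes each line has the form checksum, whitespace, file name (as in the examples in this article), and the file names and sample lines are placeholders:

```shell
# Sample manifest lines in "checksum  filename" form (placeholder values).
printf '%s\n' \
  "f68c6f37590345154499bd39423dfc0f  vid_2.MXF" \
  "d41d8cd98f00b204e9800998ecf8427e  vid_1.mp4" > sample_manifest.txt

# LC_ALL=C makes the ordering byte-wise, so it does not depend on the
# language settings of the machine doing the sort.
# -k2 sorts on the second field (the file name) through the end of the line.
LC_ALL=C sort -k2 sample_manifest.txt > sample_manifest_sorted.txt

# The vid_1.mp4 line is now first, the vid_2.MXF line second.
cat sample_manifest_sorted.txt
```

Whichever sort is chosen, both manifests being compared need to be sorted the same way, ideally with the same locale setting, since different locales can order the same lines differently.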

Comparing Checksum Manifests

Once a checksum manifest has been created for a group of files, it can be used as a reference for the state of those files moving forward. The manifest can be used to check fixity, to ensure that files have not changed over time, or the manifest could be used to ensure successful file transfer to a new location. Using the manifest as a reference calls for repeating the steps above, and comparing the resulting manifest to the original.

Just as with creating checksum manifests, there are a variety of methods for comparing them. The simplest approach may be to compare the checksums of the two manifests themselves. Two text files containing the same information, both sorted alphanumerically, will yield the same checksum. However, this method gives only a binary response: the manifests either match or they do not, which makes troubleshooting a mismatch difficult. Fortunately, both the Windows and Mac operating systems have built-in commands for comparing text files, which can be helpful when attempting to determine which files, if any, have changed since the first manifest was created.
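The match-or-no-match check might be scripted in bash along these lines. This is a minimal sketch, not part of md5deep: the function names manifest_digest and manifests_match are made up for illustration, and it assumes either md5sum (common on Linux) or md5 (macOS) is installed.

```shell
# Print only the MD5 digest of a file. Uses md5sum where available,
# otherwise falls back to "md5 -q" as found on macOS.
manifest_digest() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d ' ' -f 1
  else
    md5 -q "$1"
  fi
}

# Succeeds (exit status 0) only when both manifests have the same digest.
manifests_match() {
  [ "$(manifest_digest "$1")" = "$(manifest_digest "$2")" ]
}

# Example call, using manifest names from this article:
# if manifests_match checksum_manifest_sorted.txt checksum_manifest_sorted2.txt
# then echo "manifests match"
# else echo "manifests differ"
# fi
```

As the text notes, this only says whether the manifests differ, not where.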

In a Windows environment, one option is to use the File Compare or "FC" command. To use this command, type "FC" followed by the file names of the two checksum manifests into the command prompt, or alternatively type "FC" and drag and drop the checksum manifest text files into the command prompt. Either way, the command should look like this:

FC C:\file\path\checksum_manifest.txt C:\file\path\checksum_manifest_2.txt

If all goes well, the command prompt should return the text "FC: no differences encountered".

If differences are encountered, FC will report them by quoting lines from the two manifests to show where they differ, like this:

***** C:\file\path\Checksum_manifest_for_RAID.txt

d41d8cd98f00b204e9800998ecf8427e vid_1.mp4

d41d8cd98f00b204e9800998ecf8427e vid_2.MXF

***** C:\file\path\Checksum_manifest_for_LTO.txt

d41d8cd98f00b204e9800998ecf8427e vid_1.mp4

f68c6f37590345154499bd39423dfc0f vid_2.MXF


In the example above, the checksum for the "vid_2.MXF" file has changed: the second line of text from "Checksum_manifest_for_RAID.txt" does not match the second line of "Checksum_manifest_for_LTO.txt". See the following article for more help with the FC command:

In a Mac environment, the "diff" command functions very similarly. Call the command by typing "diff" followed by the file names of the two manifests. For example:

diff /Users/file/path/checksum_manifest.txt /Users/file/path/checksum_manifest_2.txt

If differences are encountered, diff will report them by quoting lines from one manifest or the other. Diff uses "<" or ">" to indicate which file a line is from: "<" marks a line from the first file listed in the command, and ">" marks a line from the second. The numbers and letter printed above the quoted lines indicate where the difference occurs: first the line number in the first file, then a letter ("d" for deleted, "a" for added, or "c" for changed), and finally the corresponding line number in the second file. Like this:

diff checksum_manifest_sorted.txt checksum_manifest_sorted2.txt

5c5

< 1d8e2648ad9c0c34fb13e6626420a7aa vid_2.MXF

---

> 7853f84fa4302a740499b7e0ec03a190 vid_2.MXF

Again, in the example above the checksum for the "vid_2.MXF" file has changed. The "5c5" line indicates that a change was found on line 5 of each file, and the following three lines list the two mismatched strings, with "---" separating the results from the two different files.
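When a manifest is long, it can help to reduce the diff output to just the names of the affected files. The bash sketch below is one possible way to do that. The function name changed_files is made up for illustration, and it assumes every manifest line has the form checksum, whitespace, file name:

```shell
# List the file names on lines that differ between two sorted manifests.
changed_files() {
  # diff exits with status 1 when the files differ. That is the expected
  # case here, so the exit status is deliberately ignored.
  { diff "$1" "$2" || true; } |
    # Keep only the quoted lines (those starting with "< " or "> "), then
    # strip the marker and the checksum so only the file name is left.
    awk '/^[<>] / { sub(/^[<>] [0-9a-fA-F]+ +/, ""); print }' |
    # A changed file is quoted once per manifest, so remove duplicates.
    sort -u
}

# Example call, using manifest names from this article:
# changed_files checksum_manifest_sorted.txt checksum_manifest_sorted2.txt
```

A file that exists in only one of the two manifests is listed as well, so the full diff output is still the place to look when deciding whether a file changed, was added, or went missing.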
