MD5 Sums: What they are and How to use them (with Links to Free MD5 Programs)
Copyright©2003,2004,2006 by Daniel B. Sedory
[Do not reproduce in any form without permission from the author.]
An MD5 sum (or Message Digest 5 checksum) is a 16-byte (128-bit) hexadecimal number (written as 32 characters using the digits 0-9 and A-F or a-f) that results from performing a series of calculations on digital data according to a mathematical algorithm devised by Ronald Rivest of MIT and documented in detail in RFC 1321. There is no limit on the amount of data you can make an MD5 sum from; it can be computed from every byte on a hard drive, from individual files or from nothing at all. That's right, you can calculate the MD5 sum of a file which has no data. These are often called "zero-length" files, and they will all produce exactly the same MD5 sum on any computer, that sum being:
d41d8cd98f00b204e9800998ecf8427e
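If you'd like to confirm that sum without an MD5 program at hand, here is a minimal sketch, assuming you have Python 3 installed (Python and its standard hashlib module are my choice for illustration; they are not one of the tools discussed on this page). It hashes zero bytes of input and prints the result:

```python
import hashlib

# MD5 of zero bytes of input: the same sum a "zero-length" file produces.
empty_sum = hashlib.md5(b"").hexdigest()

print(empty_sum)
print(len(empty_sum), "hex characters")  # 32 characters = 16 bytes = 128 bits
```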
MD5 sums are so useful because they are practically unique* and can be used as a digital fingerprint (or signature) for whatever file or data they were created from. Therefore, you can use MD5 sums to verify that the files you just downloaded from some web site far away are exactly the same as the ones the author made. The only other way to know if two files are exactly the same (or to find out why their MD5 sums don't agree) is by comparing every byte of the files using the DOS FC (File Compare) command or some other byte-by-byte comparison program. It's also important to know that the appearance of an MD5 sum is no indicator of the nature of the data it was created from (such as a file's contents). For example, two MD5 sums with completely different hex bytes in every one of their 16 byte positions could easily have come from files that differ by only a single binary bit! So, an MD5 sum that's completely different from the one given for, say, some huge .ZIP file you just downloaded does not necessarily mean that the whole file is a total waste; it might simply mean that you had a problem with only a few bytes near the very end (and could easily recover most of the files inside it by using a .ZIP-repair program).
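To see the single-bit effect for yourself, the following is a small sketch in Python 3 (an assumption on my part; any language with an MD5 library would do). The sample text is arbitrary. It flips one bit in the last byte, hashes both versions and counts how many of the 128 output bits changed:

```python
import hashlib

original = b"The quick brown fox jumps over the lazy dog"

# Flip the lowest bit of the last byte: the two inputs differ by one bit.
flipped = original[:-1] + bytes([original[-1] ^ 0x01])

sum_a = hashlib.md5(original).hexdigest()
sum_b = hashlib.md5(flipped).hexdigest()
print(sum_a)
print(sum_b)

# XOR the two sums as integers and count the bits that differ.
diff_bits = bin(int(sum_a, 16) ^ int(sum_b, 16)).count("1")
print(diff_bits, "of 128 output bits differ")
```

The two sums should look unrelated, which is why the sums themselves tell you nothing about where, or how badly, two files differ.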
Once DOS collectors have created and verified the MD5 "fingerprints" for all the files on an old DOS diskette, those MD5 Sums can also be used to test for any degradation in both their digital and physical copies. As a matter of fact, it's a good idea to make an "image file" of all your original diskettes and create MD5 sums for these image files too; this is what Forensics experts do to make sure that nothing has changed on the disks they examine. Music collectors use MD5 sums all the time to make sure that their MP3 files are exactly the same as an original after downloading it. The Operating System files of an important server for a company or military organization are often checked against the known good MD5 checksums (stored elsewhere) from the time the OS was first installed or last upgraded. And if you burn your own CDs, it's an easy way to check for any physical changes in the disc media (remember, only factory pressed CDs will last for many decades, but CD-R discs rely on dyes and are bound to fail much sooner!).
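Diskette and CD image files can be large, so a program that sums them usually reads the file in pieces rather than all at once. Here is a minimal sketch of that idea in Python 3; the function name md5_of_file and the 64 KB chunk size are my own choices for illustration, not taken from any of the tools discussed here:

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 sum of the file at `path` as 32 hex characters."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:  # binary mode: hash the exact bytes
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling md5_of_file("disk1.img") (a hypothetical file name) gives the same sum as hashing the whole file in one go, because feeding MD5 the data in pieces does not change the result.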
If you have a *nix OS (such as Linux), there should already be an md5sum program on your system. For example, if you enter
md5sum
at a command prompt, you should see a blank line appear on your screen. If you then type in the character for that system's "end-of-file" marker (on a Linux machine, you'd normally press the CTRL + D keys to signal "end-of-file"), the MD5 sum for a zero-length file should appear on your screen, followed by two spaces and a "-". To find out more about how to use the program, enter
md5sum --help
at your Linux or UNIX prompt, or read its man page.
If you have a Windows OS on your computer, there are a variety of programs available for both the GUI and Command-Line user, but many of them either have no means of checking a file against stored MD5 sums, or fail to produce output in the accepted format without at least some editing on your part. Some programs which do produce *.md5 files in the accepted format will be discussed shortly (or see links under MD5 Tools below).
The Commonly Accepted Format for an *.md5 File
The format for .md5 files (the commonly accepted extension for files containing MD5 sums) has been more or less agreed upon by the majority of those who use MD5 sums for verifying file downloads on the Internet. This format should be quite acceptable to DOS Collectors (especially Linux users), since it's basically the same output you get from the UNIX md5sum command:
[ NOTE: Although technically all DOS or Windows filenames output by the md5sum command are supposed to have an asterisk in front of them (in column 34), all of the Windows MD5 programs I tested don't seem to care if it's a blank space instead; as a matter of fact, I've recently been informed that this is an archaic carryover that possibly never should have been placed into the Linux documentation! Here are the relevant lines from my RedHat Linux md5sum --help output screen (emphasis shows the parts that should be reviewed by those responsible for such matters):
"-b, --binary read files in binary mode (default on DOS/Windows)"
and
"When checking, the input should be a former output of this program. The default mode is to print a line with checksum, a character indicating type ('*' for binary, ' ' for text) and name for each FILE."
Furthermore, I tested the checking function of the Linux md5sum program with *.md5 files having many comment lines in them (lines beginning with a semicolon), and it ran just the same with or without them; so comment lines are quite acceptable under Linux as well as some newer Windows MD5 programs. ]
The order of the lines inside an *.md5 file shouldn't matter for any program which compares the sums to the files themselves; the order in these files is often the same as that output by a simple directory listing. Sorting them alphabetically by filenames or numerically by md5sums might be helpful if a person needs to look at the data for some reason. Comment lines can be used to identify, for example, which version of a DOS OS the collection of MD5 sums corresponds to (since the file names are often exactly the same for many different versions)! Here's an example of an *.md5 format with a number of comment lines at the beginning of the file: Windows 98 SE Startup Disk MD5 Text File.
How to use hkSFV with .md5 Files
Since most of you probably use a Windows OS, let's take a look at a fairly new GUI program that can be used to both create and check .md5 files: It's called hkSFV (reflecting the fact that it was first made for .sfv files which use only CRC checksums). Upon installation, it will associate any file having an *.MD5 extension with itself, so clicking on an *.md5 filename not only opens the program, but causes it to immediately begin checking the MD5 sums of anything that's listed in the file. Clicking on one of the .md5 files (see link to .ZIP file below) with a Windows 98 SE Startup Disk in your A: drive gives the following output:
hkSFV's help file is fairly comprehensive, but for any unanswered questions there are forums at the web site for discussing them! ( NOTE: At the time of this writing, hkSFV was unable to create an .md5 file from a CD or any write-protected disks. Yet, it's quite easy for hkSFV to check an .md5 file against such media using a Pathway! The authors said they will fix this problem in the next release, but I don't know if it will work yet. I made the file you see above (W98SEDSK.md5) by using my own perl script (with MD5 support) and editing its output to include the "A:\" pathway for each file on the diskette.)
Here are my MD5 files for the Windows 98 and Windows 98 SE Startup Disks in a .ZIP file (don't forget to insert the Startup Disk you want to check before clicking on the .md5 file): MD5 Files of Win98 Boot Disks, or for the Windows ME and Windows XP Startup Disks: MD5 Files of WinME/XP Startup Disks.
How to use the md5sum.exe Program
At this time, I was able to find only one Command-Line tool that outputs sums in the 'required format' without adding any extra comments such as copyright notices, ads, etc. Simply download md5sum.exe into your C:\WINDOWS, C:\WINNT or other directory that's in your PATH. Unfortunately, this program must be run with the DOS prompt at the directory you want to obtain the MD5 sums for, or you'll get some really frustrating error messages which state: "No such file or directory" right after the filename it says doesn't exist! I'm still looking for a better program, or may even write one of my own. But for now, you should note the following steps (and example) for using this program:
Step 1: Change the DOS prompt to the directory you wish to create the .md5 file from. (This is much easier to do if you install Microsoft's "Command Prompt Here" Powertoy for Win2000/XP! Similar Registry/OS functionality is available for Win9x as well if you look for it.)
For example, let's say your DOS box prompt is at C:\WINDOWS when you open it. If you want to create an '.md5' file from all the files in the temp\dos directory of your D: drive, you'd first have to switch to that drive: D: (ENTER) and then 'cd' to its temp\DOS directory. Using the "Command Here" addition, you just right-click on the directory and select the "Command (Prompt) Here" item!
Step 2: At the prompt, enter: md5sum followed by a filespec such as *.* to create sums for all the files in that directory, and finally finish the entry with a re-direct symbol (">") pointing to the full-path and filename of your new .md5 file. For example, here's the command you'd enter to create the MD5 sums of all the .COM files in the root directory of a diskette in your B: drive and redirect the output to a new file (called pdos330s.md5) in your C:\TEMP directory:
B:\>md5sum *.com > c:\temp\pdos330s.md5
which created the following pdos330s.md5 file from my own Phoenix Computer MS-DOS 3.30 (OEM) Supplemental Programs (backup) diskette:
6924fd81513d827a6ca91472f7e9eeeb *BACKUP.COM
5b12878faa52117af2ef16668b62cd7c *DEBUG.COM
aa2bb7fa6539119c3805d4b73230dd64 *RESTORE.COM
67b1da798e9e6d7c6dbcd322bed44233 *TREE.COM
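If you have no md5sum.exe available, a file in the same layout can be produced with a few lines of script. The following is a rough sketch in Python 3, not a replacement for the program; the function name write_md5_file is hypothetical, and it reads each file whole, which is fine for diskette-sized files:

```python
import glob
import hashlib
import os

def write_md5_file(pattern, md5_path):
    """Write one 'checksum *NAME' line per file matching `pattern`."""
    with open(md5_path, "w") as out:
        for path in sorted(glob.glob(pattern)):
            if not os.path.isfile(path):
                continue  # skip directories that happen to match
            with open(path, "rb") as handle:
                checksum = hashlib.md5(handle.read()).hexdigest()
            # 32 hex characters, a space, then '*' in column 34 and the name.
            out.write("%s *%s\n" % (checksum, os.path.basename(path)))
```

As with the redirect in the command above, point md5_path somewhere outside the files matched by the pattern, so the listing does not end up containing a sum of itself.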
In order to check the MD5 sums in the .md5 file against those on the present diskette, you once again need to be in the directory for the files to be checked, then enter md5sum followed by the "-c" switch and the full-path to the .md5 file like this:
B:\>md5sum -c c:\temp\pdos330s.md5
BACKUP.COM: OK
DEBUG.COM: OK
RESTORE.COM: OK
TREE.COM: OK
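The checking step can be sketched the same way. This Python 3 example is an illustration only, assuming the 'checksum *NAME' layout shown above; the function name check_md5_file and the base_dir parameter are hypothetical, and the status strings merely imitate the style of md5sum's output:

```python
import hashlib
import os

def check_md5_file(md5_path, base_dir="."):
    """Yield (filename, status) for each entry listed in an .md5 file."""
    with open(md5_path) as listing:
        for line in listing:
            line = line.rstrip("\r\n")
            if not line.strip() or line.startswith(";"):
                continue  # skip blank lines and ';' comment lines
            # Columns 1-32 hold the sum; the file name starts at column 35.
            expected, name = line[:32].lower(), line[34:]
            try:
                with open(os.path.join(base_dir, name), "rb") as handle:
                    actual = hashlib.md5(handle.read()).hexdigest()
            except OSError:
                yield name, "FAILED open or read"
                continue
            yield name, "OK" if actual == expected else "FAILED"
```

Unlike md5sum.exe, this sketch takes the directory holding the files as a parameter, so the prompt does not have to be in that directory. A loop such as "for name, status in check_md5_file(listing, folder): print(name + ': ' + status)" prints one line per file.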
MD5deep is a cross-platform program to compute MD5 sums on an arbitrary number of files. The program is known to run on Windows, Linux, FreeBSD, OS X, Solaris, and should run on most other platforms. md5deep can now use *.md5 files created by such programs as hkSFV (with comment lines). md5deep is similar to the md5sum program found in Linux, but has the following additional features:
For version 1.0, the author (Jesse Kornblum) has created many new examples to explain the use of various switches under md5deep! (See the link to his site below.)
Here is md5deep's usage information display (note the -h):
C:\TEMP>md5deep -h
md5deep version 1.0 by Jesse Kornblum.
Usage:
$ md5deep [-v|-V|-h] [-m|-M|-x|-X <file>] [-resbt]
[-o fbcplsd] FILES
-v - display version number and exit
-V - display copyright information and exit
-h - display this help message and exit
-m - enables matching mode. See README/man page
-x - enables negative matching mode. See README/man page
-M and -X are the same as -m and -x but also print hashes of each file
-r - enables recursive mode. All subdirectories are traversed
-e - compute estimated time remaining for each file
-s - enables silent mode. Suppress all error messages
-o - Only process certain types of files:
f - Regular File
b - Block Device
c - Character Device
p - Named Pipe (FIFO)
l - Symbolic Link
s - Socket
d - Solaris Door
md5deep will also accept the switches: -t and -b just to remain compatible with the older *nix/Win32 md5sum program, but they are ignored; and do not affect its output.
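To give an idea of what a recursive mode like -r involves, here is a minimal sketch in Python 3 that walks a directory tree and sums every file it finds. It is not md5deep's code and makes no attempt to reproduce its other switches; the function name md5_tree is hypothetical:

```python
import hashlib
import os

def md5_tree(top):
    """Yield (checksum, path) for every file in `top` and its subdirectories."""
    for folder, _subdirs, files in os.walk(top):
        for name in sorted(files):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # skip broken links and other non-files
            with open(path, "rb") as handle:
                yield hashlib.md5(handle.read()).hexdigest(), path
```

Printing each pair as the checksum, some spacing and then the path gives a listing with full path names and no asterisk.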
md5deep is a very useful program for creating *.md5 files. The output of md5deep does not use an '*' (asterisk) in front of the file names, but as we stated above, many Windows programs that check the output of *.md5 files don't seem to care one way or the other about that. The following is a copy of the output md5deep gave with a Windows 98 Startup Disk in the A: drive using these parameters:
C:\TEMP>md5deep A:\*.COM A:\IO.SYS A:\MSDOS.SYS
b067cc477e113932e0a997e9d5e4319d A:\COMMAND.COM
4823258556ae481a19015c22e33e8a9e A:\IO.SYS
659d373aefa0966f804ca7d0304c3118 A:\MSDOS.SYS
Here's how you would create an *.md5 file for your Windows XP Startup Disk:
C:\TEMP>md5deep A:\*.* > WinXPSD.md5
And to its credit, if you then clicked on the file "WinXPSD.md5" with hkSFV installed on your Windows system, it would be able to check on the MD5 sums for all the files of any Windows XP Startup Disk in your A: drive! [ Here's a copy of that WinXPSD.md5 file; created by md5deep using the DOS redirect symbol (>). You should also note that the MD5 sums for both its AUTOEXEC.BAT and CONFIG.SYS files are: d41d8cd98f00b204e9800998ecf8427e. Do you recall what that means? Look briefly at the top of this page again... Do you know now? I like to think of this as the "foo bar" sum; some of you might know what that means, because the hex digits "f00b" can be thought of as the beginning of that phrase. The repetition of "98" three times: in front of the "f00b" and around "009" and/or the digit "d" at the beginning and "e" at the end might help you remember it. The point for the Win XP Startup Disk is that both of these files are zero-length; or empty! So booting up your system with a Windows XP Startup Disk will only give you an A:\> prompt on the screen! And if you type in the command ver, you'll find out that you've got the same COMMAND.COM file as a Windows ME user! In short, you should either add more files to this disk or go find yourself some other boot disk, because 'as is' this thing is almost worthless to you! One very useful program I'd add is a disk editor and some of the utility programs that Microsoft erased from the original diskette! Take this link for all the details of the Windows XP Startup Disk. ]
An md5.exe DOS (16-Bit) Program (Compiled by 3L Ltd.)
See the comments below (under MD5 Tools) for a link to 3L Limited. This program is an adaptation of the source code presented in RFC1321 by Ronald Rivest and conforms to the parameters listed in that document.
Although the output of this program does not conform to the format discussed above, it can be very useful when you're limited to running under DOS (real 16-bit mode). It will, however, also function correctly in a 32-bit DOS-box under Windows.
If you run this program against a copy of itself by entering: md5 md5.exe, the output will appear as follows:
MD5 (md5.exe) = ab5f8d3485f9a0e660dd93f151f8c03c
Since this program conforms to the parameter usage of the code in RFC1321, you also have the choice of entering "-s ", "-x " or "-t " on the command line.
For the -s (string) parameter, you must place a string of characters immediately after the "-s" with no intervening spaces (unless you use double-quote marks around the string)! Here are some examples to clarify this usage:
C:\TEMP>md5 -snospaceshere
MD5 ("nospaceshere") = 201fbc7441fa2a66b72bbed1247d1379
C:\TEMP>md5 -s"this string uses quote marks"
MD5 ("this string uses quote marks") = d3581f9f0a3ef0b5dd342603e7824fcd
C:\TEMP>md5 -s"message digest"
MD5 ("message digest") = f96b697d7cb7938d525a2f31aaf161d0
C:\TEMP>md5 -s
MD5 ("") = d41d8cd98f00b204e9800998ecf8427e
For -x, you should see the following on your display, except for the ' / ' (slash) where the lines were wrapped:
C:\TEMP>md5 -x
MD5 test suite:
MD5 ("") = d41d8cd98f00b204e9800998ecf8427e
MD5 ("a") = 0cc175b9c0f1b6a831c399e269772661
MD5 ("abc") = 900150983cd24fb0d6963f7d28e17f72
MD5 ("message digest") = f96b697d7cb7938d525a2f31aaf161d0
MD5 ("abcdefghijklmnopqrstuvwxyz") = c3fcd3d76192e4007dfb496cca67e13b
MD5 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789")/
 = d174ab98d277d9f5a5611c2c9f419d9f
MD5 ("1234567890123456789012345678901234567890123456789012345678901234/
5678901234567890") = 57edf4a22be3c955ac49da2e2107b67a
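Any correct MD5 implementation should reproduce these test-suite values. As a cross-check, here is a short sketch in Python 3 (my assumption; it uses the standard hashlib module rather than the md5.exe program) that hashes the same seven strings and compares each result with the sum listed above:

```python
import hashlib

# (input string, expected sum) pairs copied from the test-suite output above.
TEST_SUITE = [
    ("", "d41d8cd98f00b204e9800998ecf8427e"),
    ("a", "0cc175b9c0f1b6a831c399e269772661"),
    ("abc", "900150983cd24fb0d6963f7d28e17f72"),
    ("message digest", "f96b697d7cb7938d525a2f31aaf161d0"),
    ("abcdefghijklmnopqrstuvwxyz", "c3fcd3d76192e4007dfb496cca67e13b"),
    ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
     "d174ab98d277d9f5a5611c2c9f419d9f"),
    ("1234567890" * 8, "57edf4a22be3c955ac49da2e2107b67a"),
]

results = []
for text, expected in TEST_SUITE:
    actual = hashlib.md5(text.encode("ascii")).hexdigest()
    results.append(actual == expected)
    verdict = "ok" if actual == expected else "MISMATCH"
    print('MD5 ("%s") = %s  %s' % (text, actual, verdict))
```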
and something similar (for -t ; most likely with a different time and speed) to this:
C:\TEMP>md5 -t
MD5 time trial. Digesting 10000 5000-byte blocks ... done
Digest = 7870442513c89405972bca276a153ca3
Time = 6 seconds
Speed = 8333333 bytes/second
If you simply execute the program without any parameters (md5 ENTER) and enter text data from the standard input device, MD5.EXE will simply output a single MD5 sum on the next line (multiple lines of text are acceptable too). For example, if you type in the text "message digest" (without the quote marks) and enter the appropriate End of File character for MS-DOS / Windows; a ^Z (use the CTRL and z keys) immediately after the text (on the same line), you should get the MD5 sum for a file that contains only that text. Here's the output as it should appear on your screen:
C:\TEMP>md5
message digest^Z
f96b697d7cb7938d525a2f31aaf161d0
Although this program will not accept 'wildcards' in the file name, you can compute the MD5 sum for a small number of files at the same time by entering multiple file names (separated by a space) on the same command line. There is no usage help built into the program!
RFC 1321 (Request for Comments # 1321 by R. Rivest, MIT Laboratory for Computer Science and RSA Data Security, Inc. April 1992). Here are a couple of sites with copies of this document (if you do a Google search for 'rfc1321.txt' or 'rfc1321.html' you'll find many others): http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc1321.txt, ftp://ftp.isi.edu/in-notes/rfc1321.txt or http://www.fourmilab.ch/md5/rfc1321.html.
*Practically Unique: Two different files can have the same MD5 sum. Since files can be of any length but an MD5 sum is always 128 bits, such pairs must exist, and studies published since 2004 have shown how to construct them deliberately. But the odds of two files having the same sum by accident are so small that, for the purpose of detecting accidental changes, you might as well consider it impossible. In order to understand some of the remarks that have been made about the security of MD5, it's necessary to cover some terminology from the field of cryptography:
First, MD5 is called a hash function. These functions can have an input (a message or digital data) of any length, but their output (called hash values) will always have the same fixed length for a given type of function. [Although it's part of its name (MD5), these values should be called a message digest only when they result from applying a hash function to a message; thus, the terms 'sum' or 'checksum' being more general are the ones often used in relation to the MD5 of computer programs and other files.]
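The fixed output length is easy to observe. In this small Python 3 sketch (an illustration using the standard hashlib module; the input sizes are arbitrary), inputs from zero bytes up to a million bytes all produce a 16-byte hash value:

```python
import hashlib

sizes = (0, 1, 1000, 1000000)

# Hash inputs of very different lengths and record each output length.
lengths = [len(hashlib.md5(b"\x00" * size).digest()) for size in sizes]

for size, length in zip(sizes, lengths):
    print(size, "input bytes ->", length, "output bytes")
```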
Although I'm still interested in providing an explanation for more of the technical aspects of MD5 in my own words, for now you'll have to study the following pages on your own (along with reading RFC 1321) for all of the details:
The RSA Security Crypto FAQ: http://www.rsasecurity.com/rsalabs/faq/.html "RSA Laboratories' Frequently Asked Questions About Today's Cryptography, Version 4.1"
The following sections of that FAQ pertain to MD5 or terminology used about it:
Section: 3.6.6 "What are MD2, MD4, and MD5?",
Section: 2.1.6 "What is a hash function?"
and in the CryptoBytes Technical Newsletter: Volume 2, No. 2 - Summer 1996 (Acrobat .PDF, 357k) "The Status of MD5 After a Recent Attack" by Hans Dobbertin.
Add to that another article in the RSA Laboratories Bulletin, News and advice from RSA Laboratories: Number 4 - November 12, 1996 (Acrobat .PDF, 235k) "On Recent Results for MD2, MD4, and MD5" by M.J.B. Robshaw of RSA Labs, and you'll probably have enough to get a very good idea about MD5's usefulness even if it can't be proved with 100% certainty that no two programs will have the same checksums.
MD5 Tools
1. hkSFV For Win9x/2000/XP -- I used Version 2.0.1 (build 84), dated October 30, 2002. I'm hoping there will be a new one soon that allows the creation of MD5 sums from a CD! (Old link: http://www.big-o-software.com/products/hksfv/) The original site no longer exists, but you can still download it from DOWNLOAD.COM here: New Link: Download hkSFV from here!
2. md5sum.exe (Note: Caution! There are many programs by this name!) This one is a command-line tool for Windows by bruce@gridpoint.com. Link: http://www.etree.org/md5com.html
3. md5deep.exe (This program can be downloaded as a compiled binary for Windows, or as Source Code which compiles easily under Linux / FreeBSD / Solaris and other *nix systems.) The Program was written by Special Agent Jesse Kornblum of the United States Air Force Office of Special Investigations. Link: http://md5deep.sourceforge.net/ (Note: MD5deep is now at version 1.12, but we haven't had time to check the commands and functionality; except for the fact that it now includes SHA1, SHA256, Whirlpool and Tiger hashes as well. SHA256 is the new FIPS standard!)
4. md5.exe (The same MD5 program found in my BCTEST.ZIP download, which is part of my Basic Course in Forensics/Data Recovery. It will function under either real 16-bit DOS or in a DOS-box under Windows.) The Program is an adaptation of Ron Rivest's original code in RFC1321 and as such its output does not conform to the "Commonly Accepted Format" for an .MD5 file as described above! However, it can still be very useful in finding the MD5 Sum for a small number of files, especially if you must do so under a Real (16-bit) DOS environment! Although I have an e-mail from the 3L Ltd. company stating I can distribute this file with my own software, I thought it only fair to reference them as the party who actually compiled this program: http://www.shen.myby.co.uk/threel/tech/tools/md5.htm [dead link!].
5. More links to be added at a later date.
The ("unofficial") MD5 Homepage: http://userpages.umbc.edu/~mabzug1/cs/md5/md5.html.
Last Updated: 28 OCT 2006 (28/10/2006).