


55AAH is the Wrong Way to represent the Byte Sequence 55h followed by AAh on a PC


Copyright © 2009, 2011 by Daniel B. Sedory
NOT to be reproduced in any form without Permission of the Author!


Those who use the hex Word, 55AAH, to represent the byte sequence "55 AA" on a hard disk, floppy or in the memory of an IBM PC-compatible are either ignorant of the little-endian[1] nature of the entire x86 family of Intel® processors, or do not care enough about the truth to correct their errors[2]. Some of the correct ways to represent the hex byte sequence "55 AA" on a PC (as a hexadecimal Word) would be: 0xAA55, 0AA55H, AA55h. Of course, employing a phrase like "55h followed by AAh" may be the best way to describe this sequence for those who will never use an x86 assembler.



Anyone with a PC can prove this!

1. Using DEBUG[3] under MS-DOS or Windows™

A. From any IBM® or Microsoft® DOS prompt, you would simply type, debug, to enter this program.
B. Under a Windows™ OS, such as Windows™ XP, you can either open a CMD window first, then enter debug, or if it's easier for you: Click on the start button, then click on "Run...", type debug in the box and click on OK. This will pop up a virtual DEBUG window.

 

Note: DEBUG always uses hexadecimal for its bytes, words, etc., so there's never a need to add an H character after its numbers; in fact, doing so will cause an error.

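In the session below, type whatever follows a "-" prompt or an address prompt (such as 0B1E:0100) and press the ENTER key after each line; everything else is DEBUG's own output, and our notes are in parentheses. The segment value (0B1E here) will likely differ on your machine: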

-f 100 fff 0
-a
0B1E:0100 dw 1234
0B1E:0102 dw 55aa
0B1E:0104   (Just press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 00 00 00 00-00 00 00 00 00 00 00 00   4..U............
-a
0B1E:0104 dw 3412
0B1E:0106 dw aa55
0B1E:0108   (Press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 12 34 55 AA-00 00 00 00 00 00 00 00   4..U.4U.........
-a
0B1E:0108 db 55
0B1E:0109 db aa
0B1E:010A   (Press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 12 34 55 AA-55 AA 00 00 00 00 00 00   4..U.4U.U.......
-a
0B1E:010A db 12 34 55 aa
0B1E:010E   (Press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 12 34 55 AA-55 AA 12 34 55 AA 00 00   4..U.4U.U..4U...
-
-q  (Q as in Quit - closes the DEBUG window.)
Figure 1.

EXPLANATION:

The command: f 100 fff 0 places a zero-byte (00h) into every memory location at offsets 100h through FFFh. Clearing out any bytes left in memory by other programs makes the dumps that follow easier to read.

a starts DEBUG's built-in assembler and dw 1234 is an assembly instruction (where dw stands for "data word") to place the hex Word, 1234h, at memory locations 100h and 101h; since DEBUG started the assembly process at 100h. Similarly, dw 55aa now directs DEBUG to place the hex word 55AAh at the memory locations which follow. Pressing ENTER on a blank line stops the assembly process.

The command: d 100 10f displays (dumps) a copy of all the bytes in memory from 100h through 10Fh on the screen. Look at the sequence of the bytes: "34 12 AA 55" and ask yourself, "Why are they in reverse order from the way I entered them?" The answer is: A Word (or any other number of two or more bytes) is always stored with its least-significant byte first, i.e., in reverse of the order in which we write it, by a little-endian CPU; and all of Intel's x86 processors are little-endian in nature!

Therefore, if you want to see the byte sequence "12 34" or the one in question, "55 AA", what hexadecimal Words would you have to ENTER for these byte sequences? Well, after using: dw 3412 and dw aa55, and proceeding to dump the same area of memory (100h through 10Fh) on the screen, you can clearly see for yourself that the byte sequence "55 AA" is produced when we use the hexadecimal Word, aa55; commonly written as 0xAA55 or 0AA55h.
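For readers without DEBUG, the same experiment can be sketched in Python (our choice of language for illustration; it is not part of the DEBUG session above). The standard struct module's "<H" format packs a 16-bit Word in little-endian order, the way an x86 CPU stores it. The variable names are only illustrative.

```python
import struct

# "<H" = little-endian unsigned 16-bit Word, the order an x86 CPU uses.
first_try = struct.pack("<H", 0x1234) + struct.pack("<H", 0x55AA)   # like: dw 1234 / dw 55aa
second_try = struct.pack("<H", 0x3412) + struct.pack("<H", 0xAA55)  # like: dw 3412 / dw aa55

print(first_try.hex(" ").upper())   # 34 12 AA 55
print(second_try.hex(" ").upper())  # 12 34 55 AA
```

Only the second pair of Words yields the byte sequence "12 34 55 AA", matching what DEBUG showed.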

However, there's another way to produce this byte sequence in assembly. Rather than using a hex Word, you can use individual bytes instead. Here we start DEBUG's assembler once more (a), then enter the lines:
db 55
db aa
(db stands for "data byte") and dump memory locations 100h through 10Fh on screen. As you can see, these two instructions create the byte sequence "55 AA" without any of the confusion sometimes caused when using hexadecimal Words.

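Lastly, you could also create the hex byte values of "12 34 55 AA" in that order, using a single x86 assembly instruction (note the leading zero that MASM, NASM and TASM require on any hex number beginning with a letter):
db 12h, 34h, 55h, 0AAh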
And again, if you want to do this in DEBUG, drop the "h" symbol, and simply ENTER:
db 12, 34, 55, AA
(or: db 12,34,55,aa,
 or: db 12 34 55 aa; as we did in the DEBUG window above).
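The byte-by-byte approach has a direct counterpart in most languages. As a small Python sketch (an illustration of ours, with made-up variable names), listing the bytes individually stores them exactly as written, so byte order never enters the picture:

```python
# Listing the bytes one at a time (like "db 12 34 55 aa") leaves no room
# for byte-order confusion: they are stored exactly as written.
signature_bytes = bytes([0x12, 0x34, 0x55, 0xAA])
same_bytes = bytes.fromhex("12 34 55 aa")  # whitespace between bytes is allowed

print(signature_bytes == same_bytes)       # True
print(signature_bytes.hex(" ").upper())    # 12 34 55 AA
```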

ENTER "q" (for Quit) to close the DEBUG window.

2. Under a Linux OS

Most other PC operating systems, including Linux, do not have a debugger as easy to use as DEBUG. However, it's still possible to prove the correct way to refer to the byte sequence "55 AA" by creating a small binary file with NASM or even as (or some other assembler), using assembly instructions equivalent to those above, and then dumping its contents with the command: hexdump -Cv <filename> to view the order of the bytes as they would be stored in memory or on the hard disk. If any Linux users really need help doing this, please contact us. [We will try to add more about this in the future.]
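Where no assembler is at hand, a rough stand-in for the experiment is possible with a few lines of Python, which runs on Linux and most other systems. This is only a sketch under our own assumptions: dump_line is a hypothetical helper that imitates the general layout of hexdump -C (not its exact spacing), and the scratch file simply holds what an assembled "dw 0xAA55" would contain on x86.

```python
import os
import struct
import tempfile

def dump_line(offset, chunk):
    """Format up to 16 bytes roughly the way `hexdump -C` does."""
    hex_part = " ".join(f"{b:02x}" for b in chunk)
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{offset:08x}  {hex_part:<47}  |{text}|"

# Write the Word 0xAA55 to a scratch file in little-endian order,
# which is what "dw 0xAA55" assembled for x86 would produce.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack("<H", 0xAA55))
    path = f.name

with open(path, "rb") as f:
    data = f.read()
os.remove(path)

line = dump_line(0, data)
print(line)  # the hex column begins with: 55 aa
```

Running hexdump -Cv on a file produced by a real assembler should show the same two bytes in the same order.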


Why is this error so prevalent?

Early Documentation used incorrect Hex Word for Signature

When the first IBM® Personal Computer™ became available in 1981, it had no hard disk and no concept of a boot record signature in its operating system. It wasn't until the introduction of IBM® Personal Computer™ DOS 2.00 in 1983 that our identifier "55 AA" appeared in boot sectors[4] on floppy diskettes. We are still seeking the earliest reference within any IBM® or Microsoft® documents to this Signature ID[5].

We have found the incorrect hex word (55AAH) in the "First Edition (April 1987)" of IBM's "Technical Reference (Programming Family)" for the "Disk Operating System Version 3.30", but only in "Chapter 9. Fixed Disk Information", on the following pages (quoting the sentences that include it for proper context):
"Signature: The last 2 bytes of the boot record (55AAH) are used as a signature to identify a valid boot record. Both this record and the partition boot records are required to contain the signature at offset 1FEH." (p. 9-9),
"Each extended volume contains an extended boot record in the first sector of the disk location assigned to it. This boot record contains the 55AAH signature id byte." (p. 9-12) and
"The last two bytes of the extended boot record (55AAH) are used as a signature to identify a valid boot record. Both this record and the logical drive boot records are required to contain the signature at offset 1FEH." (p. 9-14).

Although it's only a guess, we suspect this erroneous hex Word (55AAH) for the Signature ID became part of the IBM documentation simply because an employee assigned the task (and/or a manager who may have provided the final "corrections") was too unfamiliar with the little-endian nature of the Intel processor used in the PC to realize the mistake. (Most, if not all, IBM computers up to that point had big-endian architecture.) Sadly, this error continued to be perpetuated within IBM's documentation for so long that many today believe this must be the correct way to refer to these Signature ID bytes on a PC. However, if someone takes a class in x86 assembly, using MASM[6], NASM[7], TASM[8] or any other x86 assembler, they will learn the truth.

Journalists and computer book authors must often rely upon the documentation from a manufacturer, so we're sure this is the main reason so many writers' comments about the signature ID contain this error, and it is yet another reason for their reluctance to correct it. Unfortunately, Microsoft® made things worse by apparently deciding to come up with an excuse to actually consider this error as acceptable usage in their documentation! We have written a great deal more about their very confusing notes[9] below.

Erroneous Examples

If you search the Net for the terms "Sammes Jenkinson 55aah," you should find a link at Google Books to page 142 of the first edition of Forensic Computing, A Practitioner's Guide, by A. J. Sammes, Tony Sammes and Brian Jenkinson. Although the comments there and in Table 5.25 on page 143 relate to loading additional BIOS code, such as that for a video card, such code still uses the same 2-byte identifier as an MBR sector; it's simply located in the first two bytes of the code rather than the last two, as shown in this 64-byte dump of 0C0000h and following from a computer's memory:

Offset   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0C0000  55 AA 5A E9 CB AA 30 30-30 30 30 30 30 30 30 30   U.Z...0000000000
0C0010  30 30 C4 17 E9 DD 16 BD-40 00 B0 0A 30 30 49 42   00......@...00IB
0C0020  4D 20 56 47 41 20 43 6F-6D 70 61 74 69 62 6C 65   M VGA Compatible
0C0030  20 42 49 4F 53 2E 20 03-5B 00 6B 00 79 00 8B C0    BIOS. .[.k.y...
Figure 2.

But twice on page 142, this book makes reference to these bytes using the incorrect hex Word, 55AAh. [The same errors were repeated in the book's second edition on pages 170-171.] Now click your way to page 145, where you will find at the end of the 2nd paragraph, the comment: "the final two bytes of the partition table are always of value 55h and aah, as shown." Unfortunately, the second edition added a footnote here (#71 on page 174), which reads: "Many systems will refuse to boot if these two bytes are not set to 55aah." The publisher of this book (Springer) is a prestigious company, and its authors are quite knowledgeable. Since the authors discussed the topic of little-endianness at length in chapter 2 and elsewhere, including sections of chapter 5, why did they use an incorrect hexadecimal number when describing the two-byte identifier of 55h and AAh on a PC? We have been unable to contact the authors, so we don't know for sure, but they probably did so simply because so many others have.


MORE EXAMPLES TO FOLLOW AT A LATER DATE.


Does using the wrong number even thousands of times make it right?

No! AA55H is hexadecimal for 43,605 and 55AAH will always equal 21,930, whether these numbers are stored on a little-endian PC or a big-endian computer. A number is a number, period; whether it's written in binary, decimal or hex. And changing the type of computer a number is stored on (from one endian-type to another) does not change one number into another, or our digital society would be in chaos! It only changes the order in which that number's bytes must be stored on the different computer-type's media. When we see the characters "0x" followed by a string of digits, we must logically conclude they represent one, and only one, hexadecimal number; a number which we could also convert into equivalent decimal, octal or binary representations of the same number whenever we need to! If someone wishes to use different symbols to define some "little-endian hex representation on a PC" that's their choice. But prepending "0x" or appending an "h" or "H" to one or more hexadecimal digits is already a well-defined standard.
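The arithmetic is easy to check. The following Python sketch (our own illustration, using only the standard struct module) confirms the decimal values given here and shows that a number read back with the same byte order it was stored in is unchanged; only the stored bytes differ between the two kinds of machine.

```python
import struct

# One number, three notations: the value does not depend on how we write it.
assert 0xAA55 == 43605 == 0b1010101001010101
assert 0x55AA == 21930

# Storing 0xAA55 on either kind of machine and reading it back with the
# same byte order always returns 0xAA55; only the stored bytes differ.
little = struct.pack("<H", 0xAA55)   # bytes 55 AA
big = struct.pack(">H", 0xAA55)      # bytes AA 55
assert struct.unpack("<H", little)[0] == struct.unpack(">H", big)[0] == 0xAA55
print(little.hex(" "), "|", big.hex(" "))  # 55 aa | aa 55
```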

A signature though, of two or more bytes in length, must never be separated from the concept of byte order. In other words, once we define some string of bytes as a signature, we should maintain the same order of its bytes whether it's being stored on a big-endian or little-endian computer; yet this also means the same signature must be represented by different hexadecimal numbers on these different types of computers, since each one stores multi-byte numbers differently.
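Put another way, the bytes of the signature stay fixed while the number used to describe them depends on the machine. A minimal Python sketch of that idea (SIGNATURE and the other names are ours, chosen for illustration):

```python
SIGNATURE = bytes([0x55, 0xAA])  # the byte order on the media never changes

# The same two bytes, read as one 16-bit number under each convention.
as_little_endian_word = int.from_bytes(SIGNATURE, "little")
as_big_endian_word = int.from_bytes(SIGNATURE, "big")

print(hex(as_little_endian_word))  # 0xaa55
print(hex(as_big_endian_word))     # 0x55aa
```

So 55AAh would describe these bytes correctly only on a big-endian machine, which a PC is not.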

However, as we saw above, an Option ROM or Boot Record (or any other kind of) signature is not required to be expressed as a number. Technically, it's just an indicator or identification, and could just as well be handled as two separate bytes of data; for example, a 55h followed by AAh in this case (or: db 55h, 0AAh in x86 assembly code). The main reason these two bytes show up more often in the literature as a single Word (0xAA55 or 0AA55h), is simply because it's easier, and more logical, to compare two hex Words in a single line of code rather than using two separate lines of code for each byte. We'll examine some actual assembly code below to clarify this.

Therefore, we'd advise any authors who need to describe a signature (apart from books on assembly language), to list each byte as it's found on some media or in memory, rather than attempting to use a hexadecimal number comprised of more than a single byte.

Assembly programmers do not make this mistake

Students who pass a class in x86 assembly (or even you, if you studied the DEBUG proof above) know the truth, and so did the early programmers at Microsoft® who used this "Signature ID" in their code. For example, here's a bit of commented MASM source code which checks if a valid boot record signature ID exists; followed by two lines of code that define where and what it should be:

; The partition table is located at offset 1BEh in the sector.
; The signature is located at offset 1FEh (= 55h, AAh or word AA55h).

TestBootSignature:
	cmp	WORD PTR [BX + 510],0aa55h	; Check for 55 AA sequence

#define BOOT_SIG_OFFSET    510
#define BOOT_SIG           0xaa55

The cmp instruction above compares the WORD stored at offsets 510 and 511 (decimal) from BX to the hex word 0aa55h; it could also have been written as cmp WORD PTR [BX + 510],BOOT_SIG, if BOOT_SIG had been previously defined as 0xAA55. Here's another snippet of code that checks for an Option ROM:
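For readers who don't work in assembly, the same test can be sketched in Python. This is only an illustration of the comparison, not anyone's production code: has_boot_signature is a hypothetical helper of ours, and the sample sector is made up. The "<H" format tells struct to read the Word the way an x86 CPU would.

```python
import struct

BOOT_SIG_OFFSET = 510
BOOT_SIG = 0xAA55

def has_boot_signature(sector: bytes) -> bool:
    """Return True if the little-endian Word at offset 510 equals 0xAA55."""
    if len(sector) < BOOT_SIG_OFFSET + 2:
        return False
    (word,) = struct.unpack_from("<H", sector, BOOT_SIG_OFFSET)
    return word == BOOT_SIG

# A made-up, otherwise empty 512-byte sector ending in 55h, AAh.
sector = bytes(510) + bytes([0x55, 0xAA])
print(has_boot_signature(sector))  # True
```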

mov ax,0C000h   ; look for 2nd graphics card installed
mov ds,ax
cmp word ptr DS:[0],0AA55h ; DS:0 -- check for option ROM

You can clearly see we need to write the hex Word as "0AA55h" to check for the byte sequence "55 AA" of either an MBR's signature, or the beginning of a video card's BIOS (as we saw in Fig. 2 above). And one last example:

;---- Check for ROM signature ----
cmp   es:[di],0xAA55      ; Is the ROM signature present?
jne   NotOptionRom        ; If not, jump out.

If any early DOS programmers had used the erroneous 55AAh in their code, the operating system would not have functioned correctly! People who merely write about such things, rather than those who assemble working code, are the ones more likely to make mistakes that may never be corrected.
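To see why, here is a short Python sketch (ours, for illustration only) that reads the first Word of the video BIOS bytes from Figure 2 the way a little-endian CPU would, then compares it against both constants:

```python
import struct

# First bytes of the video BIOS dump in Figure 2.
rom_start = bytes.fromhex("55 AA 5A E9")

# What a little-endian CPU sees when it loads the first Word.
(word,) = struct.unpack_from("<H", rom_start, 0)

print(word == 0xAA55)  # True:  the check used in working code
print(word == 0x55AA)  # False: the value from the documentation never matches
```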

 


Footnotes

[1] The original IBM® Personal Computer ("PC") or any IBM PC-compatible (whether its CPU is an Intel®, AMD® or some other manufacturer's processor) has what's known as a little-endian architecture; as opposed, for example, to the big-endian architecture of the Motorola® processors in a Macintosh or PowerPC system. (Note: Many Apple™ OSX computers today are being sold with Intel® processors having little-endian architecture.) This little-endian architecture refers to the order of the bytes found in memory (or on a storage medium) for numbers composed of more than a single byte, where the least-significant byte occurs at a lower (or preceding) location than the most-significant byte, with any bytes in between ordered likewise. Thus the hex number 38DA75C6h would appear in a PC's memory as: C6 75 DA 38 (its bytes in reverse order); whereas a big-endian system would have: 38 DA 75 C6. The creators of the first 16-bit Intel® processor used this little-endian order so the CPU could start working on arithmetic problems as soon as the first byte of a large number was accessed; basically the same as we do when adding the least-significant digits of two large numbers first, carrying any overflow into the next column on the left, and finally adding together the most-significant digits, on the far left, last.
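The column-by-column addition described in this footnote can be sketched in Python. This is a simplified model of ours, not a description of any actual CPU circuitry: add_little_endian is a hypothetical helper that walks the bytes from the lowest address upward, carrying into the next byte as it goes.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian numbers one byte at a time."""
    result = bytearray()
    carry = 0
    for x, y in zip(a, b):          # lowest address = least-significant byte
        total = x + y + carry
        result.append(total & 0xFF)
        carry = total >> 8
    return bytes(result)            # a final carry is dropped (fixed width)

stored = (0x38DA75C6).to_bytes(4, "little")
print(stored.hex(" ").upper())      # C6 75 DA 38

one = (1).to_bytes(4, "little")
total = add_little_endian(stored, one)
print(hex(int.from_bytes(total, "little")))  # 0x38da75c7
```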

[2] In all fairness, some authors may not become aware of this truth until after publishing a book. Thus a sizeable cost would be involved in correcting this error; and the publisher might not agree to changing it even in a second edition. But they should at least be honest enough to include it in an errata note somewhere on the Internet. As for those who never actually produce printed books, but only create web pages or some other digital form of their works, hopefully they will be able to find enough time to correct this error if it appears in their work.

Some authors, however, may make excuses and "shift the blame" onto some large company, stating they won't make any changes unless that "well-known company" changes their documentation first!

[3] If you like what you see here, we have a complete Guide to DEBUG here.

[4] We have much more information about various MBR and Volume boot sectors here.

The following shows how we described the Boot Record Signature on our pages about Partition Tables:

Table 1. This should remove any confusion over what constitutes a valid Boot Record signature; sometimes called its Magic number, and often expressed as the 16-bit hexadecimal Word, 0xAA55 (or: AA55h) for the little-endian[1] PC.

Boot Record Signature

Offset (within sector)      Byte Values
Decimal     in Hex          (Hexadecimal)
510         1FE             55
511         1FF             AA

[5] Please contact us here if you have any documentation you'd like to share with us.

[6] Microsoft® Macro Assembler (MASM).

[7] The Netwide Assembler (NASM) was originally written by Simon Tatham (with assistance from Julian Hall), but is now maintained by a team led by H. Peter Anvin. Since version 2.07 it has been available as free software under the simplified (2-clause) BSD license; earlier versions were released under the GNU Lesser General Public License. See http://www.nasm.us/ for more information.

[8] Turbo™ Assembler (TASM) by Borland; no longer maintained. Here's a FAQ about TASM.

[9] Please read our note on Microsoft's very confusing way of referring to hex numbers on their web sites and in some of their documentation by using a nonstandard definition for little-endian Hex numbers! Although we were happy to see at least one author at microsoft.com who'd dropped the incorrect 0x usage and simply listed the hex bytes in order with spaces between them (as we've done on our own boot record pages), many others continue to reference the same illogical note under their tables which use erroneous hex numbers.

So, we're presenting another analogy here to illustrate how wrong that concept is. The note is often worded as follows: "Numbers larger than one byte are stored in little endian format or reverse-byte ordering. Little endian format is a method of storing a number so that the least significant byte appears first in the hexadecimal number notation." (Searching microsoft.com or any of its related sites for the phrase "Troubleshooting Disks and File Systems" should turn up some links to pages with this note).

First, we agree with everything in the note up to its closing words about "hexadecimal number notation"; that's where they err. The little-endianness of a PC has absolutely nothing to do with "hexadecimal number notation" as this note claims! If you want to enter the hex number 0x3F (63) as a double-word (i.e., as 4 bytes; quad-words are 8 bytes), you could simply write "dd 0x3F" and an assembler would know it needed to reserve 4 bytes for this data (it would store this as: "3F 00 00 00"). But you would never enter 0x3F000000 since that would be a completely different number!
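A short Python sketch (ours, for illustration) shows both halves of this point: packing 0x3F as a little-endian double-word yields the bytes "3F 00 00 00", while 0x3F000000 is an entirely different value.

```python
import struct

stored = struct.pack("<I", 0x3F)     # like: dd 0x3F on an x86 assembler
print(stored.hex(" ").upper())       # 3F 00 00 00

print(0x3F)          # 63
print(0x3F000000)    # 1056964608
```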

Yet some Microsoft employees think it's correct to write something like this: "The sample value for the Relative Sectors field in the previous table, 0x3F000000, is a little endian representation of 0x0000003F. The decimal equivalent of this little endian number is 63. The sample value for Total Sectors is 0x41D31200, which represents 0x0012D341. Therefore, in decimal, there are 1,233,729 sectors in the volume."
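What the quoted passage is really describing is a sequence of bytes on disk, not a hex number. As a Python sketch (our own illustration, with the byte values taken from the quotation above), reading those bytes in little-endian order gives the intended values directly:

```python
# The field contents as they sit on disk, byte by byte.
relative_sectors = int.from_bytes(bytes.fromhex("3F 00 00 00"), "little")
total_sectors = int.from_bytes(bytes.fromhex("41 D3 12 00"), "little")

print(relative_sectors, hex(relative_sectors))  # 63 0x3f
print(total_sectors, hex(total_sectors))        # 1233729 0x12d341

# Written with a "0x" prefix, the same digits name a different number.
print(0x41D31200)                               # 1104351744
```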

To show just how ridiculous this is, I propose a similar "definition" to theirs, but dealing with money: "All currency values larger than $9 are stored on my PC in reverse-order. This format is a method of storing the least significant digit first in decimal dollar notation! So, the sample value for our Relative Taxes in the previous table, a full $1,056,964,608.00, is a little endian representation of $63.00. The hexadecimal equivalent of this little endian amount is 0x3F dollars. The sample value for a Microsoft employee's income is $26,280,704.00, which represents $233,729.00. Therefore, in hexadecimal, he earned 0x39101 dollars." That's really not much different than what I quoted from Microsoft above; saying that two different dollar amounts represent the same thing, or that two different hex numbers are the same, is an equally insane way to approach this topic.

Wouldn't it confuse you if various banks and lenders started using my "dollar notation" (which you can't even tell is any different than normal dollars by looking at the symbol) for their assets, revenues or savings with only a little footnote redefining what everyone normally takes for granted? Didn't something similar to that happen here in the USA when criminals "cooked the books"?! That's why it's best to use a standard correctly.


Created: February 25, 2009. (2009.02.25)
Updated: June 27, 2009. (2009.06.27)
Last Update: January 15, 2011. (2011.01.15)

You can write to us here: contact page.