55AAH is the Wrong Way to represent the Byte Sequence 55h followed by AAh on a PC


Copyright © 2009, 2011 by Daniel B. Sedory
NOT to be reproduced in any form without Permission of the Author!


Those who use the hex Word, 55AAH, to represent the hex byte sequence "55 AA" on a hard disk, floppy or in the memory of an IBM PC-compatible are either ignorant of the little-endian[1] nature of the entire x86 family of Intel® processors, or do not care enough about the truth to correct their errors[2]. Some of the correct ways to represent the hex byte sequence "55 AA" on a PC (as a hexadecimal Word) would be: 0xAA55, 0AA55H or AA55h. But employing a phrase like "55h followed by AAh" (or simply putting a space between them: "55h AAh") is probably the best way to describe this sequence for those who will never use an x86 assembler.



Anyone with a PC can prove this!

1. Using DEBUG[3] under MS-DOS or Windows™

A. From any IBM® or Microsoft® DOS prompt, you would simply type, debug, to enter this program.

B. Under a Windows™ OS, such as Windows™ XP, you can either open a CMD window first, then enter debug, or if it's easier for you: Click on the start button, then click on "Run...", type debug in the box and click on OK. This will pop up a virtual DEBUG window.

 

Note: DEBUG always uses hexadecimal for its bytes, words, etc., so there's never a need to add an H character after its numbers; in fact, doing so will cause an error.

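Press the ENTER key after typing each command that follows a DEBUG prompt below (either the "-" prompt or an address such as 0B1E:0100; the segment value, 0B1E here, will likely differ on your PC). Everything else is DEBUG's own output, our notes are shown in parentheses, and explanations are provided further below: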

 
-f 100 fff 0
-a
0B1E:0100 dw 1234
0B1E:0102 dw 55aa
0B1E:0104   (Just press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 00 00 00 00-00 00 00 00 00 00 00 00   4..U............
-a
0B1E:0104 dw 3412
0B1E:0106 dw aa55
0B1E:0108   (Press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 12 34 55 AA-00 00 00 00 00 00 00 00   4..U.4U.........
-a
0B1E:0108 db 55
0B1E:0109 db aa
0B1E:010A   (Press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 12 34 55 AA-55 AA 00 00 00 00 00 00   4..U.4U.U.......
-a
0B1E:010A db 12 34 55 aa
0B1E:010E   (Press ENTER key.)
-d 100 10f
0B1E:0100  34 12 AA 55 12 34 55 AA-55 AA 12 34 55 AA 00 00   4..U.4U.U..4U...
-
-q  (Q as in Quit - closes the DEBUG window.)
 
Figure 1.

EXPLANATION:

The command: f 100 fff 0 places a zero-byte (00h) into every memory location at offsets 100h through FFFh. This will help you see any changes we make in memory a bit further down on the screen by cleaning up any byte values left in this memory area by other programs.

a starts DEBUG's built-in assembler and dw 1234 is an assembly instruction (where dw stands for "data word") to place the hex Word, 1234h, at memory locations 100h and 101h; since DEBUG started the assembly process at 100h. Similarly, dw 55aa now directs DEBUG to place the hex word 55AAh at the memory locations which follow. Pressing ENTER on a blank line stops the assembly process.

The command: d 100 10f displays (dumps) a copy of all the bytes in memory from 100h through 10Fh on the screen. Look at the sequence of the bytes: "34 12 AA 55" and ask yourself, "Why are they in reverse order from the way I entered them?" The answer is: A Word of two or more bytes is always stored with its least-significant byte first (that is, in reverse order from the way we write it) by a little-endian CPU; and all of Intel's x86 processors are little-endian in nature!

Therefore, if you want to see the byte sequence "12 34" or the one in question, "55 AA", what hexadecimal Words would you have to ENTER for these byte sequences? Well, after using: dw 3412 and dw aa55, and proceeding to dump the same area of memory (100h through 10Fh) on the screen, you can clearly see for yourself that the byte sequence "55 AA" is produced when we use the hexadecimal Word, aa55; commonly written as 0xAA55 or 0AA55h.

However, there's another way to produce this byte sequence in assembly. Rather than using a hex Word, you can use individual bytes instead: So we start DEBUG's assembler once again (a), then enter the lines:
db 55
db aa
(db stands for "data byte") and dump memory locations 100h through 10Fh on screen. As you can see, these two instructions create the byte sequence "55 AA" without any of the confusion sometimes caused when using hexadecimal Words.

Lastly, you could also create the hex byte values of "12 34 55 AA" in that order, using a single x86 assembly instruction:
db 12h, 34h, 55h, 0AAh
And again, if you want to do this in DEBUG, drop the "h" symbol, and simply ENTER:
db 12, 34, 55, AA
(or: db 12,34,55,aa,
 or: db 12 34 55 aa; as we did in the DEBUG window above).

ENTER "q" (for Quit) to close the DEBUG window.
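
For readers without access to DEBUG, the same experiment can be sketched in Python. This is only an illustration and not part of the DEBUG session above: it assumes Python 3.8 or later and uses the standard struct module, where the format "<H" means a little-endian, 16-bit unsigned Word (the way an x86 PC stores a dw). The helper name dw is our own.

```python
import struct

def dw(value):
    """Return the two bytes a little-endian PC stores for a 16-bit Word."""
    return struct.pack("<H", value)

# The same four Words entered with DEBUG's "dw" in Figure 1.
for word in (0x1234, 0x55AA, 0x3412, 0xAA55):
    print(f"dw {word:04X} -> {dw(word).hex(' ').upper()}")
```

Just as in the DEBUG dump, only the Word AA55 yields the byte sequence "55 AA".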

2. Under a Linux OS

Most other PC operating systems, including Linux, do not have a debugger as easy to use as MS-DEBUG, which is an 'interactive' assembler and debugger (with a memory dump function) all in the same program. However, it's still possible to prove the correct way to refer to the hex byte sequence "55 AA" by creating a small binary file in Linux with NASM, as (the GNU Assembler) or some other assembler, using assembly instructions equivalent to those above, and then dumping its contents with the command hexdump -Cv <filename> to view the actual sequence of the hex bytes stored in the file; and therefore, in memory (or on a hard disk). One could also use a GUI hex editor, such as Gnome's GHex, so 'hexdump' isn't mandatory. Here are some steps for doing so in Ubuntu:

1. Open a "Terminal" (also called 'Command Line') window; on our Ubuntu 10.04 LTS install, under the "Applications" menu, choose "Accessories" and then Terminal. The following should work equally well at any Linux prompt, even on a command-line only install (provided it has an assembler).
2. Enter at the prompt: whereis nasm hexdump; and if both are already installed, you will see something similar to:

   nasm: /usr/bin/nasm /usr/share/man/man1/nasm.1.gz
   hexdump: /usr/bin/hexdump /usr/share/man/man1/hexdump.1.gz
3. If one or both are missing, then you'll only see "nasm:" and/or "hexdump:" and will need to install the missing program(s).
4. Unlike MS-DEBUG, where all digits in a data declaration (db or dw) are always hexadecimal, NASM interprets plain digits as being decimal. So in NASM, we must either prepend "0x" or append an "h" to indicate a hexadecimal data byte (db; 8 bits), word (dw; 16 bits) or double word (dd; 32 bits). But Note: When using the "h" suffix, any hex number that begins with A through F must have a zero (0) prepended to it (0aa55h, not aa55h); otherwise the assembler will take it for a name rather than a number.
5. If you're running a Gnome Linux install, you can use gedit to create the following plain text file and save it as "test.asm"; otherwise use nano, vi or whatever other text editor you are familiar with:
   db      255     ; NASM converts 255 to the hex byte FF, proving
                   ;      plain digits are taken as decimal values.
   dd      0       ; A buffer of 4 zero bytes.

   dw      55aah   ; This is how some books and articles refer to the
                   ;   MBR's Signature ID. But look at the byte order
                   ;   this creates in the hexdump below of 'test'.
                   ;   The assembler will reverse it into: AA 55.
   dd      0       ; 4 zero bytes.

   dw      1234h   ; As noted above, this word will appear as: 34 12
                   ; when you dump the binary file 'test'.
   dd      0       ; 4 zero bytes.

   db      55h,0aah  ; You will see the hex byte 55 followed by AA (in
                     ;    that order). A-F requires a leading zero(0).
   dd      0         ; 4 zero bytes.

   dw      43605   ; Decimal 43605 = 0aa55 hex. But a PC stores this
                   ;  number as 55 aa as seen in the hex dump below.
   dd      0       ; 4 zero bytes.

   dw      0aa55h  ; The correct way to write the MBR Signature ID
                   ;   when followed by an 'h'.
   dd      0       ; 4 zero bytes.

   dw      0xaa55  ; Another correct way to write these bytes.
And remember to save this as "test.asm". We've added a number of 4-byte buffer/padded areas of zeros between the important bytes; making it easier for you to see them in the hex dump. Anything written after a semi-colon (;) is ignored by the assembler; these are comments.
6. Now enter the command: nasm test.asm to assemble it; creating the binary file "test".
7. Lastly, either open the binary file, test, in a hex editor, or enter the command: hexdump -Cv test and you should see the following:

 

00000000  ff 00 00 00 00 aa 55 00  00 00 00 34 12 00 00 00  |......U....4....|
00000010  00 55 aa 00 00 00 00 55  aa 00 00 00 00 55 aa 00  |.U.....U.....U..|
00000020  00 00 00 55 aa                                    |...U.|
 

(If you have any problems with the steps above, please contact us.)
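
If no assembler is at hand, the layout of "test" can also be cross-checked with a short Python sketch. This is our own hypothetical equivalent of test.asm, not output from NASM; it assumes Python 3.8 or later and the standard struct module, with "<" selecting little-endian byte order and B, H and I standing in for db, dw and dd.

```python
import struct

# Hypothetical Python equivalent of test.asm: "<" selects little-endian
# byte order; B = db (1 byte), H = dw (2 bytes), I = dd (4 bytes).
pad = struct.pack("<I", 0)                 # dd 0
image = b"".join([
    struct.pack("<B", 255), pad,           # db 255
    struct.pack("<H", 0x55AA), pad,        # dw 55aah
    struct.pack("<H", 0x1234), pad,        # dw 1234h
    bytes([0x55, 0xAA]), pad,              # db 55h,0aah
    struct.pack("<H", 43605), pad,         # dw 43605
    struct.pack("<H", 0xAA55), pad,        # dw 0aa55h
    struct.pack("<H", 0xAA55),             # dw 0xaa55
])

# Print 16 bytes per row, similar to the hexdump listing.
for offset in range(0, len(image), 16):
    print(f"{offset:08x}  {image[offset:offset + 16].hex(' ')}")
```

The 37 bytes it prints should match the hex columns of the dump above, with "aa 55" appearing only where the Word 55aah was used.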

 


Why is this error so prevalent?

Early Documentation used incorrect Hex Word for Signature

When the first IBM® Personal Computer™ became available in 1981, it had no hard disk and no concept of a boot record signature in its operating system. It wasn't until the introduction of IBM® Personal Computer™ DOS 2.00 in 1983 that our identifier "55 AA" appeared in boot sectors[4] on floppy diskettes. We are still seeking the earliest reference within any IBM® or Microsoft® documents to this Signature ID[5].

We have found the incorrect hex word (55AAH) in the "First Edition (April 1987)" of IBM's "Technical Reference (Programming Family)" for the "Disk Operating System Version 3.30" only in "Chapter 9. Fixed Disk Information", on the following pages (quoting the sentences that include them for proper context):

"Signature: The last 2 bytes of the boot record (55AAH) are used as a signature to identify a valid boot record. Both this record and the partition boot records are required to contain the signature at offset 1FEH." (p. 9-9),
"Each extended volume contains an extended boot record in the first sector of the disk location assigned to it. This boot record contains the 55AAH signature id byte." (p. 9-12) and
"The last two bytes of the extended boot record (55AAH) are used as a signature to identify a valid boot record. Both this record and the logical drive boot records are required to contain the signature at offset 1FEH." (p. 9-14).

    Although it's only a guess, we suspect this erroneous hex Word (55AAH) for the Signature ID became part of the IBM documentation simply because an employee assigned the task (and/or a manager who may have provided the final "corrections") was too unfamiliar with the little-endian nature of the Intel processor used in the PC to realize they were making a mistake. (Remember: Most, if not all, IBM computers up to that point had a big-endian architecture.) Sadly, this error was perpetuated within IBM's documentation for so long that many today believe this must be the correct way to refer to the Signature ID bytes on a PC. However, if someone takes a class in x86 assembly, using MASM[6], NASM[7], TASM[8] or any other x86 assembler, they will quickly learn the truth.

Journalists and computer book authors must often rely upon the documentation from a manufacturer, so we're sure this is the main reason so many writers' comments about the signature ID contain this error; it is yet another reason for their reluctance to correct it. Unfortunately, Microsoft® made things worse by apparently deciding to come up with an excuse to actually consider this error as acceptable usage in their documentation! We have written a great deal more about their very confusing notes[9] below.

Erroneous Examples

If you search the Net for the terms "Sammes Jenkinson 55aah," you should find a link to Google Books for page 170 of the second edition (page 142; 1st edition) of Forensic Computing, A Practitioner's Guide, by Tony Sammes and Brian Jenkinson. Although the comments there and in Table 5.25 (page 143 of 1st ed.) relate to loading additional BIOS code, such as that for a video adapter card or some 'Expansion ROM', these 'Add-Ins' still use the same 2-byte identifier found in the MBR sector, except it's located in the first two bytes of its code rather than at the end; as you can see in this 64-byte dump of location 0C0000h and following from a PC's Memory:

Offset   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0C0000  55 AA 5A E9 CB AA 30 30-30 30 30 30 30 30 30 30   U.Z...0000000000
0C0010  30 30 C4 17 E9 DD 16 BD-40 00 B0 0A 30 30 49 42   00......@...00IB
0C0020  4D 20 56 47 41 20 43 6F-6D 70 61 74 69 62 6C 65   M VGA Compatible
0C0030  20 42 49 4F 53 2E 20 03-5B 00 6B 00 79 00 8B C0    BIOS. .[.k.y...
Figure 2.

Twice on page 170 and again on 171 (in footnote 66), this book makes reference to these bytes using the incorrect hex Word, 55AAh. [These same errors began in the book's first edition (pages 142-143).]

On page 174, at the end of the 3rd paragraph, we find the clear comment: "the final two bytes of the partition table are always of value 55h and aah, as shown." Unfortunately, this second edition added a footnote (# 71) just before this, which reads: "Many systems will refuse to boot if these two bytes are not set to 55aah," adding yet another incorrect occurrence. The publisher of this book (Springer) is a prestigious company, and its authors are quite knowledgeable; they discussed the topic of little-endianness at length in chapter 2 and elsewhere, including sections of chapter 5. So why did they use an incorrect hexadecimal number when describing the two-byte identifier of 55h and AAh on a PC? We have been unable to contact the authors, so we don't know for sure, but they probably did so simply because so many others have.


MORE EXAMPLES TO FOLLOW AT A LATER DATE.


Does using the wrong number even thousands of times make it right?

No!
AA55H is hexadecimal for 43,605 and 55AAH will always equal 21,930; whether you simply write these numbers on a piece of paper, or store them on either a little-endian PC or a big-endian computer, they are not the same number. A number is a number, period; whether it's binary, decimal or hex. And changing the type of computer (from one endian-type to another) that a number is stored on does not change one number into another, or our digital society would be in chaos! It merely changes how the bytes representing a number must be ordered on the storage media of the different computer types. When we see the characters "0x" followed by a string of digits, we must logically conclude they represent one, and only one, hexadecimal number; a number which we could also convert into equivalent decimal, octal or binary representations of the same number whenever we need to! If someone wishes to use different symbols to define some "little-endian hex representation on a PC" that's their choice. But prepending "0x" or appending an "h" or "H" to one or more hexadecimal digits is already a well-defined standard.
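
The point that a number stays the same number, while only its stored bytes are ordered differently, can be sketched in a few lines of Python. This is our own illustration, assuming Python 3.8 or later; the variable names are not from any source quoted here.

```python
# Two different numbers, whatever machine they are written down on.
assert 0xAA55 == 43605
assert 0x55AA == 21930

n = 0xAA55
little = n.to_bytes(2, "little")   # byte order used by an x86 PC
big = n.to_bytes(2, "big")         # byte order used by a big-endian machine

print(little.hex(" "), "|", big.hex(" "))

# Reading the bytes back with the matching byte order restores the number;
# the value itself never changed.
assert int.from_bytes(little, "little") == n
assert int.from_bytes(big, "big") == n
```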

A signature though, of two or more bytes in length, must never be separated from the concept of byte order. In other words, once we define some string of bytes as a signature, we should maintain the same order of its bytes whether it's being stored on a big-endian or little-endian computer; yet this also means the same signature would need to be represented by different hexadecimal numbers on these two different types of computers, since each one stores multi-byte numbers differently. However, because programmers and software engineers already know this, signatures in common file types are handled as character strings in PC code. Characters, as opposed to numbers, always remain in the same order in which they are displayed or given in the code, whether found on big- or little-endian storage media! For example, whether you enter the following into DEBUG (where you would write aa rather than 0xaa) or assemble it under NASM or some other assembler, the byte order will never change:

   db    'This is a character string.'
   db    'NTFS'   ; These 4 chars. will always remain in this order.
   db    'U',0xaa ; Another way to enter the MBR Signature ID,
                  ;    since 55h is the ASCII code for 'U'.
                  ;    Unfortunately, there is no standard
                  ;    ASCII character for 0xaa.

And remember, as we've seen above, an Option ROM or Boot Record (or any other kind of) signature is not required to be expressed as a number. Technically, it's just an indicator or identification, and could just as well be handled as two separate data bytes; in this case, a 55h followed by AAh (or: db 55h, 0AAh in x86 assembly code). The main reason these two bytes show up more often in the literature as a single Word (0xAA55 or 0AA55h), is simply because it's easier, and more logical, to compare two hex Words in a single line of code rather than using two separate lines of code for each byte. We'll examine some actual assembly code below to clarify this.

Therefore, we'd advise any authors who need to describe a signature (apart from books on assembly language), to list each byte as it's found on some media or in memory, rather than attempting to use a hexadecimal number comprised of more than a single byte.
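
As a rough sketch of why listing the bytes is the safer description, the following Python lines (our own, assuming only the standard struct module) keep the signature as the byte string "55 AA" and then ask which 16-bit Word each type of machine would read from those same two bytes. The names SIGNATURE, word_on_little_endian and word_on_big_endian are ours.

```python
import struct

SIGNATURE = bytes([0x55, 0xAA])    # the bytes as found on disk: 55 AA

# Character strings and byte strings have no byte order to reverse.
assert "NTFS".encode("ascii") == bytes([0x4E, 0x54, 0x46, 0x53])
assert SIGNATURE[:1] == b"U"       # 55h is the ASCII code for 'U'

# The 16-bit Word that corresponds to those bytes depends on the machine.
word_on_little_endian = struct.unpack("<H", SIGNATURE)[0]
word_on_big_endian = struct.unpack(">H", SIGNATURE)[0]
print(f"{word_on_little_endian:#06x} {word_on_big_endian:#06x}")
```

The byte string never changes; only the number used to describe it does, which is why 0xAA55 is the right Word on a PC.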

Assembly programmers do not make this mistake

Students who pass a class in x86 assembly (or even you, if you studied the proofs above) know the truth, and so did the early programmers at Microsoft® who used this "Signature ID" in their code. For example, here's a bit of commented MASM source code which checks if a valid boot record signature ID exists; followed by two lines of code that define where and what it should be:

; The partition table is located at offset 1BEh in the sector.
; The signature is located at offset 1FEh (= 55h, AAh or word AA55h).

TestBootSignature:
	cmp	WORD PTR [BX + 510],0aa55h	; Check for 55 AA sequence

#define BOOT_SIG_OFFSET    510
#define BOOT_SIG           0xaa55

The cmp instruction above, compares the WORD pointed to (PTR) at offsets 510 and 511 decimal to the hex word 0aa55h; which could have also been written as, cmp WORD PTR [BX + 510],BOOT_SIG, if BOOT_SIG had been previously defined as 0xAA55. Here's another snippet of code that checks for an Option ROM:

mov ax,0C000h   ; look for 2nd graphics card installed
mov ds,ax
cmp word ptr DS:[0],0AA55h ; DS:0 -- check for option ROM

You can clearly see we need to write the hex Word as "0AA55h" to check for the byte sequence "55 AA" of either an MBR's signature, or the beginning of a video card's BIOS (as we saw in Fig. 2 above). And one last example:

;---- Check for ROM signature ----
cmp   es:[di],0xAA55      ; Is the ROM signature present?
jne   NotOptionRom        ; If not, jump out.

If any early DOS programmers had used the erroneous 55AAh in their code, the operating system would not have functioned correctly! People who merely write about such things, rather than those who assemble working code, are the ones more likely to make mistakes that may never be corrected.
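
The comparison made by those cmp instructions can be mimicked in Python. This is a minimal sketch of our own, not code from DOS or any BIOS: the function name has_boot_signature and the two sample sectors are made up, and "<H" tells the standard struct module to read a little-endian Word, as an x86 CPU would.

```python
import struct

BOOT_SIG_OFFSET = 510
BOOT_SIG = 0xAA55

def has_boot_signature(sector):
    """Mimic: cmp WORD PTR [BX + 510],0aa55h on a 512-byte sector."""
    if len(sector) < BOOT_SIG_OFFSET + 2:
        return False
    (word,) = struct.unpack_from("<H", sector, BOOT_SIG_OFFSET)
    return word == BOOT_SIG

# A made-up, otherwise empty sector ending in the bytes 55 AA.
good = bytes(510) + bytes([0x55, 0xAA])
# The same sector with the bytes in the order the Word 55AAh would produce.
bad = bytes(510) + bytes([0xAA, 0x55])

print(has_boot_signature(good), has_boot_signature(bad))
```

Changing BOOT_SIG to 0x55AA would make the check accept only the second sector, whose last two bytes are AA 55, and reject every valid boot record.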

 


Footnotes

1. The original IBM® Personal Computer ("PC") or any IBM PC-compatible (whether its CPU is an Intel®, AMD® or some other manufacturer's processor), has what's known as a little-endian architecture; as opposed, for example, to the big-endian architecture of the Motorola® processors in a Macintosh or PowerPC system. (Note: Many Apple™ OSX computers today are being sold with Intel® processors having little-endian architecture.) This Little-endian architecture refers to the order of the bytes found in memory (or on a storage medium) for hexadecimal numbers composed of more than a single byte; where the least-significant byte will occur at a lower (or preceding) location than its most-significant byte (or any more significant bytes in-between). Thus, the Hex number, 38DA75C6h, as seen in a PC's memory would occur as: C6 75 DA 38 (each byte, but not the bits in a byte, when moving from a lower to higher location in memory, would be found in reverse order); whereas, a Big-endian system would have: 38 DA 75 C6. The creators of the first 16-bit Intel® processor used this little-endian order so the CPU could start working on arithmetic problems as soon as the first byte of a large number was accessed; basically the same as we do when adding the least-significant digits of two large numbers first, carrying any remainder over to the next column on the left, and finally adding together the most-significant digits, on the far left, last.
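
The 38DA75C6h example can be reproduced with Python's built-in int.to_bytes. This is only a sketch, assuming Python 3.8 or later, and the variable names are ours.

```python
value = 0x38DA75C6

as_stored_on_pc = value.to_bytes(4, "little")
as_stored_big_endian = value.to_bytes(4, "big")

print(as_stored_on_pc.hex(" ").upper())       # bytes in a PC's memory
print(as_stored_big_endian.hex(" ").upper())  # bytes on a big-endian system

# Only the byte order differs; the bits inside each byte are untouched.
assert as_stored_on_pc == as_stored_big_endian[::-1]
```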

2. In all fairness, some authors may not become aware of this truth until after publishing a book. Thus a sizeable cost would be involved in correcting this error; and the publisher might not agree to changing it even in a second edition. But they should at least be honest enough to include it in an errata note somewhere on the Internet. As for those who never actually produce printed books, but only create web pages or some other digital form of their works, hopefully they will be able to find enough time to correct this error if it appears in their work.

Some authors, however, may make excuses and "shift the blame" onto some large company, stating they won't make any changes unless that "well-known company" changes their documentation first! Obviously, they are neither real teachers nor programmers!

3. If you like what you see above, we have a complete Guide to DEBUG here.

Note: If you use a 64-bit version of the Windows™ 7 (or later) OS, you will not find DEBUG at your Command prompt; it was removed from all 64-bit versions (it was in the first 32-bit version of Win 7). However, there have been a number of DOS emulators and even free 'Virtual PC' programs which would allow you to install DOS or even an earlier version of Windows; such as XP, inside your current 64-bit OS, in order to prove this!

4. We have much more information about various MBR and Volume boot sectors here.

The following shows how we described the Boot Record Signature on our pages about Partition Tables:

Table 1. This should remove any confusion over what constitutes a valid Boot Record signature; sometimes called its Magic number, and often expressed as the 16-bit hexadecimal Word, 0xAA55 (or: AA55h) for the little-endian[1] PC.

Boot Record Signature

Offset (within sector)     Byte Values
Decimal      in Hex        (Hexadecimal)
  510         1FE               55
  511         1FF               AA

5. Please contact us here if you have any documentation you'd like to share with us.

6. Microsoft® Macro Assembler (MASM).

7. The Netwide Assembler (NASM) was originally written by Simon Tatham (with assistance from Julian Hall), but is now maintained by a team led by H. Peter Anvin. It's available as free software; it was formerly released under the GNU Lesser General Public License and, since version 2.07, under the simplified (2-clause) BSD license. See http://www.nasm.us/ for more information.

8. Turbo™ Assembler (TASM) by Borland; no longer maintained. Here's a FAQ about TASM.

9. Please read our note on Microsoft's very confusing way of referring to hex numbers on their web sites and in some of their documentation by using a nonstandard definition for little-endian Hex numbers! Although we were happy to see at least one author at microsoft.com who'd dropped the incorrect 0x usage and simply listed the hex bytes in order with spaces between them (as we've done on our own boot record pages), many others continue to reference the same illogical note under their tables which use erroneous hex numbers.

So, we're presenting another analogy here to illustrate how wrong that concept is. The note is often worded as follows: "Numbers larger than one byte are stored in little endian format or reverse-byte ordering. Little endian format is a method of storing a number so that the least significant byte appears first in the hexadecimal number notation." (Searching microsoft.com or any of its related sites for the phrase "Troubleshooting Disks and File Systems" should turn up some links to pages with this note).

First, we agree with everything in the note up to its closing words about "hexadecimal number notation"; that's where they err. The little-endianness of a PC has absolutely nothing to do with "hexadecimal number notation" as this note claims! If you want to enter the hex number 0x3F (63) as a double-word (i.e., as 4 bytes; quad-words are 8 bytes), you could simply write "dd 0x3F" and an assembler would know it needed to reserve 4 bytes for this data (it would store this as: "3F 00 00 00"). But you would never enter 0x3F000000 since that would be a completely different number!

Yet some Microsoft employees think it's correct to write something like this: "The sample value for the Relative Sectors field in the previous table, 0x3F000000, is a little endian representation of 0x0000003F. The decimal equivalent of this little endian number is 63. The sample value for Total Sectors is 0x41D31200, which represents 0x0012D341. Therefore, in decimal, there are 1,233,729 sectors in the volume."

To show just how ridiculous this is, I propose a similar "definition" to theirs, but dealing with money: "All currency values larger than $9 are stored on my PC in reverse-order. This format is a method of storing the least significant digit first in decimal dollar notation! So, the sample value for our Relative Taxes in the previous table, a full $1,056,964,608.00, is a little endian representation of $63.00. The hexadecimal equivalent of this little endian amount is 0x3F dollars. The sample value for a Microsoft employee's income is $26,280,704.00, which represents $233,729.00. Therefore, in hexadecimal, he earned 0x39101 dollars." That's really not much different from what I quoted from Microsoft above; saying that two different dollar amounts represent the same thing, or that two different hex numbers are the same, is an equally insane way to approach this topic.

Wouldn't it confuse you if various banks and lenders started using my "dollar notation" (which you can't even tell is any different than normal dollars by looking at the symbol) for their assets, revenues or savings with only a little footnote redefining what everyone normally takes for granted? Didn't something similar to that happen here in the USA when criminals "cooked the books"?! That's why it's best to use a standard correctly.
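
To make the distinction concrete, here is a short Python sketch of our own (standard struct module only). It assumes the two sample fields in the Microsoft quote above sit on disk as the byte sequences "3F 00 00 00" and "41 D3 12 00", and reads each as a little-endian double-word ("<I"), the way an x86 PC would.

```python
import struct

# Assumed on-disk bytes for the two fields in the Microsoft quote.
relative_sectors_bytes = bytes.fromhex("3F 00 00 00")
total_sectors_bytes = bytes.fromhex("41 D3 12 00")

relative_sectors = struct.unpack("<I", relative_sectors_bytes)[0]
total_sectors = struct.unpack("<I", total_sectors_bytes)[0]

print(f"{relative_sectors:#x} = {relative_sectors}")
print(f"{total_sectors:#x} = {total_sectors:,}")

# "dd 0x3F" is what an assembler turns into the bytes 3F 00 00 00.
assert struct.pack("<I", 0x3F) == relative_sectors_bytes
# 0x3F000000 is simply a different (much larger) number.
assert 0x3F000000 == 1056964608
```

The bytes on disk are "3F 00 00 00", but the number they hold is 0x3F; nothing needs to be called 0x3F000000.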


Created: February 25, 2009. (2009.02.25)
Updated: June 27, 2009 (2009.06.27), January 15, 2011 (2011.01.15).
Last Update: December 7, 2014 (2014.12.07)
