The Hex Dump – Programming by Design

(Updated December 18, 2024)

Overview

Hex dumps have been around for a very long time. Looking at binary is mind-numbing, and looking at decimal numbers is less than ideal. Using hexadecimal to show values is much easier on the eyes. With some experience, you can see the binary tucked inside, as each hex digit represents four binary digits.

The question is often raised: do hex dumps matter these days? Excellent question! One need look no further than Wireshark.

In the image, you can see the packet trace in the top half. In the bottom half, the left side shows the protocol breakdown, while the right side shows the hex dump of a selected packet that Wireshark captured.

So, yes, while they may seem archaic, they are still quite handy today. The image below describes the three components of a typical hex dump: the addresses, data, and text representation of the data.

Often, seeing the hex values is not enough. We may not be readily aware that they may be specifically chosen, represent printable text, and can be helpful in debugging.

Sign Extension

Depending on the language used to create a hex dump, you may have to be prepared for some events affecting your data. Our code will contain the necessary details for looking at memory as a contiguous block of bytes. The code uses several features to maintain the continuity of the data representation. Since Java has no unsigned types, like C, when we move between byte and int types, there is a problem with sign extension. Ordinarily, we embrace sign extension. That’s what allows us to move -17 from byte to short to int without any work on the part of the programmer. As you can see below, we have the -17 representations, achieved by taking the sign bit and repeating it on the left.

                              1110 1111    8-bit

                    1111 1111 1110 1111   16-bit

1111 1111 1111 1111 1111 1111 1110 1111   32-bit

This is also true when working with positive numbers. Below are the +17 representations.

                              0001 0001    8-bit

                    0000 0000 0001 0001   16-bit

0000 0000 0000 0000 0000 0000 0001 0001   32-bit

The Hex Dump Code

First, we look at code demonstrating the sign extension problem.

SignExt.java

public class SignExt {

    public static void main(String[] args) {
        byte b = (byte)0xff; // 255 unsigned, but -1 signed.

        // se represents the sign-extended value.
        int se = b;
        // nse prohibits sign extension by masking the higher-order bits.
        int nse = b & 0xff;

        // first, we show the decimal representations
        System.out.println("b is " + b + "\nse is " + se + "\nnse is " + nse);
        // then we show how the hex values enlighten the issue.
        System.out.printf("b is %02X%nse is %02X%nnse is %02X", b, se, nse);
    }
}

The output of the code is shown below.

b is -1
se is -1
nse is 255
b is FF
se is FFFFFFFF
nse is FF

When the first three lines are shown, it’s difficult to see the issue because we see -1 and 255 and wonder what’s happened. However, the following three lines show what has happened; the number of bytes has grown, and the representation has exceeded 8 bits. Remember that byte is a signed value by default. What we look to do is preserve the bit pattern, which has nothing to do with the sign.

The mask used for nse shows how we correct the issue.

    1111 1111 1111 1111 1111 1111 1111 1111     (-1 in two's complement form)
  & 0000 0000 0000 0000 0000 0000 1111 1111     (bitwise and with 0xff)
-------------------------------------------
    0000 0000 0000 0000 0000 0000 1111 1111     (preserved 8-bit representation)

The same works for any number, like our -17

    1111 1111 1111 1111 1111 1111 1110 1111     (-17 in two's complement form)
  & 0000 0000 0000 0000 0000 0000 1111 1111     (bitwise and with 0xff)
-------------------------------------------
    0000 0000 0000 0000 0000 0000 1110 1111     (preserved 8-bit representation)

This makes it sound like 0xff is some magical constant. It’s not. The value represents the lowest 8 bits as ones and all others as zeros. Nothing more.

Finally, we take this knowledge and move forward with our hex dump.

The hex dump code takes an array of bytes and displays them in an orderly grouping of hexadecimal values. This is done by reading the entire file into a byte array. Since the file represents an executable for a Commodore 64 computer, we know that the first two bytes are the little-endian representation of the loading address in memory.

Important Note!

The address of the example program in hex is 5000. The Commodore 64 CPU is a MOS-6510, which is a little-endian machine. In little-endian form, the first two bytes in the file are 00 50. Thus, the low-order, or little end of the number is listed first. In big-endian, the big end is listed first, so it would be 50 00.

Once we know that, it’s simple to capture the starting address; the remaining bytes are the program itself. The comments in the code below detail how the formatted output is achieved.

HexDump.java

import java.io.FileInputStream;
import java.io.IOException;

public class HexDump {

    public static void hexDump(byte[] bytes) {
        // we and with 0xff to mask off the bits when sign-extended.
        int start = (bytes[1] & 0xff) * 256 + (bytes[0] & 0xff);
        System.out.println(start);
        int pc = start;
        StringBuilder chars = new StringBuilder();

        for (int x = 2; x < bytes.length; x++) {
            // make it pretty
            if ( (pc - start) % 8 == 0 ) {
                // chars are built during each iteration, but
                // printed when we reach the end of the line.
                System.out.printf(" %s", chars);
                System.out.printf("%n%04X:", pc);
                chars.setLength(0);
            }
            // build the chars - only printable chars
            char c = (char)(bytes[x] & 0xff);
            if ( c >= 32 && c <= 127 )
                chars.append(c);
            else
                chars.append('.');
            // the next line uses a ternary operator to achieve the above.
            //chars.append(( c >= 32 && c <= 127 ) ? c : '.')
            System.out.printf(" %02X", bytes[x]);
            pc++;
        }

        // fix final row
        int last = (pc - start) % 8;
        for (int x = 0; x < 8-last; x++)
            System.out.print("   ");
        System.out.printf(" %s", chars);
    }

    public static void main(String[] args) throws IOException {
        FileInputStream in = new FileInputStream("/uploads/mult.prg");

        byte[] file = in.readAllBytes();

        in.close();
        hexDump(file);
    }
}

The output of hexDump() is shown below.

20480

5000: AE 5A 50 A9 00 20 CD BD .ZP.. ..
5008: A9 0D 20 D2 FF AE 5B 50 .. ...[P
5010: A9 00 20 CD BD A9 0D 20 .. .... 
5018: D2 FF A9 00 8D 57 50 8D .....WP.
5020: 58 50 8D 59 50 A2 08 4E XP.YP..N
5028: 5A 50 90 13 AD 58 50 18 ZP...XP.
5030: 6D 5B 50 8D 58 50 AD 59 m[P.XP.Y
5038: 50 6D 57 50 8D 59 50 0E PmWP.YP.
5040: 5B 50 2E 57 50 CA D0 DF [P.WP...
5048: AE 58 50 AD 59 50 20 CD .XP.YP .
5050: BD A9 0D 20 D2 FF 60 00 ... ..`.
5058: 00 00 0C 0A             ....

C Version

Here is a variation of the hex dump code written in C (C23 compatible).

hexdump.c

#include <stdio.h>
#include <stdlib.h>

void hexDump(const unsigned char bytes[], long length) {
    // we and with 0xff to mask off the bits when sign-extended.
    int start = bytes[1] * 256 + bytes[0];
    printf("%d\n", start);
    int pc = start, c = 0, x;
    unsigned char chars[10] = {0};

    for (x = 2; x < length; x++) {
        // make it pretty
        if ( (pc - start) % 8 == 0 ) {
            // chars are built during each iteration, but
            // printed when we reach the end of the line.
            printf(" %s", chars);
            printf("\n%04X:", pc);
            c = 0;
        }
        unsigned char ch = bytes[x];
        // build the chars - only printable chars
        chars[c++] = ( ch >= 32 && ch <= 127 ) ? ch : '.';

        printf(" %02X", bytes[x]);
        pc++;
    }

    // fix final row
    int last = (pc - start) % 8;
    chars[c] = '\0';
    for (x = 0; x < 8-last; x++)
        printf("   ");
    printf(" %s\n", chars);
}

int main(void)  {
    FILE *fp;
    const char *fname = "/uploads/mult.prg";
    unsigned char *file;

    // open file
    if ( (fp = fopen(fname, "r")) == NULL ) {
        fprintf(stderr, "Cannot open %s.\n", fname);
        return(1);
    }

    // find the file size
    fseek(fp, 0, SEEK_END);
    long len = ftell(fp);

    // allocate memory for the file - no error checking
    if ( (file = malloc(len + 1)) == NULL ) {
        fprintf(stderr, "Cannot allocate memory for file.\n");
        return(2);
    }

    // Read the file into the char array
    rewind(fp);
    fread(file, len, 1, fp);
    fclose(fp);

    hexDump(file, len);
    free(file);
}

This differs significantly from the Java version. Much of C involves primitive actions. For example, in Java, we could read the entire file with a call to readAllBytes(), and it even created the array. In C, we have to use fseek() to find the end, then ftell() to know where we are, and finally rewind() to get back to the beginning.

But it doesn't stop there since we also have to allocate memory with malloc(), fread() the data and remember to free() the memory after we're done calling hexDump().

One bright spot is using unsigned char. There's no need to mask things since the unsigned modifier stops the sign extension from happening.

Python Version

We present a Python version for anyone who wishes yet another version. This is free of warnings up to Python 3.12.x. There may be more clever ways to attack this problem, but the goal of this version was to try to align with the other versions so newcomers can compare them.

hexdump.py

def hexdump(byte_data):
    start = byte_data[1] * 256 + byte_data[0]
    pc = start
    print(start)
    chars = ""
    for x in range(2, len(byte_data)):
        # make it pretty
        if (pc - start) % 8 == 0:
            # chars are built during each iteration, but
            # printed when we reach the end of the line
            print(" %s" % chars)
            print("%04X" % pc, end='')
            chars = ""

        # build the chars = only printable chars
        c = byte_data[x]
        chars += chr(c) if 32 <= c <= 127 else '.'
        print(" %02X" % byte_data[x], end='')
        pc += 1  # no ++ op in python!
    # fix final row
    for i in range(0, 8 - (pc - start) % 8):
        print("   ", end='')
    print(" %s" % chars)


if __name__ == "__main__":
    # main function
    file_name = "/uploads/mult.prg"
    try:
        # with closes file when completed.
        with open(file_name, 'rb') as file:
            file_data = file.read()
    except FileNotFoundError:
        print(f"File not found: {file_name}.")
        exit(1)

    hexdump(file_data)

Note that this version only uses a ternary operator to build the string. What is incredible is using printf-style formatting in all versions.