(Updated December 16, 2024)
Let’s take a 10,000-foot tour.
Our IDE has a form of compiler called an assembler. It gets that name because it assembles the mnemonics, addresses, names, and such for a given CPU and translates them into machine code. A compiler, by contrast, translates a higher-level language (C, Java, etc.), which is highly abstracted, into another language. That target language might be machine code, or it might be bytecode or some other intermediate form.
The assembler, like a compiler, translates source code from human-readable form (mnemonics, names, addresses, literals) into machine code. The machine code is a series of bytes in a specific order that represents the steps necessary to perform the programmed task.
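Here is a minimal Java sketch that makes "a series of bytes in a specific order" concrete. It shows the last step an assembler performs for each line: look up the opcode for a mnemonic and addressing mode, then append the operand bytes. Operands go low byte first, because the 6502 is little-endian. The class `MiniEncoder`, its `Mode` enum, and the tiny opcode table are illustrative assumptions that cover only instructions used in this chapter. The opcode values themselves are the standard 6502 encodings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch: a hand-picked subset of 6502 opcodes. A real assembler
// covers every instruction and addressing mode and also resolves labels.
class MiniEncoder {
    enum Mode { IMPLIED, IMMEDIATE, ABSOLUTE, ABSOLUTE_X, RELATIVE }

    // Keyed by "MNEMONIC MODE"; values are the standard 6502 opcode bytes.
    private static final Map<String, Integer> OPCODES = Map.of(
            "LDX IMMEDIATE", 0xA2,
            "LDA IMMEDIATE", 0xA9,
            "LDA ABSOLUTE_X", 0xBD,
            "JSR ABSOLUTE", 0x20,
            "INX IMPLIED", 0xE8,
            "BEQ RELATIVE", 0xF0,
            "BNE RELATIVE", 0xD0,
            "BRK IMPLIED", 0x00,
            "RTS IMPLIED", 0x60);

    // Returns the bytes for one instruction. For RELATIVE, 'operand' must already
    // be the signed branch offset; for IMPLIED it is ignored.
    static List<Integer> encode(String mnemonic, Mode mode, int operand) {
        Integer opcode = OPCODES.get(mnemonic.toUpperCase(Locale.ROOT) + " " + mode.name());
        if (opcode == null) {
            throw new IllegalArgumentException("not in this sketch: " + mnemonic + " " + mode);
        }
        List<Integer> bytes = new ArrayList<>();
        bytes.add(opcode);
        switch (mode) {
            case IMMEDIATE:
            case RELATIVE:
                bytes.add(operand & 0xFF);          // one operand byte
                break;
            case ABSOLUTE:
            case ABSOLUTE_X:
                bytes.add(operand & 0xFF);          // low byte first (little-endian)
                bytes.add((operand >> 8) & 0xFF);   // then the high byte
                break;
            default:                                // IMPLIED: opcode only
                break;
        }
        return bytes;
    }

    // Formats bytes as hex, e.g. "20 D2 FF".
    static String hex(List<Integer> bytes) {
        StringBuilder sb = new StringBuilder();
        for (int b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode("ldx", Mode.IMMEDIATE, 0)));      // A2 00
        System.out.println(hex(encode("jsr", Mode.ABSOLUTE, 0xFFD2)));  // 20 D2 FF
        System.out.println(hex(encode("inx", Mode.IMPLIED, 0)));        // E8
    }
}
```

Notice that `jsr $ffd2` becomes `20 D2 FF`: the address bytes appear reversed in memory.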
Our assembler is primitive. It does not have many bells and whistles, but it will suffice for the work we must do to understand how the 6502 assembly language works.
We are writing our programs in a simple editor and then assembling them.
One well-known build system with a very robust 6502 assembler is CC65.
Let’s look at a simple program and identify the components necessary to write effective programs.
define CHROUT $ffd2
        ldx #0          ; set index to zero
print:
        lda words,x     ; load a letter from words
        beq done        ; if it's the zero, we're done
        jsr CHROUT      ; print the character
        inx             ; increment the index
        bne print       ; keep printing
done:
        brk             ; end!
; data below this line
words:
        txt "The quick brown\nfox jumps over\n"
        txt "the lazy dog.\n"
        dcb 0
So, let’s break down what this program represents. In addition to the actual 6502 instructions, it uses some assembler-specific directives, also called pseudo-operations (pseudo-ops). Here is what each part does:
- The define directive lets us use the name CHROUT instead of having to remember and type $ffd2.
- We define the labels print, done, and words. Labels are useful ways to mark locations in the program without identifying specific memory locations.
- Two directives are used to allocate memory for storage. The txt directive indicates the quoted string should be stored as a series of bytes (without the quotes), while dcb (define constant byte) allows us to define a value to be placed in memory at the point of the dcb in the program.
The Assembler
In the previous section, we quickly looked at a typical 6502 assembly language program. We are using the assembler to create very primitive programs. This means multiple things:
- We are writing code at a very low level. Writing programs will initially feel like a struggle because we need to be meticulous with our code.
- Programs will be textually longer than in higher-level languages. This is, of course, why the higher-level languages were designed.
- In addition to knowing the ins and outs of the CPU language, you need to understand how the assembler works, including the directives (pseudo-ops) it adds to describe how the program should be assembled.
By way of example, here are two programs to print characters in a string:
[Side-by-side listings: Print a string (Java) | Print a string (6502)]
The extensive use of whitespace in the Java code attempts to line up the for loop and the printf() statement with the code that fetches and prints a character in the assembly language version.
The following Java code
for (int x = 0; x < s.length(); x++)
is roughly equivalent to the loop below (the Java loop stops at the string's length, while the 6502 loop stops at the zero byte):
        ldx #0          ; set index to zero
print:
        lda words,x     ; load a letter from words
        beq done        ; if it's the zero, we're done
        jsr CHROUT      ; print the character
        inx             ; increment the index
        bne print       ; keep printing
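Here is a Java sketch that follows the assembly version step by step rather than the `for` loop. It walks a zero-terminated byte array, and its index behaves like the 8-bit X register. As a result, `inx` wraps from 255 to 0, and the loop also ends after 256 characters. The class and method names are hypothetical, and a `StringBuilder` stands in for CHROUT.

```java
import java.nio.charset.StandardCharsets;

// Sketch: a step-by-step Java mirror of the 6502 print loop.
class LoopMirror {
    // Collects the characters that would be sent to CHROUT. Like the 6502 code,
    // it stops at a zero byte, or after 256 characters when X wraps to 0.
    // Assumes 'words' has a zero byte or is at least 256 bytes long.
    static String printZeroTerminated(byte[] words) {
        StringBuilder out = new StringBuilder();    // stands in for CHROUT's output
        int x = 0;                                  // ldx #0
        do {
            int a = words[x] & 0xFF;                // lda words,x
            if (a == 0) {                           // Z flag set by the load...
                break;                              // ...so beq done
            }
            out.append((char) a);                   // jsr CHROUT
            x = (x + 1) & 0xFF;                     // inx (8-bit: 255 wraps to 0)
        } while (x != 0);                           // bne print
        return out.toString();                      // done: brk
    }

    public static void main(String[] args) {
        byte[] words = "the lazy dog.\n\0".getBytes(StandardCharsets.US_ASCII);
        System.out.print(printZeroTerminated(words));
    }
}
```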
You may begin to understand why higher-level languages were developed. Of course, using higher-level languages also means you do not have to know every CPU on the planet to port your program to another platform. You would simply use the compiler on that system, and it would generate the code necessary for that machine.
In the previous chapter, we introduced the registers and the categories of instructions. Now, we will begin to put those instructions to use to move data around and solve problems. We will use the example from above to explain some finer details of the assembler while getting acquainted with the language.
Here is the same program with significantly more detail in the comments.
; We can use CHROUT instead of $FFD2 throughout the program.
; This provides more readability in the code.
define CHROUT $ffd2
; Load the X register with 0 as a starting point.
        ldx #0
; Set a label here, marking the top of our loop.
print:
; This is indexed addressing (array). We take the address of words and
; add to it the current value of X. Then, fetch data from that address.
        lda words,x
; If the value loaded into A is 0, this will set the Z flag
; and we take the branch to done.
        beq done
; Otherwise, print the character.
        jsr CHROUT
; Increment X, and as long as X has not wrapped around to 0, go to the top of the loop.
        inx
        bne print
; Set another label we can branch to when finished.
done:
; STOP the program.
        brk
; data below this line
; Another label that represents the memory location of our sentence.
words:
; Use the TXT pseudo-op to direct the assembler to turn our string into
; a sequence of ASCII characters in memory.
        txt "The quick brown\nfox jumps over\n"
        txt "the lazy dog.\n"
; Use the DCB pseudo-op to mark the end of the string with the null character.
        dcb 0
Now let's have a free-form discussion of the goings-on...
Our IDE provides a handful of primitive library routines. The routine at address $FFD2 prints the character in the accumulator as ASCII. This is a throwback to the Commodore 64, which had a similar routine at the same address.
The define directive doesn't allocate any memory. Rather, it defines a named replacement for a value, most often a location in memory. This is incredibly useful because we don't need to remember memory locations.
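One way to picture `define` is as a name-to-value table that the assembler consults while reading operands. The Java sketch below assumes a simple whole-word substitution. This IDE's assembler may implement it differently, and `DefineTable`, `define`, and `expand` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of define: a name-to-value table applied to operands.
class DefineTable {
    private static final Pattern NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    private final Map<String, String> values = new HashMap<>();

    // "define NAME VALUE" only records the pair; no memory is allocated.
    void define(String name, String value) {
        values.put(name, value);
    }

    // Replaces whole-word occurrences of defined names in an operand.
    String expand(String operand) {
        Matcher m = NAME.matcher(operand);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            String replacement = values.getOrDefault(word, word);
            // quoteReplacement keeps the '$' in "$ffd2" from being read as a group reference
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        DefineTable defines = new DefineTable();
        defines.define("CHROUT", "$ffd2");
        defines.define("CR", "$0d");
        System.out.println(defines.expand("CHROUT"));   // $ffd2
        System.out.println(defines.expand("#CR"));      // #$0d
        System.out.println(defines.expand("words,x"));  // words,x (nothing defined)
    }
}
```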
Our use of labels serves a similar purpose, but it goes a bit deeper. The labels are maintained by the assembler and represent the memory location where that label occurred in the source code. This is hugely beneficial. Before assemblers were created, it was up to the programmer to know where in memory they needed to branch. Subsequent changes to the source code meant these locations likely changed and had to be recomputed. The programmer was then responsible for changing every occurrence of this location in the code! You can imagine how miserable that would have been!
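Here is a sketch of the bookkeeping the assembler took over: walk the program once, add up the size of each line, and record the address where each label appears. This is the first pass of a classic two-pass assembler; a second pass would then substitute those addresses into operands. The sketch assumes the program is assembled starting at `$0600`, a common default in browser-based 6502 emulators that you should check against your IDE. It also takes the instruction sizes as given. `LabelPass` and `Line` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LabelPass {
    // One source line: an optional label and the number of bytes the line occupies.
    static final class Line {
        final String label;  // null when the line has no label
        final int size;      // bytes emitted (opcode + operand bytes, or data bytes)
        Line(String label, int size) { this.label = label; this.size = size; }
    }

    // Pass 1: assign an address to every label by summing the sizes before it.
    static Map<String, Integer> assignLabels(List<Line> program, int origin) {
        Map<String, Integer> labels = new LinkedHashMap<>();
        int address = origin;
        for (Line line : program) {
            if (line.label != null) {
                if (labels.containsKey(line.label)) {
                    throw new IllegalStateException("duplicate label " + line.label);
                }
                labels.put(line.label, address);
            }
            address += line.size;
        }
        return labels;
    }

    public static void main(String[] args) {
        // The opening program, using standard 6502 instruction sizes.
        List<Line> program = List.of(
            new Line(null, 2),      //         ldx #0
            new Line("print", 3),   // print:  lda words,x
            new Line(null, 2),      //         beq done
            new Line(null, 3),      //         jsr CHROUT
            new Line(null, 1),      //         inx
            new Line(null, 2),      //         bne print
            new Line("done", 1),    // done:   brk
            new Line("words", 46)); // words:  45 text bytes (assuming \n is one byte) + dcb 0
        assignLabels(program, 0x0600).forEach(
            (name, addr) -> System.out.printf("%-6s $%04X%n", name, addr));
    }
}
```

With these sizes, `print` lands at $0602, `done` at $060D, and `words` at $060E. Insert one instruction before `print` and all three addresses move. That is exactly the recomputation programmers once did by hand.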
The TXT and DCB pseudo-ops allocate memory. Used together with labels, they provide the makings of primitive variables, as in higher-level languages.
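The following Java sketch shows what a `txt` line followed by `dcb 0` would place in memory: one byte per character, then a single zero byte. It assumes the assembler stores plain ASCII and turns `\n` into the single newline byte `$0A`. `DataLayout`, `txtThenZero`, and the starting address are hypothetical. A label in front of the `txt` would simply name the address of the first byte.

```java
import java.nio.charset.StandardCharsets;

class DataLayout {
    // txt "..." followed by dcb 0: the string's ASCII bytes, then one zero byte.
    static byte[] txtThenZero(String text) {
        byte[] chars = text.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[chars.length + 1];   // +1 for the dcb 0 terminator
        System.arraycopy(chars, 0, out, 0, chars.length);
        out[chars.length] = 0;                     // explicit for clarity (arrays start zeroed)
        return out;
    }

    public static void main(String[] args) {
        int address = 0x0700;                      // hypothetical address of the label
        for (byte b : txtThenZero("Hi!\n")) {
            System.out.printf("$%04X: $%02X%n", address++, b & 0xFF);
        }
    }
}
```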
The JSR instruction calls a subroutine. This is just like calling a function or method in a higher-level language. The subroutine returns when it executes RTS (not shown here).
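Under the hood, `JSR` and `RTS` use the stack in page one (`$0100`-`$01FF`). `JSR` pushes the address of its own last byte, high byte first, and then jumps. `RTS` pulls that address back, low byte first, and continues at the address plus one. The Java model below is a sketch of just that mechanism. `StackModel` and its members are hypothetical, and the `JSR` address `$0607` is an assumed example.

```java
// Hypothetical model of the 6502 stack as used by JSR and RTS; nothing else is simulated.
class StackModel {
    final int[] memory = new int[0x10000];
    int sp = 0xFF;   // stack pointer; the stack lives at $0100-$01FF and grows downward
    int pc;

    private void push(int value) {
        memory[0x0100 + sp] = value & 0xFF;
        sp = (sp - 1) & 0xFF;
    }

    private int pull() {
        sp = (sp + 1) & 0xFF;
        return memory[0x0100 + sp];
    }

    // JSR located at 'at' (3 bytes long): push at+2 (its last byte), then jump.
    void jsr(int at, int target) {
        int ret = (at + 2) & 0xFFFF;
        push(ret >> 8);                // high byte first
        push(ret);                     // then low byte (push keeps the low 8 bits)
        pc = target;
    }

    // RTS: pull low then high, and resume at the address after the JSR.
    void rts() {
        int lo = pull();
        int hi = pull();
        pc = (((hi << 8) | lo) + 1) & 0xFFFF;
    }

    public static void main(String[] args) {
        StackModel cpu = new StackModel();
        cpu.jsr(0x0607, 0xFFD2);       // jsr CHROUT, assuming it sits at $0607
        System.out.printf("in subroutine: PC=$%04X SP=$%02X%n", cpu.pc, cpu.sp);
        cpu.rts();
        System.out.printf("after RTS:     PC=$%04X SP=$%02X%n", cpu.pc, cpu.sp);
    }
}
```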
The next bit is some of the magic of the processor.
lda words,x
We load the accumulator with a character from the string words. This is done by taking the fixed address of words and adding to it the value of the X register. The resulting address is used to fetch the next character and load it into A. This is one of the forms of absolute addressing, called absolute indexed (absolute,X) addressing.
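The address arithmetic behind `lda words,x` is small enough to write out in Java. The effective address is the 16-bit base address plus the unsigned 8-bit value of X, wrapping past `$FFFF`. Because X only holds 0 to 255, one indexed load can reach at most 256 bytes from `words`. The address `$060E` used for `words` below is a hypothetical example, and `IndexedAddr` is a made-up name.

```java
class IndexedAddr {
    // Effective address for absolute,X addressing: base + X, kept to 16 bits.
    static int absoluteX(int base, int x) {
        return (base + (x & 0xFF)) & 0xFFFF;
    }

    public static void main(String[] args) {
        int words = 0x060E;   // hypothetical address of the words label
        for (int x = 0; x < 3; x++) {
            System.out.printf("X=%d -> $%04X%n", x, absoluteX(words, x));
        }
    }
}
```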
Once we have the character, we must determine when to stop printing. This is where the zero byte from the DCB comes in. The BEQ instruction is shown below.
beq done
This branch instruction checks the zero flag, which is set when we load the zero byte at the end of the string into the accumulator. The limitation of branching on the 6502 is that a branch can only move 127 bytes forward or 128 bytes back, counted from the instruction that follows the branch.
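The branch limit comes from how branches are encoded. The operand is a single signed byte (-128 to +127) added to the address of the next instruction. This Java sketch computes that byte and rejects targets that are out of range. The addresses in `main` are a hypothetical layout with the program starting at `$0600`, and `BranchOffset` is a made-up name.

```java
class BranchOffset {
    // Encodes the operand byte of a relative branch (BEQ, BNE, ...) located at 'branchAt'.
    static int encode(int branchAt, int target) {
        int next = branchAt + 2;            // PC after the 2-byte branch instruction
        int offset = target - next;
        if (offset < -128 || offset > 127) {
            throw new IllegalArgumentException(String.format(
                "branch from $%04X to $%04X is out of range (%d)", branchAt, target, offset));
        }
        return offset & 0xFF;               // two's-complement byte
    }

    public static void main(String[] args) {
        // Hypothetical layout: beq done at $0605 with done: at $060D,
        // bne print at $060B with print: at $0602.
        System.out.printf("beq done  -> $%02X%n", encode(0x0605, 0x060D));  // $06
        System.out.printf("bne print -> $%02X%n", encode(0x060B, 0x0602));  // $F5
    }
}
```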
Finally, BRK ends the program, and control returns to the IDE.
There are some very nice features in this assembler within the IDE. The version we are using has a small graphical area based on 16 colors (see Chapter 1), a text output area, and a few subroutines that let us write some useful programs in 6502 assembly language.
These are defined in the Notes section of the IDE.
define SCINIT $ff81 ; initialize/clear screen
define CHRIN $ffcf ; input character from keyboard
define CHROUT $ffd2 ; output character to screen
define SCREEN $ffed ; get screen size
define PLOT $fff0 ; get/set cursor coordinates
We've already seen the CHROUT routine. Next, we'll look at CHRIN. The example program below shows some basic I/O.
define CHROUT $ffd2
define CHRIN $ffcf
define MAXNAME 32
define CR $0d
        ldx #0          ; set index to zero
; ask them their name
printQuery:
        lda query,x
        beq endp
        jsr CHROUT
        inx
        bne printQuery
endp:
; read in the name
        ldx #0
getName:
        jsr CHRIN
        cmp #0          ; no char returned
        beq getName     ; try again
; was it the Return key?
        cmp #CR
        beq endg
; store the char and echo it
        sta name, x
        jsr CHROUT
        inx
; stop if there are 32 chars!
        cpx #MAXNAME
        bcc getName
endg:
; print a CR and terminate the name
        lda #CR
        jsr CHROUT
        lda #0
        sta name, x
; say hello
        ldx #0
sayHi:
        lda hello, x
        beq endh
        jsr CHROUT
        inx
        bne sayHi
endh:
; print the name
        ldx #0
prName:
        lda name, x
        beq endn
        jsr CHROUT
        inx
        bne prName
endn:
        lda end
        jsr CHROUT
        brk
query:
        txt "What is your name? "
        dcb 0
hello:
        txt "Hello, "
        dcb 0
end:
        txt "!"
        dcb 0
name:
        dsb 32
        dcb 0
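The `getName` loop combines three ideas. It polls CHRIN until it returns something other than zero, stops on a carriage return or after MAXNAME characters, and finishes with a zero byte so the name prints like any other string. Here is a Java sketch of that logic. Keyboard input is simulated with an iterator of character codes, where 0 means no key was ready. `NameReader` and `readName` are hypothetical names rather than IDE routines.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

class NameReader {
    static final int MAXNAME = 32;
    static final int CR = 0x0D;

    // Fills 'name' (MAXNAME + 1 bytes) from simulated CHRIN results and
    // zero-terminates it. Returns the number of characters stored.
    // The simulated input must eventually supply CR or MAXNAME characters.
    static int readName(Iterator<Integer> chrin, byte[] name) {
        int x = 0;                          // ldx #0
        while (x < MAXNAME) {               // cpx #MAXNAME / bcc getName
            int a = chrin.next();           // jsr CHRIN
            if (a == 0) continue;           // no char returned: try again
            if (a == CR) break;             // Return key: beq endg
            name[x] = (byte) a;             // sta name, x
            x++;                            // inx
        }
        name[x] = 0;                        // lda #0 / sta name, x
        return x;
    }

    public static void main(String[] args) {
        byte[] name = new byte[MAXNAME + 1];            // dsb 32 + dcb 0
        List<Integer> keys = List.of(0, 0, (int) 'A', 0, (int) 'd', (int) 'a', CR);
        int n = readName(keys.iterator(), name);
        System.out.println("Hello, " + new String(name, 0, n, StandardCharsets.US_ASCII) + "!");
    }
}
```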
It's OK if some or all of this is still unclear. These examples use the assembler embedded in our IDE, and everything in this code will be explained in the upcoming chapters.
As you write programs, you'll write code that doesn't make sense to the assembler. Our IDE is not very robust, and the assembler is less so. It's not a bad assembler; it's just primitive in its abilities. Certain enhancements have been made to the original, but one area that is still lacking is error reporting. You will not get the detailed error messages that more advanced compilers provide.
So, what do you do? Research the instruction or pseudo-op and make sure it's being used correctly.