Programming by Design

CISS-111 Project 2

Posted on January 18, 2023 (updated February 20, 2025) by William Jojo

Write a Java program to demonstrate using Vector, ArrayList, enumerations, regular expressions, and records. You will create a program to perform basic parsing of a 6502 assembly language program.


Learning outcomes

  • Working with files.
  • Building enumerations.
  • Working with enumeration methods.
  • Working with strings and parsing.
  • Working with Java records.
  • Exposure to predefined data structures.
  • Exposure to Vector, ArrayList.
  • Exposure to regular expressions.

Token Types

Recall that enumerations are special forms of classes and, therefore, are subclasses of Object. Using your knowledge of enumerations, create the following enumeration with the constants defined below:

TokenType.java
package hvcc.ciss111;

public enum TokenType {

    // instructions
    OPCODE,

    // punctuation
    LPAREN, COMMA, RPAREN,

    // numerics
    ADDRESS, CONST,

    // registers
    REGISTER
}

This enumeration centers around reserved words (the opcodes), some punctuation, numbers, and the registers A, X, and Y.

Now, write Java code to read a program written in 6502 assembly language. This will be an easy introduction to parsing. Everything is space-delimited so that you can use split() to help you out. The entire exercise is about identification.
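
As a minimal sketch of the splitting step, the following shows what split() returns for one line of the test program. The class name SplitSketch and the helper pieces() are hypothetical names used only for illustration, and trimming the line first is an assumption about how you may want to handle stray whitespace.

```java
import java.util.Arrays;

public class SplitSketch {
    // Hypothetical helper: break one source line into its space-delimited pieces.
    public static String[] pieces(String line) {
        // trim() drops leading/trailing blanks; \\s+ treats a run of whitespace as one delimiter.
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Every piece, including punctuation, comes back as its own string.
        System.out.println(Arrays.toString(pieces("STA ( $10 , X )")));
        // prints [STA, (, $10, ,, X, )]
    }
}
```

Because the parentheses and commas are surrounded by spaces in the source file, they arrive as separate strings and can be identified one at a time.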


Numbers

The following Java regex will help identify the values you may find in the program.

#?\$\p{XDigit}{2}(?:\p{XDigit}{2})?
               #? -> match a # zero or one time.
               \$ -> match exactly a $
    \p{XDigit}{2} -> match exactly 2 ({2}) hexadecimal digits (\p{XDigit})
(?:\p{XDigit}{2}) -> This one takes a little more to explain...

The final piece, (?:\p{XDigit}{2})?, is a non-capturing group: the (?: ... ) groups whatever is inside the parentheses without saving it for later reference. Inside the group we are, again, matching exactly 2 ({2}) hexadecimal digits (\p{XDigit}). The key is the final question mark, which says to match the whole group zero or one time, so a value has either two or four hexadecimal digits. The following values all match this regex:

$89
#$89
$Ff8b
#$ff8b

Pasting the regex into a string literal in your Java IDE (especially IntelliJ) will add more backslashes, because each backslash must itself be escaped inside a Java string. So, after you paste it into the IDE, it will look more like this:

DON’T COPY THIS ONE!
#?\\$\\p{XDigit}{2}(?:\\p{XDigit}{2})?
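
One way to convince yourself the pattern behaves as described is to try it against the sample values. The sketch below assumes a hypothetical class NumberRegexSketch with a helper isNumber(); the pattern is the one from the text, written with doubled backslashes because it sits inside a Java string literal.

```java
import java.util.regex.Pattern;

public class NumberRegexSketch {
    // The regex from the text, escaped for a Java string literal.
    public static final Pattern NUMBER =
            Pattern.compile("#?\\$\\p{XDigit}{2}(?:\\p{XDigit}{2})?");

    // Hypothetical helper: true only when the whole string matches the pattern.
    public static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"$89", "#$89", "$Ff8b", "#$ff8b", "$8", "$123", "X"};
        for (String s : samples) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

The first four samples match. The last three do not, since matches() requires the entire string to fit the pattern and the pattern allows only two or four hexadecimal digits.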

We will not be using the default namespace for this project. We will be creating our own based on the package details noted below.

Important Note!
Make sure you create the new package in the src folder (right click, select New, and then Package) and give the name hvcc.ciss111. Then, you are ready to start writing code by creating a class in that package!
Project2_lastname.java
package hvcc.ciss111;   // <<--- Hey! We're making a package!

import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.Vector;
import java.util.ArrayList;
import java.util.Arrays;
import java.io.FileReader;

import static hvcc.ciss111.TokenType.*;

public class Project2_lastname {
    public record Token (TokenType type, String value) {
        // Override the toString() method
    }

    public static void main(String[] args) throws FileNotFoundException {
        Scanner inFile;
        Vector<Token> tokens = new Vector<>();
        ArrayList<String> opcodes = new ArrayList<>(Arrays.asList("ADC", "AND", "ASL", "BCC", "BCS", "BEQ", "BIT", "BMI", "BNE", "BPL", "BRK", "BVC", "BVS", "CLC",
                "CLD", "CLI", "CLV", "CMP", "CPX", "CPY", "DEC", "DEX", "DEY", "EOR", "INC", "INX", "INY", "JMP",
                "JSR", "LDA", "LDX", "LDY", "LSR", "NOP", "ORA", "PHA", "PHP", "PLA", "PLP", "ROL", "ROR", "RTI",
                "RTS", "SBC", "SEC", "SED", "SEI", "STA", "STX", "STY", "TAX", "TAY", "TSX", "TXA", "TXS", "TYA"));

        /* YOUR CODE HERE! */

    }
}

Start by opening the file, setting up an EOF loop, and reading lines. Split each line with split(), then begin the identification process. You don’t need to create methods, but doing so may make things easier.
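
The EOF loop described above can be sketched as follows. This is only an illustration under stated assumptions: ReadLoopSketch and readPieces() are hypothetical names, the Scanner reads from a String so the example runs without a file, and skipping blank lines is an assumption rather than a requirement. In the project, the Scanner would wrap a FileReader for program.asm instead.

```java
import java.util.Scanner;
import java.util.Vector;

public class ReadLoopSketch {
    // Hypothetical helper: read every line and collect its space-delimited pieces in order.
    public static Vector<String> readPieces(Scanner in) {
        Vector<String> pieces = new Vector<>();
        while (in.hasNextLine()) {              // EOF loop: ends when no lines remain
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue;                       // assumption: blank lines are ignored
            }
            for (String piece : line.split("\\s+")) {
                pieces.add(piece);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // A String stands in for the file so the sketch is self-contained.
        Scanner in = new Scanner("LDX #$02\nLDA $10 , X\nRTS\n");
        for (String piece : readPieces(in)) {
            System.out.println(piece);
        }
    }
}
```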

The ArrayList is there to help you look up strings easily for opcode matching. For each token on the line, classify it, create a Token object, and add it to the Vector.
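
The classification step might look roughly like the sketch below. It is not the required design: ClassifySketch, Kind, and classify() are hypothetical names, Kind is a local stand-in for TokenType so the example compiles on its own, the opcode list is shortened, and treating a leading # as the mark of a CONST is an assumption drawn from the sample output.

```java
import java.util.List;

public class ClassifySketch {
    // Local stand-in for TokenType so this sketch is self-contained.
    public enum Kind { OPCODE, LPAREN, COMMA, RPAREN, ADDRESS, CONST, REGISTER }

    // Shortened opcode list, for illustration only.
    static final List<String> OPCODES = List.of("LDA", "STA", "LSR", "RTS");

    // Hypothetical helper: decide which kind of token one piece of a line is.
    public static Kind classify(String piece) {
        if (OPCODES.contains(piece)) {
            return Kind.OPCODE;
        }
        if (piece.matches("#?\\$\\p{XDigit}{2}(?:\\p{XDigit}{2})?")) {
            // Assumption: a leading # marks a constant, otherwise it is an address.
            return piece.startsWith("#") ? Kind.CONST : Kind.ADDRESS;
        }
        return switch (piece) {
            case "(" -> Kind.LPAREN;
            case ")" -> Kind.RPAREN;
            case "," -> Kind.COMMA;
            case "A", "X", "Y" -> Kind.REGISTER;
            default -> throw new IllegalArgumentException("Unrecognized piece: " + piece);
        };
    }

    public static void main(String[] args) {
        for (String piece : "STA ( $10 , X )".split(" ")) {
            System.out.println(piece + " -> " + classify(piece));
        }
    }
}
```

Checking the opcode list first keeps the common case simple. Anything that falls through every test is reported rather than silently dropped.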

When the file is exhausted, iterate over the Vector and write the objects to the screen.


Test Program

program.asm
JSR $0606
LDX #$02
LDA $10 , X
CMP $10
BNE $06b6
LDA $11 , X
CMP $11
BEQ $06bf
INX
INX
CPX $03
BEQ $06c2
JMP $06aa
JMP $0735
RTS
LDX $03
DEX
TXA
LDA $10 , X
STA $12 , X
DEX
BPL $06c7
LDA $02
LSR A
BCS $06dc
LSR A
BCS $06ef
LSR A
BCS $06f8
LSR A
BCS $070b
LDA $10
SEC
SBC #$20
STA $10
BCC $06e6
RTS
DEC $11
LDA #$01
CMP $11
BEQ $0716
RTS
INC $10
LDA #$1f
BIT $10
BEQ $0716
RTS
LDA $10
CLC
ADC #$20
STA $10
BCS $0702
RTS
INC $11
LDA #$06
CMP $11
BEQ $0716
RTS
DEC $10
LDA $10
AND #$1f
CMP #$1f
BEQ $0716
RTS
JMP $0735
LDY #$00
LDA $fe
STA ( $00 ) , Y
RTS
LDX $03
LDA #$00
STA ( $10 , X )
LDX #$00
LDA #$01
STA ( $10 , X )
RTS
LDX #$00
NOP
NOP
DEX
BNE $072f
RTS
LDX $03
DEX
TXA
LDA $10 , X
STA $12 , X
DEX
BPL $06c7
LDA $02
LSR A
BCS $06dc
LSR A
BCS $06ef
LSR A
BCS $06f8
LSR A
BCS $070b
LDA $10
SEC
SBC #$20
STA $10
BCC $06e6
RTS

Your output will go to the screen. Because we are using a record and its default toString() output is messy, you will override toString() so that each token prints like the output below. Note that the punctuation tokens (LPAREN, COMMA, RPAREN) print only their type.

Token type OPCODE, value = JSR
Token type ADDRESS, value = $0606
Token type OPCODE, value = LDX
Token type CONST, value = #$02
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = CMP
Token type ADDRESS, value = $10
Token type OPCODE, value = BNE
Token type ADDRESS, value = $06b6
Token type OPCODE, value = LDA
Token type ADDRESS, value = $11
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = CMP
Token type ADDRESS, value = $11
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $06bf
Token type OPCODE, value = INX
Token type OPCODE, value = INX
Token type OPCODE, value = CPX
Token type ADDRESS, value = $03
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $06c2
Token type OPCODE, value = JMP
Token type ADDRESS, value = $06aa
Token type OPCODE, value = JMP
Token type ADDRESS, value = $0735
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type ADDRESS, value = $03
Token type OPCODE, value = DEX
Token type OPCODE, value = TXA
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = STA
Token type ADDRESS, value = $12
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = DEX
Token type OPCODE, value = BPL
Token type ADDRESS, value = $06c7
Token type OPCODE, value = LDA
Token type ADDRESS, value = $02
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06dc
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06ef
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06f8
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $070b
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = SEC
Token type OPCODE, value = SBC
Token type CONST, value = #$20
Token type OPCODE, value = STA
Token type ADDRESS, value = $10
Token type OPCODE, value = BCC
Token type ADDRESS, value = $06e6
Token type OPCODE, value = RTS
Token type OPCODE, value = DEC
Token type ADDRESS, value = $11
Token type OPCODE, value = LDA
Token type CONST, value = #$01
Token type OPCODE, value = CMP
Token type ADDRESS, value = $11
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = INC
Token type ADDRESS, value = $10
Token type OPCODE, value = LDA
Token type CONST, value = #$1f
Token type OPCODE, value = BIT
Token type ADDRESS, value = $10
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = CLC
Token type OPCODE, value = ADC
Token type CONST, value = #$20
Token type OPCODE, value = STA
Token type ADDRESS, value = $10
Token type OPCODE, value = BCS
Token type ADDRESS, value = $0702
Token type OPCODE, value = RTS
Token type OPCODE, value = INC
Token type ADDRESS, value = $11
Token type OPCODE, value = LDA
Token type CONST, value = #$06
Token type OPCODE, value = CMP
Token type ADDRESS, value = $11
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = DEC
Token type ADDRESS, value = $10
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = AND
Token type CONST, value = #$1f
Token type OPCODE, value = CMP
Token type CONST, value = #$1f
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = JMP
Token type ADDRESS, value = $0735
Token type OPCODE, value = LDY
Token type CONST, value = #$00
Token type OPCODE, value = LDA
Token type ADDRESS, value = $fe
Token type OPCODE, value = STA
Token type LPAREN
Token type ADDRESS, value = $00
Token type RPAREN
Token type COMMA
Token type REGISTER, value = Y
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type ADDRESS, value = $03
Token type OPCODE, value = LDA
Token type CONST, value = #$00
Token type OPCODE, value = STA
Token type LPAREN
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type RPAREN
Token type OPCODE, value = LDX
Token type CONST, value = #$00
Token type OPCODE, value = LDA
Token type CONST, value = #$01
Token type OPCODE, value = STA
Token type LPAREN
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type RPAREN
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type CONST, value = #$00
Token type OPCODE, value = NOP
Token type OPCODE, value = NOP
Token type OPCODE, value = DEX
Token type OPCODE, value = BNE
Token type ADDRESS, value = $072f
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type ADDRESS, value = $03
Token type OPCODE, value = DEX
Token type OPCODE, value = TXA
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = STA
Token type ADDRESS, value = $12
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = DEX
Token type OPCODE, value = BPL
Token type ADDRESS, value = $06c7
Token type OPCODE, value = LDA
Token type ADDRESS, value = $02
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06dc
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06ef
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06f8
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $070b
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = SEC
Token type OPCODE, value = SBC
Token type CONST, value = #$20
Token type OPCODE, value = STA
Token type ADDRESS, value = $10
Token type OPCODE, value = BCC
Token type ADDRESS, value = $06e6
Token type OPCODE, value = RTS
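
The format above could be produced by a toString() override along these lines. This is a sketch under assumptions, not the expected solution: TokenToStringSketch and Kind are hypothetical stand-ins, and using a null value to mark punctuation tokens is one possible convention among several.

```java
public class TokenToStringSketch {
    // Local stand-in for TokenType, trimmed to what the example needs.
    public enum Kind { OPCODE, COMMA }

    public record Token(Kind type, String value) {
        @Override
        public String toString() {
            // Assumption: punctuation tokens are created with a null value.
            if (value == null) {
                return "Token type " + type;
            }
            return "Token type " + type + ", value = " + value;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Token(Kind.OPCODE, "JSR")); // prints Token type OPCODE, value = JSR
        System.out.println(new Token(Kind.COMMA, null));   // prints Token type COMMA
    }
}
```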

Submit the project to the Learning Management System as Project2_lastname.java.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Copyright © 2018 – 2025 Programming by Design.