CISS-111 Project 2
Write a Java program to demonstrate using Vector, ArrayList, Arrays, enumerations, regular expressions, HashMap, and records. You will create a program to perform basic parsing of a 6502 assembly language program.
Learning outcomes
- Working with files.
- Building enumerations.
- Working with enumeration methods.
- Working with strings and parsing.
- Working with Java records.
- Exposure to predefined data structures.
- Exposure to Vector, ArrayList.
- Exposure to regular expressions.
Token Types
Recall that enumerations are special forms of classes and, therefore, are subclasses of Object. Using your knowledge of enumerations, create the following enumeration with the constants defined below:
package hvcc.ciss111;

public enum TokenType {
    // instructions
    OPCODE,
    // punctuation
    LPAREN, COMMA, RPAREN,
    // numerics
    ADDRESS, CONST,
    // registers
    REGISTER
}
This enumeration centers around reserved words, some punctuation, numbers, and a couple of registers.
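As a refresher on the methods every enumeration provides, the sketch below exercises values(), valueOf(), name(), and ordinal(). The class name EnumMethodsDemo and the helper isTokenTypeName are hypothetical. The enum is nested and the package line is omitted only so that the example compiles on its own; it is not part of the required solution.

```java
import java.util.Arrays;

class EnumMethodsDemo {
    // Copy of the assignment's constants, nested here so the sketch is self-contained.
    enum TokenType { OPCODE, LPAREN, COMMA, RPAREN, ADDRESS, CONST, REGISTER }

    // Hypothetical helper: true if the text is the name of one of the constants.
    static boolean isTokenTypeName(String text) {
        for (TokenType type : TokenType.values()) {
            if (type.name().equals(text)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // values() returns the constants in declaration order.
        System.out.println(Arrays.toString(TokenType.values()));

        // valueOf() turns a name into a constant; ordinal() is its zero-based position.
        System.out.println(TokenType.valueOf("COMMA").ordinal()); // prints 2

        System.out.println(isTokenTypeName("OPCODE"));
        System.out.println(isTokenTypeName("LABEL"));
    }
}
```

Note that valueOf() throws an IllegalArgumentException when the name does not match a constant, which is why the helper loops over values() instead.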
Now, write Java code to read a program written in 6502 assembly language. This will be an easy introduction to parsing. Everything is space-delimited so that you can use split() to help you out. The entire exercise is about identification.
Numbers
The following Java regex will help identify the values you may find in the program.
#?\$\p{XDigit}{2}(?:\p{XDigit}{2})?
- #? -> match a # zero or one time.
- \$ -> match exactly a $.
- \p{XDigit}{2} -> match exactly 2 ({2}) hexadecimal digits (\p{XDigit}).
- (?:\p{XDigit}{2})? -> This one takes a little more to explain...
The final piece is a non-capturing group, written (?: ... ), which groups everything enclosed in the parentheses without capturing it. Inside the group, we are again matching exactly 2 ({2}) hexadecimal digits (\p{XDigit}). The key is the final question mark, which says to match the whole group zero or one time. The following strings all match this regex:
$89 #$89 $Ff8b #$ff8b
Pasting the regex into a string literal in your Java IDE (especially IntelliJ) will add more backslashes to get the escapes right. So, when you paste it into the IDE, it will look more like this:
#?\\$\\p{XDigit}{2}(?:\\p{XDigit}{2})?
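To see the escaped regex in action, here is a minimal sketch that compiles it once and tests a few strings. The class name NumberRegexDemo and the helper isNumber are hypothetical, and the package line is omitted so the example stands alone.

```java
import java.util.regex.Pattern;

class NumberRegexDemo {
    // The assignment's regex as a Java string literal (each backslash is doubled).
    static final Pattern NUMBER =
            Pattern.compile("#?\\$\\p{XDigit}{2}(?:\\p{XDigit}{2})?");

    // Hypothetical helper: true only if the whole string matches the regex.
    static boolean isNumber(String text) {
        return NUMBER.matcher(text).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"$89", "#$89", "$Ff8b", "#$ff8b", "$8", "$123", "LDA"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isNumber(sample));
        }
    }
}
```

The first four samples match. "$8" and "$123" do not, because the regex accepts exactly two or exactly four hexadecimal digits, and matches() requires the entire string to fit the pattern.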
We will not be using the default namespace for this project. We will be creating our own based on the package details noted below.
Create a new package in the src folder (right-click, select New, and then Package) and give it the name hvcc.ciss111. Then, you are ready to start writing code by creating a class in that package!
package hvcc.ciss111; // <<--- Hey! We're making a package!
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.Vector;
import java.util.ArrayList;
import java.util.Arrays;
import java.io.FileReader;
import static hvcc.ciss111.TokenType.*;

public class Project2_lastname {

    public record Token (TokenType type, String value) {
        // Override the toString() method
    }

    public static void main(String[] args) throws FileNotFoundException {
        Scanner inFile;
        Vector<Token> tokens = new Vector<>();
        String[] opstring = {"ADC", "AND", "ASL", "BCC", "BCS", "BEQ", "BIT", "BMI", "BNE", "BPL", "BRK", "BVC", "BVS", "CLC",
                "CLD", "CLI", "CLV", "CMP", "CPX", "CPY", "DEC", "DEX", "DEY", "EOR", "INC", "INX", "INY", "JMP",
                "JSR", "LDA", "LDX", "LDY", "LSR", "NOP", "ORA", "PHA", "PHP", "PLA", "PLP", "ROL", "ROR", "RTI",
                "RTS", "SBC", "SEC", "SED", "SEI", "STA", "STX", "STY", "TAX", "TAY", "TSX", "TXA", "TXS", "TYA"};
        ArrayList<String> opcodes = new ArrayList<>(Arrays.asList(opstring));

        /* YOUR CODE HERE! */
    }
}
Start by opening the file, setting up an EOF loop, and reading lines. Each line can then be split() to begin the identification process. You don't need to create methods, but doing so may make things easier.
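The read-and-split step might be sketched as follows. To stay self-contained, this example reads from an in-memory string rather than a file; in the project you would instead build the Scanner from a FileReader on your input file. The class name ReadLoopDemo and the helper readTokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReadLoopDemo {
    // Hypothetical helper: reads until end of input and splits each line on whitespace.
    static List<String[]> readTokens(Scanner in) {
        List<String[]> lines = new ArrayList<>();
        while (in.hasNextLine()) {            // the EOF loop
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue;                     // skip blank lines
            }
            lines.add(line.split("\\s+"));    // one array of pieces per line
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: a String stands in for the file so the sketch runs anywhere.
        Scanner in = new Scanner("JSR $0606\nLDA $10 , X\nINX\n");
        for (String[] parts : readTokens(in)) {
            System.out.println(parts.length + " piece(s): " + String.join(" | ", parts));
        }
    }
}
```

Because the test program puts spaces around punctuation, a line such as LDA $10 , X splits into four separate pieces, each of which can be identified on its own.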
The ArrayList is there to help you look up strings more quickly for opcode matching. For each token on the line, classify it, create a Token object, and add it to the Vector.
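One way to use the ArrayList for lookups is its contains() method. The sketch below tags each piece of a line as an opcode or as something else and collects the results in a Vector. It deliberately uses hypothetical stand-ins (Kind, Item, tag, and a three-entry opcode list) rather than the assignment's TokenType and Token, so the remaining classification is still yours to write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Vector;

class OpcodeLookupDemo {
    // Hypothetical stand-ins for the assignment's TokenType and Token.
    enum Kind { OPCODE, OTHER }
    record Item(Kind kind, String text) {}

    // Tags each piece as OPCODE if the list contains it, OTHER if not.
    static Vector<Item> tag(String[] parts, ArrayList<String> opcodes) {
        Vector<Item> items = new Vector<>();
        for (String part : parts) {
            Kind kind = opcodes.contains(part) ? Kind.OPCODE : Kind.OTHER;
            items.add(new Item(kind, part));
        }
        return items;
    }

    public static void main(String[] args) {
        // Shortened opcode list, for illustration only.
        ArrayList<String> opcodes = new ArrayList<>(Arrays.asList("LDA", "STA", "INX"));
        for (Item item : tag("LDA $10 , X".split(" "), opcodes)) {
            System.out.println(item);
        }
    }
}
```

contains() compares with equals(), so the match is case-sensitive. It scans the list from the front, which is perfectly adequate for a list of 56 opcodes.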
When the file is exhausted, iterate over the Vector and write the objects to the screen.
Test Program
JSR $0606
LDX #$02
LDA $10 , X
CMP $10
BNE $06b6
LDA $11 , X
CMP $11
BEQ $06bf
INX
INX
CPX $03
BEQ $06c2
JMP $06aa
JMP $0735
RTS
LDX $03
DEX
TXA
LDA $10 , X
STA $12 , X
DEX
BPL $06c7
LDA $02
LSR A
BCS $06dc
LSR A
BCS $06ef
LSR A
BCS $06f8
LSR A
BCS $070b
LDA $10
SEC
SBC #$20
STA $10
BCC $06e6
RTS
DEC $11
LDA #$01
CMP $11
BEQ $0716
RTS
INC $10
LDA #$1f
BIT $10
BEQ $0716
RTS
LDA $10
CLC
ADC #$20
STA $10
BCS $0702
RTS
INC $11
LDA #$06
CMP $11
BEQ $0716
RTS
DEC $10
LDA $10
AND #$1f
CMP #$1f
BEQ $0716
RTS
JMP $0735
LDY #$00
LDA $fe
STA ( $00 ) , Y
RTS
LDX $03
LDA #$00
STA ( $10 , X )
LDX #$00
LDA #$01
STA ( $10 , X )
RTS
LDX #$00
NOP
NOP
DEX
BNE $072f
RTS
LDX $03
DEX
TXA
LDA $10 , X
STA $12 , X
DEX
BPL $06c7
LDA $02
LSR A
BCS $06dc
LSR A
BCS $06ef
LSR A
BCS $06f8
LSR A
BCS $070b
LDA $10
SEC
SBC #$20
STA $10
BCC $06e6
RTS
Your output will go to the screen. Since we are using a record and the default output is messy, you will override the record's toString() method so that the output looks like the listing below.
Token type OPCODE, value = JSR
Token type ADDRESS, value = $0606
Token type OPCODE, value = LDX
Token type CONST, value = #$02
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = CMP
Token type ADDRESS, value = $10
Token type OPCODE, value = BNE
Token type ADDRESS, value = $06b6
Token type OPCODE, value = LDA
Token type ADDRESS, value = $11
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = CMP
Token type ADDRESS, value = $11
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $06bf
Token type OPCODE, value = INX
Token type OPCODE, value = INX
Token type OPCODE, value = CPX
Token type ADDRESS, value = $03
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $06c2
Token type OPCODE, value = JMP
Token type ADDRESS, value = $06aa
Token type OPCODE, value = JMP
Token type ADDRESS, value = $0735
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type ADDRESS, value = $03
Token type OPCODE, value = DEX
Token type OPCODE, value = TXA
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = STA
Token type ADDRESS, value = $12
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = DEX
Token type OPCODE, value = BPL
Token type ADDRESS, value = $06c7
Token type OPCODE, value = LDA
Token type ADDRESS, value = $02
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06dc
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06ef
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06f8
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $070b
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = SEC
Token type OPCODE, value = SBC
Token type CONST, value = #$20
Token type OPCODE, value = STA
Token type ADDRESS, value = $10
Token type OPCODE, value = BCC
Token type ADDRESS, value = $06e6
Token type OPCODE, value = RTS
Token type OPCODE, value = DEC
Token type ADDRESS, value = $11
Token type OPCODE, value = LDA
Token type CONST, value = #$01
Token type OPCODE, value = CMP
Token type ADDRESS, value = $11
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = INC
Token type ADDRESS, value = $10
Token type OPCODE, value = LDA
Token type CONST, value = #$1f
Token type OPCODE, value = BIT
Token type ADDRESS, value = $10
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = CLC
Token type OPCODE, value = ADC
Token type CONST, value = #$20
Token type OPCODE, value = STA
Token type ADDRESS, value = $10
Token type OPCODE, value = BCS
Token type ADDRESS, value = $0702
Token type OPCODE, value = RTS
Token type OPCODE, value = INC
Token type ADDRESS, value = $11
Token type OPCODE, value = LDA
Token type CONST, value = #$06
Token type OPCODE, value = CMP
Token type ADDRESS, value = $11
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = DEC
Token type ADDRESS, value = $10
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = AND
Token type CONST, value = #$1f
Token type OPCODE, value = CMP
Token type CONST, value = #$1f
Token type OPCODE, value = BEQ
Token type ADDRESS, value = $0716
Token type OPCODE, value = RTS
Token type OPCODE, value = JMP
Token type ADDRESS, value = $0735
Token type OPCODE, value = LDY
Token type CONST, value = #$00
Token type OPCODE, value = LDA
Token type ADDRESS, value = $fe
Token type OPCODE, value = STA
Token type LPAREN
Token type ADDRESS, value = $00
Token type RPAREN
Token type COMMA
Token type REGISTER, value = Y
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type ADDRESS, value = $03
Token type OPCODE, value = LDA
Token type CONST, value = #$00
Token type OPCODE, value = STA
Token type LPAREN
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type RPAREN
Token type OPCODE, value = LDX
Token type CONST, value = #$00
Token type OPCODE, value = LDA
Token type CONST, value = #$01
Token type OPCODE, value = STA
Token type LPAREN
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type RPAREN
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type CONST, value = #$00
Token type OPCODE, value = NOP
Token type OPCODE, value = NOP
Token type OPCODE, value = DEX
Token type OPCODE, value = BNE
Token type ADDRESS, value = $072f
Token type OPCODE, value = RTS
Token type OPCODE, value = LDX
Token type ADDRESS, value = $03
Token type OPCODE, value = DEX
Token type OPCODE, value = TXA
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = STA
Token type ADDRESS, value = $12
Token type COMMA
Token type REGISTER, value = X
Token type OPCODE, value = DEX
Token type OPCODE, value = BPL
Token type ADDRESS, value = $06c7
Token type OPCODE, value = LDA
Token type ADDRESS, value = $02
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06dc
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06ef
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $06f8
Token type OPCODE, value = LSR
Token type REGISTER, value = A
Token type OPCODE, value = BCS
Token type ADDRESS, value = $070b
Token type OPCODE, value = LDA
Token type ADDRESS, value = $10
Token type OPCODE, value = SEC
Token type OPCODE, value = SBC
Token type CONST, value = #$20
Token type OPCODE, value = STA
Token type ADDRESS, value = $10
Token type OPCODE, value = BCC
Token type ADDRESS, value = $06e6
Token type OPCODE, value = RTS
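If overriding toString() inside a record is unfamiliar, the general shape is sketched below with an unrelated, hypothetical record named Point, followed by a loop that prints each element of a Vector. The names RecordToStringDemo and Point are illustrative only; your own override belongs in Token and should produce the format shown above.

```java
import java.util.Vector;

class RecordToStringDemo {
    // Hypothetical record. Without the override, it would print as Point[x=1, y=2].
    record Point(int x, int y) {
        @Override
        public String toString() {
            // A record's components can be read directly inside the record body.
            return "(" + x + ", " + y + ")";
        }
    }

    public static void main(String[] args) {
        Vector<Point> points = new Vector<>();
        points.add(new Point(1, 2));
        points.add(new Point(3, 4));

        // println() calls toString() on each element.
        for (Point point : points) {
            System.out.println(point); // prints (1, 2) and then (3, 4)
        }
    }
}
```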
Submit the project to the Learning Management System as Project2_lastname.java.