CISS-111 Project 2
Write a Java program to demonstrate using Vector, enumerations, regular expressions, HashMap, and record. You will create a program to perform basic parsing of a 6502 assembly language program.
Learning outcomes
- Working with files.
- Building enumerations.
- Working with enumeration methods.
- Working with strings and parsing.
- Working with Java records.
- Exposure to predefined data structures.
- Exposure to Vectors.
- Exposure to regular expressions.
Token Types
Recall that enumerations are special forms of classes and therefore are subclasses of Object. Using your knowledge of enumerations, create the following enumeration with the constants defined below:
package hvcc.ciss111; // <<--- Hey! We're making a package!
public enum TokenType {
    // instructions
    ADC, AND, ASL, BCC, BCS, BEQ, BIT, BMI, BNE, BPL, BRK, BVC, BVS, CLC,
    CLD, CLI, CLV, CMP, CPX, CPY, DEC, DEX, DEY, EOR, INC, INX, INY, JMP,
    JSR, LDA, LDX, LDY, LSR, NOP, ORA, PHA, PHP, PLA, PLP, ROL, ROR, RTI,
    RTS, SBC, SEC, SED, SEI, STA, STX, STY, TAX, TAY, TSX, TXA, TXS, TYA,
    // punctuation
    LPAREN, COMMA, RPAREN,
    // numerics
    ADDRESS, CONST,
    // registers A, X, and Y
    REGISTER
}
This enumeration is built around a large set of reserved words (the instruction mnemonics), some punctuation, numbers, and registers.
Now write Java code to read a program written in 6502 assembly language. This will be an easy introduction to parsing. Everything is space-delimited so that you can use split() to help you out. The entire exercise is about identification: deciding what kind of token each piece of a line is.
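As a small illustration of the space-delimited idea, the sketch below splits one line from the test program into its pieces. The class name SplitDemo, the helper pieces(), and the choice of the regex \\s+ (any run of whitespace) instead of a single space are assumptions made for this example, not requirements of the project.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of the project skeleton.
class SplitDemo {
    // Split one source line into its space-delimited pieces.
    // trim() removes leading/trailing blanks, and "\\s+" treats
    // any run of whitespace as a single separator.
    static String[] pieces(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(pieces("STA ( $10 , X )")));
        // prints [STA, (, $10, ,, X, )]
    }
}
```

Because the punctuation is surrounded by spaces in the source file, the parentheses and the comma come out as pieces of their own, ready to be identified one at a time.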
Numbers
The following Java regex will help identify the values you may find in the program.
#?\$\p{XDigit}{2}(?:\p{XDigit}{2})?
The pieces of the regex are:
- #? matches a # zero or one time.
- \$ matches exactly one $. The backslash is needed because a bare $ has a special meaning in a regex.
- \p{XDigit}{2} matches exactly 2 ({2}) hexadecimal digits (\p{XDigit}).
- (?:\p{XDigit}{2})? takes a little more to explain.
The parentheses that begin with ?: form a non-capturing group: they group everything enclosed in them without saving what was matched. Inside the group we are, again, matching exactly 2 ({2}) hexadecimal digits (\p{XDigit}). The key is the final question mark, which says to match the whole group zero or one time. A number therefore has either two or four hexadecimal digits. The following will match this regex:
$89 #$89 $Ff8b #$ff8b
Pasting the regex into a string literal in your Java IDE (especially IntelliJ) will add more backslashes to get the escapes right, because a backslash is itself an escape character inside a Java string. So, when you do paste it into the IDE, it will look more like this:
#?\\$\\p{XDigit}{2}(?:\\p{XDigit}{2})?
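To see the regex at work, here is a minimal sketch that compiles it once and tests a few strings. The class name NumberRegexDemo, the constant NUMBER, and the helper isNumber() are names invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class NumberRegexDemo {
    // The regex from the text, written as a Java string literal.
    static final Pattern NUMBER =
            Pattern.compile("#?\\$\\p{XDigit}{2}(?:\\p{XDigit}{2})?");

    // matches() succeeds only if the ENTIRE string fits the pattern.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"$89", "#$89", "$Ff8b", "#$ff8b", "$8", "$123", "LDA"};
        for (String s : samples) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```

The first four samples match; "$8" (one digit), "$123" (three digits), and "LDA" do not. In the expected output for this project, values with a leading # are reported as CONST and the others as ADDRESS, so a check such as s.startsWith("#") after a successful match is one plausible way to tell the two apart.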
We will not be using the default package for this project. We will be creating our own based on the package details noted below.
Right click the src folder, select New, and then Package, and give the package the name hvcc.ciss111. Then, you are ready to start writing code by creating a class in that package!
package hvcc.ciss111; // <<--- Hey! We're making a package!
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.io.FileReader;
import java.util.Vector;
import java.util.HashMap;
import static hvcc.ciss111.TokenType.*;
public class Project2 {
    public record Token (TokenType type, String value) {
        // Override the toString() method
    }

    public static void main(String[] args) throws FileNotFoundException {
        Scanner inFile;
        Vector<Token> tokens = new Vector<>();
        HashMap<String, TokenType> opcodes = new HashMap<>();
        TokenType[] list = TokenType.values();
        TokenType tt;
        /* YOUR CODE HERE! */
    }
}
Start by opening the file, setting up an EOF loop (a loop that runs until the end of the file), and reading lines. Each line can be broken apart with split(), and then you can begin the identification process. You don't need to create methods, but it may make things easier.
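The reading loop described above can be sketched as follows. This is only one possible shape: the class name ReadLoopDemo, the helper readAll(), and the decision to skip blank lines are assumptions for the example, and the file name is taken from the command line because the handout does not name the input file.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.Vector;

// Hypothetical demo class; not part of the project skeleton.
class ReadLoopDemo {
    // Read until the Scanner runs out of lines (the "EOF loop"),
    // splitting each non-blank line into its pieces.
    static Vector<String[]> readAll(Scanner in) {
        Vector<String[]> lines = new Vector<>();
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines carry no tokens
            }
            lines.add(line.split("\\s+"));
        }
        return lines;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // With a file name argument, read that file; otherwise fall back
        // to a two-line in-memory sample so the demo runs on its own.
        Scanner in = (args.length > 0)
                ? new Scanner(new FileReader(args[0]))
                : new Scanner("JSR $0606\nLDA $10 , X\n");
        for (String[] parts : readAll(in)) {
            System.out.println(parts.length + " piece(s), first = " + parts[0]);
        }
        in.close();
    }
}
```

Passing the Scanner into a method keeps the loop independent of where the text comes from, which makes it easy to try the loop on a short string before pointing it at the real file.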
The HashMap is there to help you map strings to enumerations. Classify each token from the line, create a Token object for it, and add it to the Vector.
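One way to map strings to enumeration constants is to load the map from values() and name(). The sketch below uses a cut-down, hypothetical enum called MiniType in place of TokenType so that it stays short. The class name OpcodeMapDemo, the helper buildMap(), and the idea of registering the comma by hand are likewise assumptions for illustration.

```java
import java.util.HashMap;

// Hypothetical demo class; not part of the project skeleton.
class OpcodeMapDemo {
    // Cut-down, hypothetical stand-in for TokenType.
    enum MiniType { LDA, STA, INX, COMMA }

    // Map each constant's name to the constant itself, then add
    // the punctuation character that stands for COMMA by hand.
    static HashMap<String, MiniType> buildMap() {
        HashMap<String, MiniType> map = new HashMap<>();
        for (MiniType t : MiniType.values()) {
            map.put(t.name(), t);
        }
        map.put(",", MiniType.COMMA);
        return map;
    }

    public static void main(String[] args) {
        HashMap<String, MiniType> map = buildMap();
        System.out.println(map.get("LDA")); // prints LDA
        System.out.println(map.get(","));   // prints COMMA
        System.out.println(map.get("$10")); // prints null
    }
}
```

A lookup that returns null tells you the string is not a known word, so it has to be identified some other way, for example as a number.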
When the file is exhausted, iterate over the Vector and write the objects to the screen.
Test Program
JSR $0606
LDX #$02
LDA $10 , X
CMP $10
BNE $06b6
LDA $11 , X
CMP $11
BEQ $06bf
INX
INX
CPX $03
BEQ $06c2
JMP $06aa
JMP $0735
RTS
LDX $03
DEX
TXA
LDA $10 , X
STA $12 , X
DEX
BPL $06c7
LDA $02
LSR A
BCS $06dc
LSR A
BCS $06ef
LSR A
BCS $06f8
LSR A
BCS $070b
LDA $10
SEC
SBC #$20
STA $10
BCC $06e6
RTS
DEC $11
LDA #$01
CMP $11
BEQ $0716
RTS
INC $10
LDA #$1f
BIT $10
BEQ $0716
RTS
LDA $10
CLC
ADC #$20
STA $10
BCS $0702
RTS
INC $11
LDA #$06
CMP $11
BEQ $0716
RTS
DEC $10
LDA $10
AND #$1f
CMP #$1f
BEQ $0716
RTS
JMP $0735
LDY #$00
LDA $fe
STA ( $00 ) , Y
RTS
LDX $03
LDA #$00
STA ( $10 , X )
LDX #$00
LDA #$01
STA ( $10 , X )
RTS
LDX #$00
NOP
NOP
DEX
BNE $072f
RTS
LDX $03
DEX
TXA
LDA $10 , X
STA $12 , X
DEX
BPL $06c7
LDA $02
LSR A
BCS $06dc
LSR A
BCS $06ef
LSR A
BCS $06f8
LSR A
BCS $070b
LDA $10
SEC
SBC #$20
STA $10
BCC $06e6
RTS
Your output will go to the screen. Since we are using a record and the default output is messy (something like Token[type=JSR, value=$0606]), you will override the record's toString() method so that the output looks more like the listing below.
Token type JSR
Token type ADDRESS, value = $0606
Token type LDX
Token type CONST, value = #$02
Token type LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type CMP
Token type ADDRESS, value = $10
Token type BNE
Token type ADDRESS, value = $06b6
Token type LDA
Token type ADDRESS, value = $11
Token type COMMA
Token type REGISTER, value = X
Token type CMP
Token type ADDRESS, value = $11
Token type BEQ
Token type ADDRESS, value = $06bf
Token type INX
Token type INX
Token type CPX
Token type ADDRESS, value = $03
Token type BEQ
Token type ADDRESS, value = $06c2
Token type JMP
Token type ADDRESS, value = $06aa
Token type JMP
Token type ADDRESS, value = $0735
Token type RTS
Token type LDX
Token type ADDRESS, value = $03
Token type DEX
Token type TXA
Token type LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type STA
Token type ADDRESS, value = $12
Token type COMMA
Token type REGISTER, value = X
Token type DEX
Token type BPL
Token type ADDRESS, value = $06c7
Token type LDA
Token type ADDRESS, value = $02
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $06dc
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $06ef
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $06f8
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $070b
Token type LDA
Token type ADDRESS, value = $10
Token type SEC
Token type SBC
Token type CONST, value = #$20
Token type STA
Token type ADDRESS, value = $10
Token type BCC
Token type ADDRESS, value = $06e6
Token type RTS
Token type DEC
Token type ADDRESS, value = $11
Token type LDA
Token type CONST, value = #$01
Token type CMP
Token type ADDRESS, value = $11
Token type BEQ
Token type ADDRESS, value = $0716
Token type RTS
Token type INC
Token type ADDRESS, value = $10
Token type LDA
Token type CONST, value = #$1f
Token type BIT
Token type ADDRESS, value = $10
Token type BEQ
Token type ADDRESS, value = $0716
Token type RTS
Token type LDA
Token type ADDRESS, value = $10
Token type CLC
Token type ADC
Token type CONST, value = #$20
Token type STA
Token type ADDRESS, value = $10
Token type BCS
Token type ADDRESS, value = $0702
Token type RTS
Token type INC
Token type ADDRESS, value = $11
Token type LDA
Token type CONST, value = #$06
Token type CMP
Token type ADDRESS, value = $11
Token type BEQ
Token type ADDRESS, value = $0716
Token type RTS
Token type DEC
Token type ADDRESS, value = $10
Token type LDA
Token type ADDRESS, value = $10
Token type AND
Token type CONST, value = #$1f
Token type CMP
Token type CONST, value = #$1f
Token type BEQ
Token type ADDRESS, value = $0716
Token type RTS
Token type JMP
Token type ADDRESS, value = $0735
Token type LDY
Token type CONST, value = #$00
Token type LDA
Token type ADDRESS, value = $fe
Token type STA
Token type LPAREN
Token type ADDRESS, value = $00
Token type RPAREN
Token type COMMA
Token type REGISTER, value = Y
Token type RTS
Token type LDX
Token type ADDRESS, value = $03
Token type LDA
Token type CONST, value = #$00
Token type STA
Token type LPAREN
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type RPAREN
Token type LDX
Token type CONST, value = #$00
Token type LDA
Token type CONST, value = #$01
Token type STA
Token type LPAREN
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type RPAREN
Token type RTS
Token type LDX
Token type CONST, value = #$00
Token type NOP
Token type NOP
Token type DEX
Token type BNE
Token type ADDRESS, value = $072f
Token type RTS
Token type LDX
Token type ADDRESS, value = $03
Token type DEX
Token type TXA
Token type LDA
Token type ADDRESS, value = $10
Token type COMMA
Token type REGISTER, value = X
Token type STA
Token type ADDRESS, value = $12
Token type COMMA
Token type REGISTER, value = X
Token type DEX
Token type BPL
Token type ADDRESS, value = $06c7
Token type LDA
Token type ADDRESS, value = $02
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $06dc
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $06ef
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $06f8
Token type LSR
Token type REGISTER, value = A
Token type BCS
Token type ADDRESS, value = $070b
Token type LDA
Token type ADDRESS, value = $10
Token type SEC
Token type SBC
Token type CONST, value = #$20
Token type STA
Token type ADDRESS, value = $10
Token type BCC
Token type ADDRESS, value = $06e6
Token type RTS
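A record's generated toString() can be replaced like any other method. The sketch below produces the two line shapes seen in the expected output. It uses a hypothetical two-constant enum named Kind in place of TokenType, and it assumes that tokens printed without a value (such as instructions) are stored with a null or empty value; the handout does not say how those tokens are stored.

```java
// Hypothetical demo class; not part of the project skeleton.
class TokenToStringDemo {
    // Hypothetical two-constant stand-in for TokenType.
    enum Kind { JSR, ADDRESS }

    record Token(Kind type, String value) {
        @Override
        public String toString() {
            // Assumption: a token with nothing to show has a null or empty value.
            if (value == null || value.isEmpty()) {
                return "Token type " + type;
            }
            return "Token type " + type + ", value = " + value;
        }
    }

    public static void main(String[] args) {
        System.out.println(new Token(Kind.JSR, null));
        // prints Token type JSR
        System.out.println(new Token(Kind.ADDRESS, "$0606"));
        // prints Token type ADDRESS, value = $0606
    }
}
```

Since println() calls toString() for you, printing each element of the Vector is enough once the override is in place.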
Submit the project to the Learning Management System as Project2_lastname.java.