CISS-110 Project 5
For this project, you will continue to work with files.
Learning outcomes
- Implementing user-defined methods.
- Working with regular expressions.
- Working with files.
- Using Scanner with files.
- Confirming that the program produces the desired results.
The input file will resemble a source code program, and you will tokenize its contents. Simply put, tokenizing means recognizing each component (token) of a source program and classifying it. Each token in the file is surrounded by whitespace.
Create a set of methods to identify punctuation, integer constants, and so on, based on the table below. Each could be an “is” method, such as isPunct, isIntegral, isDouble, isReserved, or isIdentifier. Each of these will take a token of type String and return a boolean.
In addition to the methods noted above, create another method called classify that takes a String token and prints its classification and value. Example output is provided at the end.
Using the matches() method of the String class, you will be able to identify specific patterns in strings. This will be discussed during the lecture.
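A quick illustration of how matches() behaves, as a minimal sketch (the class name MatchesDemo is hypothetical). It returns true only when the entire string fits the pattern, not just part of it. This is also why a two-character token such as && fails the single-character \p{Punct} pattern. Note that every regex backslash must be doubled inside a Java string literal, so \d+ is written "\\d+".

```java
class MatchesDemo {
    public static void main(String[] args) {
        // matches() succeeds only if the WHOLE string fits the pattern.
        System.out.println("100".matches("\\d+"));           // true
        System.out.println("x100".matches("\\d+"));          // false: the leading x is not a digit
        System.out.println("3.14".matches("\\d+\\.\\d+"));  // true
        // \p{Punct} matches exactly one punctuation character...
        System.out.println("(".matches("\\p{Punct}"));       // true
        // ...so a two-character token such as && does not match it.
        System.out.println("&&".matches("\\p{Punct}"));      // false
    }
}
```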
The following patterns are provided for identification:
Punctuation -> \p{Punct}
Operator -> !=|<|>|<=|>=|==|\+\+|--|\+|-|/|\*|%|=
Integral -> \d+
Floating -> \d+\.\d+
Identifier -> [a-zA-Z]+[0-9a-zA-Z_]*
Reserved -> int|double|long|bool|true|false|if|else|void|while|for|return
Literal -> \".*\"
Directive -> #[a-zA-Z]+
DFile -> <[a-zA-Z]+\.h>
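One possible way to turn the table into “is” methods and a classify method, sketched under a few assumptions: the class name Classifier, the helper label(), and the FLOATCONST label are hypothetical (the sample output never shows a floating-point token), and the single space between label and value is a guess at the output format. The order of the checks matters because several tokens match more than one pattern: int is both reserved and an identifier, and = and % are both operators and punctuation. Checking in the order below should give the labels shown in the sample output, including UNKNOWN for &&.

```java
class Classifier {
    // Patterns from the table, with each backslash doubled for a Java string literal.
    static boolean isPunct(String t)      { return t.matches("\\p{Punct}"); }
    static boolean isOperator(String t)   { return t.matches("!=|<|>|<=|>=|==|\\+\\+|--|\\+|-|/|\\*|%|="); }
    static boolean isIntegral(String t)   { return t.matches("\\d+"); }
    static boolean isDouble(String t)     { return t.matches("\\d+\\.\\d+"); }
    static boolean isIdentifier(String t) { return t.matches("[a-zA-Z]+[0-9a-zA-Z_]*"); }
    static boolean isReserved(String t) {
        return t.matches("int|double|long|bool|true|false|if|else|void|while|for|return");
    }
    static boolean isLiteral(String t)    { return t.matches("\".*\""); }
    static boolean isDirective(String t)  { return t.matches("#[a-zA-Z]+"); }
    static boolean isDFile(String t)      { return t.matches("<[a-zA-Z]+\\.h>"); }

    // Hypothetical helper: returns the label for a token. Order matters.
    static String label(String t) {
        if (isDirective(t))  return "DIRECTIVE";
        if (isDFile(t))      return "DFILE";
        if (isReserved(t))   return "RESERVED";   // before isIdentifier: "int" matches both
        if (isIdentifier(t)) return "IDENT";
        if (isOperator(t))   return "OPERATOR";   // before isPunct: "=" and "%" match both
        if (isPunct(t))      return "PUNCT";
        if (isDouble(t))     return "FLOATCONST"; // label name is an assumption
        if (isIntegral(t))   return "INTCONST";
        if (isLiteral(t))    return "SLITERAL";
        return "UNKNOWN";                         // e.g. "&&", which no pattern matches
    }

    // Prints the classification and value, e.g. "RESERVED int".
    static void classify(String t) {
        System.out.println(label(t) + " " + t);
    }

    public static void main(String[] args) {
        String[] samples = {"#include", "<stdio.h>", "int", "=", "%", "&&", "\"%d\\n\""};
        for (String t : samples) {
            classify(t);
        }
    }
}
```

Running main should print DIRECTIVE #include, DFILE <stdio.h>, RESERVED int, OPERATOR =, OPERATOR %, UNKNOWN &&, and SLITERAL "%d\n", one per line.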
Use the following test data for your input file:
#include <stdio.h>
#include <stdbool.h>
int main ( void ) {
bool prime ;
for ( int number = 2 ; number <= 100 ; number ++ ) {
if ( number % 2 != 0 )
prime = true ;
else
prime = false ;
for ( int range = 3 ; prime && range < number ; range = range + 2 ) {
if ( number % range == 0 )
prime = false ;
}
if ( prime )
printf ( "%d\n" , number ) ;
}
return 0 ;
}
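A minimal sketch of reading the tokens from the input file with Scanner, assuming the data above is saved as input.txt (a hypothetical name; another path can be passed as the first command-line argument). The class ReadTokens and method readTokens are illustrative names. Scanner's default delimiter is whitespace, so next() returns exactly one token at a time, which is why the test data puts spaces around every token.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReadTokens {
    // Reads every whitespace-separated token from the given file.
    static List<String> readTokens(File file) throws FileNotFoundException {
        List<String> tokens = new ArrayList<>();
        // try-with-resources closes the Scanner (and the file) automatically.
        try (Scanner inFile = new Scanner(file)) {
            while (inFile.hasNext()) {
                tokens.add(inFile.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "input.txt" is an assumed default file name.
        String path = args.length > 0 ? args[0] : "input.txt";
        for (String token : readTokens(new File(path))) {
            System.out.println(token); // in the project, call classify(token) here instead
        }
    }
}
```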
Your program will produce the following output to the screen:
DIRECTIVE #include
DFILE <stdio.h>
DIRECTIVE #include
DFILE <stdbool.h>
RESERVED int
IDENT main
PUNCT (
RESERVED void
PUNCT )
PUNCT {
RESERVED bool
IDENT prime
PUNCT ;
RESERVED for
PUNCT (
RESERVED int
IDENT number
OPERATOR =
INTCONST 2
PUNCT ;
IDENT number
OPERATOR <=
INTCONST 100
PUNCT ;
IDENT number
OPERATOR ++
PUNCT )
PUNCT {
RESERVED if
PUNCT (
IDENT number
OPERATOR %
INTCONST 2
OPERATOR !=
INTCONST 0
PUNCT )
IDENT prime
OPERATOR =
RESERVED true
PUNCT ;
RESERVED else
IDENT prime
OPERATOR =
RESERVED false
PUNCT ;
RESERVED for
PUNCT (
RESERVED int
IDENT range
OPERATOR =
INTCONST 3
PUNCT ;
IDENT prime
UNKNOWN &&
IDENT range
OPERATOR <
IDENT number
PUNCT ;
IDENT range
OPERATOR =
IDENT range
OPERATOR +
INTCONST 2
PUNCT )
PUNCT {
RESERVED if
PUNCT (
IDENT number
OPERATOR %
IDENT range
OPERATOR ==
INTCONST 0
PUNCT )
IDENT prime
OPERATOR =
RESERVED false
PUNCT ;
PUNCT }
RESERVED if
PUNCT (
IDENT prime
PUNCT )
IDENT printf
PUNCT (
SLITERAL "%d\n"
PUNCT ,
IDENT number
PUNCT )
PUNCT ;
PUNCT }
RESERVED return
INTCONST 0
PUNCT ;
PUNCT }
Submit the project to the Learning Management System as Project5_lastname.java.
EXTRA CREDIT! (+5 points!)
Modify the project to parse the input more directly. Switch to reading whole lines and processing each line character by character, building tokens based on where the category of text changes (whitespace, alphabetic, numeric, etc.).
Create three additional class-level (static) variables:
private static String line;
private static int start, current;
Now create these suggested additional methods:
private static String string() {}
private static String ident() {}
private static String number() {}
private static String op() {}
With these methods, directly manipulate the start and current variables, using line as your guide, to find the beginning and end of each token. These methods are responsible for carving out the piece you need to identify. Before leaving each method, be sure to adjust the value of start so you can begin finding the next token.
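A sketch of the four suggested helpers, assuming the caller has already positioned start on the first character of a token (whitespace skipping is left to main()). Each helper advances current to one past the end of its token, cuts line.substring(start, current), and then moves start up to current. The shared take() helper, the next() dispatcher, and the tokenAt() test hook are hypothetical additions, and the set of two-character operators that op() recognizes (<=, >=, !=, ++, --) is an assumption. With that set, == comes out as two = tokens and && as two & tokens, as in the extra-credit sample output.

```java
class Carver {
    // Shared state suggested by the assignment: the current line and two cursors.
    private static String line;
    private static int start, current;

    // Letters, digits, and underscores, e.g. "stdio" or "number".
    private static String ident() {
        current = start;
        while (current < line.length()
                && (Character.isLetterOrDigit(line.charAt(current)) || line.charAt(current) == '_')) {
            current++;
        }
        return take();
    }

    // Digits, optionally followed by '.' and more digits, e.g. "100" or "3.14".
    private static String number() {
        current = start;
        while (current < line.length() && Character.isDigit(line.charAt(current))) {
            current++;
        }
        if (current + 1 < line.length() && line.charAt(current) == '.'
                && Character.isDigit(line.charAt(current + 1))) {
            current++; // consume the '.'
            while (current < line.length() && Character.isDigit(line.charAt(current))) {
                current++;
            }
        }
        return take();
    }

    // Opening quote through closing quote, e.g. "%d\n" (escaped quotes are not handled).
    private static String string() {
        current = start + 1; // skip the opening quote
        while (current < line.length() && line.charAt(current) != '"') {
            current++;
        }
        current = Math.min(current + 1, line.length()); // include the closing quote if present
        return take();
    }

    // One punctuation character, or two for the assumed pairs <= >= != ++ --.
    private static String op() {
        current = start + 1;
        if (current < line.length()
                && line.substring(start, current + 1).matches("<=|>=|!=|\\+\\+|--")) {
            current++;
        }
        return take();
    }

    // Cuts line[start, current) and moves start past it, ready for the next token.
    private static String take() {
        String token = line.substring(start, current);
        start = current;
        return token;
    }

    // Hypothetical dispatcher: picks a helper from the character at start.
    static String next() {
        char c = line.charAt(start);
        if (Character.isLetter(c)) return ident();
        if (Character.isDigit(c)) return number();
        if (c == '"') return string();
        return op();
    }

    // Hypothetical test hook: carve the token that begins at index 'at' of 'text'.
    static String tokenAt(String text, int at) {
        line = text;
        start = at;
        return next();
    }

    public static void main(String[] args) {
        line = "number<=100";
        start = 0;
        while (start < line.length()) {
            System.out.println(next()); // prints number, <=, 100 on separate lines
        }
    }
}
```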
Without giving away too much about the solution, it's recommended to restructure your main() code to resemble nested while loops:
while (inFile.hasNext()) {
line = inFile.nextLine();
start = 0;
while (start < line.length()) {
// YOUR CODE HERE
classify(t);
}
}
Keep in mind that you are now responsible for ignoring the whitespace characters since you can no longer rely on Scanner to do it for you.
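A compact sketch of the nested loops with whitespace skipping, following the category-change idea described at the start of the extra credit. To stay self-contained it carves tokens inline rather than calling the four helpers, and it is deliberately simpler than the full assignment: a letter starts a word (letters, digits, underscores), a digit starts a number (digits only, no decimal point), a double quote starts a string literal, and every other character becomes its own one-character token, so <= is emitted as < and =. It collects tokens into a list where the project would call classify(t). The class LineTokenizer and method tokenize are hypothetical names, and pairing hasNextLine() with nextLine() is a small departure from the skeleton's hasNext().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LineTokenizer {
    private static String line;
    private static int start, current;

    // Splits every line read from 'in' into tokens using nested while loops.
    static List<String> tokenize(Scanner in) {
        List<String> tokens = new ArrayList<>();
        while (in.hasNextLine()) {
            line = in.nextLine();
            start = 0;
            while (start < line.length()) {
                char c = line.charAt(start);
                if (Character.isWhitespace(c)) { // Scanner no longer skips whitespace for us
                    start++;
                    continue;
                }
                current = start + 1;
                if (c == '"') {
                    // String literal: run to the closing quote (escaped quotes not handled).
                    while (current < line.length() && line.charAt(current) != '"') current++;
                    current = Math.min(current + 1, line.length());
                } else if (Character.isLetter(c)) {
                    // Word: extend while the category stays letter/digit/underscore.
                    while (current < line.length()
                            && (Character.isLetterOrDigit(line.charAt(current)) || line.charAt(current) == '_')) {
                        current++;
                    }
                } else if (Character.isDigit(c)) {
                    // Number: extend while the category stays numeric.
                    while (current < line.length() && Character.isDigit(line.charAt(current))) current++;
                }
                // Any other character is a one-character token (current == start + 1).
                String t = line.substring(start, current);
                start = current;
                tokens.add(t); // the project would call classify(t) here
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("int main (void) {\n    bool prime;\n");
        System.out.println(tokenize(in)); // [int, main, (, void, ), {, bool, prime, ;]
    }
}
```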
Here is the new sample data file:
#include <stdio.h>
#include <stdbool.h>
int main (void) {
bool prime;
for (int number=2; number<=100; number++) {
if (number % 2 != 0)
prime = true;
else
prime = false;
for (int range = 3; prime && range < number; range = range + 2) {
if (number % range == 0)
prime = false;
}
if (prime)
printf("%d\n" ,number) ;
}
return 0;
}
The output will look a little different than before:
PUNCT #
IDENT include
OPERATOR <
IDENT stdio
PUNCT .
IDENT h
OPERATOR >
PUNCT #
IDENT include
OPERATOR <
IDENT stdbool
PUNCT .
IDENT h
OPERATOR >
RESERVED int
IDENT main
PUNCT (
RESERVED void
PUNCT )
PUNCT {
RESERVED bool
IDENT prime
PUNCT ;
RESERVED for
PUNCT (
RESERVED int
IDENT number
OPERATOR =
INTCONST 2
PUNCT ;
IDENT number
OPERATOR <=
INTCONST 100
PUNCT ;
IDENT number
OPERATOR ++
PUNCT )
PUNCT {
RESERVED if
PUNCT (
IDENT number
OPERATOR %
INTCONST 2
OPERATOR !=
INTCONST 0
PUNCT )
IDENT prime
OPERATOR =
RESERVED true
PUNCT ;
RESERVED else
IDENT prime
OPERATOR =
RESERVED false
PUNCT ;
RESERVED for
PUNCT (
RESERVED int
IDENT range
OPERATOR =
INTCONST 3
PUNCT ;
IDENT prime
PUNCT &
PUNCT &
IDENT range
OPERATOR <
IDENT number
PUNCT ;
IDENT range
OPERATOR =
IDENT range
OPERATOR +
INTCONST 2
PUNCT )
PUNCT {
RESERVED if
PUNCT (
IDENT number
OPERATOR %
IDENT range
OPERATOR =
OPERATOR =
INTCONST 0
PUNCT )
IDENT prime
OPERATOR =
RESERVED false
PUNCT ;
PUNCT }
RESERVED if
PUNCT (
IDENT prime
PUNCT )
IDENT printf
PUNCT (
SLITERAL "%d\n"
PUNCT ,
IDENT number
PUNCT )
PUNCT ;
PUNCT }
RESERVED return
INTCONST 0
PUNCT ;
PUNCT }