CISS-111 Project 3
Write a Java program to demonstrate using bitmaps and bitwise operators to sort and remove duplicates from a file of random phone numbers. Do not confuse the term bitmap used for compressing data into smaller spaces with the bitmap that has come to mean a graphic image.
Learning outcomes
- Working with command line arguments.
- Exposure to rudimentary bitwise operators.
- Introduction to data (de)compression concepts.
- Working with side-effects to achieve design goals.
- Confirmation program produces desired results.
You will use this file of 4 million random phone numbers with possible repeats passed in as args[0]
. In addition, you will produce an output filename provided in args[1]
, into which you will write the sorted list of phone numbers without duplicates.
Each phone number begins with the digit 2 or greater, so phone numbers are 2000000 through 9999999. There are no area codes or country code prefixes to deal with. The file looks something like this:
9827019 5655875 2593305 8586163 6115967 9128969 9998369 8551496 ...
Here are some declarations to get you started:
Scanner inFile = new Scanner(new FileReader(args[0]));
PrintWriter outFile = new PrintWriter(args[1]);
byte[] bitmap = new byte[1000000];
You will use a bitmap as an array of 1,000,000 bytes (that is, the type byte
). Each bit in the array of one million bytes will represent one phone number. The bit will be turned on if the number is read from the file. Those remaining bits will indicate the number was never found in the file.
You will want to review Bitwise Operators.
Using this bitmap data structure, we depend on several side effects:
- This takes about 1/32nd the amount of space than the original file and also if stored as
int
s. - We will guarantee no repeats in the resultant file because we only know if we saw the number.
- We automatically sort the data since values are represented in ascending bits.
Process all values before writing to the output file.
Your output file will have all the found numbers with no duplicates in ascending, sorted order.
As a guide, pn.txt
is provided above and is 32,000,000 bytes. Your result file should be 25,195,720 bytes on Mac/Unix or 28,345,185 in Windows/DOS. Values other than this indicate an error in processing, and you may want to consider a simpler file for diagnosis.
HINT: Use / to determine byte and % to determine bit within the byte. Bitwise or (|) should be used to set the bit, and bitwise and (&) should be used to check a bit.
Submit the project to the Learning Management System as Project3_lastname.java.