Converting Strings – Programming by Design

(Updated November 19, 2024)

Overview

We often have data in a form that is not immediately useful. If we need to do mathematical calculations and all we have is a string, we must first convert that string to a number. Here, we explore some of the finer details of how this can be done by examining the algorithm to convert from String to int.

Existing Tools

In Java, we would use Integer.parseInt(). This method takes a String argument, plus an optional radix (base) as a second argument, and returns a 32-bit int.

int x = Integer.parseInt("4302");

int y = Integer.parseInt("10101101", 2);

In C, we would use atoi() or strtol(), both declared in <stdlib.h>.

int x = atoi("4302");

long y = strtol("10101101", NULL, 2);

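Note that these examples do no data validation. Integer.parseInt() throws a NumberFormatException on malformed input, atoi() gives the caller no way to detect an error, and strtol() reports problems only through its end pointer and errno, which the example above ignores by passing NULL.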
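To make the validation point concrete, here is a minimal Java sketch of how a caller might guard the library call. The class name ParseCheck and the helper parseOrDefault are hypothetical names chosen for illustration; Integer.parseInt(String, int) and NumberFormatException are standard Java.

```java
public class ParseCheck {
    // Hypothetical helper: returns the fallback value when the text
    // is not a valid int in the given radix.
    static int parseOrDefault(String text, int radix, int fallback) {
        try {
            return Integer.parseInt(text, radix);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrDefault("4302", 10, -1));     // valid decimal
        System.out.println(parseOrDefault("10101101", 2, -1));  // valid binary
        System.out.println(parseOrDefault("43x2", 10, -1));     // invalid digit, falls back
    }
}
```

Returning a fallback is only one possible policy; a real program might prefer to report the error to the user instead.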

The Algorithm

First, we must understand that the string cannot be used in arithmetic calculations. It’s a group of characters.

Converting a string to a number is a straightforward process. We look at each character in the string and “add” it to a running total. Of course, adding that character requires that we convert it from char to int.

Knowing the ASCII table can help us here. For example, ‘4’ has a value of 52 – not very helpful. We need an integer of 4, not the char ‘4’. However, with some simple subtraction, we can get there. Since ‘0’ is 48, we can do the following:

'4' - '0' = 4
52  - 48  = 4

Yes, we can subtract the chars, and we get a legitimate value that can be used in the calculation. Further, if we work from the left end of a decimal string and multiply the running total by 10 before adding each new digit, the earlier digits shift one place to the left, and the total arrives at the final value.
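The subtraction can be tried directly in Java. This is a small sketch, not part of the programs that follow; the class name DigitValue and the helper digitValue are hypothetical, and the helper assumes its argument is one of the characters '0' through '9'.

```java
public class DigitValue {
    // Hypothetical helper: maps '0'..'9' to 0..9 by subtracting the code of '0'.
    // Assumes the argument really is a decimal digit.
    static int digitValue(char c) {
        return c - '0';
    }

    public static void main(String[] args) {
        System.out.println((int) '4');        // character code: 52
        System.out.println((int) '0');        // character code: 48
        System.out.println(digitValue('4'));  // numeric value: 4
    }
}
```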

Let's convert "4302":

let sum = 0

    '4' - '0' = 4

sum = sum * 10 + 4, the result is 0 * 10 + 4 or 4.

    '3' - '0' = 3

sum = sum * 10 + 3, the result is 4 * 10 + 3 or 43.

    '0' - '0' = 0

sum = sum * 10 + 0, the result is 43 * 10 + 0 or 430.

    '2' - '0' = 2

sum = sum * 10 + 2, the result is 430 * 10 + 2 or 4302.
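The trace above can be reproduced with a short Java sketch that prints every step. The class name TraceConvert and the method convertWithTrace are hypothetical, and the method assumes the string holds only decimal digits and that the result fits in an int.

```java
public class TraceConvert {
    // Hypothetical helper: converts a string of decimal digits,
    // printing each update of the running total.
    static int convertWithTrace(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            int digit = s.charAt(i) - '0';
            int next = sum * 10 + digit;
            System.out.println(sum + " * 10 + " + digit + " = " + next);
            sum = next;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(convertWithTrace("4302"));
    }
}
```

For "4302", the printed lines match the four steps worked by hand: 4, 43, 430, and finally 4302.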

These mechanical steps now need to be codified. In the next section, we see how we can accomplish that in Java, C, and assembly language.

The Code

Java Version

In Java, we use a for loop and the length() method to determine when to stop. We access each character with charAt().

Throughout the examples, both the decimal and the binary strings are converted and then added together to show that the conversions produced usable integers. The expected values are 4302, 173, and their sum, 4475.

Convert.java

public class Convert {
    public static void main(String[] args) {

        String b10 = "4302";
        String b2 = "10101101";
        int x, num1, num2;

        // convert the decimal string by multiplying the running total by 10
        num1 = 0;
        for (x = 0; x < b10.length(); x++)
            num1 = num1 * 10 + (b10.charAt(x) - '0');

        System.out.println("Converted string \"" + b10 + "\" to " + num1);

        // convert the binary string using a left shift (multiply by 2)
        num2 = 0;
        for (x = 0; x < b2.length(); x++)
            num2 = (num2 << 1) + (b2.charAt(x) - '0');

        System.out.println("Converted string \"" + b2 + "\" to " + num2);

        System.out.println(num1 + " + " + num2 + " = " + (num1+num2));
    }
}
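The two loops in Convert.java are the same algorithm with a different base, since shifting left by one bit is multiplying by 2. As a sketch of that observation, the hypothetical method below takes the radix as a parameter. It assumes a radix from 2 to 10, digits that are valid for that radix, and a result that fits in an int; it performs no validation.

```java
public class RadixConvert {
    // Hypothetical helper: converts a digit string in the given radix (2..10).
    // Assumes every character is a valid digit for that radix.
    static int convert(String s, int radix) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            total = total * radix + (s.charAt(i) - '0');
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(convert("4302", 10));     // decimal
        System.out.println(convert("10101101", 2));  // binary
        System.out.println(convert("255", 8));       // octal, same value as the binary string
    }
}
```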

C Version

Now, we provide a C version of the algorithm. Note that the while loop relies on C treating any non-zero value as true. A C string ends with the null character ('\0'), which has the value zero, so the loop keeps running for as long as there are characters left in the string.

convert.c

#include <stdio.h>

int main(void) {

    const char *b10 = "4302";
    const char *b2 = "10101101";
    int x, num1, num2;

    // convert number to int by multiplying by 10
    num1 = 0;
    x = 0;
    while( b10[x] ) {
        num1 = num1 * 10 + (b10[x] - '0');
        x++;
    }
    printf("Converted string \"%s\" to %d\n", b10, num1);

    // convert the binary string using a left shift (multiply by 2)
    num2 = 0;
    x = 0;
    while ( b2[x] ) {
        num2 = (num2 << 1) + (b2[x] - '0');
        x++;
    }
    printf("Converted string \"%s\" to %d\n", b2, num2);

    printf("%d + %d = %d\n", num1, num2, num1+num2);
}

Assembler Version

Here, we have an assembly language version of the solution for the 6502/6510 CPU. This code is written specifically for a Commodore 64 and should run in any Commodore 64 emulator.

There is much to unpack here. The accumulator (A register) is doing all of the math. The X register is our index into the string, and the Y register is used for counting down the repeated addition for the base-10 conversion. Each number is stored as a 16-bit word, low byte first, so every addition is done in two steps, with the carry passed from the low byte to the high byte.

Historical Note

There is no multiplication instruction on this CPU. Multiplication is achieved through repeated addition, and division through repeated subtraction.
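To illustrate multiplying without a multiply instruction, here is a small Java sketch. The first method follows the repeated-addition approach that convert.asm uses: keep a copy of the value and add it nine more times. The second method is an alternative that the assembly listing does not use, relying on the fact that 10x = 8x + 2x. The class and method names are hypothetical.

```java
public class NoMultiply {
    // Multiply by 10 using repeated addition:
    // start with one copy of the value, then add it nine more times.
    static int timesTenByAddition(int value) {
        int result = value;
        for (int count = 9; count != 0; count--) {
            result += value;
        }
        return result;
    }

    // Alternative (not used in convert.asm): 10x = 8x + 2x, using shifts.
    static int timesTenByShifts(int value) {
        return (value << 3) + (value << 1);
    }

    public static void main(String[] args) {
        System.out.println(timesTenByAddition(430));
        System.out.println(timesTenByShifts(430));
    }
}
```

Both calls in main produce 4300, the value of 430 multiplied by 10.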

convert.asm

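define linprt $bdcd   ; BASIC ROM: print 16-bit unsigned int (X = low, A = high)
define chrout $ffd2   ; KERNAL: print the character in A

        lda #147        ; PETSCII 147 clears the screen
        jsr chrout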

; convert decimal
one:    ldx #0
p1:     lda ns1,x
        beq print1
        ; multiply by 10: first copy the current value to tmp
        lda num1
        sta tmp
        lda num1+1
        sta tmp+1
        ; add the copy 9 times, giving 10 times the original value
        ldy #9
more:   clc
        lda tmp
        adc num1
        sta num1
        lda tmp+1
        adc num1+1
        sta num1+1
        dey
        bne more
        ; add new digit
        lda ns1,x
        ; subtract the '0' from the digit
        sec
        sbc #$30
        clc
        adc num1
        sta num1
        lda #0
        adc num1+1
        sta num1+1
        inx
        jmp p1

print1: ldx num1
        lda num1+1
        jsr linprt
        lda #13         ; CR
        jsr chrout

; convert binary
two:    ldx #0
p2:     lda ns2,x
        beq print2
        ; subtract the '0' from the digit
        sec
        sbc #$30
        ; multiply by 2
        asl num2
        rol num2+1
        ; add the new digit
        clc
        adc num2
        sta num2
        lda #0
        adc num2+1
        sta num2+1
        inx
        jmp p2

print2: ldx num2
        lda num2+1
        jsr linprt
        lda #13         ; CR
        jsr chrout

add:    clc
        lda num1
        adc num2
        sta sum
        lda num1+1
        adc num2+1
        sta sum+1

print3: ldx sum
        lda sum+1
        jsr linprt
        lda #13         ; CR
        jsr chrout

        ; and we're done!
end:    rts

; data area

ns1:    txt "4302"
        dcb 0
ns2:    txt "10101101"
        dcb 0

num1:   dcw 0
num2:   dcw 0
tmp:    dcw 0
sum:    dcw 0
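
The two-step additions in convert.asm (low byte first, then high byte plus carry) can be modeled in Java. This is only a sketch of the idea, not a 6502 emulator; the class name TwoByteAdd and the method add16 are hypothetical.

```java
public class TwoByteAdd {
    // Adds two 16-bit values one byte at a time, low byte first,
    // passing the carry into the high-byte addition.
    static int add16(int a, int b) {
        int lowSum = (a & 0xFF) + (b & 0xFF);
        int carry = lowSum >> 8;  // 1 if the low bytes overflowed, else 0
        int highSum = ((a >> 8) & 0xFF) + ((b >> 8) & 0xFF) + carry;
        // Keep 8 bits of each half; the result wraps at 65536.
        return ((highSum & 0xFF) << 8) | (lowSum & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(add16(4302, 173));
    }
}
```

For 4302 and 173, the low bytes ($CE and $AD) overflow, and the carry raises the high byte from $10 to $11, giving $117B, which is 4475.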