(Updated February 2, 2025)
Overview
As we build the program to score and print a series of marks representing a complete game, we should also consider how to check whether the string of marks is legitimate. The simplest (Ha ha! Right, simplest, he says!) way is to use a regular expression (regex). This short document explains a straightforward but incorrect approach, followed by a much more precise (hopefully complete) version.
The First Regex
Ok, we begin with a couple of links that will take you to one of the best regex tools on the Internet: regex101.com!
The first is a very easy regex. It goes like this:
[\-0-9xX\/]{12,21}
The square brackets mark a set. This lets us say which characters can be used in one or more positions.
For this set, we allow the characters -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, /, x, and X. Two of them are escaped with a backslash (\) to mark them as literal characters. Inside a set, the dash (-) denotes a range (as in 0-9), and the forward slash (/) is the pattern delimiter in flavors such as JavaScript and PCRE, which regex101.com displays between / delimiters.
Uppercase and lowercase X are self-explanatory: either one marks a strike. The 0-9 specifies a range. So, now you understand why the dash (-) needed to be escaped. This range expression includes every character from 0 through 9 inclusive. The shorthand is very convenient.
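If you want to see the escaping rules in action, here’s a minimal Python sketch (Python’s `re` module is our choice for illustration; the article doesn’t say which language the scoring program uses). It collects every printable ASCII character the set accepts and compares the escaped set with a hypothetical unescaped one, `[-0-9xX/]`. In Python, a dash placed first in a set is already literal and a slash needs no escape, so both sets should accept the same characters.

```python
import re
import string

# The character set from the first regex, with the dash and slash escaped.
ESCAPED = re.compile(r"[\-0-9xX\/]")

# Hypothetical unescaped variant: a dash placed first in a set is literal,
# and Python patterns are not delimited by slashes, so / needs no escape.
UNESCAPED = re.compile(r"[-0-9xX/]")


def accepted(pattern: re.Pattern) -> set:
    """Return the printable ASCII characters the one-character set accepts."""
    return {ch for ch in string.printable if pattern.fullmatch(ch)}


if __name__ == "__main__":
    print("".join(sorted(accepted(ESCAPED))))        # -/0123456789Xx
    print(accepted(ESCAPED) == accepted(UNESCAPED))  # True
```

Escaping both characters anyway is harmless, and it keeps the pattern usable in tools like regex101.com where the slash is a delimiter.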
The curly braces specify how many times the preceding element (here, the whole set) may be repeated. In this case, it is between 12 and 21 times inclusive.
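So why this regex? Well, a perfect game is nine strikes, followed by 3 more strikes in the tenth frame. That’s 12 total marks, or 12 X’s, and it’s the lower bound used here. Strictly speaking, the shortest valid string is nine strikes followed by an open tenth frame (for example, 9-), which is only 11 marks, so {11,21} would be the more accurate range.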
The longest is two marks in each of the first nine frames (18 marks so far) plus 3 in the tenth frame, for a total of 21 marks.
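To try the length bounds, here’s a small Python sketch (the function name `simple_check` is hypothetical). One detail matters: the pattern has no `^` and `$` anchors. Python’s `re.fullmatch` requires the whole string to match, while `re.search` accepts a match anywhere, so an unanchored pattern can “pass” a string that is too long. regex101.com likewise highlights a partial match unless you anchor the pattern.

```python
import re

# The first regex: only length (12-21) and character set are checked.
SIMPLE = re.compile(r"[\-0-9xX\/]{12,21}")


def simple_check(marks: str) -> bool:
    """True if the entire string is 12 to 21 marks from the allowed set."""
    # fullmatch() anchors the pattern at both ends, like wrapping it in ^...$.
    return SIMPLE.fullmatch(marks) is not None


if __name__ == "__main__":
    print(simple_check("X" * 12))         # True: perfect game, 12 marks
    print(simple_check("9/" * 10 + "9"))  # True: nine spares plus 9/9, 21 marks
    print(simple_check("X" * 22))         # False: one mark too many
    # search() finds a match anywhere, so the unanchored pattern "passes" it
    print(SIMPLE.search("X" * 22) is not None)  # True
```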
The problem is that this regex validates only the length and content, not the accuracy. In other words, the number of marks is within the normal boundaries, and every mark comes from the allowed set of characters. However, nothing checks that each mark sits in a position an actual game allows.
For example, “////////////” will pass this regex. It’s 12 characters long and consists entirely of slashes (/). However, from a scoring perspective, this is pure gibberish: a spare can never be the first ball of a frame.
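A quick Python sketch makes the weakness concrete. The strings in `GIBBERISH` are hypothetical examples chosen to satisfy the length and character rules while describing no possible game; every one of them passes the first regex.

```python
import re

SIMPLE = re.compile(r"[\-0-9xX\/]{12,21}")

# Hypothetical strings that satisfy length and character rules but are not games.
GIBBERISH = [
    "////////////",  # twelve spares and nothing else
    "X/X/X/X/X/X/",  # a spare can't be the first ball after a strike
    "-" * 21,        # 21 misses, but an all-miss game has only 20 balls
]

if __name__ == "__main__":
    for marks in GIBBERISH:
        # Each line reports True: the simple regex accepts all of them.
        print(f"{marks!r} passes: {SIMPLE.fullmatch(marks) is not None}")
```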
We need a better solution.
The Better Regex – Part I
While perfect may be the enemy of good, simplicity is the enemy of validity (a take on simplicity is the enemy of justice). We need to make sure the string representing the marks is valid. One issue that immediately comes to mind is the scoring of the tenth frame. It doesn’t follow the same rules as the previous nine frames. This will require special treatment. And, as it turns out, it comprises more than half of the regex notation.
The next version is a more complete regex. It goes like this:
(?:[Xx]|[\-0-9][\-0-9\/]){9}(?:[\-0-9]{2}|[\-0-9]\/[\-0-9Xx]|[Xx][\-0-9][\-0-9\/]|[Xx]{2}[\-0-9Xx])
Yeah. That’s a lot to take in, especially compared to the previous version. Let’s take the first half:
(?:[Xx]|[\-0-9][\-0-9\/]){9}
Remember from the previous example how the curly braces represent the number of repeats? Here, the group before them must repeat exactly nine times, once for each of the first nine frames. But we need to describe what’s inside the parentheses.
Let’s break this up a bit more:
(?: [Xx] | [\-0-9][\-0-9\/] ) {9}
While this looks complex, it’s relatively simple. The (?: and ) allow us to create a group. (Technically, it’s a non-capturing group, but that’s not important now.) Inside the group, we are providing two patterns separated by the or operator (|).
Inside the group, we have [Xx] and [\-0-9][\-0-9\/]. This means:
- We can strike (x or X)
- Or have a first ball of 0-9 pins (a dash (-) marks a miss) followed by a second ball of 0-9 pins or a spare (/).
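The frame group can be tested on its own. This Python sketch compiles the group once as a single frame and once repeated `{9}` times; the sample marks and the nine-frame string are hypothetical test data.

```python
import re

# One of the first nine frames: a strike, or a first ball of 0-9 pins
# followed by a second ball of 0-9 pins or a spare.
FRAME = r"(?:[Xx]|[\-0-9][\-0-9\/])"
ONE_FRAME = re.compile(FRAME)
NINE_FRAMES = re.compile(FRAME + r"{9}")  # exactly nine frames in a row

if __name__ == "__main__":
    for mark in ["X", "x", "8/", "9-", "-5", "-/", "/5", "X/", "5"]:
        # fullmatch() requires the whole string to be exactly one frame
        print(f"{mark!r}: {ONE_FRAME.fullmatch(mark) is not None}")
    # X, 9/, -5, X, X, 8-, 7/, X, -- : nine frames
    print(NINE_FRAMES.fullmatch("X9/-5XX8-7/X--") is not None)  # True
```

The first six sample marks match. The last three do not: /5 starts with a spare, X/ puts a spare after a strike, and 5 is missing its second ball.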
Remember, we are applying this to the first nine frames. Now, we move on to the all-important tenth frame!
The Better Regex – Part II
To recap, the complete regex is:
(?:[Xx]|[\-0-9][\-0-9\/]){9}(?:[\-0-9]{2}|[\-0-9]\/[\-0-9Xx]|[Xx][\-0-9][\-0-9\/]|[Xx]{2}[\-0-9Xx])
Now, we will look at the second half, which is actually the last two-thirds. All of it describes the tenth frame! Let’s break this down further, like we did for the first nine frames.
(?: [\-0-9]{2} | [\-0-9]\/[\-0-9Xx] | [Xx][\-0-9][\-0-9\/] | [Xx]{2}[\-0-9Xx] )
This frame differs from the other nine because of the potential for a third ball. If you look closely at the breakdown, there are four possible ways to complete the tenth frame.
- We can have two balls with 0-9 pins each but no spare.
- We can have 0-9 pins, followed by a spare, then 0-9 pins or a strike.
- We can strike, then 0-9 pins, followed by 0-9 pins or a spare.
- Finally, we can have two strikes, followed by 0-9 pins or a strike.
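To check the four alternatives, here’s a Python sketch that compiles just the tenth-frame group and tries it on a few hypothetical samples. Note that the third alternative accepts either 0-9 pins or a spare as its final ball, so both X81 and X8/ match.

```python
import re

# The four tenth-frame alternatives, one per line.
TENTH = re.compile(
    r"(?:[\-0-9]{2}"            # two balls, no spare
    r"|[\-0-9]\/[\-0-9Xx]"      # spare, then 0-9 pins or a strike
    r"|[Xx][\-0-9][\-0-9\/]"    # strike, then 0-9 pins, then 0-9 pins or a spare
    r"|[Xx]{2}[\-0-9Xx])"       # two strikes, then 0-9 pins or a strike
)

SAMPLES = {
    "9-": True, "7/X": True, "X8/": True, "X81": True, "XX7": True, "XXX": True,
    "9/": False,   # a spare earns a third ball, so the frame can't stop here
    "X8": False,   # a strike earns two more balls
    "X/5": False,  # the ball after a strike hits a fresh rack, so no spare
    "9-5": False,  # an open tenth frame has only two balls
}

if __name__ == "__main__":
    for marks, expected in SAMPLES.items():
        actual = TENTH.fullmatch(marks) is not None
        print(f"{marks!r}: {actual} (expected {expected})")
```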
That is a pretty complex solution for one frame!
Conclusion
More needs to be done, but it isn’t practical to do it with this regex. Let’s explain. This regex does a lot to ensure we accurately represent strikes, spares, and frame markings. Consider:
8/, 9-, -5, x, -/
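These are valid marks for a single frame. However, the regex won’t catch an open frame whose two numbers add up to 10 or more pins, such as 64 or 85 (a 10-pin frame must be written as a spare, like 6/). To prevent this without listing every legal pair of numbers in the regex, the program needs additional logic that adds up the pins in each frame.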
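Here’s one way that additional logic might look: a minimal Python sketch, assuming Python and treating - as 0 pins. The helper names `is_valid_game`, `_pins`, and `_open_pair_ok` are hypothetical, not taken from the original program. It first applies the full regex to check syntax, then walks the frames and rejects any two numeric balls thrown at the same rack that total 10 or more. It only validates; it doesn’t compute a score.

```python
import re

# The complete regex from the article, split across two raw strings.
GAME = re.compile(
    r"(?:[Xx]|[\-0-9][\-0-9\/]){9}"
    r"(?:[\-0-9]{2}|[\-0-9]\/[\-0-9Xx]|[Xx][\-0-9][\-0-9\/]|[Xx]{2}[\-0-9Xx])"
)


def _pins(mark: str) -> int:
    """Pins for a numeric mark; '-' (a miss) counts as 0."""
    return 0 if mark == "-" else int(mark)


def _open_pair_ok(first: str, second: str) -> bool:
    """Two numeric balls thrown at the same rack must total 9 or fewer."""
    return _pins(first) + _pins(second) <= 9


def is_valid_game(marks: str) -> bool:
    """Hypothetical validator: regex for syntax, then a pin-count check."""
    if GAME.fullmatch(marks) is None:
        return False  # syntax check failed
    pos = 0
    for _ in range(9):  # frames 1-9
        if marks[pos] in "Xx":
            pos += 1  # a strike uses one mark
            continue
        first, second = marks[pos], marks[pos + 1]
        if second != "/" and not _open_pair_ok(first, second):
            return False
        pos += 2
    tenth = marks[pos:]  # the regex already confirmed its shape
    if tenth[0] in "Xx":
        # After a strike, balls 2 and 3 are thrown at a fresh rack.
        rest = tenth[1:]
        if rest[0] not in "Xx" and rest[1] not in "/Xx":
            return _open_pair_ok(rest[0], rest[1])
        return True
    # No strike on ball 1: balls 1 and 2 share a rack unless it's a spare.
    if tenth[1] != "/":
        return _open_pair_ok(tenth[0], tenth[1])
    return True


if __name__ == "__main__":
    games = {
        "X" * 12: True,          # perfect game
        "9-" * 10: True,         # every frame open, 9 pins each
        "X" * 9 + "9-": True,    # open tenth frame: only 11 marks
        "55" + "9-" * 9: False,  # 5 + 5 = 10 must be written 5/
        "X" * 9 + "X73": False,  # 7 + 3 after a tenth-frame strike
        "////////////": False,   # fails the regex itself
    }
    for marks, expected in games.items():
        print(f"{marks!r}: {is_valid_game(marks)} (expected {expected})")
```

Because a strike always takes one mark and any other frame takes two, the walk can split the first nine frames without backtracking; whatever is left over is the tenth frame.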
While regular expressions can be incredibly powerful, they can’t solve all our problems. Use them efficiently and effectively, but understand that they may not always meet all the validation requirements.
So, while simplicity may be the enemy of validity, complexity doesn’t solve all problems. Realize that solutions are often the result of multiple tools working together. In this case, the regex checks the syntax, and additional program logic checks that the semantics are sound.