A Puzzle of Numbers
RasterMaster engineers at Snowbound have an unwritten (and quite obvious) goal when converting formats: be as accurate as possible. However, this typically means diving into myriad subsystems.
The Microsoft Office Open XML (MSOOXML) specifications are free to download for anyone, but oftentimes lack a certain amount of detail on many systems, Excel’s Custom Number Formats, for example.
I am not the first to write code that interprets this system, and provided that I am not the last, hopefully this guide will serve future engineers well.
How To Read This Article
The pain of months (now years) of trial-and-error and guesswork inspired this series of articles. This is not a guide on how to write custom number formats, but a connected web of my understanding of how they work for the purposes of programming a parser for RasterMaster. You will not find RasterMaster code in this article.
If you’ve worked with any other kind of number formatting system before, a lot of this is going to be familiar to you. There are many other kinds of better documented formatters for languages and other tools. This is one of those times where I told you this story, so I can tell you the next one where Excel’s oddness comes into full form. The second part should follow in a few weeks if you want to subscribe or follow Snowbound on Twitter for updates.
Periodically, I will write examples to explain the function of certain systems or characters. From left to right, there will be a written number, a number format, and a result. A Written Number is the number as one would type it into a cell in Excel. The Number Format is a written custom number format. And the Result is the string that the written number gets displayed as when a number format is applied. The number format will never contain unescaped plain text characters to reduce ambiguity.
WrittenNumber --> NumberFormat --> Result
1234 --> #,###.0 --> 1,234.0
What Is a Custom Number Format and What Does It Do
In Excel, cells contain unicode characters as text content. However, if these characters are all numeric digits (0-9), then Excel will automatically designate the cell as a number cell and choose a Number Format to apply to it. A distinction: there are number formats, and custom number formats. Excel comes stocked with 37 built in formats (inexplicably taking up ID slots between 0 and 49, but that’s another story), and any more that you create will be the custom formats.
Anyways, applying a number format to a cell will change the way the number is displayed in Excel. Here is a simple series of written numbers and a custom format that changes them into a result number different from what was written.
Custom number formats are primarily used to add extra characters, comma formatting, leading or trailing zeroes, and differentiating between categories. This is the basic use cases of number formats.
The most common elements of number format are
?. These are special characters used to actually represent numbers that come from the source number. I’ll refer to these as source characters.
0 will display either a digit from the source, or a zero.
# will display a digit from the source, or a null space.
? will display a digit from the source, or an empty space. This is primarly used to space out numbers to align them in adjacent rows.
, will separate every multiple of 103 with a locale specific separator.
A custom format is broken up by semicolons. There can be up to 4 pieces of a format. By default these pieces will affect, in order of left to right: positive, negative, zero, and not numbers. The format I wrote follows the default and can be broken up this way. The first piece is also the default and will be applied if no other piece matches the number requirements.
#,##0 is first in the list, and by default applies to the positive numbers in the first row. Note that even though there is a combination of four source characters, there is only one
0, so no extra characters will be displayed.
- #,##0 applies to the negative numbers in the second row. Note that the minus sign is included along with a space. These unescaped, undesignated plain text characters will be included in result as they are written in the same place as they appear in the number format. Since those characters are to the left of all source characters, the minus sign and the space will always appear furthest to the left of the result. As a second note, removing the minus sign will make negative numbers you write in cells appear positive, even though Excel remembers what you actually wrote inside the cell.
0.0 represents the third row of zeroes. This third piece only applies to zeroes, so if any of those numbers were different, they’d take the properties of the positive or negative pieces. This is also the first code in the list that adds a number not present in the source, namely a trailing zero in the decimal place.
0 can be used to increase the of digits in the result. If the number of zeroes in your number format exceed the size of the length of the written whole number, leading zeroes will be added. If the number of zeroes in your number format exceed the length of the written decimal number, trailing zeroes will be added.
? can be used the same way, except for with whitespace in the result.
Implied in the last bullet point, whole numbers and decimal numbers are treated differently in Excel. For now, know that you can never reduce the number of digits of resulting whole number by providing less source characters than are written. You can reduce the number of resulting decimal digits.
3000000000 --> #,# --> 3,000,000,000
.0000000003 --> .00### --> .00
"not a number" is the most obvious of the bunch, and these are all pretty obvious. In short, you can apply a number format to any cell by using a custom format. It should come as no surprise that everything inside quotation marks will be added to the result as a plain string.
The comma character is normally used to apply comma formatting to the result so long as it is placed in between any two source characters. However, if a comma is placed immediately to the right of the rightmost, whole number source character, the written number will be divided by 1,000 for the result. Comma formatting will still apply to the result.
24000000000 --> #,#,, "Million" --> 24,000 Million
If a number format has less than four pieces, all the same rules described in the list about positive/negative/zero will apply, with one exception. If a number format does not specify a negative format piece, and the piece is applied to a negative number, excel will automatically add a minus sign to the result in the leftmost character. However, if the written number is zero or the result equals zero, no minus sign will be added.
-12345 --> "number "#,0.0 --> -number 12,345.0
-0.0005 --> "no minus ".0 --> no minus .0
\ is the escape character. The character to the right of the
\ will be added to the result as plain text, like a single character quotation.
3000 --> \#\0\""text"\"\ 0.0 --> #0"text" 3000.0
And that’s the first part! Simple, but annoying stuff already filled with exceptions.
Tyler Brogna currently works at Snowbound as an Engineer on RasterMaster. He attended Rochester Institute of Technology and graduated with a BS in Game Design and Development. He collects video games and sometimes designs them as well.