My last few articles have given me a chance to relive my undergraduate computer science degree and code a Roman numeral to decimal converter. It's quite handy when you're watching old movies (when was MCMLVII anyway?), and the basic coding algorithm was reasonably straightforward. (See Dave's "Roman Numerals and Bash" and "More Roman Numerals and Bash".)
The trick with Roman numerals, however, is that it's what's known as a subtractive notation. In other words, it's not a position → value or even symbol → value notation, but a sort of hybrid. MM = 2000, and C = 100, but MMC and MCM are quite different: the former is 2100, and the latter is 1000 + (–100 + 1000) = 1900.
This means that the conversion isn't quite as simple as a mapping table, which makes it a good homework assignment for young comp-sci students!
In the Roman numeral to decimal conversion, a lot of the key work was done by this simple function:
mapit() {
case $1 in
I|i) value=1 ;;
V|v) value=5 ;;
X|x) value=10 ;;
L|l) value=50 ;;
C|c) value=100 ;;
D|d) value=500 ;;
M|m) value=1000 ;;
* ) echo "Error: Value $1 unknown" >&2 ; exit 2 ;;
esac
}
You'll need this function to proceed, but as a cascading set of conditional statements. Indeed, in its simple form, you could code a decimal to Roman numeral converter like this:
while [ $decvalue -gt 0 ] ; do
if [ $decvalue -gt 1000 ] ; then
romanvalue="$romanvalue M"
decvalue=$(( $decvalue - 1000 ))
elif [ $decvalue -gt 500 ] ; then
romanvalue="$romanvalue D"
decvalue=$(( $decvalue - 500 ))
elif [ $decvalue -gt 100 ] ; then
romanvalue="$romanvalue C"
decvalue=$(( $decvalue - 100 ))
elif [ $decvalue -gt 50 ] ; then
romanvalue="$romanvalue L"
decvalue=$(( $decvalue - 50 ))
elif [ $decvalue -gt 10 ] ; then
romanvalue="$romanvalue X"
decvalue=$(( $decvalue - 10 ))
elif [ $decvalue -gt 5 ] ; then
romanvalue="$romanvalue V"
decvalue=$(( $decvalue - 5 ))
elif [ $decvalue -ge 1 ] ; then
romanvalue="$romanvalue I"
decvalue=$(( $decvalue - 1 ))
fi
done
This actually works, though the results are, um, a bit clunky:
$ sh 2roman.sh 25
converts to roman numeral X X I I I I I
Or, more overwhelming:
$ sh 2roman.sh 1900
converts to roman numeral M D C C C L X X X X V I I I I I
I suppose there is some sort of charm to the latter, but there also are much, much better ways to simplify this. You can do all the math, but since my approach to coding is often "be lazy, get it done, move on", let's recognize that there are a very small number of special case numeric values:
900 = CM
400 = CD
90 = XC
40 = XL
9 = IX
4 = IV
That's really it. The notation allows only a single character subtracting from another, so you can't have CCM or IIX (the latter being correctly written as VIII), and some of the other possible two-character notations don't make sense. For example, why use VX when V is the same value?
So given that, all you really need to do is expand the
if-elseif
block to
add the five possible values above, which makes for a pretty darn long code
block. But before I share it, did you catch the error in the above code?
It's the other reason that the resultant Roman numerals are so darn long, actually. Let's look at just the very first conditional statement:
if [ $decvalue -gt 1000 ] ; then
romanvalue="$romanvalue M"
decvalue=$(( $decvalue - 1000 ))
Here's the question to ask yourself: what happens if the
$decvalue
is
exactly 1000? Isn't that "M"? Yes, it is. Which means that all
these conditionals are wrong; instead of being -gt
they should
be -ge
.
With that fix, here's the big block of code:
while [ $decvalue -gt 0 ] ; do
if [ $decvalue -ge 1000 ] ; then
romanvalue="$romanvalue M"
decvalue=$(( $decvalue - 1000 ))
elif [ $decvalue -ge 900 ] ; then
romanvalue="$romanvalue CM"
decvalue=$(( $decvalue - 900 ))
elif [ $decvalue -ge 500 ] ; then
romanvalue="$romanvalue D"
decvalue=$(( $decvalue - 500 ))
elif [ $decvalue -ge 400 ] ; then
romanvalue="$romanvalue CD"
decvalue=$(( $decvalue - 400 ))
elif [ $decvalue -ge 100 ] ; then
romanvalue="$romanvalue C"
decvalue=$(( $decvalue - 100 ))
elif [ $decvalue -ge 90 ] ; then
romanvalue="$romanvalue XC"
decvalue=$(( $decvalue - 90 ))
elif [ $decvalue -ge 50 ] ; then
romanvalue="$romanvalue L"
decvalue=$(( $decvalue - 50 ))
elif [ $decvalue -ge 40 ] ; then
romanvalue="$romanvalue XL"
decvalue=$(( $decvalue - 40 ))
elif [ $decvalue -ge 10 ] ; then
romanvalue="$romanvalue X"
decvalue=$(( $decvalue - 10 ))
elif [ $decvalue -ge 9 ] ; then
romanvalue="$romanvalue IX"
decvalue=$(( $decvalue - 9 ))
elif [ $decvalue -ge 5 ] ; then
romanvalue="$romanvalue V"
decvalue=$(( $decvalue - 5 ))
elif [ $decvalue -ge 4 ] ; then
romanvalue="$romanvalue IV"
decvalue=$(( $decvalue - 4 ))
elif [ $decvalue -ge 1 ] ; then
romanvalue="$romanvalue I"
decvalue=$(( $decvalue - 1 ))
fi
done
It works (albeit with some easily removed spaces) with some basic numeric tests:
$ sh 2roman.sh 71
converts to roman numeral L X X I
$ sh 2roman.sh 1997
converts to roman numeral M CM XC V I I
$ sh 2roman.sh 666
converts to roman numeral D C L X V I
The problem is, that's a long and graceless block of code, even if it does solve the problem.
Obviously, every block of code has the very same format:
elif [ $decvalue -ge VALUE ] ; then
romanvalue="$romanvalue NOTATION-FOR-VALUE"
decvalue=$(( $decvalue - VALUE ))
As highlighted, there are only two values to consider: the numeric value
VALUE
and the one or two character
NOTATION-FOR-VALUE
. As an example,
VALUE=90, NOTATION=XC
. The logical function then is:
SubIfValue()
{
# if $decvalue >= $2 then add $3 to romanvalue
# and subtract $2 from decvalue
if [ $decvalue -ge $1 ] ; then
romanvalue="${romanvalue}$2"
decvalue=$(( $decvalue - $1 ))
fi
}
This would produce a series of invocations like this:
SubIfValue 500 "D"
SubIfValue 400 "CD"
SubIfValue 100 "C"
SubIfValue 90 "XC"
But there's a problem with this. The loop has to iterate and subtract the largest possible value each time through; otherwise, you get very odd results.
So algorithmically, you still need to have the if-then-elif
loop anyway:
if ( SubIfValue 500 "D" fails ) then
The problem is, that's really hard to do cleanly because you can't actually return values from a Bash shell function. Rather than be too clunky, therefore, I'm going to find a compromise on my quest to clean up the code. I can leave the function mostly the same:
SubValue()
{
# add $3 to romanvalue and subtract $2 from decvalue
romanvalue="${romanvalue}$2"
decvalue=$(( $decvalue - $1 ))
}
The sequence of calls is going to look more succinct, however:
if [ $decvalue -ge 1000 ] ; then
SubValue 1000 "M"
elif [ $decvalue -ge 900 ] ; then
SubValue 900 "CM"
elif [ $decvalue -ge 500 ] ; then
SubValue 500 "D"
elif [ $decvalue -ge 400 ] ; then
SubValue 400 "CD"
elif [ $decvalue -ge 100 ] ; then
SubValue 100 "C"
elif [ $decvalue -ge 90 ] ; then
SubValue 90 "XC"
elif [ $decvalue -ge 50 ] ; then
SubValue 50 "L"
elif [ $decvalue -ge 40 ] ; then
SubValue 40 "XL"
elif [ $decvalue -ge 10 ] ; then
SubValue 10 "X"
elif [ $decvalue -ge 9 ] ; then
SubValue 9 "IX"
elif [ $decvalue -ge 5 ] ; then
SubValue 5 "V"
elif [ $decvalue -ge 4 ] ; then
SubValue 4 "IV"
elif [ $decvalue -ge 1 ] ; then
SubValue 1 "I"
fi
And, that's the fully functional script once you wrap it in the while;do
/ done
loop.
A few tests:
$ sh 2roman.sh 1991
converts to roman numeral MCMXCI
$ sh 2roman.sh 2222
converts to roman numeral MMCCXXII
$ sh 2roman.sh 1234
converts to roman numeral MCCXXXIV
So there you go. Solved. Now I can't recall why it seemed so daunting when I was in college. I will note that in a more sophisticated programming language, you could come up with a considerably shorter solution, particularly if you could utilize a two-dimensional array for number/characters pairs. But, that's not the Bash shell, so we work with what we have, right?
See you next time. Meanwhile, do you have an interesting programming puzzle? Drop me a note, and I'll have a look at it!