Monoids as a Way To Keep Code DRY
We have a maxim in programming circles: Don’t repeat yourself, commonly abbreviated as “DRY.” It’s not uncommon to see a comment on your PR asking you pull something out into a repeatable function in order to “keep your code DRY.”
function getFileContents(system) {
return readFilePromise('logs/' + system + '.log')
}
function writeFilecontents(system, data) {
return writeFilePromise('logs/' + system + '.log', data)
}
See the repeated code? You could extract out the part that builds the file name, making sure that if (for example) the logs change location or naming scheme we only have one place in our code to change.
function getLogPath(system) {
return 'logs/' + system + '.log
}
function getFileContents(system) {
return readFilePromise(getLogPath(system))
}
function writeFilecontents(system, data) {
return writeFilePromise(getLogPath(system), data)
}
This is one of the first principles of writing new code that a fledgling coder will learn. We have to get comfortable with this skill pretty quickly if we want to be successful. Now that we have tools like prettier to automatically format our code, feedback like this is probably the most common type of feedback I see in code reviews.
When I mention to people my newfound love for math and its relationship to functional programming, especially when talking to veteran coders, I sometimes get a response indicating disbelief that such tools are necessary or even particularly useful in “real world coding.” But what if I told you that category theory and abstract algebra provide valuable tools for indentifying common behaviors that you can abstract around to make your code even more DRY?
Let’s start with some basic arithmetic on integers.
1 + 1 = 2
2 + 1 = 3
1 + 1 + 1 = 3
We’re all very familiar with addition over integers. It’s one of the first things we learn. We even get so comfortable with it that we start building on the idea to add very large numbers together in our minds. We even learn about a special number, zero, that can be added to a number without changing anything at all.
1 + 0 = 1
0 + 1 = 1
1 + 1 + 0 = 2
Eventually, we also learn about multiplication as another operation we can apply to integers.
2 * 2 = 4
2 * 3 = 6
2 * 2 * 2 = 8
And almost immediately, given the way we learn about multiplication tables, we discover that any number multiplied by 1 without changing that number at all.
2 * 1 = 2
3 * 1 = 3
2 * 2 * 1 = 4
Now as exciting as basic arithmetic is, I haven’t encountered many professional programmers whose programs consist primarily of adding and multiplying integers together. But what about strings? We combine strings all the time. Heck, we did it above in that getLogPath
function.
'a' + 'bc' // 'abc'
'ab' + 'c' // 'abc'
'a' + 'b' + 'c' // 'abc'
Is there anything we can combine with a string that won’t change the string at all? In fact, yes, we have the empty string.
'abc' + '' // 'abc'
'' + 'abc' // 'abc'
I’m belaboring the point a bit, and you’ve probably already started to see that these operations behave in very similar ways. In fact, if we go up a layer of abstraction, we could maintain that in actually they are not just similar; they behave so similarly that we could replace them all with a single operation. In fact, the commonality of these three is that they all describe “monoids,” sets with an associative binary operation and a special identity element that doesn’t change the elements when combined with the binary operation.
For addition, our associative binary operation is addition. We can always add two numbers, and parentheses don’t matter:
(1 + 2) + 3 = 1 + (2 + 3)
Our neutral identity element for this group is 0.
1 + 0 = 1
0 + 1 = 1
Turning our attention to multiplication, our associative binary operation is addition, and our identity element is 1.
(1 * 2) * 3 = 1 * (2 * 3)
1 * 2 = 2
2 * 1 = 2
Describing strings, our associative binary operation is concatenation, and our identity element is the empty string.
('a' + 'b') + ’c' == 'a' + ('b' + 'c') // true
'abc' + '' // 'abc'
'' + 'abc' // 'abc'
Once you grok monoids, you will start seeing them everywhere in programming. (Combining lists/arrays are another easy example.) If we could describe these operations one abstraction layer up, we could literally use the same code for all of these operations. In Haskell, this is accomplished via type classes. To be a monoid in Haskell, a type needs to implement two functions: mappend
(our associative binary operation) and mempty
(the neutral identity element). Here’s what those typeclass instances would look like:
instance Num a => Monoid (Sum a) where
mempty = Sum 0
mappend = (+)
instance Num a => Monoid (Product a) where
mempty = Product 1
mappend = (*)
instance Monoid (String a) where
mempty = ""
mappend = (++)
Now think for a minute about reduce
in JavaScript or foldr
in Haskell in reference to these monoids. You could have a single function that operated in this way on sums, products, strings, lists…literally any monoid:
foldr mappend mempty ["a“, "b", "c"] – "abc"
foldr mappend mempty [Sum 1, Sum 1, Sum 1] – Sum 3
foldr mappend mempty [Product 2, Product 2, Product 2] – Product 8
In fact, this is such a general case that Haskell has the fold
function does this foldr mappend mempty
repeated code:
fold ["a“, "b", "c"] – "abc"
fold [Sum 1, Sum 1, Sum 1] – Sum 3
fold [Product 2, Product 2, Product 2] – Product 8
That’s all well and good for Haskell, but what about JavaScript. Well, if you’re following the recommendations of the Fantasyland Spec, you can definitely do something similar:
String.empty = () => ''
const Sum = x => ({
value: x,
concat: y => Sum(x + y.value),
toString: () => `Sum(${x})`,
inspect: () => `Sum(${x})`
})
Sum.empty = () => Sum(0)
const Product = x => ({
value: x,
concat: y => Product(x * y.value),
toString: () => `Product(${x})`,
inspect: () => `Product(${x})`
})
Product.empty = () => Product(1)
const fold = (type, xs) => xs.reduce(
(acc, val) => acc.concat(val),
type.empty()
)
fold(String, ['a', 'b', 'c']) // 'abc'
fold(Sum, [Sum(1), Sum(1), Sum(1)]) // Sum(3)
fold(Product, [Product(2), Product(2), Product(2)]) // Product(8)
Let’s go further. What if we wanted a new monoid for Max where our associative binary operation was a function that returned the higher of two numbers?
const Max = x => ({
value: x,
concat: y => Max(x > y.value ? x : y.value),
toString: () => `Max(${x})`,
inspect: () => `Max(${x})`
})
But what would we use for our neutral identity value? We need a number that would literally be less than any other number we would put in. Conveniently, JavaScript has a constant for Infinity
.
Max.empty = () => Max(-Infinity)
It works with the exact same fold
function.
fold(Max, [Max(1), Max(2), Max(3)]) // Max(3)
If Max exists, then we can clearly do Min as well.
const Min = x => ({
value: x,
concat: y => Min(x < y.value ? x : y.value),
toString: () => `Min(${x})`,
inspect: () => `Min(${x})`
})
Min.empty = () => Min(Infinity)
fold(Min, [Min(1), Min(2), Min(3)]) // Min(1)
These levels of abstraction are important for a couple of reasons. Firstly, they let us write more DRY code, giving all the benefits that come with that, including easier code maintenance and more clear testing requirements. Secondly and perhaps more subtly, they make our code easier to reason about. As long as our implementation of a particular data type meets all the requirements of a monoid (including the mathematical laws that govern them), then we can just use them without stressing about implementation details. If I know that I can always combine two monoids with a concat
or mappend
to get an appropriate result, then processing large lists of data in an appropriate way become a lot easier. You can even start building additional layers of complexity on top.
For example, imagine if we had user account objects, and we wanted to be able to merge two accounts with the following rules:
- New username becomes the two usernames concatenated together.
- Age of account should be the age of the oldest account.
- Account balance should be the sum both account balances.
const Person = (username, ageOfAccount, accountBalance) => ({
username,
ageOfAccount: Max(ageOfAccount),
accountBalance: Sum(accountBalance),
concat: secondAccount =>
Person(
username.concat(secondAccount.username),
ageOfAccount.concat(secondAccount.ageOfAccount.value),
accountBalance.concat(secondAccount.accountBalance.value)
),
toString: () =>
`Person(${username}, ${ageOfAccount} days, ${accountBalance} USD)`,
inspect: () =>
`Person(${username}, ${ageOfAccount} days, ${accountBalance} USD)`
})
Person.empty = () => Person(String.empty(), Max.empty(), Sum.empty())
const person1 = Person('alpha', Max(300), Sum(107.1))
const person2 = Person('beta', Max(5), Sum(5))
fold(Person, [person1, person2]) // Person(alphabeta, Max(300) days, Sum(112.1) USD)
Because they’re a mathematical concept, you can identify monoids, reason about them, and write code based on the larger abstract concept without worrying about the specific implementation details. The interface to the concept is ultimately what matters. When we can see the larger connections between seemingly disparate data structures, it lets us process data by describing what we want to do rather than having to tell the computer how to do what we want to do each time.
This is especially powerful because monoids are everywhere. While preparing for a recent talk on the subject with my dev team, I thought of these examples with very little effort…
- Lists and arrays.
- Binary trees.
- JavaScript Objects under
Object.assign
or Ramda’smerge
. - Vectors describing motion in 2D or 3D.
- Function composition.
As long as you can describe your data in terms of an associative binary combination method and you can describe a neutral identity element that can be combined with any other piece of data to return it unchanged, you no longer need to think about how to reduce a list of those elements down to a single result. And the fact that the operation is associative even means that you or your very smart compiler could separate the calculation into pieces so that they’re using multiple threads or even multiple servers. After all, you can always mathematically guarantee that parentheses don’t matter because, as along as you maintain the correct ordering, you’re going to get the same result.
If you identify and use monoids in your code, you will produce cleaner code that took you less time to write and is more likely to handle concurrency gracefully. And this is just one example of how mathematical concepts from abstract algebra or category theory can help us reason about our code. They let us identify common patterns and abstract them out into shared interfaces, and that ultimately leads to better and more maintainable code.