Recently at work I had a task of developing a small tool that could display charts with progress of our data extraction over time.

After each iteration, results of data extraction were stored in several directories and compared against a handwritten reference set.

In short, this is the directory structure:

```
|--+ batch1
| |--+ correct
| | `--- data.txt
| |--+ guessed
| | |--- data.00000145a8ab68b9.txt
| | |--- data.00000145a92a530f.txt
| | `--- data.0000014594039f5b.txt
| |--- input1.pdf
| |--- input2.pdf
| `--- input3.pdf
`--+ batch2
|--+ correct
| `--- data.txt
|--+ guessed
| |--- data.00000145a8ab68b9.txt
| |--- data.00000145a92a530f.txt
| `--- data.0000014594039f5b.txt
|--- input4.pdf
|--- input5.pdf
`--- input6.pdf
```

The hexadecimal numbers are timestamps in milliseconds since Unix epoch.

Each file consisted of records in the format:

```
input1.pdf value from the first file
input2.pdf value from the second file
input3.pdf value from the third file
```

Long story short, there was no difference what programming language I would write the tool in, so I picked Haskell. Let’s ignore most details of the implementation. What’s important for this entry, is that I implemented the following functions:

```
allCorrectFiles :: [FilePath] -> IO [FilePath]
allGuessedFiles :: [FilePath] -> IO [(LocalTime, FilePath)]
readDataFile :: FilePath -> IO [(Entry, String)]
getAllData :: [FilePath] -> IO ([(Entry, String)], [(LocalTime, Entry, String)])
```

The initial implementation of the `getAllData`

function was straightforward, yet a bit clunky:

```
getAllData subDirs = do
correctFiles <- allCorrectFiles subDirs
guessedFiles <- allGuessedFiles subDirs
correctData <- mapM readDataFile correctFiles
guessedData <- forM guessedFiles $ \(t,f) ->
x <- readDataFile f
return $ map (\(e,s) -> (t,e,s)) x
return (concat correctData, concat guessedData)
```

This code is quite ugly. The especially jarring were the `return $ map`

combination and `concat`

s in the final line.

After a while, I noticed that all the functions I call from the `getData`

function are of type `a -> IO [b]`

. A double monad. So, I added `transformers`

library to my project and rewritten that function as:

```
getAllData subDirs = do
correctData <- runListT $ do
f <- ListT $ allCorrectFiles subDirs
x <- ListT $ readDataFile f
return x
guessedData <- runListT $ do
(t,f) <- ListT $ allGuessedFiles subDirs
x <- ListT $ readDataFile f
return (t, fst x, snd x)
return (correctData, guessedData)
```

Now it looks way more uniform.

This was my first piece of code using monad transformers, so I will explain what’s going on to everyone who was in the same situation as me before understanding it.

`IO`

is a monad, so is `[]`

. For some monads, if you wrap them in another monad, you still get something that can be used like a monad. One of such monads is `[]`

.

For every monad `m`

and type `a`

, `m [a]`

is a monad over `a`

. But in order for Haskell to treat it as one singular monad, we need to wrap it. And that’s what `ListT`

is for.

So `allCorrectFiles subDirs`

returns `IO [FilePath]`

, and after wrapping it with `ListT`

we get `ListT IO FilePath`

.

This way, we stack our monadic effects: we both iterate over elements of a list, and track impure side effects. If the `readDataFile`

or `allGuessedFiles`

were pure, we would use `[]`

monad. If they were impure, but returned only one element, we’d use `IO`

monad. By wrapping `IO [a]`

into `ListT`

, we can use them both.

Since the results of inner `do`

blocks are now of type `ListT IO a`

, we unwrap them with `runListT`

.

Note that since `ListT`

allowed `<-`

operator to unwrap not only the `IO`

monad, but also the list monad, we didn’t end up with lists of lists and we no longer needed to `concat`

those lists together.

The code got less ugly, but it also got a bit longer. Could I get it shorter?

The answer was yes, but only a teeny tiny bit, and involved some more complicated machinery. Arrows.

The trivial arrow instance for functions is easy to understand: `>>>`

behaves like flipped function composition, and there are several combinators, like `&&&`

, `***`

, `first`

and `second`

that make it easier to work with 2-element tuples.

Since the return value from the `allGuessedFiles`

was a list of 2-element tuples in an IO monad, and I only operated on the second element before fusing them together, I decided to use arrows. Luckily, the `base`

library contains definitions for monadic arrows, which are to `>>=`

as trivial arrows are to `.`

.

The code had to involve even more wrapping and unwrapping. Here it is:

```
getAllData subDirs = do
correctData <- runKL subDirs $
kl allCorrectFiles >>> kl readDataFile
guessedData <- runKL subDirs $
kl allGuessedFiles >>> second (kl readDataFile) >>^ \(a,(b,c)) -> (a,b,c)
return (correctData, guessedData)
where
kl f = Kleisli $ ListT . f
runKL x ar = runListT $ runKleisli ar x
```

Now I do believe I owe you some explanations.

Let’s start with the definition of `kl`

. `f`

is of type `a -> m [b]`

, where `m`

is a monad. After composing it with `ListT`

, we get `ListT . f`

of type `a -> ListT m b`

, where `ListT m`

is a monad now.

`Kleisli`

wraps our function into a Kleisli arrow (of type `Kleisli (ListT m) a b`

), so Haskell knows that when we join arrows together, we don’t want to abstract `.`

but `>>=`

. This way, `kl f >>> kl g`

means the same as `\x -> f x >>= g`

, where `f >>> g`

would mean `\x -> (g . f) x`

.

`runKL`

applies our arrow and unwraps the `ListT`

transformer. I reversed the argument order of `runKleisli`

for convenience.

So how is the `guessedData`

calculated? `allGuessedFiles`

returns `IO [(LocalTime, FilePath)]`

. `second`

makes its argument to be applied only to the second argument of the input tuple, so `second readDataFile`

would be of type `(a, FilePath) -> IO[(a,(Entry, String))]`

*, but since we used `kl`

, it’s actually `Kleisli (ListT IO) (a, FilePath) (a,(Entry, String))`

.

** Note: I kinda simplified here a bit and technically this type is wrong.*

The last function, which is a lambda expression, was lifted into a Kleisli arrow with the `>>^`

operator. In case of Kleisli arrows, such lifting is equivalent to `\f -> (return . f)`

. It turns a function of type `a->b`

into a function of type `a -> m b`

wrapped into a Kleisli.

I also took a peek at the `arrow-list`

package, which provides its own definitions of equivalents for `kl`

and `runKL`

, but I decided to not use it, since it would give little value.

**So, was it worth it?**

In some sense, yes. I have learnt what monad transformers and Kleisli arrows can be used for. The main problem is that the problem I had to solve was too small for such complex machinery. The transformers made the code more uniform, which allowed application of arrows, while arrows would have made code mode compact if it was more complex.