Recently at work I was tasked with developing a small tool to display charts showing the progress of our data extraction over time.
After each iteration, the results of the data extraction were stored in several directories and compared against a handwritten reference set.
In short, this is the directory structure:
|--+ batch1
| |--+ correct
| | `--- data.txt
| |--+ guessed
| | |--- data.00000145a8ab68b9.txt
| | |--- data.00000145a92a530f.txt
| | `--- data.0000014594039f5b.txt
| |--- input1.pdf
| |--- input2.pdf
| `--- input3.pdf
`--+ batch2
  |--+ correct
  | `--- data.txt
  |--+ guessed
  | |--- data.00000145a8ab68b9.txt
  | |--- data.00000145a92a530f.txt
  | `--- data.0000014594039f5b.txt
  |--- input4.pdf
  |--- input5.pdf
  `--- input6.pdf
The hexadecimal numbers in the guessed file names are timestamps in milliseconds since the Unix epoch.
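As an illustration, here is a minimal sketch of how such a file name could be decoded. The helper names (splitDots, stampMillis, stampTime) are hypothetical and not taken from the actual tool, and the sketch produces a UTCTime rather than the LocalTime used in the signatures later on, so treat it as one possible approach only.

```haskell
import Data.Time.Clock (UTCTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)
import Numeric (readHex)

-- Split a string on dots: "a.b.c" becomes ["a", "b", "c"].
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitDots rest

-- Extract the millisecond count from a name shaped like
-- "data.<hex>.txt". Anything else yields Nothing.
stampMillis :: FilePath -> Maybe Integer
stampMillis name = case splitDots name of
  [_, hex, "txt"] -> case readHex hex of
    [(n, "")] -> Just n   -- the whole middle part must be valid hex
    _         -> Nothing
  _ -> Nothing

-- Convert the millisecond count to a UTC timestamp (assumption:
-- the real tool converts further to LocalTime).
stampTime :: FilePath -> Maybe UTCTime
stampTime = fmap (posixSecondsToUTCTime . (/ 1000) . fromInteger) . stampMillis
```

A name without the hexadecimal part, such as data.txt from the correct directory, simply yields Nothing here.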
Each file consisted of records in the format:
input1.pdf value from the first file
input2.pdf value from the second file
input3.pdf value from the third file
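To make the record format concrete, here is a small sketch of a parser for it. It assumes that Entry is just the input file name as a String and that the name is separated from the value by whitespace; neither is confirmed by the real code, and parseRecord and readRecords are hypothetical names.

```haskell
import Data.Char (isSpace)

-- Assumption: an entry is identified by the input file name.
type Entry = String

-- Split one record into the file name and the rest of the line.
parseRecord :: String -> (Entry, String)
parseRecord line = (entry, dropWhile isSpace rest)
  where
    (entry, rest) = break isSpace line

-- Read a whole data file, skipping blank lines.
readRecords :: FilePath -> IO [(Entry, String)]
readRecords path =
  map parseRecord . filter (not . all isSpace) . lines <$> readFile path
```

Breaking on the first whitespace keeps any spaces inside the value itself intact.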
Long story short, it made no difference which programming language I wrote the tool in, so I picked Haskell. Let’s ignore most details of the implementation. What’s important for this entry is that I implemented the following functions:
allCorrectFiles :: [FilePath] -> IO [FilePath]
allGuessedFiles :: [FilePath] -> IO [(LocalTime, FilePath)]
readDataFile :: FilePath -> IO [(Entry, String)]
getAllData :: [FilePath] -> IO ([(Entry, String)], [(LocalTime, Entry, String)])
The initial implementation of the getAllData function was straightforward, yet a bit clunky:
-- forM comes from Control.Monad
getAllData subDirs = do
  correctFiles <- allCorrectFiles subDirs
  guessedFiles <- allGuessedFiles subDirs
  correctData <- mapM readDataFile correctFiles
  guessedData <- forM guessedFiles $ \(t,f) -> do
    x <- readDataFile f
    -- tag every record with the timestamp of its file
    return $ map (\(e,s) -> (t,e,s)) x
  return (concat correctData, concat guessedData)
This code is quite ugly. Especially jarring were the return $ map combination and the two concat calls in the final line.
After a while, I noticed that all the functions I call from the getAllData function are of type a -> IO [b]: a list wrapped in IO. A double monad. So I added the transformers library to my project and rewrote that function as: