Recently, I had an opportunity to fix two applications to get them running on Java 10 while preserving compatibility with Java 8. Migrating to the module system was not on the table; the goal was just to run everything on a newer OpenJDK. It was not a trivial task: neither of the programs worked on Java 10 out of the box. Here’s how I managed to fix the applications so the same binary code could run on Java 8, 10 and the release candidate version of Java 11.
By default, Google Analytics fires once per page load and therefore it only adds entries when the user navigates between different documents. But today, in the era of common single-page applications, this might not be enough. A web app might have multiple different sections, displayed conceptually from within the same document, and Google Analytics won’t track where the user is navigating.
In search of a solution for tracking user behaviour across the document, I found several guides that required setting up Google Tag Manager. But I couldn’t be bothered to figure out yet another tool, so I decided to look for a simpler, if hackier, solution.
One of the main reasons I haven’t posted a lot on this blog is that the whole process of building the website was quite cumbersome and required a fully configured Ruby environment. Thankfully, GitHub offers its own Jekyll installation that automatically builds the website and allows people to submit files straight from the browser. So I decided to spend a few hours and unshackle myself from the confines of my local and soon-to-be ditched Ubuntu 14.04 installation that hosted the necessary tools. The result is, as you can see, a fully working and functionally equivalent blog that I can edit in a browser. In fact, I’m writing this post in Firefox right now.
Of course, migration was not trivial. I had to make multiple changes:
Suppose you have two ZIP archives on two different machines and you want to synchronize their contents with minimal network traffic, without much regard for the CPU usage, and you know that the target machine already contains the older version of the archive, which is mostly identical to the new version. For example, you have to upload a fat JAR with the new build over a metered connection, and the server literally sits in a garage at the other end of the country.
If the archive is small, uploading it in its entirety looks simple enough, but if it grows into dozens of megabytes, of which 90% is unchanging third-party code, it quickly becomes a waste of both time and bandwidth.
Time is money. And depending on your ISP, bandwidth can be money too.
Long story short, I’m going to assume that the target machine runs Linux, and the source machine runs either Windows or Linux. I am going to use fuse-zip and rsync.
Why those two? Because fuse-zip can convert a ZIP archive into a writeable filesystem, and rsync can sync a writeable filesystem with another. Details below.
Java 8 introduced basic support for first-class functions. The functions, unlike in other languages, aren’t represented by only a handful of types. Instead, Java 8 uses dozens of various types depending on arity, parameter types and the return type. The standard documentation is pretty unwieldy, so for my and your convenience, I prepared a list of all functional interfaces in a more useful order.
A few things to keep in mind:
All interfaces are in the java.util.function package unless otherwise noted.
Interfaces are specialized only for int, long and double, and only sometimes. Other primitive types, and all primitive types in certain situations, will have to be boxed.
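| function type | Java type |
| () → void | Runnable (in java.lang) |
| () → boolean | BooleanSupplier |
| () → int | IntSupplier |
| () → long | LongSupplier |
| () → double | DoubleSupplier |
| () → A | Supplier<A> |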
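To make the nullary rows concrete, here is a minimal sketch that declares one value per row, using the standard library types for functions that take no arguments. The class name, field names and returned values are invented for this illustration and do not come from the original list.

```java
import java.util.function.BooleanSupplier;
import java.util.function.DoubleSupplier;
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical demo class; every name below is made up for illustration.
class NullaryFunctionsDemo {
    static final StringBuilder LOG = new StringBuilder();

    // () → void: Runnable lives in java.lang, not in java.util.function
    static final Runnable SIDE_EFFECT = () -> LOG.append("ran");
    // () → boolean
    static final BooleanSupplier ALWAYS_TRUE = () -> true;
    // () → int
    static final IntSupplier ANSWER = () -> 42;
    // () → long
    static final LongSupplier BIG = () -> 1L << 40;
    // () → double
    static final DoubleSupplier HALF = () -> 0.5;
    // () → A, here with A = String; a primitive result would be boxed
    static final Supplier<String> GREETING = () -> "hello";

    public static void main(String[] args) {
        SIDE_EFFECT.run();
        System.out.println(LOG);
        System.out.println(ALWAYS_TRUE.getAsBoolean());
        System.out.println(ANSWER.getAsInt());
        System.out.println(BIG.getAsLong());
        System.out.println(HALF.getAsDouble());
        System.out.println(GREETING.get());
    }
}
```

Note that each specialized interface has its own method name (getAsBoolean, getAsInt, getAsLong, getAsDouble), while the generic Supplier uses plain get.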
Here comes a list of things that I would love to see in Scala 3.0. Some of them are breaking changes, hence 3.0, not 2.13 or anything like that. Some of them are about the compiler, some of them are about the library, some of them are about the external tools. Some of those ideas are different solutions for the same problem.
Warning: I’m now going to wake up the ghosts of the past. The past which includes your undergraduate abstract algebra lectures.
This post is mostly a result of my boredom and my willingness to show that you can shoehorn almost every abstraction into almost every programming language, but it’s not exactly the best idea to do so.
Also, I simply wanted to say a word or two about several abstract mathematical concepts. It can be a worthwhile intellectual exercise.
Before I talk about catamorphisms, mentioned in the title, I’d like to have a look at a very abstract and general mathematical structure: an algebra.
An algebra is a tuple that contains:
some sets (called carrier sets), most often one
usually some operations, each of them from some Cartesian product of the carrier sets to one of the carrier sets
sometimes some distinguished elements from those sets (they’re often superfluous, but they will be required later; you can also think of them as nullary operations)
Examples:
a set of strings Str and the operation of concatenation +: Str × Str → Str
a set of natural numbers Nat and the operation of addition +: Nat × Nat → Nat
a set of natural numbers Nat and a set of finite subsets of natural numbers P(Nat), a distinguished empty set Ø, the operations of union and intersection ∪, ∩: P(Nat) × P(Nat) → P(Nat), and the largest element operation max: P(Nat) → Nat (with max(Ø) = 0)
Note: I’m using the + operator for string concatenation because the article is supposed to end up with creating some Java code, and Java uses + for string concatenation.
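As a rough preview of how such algebras might look in Java, here is a minimal sketch of the three examples. It is only an illustration, not the encoding the rest of the article arrives at: the class and member names are invented, Integer stands in for Nat (assuming non-negative values), and java.util.Set stands in for P(Nat).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.function.BinaryOperator;

// Hypothetical names throughout; Integer stands in for Nat (assumed non-negative).
class AlgebraExamples {
    // (Str, +): one carrier set and one binary operation
    static final BinaryOperator<String> CONCAT = (a, b) -> a + b;

    // (Nat, +): one carrier set and one binary operation
    static final BinaryOperator<Integer> ADD = (a, b) -> a + b;

    // Two carrier sets, Nat and P(Nat), with the distinguished element Ø
    static final Set<Integer> EMPTY = Collections.emptySet();

    // ∪: P(Nat) × P(Nat) → P(Nat)
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.addAll(b);
        return result;
    }

    // ∩: P(Nat) × P(Nat) → P(Nat)
    static Set<Integer> intersection(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new HashSet<>(a);
        result.retainAll(b);
        return result;
    }

    // max: P(Nat) → Nat, with max(Ø) = 0
    static int max(Set<Integer> s) {
        int largest = 0;
        for (int x : s) {
            largest = Math.max(largest, x);
        }
        return largest;
    }

    public static void main(String[] args) {
        Set<Integer> a = new HashSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> b = new HashSet<>(Arrays.asList(3, 4));
        System.out.println(CONCAT.apply("ab", "cd"));
        System.out.println(ADD.apply(2, 3));
        System.out.println(max(union(a, b)));
        System.out.println(max(intersection(a, b)));
        System.out.println(max(EMPTY));
    }
}
```

The distinguished element Ø is just a constant here, which matches the remark that distinguished elements can be seen as nullary operations.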
Recently at work I had the task of developing a small tool that could display charts showing the progress of our data extraction over time.
After each iteration, results of data extraction were stored in several directories and compared against a handwritten reference set.
In short, this is the directory structure:
|--+ batch1
|  |--+ correct
|  |  `--- data.txt
|  |--+ guessed
|  |  |--- data.00000145a8ab68b9.txt
|  |  |--- data.00000145a92a530f.txt
|  |  `--- data.0000014594039f5b.txt
|  |--- input1.pdf
|  |--- input2.pdf
|  `--- input3.pdf
`--+ batch2
   |--+ correct
   |  `--- data.txt
   |--+ guessed
   |  |--- data.00000145a8ab68b9.txt
   |  |--- data.00000145a92a530f.txt
   |  `--- data.0000014594039f5b.txt
   |--- input4.pdf
   |--- input5.pdf
   `--- input6.pdf
The hexadecimal numbers are timestamps in milliseconds since the Unix epoch.
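To illustrate the naming scheme, here is a minimal sketch that decodes the timestamp from one of the guessed file names. It is written in Java for illustration only and is not part of the tool described in this post; the class and method names are invented, and it assumes every name has the form data.<hexadecimal milliseconds>.txt.

```java
import java.time.Instant;

// Hypothetical helper; assumes file names of the form data.<hex millis>.txt
class GuessedFileTimestamp {
    static Instant timestampOf(String fileName) {
        // Split on literal dots: "data", the hexadecimal timestamp, "txt"
        String[] parts = fileName.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        // Radix 16 parses the hexadecimal digits; leading zeros are accepted
        long millis = Long.parseLong(parts[1], 16);
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        System.out.println(timestampOf("data.00000145a8ab68b9.txt"));
    }
}
```

Because the timestamps are zero-padded to a fixed width, sorting the file names alphabetically also sorts them chronologically.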
Each file consisted of records in the format:
input1.pdf value from the first file
input2.pdf value from the second file
input3.pdf value from the third file
Long story short, it made no difference which programming language I wrote the tool in, so I picked Haskell. Let’s ignore most details of the implementation. What’s important for this entry is that I implemented the following functions:
allCorrectFiles :: [FilePath] -> IO [FilePath]
allGuessedFiles :: [FilePath] -> IO [(LocalTime, FilePath)]
readDataFile :: FilePath -> IO [(Entry, String)]
getAllData :: [FilePath] -> IO ([(Entry, String)], [(LocalTime, Entry, String)])
The initial implementation of the getAllData function was straightforward, yet a bit clunky:
getAllData subDirs = do
  correctFiles <- allCorrectFiles subDirs
  guessedFiles <- allGuessedFiles subDirs
  correctData <- mapM readDataFile correctFiles
  guessedData <- forM guessedFiles $ \(t,f) -> do
    x <- readDataFile f
    return $ map (\(e,s) -> (t,e,s)) x
  return (concat correctData, concat guessedData)
This code is quite ugly. Especially jarring were the return $ map combination and the concats in the final line.
After a while, I noticed that all the functions I call from the getAllData function are of type a -> IO [b]. A double monad. So, I added the transformers library to my project and rewrote that function as:
I would like to announce the release of a Java library for parsing hOCR documents: hOCR4J. You can download it from here. I’m planning to get it to Sonatype too, so you may be able to get it from there in the near future.
hOCR is an output format used by OCR programs, including Tesseract. It contains information about all the OCR’d words, their positions, and their assumed organisation into lines and paragraphs. Currently, hOCR4J has been tested to work with Tesseract-generated hOCR files; I plan to test other OCR programs in the future.
hOCR4J parses hOCR documents, creates an immutable model for them (nice when using functional programming style), and provides various tools to manipulate and modify them.
hOCR4J makes a good starting point when developing an application which extracts data from OCR’d documents that have non-trivial layouts.
After several months of not-so-intensive work, I present version 0.1 of the Units library: https://github.com/KarolS/units.
Units is a Scala library providing type-level units of measurement checked at compile time. The goal of the library was to provide a way, as seamless as possible, to check whether the units used in arithmetic expressions are correct.
If you have tried to compile it: yes, it does take that long to compile. A clean build takes 100 seconds on my i7.
Compile-time unit checking has multiple applications:
scientists will be able to distinguish values in metres per second, metres, metres per second squared instead of crashing expensive space exploration equipment or drugging a patient
engineers will be able to distinguish values in metres, centimetres, feet, inches, litres, gallons instead of running out of fuel in the middle of a flight or thinking the distance to travel is many times shorter
designers will be able to distinguish values in millimetres, inches, pixels, points
economists will be able to distinguish values in euros, dollars, dollars per hour, ounces of gold
game developers will be able to distinguish values in pixels, tiles, damage points, minerals, barrels of vespene gas
network software developers will be able to distinguish values in kilobytes, kibibytes, kilobits, kilobits per second
and so on and on
There are not many languages with units of measurement support; the first that comes to mind is F#, and I must admit that it is great at this. There are also other languages that support units as a first-class language feature, and many that support units through a library. Those libraries vary in their expressiveness and versatility: some of them only allow SI units, some of them require you to explicitly express relations between multiplied values, and some of them only support a limited subset of units. There have been earlier Scala libraries with units of measurement, but they all had severe limitations. The Units library tries to be both expressive and versatile. While it’s not as powerful as F#’s built-in unit support, it definitely allows for quite a bit.
Enough of that, time for some examples that will showcase the main features.