Suppose you have two ZIP archives on two different machines and want to synchronize their contents with minimal network traffic, without much regard for CPU usage. You also know that the target machine already holds an older version of the archive that is mostly identical to the new one. For example, you have to upload a fat JAR with the new build over a metered connection, and the server literally sits in a garage at the other end of the country.
If the archive is small, uploading it in its entirety is simple enough, but once it grows to tens of megabytes, of which 90% is unchanging third-party code, this quickly becomes a waste of both time and bandwidth.
Time is money. And depending on your ISP, bandwidth can be money too.
Long story short, I’m going to assume that the target machine runs Linux, and the source machine runs either Windows or Linux. I am going to use fuse-zip and rsync.
Why those two? Because fuse-zip can convert a ZIP archive into a writeable filesystem, and rsync can sync a writeable filesystem with another. Details below.
Task 1 – Mount the archives
Create a mountpoint on the target machine; I’ll use /tmp/z. Then mount your archive using:
$ sudo fuse-zip -o allow_other,rw,uid=`id -u`,gid=`id -g` /stuff/target.zip /tmp/z
To test that it works, create a file in the mounted filesystem. It should appear in the ZIP file after you unmount it.
$ echo TEST > /tmp/z/TEST_FILE_PLEASE_IGNORE
$ sudo umount /tmp/z
$ unzip -l /stuff/target.zip | grep TEST_FILE_PLEASE_IGNORE
Do not modify the ZIP archive while it is mounted, and do not interrupt the unmounting process if the archive was mounted for writing, as either can corrupt it.
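Since the archive must not be touched while it is mounted, it can help to check the mount state before replacing or editing the ZIP file. The following is a minimal sketch, assuming a Linux system that exposes its mount table in /proc/mounts; the helper name is_mounted is made up for this illustration and is not part of fuse-zip.

```shell
# Hypothetical helper: succeeds if the given directory is an active mountpoint.
# Assumes Linux, where /proc/mounts lists one mount per line.
is_mounted() {
    # Field 2 of each /proc/mounts line is the mountpoint path.
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /tmp/z; then
    echo "/tmp/z is mounted: leave the archive alone until it is unmounted"
else
    echo "/tmp/z is not mounted: the archive can be modified"
fi
```

The check only tells you whether something is mounted at that path, not which archive it is, so treat it as a guard rather than a guarantee.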
If your source machine runs Linux, you can mount the source archive the same way, but read-only:
$ sudo fuse-zip -o allow_other,ro,uid=`id -u`,gid=`id -g` /stuff/source.zip /tmp/z
You should experiment a bit to find out whether uid and gid are necessary.
If your source machine runs Windows, you can use Pismo File Mount to mount the archive as a drive and assign it a letter. It’s a GUI program; I haven’t found a scriptable alternative yet.
Task 2 – Synchronize the mounted filesystems
On the source machine:
$ rsync --delete -zrucv -e ssh /tmp/z/ remoteuser@targethost:/tmp/z/
Explanation for the parameters:
-c so that the files will be compared using checksums instead of modification dates. We neither care about nor trust the dates here; what matters is content.
-z for the compression. Do not use the flag if the files themselves won’t compress well.
-r for recursive synchronization.
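-u so that files that are newer on the target than on the source are skipped. Unchanged files are not uploaded either way, because rsync only transfers files that differ. Since this check relies on modification times, drop the flag if the timestamps on the target cannot be trusted.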
--delete so the files that are absent in the source archive will be deleted from the target archive. We want the target archive to have the same content as the source one, not more.
-v stands for verbose. Used once, it makes rsync list all the files as they are being sent. You can skip this option, or repeat it several times, depending on your preference.
-e ssh picks the remote shell used to synchronize the files. SSH is the recommended one for most cases.
Note that the paths end with slashes, so rsync synchronizes the contents of the directories, not the directories themselves.
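The reasoning behind -c can be illustrated without rsync: files in a rebuilt archive can carry fresh timestamps although their content has not changed. The sketch below uses cksum as a stand-in for a content check (rsync uses its own checksum algorithm); the file names are arbitrary and only serve this illustration.

```shell
# Illustration only: two files with identical content but different timestamps.
dir=$(mktemp -d)
printf 'same content\n' > "$dir/a"
printf 'same content\n' > "$dir/b"
touch -t 200001010000 "$dir/b"    # give b a much older modification time

# Reading from stdin makes cksum print only the checksum and the byte count,
# without a file name, so the two results can be compared directly.
sum_a=$(cksum < "$dir/a")
sum_b=$(cksum < "$dir/b")

if [ "$sum_a" = "$sum_b" ]; then
    verdict="identical"    # a content comparison sees no change
else
    verdict="different"
fi
echo "By checksum the files are $verdict"
rm -rf "$dir"
```

A comparison by modification date would flag b as changed here, which is exactly the kind of needless upload that -c avoids, at the price of reading every file on both sides.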
If the source machine is running Windows, you can use Cygwin (https://www.cygwin.com/); just install the appropriate packages (you’ll need rsync and most likely also openssh). If you used Pismo File Mount to mount the ZIP to e.g. the J: drive, use /cygdrive/j/ as your source directory.
Task 3 – Unmount the filesystems
On Linux (both source and target machine), do
$ sudo umount /tmp/z
You have to unmount the target archive so that the changes are written to the ZIP file, and the source archive so that the next time you mount it, the mount reflects its current contents.
Putting it all together
Here is a small script that can be used to synchronize two ZIP archives:
#!/bin/bash
# put all the necessary values here
REMOTEUSER=...
REMOTEHOST=...
SOURCEZIP=...
TARGETZIP=...
OS=`uname -o`
if [ "$OS" = Cygwin ] ; then
    # depends on where you mount it; assuming here the J: drive
    P=/cygdrive/j
else
    P=/tmp/z
    mkdir -p "$P"
    sudo umount "$P"
    sudo fuse-zip -o allow_other,uid=`id -u`,gid=`id -g`,ro "$SOURCEZIP" "$P"
fi
ssh $REMOTEUSER@$REMOTEHOST sudo jar_mount_rw
rsync --delete -zrucv -e "ssh" "$P/" $REMOTEUSER@$REMOTEHOST:/tmp/z/
ssh $REMOTEUSER@$REMOTEHOST sudo umount /tmp/z
if [ "$OS" = Cygwin ] ; then
    echo 'Finished. Unmount the J: drive.'
else
    PID="`pgrep fuse-zip`"
    if [ -z "$PID" ] ; then
        exit
    else
        sudo umount "$P"
    fi
fi
It requires this script to be present as jar_mount_rw on the target machine:
#!/bin/bash
# put all the necessary values here
TARGETZIP=...
umount /tmp/z
mkdir -p /tmp/z
fuse-zip -o allow_other,rw,uid=`id -u`,gid=`id -g` "$TARGETZIP" /tmp/z
I don’t have any hard data, but the overall effect is that a 40 MB JAR archive with only a few files modified in it gets synchronized in a few seconds, while uploading it whole could take several minutes on slower connections.
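To collect some numbers for your own archives, rsync’s --stats option prints a transfer summary at the end of the run, including the number of bytes actually sent. Here is a sketch that reuses the rsync call from the script; the function name sync_with_stats and its two arguments are made up for this illustration.

```shell
# Hypothetical wrapper: the same rsync call as in the script, plus --stats.
# $1 = local source directory (with trailing slash)
# $2 = remote target in user@host:/path/ form
sync_with_stats() {
    rsync --delete -zrucv --stats -e ssh "$1" "$2"
}

# Example call, with the same placeholders as in the script:
# sync_with_stats /tmp/z/ "$REMOTEUSER@$REMOTEHOST:/tmp/z/"
```

Comparing the "Total bytes sent" line of the summary with the size of the archive gives a rough idea of how much traffic was saved.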