Well after the Xdelta adventure from last month I was wondering if there was anyway to speedup doing the CRC32 checksums on files. First I started by looking for parallel implementations, unfortunately all I could find were hardware implementations (FPGAs, etc). Then I stumbled upon an Intel paper for a faster CRC generator.
Unfortunately the code they put up on SF was really only a benchmark to compare it against the Sarwate algorithm, was written for icc instead of gcc and it used the generator polynomial for iSCSI instead of the IEEE 802.3 that most CRC32 commands use (had to email one of the paper's authors to notice that one).
After modifying the Zlib lookup table generator code to create the expected tables, modifying the code to compile with gcc and adding some input file handling the end result is a working CRC32 program that is
potentially 2-3X faster than the ones most computers ship with. I only say potentially because almost always the checksum time is IO bound so in real life usage you'll probably only a small (a few seconds) difference. Unfortunately I don't have an SSD to test with :-(
- Source Code - compile with gcc -O2 crc32.c tables.c
Note: The make_tables.c code is there if you want to create a new tables.c with a different generator polynomial. - OS X Universal Binary - x86_64, i386, ppc (again only tested on Intel Macs)
Checksums for UB
- CRC32: 3588b2f9
- MD5: 42bfecc06191772185c23135a2569b43
- SHA1: 7e9cdab72539a840f8ef44d8a3d621433dc9a0db
After OS X being sneaky and caching the files to RAM we see the following results. The top is the Perl script that uses the Zlib implementation (which is what OS X calls when you run the crc32 command), the bottom is the slicing by eight algorithm.
$ time crc32 7.2-RELEASE-amd64-disc1.iso
1ec96f77
real 0m1.638s
user 0m1.182s
sys 0m0.446s
$ time ./crc32 7.2-RELEASE-amd64-disc1.iso
1ec96f77
real 0m0.755s
user 0m0.512s
sys 0m0.237s