head 1.29; access; symbols act_first:1.1.1.1 act:1.1.1; locks; strict; comment @ * @; 1.29 date 2003.01.06.12.10.57; author rse; state Exp; branches; next 1.28; 1.28 date 2002.01.16.19.30.36; author rse; state Exp; branches; next 1.27; 1.27 date 2002.01.02.17.05.53; author rse; state Exp; branches; next 1.26; 1.26 date 2001.03.21.16.08.22; author rse; state Exp; branches; next 1.25; 1.25 date 2001.03.21.15.59.01; author rse; state Exp; branches; next 1.24; 1.24 date 2001.03.17.10.52.27; author rse; state Exp; branches; next 1.23; 1.23 date 2001.03.17.10.39.34; author rse; state Exp; branches; next 1.22; 1.22 date 2000.09.02.16.39.29; author rse; state Exp; branches; next 1.21; 1.21 date 2000.08.18.19.36.10; author rse; state Exp; branches; next 1.20; 1.20 date 2000.08.18.15.58.09; author rse; state Exp; branches; next 1.19; 1.19 date 2000.08.18.15.35.57; author rse; state Exp; branches; next 1.18; 1.18 date 2000.07.19.06.34.28; author rse; state Exp; branches; next 1.17; 1.17 date 2000.07.14.19.46.03; author rse; state Exp; branches; next 1.16; 1.16 date 2000.07.14.19.36.36; author rse; state Exp; branches; next 1.15; 1.15 date 2000.05.28.13.21.12; author rse; state Exp; branches; next 1.14; 1.14 date 2000.05.28.13.18.10; author rse; state Exp; branches; next 1.13; 1.13 date 2000.05.21.12.28.44; author rse; state Exp; branches; next 1.12; 1.12 date 2000.02.11.19.47.49; author rse; state Exp; branches; next 1.11; 1.11 date 2000.01.26.10.43.08; author rse; state Exp; branches; next 1.10; 1.10 date 99.11.24.08.38.22; author rse; state Exp; branches; next 1.9; 1.9 date 99.11.24.08.36.44; author rse; state Exp; branches; next 1.8; 1.8 date 99.11.21.12.18.38; author rse; state Exp; branches; next 1.7; 1.7 date 99.11.21.12.13.43; author rse; state Exp; branches; next 1.6; 1.6 date 99.11.20.23.03.17; author rse; state Exp; branches; next 1.5; 1.5 date 99.11.13.15.58.07; author rse; state Exp; branches; next 1.4; 1.4 date 99.11.13.15.47.43; author rse; state Exp; branches; next 1.3; 1.3 date 99.11.13.15.32.36; author rse; state Exp; branches; next 1.2; 1.2 date 99.11.13.15.31.57; author rse; state Exp; branches; next 1.1; 1.1 date 99.11.12.20.37.20; author rse; state Exp; branches 1.1.1.1; next ; 1.1.1.1 date 99.11.12.20.37.20; author rse; state Exp; branches; next ; desc @@ 1.29 log @cleanup source tree @ text @/* ** OSSP act - Abstract Container Types ** Copyright (c) 1999-2003 Ralf S. Engelschall ** Copyright (c) 1999-2003 The OSSP Project ** ** This file is part of OSSP act, an abstract container type library ** which can be found at http://www.ossp.org/pkg/lib/act/. ** ** Permission to use, copy, modify, and distribute this software for ** any purpose with or without fee is hereby granted, provided that ** the above copyright notice and this permission notice appear in all ** copies. ** ** THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED ** WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF ** MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. ** IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR ** CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, ** SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT ** LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF ** USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ** ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, ** OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT ** OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF ** SUCH DAMAGE. ** ** act_hash_fct.c: hash functions */ /* ** This is a large collection of more or less reasonable hash functions ** for use in conjunction with hash table lookups. One should not use ** these functions for cryptography, of course. They do only fulfill the ** (weaker) requirements of the hash table lookup discipline: ** ** 1. the function must be deterministic and stateless ** 2. the function must be very fast to compute ** 3. the function must distribute the keys very good ** ** Every function in this piece of source has the following signature: ** ** act_uint32_t act_hash_fct_(act_uint8_t *key, act_size_t len) ** ** This means that every function takes a pointer to the key data ** (accessed byte-wise) plus the key length (in bytes) and gives back an ** integer at least 32 bit in size (ANSI C requires `long' to be greater ** than `int' and `int' to be greater than `char', so `long' is at least ** 32 bits if one assumes that only 2^k aligned sizes exist). ** ** That a hash function causes collisions is clear already from the ** famous Birthday Paradoxon (if 23 people are in a room, the chance ** is already over 50% that the birthday of two people falls onto the ** same day of the year; or in oder words: if you hash 23 keys into 365 ** buckets you already have to expect a collision). ** ** Usually there are a gazillion more possible keys than buckets, so ** the best any hash function can do is to map an equal number of those ** gazillion keys to each bucket. The number of collisions you get is ** expected to follow the Chi^2 distribution. ** ** Here's how Chi^2 is computed: ** 1. Lookup: b = total number of buckets ** 2. Lookup: k = total number of keys ** 3. Lookup: b_i = number of buckets which have i keys ** 4. Compute: p = k/b (the expected number of keys per bucket) ** 5. Compute: Chi^2 = sum (over all i) (b_i*((i-p)^2)/p) ** ** The distribution is expected to have a result close to b, i.e., ** within 3*sqrt(b) of b. Chi^2 measures are usually reported in units of ** standard deviations. That is, if the formula above gives b+c*sqrt(b), ** they report c, and c is expected to be between -3 and 3. ** ** For comparing the hash functions (including the calculation of Chi^2) ** run `make test-hash-fct' which compiles the test suite at the end ** of this source. The order of writing down of the hash functions in ** this source follows the results of the comparisons, i.e. the hash ** functions are ordered strongest to weakest. On a Pentium-II/800 ** under FreeBSD 4.3 the table looks approximately as following: ** ** +-----------------------------------------------------------------------------+ ** | Hash Func Time Coll00 Coll55 CollNN Used Min Max Diff Chi2/S Chi2/B | ** + ---------- ------ ------ ------ ------ ----- ---- ---- ---- ------- ------- + ** | DJBX33A 0.62 0 0 0 85.80 0 9 9 0.19 2.72 | ** | DJBX33X 0.61 0 0 0 86.30 0 7 7 -2.24 -0.79 | ** | JEDI 0.65 0 0 0 87.90 0 7 7 0.13 -1.68 | ** | VOCONG 0.84 0 0 0 86.40 0 7 7 -0.92 -0.95 | ** | CDT 0.99 0 0 0 88.50 0 7 7 0.35 -2.09 | ** | JOTCL 0.85 0 0 0 85.50 0 11 11 -2.12 2.84 | ** | BJDDJ 1.17 0 0 0 87.20 0 7 7 1.90 -1.93 | ** | CRC32 1.48 0 0 0 86.40 0 9 9 0.85 0.44 | ** | TEADM 1.99 0 0 32 86.60 0 7 7 -0.70 -2.15 | ** | CPOAAT 1.16 0 0 0 85.20 0 8 8 0.70 2.02 | ** | FNV 2.05 0 0 0 85.80 0 8 8 -0.79 0.82 | ** | OZSDBM 1.18 999000 0 2 85.30 0 8 8 0.47 0.85 | ** | KAZLIB 2.71 0 0 0 85.50 0 8 8 7.74! 2.75 | ** | BUZHASH 1.38 30256 30256 976 86.90 0 8 8 -1.80 1.07 | ** | PEARSON 2.69 3160 5238 0 85.30 0 9 9 -1.64 1.80 | ** | RIFKIN 1.23 999000 124000 10754 86.00 0 9 9 0.85 0.63 | ** | ASU 1.56 0 0 0 85.50 0 7 7 441.58! -1.33 | ** | HOLUB 1.64 999000 82170 2 87.70 0 8 8 441.17! -2.31 | ** | CBU 0.56 999000 967272 2834 86.20 0 8 8 216.29! 0.73 | ** | CVS 1.64 999000 82170 2 87.70 0 8 8 441.17! -2.31 | ** +-----------------------------------------------------------------------------+ ** ** For further reading on the topic, start at Bob Jenkins ``Hashing ** Frequently Asked Questions'' and ``Hash Evaluation'' Paper: ** o http://burtleburtle.net/bob/hash/hashfaq.html ** o http://burtleburtle.net/bob/hash/evahash.html */ #include "act.h" #include "act_p.h" /* * DJBX33A (Daniel J. Bernstein, Times 33 with Addition) * * This is Daniel J. Bernstein's popular `times 33' hash function as * posted by him years ago on comp.lang.c. It basically uses a function * like ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best * known hash functions for strings. Because it is both computed very * fast and distributes very well. * * The magic of number 33, i.e. why it works better than many other * constants, prime or not, has never been adequately explained by * anyone. So I try an explanation: if one experimentally tests all * multipliers between 1 and 256 (as RSE did now) one detects that even * numbers are not useable at all. The remaining 128 odd numbers * (except for the number 1) work more or less all equally well. They * all distribute in an acceptable way and this way fill a hash table * with an average percent of approx. 86%. * * If one compares the Chi^2 values of the variants, the number 33 not * even has the best value. But the number 33 and a few other equally * good numbers like 17, 31, 63, 127 and 129 have nevertheless a great * advantage to the remaining numbers in the large set of possible * multipliers: their multiply operation can be replaced by a faster * operation based on just one shift plus either a single addition * or subtraction operation. And because a hash function has to both * distribute good _and_ has to be very fast to compute, those few * numbers should be preferred and seems to be the reason why Daniel J. * Bernstein also preferred it. * * Below there are two variants: the original variant with only the * multiplication optimized via bit shifts and additionally a variant * which has the hash unrolled eight times for speed. Both additionally * are optimized for speed even more by unrolling the loop. */ intern act_uint32_t act_hash_fct_djbx33a( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 5381; #ifdef ACT_NON_OPTIMIZE while (len-- > 0) hash = ((hash << 5) + hash) + *key++; #else /* variant with the hash unrolled eight times */ for (; len >= 8; len -= 8) { hash = ((hash << 5) + hash) + *key++; hash = ((hash << 5) + hash) + *key++; hash = ((hash << 5) + hash) + *key++; hash = ((hash << 5) + hash) + *key++; hash = ((hash << 5) + hash) + *key++; hash = ((hash << 5) + hash) + *key++; hash = ((hash << 5) + hash) + *key++; hash = ((hash << 5) + hash) + *key++; } switch (len) { case 7: hash = ((hash << 5) + hash) + *key++; /* fallthrough... */ case 6: hash = ((hash << 5) + hash) + *key++; /* fallthrough... */ case 5: hash = ((hash << 5) + hash) + *key++; /* fallthrough... */ case 4: hash = ((hash << 5) + hash) + *key++; /* fallthrough... */ case 3: hash = ((hash << 5) + hash) + *key++; /* fallthrough... */ case 2: hash = ((hash << 5) + hash) + *key++; /* fallthrough... */ case 1: hash = ((hash << 5) + hash) + *key++; break; default: /* case 0: */ break; } #endif return hash; } /* * DJBX33X (Daniel J. Bernstein, Times 33 with Exclusive-Or) * * This is Daniel J. Bernstein's revised `times 33' hash function * which is currently favored by him (see his CDB package): it uses * exclusive-or instead of addition to merge in the key information. * It behaves mostly equal to the DJBX33A hash, i.e. it is also a very * good hash (both fast and with good distribution). It can be found for * instance in his CDB package (see cdb_hash.c). */ intern act_uint32_t act_hash_fct_djbx33x( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 5381; #ifdef ACT_NON_OPTIMIZE while (len-- > 0) hash = ((hash << 5) + hash) ^ *key++; #else /* variant with the hash unrolled eight times */ for (; len >= 8; len -= 8) { hash = ((hash << 5) + hash) ^ *key++; hash = ((hash << 5) + hash) ^ *key++; hash = ((hash << 5) + hash) ^ *key++; hash = ((hash << 5) + hash) ^ *key++; hash = ((hash << 5) + hash) ^ *key++; hash = ((hash << 5) + hash) ^ *key++; hash = ((hash << 5) + hash) ^ *key++; hash = ((hash << 5) + hash) ^ *key++; } switch (len) { case 7: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 6: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 5: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 4: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 3: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 2: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 1: hash = ((hash << 5) + hash) ^ *key++; break; default: /* case 0: */ break; } #endif return hash; } /* * JEDI (Frank Denis ) * * The Jedi hash is a variant of Daniel Bernstein's `times 33' hash * function (using exclusive-or), which Frank 'Jedi' Denis created for * a patch to Linux's ReiserFS. It assumes that the key is a filesystem * path and this way attempts to achieve a better distribution by * hashing from right to left and by treating the key as a text string. * So, this variant of DJB's original hash function is intended for * hashing filesystem path like strings. Below there are two variants: * the original variant from Frank Denis and additionally a variant * which has the hash unrolled eight times for speed. */ intern act_uint32_t act_hash_fct_jedi( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 5381; #ifdef ACT_NON_OPTIMIZE while (len-- > 0) hash = ((hash << 5) + hash) ^ (key[len] - '0'); #else /* variant with the hash unrolled eight times */ while (len >= 8) { hash = ((hash << 5) + hash) ^ (key[--len] - '0'); hash = ((hash << 5) + hash) ^ (key[--len] - '0'); hash = ((hash << 5) + hash) ^ (key[--len] - '0'); hash = ((hash << 5) + hash) ^ (key[--len] - '0'); hash = ((hash << 5) + hash) ^ (key[--len] - '0'); hash = ((hash << 5) + hash) ^ (key[--len] - '0'); hash = ((hash << 5) + hash) ^ (key[--len] - '0'); hash = ((hash << 5) + hash) ^ (key[--len] - '0'); } switch (len) { case 7: hash = ((hash << 5) + hash) ^ (key[--len] - '0'); /* fallthrough... */ case 6: hash = ((hash << 5) + hash) ^ (key[--len] - '0'); /* fallthrough... */ case 5: hash = ((hash << 5) + hash) ^ (key[--len] - '0'); /* fallthrough... */ case 4: hash = ((hash << 5) + hash) ^ (key[--len] - '0'); /* fallthrough... */ case 3: hash = ((hash << 5) + hash) ^ (key[--len] - '0'); /* fallthrough... */ case 2: hash = ((hash << 5) + hash) ^ (key[--len] - '0'); /* fallthrough... */ case 1: hash = ((hash << 5) + hash) ^ (key[--len] - '0'); break; default: /* case 0: */ break; } #endif return hash ^ ((hash & 0x7f) << 24); } /* * VOCONG (Phong Vo, Congruential Hash) * * This is Phong Vo 's linear congruential hash. * It's a very fast one and (although of its simplicity) it distributes * surprisingly well. It can be found for instance in the Berkeley-DB 3.x * package (hash/hash_func.c). */ intern act_uint32_t act_hash_fct_vocong( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0; while (len-- > 0) hash = hash * 0x63c63cd9 + 0x9c39c33d + *key++; return hash; } /* * CDT (Container Data Type, Congruential Hash) * * This is the linear congruential hash ``h * 17 + c + 97531'' as used * in the AT&T's Cdt library. It is very fast and distributes very well. * For details see Cdt's cdt.h file. */ intern act_uint32_t act_hash_fct_cdt( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0; while (len-- > 0) hash = (hash << 4) + hash + *key++ + 97531; return hash; } /* * JOTCL (John Ousterhout, Tcl) * * This is John Ousterhout's hash from his Tcl 8.2's tclHash.c. He * said, he has chosen this particular hash (multiply by 9 and add new * character) because of the following reasons: * 1. Multiplying by 10 is perfect for keys that are decimal strings, * and multiplying by 9 is just about as good. * 2. Times-9 is (shift-left-3) plus (old). This means that each * character's bits hang around in the low-order bits of the hash * value for ever, plus they spread fairly rapidly up to the * high-order bits to fill out the hash value. This seems works well * both for decimal and non-decimal strings * In fact this is a fast hash, but the original version which * initializes the hash with 0 causes collissions for keys with * increasing same bytes. So our variant here uses the golden ratio (but * every arbitrary value != 0 should work) instead. */ intern act_uint32_t act_hash_fct_jotcl( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0x9e3779b9; while (len-- > 0) hash = ((hash << 3) + hash) + *key++; return hash; } /* * BJDDJ (Bob Jenkins, Dr. Dobbs Journal) * * This is a very complex but also very good hash function, as proposed * in the March'97 issue of Dr. Dobbs Journal (DDJ) by Bob Jenkins (see * http://burtleburtle.net/bob/hash/doobs.html for online version). He * showed that this hash function has both very good distribution and * performance and our own hash function comparison confirmed this. The * only difference to the original function of B.J. here is that our * version doesn't provide the `level' (= previous hash) argument for * consistency reasons with the other hash functions (i.e. same function * signature). It can be definetely recommended as a good general * purpose hash function. */ intern act_uint32_t act_hash_fct_bjddj( register act_uint8_t *k, register act_size_t length) { register act_uint32_t a,b,c,len; /* some abbreviations */ #define ub4 act_uint32_t #define mix(a,b,c) { \ a -= b; a -= c; a ^= (c>>13); \ b -= c; b -= a; b ^= (a<< 8); \ c -= a; c -= b; c ^= (b>>13); \ a -= b; a -= c; a ^= (c>>12); \ b -= c; b -= a; b ^= (a<<16); \ c -= a; c -= b; c ^= (b>> 5); \ a -= b; a -= c; a ^= (c>> 3); \ b -= c; b -= a; b ^= (a<<10); \ c -= a; c -= b; c ^= (b>>15); \ } /* setup the internal state */ len = length; a = b = 0x9e3779b9; /* the golden ratio; an arbitrary value */ c = 0; /* handle most of the key */ while (len >= 12) { a += (k[0] +((ub4)k[1]<<8) +((ub4)k[ 2]<<16) +((ub4)k[ 3]<<24)); b += (k[4] +((ub4)k[5]<<8) +((ub4)k[ 6]<<16) +((ub4)k[ 7]<<24)); c += (k[8] +((ub4)k[9]<<8) +((ub4)k[10]<<16) +((ub4)k[11]<<24)); mix(a,b,c); k += 12; len -= 12; } /* handle the last 11 bytes */ c += length; switch(len) { /* all the case statements fall through */ case 11: c+=((ub4)k[10]<<24); case 10: c+=((ub4)k[ 9]<<16); case 9 : c+=((ub4)k[ 8]<< 8); /* the first byte of c is reserved for the length */ case 8 : b+=((ub4)k[ 7]<<24); case 7 : b+=((ub4)k[ 6]<<16); case 6 : b+=((ub4)k[ 5]<< 8); case 5 : b+=k[4]; case 4 : a+=((ub4)k[ 3]<<24); case 3 : a+=((ub4)k[ 2]<<16); case 2 : a+=((ub4)k[ 1]<< 8); case 1 : a+=k[0]; /* case 0: nothing left to add */ } mix(a,b,c); #undef ub4 #undef mix /* report the result */ return c; } /* * CRC32 (Cyclic Redundancy Check 32-Bit) * * This hash function is based on the CRC-32 (Cyclic Redundancy Check * with 32 Bit) algorithm as invented by Mark Adler. It one of the hash * functions with medium performance but with very good distribution. So * it can be considered as a rock solid general purpose hash function. */ intern act_uint32_t act_hash_fct_crc32( register act_uint8_t *key, register act_size_t len) { static act_uint32_t tab[256] = { 0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL, 0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL, 0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L, 0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L, 0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L, 0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL, 0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L, 0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L, 0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L, 0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L, 0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL, 0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L, 0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL }; register act_uint32_t hash; hash = 0xffffffff; while (len-- > 0) hash = tab[(hash ^ *key++) & 0xff] ^ (hash >> 8); hash ^= 0xffffffff; return hash; } /* * CPOAAT (Colin Plumb, One-At-A-Time) * * This hash function was derived by Bob Jenkins from requirements posed * by Colin Plumb. It was named `One At A Time' by Bob Jenkins. Analysis * suggested that there were no funnels in this hash, i.e. every input * bit affects every output bit. Additionally it's a very fast hash and * only the original function (which started with "hash = 0") disliked * progressing keys a little bit (which doesn't hurt in practice). Our * variant above uses the value of the DJBX33A hash (but any arbitrary * value should work) and this way avoid this, too. */ intern act_uint32_t act_hash_fct_cpoaat( register act_uint8_t *ptr, register act_size_t len) { register act_uint32_t hash = 5381; while (len-- > 0) { hash += *ptr++; hash += (hash << 10); hash ^= (hash >> 6); } hash += (hash << 3); hash ^= (hash >> 11); hash += (hash << 15); return hash; } /* * TEADM (Tiny Encryption Algorithm & Davis-Meyer) * * The TEA hash is a keyed 32-bit hash function using Tiny Encryption * Algorithm (TEA) in a Davis-Meyer function (H0 = Key, Hi = E * Mi(Hi-1) + Hi-1). For details see Applied Cryptography, 2nd * edition, p448. It was found in ReiserFS's hashing code as written * by Jeremy Fitzhardinge . This hash is actually a * cryptographically strong hash and this way not really optimimal for * use inside hash data structures. Because it is slower than most of * the other functions, although it distributes very well. */ intern act_uint32_t act_hash_fct_teadm( register act_uint8_t *key, register act_size_t len) { act_uint32_t k[] = { 0x9464a485, 0x542e1a94, 0x3e846bff, 0xb75bcfc3 }; act_uint32_t h0 = k[0], h1 = k[1]; act_uint32_t a, b, c, d; act_uint32_t pad; int i; #define TEAFULLROUNDS 10 /* 32 is overkill, 16 is strong crypto */ #define TEAPARTROUNDS 6 /* 6 gets complete mixing */ /* a, b, c, d - data; h0, h1 - accumulated hash */ #define TEACORE(rounds) \ do { \ act_uint32_t sum = 0; \ int n = rounds; \ act_uint32_t b0, b1; \ b0 = h0; \ b1 = h1; \ do { \ sum += 0x9E3779B9; \ b0 += ((b1 << 4)+a) ^ (b1+sum) ^ ((b1 >> 5)+b); \ b1 += ((b0 << 4)+c) ^ (b0+sum) ^ ((b0 >> 5)+d); \ } while(--n); \ h0 += b0; \ h1 += b1; \ } while(0) pad = (act_uint32_t)len | ((act_uint32_t)len << 8); pad |= pad << 16; while (len >= 16) { a = (act_uint32_t)key[ 0] | (act_uint32_t)key[ 1] << 8 | (act_uint32_t)key[ 2] << 16| (act_uint32_t)key[ 3] << 24; b = (act_uint32_t)key[ 4] | (act_uint32_t)key[ 5] << 8 | (act_uint32_t)key[ 6] << 16| (act_uint32_t)key[ 7] << 24; c = (act_uint32_t)key[ 8] | (act_uint32_t)key[ 9] << 8 | (act_uint32_t)key[10] << 16| (act_uint32_t)key[11] << 24; d = (act_uint32_t)key[12] | (act_uint32_t)key[13] << 8 | (act_uint32_t)key[14] << 16| (act_uint32_t)key[15] << 24; TEACORE(TEAPARTROUNDS); len -= 16; key += 16; } if (len >= 12) { if (len >= 16) *(int *)0 = 0; a = (act_uint32_t)key[ 0] | (act_uint32_t)key[ 1] << 8 | (act_uint32_t)key[ 2] << 16| (act_uint32_t)key[ 3] << 24; b = (act_uint32_t)key[ 4] | (act_uint32_t)key[ 5] << 8 | (act_uint32_t)key[ 6] << 16| (act_uint32_t)key[ 7] << 24; c = (act_uint32_t)key[ 8] | (act_uint32_t)key[ 9] << 8 | (act_uint32_t)key[10] << 16| (act_uint32_t)key[11] << 24; d = pad; for (i = 12; i < len; i++) { d <<= 8; d |= key[i]; } } else if (len >= 8) { if (len >= 12) *(int *)0 = 0; a = (act_uint32_t)key[ 0] | (act_uint32_t)key[ 1] << 8 | (act_uint32_t)key[ 2] << 16| (act_uint32_t)key[ 3] << 24; b = (act_uint32_t)key[ 4] | (act_uint32_t)key[ 5] << 8 | (act_uint32_t)key[ 6] << 16| (act_uint32_t)key[ 7] << 24; c = d = pad; for (i = 8; i < len; i++) { c <<= 8; c |= key[i]; } } else if (len >= 4) { if (len >= 8) *(int *)0 = 0; a = (act_uint32_t)key[ 0] | (act_uint32_t)key[ 1] << 8 | (act_uint32_t)key[ 2] << 16| (act_uint32_t)key[ 3] << 24; b = c = d = pad; for (i = 4; i < len; i++) { b <<= 8; b |= key[i]; } } else { if (len >= 4) *(int *)0 = 0; a = b = c = d = pad; for (i = 0; i < len; i++) { a <<= 8; a |= key[i]; } } TEACORE(TEAFULLROUNDS); return h0^h1; } /* * FNV (Glenn Fowler, Landon Curt Noll, Phong Vo) * * This is the Fowler-Noll-Vo (FNV) hash. The basis of the hash algorithm * was taken from an idea sent by Email to the IEEE Posix P1003.2 * mailing list from Phong Vo and Glenn Fowler * . Landon Curt Noll later * improved on their algorithm. The magic is in the interesting relationship * between the special prime 16777619 (2^24 + 403) and 2^32 and 2^8 (although * the description of the magic I couldn't find any longer). This hash * produces only very few collisions for real world keys and works well on * both numbers and strings. But it's one of the slower hashes. The variant * below uses the recommended FNV-1 initialization. For more details see * http://www.isthe.com/chongo/tech/comp/fnv/. */ intern act_uint32_t act_hash_fct_fnv( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0x811c9dc5L; while (len-- > 0) { hash *= 0x01000193L; hash ^= (act_uint32_t)(*key++); } return hash; } /* * OZSDBM (Ozan 'Oz' Yigit, SDBM) * * This is the hashing function which was originally designed * by Ozan (Oz) Yigit for his popular SDBM library (see hash.c * in http://www.cs.yorku.ca/~oz/sdbm.bun). It works relatively * well in scrambling bits. The actual function is in the form * ``hash(i) = hash(i-1) * 65599 + str[i]''. What is used here is the * faster version as found for instance in GNU awk (see array.c in * ftp://ftp.gnu.org/gnu/gawk/gawk-3.0.4.tar.gz), because 65599 = 2^6 + * 2^16- 1. The magic constant 65599 was picked out of thin air by Oz, * but turns out to be a prime. The number 65587 was claimed to be even * better by him, but was not actually used in SDBM. The optimized * variant below is very ugly to read, but fast. It breaks the key * up into 8 byte units. On the first time through the loop get the * "leftover bytes" (len % 8). On every other iteration, perform 8 * HASHC's so we handle all 8 bytes. Essentially, this saves 7 compare & * branch instructions. */ intern act_uint32_t act_hash_fct_ozsdbm( register act_uint8_t *ptr, register act_size_t len) { register act_uint32_t hash = 0; #ifdef ACT_NON_OPTIMIZE while (len-- > 0) hash = ((hash << 6) + (hash << 16) - hash) + *ptr++; #else if (len > 0) { register int loop = (len + 8 - 1) >> 3; #define HASHC hash = ((hash << 6) + (hash << 16) - hash) + *ptr++; switch (len & (8 - 1)) { case 0: do { HASHC; case 7: HASHC; case 6: HASHC; case 5: HASHC; case 4: HASHC; case 3: HASHC; case 2: HASHC; case 1: HASHC; } while (--loop); } } #endif return hash; } /* * KAZLIB (Kaz Kylheku, Hash Library) * * This is Kaz Kylheku's hash function as used in his kazlib (see * http://users.footprints.net/~kaz/kazlib.html) package. It has a very * good distribution, but unfortunately it is one of the slowest hash * functions. */ intern act_uint32_t act_hash_fct_kazlib( register act_uint8_t *key, register act_size_t len) { static act_uint32_t tab[] = { 0x49848f1bL, 0xe6255dbaL, 0x36da5bdcL, 0x47bf94e9L, 0x8cbcce22L, 0x559fc06aL, 0xd268f536L, 0xe10af79aL, 0xc1af4d69L, 0x1d2917b5L, 0xec4c304dL, 0x9ee5016cL, 0x69232f74L, 0xfead7bb3L, 0xe9089ab6L, 0xf012f6aeL, }; register act_uint32_t hash = 0; register act_uint8_t k; while (len-- > 0) { k = *key++; hash ^= tab[(k + hash) & 0x0f]; hash = (hash << 1) | (hash >> 31); /* hash &= 0xffffffffL; removed, because not necessary in Act */ hash ^= tab[((k >> 4) + hash) & 0x0f]; hash = (hash << 2) | (hash >> 30); /* hash &= 0xffffffffL; removed, because not necessary in Act */ } return hash; } /* * BUZHASH (Robert 'BUZ' Uzgalis, Hash) * * This is Robert 'BUZ' Uzgalis's hash function he published as * `buzhash' (see http://serve.net/buz/) The main difference in our * version is just that we use only 32 bits while the original uses * actually 64 bits (both in the table and the output). For the table * I've just stripped of the upper 32 bits of the values in the original * Java implementation. The table consists of random values, but if you * write down the values one per line, then in each bit column there are both * 128 one and 128 zero bits (which is for a good statistically expected * distribution). */ intern act_uint32_t act_hash_fct_buzhash( register act_uint8_t *key, register act_size_t len) { static act_uint32_t tab[256] = { 0x043a46fL, 0x6e7eac19L, 0xcf055952L, 0xf010101L, 0x128e8a64L, 0xadcfef2L, 0x42e20c6cL, 0xb1095c58L, 0x5361d67L, 0xc7a4b199L, 0x2f24df2L, 0xd0549327L, 0x9a3b180fL, 0xb21f2ebL, 0x3cff1325L, 0x7b575b9L, 0x8a23b7e2L, 0xfbd9091dL, 0x34dbdf9L, 0xb68d6313L, 0x6d06b93L, 0xeba548afL, 0xacc917c9L, 0xdffbcfaL, 0xd301f3b5L, 0x1663592L, 0xf6ce9e4fL, 0x13206f02L, 0x2dc50f7L, 0x3e880a87L, 0xbbf065dL, 0x8fabcb6dL, 0x9116f2d0L, 0xb9af152L, 0xe85aec09L, 0xc4fc987L, 0xa9ce535eL, 0xb849398eL, 0xd2e70d8L, 0xae19b18fL, 0x7d5ebeaL, 0xfdc60511L, 0x3fcc44afL, 0x4a68f17L, 0xa09aafdcL, 0x94a3294L, 0xae1de1b9L, 0xfd1c1dd0L, 0x8b98ee6L, 0xd357dabcL, 0xe8826aaL, 0xec4055f1L, 0x4c34f8a9L, 0x170e402L, 0x55eca72eL, 0x1bde03fL, 0x25e368ffL, 0x0b120f4aL, 0x028f728L, 0x14df0433L, 0xdd3601eL, 0xaa052772L, 0xe427f736L, 0x3e35041L, 0x69b76914L, 0x3b3c01cL, 0x307d6fafL, 0xc221deccL, 0x4281a5dL, 0xa2fcaba7L, 0x66d4a9fL, 0x02c4be93L, 0x332ecb2fL, 0x6f74ab0L, 0x2f1dfe8fL, 0x152a6f9L, 0xc2ea9be7L, 0x86c1899eL, 0x3bdefd7L, 0x7512901bL, 0x94a1fbdL, 0x3d47ff0dL, 0xc6f78e66L, 0xe2d25d2L, 0x0134d573L, 0x1023afaL, 0xc8c66c0aL, 0xd54c12edL, 0xf6689f0L, 0x67f7677aL, 0x67b9867L, 0xcd5b2341L, 0x1733f9bcL, 0xbc867bfL, 0xd9418811L, 0x7499083L, 0xdf9b12e8L, 0xec3e0928L, 0x6d08914L, 0x758e524aL, 0x000f455L, 0x1a786c79L, 0x8e012db1L, 0xd7b42faL, 0x25cda5f0L, 0xfba9220L, 0x605a11e1L, 0x6cb23e6cL, 0xb483b87L, 0xb997ee22L, 0x77f7362L, 0x2c1768d4L, 0x1673f9adL, 0x11fe93dL, 0x04e1cde4L, 0x0747250L, 0x005b5db6L, 0xbbaf4817L, 0x379e196L, 0xaca98701L, 0x24bde84L, 0x9fabbcb6L, 0x4a97882bL, 0x59a1fd8L, 0x7ec7ce10L, 0x780f244L, 0x2f61b3ffL, 0xa1c71c95L, 0xb2d765cL, 0xf988514dL, 0xa98e840L, 0x1411bc42L, 0xaa4482c2L, 0xd9d47daL, 0xf128a622L, 0x5ba5647L, 0x18962dbdL, 0x70f6d242L, 0x7635d81L, 0x43753680L, 0xaeaab4cL, 0x810f2220L, 0x65d9c0b1L, 0x8356c94L, 0x30f27e2fL, 0xd16b440L, 0x35771070L, 0xe9bc2336L, 0x935d2fdL, 0xf4720cffL, 0x975173cL, 0x520e2405L, 0xa9e73ce2L, 0x62623a7L, 0x18e26104L, 0x0e4f061L, 0x464cfee9L, 0xccdc534aL, 0xf192a14L, 0x94b71649L, 0xaeb0675L, 0x4647e040L, 0x397f1004L, 0x4ec8dfcL, 0xbfd0006bL, 0x5b4ed0fL, 0xba6bccbeL, 0x2e03fe1bL, 0x6f0b363L, 0xc3392942L, 0x2d6cf9cL, 0xce55fec5L, 0x09b40463L, 0x14a310bL, 0x7bbcf76bL, 0xc249602L, 0xc4e99555L, 0xde625355L, 0xc1aa55dL, 0x3eaced91L, 0x3b9ff1eL, 0x8d381f2dL, 0xbcdb5ba8L, 0xf7792bcL, 0xf05a19a0L, 0x60ffb0cL, 0x1a68fa68L, 0x02b1ce1aL, 0xa610474L, 0xd1a0fecbL, 0x90e8533L, 0x23f84d95L, 0x83c110c4L, 0xf90588dL, 0x9ee04455L, 0x40504baL, 0xfee93369L, 0x85804099L, 0xbe5d01bL, 0x4b3865d6L, 0xa5c108fL, 0x9654f2dcL, 0xb0d19772L, 0x406152bL, 0x7be2b8a5L, 0x92967adL, 0x6308e597L, 0xe874e16aL, 0xd2e274fL, 0x6007fc05L, 0x230fc39L, 0x99144de1L, 0x8dcc89b3L, 0x4161bfdL, 0x498cd270L, 0xdbbd9f8L, 0x5628d7d0L, 0x205d9ea4L, 0x214ebfaL, 0xd1ebedafL, 0x237002fL, 0x147e6e5eL, 0x4483ebd3L, 0x9b05aa6L, 0x3517c363L, 0x8e9e8a2L, 0x19d89df6L, 0x62defab3L, 0x4f4e201L, 0x57c48f3fL, 0x8e6e5dcL, 0x5fa6d27aL, 0x1dc3078eL, 0xca367f9L, 0xfdcbb7ccL, 0xf36414bL, 0x1d3a034fL, 0x122d654fL, 0xb336078L, 0x3a8b9600L, 0xb5f1484L, 0x3ccfb7c6L, 0x2ff89cf1L, 0x09919a6L, 0xfa83287eL, 0x694b7cdL, 0x77df5aeaL, 0x944508ccL, 0x581fbb8L, 0x728a05cbL, 0x4a31712L, 0xc2f6acfaL, 0x6e560b10L, 0xd8d7ce1L, 0x0d2b2adeL, 0x0bbaa936L }; register act_uint32_t hash = 0xe9ae3b8aL /* random init */; while (len-- > 0) hash = ((hash<<1)^((hash>>31)&1)) ^ tab[*key++]; return hash; } /* * PEARSON (Peter K. Pearson) * * This historical hash function was published by Peter K. Pearson. * He claimed this algorithm worked well for text strings (with 8-bit * bytes). The used table is an arbitrary permutation of the values * 0x00..0xff I've calculated with a small Perl script for ACT. * Additionally the version below contains actually four Pearson hash * functions combined, one for each byte of the 32 bit hash in order * to really generate a 32 bit hash (and not just an 8 bit hash as the * original). As a result its now unfortunately a slower hash (because * of the four byte output) but distributes real world keys very well. * OTOH progressing keys consisting of the same bytes it dislikes very * much and then produces lots of collisions (but that doesn't matter * usually). */ intern act_uint32_t act_hash_fct_pearson( register act_uint8_t *key, register act_size_t len) { static unsigned char ptab[256] = { 0xd0, 0x24, 0x61, 0x1f, 0x65, 0xfb, 0xe1, 0x12, 0x64, 0xa7, 0xd9, 0x7f, 0x49, 0xf1, 0xfc, 0x89, 0xd8, 0x57, 0x03, 0xda, 0x4a, 0x4e, 0xc8, 0xb9, 0x42, 0x7b, 0x44, 0x88, 0x3e, 0x6e, 0x1d, 0xc2, 0x96, 0x5d, 0x10, 0x67, 0x2b, 0x31, 0x5f, 0x2c, 0xfe, 0x4f, 0x01, 0x7d, 0xf6, 0xe7, 0x15, 0x54, 0xaa, 0x29, 0x81, 0x0b, 0xde, 0xc1, 0xc0, 0x16, 0x35, 0xf2, 0xc5, 0x43, 0x22, 0x41, 0xc9, 0x5a, 0xc6, 0x6a, 0x04, 0xb8, 0x94, 0xac, 0xc4, 0x1c, 0x36, 0x71, 0xaf, 0x17, 0xfd, 0xe6, 0x20, 0x56, 0x38, 0xbf, 0x55, 0xdf, 0x3d, 0x98, 0x40, 0x09, 0x0d, 0x33, 0xb7, 0x90, 0x76, 0xca, 0xff, 0x9c, 0x73, 0x7e, 0xa6, 0x6d, 0xcb, 0x39, 0xc3, 0xd5, 0xce, 0xa4, 0xc7, 0x27, 0xcf, 0x58, 0x1b, 0xb2, 0x8d, 0x11, 0x0c, 0x0f, 0x34, 0xb4, 0x69, 0xd6, 0x2f, 0xa5, 0x51, 0x32, 0x37, 0x6f, 0x8c, 0xcd, 0xba, 0x5e, 0x82, 0x1a, 0xa9, 0x46, 0x91, 0x93, 0xbc, 0xbe, 0xe2, 0x4b, 0x18, 0xdc, 0xeb, 0x3c, 0x21, 0x47, 0x70, 0x4d, 0xae, 0xf9, 0xee, 0xa3, 0xec, 0x97, 0x08, 0xab, 0xad, 0xbd, 0x48, 0xb0, 0xa0, 0xb3, 0x68, 0xd7, 0xe4, 0xe3, 0x79, 0x4c, 0x95, 0x8b, 0xb1, 0xf8, 0x2a, 0xa8, 0x9a, 0x30, 0xf3, 0xf5, 0xd3, 0x50, 0xf0, 0x9e, 0x63, 0x9d, 0x72, 0x3f, 0xd2, 0x85, 0x60, 0x3b, 0x0e, 0x6b, 0x19, 0x52, 0xe0, 0xef, 0x13, 0x6c, 0xb5, 0x8e, 0x00, 0x14, 0x8a, 0x1e, 0x06, 0xa2, 0xfa, 0x0a, 0x8f, 0x80, 0x86, 0x07, 0xed, 0x84, 0x92, 0x45, 0x26, 0xf7, 0x75, 0xd4, 0x83, 0x7a, 0xdd, 0x62, 0x7c, 0x9b, 0xe5, 0xa1, 0x2e, 0xdb, 0xea, 0x25, 0x5c, 0x87, 0x74, 0x5b, 0x99, 0x9f, 0xe8, 0x3a, 0x66, 0x02, 0x59, 0x28, 0xb6, 0xcc, 0x53, 0xf4, 0xe9, 0x05, 0xd1, 0x78, 0xbb, 0x77, 0x2d, 0x23 }; register unsigned char h1,h2,h3,h4; register unsigned char c; act_uint32_t hash; h1 = 0x00; h2 = 0x33; h3 = 0x99; h4 = 0xaa; /* arbitrary random init */ while (len-- > 0) { c = *key++; h1 = ptab[h1 ^ c]; h2 = ptab[h2 ^ c]; h3 = ptab[h3 ^ c]; h4 = ptab[h4 ^ c]; } hash = (h4 << 24) ^ (h3 << 16) ^ (h2 << 8) ^ h1; return hash; } /* * RIFKIN (Jon Rifkin Hash) * * This is the hash function from Jon Rifkin * as found in a similar form in hash.c inside his ipaudit package. * It's an average hash function. Neither very good nor very bad. */ intern act_uint32_t act_hash_fct_rifkin( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0; register int ishift = 0; while (len-- > 0) { hash ^= (((act_uint32_t)(*key++)) << ishift); ishift += 8; if (ishift >= 32) ishift = 0; } hash = hash + (hash << 16) - (hash >> 16) - 1; return hash; } /* * ASU (Aho, Sethi, Ullman) * * This is the hashing algorithm as proposed by Aho, Seti and Ullmann in * their algorithm books. It is not very fast, but distributes well. */ intern act_uint32_t act_hash_fct_asu( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash; register act_uint32_t g; hash = (act_uint32_t)len; while (len-- > 0) { hash = (hash << 4) + *key++; g = hash & 0xf0000000; if (g != 0) hash = (hash ^ (g >> 24)) ^ g; } return hash; } /* * HOLUB (Holub Generic Hash) * * This is Weinberger's generic hash algorithm, as adapted and published * by Holub. It was extracted from PHP4's mod_session. It is actually a * not one of the best hash functions in the set, but might have some * particular uses. */ intern act_uint32_t act_hash_fct_holub( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash; register act_uint32_t i; hash = 0; while (len-- > 0) { hash = (hash << 4) + *key++; if ((i = hash & 0xf0000000) != 0) hash = (hash ^ (i >> 24)) & 0x0fffffff; } return hash; } /* * CBU (CanterBury University) * * McKenzie et all concluded in a paper (B J McKenzie, R Harries & T * Bell, Selecting a hashing algorithm, Software practice & experience * 20, 2 (Feb 1990), 209-224.) that for hashing program identifiers, the * following linear hash function is a good one. It was developed at * the CanterBury University, in Christchurch, New Zealand. It is also * used in the GNU RCS package (see rcs-5.7.tar.gz and there rcslex.c). * It is very fast, but horribly dislikes progressing keys and also * showed a bad distribution for real world keys. So take this hash very * carefully and test whether it works for your data. */ intern act_uint32_t act_hash_fct_cbu( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0; while (len-- > 0) hash = (hash << 2) + *key++; return hash; } /* * CVS (Concurrent Version System) * * This is the hash function found in CVS 1.10.x (see src/hash.c). It * is the same as the published elf_hash(3) function for use in the * UNIX ELF format for object files. It works fine for binary keys but * very bad for strings where it distributes horribly and has a bad * Chi^2 value. The reason might be that our tests use larger (non-prime * sized) hash tables while CVS actually uses this function with a 151 * byte long hash table only. So for special situations this hash might * be reasonable. But as a general purpose hash function for arbitrary * hash table lookups it is bad. Additionally it hates progressing keys. * So this hash is only useful if one really knows the keys one has to * hash and also tests whether this hash works for them. */ intern act_uint32_t act_hash_fct_cvs( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0; register act_uint32_t g; while (len-- > 0) { hash = (hash << 4) + *key++; /* rotation */ if ((g = (hash & 0xf0000000)) != 0) hash = (hash ^ (g >> 24)) ^ g; } return hash; } /* ** ====================================================================== ** Hash Function Test and Comparison Suite ** ====================================================================== */ #ifdef ACT_TEST #include #include #include #include typedef act_uint32_t ub4; #define hashsize(n) ((ub4)1<<(n)) #define hashmask(n) (hashsize(n)-1) /* table of hash functions */ typedef struct { char *name; act_hash_fct_t hash; struct { double t; long coll00; long coll55; long collNN; double used; long min; long max; long delta; double s_chi2; double b_chi2; } stat; } table_entry; #define EMPTY_STAT { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } table_entry table[] = { { "DJBX33A", act_hash_fct_djbx33a, EMPTY_STAT }, { "DJBX33X", act_hash_fct_djbx33x, EMPTY_STAT }, { "JEDI", act_hash_fct_jedi, EMPTY_STAT }, { "VOCONG", act_hash_fct_vocong, EMPTY_STAT }, { "CDT", act_hash_fct_cdt, EMPTY_STAT }, { "JOTCL", act_hash_fct_jotcl, EMPTY_STAT }, { "BJDDJ", act_hash_fct_bjddj, EMPTY_STAT }, { "CRC32", act_hash_fct_crc32, EMPTY_STAT }, { "TEADM", act_hash_fct_teadm, EMPTY_STAT }, { "CPOAAT", act_hash_fct_cpoaat, EMPTY_STAT }, { "FNV", act_hash_fct_fnv, EMPTY_STAT }, { "OZSDBM", act_hash_fct_ozsdbm, EMPTY_STAT }, { "KAZLIB", act_hash_fct_kazlib, EMPTY_STAT }, { "BUZHASH", act_hash_fct_buzhash, EMPTY_STAT }, { "PEARSON", act_hash_fct_pearson, EMPTY_STAT }, { "RIFKIN", act_hash_fct_rifkin, EMPTY_STAT }, { "ASU", act_hash_fct_asu, EMPTY_STAT }, { "HOLUB", act_hash_fct_holub, EMPTY_STAT }, { "CBU", act_hash_fct_cbu, EMPTY_STAT }, { "CVS", act_hash_fct_cvs, EMPTY_STAT }, { NULL, NULL } }; /* used for timings */ void driver1(table_entry *te) { char buf[15000]; act_uint32_t h; struct timeval tv1, tv2; double td; int i; for (i = 0; i < 15000; i++) buf[i] = i; gettimeofday(&tv1, NULL); for (i = 0; i < 15000; i++) h = te->hash(buf,i); gettimeofday(&tv2, NULL); td = ((double)(tv2.tv_sec*1000000 + tv2.tv_usec) / 1000000) - ((double)(tv1.tv_sec*1000000 + tv1.tv_usec) / 1000000); te->stat.t = td; return; } /* check for problems with nulls */ int driver2b(table_entry *te, act_uint8_t *buf) { act_uint32_t brain[1000]; act_uint32_t h; int eq; int i, j; for (i = 0; i < 1000; i++) brain[i] = te->hash(buf, i); eq = 0; for (i = 0; i < 1000; i++) { for (j = 0; j < 1000; j++) { if (i == j) continue; if (brain[i] == brain[j]) eq++; } } return eq; } void driver2(table_entry *te) { unsigned char buf[1000]; int i; int eq; for (i = 0; i < 1000; i++) buf[i] = 0; eq = driver2b(te, buf); te->stat.coll00 = eq; for (i = 0; i < 1000; i++) buf[i] = 0x55; eq = driver2b(te, buf); te->stat.coll55 = eq; for (i = 0; i < 1000; i++) buf[i] = i; eq = driver2b(te, buf); te->stat.collNN = eq; return; } /* check for distribution */ void driver3(table_entry *te, char *file, int linewise) { #define TABLESIZE 1000 unsigned char buf[1024]; act_uint32_t htab[TABLESIZE]; act_uint32_t h; act_uint32_t min, max, exp; FILE *fp; int k; int i, j; int b; int nr; double p; double chi2; int bi; for (i = 0; i < TABLESIZE; i++) htab[i] = 0; if ((fp = fopen(file, "r")) == NULL) { perror("fopen"); return; } min = 0; max = 0; k = 0; if (linewise) { while (fgets(buf, sizeof(buf), fp) != NULL) { h = te->hash(buf, strlen(buf)) % TABLESIZE; htab[h]++; min++; k++; if (k > TABLESIZE*2) break; } } else { while ((b = fread(buf, 1, 10, fp)) > 0 && !feof(fp)) { h = te->hash(buf, b) % TABLESIZE; htab[h]++; min++; k++; if (k > TABLESIZE*2) break; } } fclose(fp); nr = 0; for (i = 0; i < TABLESIZE; i++) { min = _M_MIN(min, htab[i]); max = _M_MAX(max, htab[i]); if (htab[i] == 0) nr++; } /* Calculate Chi^2 value */ p = ((double)k)/TABLESIZE; chi2 = 0; for (i = 0; i < k; i++) { bi = 0; for (j = 0; j < TABLESIZE; j++) if (htab[j] == i) bi++; chi2 += (bi * ((double)((i-p)*(i-p))) / p); } chi2 -= TABLESIZE; chi2 /= sqrt((double)TABLESIZE); if (linewise) te->stat.s_chi2 = chi2; else te->stat.b_chi2 = chi2; te->stat.used = (nr == 0 ? 100 : 100-(((double)nr)/TABLESIZE)*100); te->stat.min = min; te->stat.max = max; te->stat.delta = max-min; return; } /* the driver program */ int main(int argc, char *argv[]) { int i; system("gzip -1 /tmp/x"); printf("Testing:"); for (i = 0; table[i].name != NULL; i++) { printf(" %s", table[i].name); fflush(stdout); driver1(&table[i]); driver2(&table[i]); driver3(&table[i], "/usr/share/dict/words", 1); driver3(&table[i], "/tmp/x", 0); } printf("\n"); fflush(stdout); printf("+-----------------------------------------------------------------------------+\n"); printf("| Hash Func Time Coll00 Coll55 CollNN Used Min Max Diff Chi2/S Chi2/B |\n"); printf("+ ---------- ------ ------ ------ ------ ----- ---- ---- ---- ------- ------- +\n"); for (i = 0; table[i].name != NULL; i++) { printf("| %-10s %6.2f %6d %6d %6d %5.2f %4d %4d %4d %7.2f%c%7.2f%c|\n", table[i].name, table[i].stat.t, table[i].stat.coll00, table[i].stat.coll55, table[i].stat.collNN, table[i].stat.used, table[i].stat.min, table[i].stat.max, table[i].stat.delta, table[i].stat.s_chi2, (table[i].stat.s_chi2 > 3 || table[i].stat.s_chi2 < -3) ? '!' : ' ', table[i].stat.b_chi2, (table[i].stat.b_chi2 > 3 || table[i].stat.b_chi2 < -3) ? '!' : ' '); } printf("+-----------------------------------------------------------------------------+\n"); return 0; } #endif /* ACT_TEST */ @ 1.28 log @typo @ text @d1 4 a4 3 /* ** Act - Abstract Container Type Library ** Copyright (c) 1999-2002 Ralf S. Engelschall d6 2 a7 2 ** This file is part of Act, a library for dealing with Abstract ** Container Types which can be found at http://www.ossp.org/pkg/act/. d37 1 a37 1 ** 2. the function must be very fast to compute d61 4 a64 4 ** Here's how Chi^2 is computed: ** 1. Lookup: b = total number of buckets ** 2. Lookup: k = total number of keys ** 3. Lookup: b_i = number of buckets which have i keys d66 1 a66 1 ** 5. Compute: Chi^2 = sum (over all i) (b_i*((i-p)^2)/p) d130 1 a130 1 * with an average percent of approx. 86%. d148 1 a148 1 intern act_uint32_t d150 1 a150 1 register act_uint8_t *key, d194 1 a194 1 intern act_uint32_t d196 1 a196 1 register act_uint8_t *key, d241 1 a241 1 * which has the hash unrolled eight times for speed. d243 1 a243 1 intern act_uint32_t d245 1 a245 1 register act_uint8_t *key, d287 1 a287 1 intern act_uint32_t d289 1 a289 1 register act_uint8_t *key, d306 1 a306 1 intern act_uint32_t d308 1 a308 1 register act_uint8_t *key, d336 1 a336 1 intern act_uint32_t d338 1 a338 1 register act_uint8_t *key, d362 1 a362 1 intern act_uint32_t d399 1 a399 1 switch(len) { d432 1 a432 1 intern act_uint32_t d492 1 a492 1 d512 1 a512 1 intern act_uint32_t d514 1 a514 1 register act_uint8_t *ptr, d542 1 a542 1 intern act_uint32_t d544 1 a544 1 register act_uint8_t *key, d547 1 a547 1 act_uint32_t k[] = { 0x9464a485, 0x542e1a94, 0x3e846bff, 0xb75bcfc3 }; d675 1 a675 1 intern act_uint32_t d677 1 a677 1 register act_uint8_t *key, d708 1 a708 1 intern act_uint32_t d710 1 a710 1 register act_uint8_t *ptr, d745 1 a745 1 intern act_uint32_t d747 1 a747 1 register act_uint8_t *key, d784 1 a784 1 intern act_uint32_t d786 1 a786 1 register act_uint8_t *key, d840 1 a840 1 0x4a31712L, 0xc2f6acfaL, 0x6e560b10L, 0xd8d7ce1L, 0x0d2b2adeL, d866 1 a866 1 intern act_uint32_t d868 1 a868 1 register act_uint8_t *key, d872 25 a896 25 0xd0, 0x24, 0x61, 0x1f, 0x65, 0xfb, 0xe1, 0x12, 0x64, 0xa7, 0xd9, 0x7f, 0x49, 0xf1, 0xfc, 0x89, 0xd8, 0x57, 0x03, 0xda, 0x4a, 0x4e, 0xc8, 0xb9, 0x42, 0x7b, 0x44, 0x88, 0x3e, 0x6e, 0x1d, 0xc2, 0x96, 0x5d, 0x10, 0x67, 0x2b, 0x31, 0x5f, 0x2c, 0xfe, 0x4f, 0x01, 0x7d, 0xf6, 0xe7, 0x15, 0x54, 0xaa, 0x29, 0x81, 0x0b, 0xde, 0xc1, 0xc0, 0x16, 0x35, 0xf2, 0xc5, 0x43, 0x22, 0x41, 0xc9, 0x5a, 0xc6, 0x6a, 0x04, 0xb8, 0x94, 0xac, 0xc4, 0x1c, 0x36, 0x71, 0xaf, 0x17, 0xfd, 0xe6, 0x20, 0x56, 0x38, 0xbf, 0x55, 0xdf, 0x3d, 0x98, 0x40, 0x09, 0x0d, 0x33, 0xb7, 0x90, 0x76, 0xca, 0xff, 0x9c, 0x73, 0x7e, 0xa6, 0x6d, 0xcb, 0x39, 0xc3, 0xd5, 0xce, 0xa4, 0xc7, 0x27, 0xcf, 0x58, 0x1b, 0xb2, 0x8d, 0x11, 0x0c, 0x0f, 0x34, 0xb4, 0x69, 0xd6, 0x2f, 0xa5, 0x51, 0x32, 0x37, 0x6f, 0x8c, 0xcd, 0xba, 0x5e, 0x82, 0x1a, 0xa9, 0x46, 0x91, 0x93, 0xbc, 0xbe, 0xe2, 0x4b, 0x18, 0xdc, 0xeb, 0x3c, 0x21, 0x47, 0x70, 0x4d, 0xae, 0xf9, 0xee, 0xa3, 0xec, 0x97, 0x08, 0xab, 0xad, 0xbd, 0x48, 0xb0, 0xa0, 0xb3, 0x68, 0xd7, 0xe4, 0xe3, 0x79, 0x4c, 0x95, 0x8b, 0xb1, 0xf8, 0x2a, 0xa8, 0x9a, 0x30, 0xf3, 0xf5, 0xd3, 0x50, 0xf0, 0x9e, 0x63, 0x9d, 0x72, 0x3f, 0xd2, 0x85, 0x60, 0x3b, 0x0e, 0x6b, 0x19, 0x52, 0xe0, 0xef, 0x13, 0x6c, 0xb5, 0x8e, 0x00, 0x14, 0x8a, 0x1e, 0x06, 0xa2, 0xfa, 0x0a, 0x8f, 0x80, 0x86, 0x07, 0xed, 0x84, 0x92, 0x45, 0x26, 0xf7, 0x75, 0xd4, 0x83, 0x7a, 0xdd, 0x62, 0x7c, 0x9b, 0xe5, 0xa1, 0x2e, 0xdb, 0xea, 0x25, 0x5c, 0x87, 0x74, 0x5b, 0x99, 0x9f, 0xe8, 0x3a, 0x66, 0x02, 0x59, 0x28, 0xb6, 0xcc, 0x53, 0xf4, 0xe9, 0x05, d922 1 a922 1 intern act_uint32_t d924 1 a924 1 register act_uint8_t *key, d940 1 a940 1 /* d946 1 a946 1 intern act_uint32_t d948 1 a948 1 register act_uint8_t *key, d972 1 a972 1 intern act_uint32_t d974 1 a974 1 register act_uint8_t *key, d1002 1 a1002 1 intern act_uint32_t d1004 1 a1004 1 register act_uint8_t *key, d1029 1 a1029 1 intern act_uint32_t d1031 1 a1031 1 register act_uint8_t *key, d1046 1 a1046 1 /* d1049 1 a1049 1 ** ====================================================================== d1249 1 a1249 1 int main(int argc, char *argv[]) d1269 10 a1278 10 printf("| %-10s %6.2f %6d %6d %6d %5.2f %4d %4d %4d %7.2f%c%7.2f%c|\n", table[i].name, table[i].stat.t, table[i].stat.coll00, table[i].stat.coll55, table[i].stat.collNN, table[i].stat.used, table[i].stat.min, table[i].stat.max, table[i].stat.delta, @ 1.27 log @bump copyright year @ text @d359 1 a359 1 * purpuse hash function. @ 1.26 log @*** empty log message *** @ text @d3 1 a3 1 ** Copyright (c) 1999-2001 Ralf S. Engelschall @ 1.25 log @*** empty log message *** @ text @d56 1 a56 1 ** the best any hash function can do is map an equal number of those d68 1 a68 1 ** within 3sqrt(b) of b. Chi^2 measures are usually reported in units of @ 1.24 log @*** empty log message *** @ text @d3 1 a3 1 ** Copyright (c) 1999-2000 Ralf S. Engelschall @ 1.23 log @*** empty log message *** @ text @d679 1 a679 1 register act_uint32_t hash = 0x811c9dc5; d682 1 a682 1 hash *= 0x01000193; @ 1.22 log @*** empty log message *** @ text @d76 2 a77 2 ** functions are ordered strongest to weakest. On a Pentium-II/400 ** under FreeBSD 4.1 the table looks approximately as following: d82 20 a101 20 ** | DJBX33A 1.23 0 0 0 86.80 0 8 8 0.19 0.41 | ** | DJBX33X 1.23 0 0 0 86.80 0 8 8 -2.24 0.38 | ** | JEDI 1.30 0 0 0 86.80 0 8 8 0.13 -0.70 | ** | VOCONG 1.41 0 0 0 85.90 0 9 9 -0.92 0.44 | ** | CDT 1.93 0 0 0 88.50 0 7 7 0.35 -2.18 | ** | JOTCL 1.69 0 0 0 85.80 0 7 7 -2.12 0.16 | ** | BJDDJ 2.34 0 0 0 85.30 0 8 8 1.90 1.23 | ** | CRC32 2.95 0 0 0 86.80 0 8 8 0.85 -0.89 | ** | TEADM 3.92 0 0 32 86.20 0 7 7 -0.70 -0.16 | ** | CPOAAT 2.36 0 0 0 87.20 0 8 8 0.70 -2.09 | ** | OZSDBM 2.36 999000 0 2 87.10 0 10 10 0.47 -1.55 | ** | FONOVO 4.10 999000 0 2 86.00 0 8 8 -0.60 -0.44 | ** | KAZLIB 5.43 0 0 0 87.40 0 8 8 7.74! -0.00 | ** | BUZHASH 2.74 30256 30256 976 86.20 0 9 9 -1.80 -2.28 | ** | PEARSON 5.10 3160 5238 0 88.10 0 8 8 -1.64 -0.60 | ** | RIFKIN 2.40 999000 124000 10754 86.90 0 8 8 0.85 2.31 | ** | ASU 3.12 0 0 0 86.40 0 8 8 441.58! 0.38 | ** | HOLUB 3.26 999000 82170 2 86.80 0 9 9 441.17! 0.85 | ** | CBU 1.13 999000 967272 2834 86.50 0 8 8 216.29! -0.51 | ** | CVS 3.28 999000 82170 2 86.80 0 9 9 441.17! 0.85 | d660 29 a736 29 * FONOVO (Glenn Fowler, Landon Curt Noll, Phong Vo) * * This is the Fowler-Noll-Vo hash. The basis of the hash algorithm * was taken from an idea sent by Email to the IEEE Posix P1003.2 * mailing list from Phong Vo and Glenn Fowler * . Landon Curt Noll later * improved on their algorithm. The magic is in the interesting * relationship between the special prime 16777619 (2^24 + 403) and 2^32 * and 2^8 (although the description of the magic I couldn't find any * longer). This hash produces only very few collisions for real world * keys and works well on both numbers and strings. But it's one of the * slower hashes and produces collisions for the null progressing test * (which usually doesn't hurt, of course). */ intern act_uint32_t act_hash_fct_fonovo( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0; while (len-- > 0) { hash *= 16777619; hash ^= *key++; } return hash; } /* d1094 1 a1095 1 { "FONOVO", act_hash_fct_fonovo, EMPTY_STAT }, @ 1.21 log @*** empty log message *** @ text @d77 1 a77 1 ** under FreeBSD 3.4 the table looks approximately as following: @ 1.20 log @*** empty log message *** @ text @d86 1 d299 19 d1088 1 @ 1.19 log @*** empty log message *** @ text @d1 27 a27 29 /* ==================================================================== * Copyright (c) 1999-2000 Ralf S. Engelschall. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * THIS SOFTWARE IS PROVIDED BY RALF S. ENGELSCHALL ``AS IS'' AND ANY * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RALF S. ENGELSCHALL OR * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED * OF THE POSSIBILITY OF SUCH DAMAGE. * ==================================================================== */ a29 2 ** act_hash_fct.c -- Hash Functions ** @ 1.18 log @*** empty log message *** @ text @d2 1 a2 1 * Copyright (c) 1999 Ralf S. Engelschall. All rights reserved. @ 1.17 log @*** empty log message *** @ text @d134 1 a134 1 * If one compares the Chi/2 values of the variants, the number 33 not @ 1.16 log @*** empty log message *** @ text @d81 1 a81 1 ** under FreeBSD 3.1 the table looks approximately as following: @ 1.15 log @*** empty log message *** @ text @d286 1 a286 1 * surprisingly well. It can be found for instance in the Berkely-DB 3.x @ 1.14 log @*** empty log message *** @ text @d286 2 a287 2 * surprisingly well. It can be found for instance in the Berkely-DB/2.x * package (hash/hash.c). @ 1.13 log @*** empty log message *** @ text @d34 3 a36 3 ** This is a collection of reasonable hash functions for use in ** conjunction with hash table lookups. One should not use these ** functions for cryptography, of course. They do only fulfill the d88 1 d93 1 d128 16 a143 14 * multipliers between 1 and 256 (as RSE did it now) one detects * that even numbers are not useable. The remaining 128 odd numbers * (except for the 1) work more or less all equally well. They all * distribute in an acceptable way and this way fill a hash table with * an average percent of approx. 86%. If one compares the Chi/2 values * of the variants the 33 not even has the best value. But the 33 and * a few other equally good values like 17, 31, 63, 127 and 129 have * nevertheless a great advantage to the remaining values in the large * set of possible multipliers: their multiply operation can be replaced * by a faster operation based on just one shift plus either a single * addition or subtraction operation. And because a hash function has * to both distribute good and has to be very fast to compute, those * few values should be preferred and seems to be the reason why Daniel * J. Bernstein also preferred it. d147 1 a147 1 * which has the hash unrolled four times for speed. Both additionally d233 49 d514 130 d1070 1 d1075 1 @ 1.12 log @*** empty log message *** @ text @d185 6 a190 6 * This is Daniel J. Bernstein's revised `times 33' hash function which * is currently favored by him: it uses exclusive-or instead of addition * to merge in the key information. It behaves mostly equal to the * DJBX33A hash, i.e. it is also a very good hash (both fast and with * good distribution). It can be found for instance in his CDB package * (see cdb_hash.c). d817 11 a827 10 * This is the hash function found in CVS 1.10.x (see src/hash.c). * It works fine for binary keys but very bad for strings where it * distributes horribly and has a bad Chi^2 value. The reason might be * that our tests use larger (non-prime sized) hash tables while CVS * actually uses this function with a 151 byte long hash table only. * So for special situations this hash might be reasonable. But as a * general purpose hash function for arbitrary hash table lookups it is * bad. Additionally it hates progressing keys. So this hash is only * useful if one really knows the keys one has to hash and also tests * whether this hash works for them. a844 1 @ 1.11 log @*** empty log message *** @ text @d86 17 a102 16 ** | DJBX33A 1.24 0 0 0 86.90 0 8 8 0.19 0.38 | ** | DJBX33X 1.24 0 0 0 86.80 0 8 8 -2.24 0.41 | ** | VOCONG 1.48 0 0 0 85.90 0 9 9 -0.92 0.51 | ** | JOTCL 1.70 0 0 0 85.80 0 7 7 -2.12 0.16 | ** | BJDDJ 2.34 0 0 0 85.30 0 8 8 1.90 1.26 | ** | CRC32 2.95 0 0 0 86.80 0 8 8 0.85 -0.95 | ** | CPOAAT 2.37 0 0 0 87.20 0 8 8 0.70 -2.18 | ** | OZSDBM 2.36 999000 0 2 87.10 0 10 10 0.47 -1.52 | ** | FONOVO 4.11 999000 0 2 86.00 0 8 8 -0.60 -0.44 | ** | KAZLIB 5.44 0 0 0 87.30 0 8 8 7.74! 0.03 | ** | BUZHASH 2.76 30256 30256 976 86.20 0 9 9 -1.80 -2.21 | ** | PEARSON 5.11 3160 5238 0 88.10 0 8 8 -1.64 -0.66 | ** | RIFKIN 2.41 999000 124000 10754 86.90 0 8 8 0.85 2.34 | ** | HOLUB 3.27 999000 82170 2 86.80 0 9 9 441.17! 1.01 | ** | CBU 1.14 999000 967272 2834 86.50 0 8 8 216.29! -0.44 | ** | CVS 3.69 999000 82170 2 86.80 0 9 9 441.17! 1.01 | d740 24 d898 1 @ 1.10 log @*** empty log message *** @ text @d137 2 a138 2 * few values should be preferred and seems to be the reason why Daniel. * J Bernstein also preferred it . @ 1.9 log @*** empty log message *** @ text @d184 6 a189 5 * This is Daniel J. Bernstein's revised `times 33' hash function which is * currently favored by him: it uses exclusive-or instead of addition to * merge in the key information. It behaves mostly equal to the DJBX33A * hash, i.e. it is also a very good hash (both fast and with good * distribution). @ 1.8 log @*** empty log message *** @ text @d114 1 a114 1 * DJBX33A (Dan J. Bernstein, Times 33 with Addition) d116 5 a120 5 * This is Dan J. Bernstein's popular `times 33' hash function as posted * by him years ago on comp.lang.c. It basically uses a function like * ``hash(i) = hash(i-1) * 33 + str[i]''. This is one of the best known * hash functions for strings. Because it is both computed very fast * and distributes very well. d137 2 a138 2 * few values should be preferred and seems to be the reason why Dan J. * Bernstein also preferred it. d182 1 a182 1 * DJBX33X (Dan J. Bernstein, Times 33 with Exclusive-Or) d184 1 a184 1 * This is Dan J. Bernstein's revised `times 33' hash function which is @ 1.7 log @*** empty log message *** @ text @d119 2 a120 1 * hash functions for strings. d140 4 a143 5 * So the following function is both a very fast hash and distributes * very well. Below there are two variants: the original variant with * only the multiplication optimized via bit shifts and additionally * a variant which has the hash unrolled four times for speed. Both * additionally are optimized for speed even more by unrolling the loop. @ 1.6 log @*** empty log message *** @ text @d98 1 d714 25 d871 1 @ 1.5 log @*** empty log message *** @ text @d98 1 d713 25 d845 1 @ 1.4 log @*** empty log message *** @ text @d88 2 a89 1 ** | VOCONG 1.42 0 0 0 85.90 0 9 9 -0.92 0.51 | d94 1 a94 1 ** | FONOVO 4.12 999000 0 2 86.00 0 8 8 -0.60 -0.44 | d96 2 a97 3 ** | BUZHASH 2.75 30256 30256 976 86.20 0 9 9 -1.80 -2.21 | ** | PEARSON 5.12 3160 5238 0 88.10 0 8 8 -1.64 -0.66 | ** | JOTCL 1.41 999000 0 2 86.30 0 8 8 -0.19 -0.41 | d99 1 a99 1 ** | CVS 3.29 999000 82170 2 86.80 0 9 9 441.17! 1.01 | d245 30 a711 28 * JOTCL (John Ousterhout, Tcl) * * This is John Ousterhout's hash from his Tcl 8.2's tclHash.c. He * said, he has chosen this particular hash (multiply by 9 and add new * character) because of the following reasons: * 1. Multiplying by 10 is perfect for keys that are decimal strings, * and multiplying by 9 is just about as good. * 2. Times-9 is (shift-left-3) plus (old). This means that each * character's bits hang around in the low-order bits of the hash * value for ever, plus they spread fairly rapidly up to the * high-order bits to fill out the hash value. This seems works well * both for decimal and non-decimal strings * In fact this is a fast hash, but it causes collissions for keys with * increasing same bytes. */ intern act_uint32_t act_hash_fct_jotcl( register act_uint8_t *key, register act_size_t len) { register act_uint32_t hash = 0; while (len-- > 0) hash = ((hash << 3) + hash) + *key++; return hash; } /* d810 1 a818 1 { "JOTCL", act_hash_fct_jotcl, EMPTY_STAT }, @ 1.3 log @*** empty log message *** @ text @d86 2 a87 2 ** | DJBX33A 1.30 0 0 0 86.90 0 8 8 0.19 0.38 | ** | DJBX33X 1.30 0 0 0 86.80 0 8 8 -2.24 0.41 | d89 3 a91 3 ** | BJDDJ 2.35 0 0 0 85.30 0 8 8 1.90 1.26 | ** | CRC32 2.96 0 0 0 86.80 0 8 8 0.85 -0.95 | ** | CPOAAT 2.37 999000 0 2 86.20 0 8 8 -1.86 -0.22 | d93 1 a93 1 ** | FONOVO 4.11 999000 0 2 86.00 0 8 8 -0.60 -0.44 | d98 2 a99 2 ** | CBU 1.13 999000 967272 2834 86.50 0 8 8 216.29! -0.44 | ** | CVS 3.28 999000 82170 2 86.80 0 9 9 441.17! 1.01 | d154 10 a163 6 /* variant with the hash unrolled four times */ for (; len >= 4; len -= 4) { hash = (hash << 5) + hash + *key++; hash = (hash << 5) + hash + *key++; hash = (hash << 5) + hash + *key++; hash = (hash << 5) + hash + *key++; d166 8 a173 11 case 3: hash = (hash << 5) + hash + *key++; /* fallthrough... */ case 2: hash = (hash << 5) + hash + *key++; /* fallthrough... */ case 1: hash = (hash << 5) + hash + *key++; break; default: /* case 0: */ break; d199 6 a204 2 /* variant with the hash unrolled four times */ for (; len >= 4; len -= 4) { d211 8 a218 11 case 3: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 2: hash = ((hash << 5) + hash) ^ *key++; /* fallthrough... */ case 1: hash = ((hash << 5) + hash) ^ *key++; break; default: /* case 0: */ break; d403 4 a406 2 * only dislikes progressing keys a little bit (which doesn't hurt in * practice). d413 1 a413 1 register act_uint32_t hash = 0; a807 1 #if 0 a818 1 #endif @ 1.2 log @*** empty log message *** @ text @d121 15 a135 15 * anyone. But if one experimentally tests all multipliers between 1 * and 256 (as RSE did it now) one detects that even numbers are not * useable. The remaining 128 odd numbers (except for the 1) work more * or less all equally well. They all distribute in an acceptable way * and this way fill a hash table with an average percent of approx. * 86%. If one compares the Chi/2 values of the variants the 33 not even * has the best value. But the 33 and a few other equally good values * like 17, 31, 63, 127 and 129 have nevertheless a great advantage * to the remaining values in the large set of possible multipliers: * their multiply operation can be replaced by a faster operation based * on just one shift plus either a single addition or subtraction * operation. And because a hash function has to both distribute good * and has to be very fast to compute, those few values should be * preferred and seems to be the reason why Dan J. Bernstein also * preferred it. @ 1.1 log @Initial revision @ text @d117 25 a141 7 * hash functions for strings. The magic of number 33, i.e. why it * works better than many other constants, prime or not, has never been * adequately explained by anyone. But it is both a very fast hash and * distributes very well. Above there are two variants: the original * variant with only the multiplication optimized via bit shifts and * additionally a variant which has the hash unrolled four times for * speed. d804 1 d816 1 @ 1.1.1.1 log @ @ text @@