A cryptographic hash function is a special class of hash function that has certain properties which make it suitable for use in cryptography. It is a mathematical function that can be used to map data of arbitrary size to data of a fixed size.
|Published (Last):||24 October 2011|
Cryptographic hash function
A hash function is any function that can be used to map data of arbitrary size to data of a fixed size. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. Hash functions are often used in combination with a hash table, a common data structure used in computer software for rapid data lookup.
Hash functions accelerate table or database lookup by detecting duplicated records in a large file. One such application is finding similar stretches in DNA sequences. They are also useful in cryptography. A cryptographic hash function allows one to easily verify that some input data maps to a given hash value, but if the input data is unknown, it is deliberately difficult to reconstruct it (or any equivalent alternatives) by knowing the stored hash value.
This is used for assuring the integrity of transmitted data, and is the building block for HMACs, which provide message authentication.
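The verification and message authentication described above can be sketched with Python's standard `hashlib` and `hmac` modules. This is a minimal illustration, not a complete protocol: the choice of SHA-256, the helper names `digest_hex`, `matches`, and `sign`, and the example key and messages are all assumptions made for the example.

```python
import hashlib
import hmac

def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def matches(data: bytes, expected_hex: str) -> bool:
    """Check that data hashes to a stored digest, using a constant-time compare."""
    return hmac.compare_digest(digest_hex(data), expected_hex)

def sign(key: bytes, message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for message under a shared secret key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

stored = digest_hex(b"hello world")
print(matches(b"hello world", stored))  # the same input reproduces the digest
print(matches(b"hello w0rld", stored))  # an altered input does not

# The receiver recomputes the tag with the shared key and compares.
tag = sign(b"shared-secret", b"transfer 10 units")
print(hmac.compare_digest(tag, sign(b"shared-secret", b"transfer 10 units")))
```

Checking a candidate input against a stored digest is cheap, while nothing in the digest itself helps recover the input. The HMAC tag additionally depends on the key, so a party without the key cannot produce a valid tag for a modified message.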
Hash functions are related to (and often confused with) checksums, check digits, fingerprints, lossy compression, randomization functions, error-correcting codes, and ciphers. Although the concepts overlap to some extent, each one has its own uses and requirements and is designed and optimized differently.
The HashKeeper database maintained by the American National Drug Intelligence Center, for instance, is more aptly described as a catalogue of file fingerprints than of hash values. Hash functions are used in hash tables to quickly locate a data record. Specifically, the hash function is used to map the search key to an index; the index gives the place in the hash table where the corresponding record should be stored. Hash tables are also used to implement associative arrays and dynamic sets.
Typically, the domain of a hash function (the set of possible keys) is larger than its range (the number of different table indices), and so it will map several different keys to the same index, which could result in collisions. Each slot of a hash table is therefore associated (implicitly or explicitly) with a set of records, rather than a single record. For this reason, each slot of a hash table is often called a bucket, and hash values are also called a bucket listing or a bucket index.
Thus, the hash function only hints at the record's location. Still, in a half-full table, a good hash function will typically narrow the search down to only one or two entries. People who write complete hash table implementations choose a specific hash function (such as a Jenkins hash or Zobrist hashing) and independently choose a hash-table collision resolution scheme (such as coalesced hashing, cuckoo hashing, or hopscotch hashing).
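To make the bucket idea concrete, here is a minimal sketch of a hash table whose slots each hold a list of records. It resolves collisions by separate chaining, which is simpler than the schemes named above and is used here only for illustration; the class name `ChainedHashTable` and its methods are hypothetical, and Python's built-in `hash` stands in for a purpose-built hash function.

```python
class ChainedHashTable:
    """Hash table in which every slot (bucket) holds a list of (key, value) pairs."""

    def __init__(self, n_buckets: int = 8) -> None:
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key) -> int:
        # The hash value only hints at the location: it selects a bucket.
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        # Scan only the one bucket the key maps to.
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

table = ChainedHashTable(n_buckets=2)
for word in ["ant", "bee", "cat", "dog", "eel"]:
    table.put(word, len(word))
print(table.get("cat"))
print([len(bucket) for bucket in table.buckets])
```

With five keys and only two buckets, collisions are unavoidable, yet every lookup still succeeds because the final comparison is done on the keys themselves.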
Hash functions are also used to build caches for large data sets stored in slow media. A cache is generally simpler than a hashed search table, since any collision can be resolved by discarding or writing back the older of the two colliding items. This is also used in file comparison. Hash functions are an essential ingredient of the Bloom filter, a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.
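A Bloom filter can be sketched in a few lines. The version below is an illustrative assumption rather than a tuned implementation: the class name `BloomFilter`, the default sizes, and the trick of deriving several hash functions by prefixing a counter byte to the input of SHA-256 are all choices made for this example.

```python
import hashlib

class BloomFilter:
    """Set membership test that may report false positives but never false negatives."""

    def __init__(self, n_bits: int = 1024, n_hashes: int = 3) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes  # must stay below 256 for the counter byte
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive n_hashes bit positions by salting SHA-256 with a counter byte.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # True only if every bit selected for the item is set.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add(b"apple")
print(seen.might_contain(b"apple"))  # always True for an added item
```

An item that was added is always reported as present. An item that was never added is usually reported as absent, but may occasionally be reported as present if other items happened to set all of its bits.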
When storing records in a large unsorted file, one may use a hash function to map each record to an index into a table T, and to collect in each bucket T[i] a list of the numbers of all records with the same hash value i. Once the table is complete, any two duplicate records will end up in the same bucket. The duplicates can then be found by scanning every bucket T[i] which contains two or more members, fetching those records, and comparing them. With a table of appropriate size, this method is likely to be much faster than any alternative approach (such as sorting the file and comparing all consecutive pairs).
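The duplicate-finding procedure above can be sketched as follows. The function name `find_duplicates`, the default table size, and the sample records are assumptions for illustration; Python's built-in `hash` plays the role of the hash function.

```python
from collections import defaultdict

def find_duplicates(records: list[str], n_buckets: int = 16) -> list[tuple[int, int]]:
    """Return pairs of record numbers (i, j), i < j, whose records are identical."""
    table = defaultdict(list)  # bucket index -> list of record numbers
    for number, record in enumerate(records):
        table[hash(record) % n_buckets].append(number)

    duplicates = []
    for members in table.values():
        if len(members) < 2:
            continue  # a bucket with one member cannot hold a duplicate
        # Only records sharing a bucket are fetched and compared.
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                if records[i] == records[j]:
                    duplicates.append((i, j))
    return sorted(duplicates)

records = ["alice", "bob", "carol", "bob", "dave", "alice"]
print(find_duplicates(records))  # [(0, 5), (1, 3)]
```

Equal records always land in the same bucket, so no duplicate is missed; the explicit comparison filters out records that merely collide.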
A hash value can be used to uniquely identify secret information. This requires that the hash function is collision-resistant, which means that it is very hard to find data that will generate the same hash value. These functions are categorized into cryptographic hash functions and provably secure hash functions.
Functions in the second category are the most secure but also too slow for most practical purposes. Collision resistance is accomplished in part by generating very large hash values. For example, SHA-2, one of the most widely used cryptographic hash functions, generates values of 224 to 512 bits, depending on the variant.
Hash functions can also be used to locate table records whose key is similar, but not identical, to a given key; or pairs of records in a large file which have similar keys. For that purpose, one needs a hash function that maps similar keys to hash values that differ by at most m, where m is a small integer (say, 1 or 2). If one builds a table T of all record numbers, using such a hash function, then similar records will end up in the same bucket, or in nearby buckets.
This class includes the so-called acoustic fingerprint algorithms, which are used to locate similar-sounding entries in a large collection of audio files. For this application, the hash function must be as insensitive as possible to data capture or transmission errors, and to trivial changes such as timing and volume changes, compression, etc.
The same techniques can be used to find equal or similar stretches in a large collection of strings, such as a document repository or a genomic database. In this case, the input strings are broken into many small pieces, and a hash function is used to detect potentially equal pieces, as above.
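One way to detect potentially equal pieces is to hash every fixed-length window of the text and compare it with the hash of the piece being sought, updating the window hash incrementally as the window slides. The sketch below follows the Rabin-Karp style of string search; the function name `rabin_karp` and the parameters `base` and `mod` are illustrative assumptions, not prescribed values.

```python
def rabin_karp(text: str, pattern: str, base: int = 256,
               mod: int = 1_000_000_007) -> list[int]:
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character

    # Hash the pattern and the first window with the same polynomial hash.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    matches = []
    for start in range(n - m + 1):
        # Equal hashes only suggest a match; confirm by direct comparison.
        if w_hash == p_hash and text[start:start + m] == pattern:
            matches.append(start)
        if start + m < n:
            # Roll the window: drop text[start], append text[start + m].
            w_hash = ((w_hash - ord(text[start]) * high) * base
                      + ord(text[start + m])) % mod
    return matches

print(rabin_karp("GATTACAGATTACA", "ATTA"))  # [1, 8]
```

Because each window hash is derived from the previous one in constant time, most positions are rejected without comparing any characters.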
The Rabin-Karp algorithm is a relatively fast string searching algorithm that works in O(n) time on average. It is based on the use of hashing to compare strings.
This principle is widely used in computer graphics, computational geometry and many other disciplines, to solve many proximity problems in the plane or in three-dimensional space, such as finding closest pairs in a set of points, similar shapes in a list of shapes, similar images in an image database, and so on. In these applications, the set of all inputs is some sort of metric space, and the hashing function can be interpreted as a partition of that space into a grid of cells.
The table is often an array with two or more indices (called a grid file, grid index, bucket grid, and similar names), and the hash function returns an index tuple. This special case of hashing is known as geometric hashing or the grid method. Geometric hashing is also used in telecommunications (usually under the name vector quantization) to encode and compress multi-dimensional signals.
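The grid method can be illustrated for points in the plane. In this sketch the hash function returns the index tuple of the cell containing a point, and close pairs are found by inspecting only a cell and its eight neighbours. The function name `close_pairs`, the cell size equal to the search radius, and the sample points are assumptions made for the example.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

def close_pairs(points, radius):
    """Return index pairs (i, j), i < j, of points at most radius apart."""
    grid = defaultdict(list)  # cell index tuple -> indices of points in that cell
    for idx, (x, y) in enumerate(points):
        grid[(floor(x / radius), floor(y / radius))].append(idx)

    pairs = set()
    for (cx, cy), members in grid.items():
        # With cell size equal to radius, a close partner must lie in the
        # same cell or in one of the eight adjacent cells.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j and dist(points[i], points[j]) <= radius:
                        pairs.add((i, j))
    return sorted(pairs)

points = [(0.0, 0.0), (0.5, 0.5), (5.0, 5.0), (5.2, 4.9)]
print(close_pairs(points, 1.0))  # [(0, 1), (2, 3)]
```

The last two points fall into different cells, which is why neighbouring cells must be checked as well as the point's own cell.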
Some standard applications that employ hash functions include authentication, message integrity (using an HMAC, or Hashed MAC), message fingerprinting, data corruption detection, and digital signature efficiency.
Good hash functions, in the original sense of the term, are usually required to satisfy certain properties listed below. The exact requirements are dependent on the application. For example, a hash function well suited to indexing data will probably be a poor choice for a cryptographic hash function.
A hash procedure must be deterministic, meaning that for a given input value it must always generate the same hash value. In other words, it must be a function of the data to be hashed, in the mathematical sense of the term. This requirement excludes hash functions that depend on external variable parameters, such as pseudo-random number generators or the time of day.
It also excludes functions that depend on the memory address of the object being hashed in cases where the address may change during execution (as may happen on systems that use certain methods of garbage collection), although sometimes rehashing of the item is possible.
The determinism is in the context of the reuse of the function. For example, Python adds the feature that hash functions make use of a randomized seed that is generated once when the Python process starts, in addition to the input to be hashed. The Python hash is still a valid hash function when used within a single run. But if the values are persisted (for example, written to disk), they can no longer be treated as valid hash values, since in the next run the random value might differ.
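When hash values must survive across runs, one option is to derive them from a hash that does not depend on any per-process state. The sketch below is one such approach under stated assumptions: the helper name `stable_hash` is hypothetical, and truncating a SHA-256 digest to 64 bits is an arbitrary choice for illustration.

```python
import hashlib

def stable_hash(text: str) -> int:
    """64-bit hash that does not depend on Python's per-process seed."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Built-in hash(): consistent within this process, but for strings it may
# differ in the next process, so it should not be written to disk.
print(hash("example") == hash("example"))

# stable_hash(): depends only on the input, so it can be persisted.
print(stable_hash("example") == stable_hash("example"))
```

Both functions are deterministic within a single run; only the second remains usable as a stored hash value in later runs.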
A good hash function should map the expected inputs as evenly as possible over its output range. That is, every hash value in the output range should be generated with roughly the same probability.
The reason for this last requirement is that the cost of hashing-based methods goes up sharply as the number of collisions (pairs of inputs that are mapped to the same hash value) increases.
If some hash values are more likely to occur than others, a larger fraction of the lookup operations will have to search through a larger set of colliding table entries. Note that this criterion only requires the value to be uniformly distributed, not random in any sense. A good randomizing function is (barring computational efficiency concerns) generally a good choice as a hash function, but the converse need not be true. Hash tables often contain only a small subset of the valid inputs.
For instance, a club membership list may contain only a hundred or so member names, out of the very large set of all possible names.
In these cases, the uniformity criterion should hold for almost all typical subsets of entries that may be found in the table, not just for the global set of all possible entries. In particular, if the number of records m is less than the number of table slots n, very few buckets should have more than one or two records.
In an ideal "perfect hash function", no bucket should have more than one record; but a small number of collisions is virtually inevitable, even if n is much larger than m (see the birthday problem). When testing a hash function, the uniformity of the distribution of hash values can be evaluated by the chi-squared test.
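Both remarks can be checked numerically. The sketch below computes the chi-squared statistic of bucket counts against a uniform expectation, and the birthday-problem probability that at least one collision occurs when m records are hashed uniformly into n slots. The function names `chi_squared` and `collision_probability` are hypothetical, and interpreting the statistic (comparing it with a chi-squared distribution with n - 1 degrees of freedom) is left out.

```python
def chi_squared(keys, n_buckets, hash_fn=hash):
    """Chi-squared statistic of the bucket counts against a uniform spread."""
    counts = [0] * n_buckets
    total = 0
    for key in keys:
        counts[hash_fn(key) % n_buckets] += 1
        total += 1
    if total == 0:
        raise ValueError("at least one key is required")
    expected = total / n_buckets
    return sum((count - expected) ** 2 / expected for count in counts)

def collision_probability(m, n):
    """Probability that m uniformly hashed records collide somewhere in n slots."""
    p_all_distinct = 1.0
    for i in range(m):
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct

# A perfectly even spread gives a statistic of zero.
print(chi_squared(range(1000), 10, hash_fn=lambda k: k))  # 0.0
# Sending every key to one bucket gives a very large statistic.
print(chi_squared(range(1000), 10, hash_fn=lambda k: 0))  # 9000.0
# Even with n much larger than m, a collision is more likely than not.
print(round(collision_probability(23, 365), 3))  # 0.507
```

A statistic far above the number of buckets suggests that some hash values occur much more often than others.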
It is often desirable that the output of a hash function have a fixed size (but see below). If, for example, the output is constrained to fixed-width integer values, the hash values can be used to index into an array.
Such hashing is commonly used to accelerate data searches.
Producing fixed-length output from variable-length input can be accomplished by breaking the input data into chunks of a specific size. Hash functions used for data searches use some arithmetic expression which iteratively processes chunks of the input (such as the characters in a string) to produce the hash value. In cryptographic hash functions, the chunk size, which is called the block size, is typically larger than the size of the hash value.
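As an illustration of iterating over chunks of the input to produce a fixed-size value, here is a sketch of the 32-bit FNV-1a hash, which processes one byte at a time. FNV-1a is not mentioned in the text above; it is chosen here only because it is short and well known.

```python
FNV_OFFSET_BASIS = 0x811C9DC5  # starting value for 32-bit FNV
FNV_PRIME = 0x01000193         # multiplier for 32-bit FNV

def fnv1a_32(data: bytes) -> int:
    """Hash input of any length down to a 32-bit value, one byte at a time."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                         # mix in the next chunk of input
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # multiply, keeping only 32 bits
    return h

print(hex(fnv1a_32(b"a")))  # 0xe40c292c
print(fnv1a_32(b"a much longer input string") < 2 ** 32)  # True
```

Whatever the input length, the running value is masked to 32 bits after each step, so the output size never grows.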
In many applications, the range of hash values may be different for each run of the program, or may change along the same run (for instance, when a hash table needs to be expanded).
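In those situations, one needs a hash function which takes two parameters: the input data z, and the number n of allowed hash values. A common solution is to compute a fixed hash function with a very large range, divide the result by n, and use the division's remainder. If n is itself a power of 2, this can be done by bit masking and bit shifting.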
Depending on the function, the remainder may be uniform only for certain values of n. We can allow the table size n to not be a power of 2 and still not have to perform any remainder or division operation, as these computations are sometimes costly. For example, let n be significantly less than 2^b. We can replace the division by a possibly faster right bit shift.
When the hash function is used to store values in a hash table that outlives the run of the program, and the hash table needs to be expanded or shrunk, the hash table is referred to as a dynamic hash table.
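The ways of reducing a raw hash value to n allowed values mentioned above can be sketched side by side. This is an interpretation under stated assumptions: the raw hash is taken to be uniform on 0 to 2^b - 1 with b = 32, and the multiply-then-shift form is one reading of replacing the division by a right bit shift. The function names are hypothetical.

```python
B = 32  # assumed width, in bits, of the raw hash value

def reduce_mask(raw: int, n: int) -> int:
    """Reduce by bit masking; valid only when n is a power of 2."""
    if n <= 0 or n & (n - 1) != 0:
        raise ValueError("n must be a power of 2")
    return raw & (n - 1)

def reduce_remainder(raw: int, n: int) -> int:
    """Reduce by taking the remainder of a division by n."""
    return raw % n

def reduce_shift(raw: int, n: int) -> int:
    """Scale raw from [0, 2**B) to [0, n): multiply by n, then shift right by B."""
    return (raw * n) >> B

print(reduce_mask(0b101101, 8))      # 5, the low three bits
print(reduce_remainder(45, 7))       # 3
print(reduce_shift(2 ** 31, 10))     # 5, the midpoint of the range maps to n / 2
print(reduce_shift(2 ** 32 - 1, 10)) # 9, the largest raw value maps to n - 1
```

The shift-based form needs no division and works for any n, at the cost of a multiplication whose intermediate result is wider than the raw hash.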
Linear hashing and spiral storage are examples of dynamic hash functions that execute in constant time but relax the property of uniformity to achieve the minimal movement property. Extendible hashing uses a dynamic hash function that requires space proportional to n to compute the hash function, and it becomes a function of the previous keys that have been inserted. Several algorithms that preserve the uniformity property but require time proportional to n to compute the value of H(z, n) have been invented.
A hash function with minimal movement is especially useful in distributed hash tables.
In some applications, the input data may contain features that are irrelevant for comparison purposes.
For example, when looking up a personal name, it may be desirable to ignore the distinction between upper and lower case letters. For such data, one must use a hash function that is compatible with the data equivalence criterion being used: any two inputs that are considered equivalent must yield the same hash value. This can be accomplished by normalizing the input before hashing it, as by upper-casing all letters.
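Normalizing before hashing can be sketched as follows. The function name `name_hash` and the bucket count are hypothetical, and collapsing runs of whitespace is an extra normalization step assumed for this example in addition to the upper-casing described above.

```python
def name_hash(name: str, n_buckets: int = 64) -> int:
    """Hash a personal name so that equivalent spellings share a hash value."""
    # Normalize first: collapse whitespace, then upper-case all letters.
    normalized = " ".join(name.split()).upper()
    return hash(normalized) % n_buckets

print(name_hash("Ada Lovelace") == name_hash("ADA  lovelace "))  # True
```

Any lookup must apply the same normalization to the search key, otherwise equivalent names could be sent to different buckets.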
Note that continuity (mapping inputs that differ by a little to equal or nearly equal hash values) is usually considered a fatal flaw for checksums, cryptographic hash functions, and other related concepts.
Continuity is desirable for hash functions only in some applications, such as hash tables used in nearest neighbor search.
In cryptographic applications, hash functions are typically expected to be practically non-invertible, meaning that it is not realistic to reconstruct the input datum x from its hash value h(x) alone without spending great amounts of computing time (see also one-way function).
For most types of hashing functions, the choice of the function depends strongly on the nature of the input data, and their probability distribution in the intended application.
If the data to be hashed is small enough, one can use the data itself (reinterpreted as an integer) as the hashed value. The cost of computing this "trivial" (identity) hash function is effectively zero.