Hash Function Ⅱ

SHA-256

SHA-256, or Secure Hash Algorithm 256-bit, is a hash algorithm in the SHA-2 series. The hash value generated by SHA-256 has a length of 256 bits (32 bytes). It is a cryptographic secure hash function widely used for digital signatures, data integrity verification, cryptographic protocols, and other security applications.

Features:

  1. Output Length: 256 bits, providing high security.

  2. Irreversibility: SHA-256 is an irreversible function, meaning it is impossible to deduce the original input from the hash value.

  3. Uniqueness: Even small changes in the original input lead to significant changes in the hash value, ensuring different inputs have different hash values.

  4. Fixed-Length Output: Regardless of the size of the input message, SHA-256 always produces a fixed-length output.

The operation of SHA-256 is based on the following steps:

  1. Initialization: Initialize 8 32-bit word registers, which store fixed initial hash values as part of the SHA-256 algorithm.

  2. Padding: Pad the input message to ensure its bit length is a multiple of 512 with a remainder of 448. During padding, append a '1' followed by several '0's to the end of the message to meet the specified length.

  3. Length Appending: Append the binary representation of the original message length (64 bits) to the end of the padded message to record the original length of the message.

  4. Chunk Processing: Split the padded and length-appended message into 512-bit (64-byte) chunks.

  5. Block Processing: Apply the SHA-256 algorithm to each chunk, generating a new hash value used for the processing of the next chunk.

  6. Final Hash: After processing all chunks, concatenate the values of the final 8 32-bit word registers to form the final 256-bit hash value.

SHA-256 is designed to provide sufficient security to withstand current known cryptographic analysis techniques. Due to its longer output length and widespread security applications, SHA-256 is currently considered a secure and reliable hash algorithm.

Next, we learn about SHA-512.

SHA-512, or Secure Hash Algorithm 512-bit, is a hash algorithm in the SHA-2 series. The hash value generated by SHA-512 has a length of 512 bits (64 bytes). It is a longer algorithm in the SHA-2 series, providing greater security and suitable for higher-level security requirements.

Features:

  1. Output Length: 512 bits, offering higher security strength compared to SHA-256.

  2. Irreversibility: SHA-512 is also an irreversible hash function, meaning it is impossible to deduce the original input from the hash value.

  3. Uniqueness: The likelihood of hash collisions is extremely low, even with small input changes resulting in a significant number of hash value changes.

  4. Fixed-Length Output: Regardless of the size of the input message, SHA-512 always produces a fixed-length output.

The operation of SHA-512 is similar to SHA-256 but involves longer word registers and more complex rounds:

  1. Initialization: Initialize 8 64-bit word registers, storing fixed initial hash values.

  2. Padding: Pad the input message to ensure its bit length is a multiple of 1024 with a remainder of 896. During padding, append a '1' followed by several '0's to the end of the message.

  3. Length Appending: Append the binary representation of the original message length (128 bits) to the end of the padded message to record the original length of the message.

  4. Chunk Processing: Split the padded and length-appended message into 1024-bit (128-byte) chunks.

  5. Block Processing: Apply the SHA-512 algorithm to each block, generating a new hash value used for the processing of the next block.

  6. Final Hash: After processing all blocks, concatenate the values of the final 8 64-bit word registers to form the final 512-bit hash value.

SHA-512 provides greater security and is suitable for scenarios requiring higher-level security, such as digital signatures, certificates, cryptographic protocols, etc. When selecting a hash algorithm, it's important to choose the appropriate algorithm based on specific security and performance requirements.

import hashlib

def hash_message(message, algorithm='sha256'):
    # Calculate the hash value of the message using the specified algorithm
    hasher = hashlib.new(algorithm)
    hasher.update(message.encode())
    return hasher.hexdigest()

if __name__ == "__main__":
    # Example message
    message = "This is a message to hash using SHA-1."

    # Calculate SHA-1 hash value
    sha1_hash_result = hash_message(message, 'sha1')
    print("SHA-1 Hash:", sha1_hash_result)

    # Calculate SHA-256 hash value
    sha256_hash_result = hash_message(message, 'sha256')
    print("SHA-256 Hash:", sha256_hash_result)

    # Calculate SHA-512 hash value
    sha512_hash_result = hash_message(message, 'sha512')
    print("SHA-512 Hash:", sha512_hash_result)

The provided Python code is a script that demonstrates how to calculate hash values of a given message using different hash algorithms from the hashlib library. The script defines a function called hash_message that takes a message and an optional algorithm parameter (defaulting to 'sha256') and returns the hexadecimal representation of the hash value.

Let's analyze the code:

  1. Import hashlib Library:

    import hashlib

    This line imports the hashlib library, which provides a collection of hash functions.

  2. Define hash_message Function:

    def hash_message(message, algorithm='sha256'):
        hasher = hashlib.new(algorithm)
        hasher.update(message.encode())
        return hasher.hexdigest()
    • The hash_message function takes a message and an optional algorithm as input parameters.

    • It creates a new hash object using the specified algorithm.

    • The update method is then used to update the hash object with the byte representation of the input message.

    • Finally, the function returns the hexadecimal digest of the hash value using the hexdigest method.

  3. Main Execution Section:

    if __name__ == "__main__":

    The script checks whether it is being run as the main program.

  4. Example Message and Hash Calculation:

        message = "This is a message to hash using SHA-1."

    An example message is defined.

        sha1_hash_result = hash_message(message, 'sha1')
        print("SHA-1 Hash:", sha1_hash_result)
    • The script calculates the SHA-1 hash value of the message using the hash_message function and prints the result.

        sha256_hash_result = hash_message(message, 'sha256')
        print("SHA-256 Hash:", sha256_hash_result)
    • Similarly, the script calculates the SHA-256 hash value of the message and prints the result.

        sha512_hash_result = hash_message(message, 'sha512')
        print("SHA-512 Hash:", sha512_hash_result)
    • The script also calculates the SHA-512 hash value and prints the result.

In summary, the code demonstrates how to use the hashlib library to compute hash values for a given message using different hashing algorithms (SHA-1, SHA-256, and SHA-512). The calculated hash values are then printed to the console.

Message Digest

MD(Message Digest)series is a family of hash functions designed by Ronald Rivest, including MD2, MD4, and MD5. These algorithms are one-way hash functions used to generate fixed-length hash values for messages. Here are the main members of the MD series:

MD2 (Message Digest Algorithm 2)

  • Characteristics:

    • Generates a 128-bit (16-byte) hash value.

    • Designed for calculating checksums of messages.

  • Algorithm Steps:

    1. Pad the message to meet certain length requirements.

    2. Initialize a set of S-boxes.

    3. Process the padded message, handling each 16-byte block at a time.

    4. Output the final 128-bit hash value.

MD4 (Message Digest Algorithm 4)

  • Characteristics:

    • Generates a 128-bit (16-byte) hash value.

    • Widely used in digital signatures, SSL, and other security protocols.

  • Algorithm Steps:

    1. Pad the message to meet certain length requirements.

    2. Initialize four 32-bit buffers.

    3. Process the padded message, handling each 512-bit block at a time.

    4. Output the final 128-bit hash value.

MD5 (Message Digest Algorithm 5)

  • Characteristics:

    • Generates a 128-bit (16-byte) hash value.

    • Widely used but gradually deprecated due to security vulnerabilities.

  • Algorithm Steps:

    1. Pad the message to meet certain length requirements.

    2. Initialize four 32-bit buffers.

    3. Process the padded message, handling each 512-bit block at a time.

    4. Output the final 128-bit hash value.

Now, let's focus on MD5.

This algorithm consists of four main parts:

  1. Padding Bits: When receiving an input string, it must ensure that its size is a multiple of 512 minus 64 bits. Regarding padding bits, it is necessary to first add a (1) and then add zeros to fill the additional characters.

  1. Padding Length: In order to make the final string length a multiple of 512, additional characters need to be added. For this purpose, take the initial input's length and represent it in 64 bits. Combine these two parts, and the final string is ready for hashing.

  1. Initialize MD Buffers: The entire string is converted into multiple blocks, each consisting of 512 bits. It is also necessary to initialize four distinct buffers, namely A, B, C, and D. Each of these buffers is 32 bits and is initialized as follows:

    • A = 01234567

    • B = 89abcdef

    • C = fedcba98

    • D = 76543210

  2. Process Each Block: Each 512-bit block is further broken down into 16 sub-blocks, each consisting of 32 bits. There are four rounds of operations, where each round utilizes all sub-blocks, buffers, and a constant array value.

    This constant array can be represented as T[1] -> T[64].

    Each sub-block is represented as M[0] -> M[15].

According to the above diagram, the values for a single buffer A are processed. The correct sequence is as follows:

  1. Pass B, C, and D through a nonlinear operation.

  2. Add the result to the value at position A.

  3. Add the sub-block values to the above result.

  4. Then, add the constant value for that particular iteration.

  5. Perform a circular shift on the string.

  6. In the final step, add the value of B to the string and store it in buffer A.

The above steps are executed for each buffer and each sub-block. When the final buffer for the last block is completed, you obtain the MD5 digest.

The nonlinear operation for each round on the sub-blocks is different: Round 1: (b AND c) OR ((NOT b) AND (d)) Round 2: (b AND d) OR (c AND (NOT d)) Round 3: b XOR c XOR d Round 4: c XOR (b OR (NOT d))

Through these steps, the MD5 algorithm is completed.

import struct
import math

def left_rotate(value, shift):
    return ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF

def md5_padding(message):
    original_length = len(message) * 8
    message += b'\x80'
    while (len(message) + 8) % 64 != 0:
        message += b'\x00'
    message += struct.pack('<Q', original_length)
    return message

def md5(message):
    # Define initial buffers
    A = 0x67452301
    B = 0xEFCDAB89
    C = 0x98BADCFE
    D = 0x10325476

    # Pad the message
    message = md5_padding(message)

    # Process each block of the message
    for i in range(0, len(message), 64):
        chunk = message[i:i+64]
        M = struct.unpack('<16I', chunk)

        AA, BB, CC, DD = A, B, C, D

        # Define nonlinear functions used in MD5
        def F(X, Y, Z):
            return (X & Y) | ((~X) & Z)

        def G(X, Y, Z):
            return (X & Z) | (Y & (~Z))

        def H(X, Y, Z):
            return X ^ Y ^ Z

        def I(X, Y, Z):
            return Y ^ (X | (~Z))

        # MD5 main loop
        s = [7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 + [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4
        T = [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]

        for j in range(64):
            if 0 <= j <= 15:
                F_result = F(BB, CC, DD)
                g = j
            elif 16 <= j <= 31:
                F_result = G(BB, CC, DD)
                g = (5 * j + 1) % 16
            elif 32 <= j <= 47:
                F_result = H(BB, CC, DD)
                g = (3 * j + 5) % 16
            elif 48 <= j <= 63:
                F_result = I(BB, CC, DD)
                g = (7 * j) % 16

            F_result = (F_result + AA + M[g] + T[j]) & 0xFFFFFFFF
            AA = DD
            DD = CC
            CC = BB
            BB = (BB + left_rotate(F_result, s[j])) & 0xFFFFFFFF

        A = (A + AA) & 0xFFFFFFFF
        B = (B + BB) & 0xFFFFFFFF
        C = (C + CC) & 0xFFFFFFFF
        D = (D + DD) & 0xFFFFFFFF

    # Return the MD5 digest
    return '{:08x}{:08x}{:08x}{:08x}'.format(A, B, C, D)

if __name__ == "__main__":
    # Example message
    message = "This is a message to hash using MD5."

    # Calculate MD5 hash value
    md5_result = md5(message.encode())
    
    print("Message:", message)
    print("MD5 Hash:", md5_result)

The provided Python code implements the MD5 (Message Digest Algorithm 5) hashing algorithm. Below is an analysis of the key components of the code:

  1. left_rotate Function:

    • This function performs a circular left shift operation on a given value.

    • It is used within the MD5 algorithm to perform bitwise rotation.

  2. md5_padding Function:

    • Calculates and appends padding to the input message to ensure its length is a multiple of 512 bits.

    • The original length of the message is added in a 64-bit format at the end.

  3. md5 Function:

    • Implements the MD5 hashing algorithm.

    • Initializes the MD5 buffers (A, B, C, D) with specific values.

    • Processes each block of the padded message.

    • Defines four nonlinear functions (F, G, H, I) used in the MD5 algorithm.

    • Utilizes a main loop with 64 iterations, each involving specific operations based on the current iteration.

    • Applies bitwise operations, additions, and circular shifts.

    • Computes constant values for each iteration using the sine function.

    • Updates the MD5 buffers in each iteration based on the calculated values.

    • Finally, the MD5 digest is obtained by concatenating the hexadecimal representations of the updated buffer values.

  4. __main__ Block:

    • Demonstrates the usage of the md5 function with an example message.

    • Prints the original message and its corresponding MD5 hash.

Overall, the code provides a functional implementation of the MD5 hashing algorithm, following the specified steps in the MD5 algorithm. It includes functions for padding the message and performing the necessary bitwise and arithmetic operations in the MD5 main loop.

We can also implement this using existing libraries.

import hashlib

def calculate_md5(input_string):
    # Create an MD5 hash object
    md5_hash = hashlib.md5()

    # Encode the string and update the hash object
    md5_hash.update(input_string.encode('utf-8'))

    # Get the hexadecimal representation of the MD5 hash
    md5_hex = md5_hash.hexdigest()

    return md5_hex

# Example usage
input_string = "Hello, MD5!"
md5_result = calculate_md5(input_string)
print(f"MD5 Hash of '{input_string}': {md5_result}")

The provided Python code demonstrates the usage of the hashlib library to calculate the MD5 hash of a given input string. Here's an analysis of the code:

  1. Import hashlib:

    • The code starts by importing the hashlib library, which provides various hash functions, including MD5.

  2. calculate_md5 Function:

    • The function takes an input string as a parameter and calculates its MD5 hash.

    • It creates an MD5 hash object using hashlib.md5().

    • The input string is encoded as UTF-8 and then used to update the hash object using update.

    • The final MD5 hash is obtained in hexadecimal representation using hexdigest().

  3. Example Usage:

    • An example string, "Hello, MD5!", is used to demonstrate the functionality of the calculate_md5 function.

    • The MD5 hash of the input string is calculated and printed to the console.

  4. Print MD5 Hash:

    • The calculated MD5 hash is printed using an f-string to display the original input string and its corresponding MD5 hash.

In summary, the code provides a simple and straightforward implementation of calculating the MD5 hash of a string using the hashlib library. The example usage illustrates how to apply the calculate_md5 function to a specific input string and obtain the MD5 hash.

Birthday Paradox

The Birthday Paradox is a probability problem that involves randomly selecting elements in a set, with the probability of at least two elements having the same value. Although the term "paradox" is included in its name, this phenomenon is not a true paradox but rather a counterintuitive probability result.

Specifically, the Birthday Paradox is often related to the birthdays of elements in a set. The question is: If there are a certain number of people in a room, what is the probability that at least two people share the same birthday?

Surprisingly, it takes a relatively small number of people to achieve a significant likelihood of shared birthdays. This is different from the linear growth that people might intuitively expect.

The mathematical explanation of the paradox is as follows: Assume there are n people, and each person's birthday can be any day of the year, with all birthdays equally likely. To find the probability of at least two people sharing a birthday, we can calculate the probability of the complementary event, i.e., the probability that all people have different birthdays.

For the first person, they can choose any day as their birthday, with a probability of 1. For the second person, they must choose a day different from the first person, with a probability of 364/365 (since any day except the first person's birthday is acceptable out of 365 days). For the third person, they must choose a day different from the first two people, with a probability of 363/365, and so on.

If we continue this process up to the nth person, the probability that all people have different birthdays is:

If we want to calculate the probability of at least two people sharing the same birthday, we just need to compute the probability of the complementary event, which is:

This probability starts to increase rapidly for smaller values of n, exhibiting the effect of the Birthday Paradox. For instance, when n is greater than or equal to 23, the probability has already exceeded 50%.

The Birthday Paradox has significant applications in fields such as cryptography, probability statistics, and computer science, particularly in analyzing the probability of hash collisions.

This leads to the concept of a birthday attack.

A birthday attack is a cryptographic attack that exploits the Birthday Paradox, aiming to find two inputs with the same hash value, i.e., a collision. The Birthday Paradox states that in a set containing n elements, when elements are randomly chosen, it only takes about sqrt{n} chosen elements to have a high probability of at least two elements being the same. In the context of hash functions, this implies that with a relatively small number of inputs, it is possible to find two inputs with the same hash value, especially when the output space of the hash function is small.

The basic steps of a birthday attack are as follows:

  1. Select random inputs: The attacker chooses a large number of random inputs and calculates their hash values.

  2. Search for collisions: Using the principle of the Birthday Paradox, the attacker anticipates finding two inputs with the same hash value in a relatively short time.

  3. Construct the attack: Once a collision is found, the attacker can use either of the two inputs interchangeably without affecting the hash value. This can lead to security issues, especially in some cryptographic applications such as digital signatures.

In cryptography, the strength of hash functions is crucial to resist birthday attacks. Typically, the larger the output space (bit length) of a hash function, the more challenging it is to execute a birthday attack. For example, for a hash function with an output length of n bits, the expected complexity of a birthday attack is O(2^{n/2}).

The significance of birthday attacks lies in their being a universal attack method, not limited to hash functions but applicable to other cryptographic algorithms such as symmetric encryption and public-key encryption. Therefore, when designing cryptographic protocols and algorithms, precautions against birthday attacks, such as choosing sufficiently long keys and using stronger hash functions, need to be considered.

import hashlib
import random
import string

def generate_random_string(length=10):
    # Generate a random string of the specified length
    return ''.join(random.choice(string.ascii_letters) for _ in range(length))

def weak_hash_function(input_string):
    # This is a very weak hash function, taking only the first two characters of the input string as the hash value
    return input_string[:2]

def birthday_attack_weak_hash(num_trials=100):
    # Use a hash table to store hash values and their corresponding inputs
    hash_table = {}

    for _ in range(num_trials):
        # Generate a random string
        random_string = generate_random_string()

        # Calculate the hash value of the string
        hash_value = weak_hash_function(random_string)

        # If the hash value already exists, a collision is found
        if hash_value in hash_table:
            print(f"Collision found!\nString 1: {random_string}\nString 2: {hash_table[hash_value]}")
            return

        # Otherwise, add the hash value and input to the hash table
        hash_table[hash_value] = random_string

    print("No collision found after {} trials.".format(num_trials))

# Example usage
birthday_attack_weak_hash()

The provided Python code demonstrates a basic implementation of a birthday attack on a weak hash function. Let's break down the analysis:

  1. generate_random_string function:

    • This function generates a random string of a specified length using uppercase and lowercase letters from the ASCII alphabet.

  2. weak_hash_function function:

    • This function represents a very weak hash function. It takes an input string and returns only the first two characters as the hash value.

  3. birthday_attack_weak_hash function:

    • This function performs a birthday attack on the weak hash function by generating random strings, computing their hash values using the weak hash function, and storing the hash values along with their corresponding inputs in a hash table.

    • It repeats this process for a specified number of trials (num_trials).

    • If a collision (two different inputs producing the same hash value) is found during the trials, the function prints the collision details and exits.

    • If no collision is found after the specified number of trials, it prints a message indicating that no collision was found.

  4. Example Usage:

    • The birthday_attack_weak_hash function is called with its default num_trials value, leading to 100 trials.

    • The function output indicates whether a collision was found or not.

  • The code illustrates a simplified scenario of a birthday attack on a weak hash function.

  • The weakness of the hash function is evident as it uses only the first two characters of the input, making collisions more likely.

  • The attack relies on the probability of finding two different inputs producing the same hash value due to the limited hash space.

  • In practice, cryptographic hash functions are designed to resist birthday attacks by having a large output space.

It's important to note that this code is educational and not suitable for any real cryptographic purposes. In real-world applications, strong and well-established hash functions should be used to ensure security.

Reference

1. https://en.wikipedia.org/wiki/MD5

2. https://www.techtarget.com/searchsecurity/definition/MD5

3. https://www.simplilearn.com/tutorials/cyber-security-tutorial/md5-algorithm

4. https://atlasvpn.com/blog/birthday-attack

5. https://craigwright.net/blog/bitcoin-blockchain-tech/the-puzzle-of-the-double-hash/

Last updated