
I'm working on implementing a mathematical approach to bit flipping in IEEE 754 FP16 floating-point numbers without using direct bit manipulation. The goal is to flip a specific bit (particularly in the exponent field) while keeping all other bits unchanged.

import struct
import numpy as np

def flip_exponent_bit_mathematically(value, bit_position):
    """
    Flip a specific bit in the exponent of a floating point number
    
    Args:
        value: The input floating point value (FP16)
        bit_position: Position in exponent field (0-4, where 0 is LSB)
    
    Returns:
        Value with the specified exponent bit flipped
    """
    # Extract the binary representation to check if the bit is set
    bits = struct.unpack('H', np.array(np.float16(value), dtype=np.float16).tobytes())[0]
    
    # Calculate actual bit position in the full representation
    actual_bit_position = 10 + bit_position  # Exponent bits start at position 10
    
    # Check if the bit is currently set
    is_bit_set = (bits & (1 << actual_bit_position)) != 0
    
    # Calculate multiplication factor
    bit_value = 2 ** bit_position  # Value of this bit in the exponent
    
    if is_bit_set:
        # If bit is set, turning it off: divide by 2^(2^bit_position)
        factor = 2 ** (-bit_value)
    else:
        # If bit is not set, turning it on: multiply by 2^(2^bit_position)
        factor = 2 ** bit_value
    
    # Apply the multiplication factor in float64 so that only the final
    # result is rounded to FP16
    return float(np.float16(float(value) * factor))

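A short derivation of the factor used in the code, assuming the standard binary16 encoding. A normal number with sign bit $s$, biased exponent field $E$ ($1 \le E \le 30$) and mantissa field $M$ ($0 \le M \le 1023$) is

$$x = (-1)^s \, 2^{E-15} \left(1 + \frac{M}{1024}\right).$$

Flipping exponent bit $k$ changes $E$ by $\pm 2^k$, which is where the factor $2^{\pm 2^k}$ comes from. The relation only holds while both the old and the new field are in the normal range:

$$E' = E \pm 2^k,\quad 1 \le E,\, E' \le 30 \;\Longrightarrow\; x' = x \cdot 2^{\pm 2^k}.$$

If $E = 0$ or $E' = 0$, the encoding switches to the subnormal form $x = (-1)^s\,2^{-14}\,M/1024$, and $E' = 31$ encodes ±∞ or NaN, so multiplying by a power of two cannot reproduce the flipped bit pattern in those cases.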
However, I'm encountering issues with edge cases:

  • Flipping a bit causes a transition between normal, subnormal and zero states.
  • When the exponent becomes all 1s (31), the mantissa bits are lost.
  • Zero values need special handling.

Original: 0 11110 1111111111 (65504.0, max FP16)
Expected after flipping bit 10 (exponent bit 0): 0 11111 1111111111
Actual result: 0 11111 0000000000 (mantissa bits lost)

Original: 0 10111 1111111000 (510.0)
Expected after flipping bit 13 (exponent bit 3): 0 11111 1111111000
Actual result: 0 11111 0000000000 (mantissa bits lost)
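Both failures can be reproduced by doing exactly what the function does: multiply by $2^{2^k}$ in float64 and round once to FP16. `fp16_bits` is a hypothetical display helper (it inspects the result through a bit view, which is fine for checking). A subnormal input is included to show the normal/subnormal problem too:

```python
import numpy as np

def fp16_bits(x):
    """Round a Python float to IEEE 754 binary16 and show its bits as 'sign exponent mantissa'."""
    with np.errstate(over="ignore"):  # the overflow to inf is what we want to observe
        h = np.array([x], dtype=np.float64).astype(np.float16)
    b = format(int(h.view(np.uint16)[0]), "016b")
    return f"{b[0]} {b[1:6]} {b[6:]}"

print(fp16_bits(65504.0), "->", fp16_bits(65504.0 * 2.0))    # exponent bit 0: factor 2**1
print(fp16_bits(510.0), "->", fp16_bits(510.0 * 2.0**8))     # exponent bit 3: factor 2**8
# Subnormal: the correct flip of exponent bit 0 is 0 00001 0000000001,
# but multiplying by 2 just shifts the mantissa of the subnormal.
print(fp16_bits(2.0**-24), "->", fp16_bits(2.0**-24 * 2.0))
# 0 11110 1111111111 -> 0 11111 0000000000
# 0 10111 1111111000 -> 0 11111 0000000000
# 0 00000 0000000001 -> 0 00000 0000000010
```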

I'm specifically looking for a purely mathematical solution (using operations like multiply, divide, add, etc.) rather than direct bit manipulation, as I will implement this in an ONNX graph.
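A minimal sketch of one arithmetic-only route, assuming the input is a finite value that is already exactly representable in FP16 (e.g. `float(np.float16(x))`): instead of scaling the value, decode the exponent and mantissa fields with floor/log2/division, flip the bit with floor and mod, and re-encode. `flip_exponent_bit_arith` is a hypothetical name, and the ONNX graph has not been built here; each step is an elementwise operation or comparison, so it should map roughly onto operators such as Floor, Mod, Pow and Where (log2 would be Log divided by ln 2). `math.copysign` is a Python convenience for the sign, including -0.0.

```python
import math

def flip_exponent_bit_arith(value, k):
    """Hypothetical helper: flip exponent bit k (0 = LSB) of a finite FP16 value
    using only comparisons and arithmetic, never a bit view."""
    sign = math.copysign(1.0, value)
    a = abs(float(value))

    # Decode the biased exponent field E (0..30) and the mantissa field M (0..1023).
    if a < 2.0**-14:                  # zero or subnormal: a = M * 2**-24, E = 0
        e_field = 0
        m_field = a / 2.0**-24
    else:                             # normal: a = 2**(E - 15) * (1 + M / 1024)
        e = math.floor(math.log2(a))
        if 2.0**e > a:                # guard against log2 rounding near powers of two
            e -= 1
        elif 2.0**(e + 1) <= a:
            e += 1
        e_field = e + 15
        m_field = (a / 2.0**e - 1.0) * 1024.0

    # Read bit k of E arithmetically, then add or subtract 2**k.
    bit = math.floor(e_field / 2**k) % 2
    new_e = e_field + 2**k * (1 - 2 * bit)

    # Re-encode with the mantissa field unchanged.
    if new_e == 0:                    # subnormal (or zero) result
        result = m_field * 2.0**-24
    elif new_e == 31:                 # only inf or an unspecified NaN is reachable
        result = math.inf if m_field == 0 else math.nan
    else:
        result = 2.0**(new_e - 15) * (1.0 + m_field / 1024.0)
    return math.copysign(result, sign)
```

For example, `flip_exponent_bit_arith(2.0**-24, 0)` returns `2**-14 + 2**-24` (bits 0 00001 0000000001) and `flip_exponent_bit_arith(0.0, 4)` returns `2.0`, two cases that the multiplicative version gets wrong. When the new exponent field is 31 with a nonzero mantissa, the sketch can only return a generic NaN, not the specific bit pattern from the expected results.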

  • This may just be a bad idea. Commented Mar 18 at 7:41
  • Hi! Can you elaborate more? Commented Mar 18 at 7:43
  • Because with any processor with 16-byte short vectors, you can change a bit in each of 8 16-bit floating-point numbers with a single integer instruction. Commented Mar 18 at 20:47
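For comparison, the integer route described in the last comment can be sketched in NumPy (not ONNX): reinterpret the FP16 buffer as uint16 with `view`, XOR the mask `1 << (10 + k)`, and reinterpret back. `flip_exponent_bit_xor` is a hypothetical name. Because only one bit changes, the mantissa survives, and both examples from the question come out as NaNs with exactly the expected patterns.

```python
import numpy as np

def flip_exponent_bit_xor(values, k):
    """Flip exponent bit k (0 = LSB) of every element of an FP16 array via XOR."""
    halves = np.asarray(values, dtype=np.float16)
    bits = halves.view(np.uint16)                  # reinterpret the bytes, no conversion
    flipped = bits ^ np.uint16(1 << (10 + k))      # exponent field starts at bit 10
    return flipped.view(np.float16)

for value, k in [(65504.0, 0), (510.0, 3)]:
    u = int(flip_exponent_bit_xor([value], k).view(np.uint16)[0])
    b = format(u, "016b")
    print(value, "->", b[0], b[1:6], b[6:])
# 65504.0 -> 0 11111 1111111111
# 510.0 -> 0 11111 1111111000
```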

1 Answer


As you said, 0 11110 1111111111 is the largest (finite) number that can be represented in this format, so when you multiply it by 2, it overflows, and you get +∞, whose bit pattern is 0 11111 0000000000.
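Spelled out with the default round-to-nearest-even mode: the spacing of binary16 values at the top exponent is $2^{15-10} = 2^5$, so any result at or above the midpoint between $x_{\max}$ and $2^{16}$ rounds to $+\infty$:

$$x_{\max} = \left(2 - 2^{-10}\right) 2^{15} = 65504, \qquad 2\,x_{\max} = 131008 \;\ge\; x_{\max} + \tfrac{1}{2}\cdot 2^{5} = 65520 \;\Longrightarrow\; \operatorname{fl}(2\,x_{\max}) = +\infty.$$

The question's second example overflows the same way, since $510 \cdot 2^{8} = 130560 \ge 65520$.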

The 2,046 values other than ±∞ with 11111 in the exponent field are NaNs. If you really want 0 11110 1111111111 to turn into a particular one of them, then you can't do it with multiplication. You would have to use a syntax or function in your target language that produces the correct NaN value. If no such thing exists, then it's impossible. It's not clear to me why you would want to do this.
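The count can be confirmed by decoding all 65,536 bit patterns with Python's `struct` half-precision format `'e'` (a quick check, independent of ONNX): an exponent field of 11111 gives 2 × 1024 = 2,048 patterns, two of which are ±∞.

```python
import math
import struct

def decode_fp16(u):
    """Interpret a 16-bit integer pattern as an IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<H", u))[0]

values = [decode_fp16(u) for u in range(1 << 16)]
nan_count = sum(math.isnan(v) for v in values)
inf_count = sum(math.isinf(v) for v in values)
print(nan_count, inf_count)  # 2046 2
```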
