How to design a 32-Bit Floating point multiplier

Question

My plan is to design a 32-bit floating-point multiplier from scratch. I am still an undergrad with only a beginner's understanding of computer architecture, Verilog/VHDL and digital circuits/design.

Here's what I have understood so far,

I started by understanding how floating-point numbers are represented in the IEEE 754 standard, then I looked at how floating-point numbers are multiplied.

Let's keep it simple and ignore the special cases that need separate handling: subnormals, infinities, zeroes and NaNs (both signalling and quiet).

Clearly, when multiplying two 32-bit floating-point numbers, I'll have two signs, two exponents and two significands (mantissas) to worry about.

So, I'm clear about the big picture here: I'll require modules for normalization, overflow checks, sign checks, truncation checks, a lot more, and mainly the multiplication itself.

My questions are:

How do I implement that "multiplication itself" module? From what I understood of floating-point multiplication, it's effectively just adders placed in parallel to add bitwise XORs, so how do I implement this in Verilog?
The actual objective is to achieve a low-power design, so my plan is to optimize the adders by replacing them with alternative adder designs, in the hope that I end up reducing power.

Am I roughly on the right track? I can't seem to find anywhere that teaches how to make a floating-point multiplier; it would be great if someone could recommend some good resources to learn the entire thing and make one myself.

EDIT: Here's the Verilog I have managed so far:

module sign_unit(input wire sign_a, input wire sign_b, output wire sign_result);
    assign sign_result = sign_a ^ sign_b;
endmodule

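module exponent_adder(input wire [7:0] exp_a, input wire [7:0] exp_b, output wire [7:0] exp_result, output wire overflow, output wire underflow);
    // 10-bit signed sum: an unsigned 9-bit sum wraps around when
    // exp_a + exp_b < 127, which would report overflow instead of underflow
    wire signed [9:0] exp_sum;
    assign exp_sum = $signed({2'b00, exp_a}) + $signed({2'b00, exp_b}) - 10'sd127;
    assign exp_result = exp_sum[7:0];
    assign overflow = (exp_sum > 10'sd254);
    assign underflow = (exp_sum < 10'sd1);
endmodule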

module mantissa_multiplier(input wire [23:0] mant_a, input wire [23:0] mant_b, output wire [47:0] product);
    assign product = mant_a * mant_b;
endmodule

module normalizer(input wire [47:0] in_product, output reg [22:0] norm_mantissa, output reg [7:0] exp_adjust);
    integer i;
    reg found;

    always @(*) begin
        exp_adjust = 0;
        found = 0;
        norm_mantissa = in_product[46:24];

        if (in_product[47] == 1'b1) begin
            norm_mantissa = in_product[46:24];
            exp_adjust = 1;
        end else begin
            for (i = 46; i >= 24; i = i - 1) begin
                if (!found && in_product[i] == 1'b1) begin
                    norm_mantissa = in_product[i-1 -: 23];
                    exp_adjust = -(46 - i);
                    found = 1;
                end
            end
        end
    end
endmodule

module rounder(input wire [22:0] in_mantissa, input wire guard, input wire round, input wire sticky, output wire [22:0] out_mantissa, output wire carry);
    wire round_up;
    assign round_up = guard & (round | sticky | in_mantissa[0]);
    assign {carry, out_mantissa} = in_mantissa + round_up;
endmodule

module fp32_multiplier(input wire [31:0] a,input wire [31:0] b,output wire [31:0] result);
    wire sign_a = a[31];
    wire sign_b = b[31];
    wire [7:0] exp_a = a[30:23];
    wire [7:0] exp_b = b[30:23];
    wire [23:0] mant_a = {1'b1, a[22:0]}; 
    wire [23:0] mant_b = {1'b1, b[22:0]};

    wire sign_res;
    wire [7:0] exp_res_pre;
    wire overflow, underflow;
    wire [47:0] raw_product;
    wire [22:0] norm_mant;
    wire [7:0] exp_adjust;
    wire [22:0] rounded_mant;
    wire carry;

    sign_unit su(sign_a, sign_b, sign_res);
    exponent_adder ea(exp_a, exp_b, exp_res_pre, overflow, underflow);
    mantissa_multiplier mm(mant_a, mant_b, raw_product);
    normalizer norm(raw_product, norm_mant, exp_adjust);
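    // Guard/round/sticky sit one bit lower when the product is in [1, 2) (bit 47 clear)
    wire guard_bit  = raw_product[47] ? raw_product[23] : raw_product[22];
    wire round_bit  = raw_product[47] ? raw_product[22] : raw_product[21];
    wire sticky_bit = raw_product[47] ? (|raw_product[21:0]) : (|raw_product[20:0]);
    rounder rnd(norm_mant, guard_bit, round_bit, sticky_bit, rounded_mant, carry);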

    wire [7:0] final_exp = exp_res_pre + exp_adjust + carry;
    assign result = {sign_res, final_exp, rounded_mant};
endmodule

I think making a floating-point multiplier from scratch is a great learning experience! You can do it in a straightforward way first, then read the plentiful literature on multiplier design and broaden your horizon as far as you want! Great!
– Marcus Müller, Commented Jun 10 at 18:55
But to pull one tooth beforehand: you're not going to build a more power-efficient multiplier by using more efficient adders (because if it was that easy…); there are highly optimized multipliers for most use cases, quite likely from the very platform you're targeting (it makes a difference whether you're targeting a modern FPGA, an old FPGA, an ASIC production line (and which one), a small logic device, or maybe even a discrete/7400 design with your HDL). But as said, perfect learning opportunity! Not everything you start must improve on 100 years of computer science.
– Marcus Müller, Commented Jun 10 at 19:00
I wouldn't want to pour any cold water on your project -- honestly, it's a great thing to do -- but be warned: full IEEE floating point is a lot of work. You might consider reducing it somewhat (do you need subnormal numbers? the hidden-bit convention?) or even doing a fixed-point multiplier. In any case, perhaps this paper is something to start with. Whatever you decide, consider writing a "hardware-like" C or Python proof-of-concept so you know what the correct answers are supposed to be.
– jonathanjo, Commented Jun 11 at 8:44
@jonathanjo thanks for the second reference, did not expect this would be such a heavy task!
– whatamidoing, Commented Jun 11 at 9:19

Justme · Accepted Answer · 2025-06-10 17:50:03Z

The same way you design any multiplier, in whichever way you want to reduce power.

If you exclude the special values, like zero, each float carries a 23-bit fraction to multiply. The significand is actually 24 bits wide, with an implicit 1 bit at the front, so every 24-bit significand lies in the range 1.0 to 1.9999... When two of them are multiplied together, you get one 48-bit result between 1.0 and 3.9999..., so 2 integer bits at the front and 46 fractional bits.
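As a rough illustration of the paragraph above, here is a small Python sketch (a software model, not hardware) that unpacks two floats, restores the implicit 1 and multiplies the 24-bit significands. The helper names `unpack_fp32` and `significand` are made up for this example, and only normal numbers are assumed.

```python
import struct

def unpack_fp32(x: float):
    """Round x to binary32 and split it into (sign, biased_exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def significand(fraction: int) -> int:
    """Restore the implicit leading 1 (normal numbers only): 24-bit integer."""
    return (1 << 23) | fraction

_, _, frac_a = unpack_fp32(1.5)
_, _, frac_b = unpack_fp32(1.75)

# 24 x 24 -> 48-bit product with 2 integer bits and 46 fractional bits
product = significand(frac_a) * significand(frac_b)

print(product / 2**46)   # 2.625, i.e. a value in [1.0, 4.0)
print(product >> 47)     # 1, because the product is 2.0 or larger
```

Bit 47 of the product is the only thing that decides whether a normalization shift is needed, as long as both inputs are normal.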

You then add the exponents (subtracting the bias of 127 once, since both stored exponents are biased). If the significand product is 2 or larger, shift it right by one bit and increment the exponent. Then take the one implicit bit, which is 1, and the 23 actual result bits below it, truncating or rounding away the rest (the IEEE 754 default is round to nearest, ties to even).
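The normalize-and-round step can be sketched in Python as follows. This is a minimal model under stated assumptions: both inputs were normal, the exponent passed in is already `exp_a + exp_b - 127`, and rounding is round to nearest, ties to even. The function name `normalize_and_round` is hypothetical.

```python
def normalize_and_round(product: int, exp_sum: int):
    """product: 48-bit significand product (bit 47 or bit 46 is set).
    exp_sum: exp_a + exp_b - 127, still in biased form.
    Returns (biased_exponent, 23-bit fraction); no overflow/underflow handling."""
    if product >> 47:              # product in [2, 4): shift one extra bit
        exp_sum += 1
        shift = 24
    else:                          # product in [1, 2)
        shift = 23
    kept = product >> shift                     # hidden 1 plus 23 fraction bits
    guard = (product >> (shift - 1)) & 1        # first discarded bit
    rest = product & ((1 << (shift - 1)) - 1)   # round and sticky bits combined
    if guard and (rest or (kept & 1)):          # round to nearest, ties to even
        kept += 1
        if kept >> 24:             # rounding carried out: 1.11...1 became 10.00...0
            kept >>= 1
            exp_sum += 1
    return exp_sum, kept & 0x7FFFFF

# 1.5 * 1.75 = 2.625 = 1.3125 * 2^1, so exponent 128 and fraction 0.3125 * 2^23
print(normalize_and_round(0xC00000 * 0xE00000, 127))
```

Note that the guard bit moves with the shift amount: it is bit 23 when the product is 2 or larger, and bit 22 otherwise.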

Finally, XOR the two sign bits together to get the resulting sign bit: the result is negative only if exactly one input is negative, not when both are positive or both are negative.

Then you have a 32-bit float result.
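Putting the steps together, a bit-level reference model along these lines might look like the sketch below. It is only an assumption-laden illustration: inputs and result must be normal numbers (no zeroes, subnormals, infinities or NaNs), and the names `f2b`, `b2f` and `fp32_mul_normal` are invented here. Such a model is handy for generating expected values for an HDL testbench.

```python
import struct

def f2b(x: float) -> int:
    """Float -> binary32 bit pattern (rounds to nearest even)."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def b2f(bits: int) -> float:
    """Binary32 bit pattern -> float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def fp32_mul_normal(a_bits: int, b_bits: int) -> int:
    """Multiply two normal binary32 values given as 32-bit patterns."""
    sign = (a_bits >> 31) ^ (b_bits >> 31)                  # XOR of sign bits
    exp = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127
    prod = ((1 << 23) | (a_bits & 0x7FFFFF)) * ((1 << 23) | (b_bits & 0x7FFFFF))
    if prod >> 47:
        exp += 1            # product in [2, 4): exponent goes up by one
    else:
        prod <<= 1          # product in [1, 2): align the leading 1 to bit 47
    kept = prod >> 24                       # hidden bit + 23 fraction bits
    guard = (prod >> 23) & 1
    sticky = (prod & 0x7FFFFF) != 0
    if guard and (sticky or (kept & 1)):    # round to nearest, ties to even
        kept += 1
        if kept >> 24:                      # mantissa overflowed on rounding
            kept >>= 1
            exp += 1
    if not 1 <= exp <= 254:
        raise OverflowError("result is outside the normal range (not modelled)")
    return (sign << 31) | (exp << 23) | (kept & 0x7FFFFF)

print(b2f(fp32_mul_normal(f2b(1.5), f2b(-2.25))))   # -3.375
```

Comparing `fp32_mul_normal(f2b(a), f2b(b))` against `f2b(a * b)` for many random normal inputs is a cheap way to gain confidence before moving to Verilog, since the double-precision product of two binary32 values is exact.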

To save energy you might implement a serial multiplier, which basically does multiplication the way it is taught in school with pen and paper.
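A serial (shift-and-add) multiplier can be modelled in a few lines of Python. This is a sketch of the general technique, not of any particular hardware: each loop iteration stands for one clock cycle, and each partial product is the multiplicand AND-ed with one multiplier bit, then added into an accumulator. The name `serial_multiply` and the `width` parameter are assumptions for the example.

```python
def serial_multiply(multiplicand: int, multiplier: int, width: int = 24) -> int:
    """Pen-and-paper multiplication: one partial product per 'clock cycle'."""
    acc = 0
    for _ in range(width):
        if multiplier & 1:          # current multiplier bit selects the partial product
            acc += multiplicand     # a single adder is reused every cycle
        multiplicand <<= 1          # next partial product has twice the weight
        multiplier >>= 1            # move on to the next multiplier bit
    return acc

# Two 24-bit significands (1.5 and 1.75 with the hidden bit restored)
print(serial_multiply(0xC00000, 0xE00000) == 0xC00000 * 0xE00000)   # True
```

In hardware this trades 24 cycles of latency for a single adder and a pair of shift registers, whereas the `*` operator in the question's Verilog leaves the multiplier structure to the synthesis tool.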
