15,491 questions
0 votes, 0 answers, 6 views
Surprising result in mixed integer/double arithmetic in Swift
I am working with Swift 5 in Xcode 16.4. I have the following bit of code:
let x = 20.0
let a = 1/6
let b = 1/6*x
let c = (1/6)*x
let d = Double(a)*x
and when I stop at the next ...
-2 votes, 0 answers, 44 views
Why is 0.1*0.1 not equal to 1e-2 in Python? [duplicate]
I'm using Python 3.12.10 on Spyder, packaged through conda-forge on a Windows x64 machine.
I'm seeing the following output:
0.1 - 1e-1
Out[1]: 0.0
and
0.1*0.1 - 1e-2
Out[3]: 1.734723475976807e-18
...
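A small sketch of the effect in the excerpt above, assuming standard IEEE 754 doubles: `0.1` and `1e-1` parse to the same double, while the product `0.1*0.1` is rounded again after the multiplication and ends up on a neighbouring double of the one nearest to `0.01`. The use of `decimal.Decimal` and `math.isclose` is an illustrative choice, not something taken from the question.

```python
import math
from decimal import Decimal

same_literal = 0.1 - 1e-1        # both literals parse to the same double
product_gap = 0.1 * 0.1 - 1e-2   # the product is rounded after multiplying

# Decimal(float) shows the exact binary value each double stores.
print(Decimal(0.1))
print(Decimal(0.1 * 0.1))
print(Decimal(1e-2))

# Compare with a tolerance instead of relying on ==.
close_enough = math.isclose(0.1 * 0.1, 1e-2)
print(same_literal, product_gap, close_enough)
```

Comparing with a relative tolerance is usually the safer test when the operands come out of arithmetic rather than straight from literals.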
1 vote, 0 answers, 119 views
How are you supposed to normalize a fixed point signed number? [closed]
Let's say you have a signed 8-bit integer, and you want to turn it into a normalised floating point number. Do you:
Divide by 127 and clamp any values < -1.0 to -1.0
Divide by 128 and accept that ...
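The two options can be compared side by side with a minimal sketch. The function names `normalize_div127` and `normalize_div128` are hypothetical, and the input is assumed to be in the range -128..127.

```python
def normalize_div127(value: int) -> float:
    """Option 1: divide by 127 and clamp anything below -1.0 to -1.0."""
    return max(value / 127.0, -1.0)


def normalize_div128(value: int) -> float:
    """Option 2: divide by 128; the result stays in [-1.0, 1.0) without clamping."""
    return value / 128.0


for sample in (-128, -127, 0, 127):
    print(sample, normalize_div127(sample), normalize_div128(sample))
```

With the first option both -128 and -127 map to -1.0; with the second, 127 maps to 0.9921875 rather than 1.0.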
2 votes, 1 answer, 216 views
Why does Newton’s method overshoot on the first deceleration step in my motion profile generator?
I’m porting a Python motion profile generator to C to run on my STM32H743. The generator produces step timings for a simple acceleration → cruise → deceleration motion profile. See the ...
0 votes, 1 answer, 273 views
Inaccuracy replicating Fortran mixed-precision expression in Rust
I have the following code in my Fortran program, where both a and b are declared as REAL (KIND=8):
a = 0.12497443596150659d0
b = 1.0 + 0.00737 * a
This yields b as 1.0009210615647672
For comparison, ...
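A hedged sketch of one way to mimic the mixed-precision expression, assuming the compiler treats the unsuffixed literal `0.00737` as default-kind (single-precision) REAL and then promotes it to double for the multiplication. The helper `to_float32` is hypothetical; it rounds through a 4-byte float with the standard `struct` module.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


a = 0.12497443596150659
coeff_single = to_float32(0.00737)  # assumption: literal stored in single precision
b_mixed = 1.0 + coeff_single * a    # single-precision constant, double arithmetic
b_double = 1.0 + 0.00737 * a        # constant kept at full double precision

print(repr(b_mixed))
print(repr(b_double))
```

If the two printed values differ in their trailing digits, the difference comes from the constant, not from the addition or the multiplication.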
4 votes, 1 answer, 146 views
Weird behavior in large complex128 NumPy arrays, imaginary part only [closed]
I'm working on numerical simulations. I ran into an issue with large NumPy arrays (~ 26 GB) on Linux with 128 GB of RAM. The arrays are of type complex128.
Arrays are instantiated without errors (if ...
0 votes, 2 answers, 238 views
Turn a Python float argument into a NumPy array, keep an array argument the same
I have a simple function that is math-like:
def y(x):
    return x**2
I know the operation x**2 will return a NumPy array if supplied a NumPy array and a float if supplied a float.
For more complicated ...
25 votes, 12 answers, 3k views
How can I parse a string to a float in C in a way that isn't affected by the current locale?
I'm writing a program where I need to parse some configuration files in addition to user input from a graphical user interface. In particular, I'm having issues with parsing strings taken from the ...
2 votes, 0 answers, 172 views
Why does floating point division take less than 50% of the latency of integer division and also 10x more latency than usual when underflow occurs?
I am measuring the latency of instructions.
For 64-bit primitives, integer division usually takes about 25 cycles on my 2.3 GHz DigitalOcean vCPU, while floating point division takes about 10 ...
4 votes, 2 answers, 297 views
Why does adding a value to Float.MAX_VALUE not reach infinity?
According to the standard, overflow in Java is handled using a special value called infinity, but here the sum is 3.4028235E38. Why is this the case?
public class FloatingPointTest {
public static ...
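The same rounding behaviour can be sketched in Python, with the caveat that Python floats are 64-bit doubles rather than Java's 32-bit `float`, and that the value added in the truncated Java code is not known here. Adding something far smaller than the spacing between adjacent values at the top of the range rounds back to the maximum; only a large enough increment overflows. `math.ulp` requires Python 3.9 or later.

```python
import math
import sys

fmax = sys.float_info.max          # largest finite double

small_step = fmax + 1.0            # 1.0 is far below half the spacing at fmax
big_step = fmax + math.ulp(fmax)   # one full spacing pushes past the finite range

print(small_step == fmax)          # the sum rounds back to fmax
print(math.isinf(big_step))        # the sum overflows to infinity
```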
6 votes, 1 answer, 244 views
Speeding up integer division with doubles
I have a fixed-point math-heavy project and I was looking to speed up integer divisions. I tested double division with SSE4 and AVX2 and got nearly 2x speedup versus scalar integer division. I wonder ...
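A rough correctness sketch for the idea, assuming both operands are unsigned and fit in 32 bits; wider operands are not covered by this argument. A double carries 53 significant bits, so the rounding error of the quotient is much smaller than the gap of at least 1/b between a non-integer quotient and the nearest integer, and truncating the double gives the same result as integer division. The function name `div_via_double` is hypothetical, and this says nothing about the speed of SSE4 or AVX2 code.

```python
import random


def div_via_double(a: int, b: int) -> int:
    """Divide two unsigned 32-bit integers by going through a double."""
    return int(a / b)  # true division yields a double; int() truncates


random.seed(0)
for _ in range(10_000):
    a = random.randrange(0, 2**32)
    b = random.randrange(1, 2**32)
    assert div_via_double(a, b) == a // b

print(div_via_double(2**32 - 1, 3))
```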
0 votes, 1 answer, 103 views
GCC offers a _Float16 type, but - what about the functions to work with it?
GCC offers a 16-bit floating point type, outside of the C language standard: _Float16 - at least for x86_64. This allowance is described here.
However - the GCC documentation does not seem to indicate ...
5 votes, 1 answer, 137 views
Is it expected that vmapping over different input sizes for the same function impacts the accuracy of the result?
I was surprised to see that depending on the size of an input matrix, which is vmapped over inside of a function, the output of the function changes slightly. That is, not only does the size of the ...
3 votes, 2 answers, 183 views
How does Oracle convert decimal values to float?
If I have a float(5) column, why does 7.89 get rounded to 7.9 but 12.79 gets rounded to 13, not 12.8?
Binary forms are as follows for 3 examples:
7.89 0111.01011001 ------ round to ------> 7.9 ...
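One possible reading, sketched under stated assumptions: Oracle's documentation gives the `FLOAT(p)` precision in binary digits and a factor of 0.30103 for converting it to decimal digits. Assuming the result is rounded up, `float(5)` keeps 2 significant decimal digits, which would match both roundings in the excerpt. The helpers `decimal_digits` and `round_like_float` are hypothetical, and the half-up rounding mode is an assumption.

```python
import math
from decimal import Context, Decimal, ROUND_HALF_UP


def decimal_digits(binary_precision: int) -> int:
    """Convert binary precision to decimal digits (assumption: rounded up)."""
    return math.ceil(binary_precision * 0.30103)


def round_like_float(text: str, binary_precision: int) -> Decimal:
    """Round a decimal string to that many significant decimal digits."""
    ctx = Context(prec=decimal_digits(binary_precision), rounding=ROUND_HALF_UP)
    return ctx.create_decimal(text)


print(decimal_digits(5))
print(round_like_float("7.89", 5))
print(round_like_float("12.79", 5))
```

Under this reading the rounding happens on significant decimal digits, not on binary places after the point, so 12.79 loses its whole fractional part.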
0 votes, 1 answer, 216 views
How can a long double be that big in C++? [duplicate]
The sizeof(long double) is 8, which means that if I use all the bits for the integer part of an unsigned number, the largest value I can store is 2^64-1=18446744073709551615.
However, std::numeric_limits<long ...
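A short illustration in Python, assuming the 8-byte `long double` in question has the same layout as an IEEE 754 double (Python's `float`). The bits are split into sign, exponent, and fraction rather than forming one integer, so the range far exceeds 2^64-1 at the cost of not representing every integer exactly.

```python
import struct
import sys

bytes_per_double = struct.calcsize("d")   # storage size of a C double
largest_uint64 = 2**64 - 1
largest_double = sys.float_info.max

print(bytes_per_double, largest_uint64, largest_double)

# Above 2**53 neighbouring integers start to share one double.
collides = float(2**53) == float(2**53 + 1)
print(collides)
```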