Last updated: May 27, 2026

Making Use of Intel 80-bit Floating-Point Numbers

Masahide Kashiwagi

1. Introduction

Intel CPUs have two kinds of floating-point units: the FPU (x87) and SSE2. The former is the original unit, while the latter was added later. The FPU performs its internal operations in extended double precision, a format 80 bits long with a 15-bit exponent and a 64-bit significand. Because this is more precise than the usual IEEE 754 double precision, it has, ironically, caused various precision-related problems. SSE2, on the other hand, fully follows IEEE 754.

Since every CPU that supports 64-bit instructions also has SSE2, it became common, around the transition to 64-bit operating systems, to avoid the FPU in practice and do everything with SSE2. As a result, the FPU became something that still exists but is no longer used. In the past, the FPU could also be used through the 80-bit long double type, but this is no longer possible in Microsoft Visual C++ (where long double = double). It is still available with gcc and clang.
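A typical example of these precision problems is double rounding: a result is first rounded to the FPU's 64-bit significand and then rounded again when it is stored as a 53-bit double. The sketch below assumes gcc or clang on x86-64, where double arithmetic uses SSE2 and long double uses the FPU; the function names are only for illustration. Both functions add the same two doubles, 1 and 2^-53 + 2^-105, whose exact sum lies just above the midpoint between 1 and 1 + 2^-52.

```cpp
#include <cmath>
#include <limits>

// True when long double is the x87 80-bit format (64-bit significand).
bool long_double_has_64bit_significand() {
    return std::numeric_limits<long double>::digits == 64;
}

// One rounding: the exact sum is rounded directly to double (SSE2 on x86-64).
double sum_in_double() {
    volatile double a = 1.0;
    volatile double b = std::ldexp(1.0, -53) + std::ldexp(1.0, -105);  // exact
    return a + b;
}

// Two roundings: with a 64-bit significand, the first rounding gives exactly
// 1 + 2^-53, a tie between two doubles; converting that to double then uses
// round-half-to-even and picks 1.0.
double sum_via_long_double() {
    volatile long double a = 1.0L;
    volatile long double b = std::ldexp(1.0, -53) + std::ldexp(1.0, -105);
    return static_cast<double>(a + b);
}
```

Under those assumptions, sum_in_double() returns the correctly rounded 1 + 2^-52, while sum_via_long_double() returns 1.0. On a target where long double is the same as double, or wider than 80 bits, the second function does not show this effect.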

In kv-0.4.54, I added several files for making use of this Intel 80-bit floating-point arithmetic unit.

2. Making Use of 80-bit Floating-Point Numbers (kv::fp80)

As mentioned above, Visual C++ no longer exposes this capability, but gcc and clang can still use 80-bit floating-point numbers. However, the name long double is extremely confusing: in Visual C++ it is the same as double, and on non-Intel architectures it may instead be a 128-bit floating-point type.
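Because long double means different things on different platforms, it can help to check what it actually is on a given compiler. A minimal sketch using only std::numeric_limits; the mapping from significand width to format covers common cases and is not exhaustive, and long_double_kind is a hypothetical helper, not part of kv.

```cpp
#include <limits>
#include <string>

// Describe what long double is on the current compiler and target,
// based on the number of significand bits it reports.
std::string long_double_kind() {
    switch (std::numeric_limits<long double>::digits) {
        case 53:  return "same as double (e.g. Visual C++)";
        case 64:  return "Intel 80-bit extended precision";
        case 106: return "double-double (IBM long double on POWER)";
        case 113: return "IEEE 754 binary128 (quadruple precision)";
        default:  return "other";
    }
}
```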

Until version 0.4.55, the _Float64x type was used for 80-bit floating-point numbers. However, starting with g++ 13, _Float64x became incompatible with libstdc++, making this type unusable. Therefore, the features related to 80-bit floating-point numbers were disabled for g++ 13 and later.

In version 0.4.60, this feature was restored by using long double. Since the best type to use may change again in the future, the policy is to define kv::fp80, which is currently an alias for long double, and write code using that type. To use 80-bit floating-point numbers, include:

#include <kv/fp80.h>
This header file checks whether 80-bit floating-point numbers are available. If they are available, it defines the macro KV_HAVE_FP80 and defines the type as follows:
namespace kv { using fp80 = long double; }
If you write programs using kv::fp80 instead of concrete types such as long double, __float80, or _Float64x, future changes in the surrounding situation should require only minimal modifications.
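The following sketch shows one way to write code against kv::fp80 while still compiling where 80-bit floating-point numbers are unavailable. Only kv/fp80.h, KV_HAVE_FP80, and kv::fp80 come from the description above; the fallback to double, the alias real, and the function basel_partial_sum are illustrative assumptions, and __has_include requires C++17.

```cpp
// Include kv's header only if it can be found (C++17 __has_include).
#if __has_include(<kv/fp80.h>)
#include <kv/fp80.h>
#endif

#ifdef KV_HAVE_FP80
using real = kv::fp80;  // 80-bit extended precision is available
#else
using real = double;    // hypothetical fallback when it is not
#endif

// Partial sum of 1/k^2 for k = 1..n, adding the small terms first,
// computed entirely in the chosen floating-point type.
real basel_partial_sum(int n) {
    real s = 0;
    for (int k = n; k >= 1; --k) {
        real kk = k;
        s += 1 / (kk * kk);
    }
    return s;
}
```

Because the rest of the program only refers to real, switching the underlying type later means changing the alias in one place.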

3. Interval Arithmetic with 80-bit Floating-Point Endpoints (rfp80.hpp)

Interval arithmetic with double can be performed with the type kv::interval<double> by including rdouble.hpp after interval.hpp. Similarly, interval arithmetic with kv::fp80 can be performed with the type kv::interval<kv::fp80> by including rfp80.hpp after interval.hpp.

test/test-ifp80.cc is a sample showing how to use it. It is almost the same as test-interval.cc and test-idd.cc; only the first include and the type name differ.
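What an interval library must achieve with the rounding mode can be illustrated by a tiny self-contained sketch of interval addition with long double endpoints: the lower bound is computed with downward rounding and the upper bound with upward rounding, so the result always encloses the exact sum. The LdInterval type and add function are hypothetical and are not taken from rfp80.hpp; the sketch assumes that std::fesetround also controls long double arithmetic, which holds for glibc on x86-64.

```cpp
#include <cfenv>

// Hypothetical interval type with long double endpoints (not kv's class).
struct LdInterval {
    long double lo, hi;
};

// Outward-rounded interval addition: [x.lo + y.lo rounded down,
// x.hi + y.hi rounded up] encloses every sum of a point in x and a point in y.
LdInterval add(LdInterval x, LdInterval y) {
    // volatile stops the compiler from constant-folding the additions or
    // moving them across the fesetround calls.
    volatile long double xl = x.lo, yl = y.lo, xh = x.hi, yh = y.hi;
    const int saved = std::fegetround();
    std::fesetround(FE_DOWNWARD);
    volatile long double lo = xl + yl;
    std::fesetround(FE_UPWARD);
    volatile long double hi = xh + yh;
    std::fesetround(saved);  // restore the caller's rounding mode
    return LdInterval{lo, hi};
}
```

For example, adding [1, 1] and [1e-30, 1e-30] gives a lower bound of exactly 1 and an upper bound slightly above 1.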

4. 128-bit Floating-Point Arithmetic by Combining Two kv::fp80 Values (ddx.hpp)

The so-called double-double type (dd) combines two 53-bit double values to obtain arithmetic with a significand of about 106 bits. In the same way, I created a floating-point type that combines two kv::fp80 values, giving the equivalent of a 15-bit exponent and a 128-bit significand (ddx = double double-extended).

test/test-ddx.cc is a usage example. Including kv/ddx.hpp makes the kv::ddx type available. It is almost as easy to use as dd. Mathematical functions are also available.
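The core of dd-style arithmetic is an error-free transformation such as Knuth's TwoSum, which returns both the rounded sum and its exact rounding error. The following is a generic sketch of that technique applied to pairs of long double values, assuming round-to-nearest mode and no overflow; LdPair, two_sum, and add are hypothetical names, not kv::ddx's actual code.

```cpp
// Hypothetical pair type: the represented value is hi + lo,
// with |lo| no larger than half an ulp of hi.
struct LdPair {
    long double hi, lo;
};

// Knuth's TwoSum: s = fl(a + b), and s + e == a + b exactly.
inline void two_sum(long double a, long double b, long double &s, long double &e) {
    s = a + b;
    long double bb = s - a;
    e = (a - (s - bb)) + (b - bb);
}

// Simple pair addition: exact sum of the high parts, then add the low parts
// to the error term and renormalize.
LdPair add(LdPair x, LdPair y) {
    long double s, e;
    two_sum(x.hi, y.hi, s, e);
    e += x.lo + y.lo;
    // Fast TwoSum renormalization; assumes |s| >= |e|.
    long double hi = s + e;
    long double lo = e - (hi - s);
    return LdPair{hi, lo};
}
```

The final Fast TwoSum step relies on |s| >= |e|, which can fail under heavy cancellation; more careful pair additions exist at some extra cost.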

5. Interval Arithmetic with ddx Endpoints (rddx.hpp)

Interval arithmetic with the ddx type can be performed with the type kv::interval<kv::ddx> by including rddx.hpp after interval.hpp.

test/test-iddx.cc is a sample showing how to use it.

6. Conclusion

Adding these types did not require any changes at all to kv/interval.hpp, the core file that implements interval arithmetic. I think this shows that the design around interval arithmetic in kv is quite good (if I may say so myself).

Changing double to kv::fp80, or dd to ddx, slows things down by only about 10-20% at most, so the performance is almost unchanged. This may be useful when you need just a little more precision but MPFR is too slow to be practical.

In principle, it would be possible to implement interval arithmetic in rfp80.hpp and rddx.hpp that does not change the rounding direction at all when -DKV_NOHWROUND is specified, but this has not yet been implemented.
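One way to obtain directed-rounding results without touching the rounding mode is to compute the round-to-nearest sum together with its exact error by TwoSum, then step one unit outward when the error shows that the exact value lies beyond the rounded one. The sketch below applies this to long double addition; add_up and add_down are hypothetical names, and the sketch ignores overflow and other special cases that a real implementation would have to handle.

```cpp
#include <cmath>
#include <limits>

// TwoSum under round-to-nearest: returns s = fl(a + b) and sets e
// so that s + e == a + b exactly.
inline long double two_sum(long double a, long double b, long double &e) {
    long double s = a + b;
    long double bb = s - a;
    e = (a - (s - bb)) + (b - bb);
    return s;
}

// Upper bound of a + b: a positive error means the exact sum lies above s,
// so move to the next representable number toward +infinity.
long double add_up(long double a, long double b) {
    long double e;
    long double s = two_sum(a, b, e);
    if (e > 0) return std::nextafter(s, std::numeric_limits<long double>::infinity());
    return s;
}

// Lower bound of a + b: a negative error means the exact sum lies below s.
long double add_down(long double a, long double b) {
    long double e;
    long double s = two_sum(a, b, e);
    if (e < 0) return std::nextafter(s, -std::numeric_limits<long double>::infinity());
    return s;
}
```

For instance, with gcc on x86-64, add_down(1.0L, 2^-200) returns 1 while add_up returns the next long double above 1, so the pair brackets the exact sum without any call to fesetround.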

Because _Float64x began to behave abnormally with g++ 13, version 0.4.56 disabled the features related to _Float64x for g++ 13 and later.

Starting with version 0.4.60, the features that had become unavailable with g++ 13 and later were restored by switching from _Float64x to long double, used through the alias kv::fp80 as described in Section 2. Along with this change, several file names were changed as follows:

Old name              New name
rfloat64x.hpp         rfp80.hpp
test-ifloat64x.cc     test-ifp80.cc
conv-float64x.hpp     conv-fp80.hpp

Since this feature was disabled for g++ 13, I assume that very few people were using it. If you were, please change the header files you include and rewrite your code to use kv::fp80 instead of _Float64x.