Re: Type theory vs floating-point arithmetic
On Saturday, August 23, 2003, at 01:57 AM, Tim Sweeney wrote:
There is a further problem regarding subtyping. It might first appear that the type of 32-bit floating point values is a subtype of the type of 64-bit floating point values. Though one can losslessly convert from 32-bit to 64-bit and back, performing a 32-bit arithmetic operation on such 32-bit numbers can produce a different result than performing a 64-bit arithmetic operation on their 64-bit extensions, and then converting back. So from a type theoretic point of view, the types of 32-bit and 64-bit floating point numbers must be disjoint.
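Tim's discrepancy is easy to reproduce. Here is a small sketch (my illustration, using NumPy's float32/float64 types, not from the original message) in which a short chain of additions on exactly representable inputs gives one answer in 32-bit arithmetic and another via 64-bit arithmetic followed by a final conversion back:

```python
import numpy as np

# Both inputs are exactly representable in float32 and float64.
a = np.float32(2.0 ** 24)   # 16777216
b = np.float32(1.0)

# 32-bit arithmetic: every intermediate sum is rounded to a 24-bit
# significand.  16777217 is not representable in float32, so a + b
# rounds (ties-to-even) back down to 16777216, and adding b again
# changes nothing.
r32 = (a + b) + b

# 64-bit arithmetic on the (lossless) extensions, with one final
# conversion back to 32 bits.  16777218 = 2^24 + 2 is representable
# in float32, so no information is lost at the end.
r64 = np.float32(np.float64(a) + np.float64(b) + np.float64(b))

print(r32)  # 16777216.0
print(r64)  # 16777218.0
```

The two results differ even though every input and the final 64-bit sum round-trip through float32 without loss; the rounding of the intermediate result is what distinguishes the two arithmetics.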
Here is another way of seeing that 32-bit IEEE cannot be a subtype of 64-bit IEEE: _THERE ARE NO VALUES THAT THE TWO TYPES HAVE IN COMMON_.
Why? Well, n-bit IEEE numbers are n-bit approximations of infinite-precision reals. In other words, the number

SIGN . DIG1 DIG2 ... DIGn E EXP

should be interpreted as

SIGN . DIG1 DIG2 ... DIGn ? ? ? ? ... E EXP

where each "?" in the infinite sequence of "?"s stands for "I don't know the value of this digit". With this, it is clear that +.1011e7 in a four-bit-mantissa representation is different from +.10110000e7 in an eight-bit-mantissa representation, since the former really is +.1011????????... e7 while the latter is +.10110000????... e7.
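A concrete way to see the two approximations diverge (my illustration, using Python's struct module, not from the original message): round the decimal literal 0.1 to a 24-bit significand, then widen the result back to Python's 64-bit float. The number that comes back is not the 53-bit approximation of 0.1 — the two formats have settled the unknown trailing digits differently:

```python
import struct

# Pack 0.1 as a 4-byte IEEE float (rounding it to a 24-bit
# significand), then unpack it back into Python's 64-bit float.
f32_tenth = struct.unpack('f', struct.pack('f', 0.1))[0]

print(f32_tenth)         # 0.10000000149011612
print(f32_tenth == 0.1)  # False
```

Neither value is the real number 1/10; they are two different rational approximations of it, which is exactly the "different values" reading above.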
Matthias