TensorFloat-32
{{Short description|Number format in Nvidia hardware}}
{{More citations needed|date=April 2025}}
{{Floating-point}}
TensorFloat-32 (TF32) is a floating-point number format designed for the Tensor Cores of certain Nvidia GPUs.
Format
The binary format is:
- 1 sign bit
- 8 exponent bits
- 10 significand bits (also called mantissa, or precision bits)
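
The field layout above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in this article: that the fields are interpreted as in IEEE 754 binary32 (exponent bias of 127, an implicit leading 1 for normal numbers, the same encodings for zero, subnormals, infinity and NaN), and that the 19 bits are packed into a single integer with the sign in the top bit. The function name <code>decode_tf32</code> and this packing are hypothetical and chosen only for illustration.

```python
def decode_tf32(bits: int) -> float:
    """Decode a hypothetical packed 19-bit TF32 pattern into a Python float.

    Assumed layout (illustrative, not an official encoding):
    bit 18 = sign, bits 17..10 = exponent, bits 9..0 = significand fraction.
    """
    if not 0 <= bits < (1 << 19):
        raise ValueError("expected a 19-bit pattern")

    sign = -1.0 if (bits >> 18) & 1 else 1.0
    exponent = (bits >> 10) & 0xFF   # 8 exponent bits
    fraction = bits & 0x3FF          # 10 significand bits

    if exponent == 0xFF:
        # All-ones exponent: infinity or NaN, as in binary32 (assumed).
        return sign * float("inf") if fraction == 0 else float("nan")
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1 (assumed).
        return sign * (fraction / 1024.0) * 2.0 ** -126
    # Normal number: implicit leading 1 and an assumed bias of 127.
    return sign * (1.0 + fraction / 1024.0) * 2.0 ** (exponent - 127)
```

For example, under these assumptions <code>decode_tf32(127 << 10)</code> gives 1.0, since the biased exponent 127 corresponds to a scale of 2<sup>0</sup> and the fraction is zero.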
File:General floating point.svg
The 19 bits in total fit within a double word (32 bits). Although the format has less precision than a normal 32-bit IEEE 754 floating-point number (10 explicit significand bits instead of 23), it provides much faster computation: up to 8 times faster on an A100 than FP32 on a V100.<ref>https://deeprec.readthedocs.io/en/latest/NVIDIA-TF32.html, accessed 23 May 2024</ref>
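
The precision loss relative to a 32-bit float can be emulated in software. The following Python sketch assumes that a TF32 value occupies the upper 19 bits of a binary32 encoding, so that clearing the 13 low significand bits yields a value representable in TF32. It truncates (rounds toward zero) for simplicity; the rounding actually performed by hardware may differ. The helper names are hypothetical.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret an unsigned 32-bit integer as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def truncate_to_tf32(x: float) -> float:
    """Emulate TF32 precision by clearing the 13 low significand bits.

    Assumption: truncation toward zero, not necessarily the hardware rounding.
    The sign, the 8 exponent bits and the top 10 significand bits are kept.
    """
    return bits_to_float(float_to_bits(x) & 0xFFFFE000)
```

With this sketch, <code>truncate_to_tf32(1.0 + 2**-10)</code> is returned unchanged, because the increment falls on the lowest retained significand bit, whereas <code>truncate_to_tf32(1.0 + 2**-11)</code> collapses to 1.0.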
See also
References
{{Reflist}}
{{Authority control}}
{{Use dmy dates|date=May 2024}}