6.12. Half-Precision Floating Point

On ARM targets, GCC supports half-precision (16-bit) floating point via the __fp16 type. You must enable this type explicitly with the -mfp16-format command-line option in order to use it.
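A minimal sketch of how a program might check which format it was built for, assuming an ARM target compiled with something like gcc -mfp16-format=ieee. It relies on the ACLE feature macros __ARM_FP16_FORMAT_IEEE and __ARM_FP16_FORMAT_ALTERNATIVE, which are assumed to be defined to match the selected format. The function names fp16_format_name and half_roundtrip are hypothetical. The __fp16 code is compiled only when one of those macros is defined, so the file still builds on targets where __fp16 is unavailable.

```c
#include <stdio.h>

/* Report which __fp16 format this translation unit was compiled for.
   Assumption: the ACLE macros below reflect the -mfp16-format option;
   when neither is defined (non-ARM target or option omitted), "none". */
const char *fp16_format_name(void) {
#if defined(__ARM_FP16_FORMAT_IEEE)
    return "ieee";
#elif defined(__ARM_FP16_FORMAT_ALTERNATIVE)
    return "alternative";
#else
    return "none";
#endif
}

#if defined(__ARM_FP16_FORMAT_IEEE) || defined(__ARM_FP16_FORMAT_ALTERNATIVE)
/* Store a float in half precision and read it back.
   The narrowing conversion rounds to the nearest half-precision value;
   widening back to float is exact because float covers every half value. */
float half_roundtrip(float x) {
    __fp16 h = x;
    return h;
}
#endif
```

On a host without __fp16 support, fp16_format_name() simply returns "none" and half_roundtrip is not compiled at all.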

ARM supports two incompatible representations for half-precision floating-point values. You must choose one of the representations and use it consistently in your program.

Specifying -mfp16-format=ieee selects the IEEE 754-2008 format. This format can represent normalized values in the range of 2^-14 to 65504. There are 11 bits of significand precision, approximately 3 decimal digits.
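To make the range and precision figures concrete, here is a portable sketch that decodes a raw 16-bit IEEE binary16 pattern into a double without using __fp16, so it runs on any host. It uses the standard binary16 layout: 1 sign bit, 5 exponent bits with a bias of 15, and 10 stored fraction bits plus an implicit leading 1, which gives the 11 bits of significand. The names decode_ieee_half and scale_pow2 are hypothetical.

```c
#include <math.h>    /* INFINITY, NAN, isinf, isnan */
#include <stdint.h>

/* Multiply x by 2^e by repeated doubling/halving (exact for these ranges). */
static double scale_pow2(double x, int e) {
    while (e > 0) { x *= 2.0; e--; }
    while (e < 0) { x /= 2.0; e++; }
    return x;
}

/* Decode an IEEE 754-2008 binary16 bit pattern. */
double decode_ieee_half(uint16_t bits) {
    int sign = (bits >> 15) & 0x1;
    int exp  = (bits >> 10) & 0x1F;
    int frac = bits & 0x3FF;
    double value;

    if (exp == 0x1F) {
        /* All-ones exponent is reserved for infinities and NaNs. */
        value = (frac == 0) ? INFINITY : NAN;
    } else if (exp == 0) {
        /* Zero or subnormal: no implicit leading 1, scale 2^-14 * 2^-10. */
        value = scale_pow2((double)frac, -24);
    } else {
        /* Normal: implicit leading 1 gives an 11-bit significand. */
        value = scale_pow2((double)(frac | 0x400), exp - 15 - 10);
    }
    return sign ? -value : value;
}
```

For example, 0x7BFF decodes to 65504 (the largest finite value), 0x0400 to 2^-14 (the smallest normalized value), and 0x3C01 to 1 + 2^-10, showing the spacing that limits precision to roughly 3 decimal digits.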

Specifying -mfp16-format=alternative selects the ARM alternative format. This representation is similar to the IEEE format, but does not support infinities or NaNs. Instead, the range of exponents is extended, so that this format can represent normalized values in the range of 2^-14 to 131008.
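A portable sketch of decoding the alternative format, assuming it keeps the IEEE field layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits) and simply treats the all-ones exponent as an ordinary exponent instead of reserving it. The names decode_alt_half and scale_pow2 are hypothetical.

```c
#include <stdint.h>

/* Multiply x by 2^e by repeated doubling/halving (exact for these ranges). */
static double scale_pow2(double x, int e) {
    while (e > 0) { x *= 2.0; e--; }
    while (e < 0) { x /= 2.0; e++; }
    return x;
}

/* Decode a 16-bit pattern in the ARM alternative half-precision format.
   Assumption: same layout as IEEE binary16, but exponent 0x1F is a
   normal exponent, so no pattern encodes an infinity or a NaN. */
double decode_alt_half(uint16_t bits) {
    int sign = (bits >> 15) & 0x1;
    int exp  = (bits >> 10) & 0x1F;
    int frac = bits & 0x3FF;
    double value;

    if (exp == 0) {
        /* Zero or subnormal: no implicit leading 1. */
        value = scale_pow2((double)frac, -24);
    } else {
        /* Normal, including exp == 0x1F: implicit leading 1. */
        value = scale_pow2((double)(frac | 0x400), exp - 15 - 10);
    }
    return sign ? -value : value;
}
```

Here 0x7FFF decodes to (2 - 2^-10) * 2^16 = 131008, and 0x7C00 decodes to 65536, whereas an IEEE decoder reads that same 0x7C00 pattern as +infinity. That divergence is why data written in one format cannot be read correctly as the other.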

登录查看完整内容