Why gcc needs -mieee on Alpha processors

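Anyone who has worked with the Alpha architecture will be familiar with -mieee. On seeing a SIGFPE,
the first thought is usually that the -mieee compiler flag is missing. But what does this flag actually do?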

man gcc

First, the relevant description from the gcc manual:

     -mieee
         The Alpha architecture implements floating-point
         hardware optimized for maximum performance.  It is
         mostly compliant with the IEEE floating-point standard.
         However, for full compliance, software assistance is
         required.  This option generates code fully
         IEEE-compliant code except that the inexact-flag is not
         maintained (see below).  If this option is turned on,
         the preprocessor macro "_IEEE_FP" is defined during
         compilation.  The resulting code is less efficient but
         is able to correctly support denormalized numbers and
         exceptional IEEE values such as not-a-number and
         plus/minus infinity.  Other Alpha compilers call this
         option -ieee_with_no_inexact.

         DEBIAN SPECIFIC: This option is on by default for
         alpha-linux-gnu, unless -ffinite-math-only (which is
         part of the -ffast-math set) is specified, because the
         software functions in the GNU libc math libraries
         generate denormalized numbers, NaNs, and infs (all of
         which will cause a programs to SIGFPE when it attempts
         to use the results without -mieee).

     -mieee-with-inexact
         This is like -mieee except the generated code also
         maintains the IEEE inexact-flag.  Turning on this option
         causes the generated code to implement fully-compliant
         IEEE math.  In addition to "_IEEE_FP", "_IEEE_FP_EXACT"
         is defined as a preprocessor macro.  On some Alpha
         implementations the resulting code may execute
         significantly slower than the code generated by default.
         Since there is very little code that depends on the
         inexact-flag, you should normally not specify this
         option.  Other Alpha compilers call this option
         -ieee_with_inexact.

Let's go through it piece by piece.

The Alpha architecture implements floating-point
hardware optimized for maximum performance.  It is
mostly compliant with the IEEE floating-point standard.
However, for full compliance, software assistance is
required.  This option generates code fully
IEEE-compliant code except that the inexact-flag is not
maintained (see below).
If this option is turned on,
the preprocessor macro "_IEEE_FP" is defined during
compilation.  The resulting code is less efficient but
is able to correctly support denormalized numbers and
exceptional IEEE values such as not-a-number and
plus/minus infinity.  Other Alpha compilers call this
option -ieee_with_no_inexact.
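  1. By default, the Alpha architecture does not fully comply with IEEE 754.
  2. This is a deliberate trade-off for better performance.
  3. Full IEEE 754 compliance is still achievable with software assistance.
  4. With -mieee alone, the inexact-result flag (inexact-flag) is still not maintained.
  5. With -mieee enabled, gcc automatically defines the _IEEE_FP macro, so C code can use it to take different code paths.
  6. With -mieee enabled, performance drops, but the following are supported correctly:
    • denormalized numbers
    • not-a-number (NaN)
    • plus/minus infinity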
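
As a minimal sketch of point 5, C code could branch on the macros like this. `_IEEE_FP` and `_IEEE_FP_EXACT` are the names given in the manual excerpt; the function name `ieee_build_mode` is made up for illustration, and the alternative spelling `_IEEE_FP_INEXACT` is an assumption worth checking with `gcc -dM -E` on a real Alpha toolchain.

```c
/* Report which IEEE mode this translation unit was compiled in.
 * ieee_build_mode is a hypothetical helper name, not part of gcc or libc. */
int ieee_build_mode(void)
{
#if defined(_IEEE_FP_EXACT) || defined(_IEEE_FP_INEXACT)
    /* _IEEE_FP_EXACT is the name in the manual text; _IEEE_FP_INEXACT is an
     * assumed alternative spelling that should be verified on the target. */
    return 2;   /* built with -mieee-with-inexact */
#elif defined(_IEEE_FP)
    return 1;   /* built with -mieee */
#else
    return 0;   /* neither macro is defined */
#endif
}
```

On a non-Alpha machine the function simply returns 0, since neither macro is defined there.
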
DEBIAN SPECIFIC: This option is on by default for
alpha-linux-gnu, unless -ffinite-math-only (which is
part of the -ffast-math set) is specified, because the
software functions in the GNU libc math libraries
generate denormalized numbers, NaNs, and infs (all of
which will cause a programs to SIGFPE when it attempts
to use the results without -mieee).
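  1. Debian already enables -mieee by default on the alpha architecture.
  2. deepin Alpha currently does the same, although past experience suggests that early versions did not enable it by default. It still needs to be confirmed whether the default is set in gcc itself or through debhelper, since that determines the default behavior.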
-mieee-with-inexact
This is like -mieee except the generated code also
maintains the IEEE inexact-flag.  Turning on this option
causes the generated code to implement fully-compliant
IEEE math.  In addition to "_IEEE_FP", "_IEEE_FP_EXACT"
is defined as a preprocessor macro.  On some Alpha
implementations the resulting code may execute
significantly slower than the code generated by default.
Since there is very little code that depends on the
inexact-flag, you should normally not specify this
option.
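  1. On top of what -mieee does, it adds support for the inexact-flag.
  2. In addition to _IEEE_FP, it also defines _IEEE_FP_EXACT.
  3. Enabling it is not recommended: it may hurt performance badly, and most software does not depend on the inexact-flag.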

Some floating-point terminology

Above we ran into several floating-point terms: inf, NaN, inexact, and denormalized number.

Other articles give a good first introduction to them.
1
2
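
As a quick illustration of three of these terms, the following sketch classifies the result of a division with the standard `fpclassify` macro from `<math.h>`. The helper name `quotient_class` is hypothetical, and the behavior described assumes a platform with default IEEE semantics (on Alpha, that means building with -mieee).

```c
#include <float.h>   /* DBL_MIN, used in the calls described below */
#include <math.h>    /* fpclassify and the FP_* constants */

/* Classify the result of a / b.
 * quotient_class is a hypothetical helper name. */
int quotient_class(double a, double b)
{
    volatile double x = a, y = b;   /* force the division to run at run time */
    double q = x / y;
    return fpclassify(q);
}
```

Here `quotient_class(1.0, 0.0)` yields `FP_INFINITE`, `quotient_class(0.0, 0.0)` yields `FP_NAN`, and `quotient_class(DBL_MIN, 2.0)` yields `FP_SUBNORMAL`, which is the denormalized case: the value is smaller than the smallest normal double.
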

With the above understood, the first example mentioned in 知识的吸收 should be fairly easy to follow.

Exploring floating-point instructions on Alpha

double foo(double a, double b)
{
  return a / b;
}
;; without -mieee
0000000000000000 <foo>:
   0:   60 b4 11 5a     divt/su $f16,$f17,$f0
   4:   00 00 00 60     trapb
   8:   01 80 fa 6b     ret

;; with -mieee
0000000000000000 <foo>:
   0:   60 b4 11 5a     divt/su $f16,$f17,$f0
   4:   00 00 00 60     trapb
   8:   01 80 fa 6b     ret
;; with -mieee-with-inexact
0000000000000000 <foo>:
   0:   60 f4 11 5a     divt/sui $f16,$f17,$f0
   4:   00 00 00 60     trapb
   8:   01 80 fa 6b     ret

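This shows that Debian's alpha gcc currently enables the -mieee behavior by default: the output is identical
with and without the flag. Here su and sui are simply different instruction qualifiers: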
– s: exception completion enable
– u: underflow enable
– i: inexact enable

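trapb is similar to a memory barrier, except that it is a barrier for traps. It ensures that the preceding
floating-point operations have completed. Usually it only needs to be issued once, before the function returns.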

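The various enables here, along with other floating-point settings, are managed through a special register, the FPCR.
It mainly covers:
– Rounding mode, which decides how results are rounded; typical choices are round toward -inf, round toward +inf, and round toward 0.
Alpha CPUs embed the rounding mode directly in each instruction (specified through the function field; this differs from other Alpha-like CPUs),
so the FPCR setting only matters for round to plus infinity, which is reached through what is called dynamic rounding mode. On Alpha-like CPUs, the rounding mode
can be understood as always dynamic.
– Whether various conditions have occurred, for example:
integer overflow (may happen when converting a floating-point number to an integer);
inexact result (very common, because only a small set of numbers can be represented exactly in floating point);
overflow and underflow (the exponent range is limited);
division by zero (if the dividend itself is nonzero, the result is generally turned into ±INF);
invalid operation (for example, divisor and dividend are both 0, or an operand is NaN).
– Disabling individual exceptions, mainly so that the events above are ignored and the program avoids frequently entering kernel mode to execute a large number of instructions.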
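
The FPCR fields described above can be sketched as a small decoder. The bit positions below are as recalled from `arch/alpha/include/uapi/asm/fpu.h` in the Linux kernel and should be treated as assumptions to verify against that header or the architecture manual. The function names are hypothetical, and the code only interprets a 64-bit value; it does not read the real register.

```c
#include <stdint.h>

/* Status bits and dynamic rounding field of the Alpha FPCR.
 * Positions are assumptions recalled from the Linux alpha fpu.h header. */
#define FPCR_INV        (1ULL << 52)  /* invalid operation */
#define FPCR_DZE        (1ULL << 53)  /* division by zero */
#define FPCR_OVF        (1ULL << 54)  /* overflow */
#define FPCR_UNF        (1ULL << 55)  /* underflow */
#define FPCR_INE        (1ULL << 56)  /* inexact result */
#define FPCR_IOV        (1ULL << 57)  /* integer overflow */
#define FPCR_DYN_SHIFT  58            /* 2-bit dynamic rounding mode field */
#define FPCR_SUM        (1ULL << 63)  /* summary of the status bits */

/* Name the rounding mode used by instructions that select dynamic rounding. */
const char *fpcr_dynamic_rounding(uint64_t fpcr)
{
    switch ((fpcr >> FPCR_DYN_SHIFT) & 3) {
    case 0:  return "chopped (toward 0)";
    case 1:  return "toward -inf";
    case 2:  return "nearest";
    default: return "toward +inf";
    }
}

/* Return 1 if any of the status bits selected by mask is set. */
int fpcr_flag_set(uint64_t fpcr, uint64_t mask)
{
    return (fpcr & mask) != 0;
}
```

For instance, `fpcr_flag_set(value, FPCR_DZE)` tells whether a division by zero has been recorded in `value`.
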

Handling in kernel mode

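Earlier we mentioned "avoiding frequently entering kernel mode to execute a large number of instructions", and the man page also says that
full IEEE compliance on Alpha requires software assistance.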

https://github.com/torvalds/linux/blob/master/arch/alpha/kernel/traps.c#L212

https://github.com/torvalds/linux/blob/master/arch/alpha/math-emu/math.c#L101

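Looking at the kernel code, floating-point computation can end up entering do_entArith frequently (if the code is not written carefully).
The handler then checks the summary flags of the FPCR and, if an exception occurred, emulates the operation in software according to the CPU specification and the IEEE specification.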

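Roughly speaking, it takes the address the PC points to, works out which floating-point instruction is involved, and then
emulates the correct behavior based on that instruction and the contents of the FPCR.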
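
The "work out which floating-point instruction is involved" step can be sketched as a field decoder. This is not the kernel's code. The struct and function names are hypothetical, and the field layout is the floating-point operate format as I understand it from the Alpha architecture documentation, so it should be checked against the manual. The two test words are the instruction bytes from the disassembly earlier, read as little-endian 32-bit values.

```c
#include <stdint.h>

/* Fields of an Alpha floating-point operate instruction.
 * alpha_fp_insn and decode_fp_insn are hypothetical names. */
struct alpha_fp_insn {
    unsigned opcode;  /* bits 31..26, 0x16 for the IEEE floating-point group */
    unsigned fa;      /* bits 25..21, first source register */
    unsigned fb;      /* bits 20..16, second source register */
    unsigned trap;    /* bits 15..13, qualifiers: 4 = s, 2 = i, 1 = u */
    unsigned rnd;     /* bits 12..11, 0 chopped, 1 -inf, 2 normal, 3 dynamic */
    unsigned src;     /* bits 10..9, operand format */
    unsigned func;    /* bits 8..5, operation (3 is divide in this group) */
    unsigned fc;      /* bits 4..0, destination register */
};

struct alpha_fp_insn decode_fp_insn(uint32_t insn)
{
    struct alpha_fp_insn d;

    d.opcode = (insn >> 26) & 0x3f;
    d.fa     = (insn >> 21) & 0x1f;
    d.fb     = (insn >> 16) & 0x1f;
    d.trap   = (insn >> 13) & 0x7;
    d.rnd    = (insn >> 11) & 0x3;
    d.src    = (insn >> 9)  & 0x3;
    d.func   = (insn >> 5)  & 0xf;
    d.fc     = insn & 0x1f;
    return d;
}
```

Decoding `0x5a11b460` (divt/su $f16,$f17,$f0) gives fa = 16, fb = 17, fc = 0 and trap = 5, that is s and u set. Decoding `0x5a11f460` (divt/sui) differs only in trap = 7, which adds the i bit.
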

Summary

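  1. Floating-point computation is expensive, especially when an exception occurs and the kernel trap handler has to run a large amount of emulation code.
  2. A certain Chinese domestic CPU, although similar to Alpha, differs considerably in its floating-point behavior.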
