Bug in icpc causing Don's problems

Mark Mitchell mark at codesourcery.com
Thu Sep 29 22:15:08 UTC 2005

I've analyzed the GEMP problem that Don is having.

The short answer is that this is a bug in icpc.

The long answer is that icpc is mishandling the calling conventions for:

std::operator*<float>(std::complex<float> const&,
                      std::complex<float> const&)

Specifically, the caller and the callee disagree about where the return
value is passed.

In particular, icpc is generating an out-of-line copy of the function.
(Why it's not being inlined is another question; you might be able to
work around the bug by banging on the inline-harder button.)

Here's the code generated:

        pushq     %rsi                                          #375.5
        movq      (%rdi), %rdx                                  #376.26
        movss     4(%rdi), %xmm5                                #376.26
        movss     (%rdi), %xmm3                                 #376.26
        movss     (%rsi), %xmm1                                 #377.11
        movss     4(%rsi), %xmm2                                #377.11
        movaps    %xmm3, %xmm4                                  #377.11
        movaps    %xmm5, %xmm0                                  #377.11
        mulss     %xmm1, %xmm5                                  #377.11
        mulss     %xmm1, %xmm4                                  #377.11
        mulss     %xmm2, %xmm0                                  #377.11
        mulss     %xmm2, %xmm3                                  #377.11
        movq      %rdx, (%rsp)                                  #376.26
        subss     %xmm0, %xmm4                                  #377.11
        movss     %xmm4, (%rsp)                                 #377.7
        addss     %xmm3, %xmm5                                  #377.11
        movss     %xmm5, 4(%rsp)                                #377.7
        movq      (%rsp), %rax                                  #378.14
        popq      %rcx                                          #378.14
        ret                                                     #378.14

Basically, the inputs are pointed to by %rdi and %rsi; the result is
built on the stack at %rsp and %rsp + 4 and then loaded into %rax.
Nothing is ever placed in %xmm0 as a return value.

However, the caller expects the return value in %xmm0:

        call      _ZStmlIfESt7complexIT_ERKS2_S4_               #76.45
        movlps    %xmm0, -64(%rbp)                              #76.45

The caller is correct.  Because std::complex<float> is a POD, the value
should go in %xmm0, according to the AMD64 ABI.
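
As a rough illustration of why the caller's expectation is the right one,
the sketch below checks the properties that matter here: the object is
exactly two floats (8 bytes), and its copy constructor and destructor are
trivial, so it is passed by value in registers rather than through memory.
An aggregate of that shape is classified SSE by the AMD64 ABI and comes
back in the low 64 bits of %xmm0.  The names cfloat and
fits_in_one_sse_eightbyte are made up for this example, and it assumes a
C++11 compiler (for static_assert and <type_traits>).

```cpp
#include <complex>
#include <type_traits>

// Hypothetical alias for this sketch; not taken from the original message.
typedef std::complex<float> cfloat;

// Two floats with no padding: the whole object occupies one 8-byte unit.
static_assert(sizeof(cfloat) == 2 * sizeof(float),
              "complex<float> should be exactly two floats");

// Trivial copy construction and destruction mean the C++ ABI does not
// force the object to be returned through a hidden memory pointer.
static_assert(std::is_trivially_copy_constructible<cfloat>::value,
              "copy constructor should be trivial");
static_assert(std::is_trivially_destructible<cfloat>::value,
              "destructor should be trivial");

// Hypothetical helper: true when the type has the size and triviality
// properties that lead to an SSE-class return value in %xmm0.
constexpr bool fits_in_one_sse_eightbyte() {
    return sizeof(cfloat) == 8 &&
           std::is_trivially_copy_constructible<cfloat>::value &&
           std::is_trivially_destructible<cfloat>::value;
}
```

If any of these assertions failed on a given toolchain, the reasoning
above would not apply as stated.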

Note, by contrast, the code generated by G++ for the same function:

        movss   (%rdi), %xmm3
        movss   4(%rdi), %xmm5
        movaps  %xmm3, %xmm2
        movaps  %xmm5, %xmm0
        movss   (%rsi), %xmm1
        movss   4(%rsi), %xmm4
        mulss   %xmm1, %xmm2
        mulss   %xmm4, %xmm0
        mulss   %xmm4, %xmm3
        mulss   %xmm5, %xmm1
        subss   %xmm0, %xmm2
        addss   %xmm1, %xmm3
        movss   %xmm2, -16(%rsp)
        movss   %xmm3, -12(%rsp)
        movq    -16(%rsp), %xmm0

Note that GCC correctly loads the value into %xmm0 at the end of the
function.
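
A small self-check along the following lines might help confirm whether
a given compiler build gets the return convention right across a real
call.  It is only a sketch: mul_out_of_line and check_product are
hypothetical names, the noinline attribute is the GCC-style spelling
(assumed here to be accepted by icpc on Linux), and there is no guarantee
that this wrapper triggers the same bug as the out-of-line copy of
std::operator* discussed above.

```cpp
#include <cassert>
#include <complex>

typedef std::complex<float> cfloat;

// Hypothetical wrapper with the same parameter and return types as the
// std::operator* under discussion.  The GCC-style noinline attribute
// (an assumption for icpc) keeps the call and return sequence in place.
__attribute__((noinline))
cfloat mul_out_of_line(const cfloat& a, const cfloat& b) {
    return a * b;
}

// (1 + 2i) * (3 + 4i) = (3 - 8) + (4 + 6)i = -5 + 10i, exact in float.
// If caller and callee disagreed about the return register, the value
// seen here would be garbage and the assertion would fail.
void check_product() {
    const cfloat r = mul_out_of_line(cfloat(1.0f, 2.0f), cfloat(3.0f, 4.0f));
    assert(r.real() == -5.0f);
    assert(r.imag() == 10.0f);
}
```

Calling check_product() from main, in a build without NDEBUG defined,
would exercise the call boundary.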

We should report this problem to Intel.  I know the Intel tools manager,
so I'm sure I can get a bug report processed.  Will you please (a) send
me the command line you're using to do the compilation, and (b) put the
preprocessed source (output of "icpc -E") somewhere?  I'll take it from
there.

Mark Mitchell
CodeSourcery, LLC
mark at codesourcery.com
(916) 791-8304
