Development

  • rfa (x86): 387<=>sse moves


    With -march=pentium4 -mfpmath=sse, we get an extra move for code like

    double d = atof(foo);
    int i = d;

    call atof
    fstpl -8(%ebp)
    movsd -8(%ebp), %xmm0
    cvttsd2si %xmm0, %eax

    (This is Linux, Darwin is similar.) I think the difficulty is that for
    (set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}
    regclass decides SSE_REGS is a zero-cost choice for 58. Which looks
    wrong, as that requires a store and load from memory. In fact, memory is
    the cheapest overall choice for 58 (taking its use into account also), and
    gcc will figure that out correctly if a more reasonable assessment is given
    to SSE_REGS. The immediate cause is the #Y's in the constraint:
    "=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m "
    and there's probably a simple fix, but it eludes me. Advice? Thanks.
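For anyone who wants to look at the generated code themselves, the fragment in the post can be wrapped in a small translation unit roughly like the one below. This is only a sketch: the function name `to_int` is made up for illustration, and the original post does not show the surrounding function.

```c
#include <stdlib.h>

/* Hypothetical wrapper around the two statements quoted in the post.
   atof returns its result in the x87 stack register st(0) on 32-bit x86,
   while the double-to-int conversion is done with SSE (cvttsd2si) when
   -mfpmath=sse is in effect, so the value has to travel from the 387
   unit to an SSE register. */
int to_int(const char *foo)
{
    double d = atof(foo);
    int i = d;   /* conversion truncates toward zero */
    return i;
}
```

Compiling this for 32-bit x86 with something like `gcc -S -O2 -march=pentium4 -mfpmath=sse` and reading the assembly should show whether the value goes through memory once or picks up an additional move; the exact output depends on the compiler version.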
  • No.1 | | 34908 bytes | |

    Dale Johannesen wrote:
    With -march=pentium4 -mfpmath=sse , we get an extra move for code like

    double d = atof(foo);
    int i = d;

    call atof
    fstpl -8(%ebp)
    movsd -8(%ebp), %xmm0
    cvttsd2si %xmm0, %eax

    (This is Linux, Darwin is similar.) I think the difficulty is that for

    Try the attached patch. It gave a 3% speedup on -mfpmath=sse for
    tramp3d. Richard Henderson asked for SPEC testing, then it may go in.

    Paolo

    2005-07-14 Paolo Bonzini <bonzini (AT) gnu (DOT) org>

    * reload.c (find_reloads): Take PREFERRED_OUTPUT_RELOAD_CLASS
    into account.
    (push_reload): Allow PREFERRED_RELOAD_CLASS to liberally
    return NO_REGS.
    * doc/tm.texi (Register Classes): Document what it means
    if PREFERRED_RELOAD_CLASS returns NO_REGS.
    * config/i386/i386.c (ix86_preferred_reload_class): Force
    using SSE registers (and return NO_REGS for floating-point
    constants) if math is done with SSE.
    (): New.
    * config/i386/i386-protos.h (): New.
    * config/i386/i386.h (PREFERRED_OUTPUT_RELOAD_CLASS): New.
    * config/i386/i386.md: Remove # register preferences.
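The interplay described in the ChangeLog (the target hook may answer NO_REGS, and push_reload only honours that answer in some cases) can be modelled in a few lines of plain C. The sketch below is a toy model, not GCC code: the enum, the function names and the `sse_math` flag are all invented for illustration, and the real hooks take an rtx and consult many more conditions.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC's register classes; not the real enum reg_class. */
enum toy_class { TOY_NO_REGS, TOY_GENERAL_REGS, TOY_FLOAT_REGS, TOY_SSE_REGS };

/* Hypothetical target hook: when math is done in SSE, accept only the SSE
   class and answer TOY_NO_REGS for every other class. */
enum toy_class toy_preferred_output_class(enum toy_class cls, bool sse_math)
{
    if (!sse_math)
        return cls;
    return cls == TOY_SSE_REGS ? cls : TOY_NO_REGS;
}

/* Models the condition the patch adds to push_reload: use the target's
   answer unless it is TOY_NO_REGS, in which case keep the original class.
   The one exception is an optional output reload, which may be given
   TOY_NO_REGS. */
enum toy_class toy_narrow(enum toy_class cls, enum toy_class preferred,
                          bool optional_output)
{
    if (preferred != TOY_NO_REGS || optional_output)
        return preferred;
    return cls;
}
```

In this model a mandatory reload into the 387 class under SSE math still ends up in the 387 class, because the hook's veto is discarded, whereas in find_reloads the same veto only marks the alternative as bad so that another one is preferred when available.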

    Index: reload.c

    RCS file: /cvs/gcc/gcc/gcc/reload.c,v
    retrieving revision 1.273
    diff -c -r1.273 reload.c
    reload.c 25 Jun 2005 02:00:53 -0000 1.273
    reload.c 14 Jul 2005 08:07:56 -0000

    1231,1245

    /* Narrow down the class of register wanted if that is
    desirable on this machine for efficiency. */
    ! if (in != 0)
    ! class = PREFERRED_RELOAD_CLASS (in, class);

    /* Output reloads may need analogous treatment, different in detail. */
    #ifdef PREFERRED_OUTPUT_RELOAD_CLASS
    ! if (out != 0)
    ! class = PREFERRED_OUTPUT_RELOAD_CLASS (out, class);
    #endif

    /* Make sure we use a class that can handle the actual pseudo
    inside any subreg. For example, on the 386, QImode regs
    can appear within SImode subregs. Although GENERAL_REGS
    1231,1254

    /* Narrow down the class of register wanted if that is
    desirable on this machine for efficiency. */
    ! {
    ! enum reg_class preferred_class = class;
    !
    ! if (in != 0)
    ! preferred_class = PREFERRED_RELOAD_CLASS (in, class);

    /* Output reloads may need analogous treatment, different in detail. */
    #ifdef PREFERRED_OUTPUT_RELOAD_CLASS
    ! if (out != 0)
    ! preferred_class = PREFERRED_OUTPUT_RELOAD_CLASS (out, preferred_class);
    #endif

    + /* Discard what the target said if we cannot do it. */
    + if (preferred_class != NO_REGS
    + || (optional && type == RELOAD_FOR_OUTPUT))
    + class = preferred_class;
    + }
    +
    /* Make sure we use a class that can handle the actual pseudo
    inside any subreg. For example, on the 386, QImode regs
    can appear within SImode subregs. Although GENERAL_REGS

    3443,3457

    /* If we can't reload this value at all, reject this
    alternative. Note that we could also lose due to
    ! LIMIT_RELOAD_RELOAD_CLASS, but we don't check that
    here. */

    if (! CONSTANT_P (operand)
    ! && (enum reg_class) this_alternative[i] != NO_REGS
    ! && (PREFERRED_RELOAD_CLASS (operand,
    ! (enum reg_class) this_alternative[i])
    ! == NO_REGS))
    ! bad = 1;

    /* Alternative loses if it requires a type of reload not
    permitted for this insn. We can always reload SCRATCH
    3452,3477

    /* If we can't reload this value at all, reject this
    alternative. Note that we could also lose due to
    ! LIMIT_RELOAD_CLASS, but we don't check that
    here. */

    if (! CONSTANT_P (operand)
    ! && (enum reg_class) this_alternative[i] != NO_REGS)
    ! {
    ! if (PREFERRED_RELOAD_CLASS
    ! (operand, (enum reg_class) this_alternative[i])
    ! == NO_REGS)
    ! bad = 1;
    !
    ! #ifdef PREFERRED_OUTPUT_RELOAD_CLASS
    ! if (operand_type[i] == RELOAD_FOR_OUTPUT
    ! && PREFERRED_OUTPUT_RELOAD_CLASS
    ! (operand, (enum reg_class) this_alternative[i])
    ! == NO_REGS)
    ! bad = 1;
    ! #endif
    ! }
    !

    /* Alternative loses if it requires a type of reload not
    permitted for this insn. We can always reload SCRATCH
    Index: doc/tm.texi

    RCS file: /cvs/gcc/gcc/gcc/doc/tm.texi,v
    retrieving revision 1.441
    diff -p -u -r1.441 tm.texi
    doc/tm.texi 13 Jul 2005 16:28:25 -0000 1.441
    doc/tm.texi 15 Jul 2005 14:13:49 -0000

    2385,2396
    2385,2408
    into any kind of register, code generation will be better if
    @code{LEGITIMATE_CONSTANT_P} makes the constant illegitimate instead
    of using @code{PREFERRED_RELOAD_CLASS}.
    +
    + If an insn has pseudos in it after register allocation, reload will go
    + through the alternatives and call repeatedly @code{PREFERRED_RELOAD_CLASS}
    + to find the best one. Returning @code{NO_REGS}, in this case, makes
    + reload add a @code{?} in front of the constraint: the x86 back-end uses
    + this feature to discourage usage of 387 registers when math is done in
    + the SSE registers (and vice versa). Be careful not to return @code{NO_REGS}
    + when @code{x} is a hard register. Otherwise, it will be impossible to
    + successfully reload the insn.
    @end defmac

    @defmac PREFERRED_OUTPUT_RELOAD_CLASS (@var{x}, @var{class})
    Like @code{PREFERRED_RELOAD_CLASS}, but for output reloads instead of
    input reloads. If you don't define this macro, the default is to use
    @var{class}, unchanged.
    +
    + You can also use @code{PREFERRED_OUTPUT_RELOAD_CLASS} to discourage
    + reload from using some of the insns, like @code{PREFERRED_RELOAD_CLASS}
    @end defmac

    @defmac LIMIT_RELOAD_CLASS (@var{mode}, @var{class})
    Index: config/i386/i386.c

    RCS file: /,v
    retrieving revision 1.843
    diff -u -p -r1.843 i386.c
    config/i386/i386.c 18 Jul 2005 06:39:18 -0000 1.843
    config/i386/i386.c 25 Jul 2005 15:19:03 -0000

    15411,15425
    enum reg_class
    ix86_preferred_reload_class (rtx x, enum reg_class class)
    {
    /* We're only allowed to return a subclass of CLASS. Many of the
    following checks fail for NO_REGS, so eliminate that early. */
    if (class == NO_REGS)
    return NO_REGS;

    /* All classes can load zeros. */
    ! if (x == CONST0_RTX (GET_MODE (x)))
    return class;

    /* Floating-point constants need more complex checks. */
    if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) != VOIDmode)
    {
    15411,15453
    enum reg_class
    ix86_preferred_reload_class (rtx x, enum reg_class class)
    {
    + enum machine_mode mode = GET_MODE (x);
    + bool is_sse_math_mode;
    +
    /* We're only allowed to return a subclass of CLASS. Many of the
    following checks fail for NO_REGS, so eliminate that early. */
    if (class == NO_REGS)
    return NO_REGS;

    /* All classes can load zeros. */
    ! if (x == CONST0_RTX (mode))
    return class;

    + /* Do not be picky when we are reloading a hard register. */
    + if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
    + return class;
    +
    + /* Reject this alternative if we are loading: a) a vector constant into
    + an MMX or SSE register b) a floating-point constant into an SSE register
    + that will be used for math. This is because there are no MMX/SSE
    + load-from-constant instructions. */
    +
    + is_sse_math_mode =
    + TARGET_SSE_MATH && !TARGET_MIX_SSE_I387 && SSE_FLOAT_MODE_P (mode);
    +
    + if (CONSTANT_P (x))
    + {
    + if (MAYBE_MMX_CLASS_P (class))
    + return NO_REGS;
    + if (MAYBE_SSE_CLASS_P (class)
    + && (VECTOR_MODE_P (mode) || mode == TImode || is_sse_math_mode))
    + return NO_REGS;
    + }
    +
    + /* Prefer SSE regs only, if we can use them for math. */
    + if (is_sse_math_mode)
    + return SSE_CLASS_P (class) ? class : NO_REGS;
    +
    /* Floating-point constants need more complex checks. */
    if (GET_CODE (x) == CONST_DOUBLE && GET_MODE (x) != VOIDmode)
    {

    15431,15438
    zero above. We only want to wind up preferring 80387 registers if
    we plan on doing computation with them. */
    if (TARGET_80387
    - && (TARGET_MIX_SSE_I387
    - || !(TARGET_SSE_MATH && SSE_FLOAT_MODE_P (GET_MODE (x))))
    && standard_80387_constant_p (x))
    {
    /* Limit class to non-sse. */
    15459,15464

    15448,15457

    return NO_REGS;
    }
    - if (MAYBE_MMX_CLASS_P (class) && CONSTANT_P (x))
    - return NO_REGS;
    - if (MAYBE_SSE_CLASS_P (class) && CONSTANT_P (x))
    - return NO_REGS;

    /* Generally when we see PLUS here, it's the function invariant
    (plus soft-fp const_int). Which can only be computed into general
    15474,15479

    15473,15478
    15495,15537
    return class;
    }

    + /* Discourage putting floating-point values in SSE registers unless
    + SSE math is being used, and likewise for the 387 registers. */
    + enum reg_class
    + (rtx x, enum reg_class class)
    + {
    + enum machine_mode mode = GET_MODE (x);
    +
    + /* Restrict the output reload class to the register bank that we are doing
    + math on. If we would like not to return a subset of CLASS, reject this
    + alternative: if reload cannot do this, it will still use its choice.
    +
    + We only do this if we are reloading a pseudo. Reloads of floating-point
    + hard registers can happen after a VEC_SELECT (whose output can only be
    + in SSE registers) if -mfpmath=387 is active. */
    + if (REG_P (x) && REGNO (x) < FIRST_PSEUDO_REGISTER)
    + return class;
    +
    + if (TARGET_MIX_SSE_I387)
    + return class;
    +
    + mode = GET_MODE (x);
    + if (TARGET_SSE_MATH && SSE_FLOAT_MODE_P (mode))
    + return SSE_CLASS_P (class) ? class : NO_REGS;
    +
    + if (TARGET_80387 && SCALAR_FLOAT_MODE_P (mode))
    + {
    + if (class == FP_TOP_SSE_REGS)
    + return FP_TOP_REG;
    + else if (class == FP_SECOND_SSE_REGS)
    + return FP_SECOND_REG;
    + else
    + return FLOAT_CLASS_P (class) ? class : NO_REGS;
    + }
    +
    + return class;
    + }
    +
    /* If we are copying between general and FP registers, we need a memory
    location. The same is true for SSE and MMX registers.

    Index: config/i386/i386.h

    RCS file: /,v
    retrieving revision 1.440
    diff -c -r1.440 i386.h
    config/i386/i386.h 26 Jun 2005 05:18:34 -0000 1.440
    config/i386/i386.h 14 Jul 2005 08:07:55 -0000

    1294,1299
    1294,1305
    #define PREFERRED_RELOAD_CLASS(X, CLASS) \
    ix86_preferred_reload_class ((X), (CLASS))

    + /* Discourage putting floating-point values in SSE registers unless
    + SSE math is being used, and likewise for the 387 registers. */
    +
    + #define PREFERRED_OUTPUT_RELOAD_CLASS(X, CLASS) \
    + ((X), (CLASS))
    +
    /* If we are copying between general and FP registers, we need a memory
    location. The same is true for SSE and MMX registers. */
    #define SECONDARY_MEMORY_NEEDED(CLASS1, CLASS2, MODE) \
    Index: config/i386/i386-protos.h

    RCS file: /,v
    retrieving revision 1.143
    diff -c -r1.143 i386-protos.h
    config/i386/i386-protos.h 29 Jun 2005 17:27:16 -0000 1.143
    config/i386/i386-protos.h 14 Jul 2005 08:07:55 -0000

    188,193
    188,194
    extern bool ix86_cannot_change_mode_class (enum machine_mode,
    enum machine_mode, enum reg_class);
    extern enum reg_class ix86_preferred_reload_class (rtx, enum reg_class);
    + extern enum reg_class (rtx, enum reg_class);
    extern int ix86_memory_move_cost (enum machine_mode, enum reg_class, int);
    extern int ix86_mode_needed (int, rtx);
    extern void emit_i387_cw_initialization (int);
    Index: config/i386/i386.md

    RCS file: /,v
    retrieving revision 1.645
    diff -c -r1.645 i386.md
    config/i386/i386.md 12 Jul 2005 09:20:12 -0000 1.645
    config/i386/i386.md 14 Jul 2005 21:22:24 -0000

    946,953

    (define_insn "*cmpfp_i_mixed"
    [(set (reg:CCFP FLAGS_REG)
    ! (compare:CCFP (match_operand 0 "register_operand" "f#x,x#f")
    ! (match_operand 1 "nonimmediate_operand" "f#x,xm#f")))]
    "TARGET_MIX_SSE_I387
    && SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
    && GET_MODE (operands[0]) == GET_MODE (operands[1])"
    946,953

    (define_insn "*cmpfp_i_mixed"
    [(set (reg:CCFP FLAGS_REG)
    ! (compare:CCFP (match_operand 0 "register_operand" "f,x")
    ! (match_operand 1 "nonimmediate_operand" "f,xm")))]
    "TARGET_MIX_SSE_I387
    && SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
    && GET_MODE (operands[0]) == GET_MODE (operands[1])"

    995,1002

    (define_insn "*cmpfp_iu_mixed"
    [(set (reg:CCFPU FLAGS_REG)
    ! (compare:CCFPU (match_operand 0 "register_operand" "f#x,x#f")
    ! (match_operand 1 "nonimmediate_operand" "f#x,xm#f")))]
    "TARGET_MIX_SSE_I387
    && SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
    && GET_MODE (operands[0]) == GET_MODE (operands[1])"
    995,1002

    (define_insn "*cmpfp_iu_mixed"
    [(set (reg:CCFPU FLAGS_REG)
    ! (compare:CCFPU (match_operand 0 "register_operand" "f,x")
    ! (match_operand 1 "nonimmediate_operand" "f,xm")))]
    "TARGET_MIX_SSE_I387
    && SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
    && GET_MODE (operands[0]) == GET_MODE (operands[1])"

    2197,2203

    (define_insn "*pushsf"
    [(set (match_operand:SF 0 "push_operand" "=<,<,<")
    ! (match_operand:SF 1 "general_no_elim_operand" "f#rx,rFm#fx,x#rf"))]
    "!TARGET_64BIT"
    {
    /* Anything else should be already split before reg-stack. */
    2197,2203

    (define_insn "*pushsf"
    [(set (match_operand:SF 0 "push_operand" "=<,<,<")
    ! (match_operand:SF 1 "general_no_elim_operand" "f,rFm,x"))]
    "!TARGET_64BIT"
    {
    /* Anything else should be already split before reg-stack. */

    2210,2216

    (define_insn "*pushsf_rex64"
    [(set (match_operand:SF 0 "push_operand" "=X,X,X")
    ! (match_operand:SF 1 "nonmemory_no_elim_operand" "f#rx,rF#fx,x#rf"))]
    "TARGET_64BIT"
    {
    /* Anything else should be already split before reg-stack. */
    2210,2216

    (define_insn "*pushsf_rex64"
    [(set (match_operand:SF 0 "push_operand" "=X,X,X")
    ! (match_operand:SF 1 "nonmemory_no_elim_operand" "f,rF,x"))]
    "TARGET_64BIT"
    {
    /* Anything else should be already split before reg-stack. */

    2250,2258

    (define_insn "*movsf_1"
    [(set (match_operand:SF 0 "nonimmediate_operand"
    ! "=f#xr,m ,f#xr,r#xf ,m ,x#rf,x#rf,x#rf ,m ,!*y,!rm,!*y")
    (match_operand:SF 1 "general_operand"
    ! "fm#rx,f#rx,G ,rmF#fx,Fr#fx,C ,x ,xm#rf,x#rf,rm ,*y ,*y"))]
    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && (reload_in_progress || reload_completed
    || (ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_LARGE)
    2250,2258

    (define_insn "*movsf_1"
    [(set (match_operand:SF 0 "nonimmediate_operand"
    ! "=f,m ,f,r,m ,x,x,x,m ,!*y,!rm,!*y")
    (match_operand:SF 1 "general_operand"
    ! "fm,f,G ,rmF,Fr,C ,x ,xm,x,rm ,*y ,*y"))]
    "!(MEM_P (operands[0]) && MEM_P (operands[1]))
    && (reload_in_progress || reload_completed
    || (ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_LARGE)

    2365,2371

    (define_insn "*pushdf_nointeger"
    [(set (match_operand:DF 0 "push_operand" "=<,<,<,<")
    ! (match_operand:DF 1 "general_no_elim_operand" "f#Y,Fo#fY,*r#fY,Y#f"))]
    "!TARGET_64BIT && !TARGET_INTEGER_DFMODE_MOVES"
    {
    /* This insn should be already split before reg-stack. */
    2365,2371

    (define_insn "*pushdf_nointeger"
    [(set (match_operand:DF 0 "push_operand" "=<,<,<,<")
    ! (match_operand:DF 1 "general_no_elim_operand" "f,Fo,*r,Y"))]
    "!TARGET_64BIT && !TARGET_INTEGER_DFMODE_MOVES"
    {
    /* This insn should be already split before reg-stack. */

    2377,2383

    (define_insn "*pushdf_integer"
    [(set (match_operand:DF 0 "push_operand" "=<,<,<")
    ! (match_operand:DF 1 "general_no_elim_operand" "f#rY,rFo#fY,Y#rf"))]
    "TARGET_64BIT || TARGET_INTEGER_DFMODE_MOVES"
    {
    /* This insn should be already split before reg-stack. */
    2377,2383

    (define_insn "*pushdf_integer"
    [(set (match_operand:DF 0 "push_operand" "=<,<,<")
    ! (match_operand:DF 1 "general_no_elim_operand" "f,rFo,Y"))]
    "TARGET_64BIT || TARGET_INTEGER_DFMODE_MOVES"
    {
    /* This insn should be already split before reg-stack. */

    2417,2425

    (define_insn "*movdf_nointeger"
    [(set (match_operand:DF 0 "nonimmediate_operand"
    ! "=f#Y,m ,f#Y,*r ,o ,Y*x#f,Y*x#f,Y*x#f ,m ")
    (match_operand:DF 1 "general_operand"
    ! "fm#Y,f#Y,G ,*roF,F*r,C ,Y*x#f,HmY*x#f,Y*x#f"))]
    "(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && ((optimize_size || !TARGET_INTEGER_DFMODE_MOVES) && !TARGET_64BIT)
    && (reload_in_progress || reload_completed
    2417,2425

    (define_insn "*movdf_nointeger"
    [(set (match_operand:DF 0 "nonimmediate_operand"
    ! "=f,m ,f,*r ,o ,Y*x,Y*x,Y*x,m ")
    (match_operand:DF 1 "general_operand"
    ! "fm,f,G ,*roF,F*r,C ,Y*x,HmY*x,Y*x"))]
    "(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && ((optimize_size || !TARGET_INTEGER_DFMODE_MOVES) && !TARGET_64BIT)
    && (reload_in_progress || reload_completed

    2537,2545

    (define_insn "*movdf_integer"
    [(set (match_operand:DF 0 "nonimmediate_operand"
    ! "=f#Yr,m ,f#Yr,r#Yf ,o ,Y*x#rf,Y*x#rf,Y*x#rf,m")
    (match_operand:DF 1 "general_operand"
    ! "fm#Yr,f#Yr,G ,roF#Yf,Fr#Yf,C ,Y*x#rf,m ,Y*x#rf"))]
    "(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && ((!optimize_size && TARGET_INTEGER_DFMODE_MOVES) || TARGET_64BIT)
    && (reload_in_progress || reload_completed
    2537,2545

    (define_insn "*movdf_integer"
    [(set (match_operand:DF 0 "nonimmediate_operand"
    ! "=f,m ,f,r,o ,Y*x,Y*x,Y*x,m")
    (match_operand:DF 1 "general_operand"
    ! "fm,f,G ,roF,Fr,C ,Y*x,m ,Y*x"))]
    "(GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && ((!optimize_size && TARGET_INTEGER_DFMODE_MOVES) || TARGET_64BIT)
    && (reload_in_progress || reload_completed

    2712,2718

    (define_insn "*pushxf_integer"
    [(set (match_operand:XF 0 "push_operand" "=<,<")
    ! (match_operand:XF 1 "general_no_elim_operand" "f#r,ro#f"))]
    "!optimize_size"
    {
    /* This insn should be already split before reg-stack. */
    2712,2718

    (define_insn "*pushxf_integer"
    [(set (match_operand:XF 0 "push_operand" "=<,<")
    ! (match_operand:XF 1 "general_no_elim_operand" "f,ro"))]
    "!optimize_size"
    {
    /* This insn should be already split before reg-stack. */

    2784,2791
    (set_attr "mode" "XF,XF,XF,SI,SI")])

    (define_insn "*movxf_integer"
    ! [(set (match_operand:XF 0 "nonimmediate_operand" "=f#r,m,f#r,r#f,o")
    ! (match_operand:XF 1 "general_operand" "fm#r,f#r,G,roF#f,Fr#f"))]
    "!optimize_size
    && (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && (reload_in_progress || reload_completed
    2784,2791
    (set_attr "mode" "XF,XF,XF,SI,SI")])

    (define_insn "*movxf_integer"
    ! [(set (match_operand:XF 0 "nonimmediate_operand" "=f,m,f,r,o")
    ! (match_operand:XF 1 "general_operand" "fm,f,G,roF,Fr"))]
    "!optimize_size
    && (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)
    && (reload_in_progress || reload_completed

    3508,3515
    })

    (define_insn "*extendsfdf2_mixed"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=f#Y,m#fY,Y#f")
    ! (float_extend:DF (match_operand:SF 1 "nonimmediate_operand" "fm#Y,f#Y,mY#f")))]
    "TARGET_SSE2 && TARGET_MIX_SSE_I387
    && (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)"
    {
    3508,3515
    })

    (define_insn "*extendsfdf2_mixed"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=f,m,Y")
    ! (float_extend:DF (match_operand:SF 1 "nonimmediate_operand" "fm,f,mY")))]
    "TARGET_SSE2 && TARGET_MIX_SSE_I387
    && (GET_CODE (operands[0]) != MEM || GET_CODE (operands[1]) != MEM)"
    {

    3824,3830
    })

    (define_insn "*truncxfsf2_mixed"
    ! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#rx,?r#fx,?x#rf")
    (float_truncate:SF
    (match_operand:XF 1 "register_operand" "f,f,f,f")))
    (clobber (match_operand:SF 2 "memory_operand" "=X,m,m,m"))]
    3824,3830
    })

    (define_insn "*truncxfsf2_mixed"
    ! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f,?r,?x")
    (float_truncate:SF
    (match_operand:XF 1 "register_operand" "f,f,f,f")))
    (clobber (match_operand:SF 2 "memory_operand" "=X,m,m,m"))]

    3851,3857
    (set_attr "mode" "SF")])

    (define_insn "*truncxfsf2_i387"
    ! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f#r,?r#f")
    (float_truncate:SF
    (match_operand:XF 1 "register_operand" "f,f,f")))
    (clobber (match_operand:SF 2 "memory_operand" "=X,m,m"))]
    3851,3857
    (set_attr "mode" "SF")])

    (define_insn "*truncxfsf2_i387"
    ! [(set (match_operand:SF 0 "nonimmediate_operand" "=m,?f,?r")
    (float_truncate:SF
    (match_operand:XF 1 "register_operand" "f,f,f")))
    (clobber (match_operand:SF 2 "memory_operand" "=X,m,m"))]

    3922,3928
    })

    (define_insn "*truncxfdf2_mixed"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f#rY,?r#fY,?Y#rf")
    (float_truncate:DF
    (match_operand:XF 1 "register_operand" "f,f,f,f")))
    (clobber (match_operand:DF 2 "memory_operand" "=X,m,m,m"))]
    3922,3928
    })

    (define_insn "*truncxfdf2_mixed"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f,?r,?Y")
    (float_truncate:DF
    (match_operand:XF 1 "register_operand" "f,f,f,f")))
    (clobber (match_operand:DF 2 "memory_operand" "=X,m,m,m"))]

    3949,3955
    (set_attr "mode" "DF")])

    (define_insn "*truncxfdf2_i387"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f#r,?r#f")
    (float_truncate:DF
    (match_operand:XF 1 "register_operand" "f,f,f")))
    (clobber (match_operand:DF 2 "memory_operand" "=X,m,m"))]
    3949,3955
    (set_attr "mode" "DF")])

    (define_insn "*truncxfdf2_i387"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=m,?f,?r")
    (float_truncate:DF
    (match_operand:XF 1 "register_operand" "f,f,f")))
    (clobber (match_operand:DF 2 "memory_operand" "=X,m,m"))]

    4423,4429
    "")

    (define_insn "*floatsisf2_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f#x,?f#x,x#f,x#f")
    (float:SF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_MIX_SSE_I387"
    "@
    4423,4429
    "")

    (define_insn "*floatsisf2_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f,?f,x,x")
    (float:SF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_MIX_SSE_I387"
    "@

    4466,4472
    "")

    (define_insn "*floatdisf2_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f#x,?f#x,x#f,x#f")
    (float:SF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_64BIT && TARGET_MIX_SSE_I387"
    "@
    4466,4472
    "")

    (define_insn "*floatdisf2_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f,?f,x,x")
    (float:SF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_64BIT && TARGET_MIX_SSE_I387"
    "@

    4534,4540
    "")

    (define_insn "*floatsidf2_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f#Y,?f#Y,Y#f,Y#f")
    (float:DF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_SSE2 && TARGET_MIX_SSE_I387"
    "@
    4534,4540
    "")

    (define_insn "*floatsidf2_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f,?f,Y,Y")
    (float:DF (match_operand:SI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_SSE2 && TARGET_MIX_SSE_I387"
    "@

    4577,4583
    "")

    (define_insn "*floatdidf2_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f#Y,?f#Y,Y#f,Y#f")
    (float:DF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_64BIT && TARGET_SSE2 && TARGET_MIX_SSE_I387"
    "@
    4577,4583
    "")

    (define_insn "*floatdidf2_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f,?f,Y,Y")
    (float:DF (match_operand:DI 1 "nonimmediate_operand" "m,r,r,mr")))]
    "TARGET_64BIT && TARGET_SSE2 && TARGET_MIX_SSE_I387"
    "@

    9383,9391
    "ix86_expand_fp_absneg_operator (ABS, SFmode, operands); DONE;")

    (define_insn "*absnegsf2_mixed"
    ! [(set (match_operand:SF 0 "nonimmediate_operand" "=x#f,x#f,f#x,rm")
    (match_operator:SF 3 "absneg_operator"
    ! [(match_operand:SF 1 "nonimmediate_operand" "0 ,x#f,0 ,0")]))
    (use (match_operand:V4SF 2 "nonimmediate_operand" "xm ,0 ,X ,X"))
    (clobber (reg:CC FLAGS_REG))]
    "TARGET_SSE_MATH && TARGET_MIX_SSE_I387
    9383,9391
    "ix86_expand_fp_absneg_operator (ABS, SFmode, operands); DONE;")

    (define_insn "*absnegsf2_mixed"
    ! [(set (match_operand:SF 0 "nonimmediate_operand" "=x,x,f,rm")
    (match_operator:SF 3 "absneg_operator"
    ! [(match_operand:SF 1 "nonimmediate_operand" "0 ,x,0 ,0")]))
    (use (match_operand:V4SF 2 "nonimmediate_operand" "xm ,0 ,X ,X"))
    (clobber (reg:CC FLAGS_REG))]
    "TARGET_SSE_MATH && TARGET_MIX_SSE_I387

    9479,9487
    "ix86_expand_fp_absneg_operator (ABS, DFmode, operands); DONE;")

    (define_insn "*absnegdf2_mixed"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=Y#f,Y#f,f#Y,rm")
    (match_operator:DF 3 "absneg_operator"
    ! [(match_operand:DF 1 "nonimmediate_operand" "0 ,Y#f,0 ,0")]))
    (use (match_operand:V2DF 2 "nonimmediate_operand" "Ym ,0 ,X ,X"))
    (clobber (reg:CC FLAGS_REG))]
    "TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387
    9479,9487
    "ix86_expand_fp_absneg_operator (ABS, DFmode, operands); DONE;")

    (define_insn "*absnegdf2_mixed"
    ! [(set (match_operand:DF 0 "nonimmediate_operand" "=Y,Y,f,rm")
    (match_operator:DF 3 "absneg_operator"
    ! [(match_operand:DF 1 "nonimmediate_operand" "0 ,Y,0 ,0")]))
    (use (match_operand:V2DF 2 "nonimmediate_operand" "Ym ,0 ,X ,X"))
    (clobber (reg:CC FLAGS_REG))]
    "TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387

    12723,12730
    (define_insn "*fp_jcc_1_mixed"
    [(set (pc)
    (if_then_else (match_operator 0 "comparison_operator"
    ! [(match_operand 1 "register_operand" "f#x,x#f")
    ! (match_operand 2 "nonimmediate_operand" "f#x,xm#f")])
    (label_ref (match_operand 3 "" ""))
    (pc)))
    (clobber (reg:CCFP FPSR_REG))
    12723,12730
    (define_insn "*fp_jcc_1_mixed"
    [(set (pc)
    (if_then_else (match_operator 0 "comparison_operator"
    ! [(match_operand 1 "register_operand" "f,x")
    ! (match_operand 2 "nonimmediate_operand" "f,xm")])
    (label_ref (match_operand 3 "" ""))
    (pc)))
    (clobber (reg:CCFP FPSR_REG))

    12768,12775
    (define_insn "*fp_jcc_2_mixed"
    [(set (pc)
    (if_then_else (match_operator 0 "comparison_operator"
    ! [(match_operand 1 "register_operand" "f#x,x#f")
    ! (match_operand 2 "nonimmediate_operand" "f#x,xm#f")])
    (pc)
    (label_ref (match_operand 3 "" ""))))
    (clobber (reg:CCFP FPSR_REG))
    12768,12775
    (define_insn "*fp_jcc_2_mixed"
    [(set (pc)
    (if_then_else (match_operator 0 "comparison_operator"
    ! [(match_operand 1 "register_operand" "f,x")
    ! (match_operand 2 "nonimmediate_operand" "f,xm")])
    (pc)
    (label_ref (match_operand 3 "" ""))))
    (clobber (reg:CCFP FPSR_REG))

    13906,13915
    ;; so use special patterns for add and mull.

    (define_insn "*fop_sf_comm_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f#x,x#f")
    (match_operator:SF 3 "binary_fp_operator"
    [(match_operand:SF 1 "nonimmediate_operand" "%0,0")
    ! (match_operand:SF 2 "nonimmediate_operand" "fm#x,xm#f")]))]
    "TARGET_MIX_SSE_I387
    && COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
    13906,13915
    ;; so use special patterns for add and mull.

    (define_insn "*fop_sf_comm_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f,x")
    (match_operator:SF 3 "binary_fp_operator"
    [(match_operand:SF 1 "nonimmediate_operand" "%0,0")
    ! (match_operand:SF 2 "nonimmediate_operand" "fm,xm")]))]
    "TARGET_MIX_SSE_I387
    && COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"

    13958,13964
    [(set (match_operand:SF 0 "register_operand" "=f,f,x")
    (match_operator:SF 3 "binary_fp_operator"
    [(match_operand:SF 1 "nonimmediate_operand" "0,fm,0")
    ! (match_operand:SF 2 "nonimmediate_operand" "fm,0,xm#f")]))]
    "TARGET_MIX_SSE_I387
    && !COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
    13958,13964
    [(set (match_operand:SF 0 "register_operand" "=f,f,x")
    (match_operator:SF 3 "binary_fp_operator"
    [(match_operand:SF 1 "nonimmediate_operand" "0,fm,0")
    ! (match_operand:SF 2 "nonimmediate_operand" "fm,0,xm")]))]
    "TARGET_MIX_SSE_I387
    && !COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"

    14052,14061
    (set_attr "mode" "<MODE>")])

    (define_insn "*fop_df_comm_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f#Y,Y#f")
    (match_operator:DF 3 "binary_fp_operator"
    [(match_operand:DF 1 "nonimmediate_operand" "%0,0")
    ! (match_operand:DF 2 "nonimmediate_operand" "fm#Y,Ym#f")]))]
    "TARGET_SSE2 && TARGET_MIX_SSE_I387
    && COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
    14052,14061
    (set_attr "mode" "<MODE>")])

    (define_insn "*fop_df_comm_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f,Y")
    (match_operator:DF 3 "binary_fp_operator"
    [(match_operand:DF 1 "nonimmediate_operand" "%0,0")
    ! (match_operand:DF 2 "nonimmediate_operand" "fm,Ym")]))]
    "TARGET_SSE2 && TARGET_MIX_SSE_I387
    && COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"

    14101,14110
    (set_attr "mode" "DF")])

    (define_insn "*fop_df_1_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f#Y,f#Y,Y#f")
    (match_operator:DF 3 "binary_fp_operator"
    [(match_operand:DF 1 "nonimmediate_operand" "0,fm,0")
    ! (match_operand:DF 2 "nonimmediate_operand" "fm,0,Ym#f")]))]
    "TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387
    && !COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"
    14101,14110
    (set_attr "mode" "DF")])

    (define_insn "*fop_df_1_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f,f,Y")
    (match_operator:DF 3 "binary_fp_operator"
    [(match_operand:DF 1 "nonimmediate_operand" "0,fm,0")
    ! (match_operand:DF 2 "nonimmediate_operand" "fm,0,Ym")]))]
    "TARGET_SSE2 && TARGET_SSE_MATH && TARGET_MIX_SSE_I387
    && !COMMUTATIVE_ARITH_P (operands[3])
    && (GET_CODE (operands[1]) != MEM || GET_CODE (operands[2]) != MEM)"

    14419,14426
    })

    (define_insn "*sqrtsf2_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f#x,x#f")
    ! (sqrt:SF (match_operand:SF 1 "nonimmediate_operand" "0#x,xm#f")))]
    "TARGET_USE_FANCY_MATH_387 && TARGET_MIX_SSE_I387"
    "@
    fsqrt
    14419,14426
    })

    (define_insn "*sqrtsf2_mixed"
    ! [(set (match_operand:SF 0 "register_operand" "=f,x")
    ! (sqrt:SF (match_operand:SF 1 "nonimmediate_operand" "0,xm")))]
    "TARGET_USE_FANCY_MATH_387 && TARGET_MIX_SSE_I387"
    "@
    fsqrt

    ***************
    *** 14457,14464 ****
    })

    (define_insn "*sqrtdf2_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f#Y,Y#f")
    ! (sqrt:DF (match_operand:DF 1 "nonimmediate_operand" "0#Y,Ym#f")))]
    "TARGET_USE_FANCY_MATH_387 && TARGET_SSE2 && TARGET_MIX_SSE_I387"
    "@
    fsqrt
    --- 14457,14464 ----
    })

    (define_insn "*sqrtdf2_mixed"
    ! [(set (match_operand:DF 0 "register_operand" "=f,Y")
    ! (sqrt:DF (match_operand:DF 1 "nonimmediate_operand" "0,Ym")))]
    "TARGET_USE_FANCY_MATH_387 && TARGET_SSE2 && TARGET_MIX_SSE_I387"
    "@
    fsqrt

    ***************
    *** 17921,17931 ****
    "if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")

    (define_insn "*movsfcc_1_387"
    ! [(set (match_operand:SF 0 "register_operand" "=f#r,f#r,r#f,r#f")
    (if_then_else:SF (match_operator 1 "fcmov_comparison_operator"
    [(reg FLAGS_REG) (const_int 0)])
    ! (match_operand:SF 2 "nonimmediate_operand" "f#r,0,rm#f,0")
    ! (match_operand:SF 3 "nonimmediate_operand" "0,f#r,0,rm#f")))]
    "TARGET_80387 && TARGET_CMOVE
    && (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
    "@
    --- 17921,17931 ----
    "if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")

    (define_insn "*movsfcc_1_387"
    ! [(set (match_operand:SF 0 "register_operand" "=f,f,r,r")
    (if_then_else:SF (match_operator 1 "fcmov_comparison_operator"
    [(reg FLAGS_REG) (const_int 0)])
    ! (match_operand:SF 2 "nonimmediate_operand" "f,0,rm,0")
    ! (match_operand:SF 3 "nonimmediate_operand" "0,f,0,rm")))]
    "TARGET_80387 && TARGET_CMOVE
    && (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
    "@

    ***************
    *** 17945,17955 ****
    "if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")

    (define_insn "*movdfcc_1"
    ! [(set (match_operand:DF 0 "register_operand" "=f#r,f#r,&r#f,&r#f")
    (if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
    [(reg FLAGS_REG) (const_int 0)])
    ! (match_operand:DF 2 "nonimmediate_operand" "f#r,0,rm#f,0")
    ! (match_operand:DF 3 "nonimmediate_operand" "0,f#r,0,rm#f")))]
    "!TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
    && (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
    "@
    --- 17945,17955 ----
    "if (! ix86_expand_fp_movcc (operands)) FAIL; DONE;")

    (define_insn "*movdfcc_1"
    ! [(set (match_operand:DF 0 "register_operand" "=f,f,&r,&r")
    (if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
    [(reg FLAGS_REG) (const_int 0)])
    ! (match_operand:DF 2 "nonimmediate_operand" "f,0,rm,0")
    ! (match_operand:DF 3 "nonimmediate_operand" "0,f,0,rm")))]
    "!TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
    && (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
    "@

    ***************
    *** 17961,17971 ****
    (set_attr "mode" "DF")])

    (define_insn "*movdfcc_1_rex64"
    ! [(set (match_operand:DF 0 "register_operand" "=f#r,f#r,r#f,r#f")
    (if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
    [(reg FLAGS_REG) (const_int 0)])
    ! (match_operand:DF 2 "nonimmediate_operand" "f#r,0#r,rm#f,0#f")
    ! (match_operand:DF 3 "nonimmediate_operand" "0#r,f#r,0#f,rm#f")))]
    "TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
    && (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
    "@
    --- 17961,17971 ----
    (set_attr "mode" "DF")])

    (define_insn "*movdfcc_1_rex64"
    ! [(set (match_operand:DF 0 "register_operand" "=f,f,r,r")
    (if_then_else:DF (match_operator 1 "fcmov_comparison_operator"
    [(reg FLAGS_REG) (const_int 0)])
    ! (match_operand:DF 2 "nonimmediate_operand" "f,0,rm,0")
    ! (match_operand:DF 3 "nonimmediate_operand" "0,f,0,rm")))]
    "TARGET_64BIT && TARGET_80387 && TARGET_CMOVE
    && (GET_CODE (operands[2]) != MEM || GET_CODE (operands[3]) != MEM)"
    "@
  • No.2 | | 1442 bytes | |

    On Jul 26, 2005, at 12:51 AM, Paolo Bonzini wrote:
    Dale Johannesen wrote:
    >With -march=pentium4 -mfpmath=sse , we get an extra move for code
    >like
    >double d = atof(foo);
    >int i = d;
    >call atof
    >fstpl -8(%ebp)
    >movsd -8(%ebp), %xmm0
    >cvttsd2si %xmm0, %eax
    >(This is Linux, Darwin is similar.) I think the difficulty is that
    >for
    >
    >(set (reg/v:DF 58 [ d ]) (reg:DF 8 st)) 64 {*movdf_nointeger}
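
    For anyone who wants to reproduce the quoted sequence, the two-line
    fragment can be wrapped in a small translation unit. This is only a
    sketch: the function name convert and the extra -O2, -m32 and -S
    options are assumptions, while -march=pentium4 -mfpmath=sse come from
    the original report.

```c
#include <stdlib.h>

/* Parse a decimal string and truncate the result to int, as in the
   quoted test case.  The function name "convert" is hypothetical.
   To inspect the generated assembly, something like the following
   should work (-O2, -m32 and -S are assumed, not from the report):
     gcc -O2 -m32 -march=pentium4 -mfpmath=sse -S convert.c */
int convert(const char *foo)
{
    double d = atof(foo);   /* on 32-bit x86 the double is returned in st(0) */
    int i = d;              /* double-to-int conversion truncates toward zero */
    return i;
}
```

    The interesting part is the hand-off between the two statements: the
    value arrives in an x87 register but the conversion is done with an
    SSE instruction, so it has to pass through memory.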


    Try the attached patch. It gave a 3% speedup on -mfpmath=sse for
    tramp3d. Richard Henderson asked for SPEC testing, then it may go in.

    Thanks. That's progress; the cost computation in regclass now figures
    out that memory is the fastest place to put R58:

    Register 58 costs: AD_REGS:87000 Q_REGS:87000 NON_Q_REGS:87000
    INDEX_REGS:87000 LEGACY_REGS:87000 GENERAL_REGS:87000 FP_TOP_REG:49000
    FP_SECOND_REG:50000 FLOAT_REGS:50000 SSE_REGS:50000
    FP_TOP_SSE_REGS:75000
    FP_SECOND_SSE_REGS:75000 FLOAT_SSE_REGS:75000 FLOAT_INT_REGS:87000
    INT_SSE_REGS:91000 FLOAT_INT_SSE_REGS:91000
    ALL_REGS:91000 MEM:40000

    Unfortunately local-alloc insists on putting it in a register anyway
    (ST(0) instead of an XMM, but the end codegen is unchanged):

    ;; Register 58 in 8.

    I think the RA may be missing the concept that memory might be faster
    than any possible register; I will dig further.
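
    The mismatch described above can be illustrated with a toy selection
    over the dumped costs. This is a sketch only, not GCC's regclass or
    local-alloc code: struct class_cost, r58_costs and cheapest_choice are
    hypothetical names, and the table holds just a subset of the dump
    (class names spelled as in GCC's i386 register classes).

```c
#include <assert.h>
#include <stddef.h>

/* One row of a cost table: a class name, its estimated cost, and
   whether the row stands for memory rather than a register class.
   This struct is hypothetical, not a GCC data structure. */
struct class_cost {
    const char *name;
    int cost;
    int is_memory;
};

/* A subset of the costs reported for register 58 in the dump. */
const struct class_cost r58_costs[] = {
    { "GENERAL_REGS",   87000, 0 },
    { "FP_TOP_REG",     49000, 0 },
    { "FP_SECOND_REG",  50000, 0 },
    { "FLOAT_REGS",     50000, 0 },
    { "SSE_REGS",       50000, 0 },
    { "FLOAT_SSE_REGS", 75000, 0 },
    { "ALL_REGS",       91000, 0 },
    { "MEM",            40000, 1 },
};
const size_t r58_ncosts = sizeof r58_costs / sizeof r58_costs[0];

/* Return the cheapest row.  When allow_memory is zero, rows marked as
   memory are skipped; NULL is returned if no row qualifies. */
const struct class_cost *
cheapest_choice(const struct class_cost *table, size_t n, int allow_memory)
{
    const struct class_cost *best = NULL;
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i].is_memory && !allow_memory)
            continue;
        if (best == NULL || table[i].cost < best->cost)
            best = &table[i];
    }
    return best;
}
```

    Calling cheapest_choice(r58_costs, r58_ncosts, 1) selects the MEM row,
    while passing 0 for allow_memory falls back to the cheapest register
    class in the table, which parallels an allocator that picks a register
    even though memory was costed lower.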
  • No.3 | | 616 bytes | |

    On Jul 26, 2005, at 3:34 PM, Dale Johannesen wrote:

    I think the RA may be missing the concept that memory might be faster
    than any possible register; I will dig further.

    Yes, it is. The following fixes my problem, and causes a couple of
    3DNow-specific regressions in the testsuite which I need to look at,
    but nothing serious; I think it's gotten far enough to post for
    opinions. This is intended to go on top of Paolo's patch.

    It may, of course, run afoul of inaccuracies in the patterns on various
    targets; I haven't tried any performance testing yet.
  • No.4 | | 600 bytes | |

    On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:
    Yes, it is. The following fixes my problem, and causes a couple of
    3DNow-specific regressions
    in the testsuite which I need to look at, but nothing serious; I think
    it's gotten far enough to post
    for opinions. This is intended to go on top of Paolo's patch.

    It may, of course, run afoul of inaccuracies in the patterns on various
    targets, haven't tried any performance testing yet.

    Looks plausible. Let us know what you wind up with wrt those
    regressions and testing.

    r~
  • No.5 | | 991 bytes | |

    On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote:

    On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:
    >Yes, it is. The following fixes my problem, and causes a couple of
    >3DNow-specific regressions
    >in the testsuite which I need to look at, but nothing serious; I think
    >it's gotten far enough to post
    >for opinions. This is intended to go on top of Paolo's patch.
    >
    >It may, of course, run afoul of inaccuracies in the patterns on
    >various
    >targets, haven't tried any performance testing yet.
    >

    Looks plausible. Let us know what you wind up with wrt those
    regressions and testing.

    With the latest version of Paolo's patch (in PR 19653) the regressions
    are gone. SPEC is going to take a bit longer; I haven't gotten GMP to
    build yet on x86 Darwin. Since the FP benchmarks are the interesting
    ones for this, I should work through it.
  • No.6 | | 1136 bytes | |

    On Jul 27, 2005, at 2:18 PM, Richard Henderson wrote:
    On Tue, Jul 26, 2005 at 11:10:56PM -0700, Dale Johannesen wrote:
    >Yes, it is. The following fixes my problem, and causes a couple of
    >3DNow-specific regressions
    >in the testsuite which I need to look at, but nothing serious; I think
    >it's gotten far enough to post
    >for opinions. This is intended to go on top of Paolo's patch.
    >
    >It may, of course, run afoul of inaccuracies in the patterns on
    >various
    >targets, haven't tried any performance testing yet.
    >

    Looks plausible. Let us know what you wind up with wrt those
    regressions and testing.

    OK, I've tested this on Darwin x86 (both patches together). No
    regressions.
    I don't think I ought to publish absolute SPEC numbers for this
    machine, but I get +1% on FP and +1/2% on Int. Wins: applu +3%,
    lucas +10%, eon +3%. Losses: apsi -9%. All other changes under 2%.
    This looks OK to me, though I'll be investigating apsi.
    (Paolo and Richard Guenther are doing this for Linux.)
