  • New: missed optimization with -ftree-vectorize

    The following short testcase gets vectorized with 4.1.1 but not with 4.2.0
    (revision 114610):
    template <class T>
    class vec
    {
    public:
      vec(unsigned int n) : size_(n)
      {
        data_ = new T[n];
      }
      vec& multiply(const vec& other)
      {
        const T* op=other.data_;
        for (unsigned int i=0; i<size_; ++i) {
          data_[i] *= op[i];
        }
        return *this;
      }
    private:
      unsigned int size_;
      T* data_;
    };
    template class vec<float>;
    /usr/local/4.2/bin/g++4.2.0 -ftree-vectorize -ftree-vectorizer-verbose=7 -march=pentium-m -c vectorizer.cpp
    vectorizer.cpp:16: note: analyze_loop_nest
    vectorizer.cpp:16: note: vect_analyze_loop_form
    vectorizer.cpp:16: note: split exit edge.
    vectorizer.cpp:16: note: get_loop_niters
    vectorizer.cpp:16: note: get_loop_niters:D.2376_16
    vectorizer.cpp:16: note: Symbolic number of iterations is D.2376_16
    vectorizer.cpp:16: note: vect_analyze_data_refs
    vectorizer.cpp:16: note: get vectype with 4 units of type float
    vectorizer.cpp:16: note: vectype: vector float
    vectorizer.cpp:16: note: get vectype with 4 units of type const float
    vectorizer.cpp:16: note: vectype: const vector float
    vectorizer.cpp:16: note: get vectype with 4 units of type float
    vectorizer.cpp:16: note: vectype: vector float
    vectorizer.cpp:16: note: vect_analyze_scalar_cycles
    vectorizer.cpp:16: note: Analyze phi: SMT.6_28 = PHI <SMT.6_27(5), SMT.6_26(3)>;
    vectorizer.cpp:16: note: virtual phi. skip.
    vectorizer.cpp:16: note: Analyze phi: i_4 = PHI <i_23(5), 0(3)>;
    vectorizer.cpp:16: note: Access function of PHI: {0, +, 1}_1
    vectorizer.cpp:16: note: step: 1, init: 0
    vectorizer.cpp:16: note: Detected induction.
    vectorizer.cpp:16: note: vect_pattern_recog
    vectorizer.cpp:16: note:
    vectorizer.cpp:16: note: init: phi relevant? SMT.6_28 = PHI <SMT.6_27(5), SMT.6_26(3)>;
    vectorizer.cpp:16: note: init: phi relevant? i_4 = PHI <i_23(5), 0(3)>;
    vectorizer.cpp:16: note: init: stmt relevant? <L0>:
    vectorizer.cpp:16: note: init: stmt relevant? D.2378_9 = pretmp.24_1
    vectorizer.cpp:16: note: init: stmt relevant? D.2379_10 = i_4 * 4
    vectorizer.cpp:16: note: init: stmt relevant? D.2380_11 = (float *) D.2379_10
    vectorizer.cpp:16: note: init: stmt relevant? D.2381_12 = pretmp.24_1 + D.2380_11
    vectorizer.cpp:16: note: init: stmt relevant? D.2382_17 = *D.2381_12
    vectorizer.cpp:16: note: init: stmt relevant? D.2383_19 = (const float *) D.2379_10
    vectorizer.cpp:16: note: init: stmt relevant? D.2384_20 = D.2383_19 + op_3
    vectorizer.cpp:16: note: init: stmt relevant? D.2385_21 = *D.2384_20
    vectorizer.cpp:16: note: init: stmt relevant? D.2386_22 = D.2382_17 * D.2385_21
    vectorizer.cpp:16: note: init: stmt relevant? *D.2381_12 = D.2386_22
    vectorizer.cpp:16: note: vec_stmt_relevant_p: stmt has vdefs.
    vectorizer.cpp:16: note: mark relevant 1, live 0.
    vectorizer.cpp:16: note: init: stmt relevant? i_23 = i_4 + 1
    vectorizer.cpp:16: note: init: stmt relevant? if (D.2376_16 i_23) goto <L9>; else goto <L12>;
    vectorizer.cpp:16: note: init: stmt relevant? <L9>:
    vectorizer.cpp:16: note: worklist: examine stmt: *D.2381_12 = D.2386_22
    vectorizer.cpp:16: note: vect_is_simple_use: operand D.2386_22
    vectorizer.cpp:16: note: def_stmt: D.2386_22 = D.2382_17 * D.2385_21
    vectorizer.cpp:16: note: type of def: 2.
    vectorizer.cpp:16: note: worklist: examine use 2: D.2386_22
    vectorizer.cpp:16: note: mark relevant 1, live 0.
    vectorizer.cpp:16: note: worklist: examine stmt: D.2386_22 = D.2382_17 * D.2385_21
    vectorizer.cpp:16: note: vect_is_simple_use: operand D.2382_17
    vectorizer.cpp:16: note: def_stmt: D.2382_17 = *D.2381_12
    vectorizer.cpp:16: note: type of def: 2.
    vectorizer.cpp:16: note: worklist: examine use 2: D.2382_17
    vectorizer.cpp:16: note: mark relevant 1, live 0.
    vectorizer.cpp:16: note: vect_is_simple_use: operand D.2385_21
    vectorizer.cpp:16: note: def_stmt: D.2385_21 = *D.2384_20
    vectorizer.cpp:16: note: type of def: 2.
    vectorizer.cpp:16: note: worklist: examine use 2: D.2385_21
    vectorizer.cpp:16: note: mark relevant 1, live 0.
    vectorizer.cpp:16: note: worklist: examine stmt: D.2385_21 = *D.2384_20
    vectorizer.cpp:16: note: worklist: examine stmt: D.2382_17 = *D.2381_12
    vectorizer.cpp:16: note:
    vectorizer.cpp:16: note:
    vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
    vectorizer.cpp:16: note:
    vectorizer.cpp:16: note: Unknown alignment for access: *op_3
    vectorizer.cpp:16: note:
    vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
    vectorizer.cpp:16: note:
    vectorizer.cpp:16: note: examining statement: <L0>:
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: D.2378_9 = pretmp.24_1
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: D.2379_10 = i_4 * 4
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: D.2380_11 = (float *) D.2379_10
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: D.2381_12 = pretmp.24_1 + D.2380_11
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: D.2382_17 = *D.2381_12
    vectorizer.cpp:16: note: vectype: vector float
    vectorizer.cpp:16: note: nunits = 4
    vectorizer.cpp:16: note: examining statement: D.2383_19 = (const float *) D.2379_10
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: D.2384_20 = D.2383_19 + op_3
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: D.2385_21 = *D.2384_20
    vectorizer.cpp:16: note: vectype: const vector float
    vectorizer.cpp:16: note: nunits = 4
    vectorizer.cpp:16: note: examining statement: D.2386_22 = D.2382_17 * D.2385_21
    vectorizer.cpp:16: note: get vectype for scalar type: float
    vectorizer.cpp:16: note: get vectype with 4 units of type float
    vectorizer.cpp:16: note: vectype: vector float
    vectorizer.cpp:16: note: vectype: vector float
    vectorizer.cpp:16: note: nunits = 4
    vectorizer.cpp:16: note: examining statement: *D.2381_12 = D.2386_22
    vectorizer.cpp:16: note: vectype: vector float
    vectorizer.cpp:16: note: nunits = 4
    vectorizer.cpp:16: note: examining statement: i_23 = i_4 + 1
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: if (D.2376_16 i_23) goto <L9>; else goto <L12>;
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: examining statement: <L9>:
    vectorizer.cpp:16: note: skip.
    vectorizer.cpp:16: note: vect_analyze_dependences
    vectorizer.cpp:16: note: dependence distance = 0.
    vectorizer.cpp:16: note: accesses have the same alignment.
    vectorizer.cpp:16: note: dependence distance modulo vf == 0 between *D.2381_12 and *D.2381_12
    vectorizer.cpp:16: note: not vectorized: can't determine dependence between *D.2384_20 and *D.2381_12
    vectorizer.cpp:16: note: bad data dependence.
    vectorizer.cpp:16: note: vectorized 0 loops in function.
    The workaround with "op" is not needed with the current autovect-branch BTW.
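The last lines of the 4.2.0 log come from the dependence test: the store and load through *D.2381_12 have a known distance of 0 and are accepted, while the pair *D.2384_20 / *D.2381_12 (the op and data_ accesses) has no computable distance, so the loop is rejected. Below is a minimal, conservative sketch of that kind of decision, assuming a vectorization factor given in elements and distances measured in iterations. The names `DepVerdict` and `classify_dependence` are hypothetical; this is an illustration of the rule, not GCC's implementation (it needs C++17 for `std::optional`).

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Hypothetical verdicts for one pair of memory accesses in a loop.
enum class DepVerdict { Safe, Unsafe, Unknown };

// Conservative check for vectorizing with `vf` lanes. `distance` is the
// dependence distance in iterations, or std::nullopt when it cannot be
// computed (e.g. two unrelated pointers such as data_ and op that might
// still point into the same array).
DepVerdict classify_dependence(std::optional<int> distance, int vf) {
    assert(vf > 0);
    if (!distance)
        return DepVerdict::Unknown;  // "can't determine dependence"
    const int d = std::abs(*distance);
    if (d == 0)
        return DepVerdict::Safe;     // same element in the same iteration
    if (d >= vf)
        return DepVerdict::Safe;     // the two accesses never share a vector
    return DepVerdict::Unsafe;       // dependence would cross lanes of one vector
}
```

Under this rule, a loop with an Unknown pair is left scalar, which is consistent with 4.2.0 declining to vectorize the testcase and with Comment #1 suggesting that 4.1.1 vectorizing it is wrong code.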
  • No.1

    Comment #1 from pinskia at gcc dot gnu dot org 2006-06-14 13:55
    Actually, I think 4.1.x produces wrong code here.
  • No.2

    Comment #2 from pinskia at gcc dot gnu dot org 2006-06-14 14:05
    The code is basically the same as:
    void multiply(float *data_, const float *op, unsigned int size_)
    {
      for (unsigned int i=0; i<size_; ++i)
        data_[i] *= op[i];
    }

    Now suppose op is data_ + 3 and size_ is 6: we will get the wrong answer,
    as there will be no feedback in the vectorized loop.

    Anyway, this is a 4.1 bug that is already fixed in 4.2.0.
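The "no feedback" argument from Comment #2 can be checked by modelling the two schedules by hand. This is a sketch under stated assumptions, not GCC output: `multiply_vf4_no_alias_check` is a hypothetical name for a model of a 4-wide vectorized loop that loads all four lanes of a chunk before storing any of them, followed by a scalar epilogue. The example assumes the arrays overlap so that op trails data_ by three elements (data_ == op + 3, size_ == 6), an overlap in which the scalar loop feeds earlier results back into later iterations.

```cpp
#include <cassert>

// Scalar reference: the loop from Comment #2, one element at a time.
void multiply_scalar(float *data_, const float *op, unsigned int size_) {
    for (unsigned int i = 0; i < size_; ++i)
        data_[i] *= op[i];
}

// Hypothetical model of a 4-wide vectorized loop compiled under the (wrong)
// assumption that data_ and op never overlap: all lanes of a chunk are loaded
// before any lane is stored, then a scalar epilogue handles the remainder.
void multiply_vf4_no_alias_check(float *data_, const float *op,
                                 unsigned int size_) {
    constexpr unsigned int VF = 4;
    unsigned int i = 0;
    for (; i + VF <= size_; i += VF) {
        float d[VF], o[VF];
        for (unsigned int l = 0; l < VF; ++l) {  // "vector" loads
            d[l] = data_[i + l];
            o[l] = op[i + l];
        }
        for (unsigned int l = 0; l < VF; ++l)    // "vector" multiply and store
            data_[i + l] = d[l] * o[l];
    }
    for (; i < size_; ++i)                       // scalar epilogue
        data_[i] *= op[i];
}
```

With every element of two 9-element arrays `a` and `b` initialised to 2.0f, calling `multiply_scalar(a + 3, a, 6)` stores 4, 4, 4, 8, 8, 8 into elements 3..8, while `multiply_vf4_no_alias_check(b + 3, b, 6)` stores 4, 4, 4, 4, 8, 8: element 6 is computed from the stale value of element 3, because the whole chunk was loaded before element 3 was written back.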
  • No.3

    Comment #3 from rguenth at gcc dot gnu dot org 2006-07-10 13:24
    Confirmed. In 4.1, the data-refs have the wrong memtag associated:

    Created dr for *D.2061_7
    base_address: data__6
    offset from base address: 0
    constant offset from base address: 0
    base_object:
    step: 4B
    misalignment from base: 0B
    aligned to: 4
    memtag: TMT.5

    Created dr for *D.2064_15
    base_address: op_14
    offset from base address: 0
    constant offset from base address: 0
    base_object:
    step: 4B
    misalignment from base: 0B
    aligned to: 4
    memtag: TMT.6

    after ifcvt:

    # TMT.6_22 = PHI <TMT.6_21(3), TMT.6_20(1)>;
    # i_2 = PHI <i_18(3), 0(1)>;
    <L0>:;
    D.2059_4 = i_2 * 4;
    D.2060_5 = (float *) D.2059_4;
    D.2061_7 = D.2060_5 + data__6;
    # VUSE <TMT.6_22>;
    D.2062_11 = *D.2061_7;
    D.2063_13 = (const float *) D.2059_4;
    D.2064_15 = D.2063_13 + op_14;
    # VUSE <TMT.6_22>;
    D.2065_16 = *D.2064_15;
    D.2066_17 = D.2062_11 * D.2065_16;
    # TMT.6_21 = V_MAY_DEF <TMT.6_22>;
    *D.2061_7 = D.2066_17;
    i_18 = i_2 + 1;
    if (size__3 i_18) goto <L8>; else goto <L2>;

    <L8>:;
    goto <bb 2(<L0>);

    No idea where that TMT.5 comes from (it's from the const qualifier, but
    the vectorizer makes this up itself).
  • No.4

    Comment #6 from rguenth at gcc dot gnu dot org 2006-07-21 11:00
    Backporting

    Author: dberlin
    Date: Wed Feb 15 22:09:45 2006
    New Revision: 111120

    URL:
    Log:
    2006-02-15 Daniel Berlin <dberlin (AT) dberlin (DOT) org>

    * tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READONLY
    specially here.

    causes this problem to go away. For reference:

    Index: gcc/tree-ssa-alias.c

    --- gcc/tree-ssa-alias.c (revision 115613)
    +++ gcc/tree-ssa-alias.c (working copy)
    @@ -1818,8 +1818,7 @@ get_tmt_for (tree ptr, struct alias_info
    {
    struct alias_map_d *curr = ai->pointers[i];
    tree curr_tag = var_ann (curr->var)->type_mem_tag;
    - if (tag_set == curr->set
    - && TYPE_READONLY (tag_type) == TYPE_READONLY (TREE_TYPE (curr_tag)))
    + if (tag_set == curr->set)
    {
    tag = curr_tag;
    break;
    @@ -1856,10 +1855,6 @@ get_tmt_for (tree ptr, struct alias_info
    pointed-to type. */
    gcc_assert (tag_set == get_alias_set (tag));
    - /* If PTR's pointed-to type is read-only, then TAG's type must also
    - be read-only. */
    - gcc_assert (TYPE_READONLY (tag_type) == TYPE_READONLY (TREE_TYPE (tag)));
    -
    return tag;
    }

    I'm going to bootstrap and test that backport.
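The effect of the change to get_tmt_for can be illustrated with a toy model of type memory tags. Everything here is hypothetical and far simpler than tree-ssa-alias.c: `PointedToType`, `TagTable` and `may_alias` are invented names, and the model assumes two dereferences are treated as possibly aliasing only when they carry the same tag. Keying the tag on the alias set plus the read-only flag (the removed TYPE_READONLY comparison) gives float * and const float * different tags, like TMT.5 and TMT.6 in the 4.1 dump quoted in Comment #5; keying it on the alias set alone gives them one shared tag, like SMT.4 on the mainline.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical description of a pointer's pointed-to type.
struct PointedToType {
    int alias_set;   // float and const float share an alias set
    bool read_only;  // true for const float
};

// Hands out one memory tag per key. With key_on_read_only == true the key is
// (alias set, read-only flag), mirroring the removed TYPE_READONLY test;
// with false the key is the alias set alone, mirroring the fixed code.
class TagTable {
public:
    explicit TagTable(bool key_on_read_only)
        : key_on_read_only_(key_on_read_only) {}

    int tag_for(const PointedToType &t) {
        const std::pair<int, bool> key(t.alias_set,
                                       key_on_read_only_ && t.read_only);
        auto it = tags_.find(key);
        if (it != tags_.end())
            return it->second;  // reuse the tag already created for this key
        const int tag = next_tag_++;
        tags_.emplace(key, tag);
        return tag;
    }

private:
    bool key_on_read_only_;
    int next_tag_ = 0;
    std::map<std::pair<int, bool>, int> tags_;
};

// Simplified oracle: two accesses may alias only if they share a tag.
bool may_alias(TagTable &table, const PointedToType &a, const PointedToType &b) {
    return table.tag_for(a) == table.tag_for(b);
}
```

In this model, a table built with `TagTable(true)` reports that a float access and a const float access in the same alias set cannot alias, which is the assumption that lets the data_ and op accesses be reordered; a table built with `TagTable(false)` makes them share a tag, so they must be treated as possibly overlapping.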
  • No.5

    Comment #5 from rguenth at gcc dot gnu dot org 2006-07-21 10:41
    On the mainline we produce

    Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
    Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4

    while 4.1 branch does

    Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
    Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6

    Anyone's bell ringing?
  • No.6

    Comment #4 from rguenth at gcc dot gnu dot org 2006-07-21 10:35
    This looks like a data-ref bug or an aliasing issue. The data-ref
    analysis gets, for the statement

    (gdb) call debug_generic_expr (stmt)
    # VUSE <TMT.8D.2162_28>;
    D.2129_17 = *D.2128_12

    as it calls get_var_ann on D.2128_12:

    (gdb) print *$23
    $24 = {common = {type = VAR_ANN, aux = 0x0, value_handle = 0x0},
    out_of_ssa_tag = 0, root_var_processed = 0, mem_tag_kind = NT_A_TAG,
    is_alias_tag = 0, used = 0, need_phi_state = NEED_PHI_STATE_MAYBE,
    in_vuse_list = 0, in_v_may_def_list = 0, type_mem_tag = 0xb7d8c5d8,
    may_aliases = 0x0, partition = 0, root_index = 0, default_def = 0x0,
    current_def = 0x0, reference_vars_info = 0x0, subvars = 0x0}
    (gdb) call debug_generic_expr ($23->type_mem_tag)
    TMT.7D.2161

    So there is a discrepancy between the VUSE (which is correct) and the
    type_mem_tag on the variable (const vs. non-const type).
  • No.7

    Comment #7 from patchapp at dberlin dot org 2006-07-21 12:25
    Subject: Bug number PR28029
    A patch for this bug has been added to the patch tracker.
    The mailing list url for the patch is
  • No.8

    Comment #8 from dberlin at gcc dot gnu dot org 2006-07-22 13:30
    Subject: Re: [4.1 Regression] wrong optimization with -ftree-vectorize

    rguenth at gcc dot gnu dot org wrote:
    > Comment #5 from rguenth at gcc dot gnu dot org 2006-07-21 10:41
    > On the mainline we produce
    >
    > Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
    > Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4
    >
    > while 4.1 branch does
    >
    > Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
    > Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6
    >
    > Anyone's bell ringing?

    Well, the symbol difference was likely caused by the TYPE_READONLY
    comparison that ensured that read-only types got their own SMT.
    However, doing *that* was simply wrong, even though it worked the
    majority of the time :).

    , I see you figured that out :)
  • No.9

    Comment #9 from rguenth at gcc dot gnu dot org 2006-07-24 08:19
    Fixed.

    Comment #10 from rguenth at gcc dot gnu dot org 2006-07-24 08:19
    Subject: Bug 28029

    Author: rguenth
    Date: Mon Jul 24 08:18:51 2006
    New Revision: 115708

    URL:
    Log:
    2006-07-24 Richard Guenther <rguenther (AT) suse (DOT) de>

    PR tree-optimization/28029
    Backport
    2006-02-15 Daniel Berlin <dberlin (AT) dberlin (DOT) org>

    * tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READONLY
    specially here.

    * gcc.dg/vect/pr28029.c: New testcase.

    Added:

    Modified:
