New: missed optimization with -ftree-vectorize
10 answers - 7309 bytes -

The following short testcase gets vectorized with 4.1.1 and doesn't with 4.2.0
revision 114610
template <class T>
class vec
{
public:
vec(unsigned int n) : size_(n)
{
data_ = new T[n];
}
vec& multiply(const vec& other)
{
const T* op=other.data_;
for (unsigned int i=0; i<size_; ++i) {
data_[i] *= op[i];
}
return *this;
}
private:
unsigned int size_;
T* data_;
};
template class vec<float>;
/usr/local/4.2/bin/g++4.2.0 -ftree-vectorize -ftree-vectorizer-verbose=7
-march=pentium-m -c vectorizer.cpp
vectorizer.cpp:16: note: analyze_loop_nest
vectorizer.cpp:16: note: vect_analyze_loop_form
vectorizer.cpp:16: note: split exit edge.
vectorizer.cpp:16: note: get_loop_niters
vectorizer.cpp:16: note: get_loop_niters:D.2376_16
vectorizer.cpp:16: note: Symbolic number of iterations is D.2376_16
vectorizer.cpp:16: note: vect_analyze_data_refs
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: get vectype with 4 units of type const float
vectorizer.cpp:16: note: vectype: const vector float
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: vect_analyze_scalar_cycles
vectorizer.cpp:16: note: Analyze phi: SMT.6_28 = PHI <SMT.6_27(5),
SMT.6_26(3)>;
vectorizer.cpp:16: note: virtual phi. skip.
vectorizer.cpp:16: note: Analyze phi: i_4 = PHI <i_23(5), 0(3)>;
vectorizer.cpp:16: note: Access function of PHI: {0, +, 1}_1
vectorizer.cpp:16: note: step: 1, init: 0
vectorizer.cpp:16: note: Detected induction.
vectorizer.cpp:16: note: vect_pattern_recog
vectorizer.cpp:16: note:
vectorizer.cpp:16: note: init: phi relevant? SMT.6_28 = PHI <SMT.6_27(5),
SMT.6_26(3)>;
vectorizer.cpp:16: note: init: phi relevant? i_4 = PHI <i_23(5), 0(3)>;
vectorizer.cpp:16: note: init: stmt relevant? <L0>:
vectorizer.cpp:16: note: init: stmt relevant? D.2378_9 = pretmp.24_1
vectorizer.cpp:16: note: init: stmt relevant? D.2379_10 = i_4 * 4
vectorizer.cpp:16: note: init: stmt relevant? D.2380_11 = (float *) D.2379_10
vectorizer.cpp:16: note: init: stmt relevant? D.2381_12 = pretmp.24_1 +
D.2380_11
vectorizer.cpp:16: note: init: stmt relevant? D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: init: stmt relevant? D.2383_19 = (const float *)
D.2379_10
vectorizer.cpp:16: note: init: stmt relevant? D.2384_20 = D.2383_19 + op_3
vectorizer.cpp:16: note: init: stmt relevant? D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: init: stmt relevant? D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: init: stmt relevant? *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vec_stmt_relevant_p: stmt has vdefs.
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: init: stmt relevant? i_23 = i_4 + 1
vectorizer.cpp:16: note: init: stmt relevant? if (D.2376_16 i_23) goto <L9>;
else goto <L12>;
vectorizer.cpp:16: note: init: stmt relevant? <L9>:
vectorizer.cpp:16: note: worklist: examine stmt: *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2386_22
vectorizer.cpp:16: note: def_stmt: D.2386_22 = D.2382_17 * D.2385_21
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2386_22
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: worklist: examine stmt: D.2386_22 = D.2382_17 *
D.2385_21
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2382_17
vectorizer.cpp:16: note: def_stmt: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2382_17
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: vect_is_simple_use: operand D.2385_21
vectorizer.cpp:16: note: def_stmt: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: type of def: 2.
vectorizer.cpp:16: note: worklist: examine use 2: D.2385_21
vectorizer.cpp:16: note: mark relevant 1, live 0.
vectorizer.cpp:16: note: worklist: examine stmt: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: worklist: examine stmt: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note:
vectorizer.cpp:16: note:
vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
vectorizer.cpp:16: note:
vectorizer.cpp:16: note: Unknown alignment for access: *op_3
vectorizer.cpp:16: note:
vectorizer.cpp:16: note: Unknown alignment for access: *pretmp.24_1
vectorizer.cpp:16: note:
vectorizer.cpp:16: note: examining statement: <L0>:
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: D.2378_9 = pretmp.24_1
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: D.2379_10 = i_4 * 4
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: D.2380_11 = (float *)
D.2379_10
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: D.2381_12 = pretmp.24_1 +
D.2380_11
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: D.2382_17 = *D.2381_12
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: examining statement: D.2383_19 = (const float *)
D.2379_10
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: D.2384_20 = D.2383_19 + op_3
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: D.2385_21 = *D.2384_20
vectorizer.cpp:16: note: vectype: const vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: examining statement: D.2386_22 = D.2382_17 *
D.2385_21
vectorizer.cpp:16: note: get vectype for scalar type: float
vectorizer.cpp:16: note: get vectype with 4 units of type float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: examining statement: *D.2381_12 = D.2386_22
vectorizer.cpp:16: note: vectype: vector float
vectorizer.cpp:16: note: nunits = 4
vectorizer.cpp:16: note: examining statement: i_23 = i_4 + 1
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: if (D.2376_16 i_23) goto
<L9>; else goto <L12>;
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: examining statement: <L9>:
vectorizer.cpp:16: note: skip.
vectorizer.cpp:16: note: vect_analyze_dependences
vectorizer.cpp:16: note: dependence distance = 0.
vectorizer.cpp:16: note: accesses have the same alignment.
vectorizer.cpp:16: note: dependence distance modulo vf == 0 between *D.2381_12
and *D.2381_12
vectorizer.cpp:16: note: not vectorized: can't determine dependence between
*D.2384_20 and *D.2381_12
vectorizer.cpp:16: note: bad data dependence.
vectorizer.cpp:16: note: vectorized 0 loops in function.
The workaround with "op" is not needed with the current autovect-branch BTW.
No.1 | | 115 bytes |
| 
Comment #1 from pinskia at gcc dot gnu dot org 2006-06-14 13:55
Actually I think this is wrong code with 4.1.x.
No.2 | | 440 bytes |
| 
Comment #2 from pinskia at gcc dot gnu dot org 2006-06-14 14:05
The code is basicially the same as:
void multiply(float *data_, const float *op, unsigned int size_)
{
for (unsigned int i=0; i<size_; ++i)
data_[i] *= op[i];
}
And what happens is op is data_ + 3 and size_ is 6, we will get the wrong
answer as there will be no feedback in the loop.
Anyways this is a 4.1 bug fixed already in 4.2.0
No.3 | | 1375 bytes |
| 
Comment #3 from rguenth at gcc dot gnu dot org 2006-07-10 13:24
Confirmed. In 4.1, the data-refs have the wrong memtag associated:
Created dr for *D.2061_7
base_address: data__6
offset from base address: 0
constant offset from base address: 0
base_object:
step: 4B
misalignment from base: 0B
aligned to: 4
memtag: TMT.5
Created dr for *D.2064_15
base_address: op_14
offset from base address: 0
constant offset from base address: 0
base_object:
step: 4B
misalignment from base: 0B
aligned to: 4
memtag: TMT.6
after ifcvt:
# TMT.6_22 = PHI <TMT.6_21(3), TMT.6_20(1)>;
# i_2 = PHI <i_18(3), 0(1)>;
<L0>:;
D.2059_4 = i_2 * 4;
D.2060_5 = (float *) D.2059_4;
D.2061_7 = D.2060_5 + data__6;
# VUSE <TMT.6_22>;
D.2062_11 = *D.2061_7;
D.2063_13 = (const float *) D.2059_4;
D.2064_15 = D.2063_13 + op_14;
# VUSE <TMT.6_22>;
D.2065_16 = *D.2064_15;
D.2066_17 = D.2062_11 * D.2065_16;
# TMT.6_21 = V_MAY_DEF <TMT.6_22>;
*D.2061_7 = D.2066_17;
i_18 = i_2 + 1;
if (size__3 i_18) goto <L8>; else goto <L2>;
<L8>:;
goto <bb 2(<L0>);
no idea where that TMT.5 comes from (it's from the const qualifier, but
the vectorizer makes this up itself).
No.4 | | 1284 bytes |
| 
Comment #6 from rguenth at gcc dot gnu dot org 2006-07-21 11:00
Backporting
Author: dberlin
Date: Wed Feb 15 22:09:45 2006
New Revision: 111120
URL:
Log:
2006-02-15 Daniel Berlin <dberlin (AT) dberlin (DOT) org>
* tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READNLY
specially here.
causes this problem to go away. For reference:
Index: gcc/tree-ssa-alias.c
gcc/tree-ssa-alias.c (revision 115613)
gcc/tree-ssa-alias.c (working copy)
@@ -1818,8 +1818,7 @@ get_tmt_for (tree ptr, struct alias_info
{
struct alias_map_d *curr = ai->pointers[i];
tree curr_tag = var_ann (curr->var)->type_mem_tag;
- if (tag_set == curr->set
- && TYPE_READNLY (tag_type) == TYPE_READNLY (TREE_TYPE (curr_tag)))
+ if (tag_set == curr->set)
{
tag = curr_tag;
break;
@@ -1856,10 +1855,6 @@ get_tmt_for (tree ptr, struct alias_info
pointed-to type. */
gcc_assert (tag_set == get_alias_set (tag));
- /* If PTR's pointed-to type is read-only, then TAG's type must also
- be read-only. */
- gcc_assert (TYPE_READNLY (tag_type) == TYPE_READNLY (TREE_TYPE (tag)));
-
return tag;
}
I'm going to bootstrap and test that backport.
No.5 | | 426 bytes |
| 
Comment #5 from rguenth at gcc dot gnu dot org 2006-07-21 10:41
the mainline we produce
Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4
while 4.1 branch does
Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6
anyones bell ringing?
No.6 | | 956 bytes |
| 
Comment #4 from rguenth at gcc dot gnu dot org 2006-07-21 10:35
This looks like a data-ref bug or an aliasing issue.
get's for the statement
(gdb) call debug_generic_expr (stmt)
# VUSE <TMT.8D.2162_28>;
D.2129_17 = *D.2128_12
as it calls get_var_ann on D.2128_12:
(gdb) print *$23
$24 = {common = {type = VAR_ANN, aux = 0x0, value_handle = 0x0},
out_of_ssa_tag = 0, root_var_processed = 0, mem_tag_kind = NT_A_TAG,
is_alias_tag = 0, used = 0, need_phi_state = NEED_PHI_STATE_MAYBE,
in_vuse_list = 0, in_v_may_def_list = 0, type_mem_tag = 0xb7d8c5d8,
may_aliases = 0x0, partition = 0, root_index = 0, default_def = 0x0,
current_def = 0x0, reference_vars_info = 0x0, subvars = 0x0}
(gdb) call debug_generic_expr ($23->type_mem_tag)
TMT.7D.2161
so there is a discrepancy between the VUSE (which is correct) and the
type_mem_tag on the variable. (const vs. non-const type)
No.7 | | 198 bytes |
| 
Comment #7 from patchapp at dberlin dot org 2006-07-21 12:25
Subject: Bug number PR28029
A patch for this bug has been added to the patch tracker.
The mailing list url for the patch is
No.8 | | 890 bytes |
| 
Comment #8 from dberlin at gcc dot gnu dot org 2006-07-22 13:30
Subject: Re: [4.1 Regression] wrong optimization
with -ftree-vectorize
rguenth at gcc dot gnu dot org wrote:
Comment #5 from rguenth at gcc dot gnu dot org 2006-07-21 10:41
the mainline we produce
Variable: D.1848, UID 1848, float *, symbol memory tag: SMT.4
Variable: D.1851, UID 1851, const float *, symbol memory tag: SMT.4
while 4.1 branch does
Variable: D.1604, UID 1604, float *, type memory tag: TMT.5
Variable: D.1607, UID 1607, const float *, type memory tag: TMT.6
anyones bell ringing?
Well, the symbol difference was likely caused by the TYPE_READNLY
comparison that ensured that readonly types got their own SMT.
However, doing *that* was simply wrong, even though it worked the
majority of the time :).
, i see you figured that out :)
No.9 | | 625 bytes |
| 
Comment #9 from rguenth at gcc dot gnu dot org 2006-07-24 08:19
Fixed.
Comment #10 from rguenth at gcc dot gnu dot org 2006-07-24 08:19
Subject: Bug 28029
Author: rguenth
Date: Mon Jul 24 08:18:51 2006
New Revision: 115708
URL:
Log:
2006-07-24 Richard Guenther <rguenther (AT) suse (DOT) de>
PR tree-optimization/28029
Backport
2006-02-15 Daniel Berlin <dberlin (AT) dberlin (DOT) org>
* tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READNLY
specially here.
* gcc.dg/vect/pr28029.c: New testcase.
Added:
Modified:
No.10 | | 625 bytes |
| 
Comment #9 from rguenth at gcc dot gnu dot org 2006-07-24 08:19
Fixed.
Comment #10 from rguenth at gcc dot gnu dot org 2006-07-24 08:19
Subject: Bug 28029
Author: rguenth
Date: Mon Jul 24 08:18:51 2006
New Revision: 115708
URL:
Log:
2006-07-24 Richard Guenther <rguenther (AT) suse (DOT) de>
PR tree-optimization/28029
Backport
2006-02-15 Daniel Berlin <dberlin (AT) dberlin (DOT) org>
* tree-ssa-alias.c (get_tmt_for): Don't handle TYPE_READNLY
specially here.
* gcc.dg/vect/pr28029.c: New testcase.
Added:
Modified: