Development

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Fix memory corruption in operands

    11 answers - 1735 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hi,
    this bug has caused number of very interesting memory corruptions (including
    crashes in valgrind, glibc checking faults and GCC binary locking in mutex and
    blocking memory tester for two days) and took me about this afternoon to
    localize :). The problem is that vop_free_bucket_size and NUM_VP_FREE_BUCKETS
    are out of sync. vop_free_bucket_size return values in the range 045, while
    NUM_VP_FREE_BUCKETS is set to 27.
    I am testing the attached patch on i686-linux and x86_64-linux and will commit
    it as obvious hopefully before midinght so next run of C++ testers gets it.
    This also fix the wave faiulre that appeared after early inlining commit on C++
    tester.
    Honza
    * tree-ssa-operands.c (vop_free_bucket_size): Never return value
    greater than NUM_VP_FREE_BUCKETS.
    Index: tree-ssa-operands.c
    tree-ssa-operands.c(revision 120900)
    tree-ssa-operands.c(working copy)
    vop_free_bucket_size (int bucket)
    305,319
    static inline int
    vop_free_bucket_index (int num)
    {
    ! gcc_assert (num 0);
    /* Sizes 1 through 16 use buckets 0-15. */
    if (num <= 16)
    return num - 1;
    ! /* Buckets 16 - 45 represent 17 through 256 in 8 unit chunks. */
    ! if (num < 256)
    ! return 14 + (num - 1) / 8;
    ! return -1;
    }
    305,321
    static inline int
    vop_free_bucket_index (int num)
    {
    ! gcc_assert (num 0 && NUM_VP_FREE_BUCKETS 16);
    /* Sizes 1 through 16 use buckets 0-15. */
    if (num <= 16)
    return num - 1;
    ! /* Buckets 16 - NUM_VP_FREE_BUCKETS represent 8 unit chunks. */
    ! num = 14 + (num - 1) / 8;
    ! if (num >= NUM_VP_FREE_BUCKETS)
    ! return -1;
    ! else
    ! return num;
    }
  • No.1 | | 2174 bytes | |

    Thu, 2007-01-18 at 22:08 +0100, Jan Hubicka wrote:
    Hi,
    this bug has caused number of very interesting memory corruptions (including
    crashes in valgrind, glibc checking faults and GCC binary locking in mutex and
    blocking memory tester for two days) and took me about this afternoon to
    localize :). The problem is that vop_free_bucket_size and NUM_VP_FREE_BUCKETS
    are out of sync. vop_free_bucket_size return values in the range 045, while
    NUM_VP_FREE_BUCKETS is set to 27.

    Hrm. Actually, its set to 29. Apparently when I trimmed the maximum
    bucket size from 256 to 128, I missed changing it in one spot. Sorry
    for the hassle, and thanks for tracking it down.

    I am testing the attached patch on i686-linux and x86_64-linux and will commit
    it as obvious hopefully before midinght so next run of C++ testers gets it.
    This also fix the wave faiulre that appeared after early inlining commit on C++
    tester.
    Honza

    * tree-ssa-operands.c (vop_free_bucket_size): Never return value
    greater than NUM_VP_FREE_BUCKETS.
    Index: tree-ssa-operands.c

    tree-ssa-operands.c(revision 120900)
    tree-ssa-operands.c(working copy)
    vop_free_bucket_size (int bucket)
    305,319
    static inline int
    vop_free_bucket_index (int num)
    {
    ! gcc_assert (num 0);

    /* Sizes 1 through 16 use buckets 0-15. */
    if (num <= 16)
    return num - 1;
    ! /* Buckets 16 - 45 represent 17 through 256 in 8 unit chunks. */
    ! if (num < 256)
    ! return 14 + (num - 1) / 8;
    ! return -1;
    }

    305,321
    static inline int
    vop_free_bucket_index (int num)
    {
    ! gcc_assert (num 0 && NUM_VP_FREE_BUCKETS 16);

    /* Sizes 1 through 16 use buckets 0-15. */
    if (num <= 16)
    return num - 1;
    ! /* Buckets 16 - NUM_VP_FREE_BUCKETS represent 8 unit chunks. */
    ! num = 14 + (num - 1) / 8;
    ! if (num >= NUM_VP_FREE_BUCKETS)
    ! return -1;
    ! else
    ! return num;
    }

    K.

    Are you seeing many VPS larger than 128? I originally set it to 256,
    but only ever saw one case where it was bigger than 127, so I dropped
    the upper limit on bucket size.

    Andrew
  • No.2 | | 2360 bytes | |

    Thu, 2007-01-18 at 22:08 +0100, Jan Hubicka wrote:
    Hi,
    this bug has caused number of very interesting memory corruptions (including
    crashes in valgrind, glibc checking faults and GCC binary locking in mutex and
    blocking memory tester for two days) and took me about this afternoon to
    localize :). The problem is that vop_free_bucket_size and NUM_VP_FREE_BUCKETS
    are out of sync. vop_free_bucket_size return values in the range 045, while
    NUM_VP_FREE_BUCKETS is set to 27.

    Hrm. Actually, its set to 29. Apparently when I trimmed the maximum
    bucket size from 256 to 128, I missed changing it in one spot. Sorry
    for the hassle, and thanks for tracking it down.

    I am testing the attached patch on i686-linux and x86_64-linux and will commit
    it as obvious hopefully before midinght so next run of C++ testers gets it.
    This also fix the wave faiulre that appeared after early inlining commit on C++
    tester.
    Honza

    * tree-ssa-operands.c (vop_free_bucket_size): Never return value
    greater than NUM_VP_FREE_BUCKETS.
    Index: tree-ssa-operands.c

    tree-ssa-operands.c(revision 120900)
    tree-ssa-operands.c(working copy)
    vop_free_bucket_size (int bucket)
    305,319
    static inline int
    vop_free_bucket_index (int num)
    {
    ! gcc_assert (num 0);

    /* Sizes 1 through 16 use buckets 0-15. */
    if (num <= 16)
    return num - 1;
    ! /* Buckets 16 - 45 represent 17 through 256 in 8 unit chunks. */
    ! if (num < 256)
    ! return 14 + (num - 1) / 8;
    ! return -1;
    }

    305,321
    static inline int
    vop_free_bucket_index (int num)
    {
    ! gcc_assert (num 0 && NUM_VP_FREE_BUCKETS 16);

    /* Sizes 1 through 16 use buckets 0-15. */
    if (num <= 16)
    return num - 1;
    ! /* Buckets 16 - NUM_VP_FREE_BUCKETS represent 8 unit chunks. */
    ! num = 14 + (num - 1) / 8;
    ! if (num >= NUM_VP_FREE_BUCKETS)
    ! return -1;
    ! else
    ! return num;
    }

    K.

    Are you seeing many VPS larger than 128? I originally set it to 256,
    but only ever saw one case where it was bigger than 127, so I dropped
    the upper limit on bucket size.

    Yes, on the wave I definitly got positives on gcc_assert on num <=
    NUM_VP_FREE_BUCKETS.
    I will send you prerpdocessed testcase (by Richard) in separate mail.

    Honza

    Andrew
  • No.3 | | 791 bytes | |


    K.

    Are you seeing many VPS larger than 128? I originally set it to 256,
    but only ever saw one case where it was bigger than 127, so I dropped
    the upper limit on bucket size.

    Also with early inlining in, we seem to be producing bigger function
    bodies hitting this limit on fairly usual testcases, like Gerald's
    testcase used by memory tester.
    I will investigate if everything in inliner is going as expected, but
    overall this is pretty good thing to flatten the C++ abstraction and it
    helps tramp3d considerably. (unfortunately tramp3d is only benchmark
    exhibition such a noticeable benefit from the change and freefem3d
    regress. Not precisely what I hoped for, but retuning of inliner is on
    the way).

    Honza

    Andrew
  • No.4 | | 1506 bytes | |

    Thu, 2007-01-18 at 22:55 +0100, Jan Hubicka wrote:

    K.

    Are you seeing many VPS larger than 128? I originally set it to 256,
    but only ever saw one case where it was bigger than 127, so I dropped
    the upper limit on bucket size.

    Also with early inlining in, we seem to be producing bigger function
    bodies hitting this limit on fairly usual testcases, like Gerald's
    testcase used by memory tester.

    IIRC that after MEMSSA, this limit only gets hit by having a large
    number of global variables (or file level or whatever). with MEMSSA
    before your changes, we didn't get that many very often (empirical
    collection from a bunch of large files), so I arbitrarily chose 128 as
    the point of diminished returns.

    With your changes, the limit or bucket sizes may very well need to be
    modified. I would expect to see more of that type of thing. This was
    just a first go untl we saw we needed something different.

    I will investigate if everything in inliner is going as expected, but
    overall this is pretty good thing to flatten the C++ abstraction and it
    helps tramp3d considerably. (unfortunately tramp3d is only benchmark
    exhibition such a noticeable benefit from the change and freefem3d
    regress. Not precisely what I hoped for, but retuning of inliner is on
    the way).

    And I expect re-tuning vop buckets may be required too since the
    underlying assumptions may well be different now :-)

    Andrew
  • No.5 | | 3099 bytes | |

    Hi,

    IIRC that after MEMSSA, this limit only gets hit by having a large
    number of global variables (or file level or whatever). with MEMSSA
    before your changes, we didn't get that many very often (empirical
    collection from a bunch of large files), so I arbitrarily chose 128 as
    the point of diminished returns.

    With your changes, the limit or bucket sizes may very well need to be
    modified. I would expect to see more of that type of thing. This was
    just a first go untl we saw we needed something different.

    I've checked and everything seems to work sort of as expected. We
    clearly inline more than we ought to (since our limits are still tuned
    to expect unoptimized code), but retuning the limits should not change
    behaviour that much.

    problem I wanted to draw into your attention is the memory
    consumption of datastructures associated with operands. With
    we can now get footprint after all are transfered
    to SSA and before we release function bodies of compiled functions (that
    will become critical for LT). In this footprint the tree-ssa-operands
    is major consumer.

    I was thinking on several things we can reduce memory usage by"

    1) Currently we allocate operand caches per-function basis. Since on
    tramp3d we redlete about 90% of statements during early cleanups, I
    would expect about 90% of the memory do be dead. Can we have just
    one global freelist cache for all functions? (it would probably
    have problem if we are leaking some unused operands unfreed, but
    that is bad anyway)

    If this fails for some reason, perhaps we can simply drop all the
    operands memory after early cleanups and rebuild them, but it is bit
    zelaous way of doing so.
    2) The annotations are big too. Perhaps we can mix all
    vuses/uses/defs/vdefs into one vector and modify iterators to simply
    skip ones of different type? At least for uses+vuses/defs+vdefs mix
    it should work pretty well, since numbers of real operands are
    limited
    3) Danny observed that addresses_taken is computed by by
    tree-ssa-operands just to be used on one place and just after
    recomputing the statements operands. Perhaps it can be dropped
    completely and maintained just one per operands computed on demand?
    4) loads/stores bitmaps can probably be merged into one bitmap (by i.e
    encoding for even/odd numbers) or even move into on-side hash since
    they are usually not set.

    Does some of this seem to make sense?
    Honza

    I will investigate if everything in inliner is going as expected, but
    overall this is pretty good thing to flatten the C++ abstraction and it
    helps tramp3d considerably. (unfortunately tramp3d is only benchmark
    exhibition such a noticeable benefit from the change and freefem3d
    regress. Not precisely what I hoped for, but retuning of inliner is on
    the way).

    And I expect re-tuning vop buckets may be required too since the
    underlying assumptions may well be different now :-)

    Andrew

  • No.6 | | 3283 bytes | |

    Sun, 2007-01-21 at 04:08 +0100, Jan Hubicka wrote:
    Hi,

    problem I wanted to draw into your attention is the memory
    consumption of datastructures associated with operands. With
    we can now get footprint after all are transfered
    to SSA and before we release function bodies of compiled functions (that
    will become critical for LT). In this footprint the tree-ssa-operands
    is major consumer.

    I was thinking on several things we can reduce memory usage by"

    1) Currently we allocate operand caches per-function basis. Since on
    tramp3d we redlete about 90% of statements during early cleanups, I
    would expect about 90% of the memory do be dead. Can we have just
    one global freelist cache for all functions? (it would probably
    have problem if we are leaking some unused operands unfreed, but
    that is bad anyway)

    If this fails for some reason, perhaps we can simply drop all the
    operands memory after early cleanups and rebuild them, but it is bit
    zelaous way of doing so.

    I think we discussed being able to use the free-list in a global way
    back when you were doing the initial file conversions. There is no
    reason that can't be done.

    we shouldn't be leaking any operands, it should be an enclosed system.

    rebuilding is also easy to do if it comes to that. Real operands
    can certainly be rebuilt very quickly. Question. When you inline a
    function, and there are virtual operands, do you have to do update ssa
    on those newly inlined stmts? I assume so, which means you are in
    effect rebuilding the virtual operands on those stmts as well aren't
    you? If that's the case, you might be better off dumping the operand
    cache anyway

    2) The annotations are big too. Perhaps we can mix all
    vuses/uses/defs/vdefs into one vector and modify iterators to simply
    skip ones of different type? At least for uses+vuses/defs+vdefs mix
    it should work pretty well, since numbers of real operands are
    limited

    I originally did some experiments with that type of thing (one vector
    with indexes or pointers into it, and there were some unpleasant compile
    time implications, IIRC, as well as some other issue I forget. I could
    revisit it again if it turns out to be a big issue.

    3) Danny observed that addresses_taken is computed by by
    tree-ssa-operands just to be used on one place and just after
    recomputing the statements operands. Perhaps it can be dropped
    completely and maintained just one per operands computed on demand?

    This is Diego's stuff I think, he's best able to comment on that.

    4) loads/stores bitmaps can probably be merged into one bitmap (by i.e
    encoding for even/odd numbers) or even move into on-side hash since
    they are usually not set.

    More of Diego's mem-ssa stuff :-) I beleive those bitmaps were a bit of
    a pox to him as well. We had discussed ways of saving memory by hashing
    the bitmaps to return a single value which would get used by the stmt.
    This would allow sharing of common bitmaps and reduce the memory
    footprint. We discussed other possibilities as well, but I don't think
    he has gotten to any of them yet.

    Andrew
  • No.7 | | 4758 bytes | |

    Sun, 2007-01-21 at 04:08 +0100, Jan Hubicka wrote:
    Hi,

    problem I wanted to draw into your attention is the memory
    consumption of datastructures associated with operands. With
    we can now get footprint after all are transfered
    to SSA and before we release function bodies of compiled functions (that
    will become critical for LT). In this footprint the tree-ssa-operands
    is major consumer.

    I was thinking on several things we can reduce memory usage by"

    1) Currently we allocate operand caches per-function basis. Since on
    tramp3d we redlete about 90% of statements during early cleanups, I
    would expect about 90% of the memory do be dead. Can we have just
    one global freelist cache for all functions? (it would probably
    have problem if we are leaking some unused operands unfreed, but
    that is bad anyway)

    If this fails for some reason, perhaps we can simply drop all the
    operands memory after early cleanups and rebuild them, but it is bit
    zelaous way of doing so.

    I think we discussed being able to use the free-list in a global way
    back when you were doing the initial file conversions. There is no
    reason that can't be done.

    we shouldn't be leaking any operands, it should be an enclosed system.

    Good, so we take care to remove operands of all dead statements and put
    them back to freelist, right?

    rebuilding is also easy to do if it comes to that. Real operands
    can certainly be rebuilt very quickly. Question. When you inline a
    function, and there are virtual operands, do you have to do update ssa
    on those newly inlined stmts? I assume so, which means you are in
    effect rebuilding the virtual operands on those stmts as well aren't
    you? If that's the case, you might be better off dumping the operand
    cache anyway

    Yes, I simply do update_stmt after copying the statement expecting
    operand scanner to build operands for it, followed by regular update_ssa
    to fixup some special cases where I don't produce SSA form, but direct
    VAR_DECLs. We don't have aliasing built, so at the moment we don't have
    virutal operands really at this level.
    I am evaulating posibility to build aliasing. Early results show about
    100MB memory increase on tramp3d that is not _that_ bad. But I still
    can't compile non-trivial testcases so I don't know what the final
    outcome might be.

    I would actually preffer not dumping the caches for simple reason - we
    want to write IPA analyzers in as similar way to local GIMPLE analyzes
    as possible, so it would be better if the invariants on IPA and local
    level was essentially the same. (for instance ipa-pure-const and other
    passes can be greatly simplified if they used ssa-operands now).

    2) The annotations are big too. Perhaps we can mix all
    vuses/uses/defs/vdefs into one vector and modify iterators to simply
    skip ones of different type? At least for uses+vuses/defs+vdefs mix
    it should work pretty well, since numbers of real operands are
    limited

    I originally did some experiments with that type of thing (one vector
    with indexes or pointers into it, and there were some unpleasant compile
    time implications, IIRC, as well as some other issue I forget. I could
    revisit it again if it turns out to be a big issue.

    Stmt annotations are definitly important from memory footprint (I will
    generate fresh numbers and send), so trying it might be interesting
    experiment.

    3) Danny observed that addresses_taken is computed by by
    tree-ssa-operands just to be used on one place and just after
    recomputing the statements operands. Perhaps it can be dropped
    completely and maintained just one per operands computed on demand?

    This is Diego's stuff I think, he's best able to comment on that.

    4) loads/stores bitmaps can probably be merged into one bitmap (by i.e
    encoding for even/odd numbers) or even move into on-side hash since
    they are usually not set.

    More of Diego's mem-ssa stuff :-) I beleive those bitmaps were a bit of
    a pox to him as well. We had discussed ways of saving memory by hashing
    the bitmaps to return a single value which would get used by the stmt.
    This would allow sharing of common bitmaps and reduce the memory
    footprint. We discussed other possibilities as well, but I don't think
    he has gotten to any of them yet.

    Yeah, I even have function for hashing now, so doing so is interesting.
    (but still combining the two bitmaps into one would save the extra
    pointer allocated in every individual stmt annotation)

    Thanks,
    Honza

    Andrew
  • No.8 | | 2742 bytes | |

    Mon, 2007-01-22 at 17:36 +0100, Jan Hubicka wrote:
    Sun, 2007-01-21 at 04:08 +0100, Jan Hubicka wrote:
    Hi,

    I think we discussed being able to use the free-list in a global way
    back when you were doing the initial file conversions. There is no
    reason that can't be done.

    we shouldn't be leaking any operands, it should be an enclosed system.

    Good, so we take care to remove operands of all dead statements and put
    them back to freelist, right?

    as long as the annotation is freed, they end up in the free list. Right
    now, we never lose anything in a function because when we free SSA,
    *all* the pointers to operands are allocated from the memory chunks
    which are managed by the operand cache. At the point, even if we had
    lost the handle to some of the memory via stray annotations that got
    lost, freeing the cache up still makes the memory go away for the
    operand cache.

    you will of course lose that if you go to a global list because the
    cache will never be freed up until the last function is processed. It
    boils down to how well we manage the annotations. As far as I know, they
    should be handled reasonably well. If we don't, They certainly should
    be anyway :-)

    At least I believe that to be true. Hmm. Perhaps we better verify
    it :-) We definitely manage it properly when operands are updated by
    the cache routines.

    rebuilding is also easy to do if it comes to that. Real operands
    can certainly be rebuilt very quickly. Question. When you inline a
    function, and there are virtual operands, do you have to do update ssa
    on those newly inlined stmts? I assume so, which means you are in
    effect rebuilding the virtual operands on those stmts as well aren't
    you? If that's the case, you might be better off dumping the operand
    cache anyway

    Yes, I simply do update_stmt after copying the statement expecting
    operand scanner to build operands for it, followed by regular update_ssa
    to fixup some special cases where I don't produce SSA form, but direct
    VAR_DECLs. We don't have aliasing built, so at the moment we don't have
    virutal operands really at this level.
    I am evaulating posibility to build aliasing. Early results show about
    100MB memory increase on tramp3d that is not _that_ bad. But I still
    can't compile non-trivial testcases so I don't know what the final
    outcome might be.

    I would actually have expected the virtual operands to chew up more
    memory than the real operands. How much memory do the operands consume
    on tramp3d? perhaps we aren't freeing enough stuff from the
    annotations

    Andrew
  • No.9 | | 4524 bytes | |

    Mon, 2007-01-22 at 17:36 +0100, Jan Hubicka wrote:
    Sun, 2007-01-21 at 04:08 +0100, Jan Hubicka wrote:
    Hi,

    I think we discussed being able to use the free-list in a global way
    back when you were doing the initial file conversions. There is no
    reason that can't be done.

    we shouldn't be leaking any operands, it should be an enclosed system.

    Good, so we take care to remove operands of all dead statements and put
    them back to freelist, right?

    as long as the annotation is freed, they end up in the free list. Right
    now, we never lose anything in a function because when we free SSA,
    *all* the pointers to operands are allocated from the memory chunks
    which are managed by the operand cache. At the point, even if we had
    lost the handle to some of the memory via stray annotations that got
    lost, freeing the cache up still makes the memory go away for the
    operand cache.

    you will of course lose that if you go to a global list because the
    cache will never be freed up until the last function is processed. It
    boils down to how well we manage the annotations. As far as I know, they
    should be handled reasonably well. If we don't, They certainly should
    be anyway :-)

    Well, the only place we call free_ssa_operands is from delete_ssa, so
    all stmt annotations for statments elliminated by DCE and others are
    left dead for a moment I blieve.
    Just to get some hard data, at the end of compilation we have for
    combine.c:

    tree-ssanames.c:143 (make_ssa_name) 1367136: 4.1% 0: 0.0% 0: 0.0% 0: 0.0% 28482
    tree-inline.c:2754 (copy_tree_r) 1372400: 4.2% 0: 0.0% 0: 0.0% 17356: 0.5% 37765
    tree-dfa.c:189 (create_stmt_ann) 1433432: 4.3% 866376: 6.3% 0: 0.0% 0: 0.0% 41068
    genrtl.c:17 (gen_rtx_fmt_ee) 1885404: 5.7% 0: 0.0% 1956: 0.1% 0: 0.0% 157280
    Total 32996633 13710922 3145510 3307173 1365803
    source location Garbage Freed Leak Times

    (this show that stmt annotations are one of major memory consumers, but
    also that amount of them allocated by left unfreeded to them freeded is
    something like 4:6. This seems like pretty big leak for freelists. It
    is of course bigger when amount of dead code increase, ie for example in
    tramp3d).

    Just for the promised data, after IPA (ie when all functions reside in
    memory in SSA form) the footprint is:

    gimplify.c:455 (create_tmp_var_raw) 9504: 0.2% 0: 0.0% 508376: 4.7% 0: 0.0% 5885
    optabs.c:4986 (new_convert_optab) 0: 0.0% 0: 0.0% 602212: 5.6% 176228:27.0% 13
    tree-ssanames.c:143 (make_ssa_name) 99840: 1.7% 0: 0.0% 612864: 5.7% 0: 0.0% 14848
    tree-ssa-operands.c:490 (ssa_operand_alloc) 0: 0.0% 0: 0.0% 867167: 8.1% 33631: 5.2% 151
    tree-inline.c:2754 (copy_tree_r) 41148: 0.7% 0: 0.0% 1009396: 9.4% 14484: 2.2% 28558
    tree-dfa.c:189 (create_stmt_ann) 289184: 4.8% 0: 0.0% 1111768:10.3% 0: 0.0% 25017
    Total 6014810 340680 10769570 652312 375815

    So 10% are the live statement annotations (20% of statement annotations
    are already gabrage collected), 8% operand caches. Again it is more
    steep for tramp3d even tough there it is partly covered by dead C++
    frontend datastructures still being resident.

    At least I believe that to be true. Hmm. Perhaps we better verify
    it :-) We definitely manage it properly when operands are updated by
    the cache routines.

    Which is one of few places where we get the freelists right, sadly.
    I am evaulating posibility to build aliasing. Early results show about
    100MB memory increase on tramp3d that is not _that_ bad. But I still
    can't compile non-trivial testcases so I don't know what the final
    outcome might be.

    I would actually have expected the virtual operands to chew up more
    memory than the real operands. How much memory do the operands consume
    on tramp3d? perhaps we aren't freeing enough stuff from the
    annotations

    I still have just a rough numbers on tramp3d, but I will try to produce
    something more precise and tell you.
    But the 100MB was naturaly combination of virtual operands, aliasing
    tables and such. Everything is pretty big and has tendency to explode
    in side cases that is what makes me bit cureful about pushing it to IPA
    level. I would be sad if 4.3 consumed a lot more memory than previous
    releases just because of the IPA-SSA projects (and I am happy the
    numbers so far seems to be showing the contrary).

    Honza

    Andrew
  • No.10 | | 4045 bytes | |

    Mon, 2007-01-22 at 19:21 +0100, Jan Hubicka wrote:
    Mon, 2007-01-22 at 17:36 +0100, Jan Hubicka wrote:

    you will of course lose that if you go to a global list because the
    cache will never be freed up until the last function is processed. It
    boils down to how well we manage the annotations. As far as I know, they
    should be handled reasonably well. If we don't, They certainly should
    be anyway :-)

    Well, the only place we call free_ssa_operands is from delete_ssa, so
    all stmt annotations for statments elliminated by DCE and others are
    left dead for a moment I blieve.

    Hmm. There is code in here that is no longer working the way I
    originally intended. I shall have to make a few changes :-)
    free_ssa_operands() isnt serving a lot of purpose where it is. its
    simply clearing out the annotation before ssa is destroyed, which
    doesn't do much when it comes to the cache since it gets completely
    destroyed anyway. The pointers to the operands are irrelevant.

    It looks like we dont actually have a way of catching removed stmts
    earlier. Part for the reason for that is because bsi_remove() is
    responsible for removing a stmt from from the stmt list, not actually
    destroying the stmt. A removed stmt could immediately be linked back
    into the stmt list, and then you would of course still want the
    operands.

    ISTR this issue coming up for different reasons in the past.

    Do the stmt annotations for these deleted stmts ever get freed either? I
    cant seem to find a routine that does that, so they are just garbage
    collected when the stmt is collected?

    Perhaps we need in addition to bsi_remove() a bsi_destroy() which also
    frees up anything associated with it. It could then call
    free_ssa_operands() which could then be fixed to us the free lists.

    I'll take a look at that. I thought we had actually done the
    bsi_destroy() before, but perhaps it was just the remove_eh flag which
    has been added is there ever a case where we ask to remove the eh
    info, and we AREN'T destroying the stmt? Anyhoo, I'll take a look.

    Just to get some hard data, at the end of compilation we have for
    combine.c:

    tree-ssanames.c:143 (make_ssa_name) 1367136: 4.1% 0: 0.0% 0: 0.0% 0: 0.0% 28482
    tree-inline.c:2754 (copy_tree_r) 1372400: 4.2% 0: 0.0% 0: 0.0% 17356: 0.5% 37765
    tree-dfa.c:189 (create_stmt_ann) 1433432: 4.3% 866376: 6.3% 0: 0.0% 0: 0.0% 41068
    genrtl.c:17 (gen_rtx_fmt_ee) 1885404: 5.7% 0: 0.0% 1956: 0.1% 0: 0.0% 157280
    Total 32996633 13710922 3145510 3307173 1365803
    source location Garbage Freed Leak Times

    (this show that stmt annotations are one of major memory consumers, but
    also that amount of them allocated by left unfreeded to them freeded is
    something like 4:6. This seems like pretty big leak for freelists. It
    is of course bigger when amount of dead code increase, ie for example in
    tramp3d).

    We dont actually have a free list for stmt annotations right? they are
    just garbaged collected. If we destroyed a stmt as mentioned above by
    also clearing out the annotation field, would that help at all? probably
    not. they would get collected at the same time as the stmt that was
    delinked by bsi_remove() anyway.

    what exactly does that "leak" number mean? The operand cache is all
    pointed to via a linked list of large segments which operands are
    sub-allocated from and all freed at once as segments, so it should never
    show up as a "leak". And annotations are all GC'd, so I wouldnt think
    they would ever show up as a leak. am I misinterpreting what a leak
    is?

    how do I get this memory report?

    Which is one of few places where we get the freelists right, sadly.

    well, it will certainly help if I add a bsi_destroy() and add the caches
    to the free list, and make the free list global to the compilation unit.
    I'll have a look and see if I can help.

    Andrew
  • No.11 | | 6277 bytes | |

    Mon, 2007-01-22 at 19:21 +0100, Jan Hubicka wrote:
    Mon, 2007-01-22 at 17:36 +0100, Jan Hubicka wrote:

    you will of course lose that if you go to a global list because the
    cache will never be freed up until the last function is processed. It
    boils down to how well we manage the annotations. As far as I know, they
    should be handled reasonably well. If we don't, They certainly should
    be anyway :-)

    Well, the only place we call free_ssa_operands is from delete_ssa, so
    all stmt annotations for statments elliminated by DCE and others are
    left dead for a moment I blieve.

    Hmm. There is code in here that is no longer working the way I
    originally intended. I shall have to make a few changes :-)
    free_ssa_operands() isnt serving a lot of purpose where it is. its
    simply clearing out the annotation before ssa is destroyed, which
    doesn't do much when it comes to the cache since it gets completely
    destroyed anyway. The pointers to the operands are irrelevant.

    Yep :)

    It looks like we dont actually have a way of catching removed stmts
    earlier. Part for the reason for that is because bsi_remove() is
    responsible for removing a stmt from from the stmt list, not actually
    destroying the stmt. A removed stmt could immediately be linked back
    into the stmt list, and then you would of course still want the
    operands.

    Yes, we have weakly defined what bsi_remove should do. At the moment it
    just removes the stmt from list and few tables, while it keeps
    annotations/SSA names and others.
    Zdenek hit the problem with his (unfinished) work on removing unused SSA
    names from the list
    @gcc.gnu.org/msg22722.html
    I think he is right making bsi_remove destructive (ie remove/free as
    much from statement as we manage explicitely expecting it to not be used
    again).
    Perhaps the not too obvious BL argument can be eliminated by forking
    bsi_remove into bsi_remove (doing all the removal) and bsi_unlink
    (expecting the statement to be linked back again).

    Do the stmt annotations for these deleted stmts ever get freed either? I
    cant seem to find a routine that does that, so they are just garbage
    collected when the stmt is collected?

    Yes, they are garbage collected. Sadly we reference to dead statements
    in many places, so I think it would be profitable to actually clear the
    link to annotation so it actually can be ggc_collected.
    See the other thread on ggc_unreferenced_object - I would like to
    progress on eliminating those dangling pointers to statements, but it is
    harder that one would think.

    Perhaps we need in addition to bsi_remove() a bsi_destroy() which also
    bsi_remove/bsi_destroy or bsi_remove/bsi_unlink :) But yes, we are on
    the same track.
    frees up anything associated with it. It could then call
    free_ssa_operands() which could then be fixed to us the free lists.

    I'll take a look at that. I thought we had actually done the
    bsi_destroy() before, but perhaps it was just the remove_eh flag which
    has been added is there ever a case where we ask to remove the eh
    info, and we AREN'T destroying the stmt? Anyhoo, I'll take a look.

    Well, we can make the statement untrapping and then remove EH, but we
    probably won't do that via remove_stmt.

    Just to get some hard data, at the end of compilation we have for
    combine.c:

    tree-ssanames.c:143 (make_ssa_name) 1367136: 4.1% 0: 0.0% 0: 0.0% 0: 0.0% 28482
    tree-inline.c:2754 (copy_tree_r) 1372400: 4.2% 0: 0.0% 0: 0.0% 17356: 0.5% 37765
    tree-dfa.c:189 (create_stmt_ann) 1433432: 4.3% 866376: 6.3% 0: 0.0% 0: 0.0% 41068
    genrtl.c:17 (gen_rtx_fmt_ee) 1885404: 5.7% 0: 0.0% 1956: 0.1% 0: 0.0% 157280
    Total 32996633 13710922 3145510 3307173 1365803
    source location Garbage Freed Leak Times

    (this show that stmt annotations are one of major memory consumers, but
    also that amount of them allocated by left unfreeded to them freeded is
    something like 4:6. This seems like pretty big leak for freelists. It
    is of course bigger when amount of dead code increase, ie for example in
    tramp3d).

    We dont actually have a free list for stmt annotations right? they are
    just garbaged collected. If we destroyed a stmt as mentioned above by
    also clearing out the annotation field, would that help at all? probably
    not. they would get collected at the same time as the stmt that was
    delinked by bsi_remove() anyway.

    We call gcc_free on stmt annotations just before releasing going to RTL.
    I used this logic as base for the numbers above (freed/leaking), since
    we clear SSA operands shortly before on the very same set of statements.
    they are just left to be collected.
    Exlicit allocation on thsese (via freelists or ggc_free) would make
    sense. During early optimization on C++ we tend to produce tons of
    statements to just remove them shortly.

    what exactly does that "leak" number mean? The operand cache is all
    pointed to via a linked list of large segments which operands are
    sub-allocated from and all freed at once as segments, so it should never
    show up as a "leak". And annotations are all GC'd, so I wouldnt think
    they would ever show up as a leak. am I misinterpreting what a leak
    is?

    Garbage is memory that has been garbage collected out, Freed is memory
    explicitely freed via ggc_free, Leak is memory still referenced.
    They are not leaking in the numbers above.
    The other set of numbers was run just after IPA, so at this time all the
    annotations/operands for all the live functions was included.

    how do I get this memory report?

    You need to compile with and then
    you can use -fmem-report to get numbers after whole compilation,
    -fpre-ipa-mem-report -fpost-ipa-mem-report to get allocations before and
    after.

    Which is one of few places where we get the freelists right, sadly.

    well, it will certainly help if I add a bsi_destroy() and add the caches
    to the free list, and make the free list global to the compilation unit.
    I'll have a look and see if I can help.

    Thanks a lot!

    Honza

    Andrew

Re: Fix memory corruption in operands


max 4000 letters.
Your nickname that display:
In order to stop the spam: 3 + 2 =
QUESTION ON "Development"

EMSDN.COM