Compilers

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Graph coloring register allocation via iterative scan

    8 answers - 469 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hello!
    I wonder whether register allocation via graph coloring (improved by
    Preston Briggs) is commonly better than iterative scan algortihm
    (introduced by Alkis Evlogimenos). In what cases one is better than
    the other? Are there some theoretical proofs of dominance of one of
    the methods?
    By being better I mean producing more efficient code with less register
    spills.
    Any information would be appreciated. Thank you.
  • No.1 | | 1159 bytes | |

    avayvod@gmail.com wrote:
    I wonder whether register allocation via graph coloring (improved by
    Preston Briggs) is commonly better than iterative scan algortihm
    (introduced by Alkis Evlogimenos). In what cases one is better than
    the other? Are there some theoretical proofs of dominance of one of
    the methods?

    By being better I mean producing more efficient code with less register
    spills.

    Register allocation algorithms can be complicated. As important as
    good code is producing something that allocates fast enough not to
    hurt compile times too much, this was the intention of linear scan.

    It also can be very difficult to bolt a particular algorithm into a
    particular compiler. A while ago an attempt was made to put a Briggs
    like allocator into GCC, but GCC's backend had a completely different
    point of view of the machine making it very complex. The authors
    ended up writing something like 16Kloc to do something that in so
    compilers only takes ~1Kloc, and in the end gave up. So GCC still
    uses something else (an odd variation of the Chow/Hennessey algorithm
    I think).
  • No.2 | | 2725 bytes | |

    Rob Thorpe wrote:
    avayvod@gmail.com wrote:
    >
    >>I wonder whether register allocation via graph coloring (improved by
    >>Preston Briggs) is commonly better than iterative scan algortihm
    >>(introduced by Alkis Evlogimenos). In what cases one is better than
    >>the other? Are there some theoretical proofs of dominance of one of
    >>the methods?
    >>
    >>By being better I mean producing more efficient code with less register
    >>spills.

    >
    >

    Register allocation algorithms can be complicated. As important as
    good code is producing something that allocates fast enough not to
    hurt compile times too much, this was the intention of linear scan.

    It also can be very difficult to bolt a particular algorithm into a
    particular compiler. A while ago an attempt was made to put a Briggs
    like allocator into GCC, but GCC's backend had a completely different
    point of view of the machine making it very complex. The authors
    ended up writing something like 16Kloc to do something that in so
    compilers only takes ~1Kloc, and in the end gave up. So GCC still
    uses something else (an odd variation of the Chow/Hennessey algorithm
    I think).

    Actually, the current Gcc register allocator is about 10Kloc plus
    15Kloc of reload pass which was shared by the mentioned Briggs
    allocator implementation (whose size was ~10Kloc). So the size of the
    current gcc allocator and the Briggs allocator implementation was the
    same.

    But you are right, it is more about execution. The current gcc
    register allocator was developed and refined during 18 years and
    solves all register allocator tasks in some form (like coalescing,
    rematerialization, live range splitting and part of code selection
    which is specific for gcc). Few years spent by Michael Matz and
    others to implement Briggs allocator is not enough to achieve the same
    code quality especially in such multi-target compiler as gcc with its
    very complicated register model presented in GCC RTL and machine
    description.

    As for comparison of graph colouring and iterative scan algorithms,
    they are both heuristic and might work better for one programs and
    worse for others. Common opinion is that graph colouring will work
    better for programs with complicated CFG (like perlbmk in
    SPECInt2000), bin packing, linear scan and iterative scan will work
    better for programs with small number of big basic blocks (like fpppp
    from SPECFp95). But modern register allocators frequently use some
    form of combination of these two approaches.
  • No.3 | | 284 bytes | |

    Thank you.
    Do you know some scientific researches on this comparison? there's
    only a common opinion?
    Where can I find these test sources with complicated CFG? Didn't manage
    to find them on SPECInt2000 :( May be, went to the wrong page.
  • No.4 | | 5780 bytes | |

    Whywhat wrote:
    Do you know some scientific researches on this comparison? there's
    only a common opinion?

    That is a common opinion what I heard from specialists working in this
    area. You can also find such opinion (about bin packing approach and
    approach based on colouring conflict graph) in Bob Morgan's book
    "Building optimizing compiler". As I understand iterative scan is
    just improvement of linear scan to deal with irregular file
    architectures (like x86). Comparison of linear scan and George Appel
    approach can be find in original article about linear scan register
    allocator.

    IMH all this comparison should be taken with a grain of salt. For
    example, few years ago I read articles where one method
    (Callahan-Koblenz approach) was shown better than Briggs approach, and
    another article where it was shown as worse. About ten years ago, I
    read very educated discussion with Briggs participation what is better
    his approach or priority based colouring (I guess you can find in
    comp.compilers archive). Personally I still don't know what is
    better. The priority based colouring and Briggs optimistic colouring
    can colour successfully classical diamond graph with two colours.
    Briggs biased register assigning can be easily implemented in the
    priority based colouring too (it could be a small project for GCC --
    its global register allocator can not have biased colouring).

    As I wrote there are a few reasons for comparison difficulty
    (execution, compiler environment, target processor specifics).
    Execution means how good and accurate the implementation of the
    register was (a good implementation needs a lot of time which
    researchers don't have in many cases).

    Compiler environment means how aggressive optimizations in the
    compiler itself (e.g. aggressive inlining creates more functions with
    complex CFG and in this case graph colouring works better), does the
    compiler have preliminary passes (like register pressure relief an
    others mentioned in Bob Morgan's book).

    Target processor is also important. Some transformations are easy to
    implement following one approach. E.g. modern x86 processors are
    processors and implementation of chameleon intervals (the important
    part of FAT algorithm which was missed in Bob Morgan's book) is very
    important for them because x86 has irregular register file and has
    xchg (swap registers) which is very fast in Intel processor (and a bit
    slower in AMD ones). Swap is practically free because processor
    has register renaming hidden in hardware. Imho the chameleon
    intervals is easier to implement in bin packing approach. For other
    processors which have no swap insn (and you need 3 xor insns to
    implement swap), implementation of the chameleon intervals is less
    important. Irregular register files creates also a big problem for
    the register allocator and this problem can be solved easier in one
    approach than in another.

    Because the comparison of linear-scan and George's graph colouring in
    the original article about linear-scan algorithm has shown that it is
    worse that George's approach, I believe the article. It corresponds
    my belief that a good register allocator (for modern state-of-the art
    compilers) should have some form of graph colouring. Bin packing
    worked good for DEC compiler (which was not so aggressive) and Alpha
    as a target processor had big regular register files.

    I think a perspective approach is to combine solution different
    allocation tasks (like assigning, splitting, coalescing, and
    rematerialization) no to solve them as as separate tasks in some order
    with iterations. Colouring based on graph fusion is one of them (as I
    understand it is used in Intel compiler). As a research work, it
    would be interesting to use some metaheuristic (like taboo search) to
    get a good combinations of small register allocation transformation
    (assigning to one pseudo-register, coalescing pair of pseudos,
    splitting one pseudo live range etc) because it will work faster than
    usage ILP or PQBQ solvers (but still not fast enough for a commercial
    compiler).

    It is also interesting too try other colouring algorithms. I remember
    early articles of Ershow and Starkman (unfortunately that was
    published on russian only in 50s and early 60s) about interesting
    algorithms of graph colouring different from Chaintin's approach. The
    algorithms were based on graph node cluing for memory cells economy
    (memory was precious that time).

    If you want write a good register allocator, you should try many
    approaches and a good infrastructure (framework) is the most important
    part of this.

    And last thing, register allocation area has probably more patents
    than any other compiler optimization area. Linear scan (as many graph
    coloring algorithms, and bin packing) is patented. So one should
    think about commercial usage of LLVM whose register allocator is based
    on linear scan and imho far way from the state-of-the art register
    allocator.

    Where can I find these test sources with complicated CFG? Didn't manage
    to find them on SPECInt2000 :( May be, went to the wrong page.

    You should have SPEC2000 testsuite for this which is not free. If you
    have no access to the SPEC, you could look at perl regexec function
    (half time of perlbmk test from SPEC2000 is spent there). It is a
    regular expression matcher. The function has a big switch and a lot
    of its cases contain loops. There are lot of long and short live
    variables (and PRE creates much more pseudos).
  • No.5 | | 2161 bytes | |

    Hi,

    I just want ot make a small comment on graphs and register allocation.

    I read the paper on Linear Scan Register Allocation, and what
    immediately astonished me ist that it is in fact a graph coloring
    algorithm ! More precisely it is interval graph coloring. All what is
    said about the algorithm and its complexity is well known in graph
    theory since at least 1970 (the book of Claude Berge: Graphs and
    Hypergraphs).

    course the use of this algorithm in the context of global register
    allocation is interesting, as interval graphs model only live ranges in
    code without control structures like conditionals and loops (i.e basic
    blocks). In fact when the results are good vs Chaitin-Briggs
    algorithm, then I would bet that is because few conditionals are
    present in the code. Chaitin-Briggs algorithm is a general graph
    coloring algorithm. The heuristic in the paper with the SCCs is aimed
    at that goal I think.

    As a note, interference graphs built in the case of loops (single loop
    without conditionals) are circular-arc graphs whose coloring is also
    well known in the art (Tucker 1975 and1978 for the coloring, and the
    paper on spill code mimisation in PLDI 89 by Bernstein et al. (Golumbic
    is one of the authors)for mentioning this, and of course the paper of
    Laurie Hendren at CC'92 for register allocation in loops.

    I would also like to mention the PhD thesis of Angelika Zobel (1992),
    which, if I remember well, tries to modify the code to make the
    interference graphs similar to graphs whose coloring is easier and
    well-known (like interval or circular-arc graphs). But it is a bit in
    the fog of my memory.

    For intersection graphs (interval graphs, circle graphs, circular-arc
    graphs) it is common place to work on the interval families underlying
    the graph structure and not on the graph structure itself. It is
    easier.

    It would be nice if compiler writers open other books than compiler
    books sometimes, espacially when dealing with problems or solutions not
    directlly in the field.

    Sylvain Lelait
  • No.6 | | 2483 bytes | |

    Completely agreed, the linear scan register allocation didn't bring
    anything new from the computer science perspective, and any person
    aware about graph theory knows this.

    Since many people working on code optimization are mainly interested in
    computer engineering aspects, the linear scan register allocation seems
    a new method to them, which is not really the case.

    S

    sylerugby@yahoo.com a :
    I read the paper on Linear Scan Register Allocation, and what
    immediately astonished me ist that it is in fact a graph coloring
    algorithm ! More precisely it is interval graph coloring. All what is
    said about the algorithm and its complexity is well known in graph
    theory since at least 1970 (the book of Claude Berge: Graphs and
    Hypergraphs).

    course the use of this algorithm in the context of global register
    allocation is interesting, as interval graphs model only live ranges in
    code without control structures like conditionals and loops (i.e basic
    blocks). In fact when the results are good vs Chaitin-Briggs
    algorithm, then I would bet that is because few conditionals are
    present in the code. Chaitin-Briggs algorithm is a general graph
    coloring algorithm. The heuristic in the paper with the SCCs is aimed
    at that goal I think.

    As a note, interference graphs built in the case of loops (single loop
    without conditionals) are circular-arc graphs whose coloring is also
    well known in the art (Tucker 1975 and1978 for the coloring, and the
    paper on spill code mimisation in PLDI 89 by Bernstein et al. (Golumbic
    is one of the authors)for mentioning this, and of course the paper of
    Laurie Hendren at CC'92 for register allocation in loops.

    I would also like to mention the PhD thesis of Angelika Zobel (1992),
    which, if I remember well, tries to modify the code to make the
    interference graphs similar to graphs whose coloring is easier and
    well-known (like interval or circular-arc graphs). But it is a bit in
    the fog of my memory.

    For intersection graphs (interval graphs, circle graphs, circular-arc
    graphs) it is common place to work on the interval families underlying
    the graph structure and not on the graph structure itself. It is
    easier.

    It would be nice if compiler writers open other books than compiler
    books sometimes, espacially when dealing with problems or solutions not
    directly in the field.
  • No.7 | | 1413 bytes | |

    (Followup-to: comp.compilers)

    Sid Touati wrote:
    Completely agreed, the linear scan register allocation didn't bring
    anything new from the computer science perspective, and any person
    aware about graph theory knows this.

    Since many people working on code optimization are mainly interested in
    computer engineering aspects, the linear scan register allocation seems
    a new method to them, which is not really the case.

    Hm, maybe it is so. I personally don't like graph coloring too much,
    for one reason: it's only about variables having a single lifetime, in
    one single place, which is totally unrealistic.

    Variables may have ranges of lifetimes (though transformation to
    something like SSA can probably split those lifetimes pretty well), and
    most importantly they aren't places, but values, i.e. a variables is
    *needed* in certain places, like in a certain register as a function
    argument, or a return value, or in ANY register to serve as another value.

    Graph RAs seem to solve this by inserting moves, but I'd rather have a
    backwards working RA that loads values on demand, maybe a few cycles in
    advance so it's good for the pipeline. I don't see where a global graph
    coloring, that merely assigns variables to locations in a static way,
    can do that.

    Please correct me if I'm mistaken.
  • No.8 | | 2005 bytes | |

    Regarding the original request for a comparison between linear scan
    and graph coloring allocators, I've collated Regarding the original
    request for a comparison between linear scan and graph coloring
    allocators, I've collated some data I had comparing gcc's default
    allocator to the now defunct graph allocator. course, the default
    allocator isn't a linear scan allocator, but I thought the comparison
    might be informative. The report, "A Comparison of Allocators", can
    be found at http://www.cs.cmu.edu/~dkoes/research.html

    The graph coloring aspect of register allocation is something of a red
    herring. It's easy for interference graphs from a single basic block
    or code in SSA form. There are several theory-centric papers that
    prove things about the colorability of structured programs (i.e.
    goto-free C) because they have bounded tree-width (Bodlaender,
    Gustedt, and Telle. Linear-time reigster allocation for a fixed number
    of registers; Thorup. All structured programs have small tree width
    and good register allocation). From a practical aspect, these results
    aren't very useful. The traditional graph coloring algorithm
    (originally due to Kempe) can correctly determine the colorability of
    an interference graph 99% of the time.

    What matters in getting a quality register allocation (especially when
    there are only a small number of registers such as with x86), is what
    you do when the interference graph is not colorable. If the graph is
    not colorable and spilling is necessary, then the problem becomes
    NP-complete, even for allocating a single block (V. Liberatore,
    M. Farach, and U. Kremer. Hardness and algorithms for local register
    allocation.).

    For more detail into how graph coloring isn't very helpful to register
    allocation, see "An Analysis of Graph Coloring Register Allocation"
    also at http://www.cs.cmu.edu/~dkoes/research.html
    -Dave

Re: Graph coloring register allocation via iterative scan


max 4000 letters.
Your nickname that display:
In order to stop the spam: 4 + 3 =
QUESTION ON "Compilers"

EMSDN.COM