Compilers

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Register Based Interpreter Model

    10 answers - 309 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    I am doing some research on the register-based interpreter model in
    comparison to the more traditional stack-based approach. I would like
    to know if anyone has had any practical experience with register-based
    interpreters or has any good suggestions for references about this
    topic?
  • No.1 | | 285 bytes | |

    2005-11-02, Avatar <acampbellb@hotmail.comwrote:
    I am doing some research on the register-based interpreter model in
    comparison to the more traditional stack-based approach. []
    I suggest looking into Parrot. It is a register based VM. See
    http://www.parrotcode.org/
  • No.2 | | 587 bytes | |

    Avatar wrote:
    I am doing some research on the register-based interpreter model in
    comparison to the more traditional stack-based approach. I would like
    to know if anyone has had any practical experience with register-based
    interpreters or has any good suggestions for references about this
    topic?

    Hi there

    RISC-like virtual ISAs (e.g. for use on virtual machines, bytecode
    emitters etc) are relevant to your question. Such an example is found
    in the LLVM compiler infrastructure:

    http://llvm.cs.uiuc.edu/

    Nikolaos Kavvadias
  • No.3 | | 1371 bytes | |

    I am doing some research on the register-based interpreter model in
    comparison to the more traditional stack-based approach. I would like
    to know if anyone has had any practical experience with register-based
    interpreters or has any good suggestions for references about this
    topic?

    There are at least two recent projects that decided for a register based
    approach, namely Lua (www.lua.org) and Parrot (www.parrotcode.org). The
    Lua people report a significant speedup when they switched to the register
    machine. Davis et al did some experiments with a JVM for their paper:

    The Case for Virtual Register Machines (2002)

    Some authors argue that a register machine is more complicated but a
    better starting point for jitting.
    I am currently developing a register based interpreter for the programming
    system Alice (www.ps.uni-sb.de/alice). As the intermediate representation
    is somewhat biased towards the register approach, it is even easier than
    using a stack. Like in Lua, I decided to use the local variables as
    registers, i.e. to have an unbounded number of registers. In that sense
    the only difference between stack and registers is that registers can
    randomly be accessed and intermediate results are stored not on the stack
    but in scratch registers.

    Regards,

    Christian
  • No.4 | | 672 bytes | |

    Avatar wrote:
    I am doing some research on the register-based interpreter model in
    comparison to the more traditional stack-based approach. I would like
    to know if anyone has had any practical experience with register-based
    interpreters or has any good suggestions for references about this
    topic?

    I know that Lua (http://www.lua.org) is a register-based interpreter.
    The source code is available and the language itself is fantastic (very
    expressive and flexible for an embedded scripting language).

    More information can be found here:
    http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/sblp2005.pdf

    Hope this helps.

  • No.5 | | 633 bytes | |

    "Avatar" <acampbellb@hotmail.comwrote in
    I would like to know if anyone has had any practical experience
    with register-based interpreters or has any good suggestions for
    references about this topic?

    You may wish to read Appendix D (Abstract Machine design and
    reference) of the Implementor's Guide for the scripting language
    Pawn. You can obtain a copy at

    The author demonstrates with examples how register-based
    interpreters are more readily optimized than stack-based
    interpreters.

    The author lists Lua as a stack-based interpreter, but since
    version 5.0, Lua is register-based.
  • No.6 | | 9765 bytes | |

    "Avatar" <acampbellb@hotmail.comwrites:
    >I am doing some research on the register-based interpreter model in
    >comparison to the more traditional stack-based approach. I would like
    >to know if anyone has had any practical experience with register-based
    >interpreters


    There are some register-based VM interpreters, and their implementors
    should have experience, and sometimes have written about it. Some
    that come to mind are Inferno [winterbottom&pike97], Parrot, and IIRC
    the Lua VM. The last two are especially interesting, because
    originally the language was implemented using a stack-based VM, and
    then converted to a register-based VM. Moreover, there is the WAM for
    implementing Prolog [vanroy94].

    However, I still believe, that, if you want good performance, a hybrid
    stack-based VM (hybrid because you usually do indexed accesses to
    local variables) is a good idea, because:
    - The main advantage of register VMs is fewer dispatches. If you
    really want performance, you use dynamic superinstructions (aka
    selective inlining) [piumarta&riccardi98,ertl&gregg03], which bring
    the number of dispatches to the same level (one dispatch per control
    flow change), eliminating the advantage gained from register VMs.
    - It's easier to keep VM stack items in real-machine registers than VM
    registers. Keeping one stack item in real-machine registers is
    almost trivial, and can be very beneficial (factor 2 speedup on
    PPC970). For more complex schemes, see our work on stack caching
    [ertl&gregg04ivme,ertl&gregg05].
    - Stack-based VMs are simpler to implement, leaving more time for
    implementing highly effective optimizations like dynamic
    superinstructions.

    >or has any good suggestions for references about this
    >topic?


    Well, this has been discussed here several times, so you can look up
    the old discussions. There is one paper that does a direct empirical
    comparison [davis+03] (a journal version of this paper appeared
    recently in Science of Computer Programming).

    @InProceedings{davis+03,
    author = {Brian Davis and Andrew Beatty and Kevin Casey and
    David Gregg and John Waldron},
    title = {The Case for Virtual Register Machines},
    booktitle = {Interpreters, Virtual Machines and Emulators
    (IVME~'03)},
    pages = {41},
    year = {2003},
    url1 = {},
    url2 = {},
    abstract = {Virtual machines (VMs) are a popular target for
    language implementers. Conventional wisdom tells us
    that virtual stack architectures can be implemented
    with an interpreter more efficiently, since the
    location of operands is implicit in the stack
    pointer. In contrast, the operands of register
    machine instructions must be specified
    explicitly. In this paper, we present a working
    system for translating stack-based Java virtual
    machine (JVM) code to a simple register code. We
    describe the translation process, the complicated
    parts of the JVM which make translation more
    difficult, and the optimisations needed to eliminate
    copy instructions. Experimental results show that a
    register format reduces the number of executed
    instructions by 34.88%, while increasing the number
    of bytecode loads by an average of 44.81%. ,
    this corresponds to an increase of 2.32 loads for
    each dispatch removed. We believe that the high cost
    of dispatches makes register machines attractive
    even at the cost of increased loads.}
    }

    @InProceedings{winterbottom&pike97,
    author = "Phil Winterbottom and Rob Pike",
    title = "The Design of the {Inferno} Virtual Machine",
    PTeditor = "{IEEE}",
    booktitle = "Hot Chips {IX}: Stanford University, Stanford,
    California, August 24, 1997",
    publisher = "IEEE Computer Society Press",
    PTaddress = "1109 Spring Street, Suite 300, Silver Spring, MD
    20910, USA",
    year = "1997",
    ISBN = "?",
    PTpages = "?...-?",
    bibdate = "Mon Jan 08 16:33:30 2001",
    acknowledgement = ack-nhfb,
    }

    @InProceedings{piumarta&riccardi98,
    author = {Ian Piumarta and Fabio Riccardi},
    title = { Direct Threaded Code by Selective
    Inlining},
    crossref = {sigplan98},
    pages = {291},
    url = {},
    annote = {They reduce the overhead of a direct threaded
    interpreter by combining all the VM instructions in
    a basic block into a single virtual machine
    instruction. This is done by simply concatenating
    the machine code for the virtual machine
    instructions (except for the Next code). Multiple
    instances of the same sequence are just translated
    once, and then reused. They evaluated this technique
    empirically in the context of a fine-grained
    RISC-like VM, and in an Caml
    interpreter. The speedup over plain direct threaded
    code for the RISC-like VM is a factor
    1.33 For the Caml VM the speedups varied
    with benchmark and processor, from 1 to 2.2. The
    code expansion ranges from 2.2 for the Sparc,
    with larger benchmarks having less expansion (due to
    more reuse of sequences). Implementing this
    technique on the Caml VM took only one day.}
    }

    @InProceedings{ertl&gregg03,
    author = "M. Anton Ertl and David Gregg",
    title = " Indirect Branch Prediction Accuracy in Virtual Machine Interpreters",
    crossref = "sigplan03",
    PTpages = "",
    url = "%26gregg03.ps.gz",
    abstract = "Interpreters designed for efficiency execute a huge
    number of indirect branches and can spend more than
    half of the execution time in indirect branch
    mispredictions. Branch target buffers are the best
    widely available\mn{on all recent general-purpose
    machines?} form of indirect branch prediction;
    however, their prediction accuracy for existing
    interpretes is only 2\%\%. In this paper we
    investigate two methods for improving the prediction
    accuracy of BTBs for interpreters: replicating
    virtual machine (VM) instructions and combining
    sequences of VM instructions into superinstructions.
    We investigate static (interpreter build-time) and
    dynamic (interpreter run-time) variants of these
    techniques and compare them and several combinations
    of these techniques. These techniques can eliminate
    nearly all of the dispatch branch mispredictions,
    and have other benefits, resulting in speedups by a
    factor of up to 3.17 over efficient threaded-code
    interpreters, and speedups by a factor of up to 1.3
    over techniques relying on superinstructions alone."
    }

    @Proceedings{sigplan03,
    booktitle = "SIGPLAN '03 Conference on Programming Language
    Design and Implementation",
    title = "SIGPLAN '03 Conference on Programming Language
    Design and Implementation",
    year = "2003",
    key = "SIGPLAN '03"
    }

    @InProceedings{ertl&gregg04ivme,
    author = {M. Anton Ertl and David Gregg},
    title = {Combining Stack Caching with Dynamic
    Superinstructions},
    crossref = {ivme04},
    pages = {7},
    URL = {%26gregg04ivme.ps.gz},
    abstract = {Dynamic superinstructions eliminate most of the
    interpreter dispatch overhead. This results in a
    higher proportion of interpreter time spent in stack
    accesses (on a stack-based virtual machine). Stack
    caching reduces the stack access overhead. Each of
    these optimizations provides more speedup, if the
    other one is applied, too. Combining these
    optimizations also opens additional opportunities:
    we can insert stack state transitions without
    dispatch cost; this reduces the number of necessary
    VM instruction instances significantly. A
    shortest-path search can find the optimal sequence
    of state transitions and VM instructions. In this
    paper we describe an implementation of static stack
    caching employing these ideas. We also represent
    empirical results for our implementation, resulting
    in a speedup of up to 58\% over a version that keeps
    one value in registers all the time.}
    }

    @Proceedings{ivme04,
    booktitle = {Interpreters, Virtual Machines and Emulators (IVME '04)},
    title = {Interpreters, Virtual Machines and Emulators (IVME '04)},
    year = {2004}
    }

    @InProceedings{ertl&gregg05,
    author = {M. Anton Ertl and David Gregg},
    title = {Stack Caching in {Forth}},
    crossref = {euroforth05},
    pages = {6},
    url = {%26gregg05.ps.gz},
    PTnote = {not refereed},
    abstract = {Stack caching speeds Forth up by keeping stack items
    in registers, reducing the number of memory accesses
    for stack items. This paper describes our work on
    extending Gforth's stack caching implementation to
    support more than one register in the canonical
    state, and presents timing results for the resulting
    Forth system. For single-representation stack
    caches, keeping just one stack item in registers is
    usually best, and provides speedups up to a factor
    of 2.84 over the straight-forward stack
    representation. For stack caches with multiple stack
    representations, using the one-register
    representation as canonical representation is
    usually optimal, resulting in an overall speedup of
    up to a factor of 3.80 (and up to a factor of 1.53
    over single-representation stack caching).}
    }

    @Proceedings{euroforth05,
    title = {21st EuroForth Conference},
    booktitle = {21st EuroForth Conference},
    year = {2005},
    key = {EuroForth'05},
    url = {}
    }

    @Article{vanroy94,
    author = {Van Roy, Peter},
    title = {1983 The Wonder Years of Sequential Prolog Implementation},
    journal = {Journal of Logic Programming},
    year = 1994,
    volume = {19,20},
    pages = {385},
    url = {},
    url = {}
    }
    - anton
  • No.7 | | 421 bytes | |

    * Avatar:

    I am doing some research on the register-based interpreter model in
    comparison to the more traditional stack-based approach. I would like
    to know if anyone has had any practical experience with register-based
    interpreters or has any good suggestions for references about this
    topic?

    Parrot is a register-based VM. The source code probably contains
    further references.
  • No.8 | | 635 bytes | |

    >I am doing some research on the register-based interpreter model in
    >comparison to the more traditional stack-based approach. I would like
    >to know if anyone has had any practical experience with register-based
    >interpreters or has any good suggestions for references about this
    >topic?


    Lua 5.0 uses a register-based VM. Read about it in
    The implementation of Lua 5.0
    Journal of Universal Computer Science 11 #7 (2005) 1159-1176.
    http://www.tecgraf.puc-rio.br/~lhf/ftp/doc/jucs05.pdf

    There is also
    The case for virtual register machines

  • No.9 | | 1276 bytes | |

    In article <05-11-025@comp.compilers>, <christian.mueller@aktivanet.dewrote:
    Some authors argue that a register machine is more complicated but a
    better starting point for jitting.

    The idea of jitting _bytecode_ seems rather strange to me in the first
    place. Bytecode is a _target_ language, after all, designed to be
    efficiently interpreted by a VM with only straightforward
    modifications (threading, symbol resolution), but no real syntactic
    analysis. It is not a particularly good intermediate language from
    which to translate to native machine code. For example, bytecode fixes
    the evaluation order completely, so a jit-compiler needs to go through
    hoops to reorder the computations optimally for the target platform.

    To my mind, if a VM is supposed to support jitting, the code should be
    stored in some higher-level format, which is then jitted at run-time
    to a target language (native code or interpreted bytecode).

    Lauri

    [Bytecode can be though of as a linearized version of a parse tree. I
    don't see that it makes the situation any worse unless you want the
    compiler to do a bunch of dependency analysis and dump out the
    analysis results along with the code. -John]

  • No.10 | | 2896 bytes | |

    Lauri Alanko wrote:

    <christian.mueller@aktivanet.dewrote:
    >>Some authors argue that a register machine is more complicated but a
    >>better starting point for jitting.

    >
    >

    The idea of jitting _bytecode_ seems rather strange to me in the first
    place. Bytecode is a _target_ language, after all, designed to be
    efficiently interpreted by a VM with only straightforward
    modifications (threading, symbol resolution), but no real syntactic
    analysis. It is not a particularly good intermediate language from
    which to translate to native machine code. For example, bytecode fixes
    the evaluation order completely, so a jit-compiler needs to go through
    hoops to reorder the computations optimally for the target platform.

    It's likely that if your performance concerns are such that you need
    to pay attention to processor-specific instruction-ordering issues
    then you wouldn't be using a jitted language implementation anyway.
    Modern processors do reordering anyway, making expending
    effort in determining the best order an increasingly dubious
    proposition.

    But one really important factor in many bytecoded language
    implementations is the requirement for low pause times and interactive
    performance. In such systems the last thing you can afford is having
    a low-level optimizer spend lots of effort trying to perform low-level
    optimizations. Far better to have the smarts at a higher-level in the
    system, and have the optimizer target bytecode. then gets a nice
    trade-off in portability. concentrates on high-level
    optimizations (inlining, closure elimination, unboxing, (virtual)
    register allocation) and rely on a relatively naive
    bytecode-to-native-code generator to generate acceptably fast code
    with little effort, and little latency.

    advantage is that the bytecode can persist between system runs,
    unlike native code. So one can pre-optimize a system such that it
    starts up "hot". Current JIT architectures require some time for code
    to optimize, giving slow startup and optimization rates.

    Another is that performance-critical but infrequent code (e.g.
    finalization processing) can be optimized with the right system
    architecture. If optimization is pushed down into the VM its
    extremely unlikely to ever consider infrequently run code as worth
    optimizing, as it chooses what code to optimize based on simple
    metrics. But a higher-level optimizer can use additional criteria,
    such as process priorities or programmer annotations to choose what to
    optimize.

    To my mind, if a VM is supposed to support jitting, the code should be
    stored in some higher-level format, which is then jitted at run-time
    to a target language (native code or interpreted bytecode).

Re: Register Based Interpreter Model


max 4000 letters.
Your nickname that display:
In order to stop the spam: 0 + 9 =
QUESTION ON "Compilers"

EMSDN.COM