XML

  • XML Performance in a Transaction

    21 answers - 1039 bytes

    This is a pretty general question, and I know that it can vary from
    application to application, and even from parser to parser. However,
    I'm curious if anybody has any links to some performance tests that show
    the overall time a transaction takes to complete, and how much of that
    overall transaction time is devoted to parsing or validation of an XML
    file. Particularly, from retrieving an XML file from a data store, and
    then completing the end transaction.
    I've been requested to provide some numbers to show that actual XML
    validation results and parsing are a small portion of the overall
    transaction process, when dealing with XML in a B2B process. Any
    information that can be provided would be appreciated.
    Dave
    The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
    initiative of OASIS <http://www.oasis-open.org>
    The list archives are at
    To subscribe or unsubscribe from this list use the subscription
    manager: <>
  • No.1 | | 2090 bytes | |

    Hi Dave,

    Based on what I've observed in our environment, there really isn't an
    across-the-board answer. The main contributing factors are:

    * How big your messages are
    * How you're parsing the messages
    * How many times you're parsing the messages

    If you're parsing messages multiple times, obviously that will give you
    some overhead. If you can do what you're doing in one pass, that's
    certainly the most efficient.

    I don't have any specific times, but we had to restructure some things
    because of how we were constantly serializing/deserializing to strings
    due to the toolkit (and usage model) we were using. It was killing us
    from a timing point of view. Restructuring the way the message
    processing was done and chaining things together using SAX filters, we
    were able to get it reduced significantly.
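
    A minimal sketch of the filter chaining described above. The toolkit
    used in that environment is not named, so this assumes Python's
    standard xml.sax module; the two filters and the "debug" element name
    are hypothetical. The message is parsed once and every stage sees the
    same event stream, so nothing is serialized to a string between stages.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class StripElementFilter(XMLFilterBase):
    """Drop one named element, and everything inside it, from the event stream."""

    def __init__(self, parent, name):
        super().__init__(parent)
        self._name = name
        self._skip_depth = 0  # greater than 0 while inside a dropped element

    def startElement(self, name, attrs):
        if self._skip_depth or name == self._name:
            self._skip_depth += 1
            return
        super().startElement(name, attrs)

    def endElement(self, name):
        if self._skip_depth:
            self._skip_depth -= 1
            return
        super().endElement(name)

    def characters(self, content):
        if not self._skip_depth:
            super().characters(content)


class UppercaseTextFilter(XMLFilterBase):
    """Second stage (hypothetical business rule): upper-case all text."""

    def characters(self, content):
        super().characters(content.upper())


def process(xml_bytes):
    """Parse once; each filter forwards events to the next stage."""
    out = io.StringIO()
    reader = xml.sax.make_parser()
    chain = UppercaseTextFilter(StripElementFilter(reader, "debug"))
    chain.setContentHandler(
        XMLGenerator(out, encoding="utf-8", short_empty_elements=False)
    )
    chain.parse(io.BytesIO(xml_bytes))
    return out.getvalue()
```

    Calling process() on a message returns the transformed document after
    a single parsing pass; further stages can be added by wrapping the
    chain in another filter.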

    So, I guess the short answer is: it depends. ;)

    Hope this helps you a little.

    ast

    On Wed, 2006-03-22 at 21:19, David Carver wrote:
    This is a pretty general question, and I know that it can vary from
    application to application, and even from parser to parser. However,
    I'm curious if anybody has any links to some performance tests that show
    the overall time a transaction takes to complete, and how much of that
    overall transaction time is devoted to parsing or validation of an XML
    file. Particularly, from retrieving an XML file from a data store, and
    then completing the end transaction.

    I've been requested to provide some numbers to show that actual XML
    validation results and parsing are a small portion of the overall
    transaction process, when dealing with XML in a B2B process. Any
    information that can be provided would be appreciated.

    Dave

  • No.2 | | 4027 bytes | |

    Dave,
    Without knowing the environment or the other transaction components it's
    impossible to say how long anything will take. You need to do some
    actual measurement in your own environment using the tools that you are
    using. The chances are that your XML processing costs are not
    insignificant. XML processing is CPU intensive and hardly the most
    machine-efficient way of handling data. Adding in validation adds
    additional overhead that I would expect to be proportional to the
    complexity of the interaction between the XML document and the schema.

    There are a few questions to address:

    1. Define "transaction" - is it getting the message, parsing it,
    transforming it in some fashion and storing it, or does it include the
    business logic that acts on the data? If you are trying to argue that
    XML is not expensive then you had better look at "whole of life" costs
    for the transaction.

    2. Decompose the costs - what is important/what is the problem? Elapsed
    time, CPU, I/O, memory? Are you running out of CPU? Elapsed time?
    What? "Everything" is not a useful answer at this point. Something is
    causing the question to be asked in your environment.

    3. Model - what parts of the processing are there and how stable are
    their costs? If you have an XML transform that takes 50ms and database
    or messaging access that takes 3000ms then you don't care about the XML.
    If you have database access that takes 10ms then the XML will look bad.
    This depends on what your problem is, what your platform is, and why the
    question is being asked. Are all your XML documents the same size?
    Size will cost you something too.

    4. Measure - this is quite straight-forward. The last time I needed
    this kind of number I used a JNI-based Log4J filter to get the OS CPU
    time for the thread and process and externalise it. Measuring was then
    a process of subtracting numbers from each other.
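
    The measure-and-subtract approach in step 4 can be sketched as follows.
    This is not the JNI-based Log4J filter mentioned above; it assumes
    Python's time.thread_time() and time.perf_counter() as stand-ins, and
    fake_backend_call() is a hypothetical placeholder for database or
    messaging access.

```python
import io
import time
import xml.sax


def timed(fn, *args):
    """Run fn(*args); return (wall_seconds, thread_cpu_seconds)."""
    wall_start, cpu_start = time.perf_counter(), time.thread_time()
    fn(*args)
    return time.perf_counter() - wall_start, time.thread_time() - cpu_start


def parse(xml_bytes):
    # Non-validating parse with a handler that ignores every event.
    xml.sax.parse(io.BytesIO(xml_bytes), xml.sax.ContentHandler())


def fake_backend_call():
    # Hypothetical stand-in for database or messaging access.
    time.sleep(0.05)


def run_transaction(xml_bytes):
    """Time each stage separately and report the share spent parsing."""
    parse_wall, parse_cpu = timed(parse, xml_bytes)
    backend_wall, _ = timed(fake_backend_call)
    total_wall = parse_wall + backend_wall
    return {
        "parse_wall": parse_wall,
        "parse_cpu": parse_cpu,
        "total_wall": total_wall,
        "parse_share": parse_wall / total_wall,
    }
```

    The parse_share figure only means something when the stages and
    message sizes match the real transaction, which is the point of steps
    1 to 3.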

    Now, to give you some utterly useless numbers:

    I have seen XML parsing add over a hundred milliseconds to a transaction
    on a Sun box, out of a total transaction cost of something like 250ms.
    This was quite expensive in that environment and the XML was not large
    but it was being mapped into Java objects and the time included object
    construction and associated overheads. That application had other
    performance problems (the remaining 150ms in the transaction). In
    another environment, on an Intel box, we were happy to take 160ms each
    to read, XSL transform and store XML messages in a fairly complex way
    because that performance was far better than we needed.

    Greg

  • No.3 | | 2032 bytes | |

    Greg, I appreciate the response, and agree wholeheartedly that it
    varies greatly. To narrow the scope down a bit, which doesn't really
    narrow it down much, I work for STAR which does XML Message
    specifications for the Automotive Retail industry. The transaction I
    was referring to was, retrieving the Data from a database, doing any
    type of business logic on the data, populating the appropriate xml file,
    validating the file, and then serializing it, and finally sending the
    transaction out through a web service (whether SOAP or ebXML).

    Now I have tried to explain to the powers that be that there are a wide
    number of variables that can affect the overall transaction time, and
    XML processing and validation is one of them. Speed in this area is
    greatly affected by the size of the xml, the type of validation being
    done, and the tools that are being used to create the XML. I'm not
    really looking for concrete numbers, but general overall impressions on
    XML performance in the above stated life cycle. What percentage of the
    time involved does the parsing take? Again, it varies greatly.

    The issue has been raised by members as we look at migrating our
    standards from OAGIS 8.0 to OAGIS 9.0 which implements Core Components.
    I would expect 9.0 to have similar results to UBL 1.0 since both
    implement Core Components and very similar NDRs. Because of the new
    NDRs, the size of the XML files has increased, which was to be expected,
    and my own ad-hoc tests indicate about a 3% increase in parsing and
    validation time, which in milliseconds is really not that large an amount.

    Hope that clarifies things a bit. The answers you provided actually
    are very helpful.

    Dave

  • No.4 | | 2560 bytes | |

    fwiw, we use xsltproc to parse documents. we have one application which
    produces complex forms (a la xsl-fo, but our own vocabulary, for our own
    reasons).
    we also use it to distribute transactions between servers.

    my observations - for small messages and documents it all seems to be
    fast enough (haven't had any performance issues to cause me to measure it).

    as message size grows, the predictability of the performance decreases,
    in a way that can't be explained by disc caching.

    at some point the system slows to unusable (xml source around
    500k->1mb). i'll admit at this point that the style sheet is also large.
    sadly the performance looks to be O(n^2) - and this is the important
    point. it is degrading at a much faster rate than the increase in the
    message size.

    some initial investigation reveals that the vast majority of the time is
    spent parsing the input strings and coping with utf-8. i haven't had
    time to play with the string parsing yet, but i'm hoping for a classic
    O(n log n) performance at the end.
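
    One way to check a suspicion like this is to time the same processing
    at several input sizes and fit the growth exponent. A minimal sketch,
    assuming the timings are collected separately; growth_exponent() is a
    hypothetical helper, not part of xsltproc or any library.

```python
import math


def growth_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size).

    A slope near 1 suggests linear scaling and a slope near 2 suggests
    quadratic scaling; n log n shows up as a slope slightly above 1.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

    Timing noise and caching effects can distort the fit, so several runs
    per size and a wide range of sizes give a more trustworthy slope.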

    david veillard may be able to comment further on this.

    rick

  • No.5 | | 3584 bytes | |

    Michael Champion said:

    See

    "XML Parsing - A Threat to Database Performance." Be forewarned that the
    conclusion may be unpalatable:

    By rights, it seems that there should be some market for a highly
    optimized XML parser. You need high performance, you seek high performance
    libraries; if there are none, you get them made internally or externally.
    But I don't recall ever having seen any requests on XML-DEV for high speed
    parsers: certainly none with any dollars behind them.

    If some companies get together and say "We will pay $$$ for a higher
    performance XML parser" they would get one. A $10,000 first prize and
    $5,000 second prize for the winning parser on specified data, schema and
    platform would be enough to stimulate a lot of hackers and researchers, not
    to mention prompting people with in-house, private parsers to open source
    them. When you move to an Open Source software economy, the issue for
    business becomes "How do we stimulate development in areas that help
    us?"

    This week I was listening to people from a client airline who had to
    write their own XML parser in PL/I for optimized access to mainframe DB2.
    The lack of such a parser suggests to me that organizations using
    databases need to adopt a new, pro-active stance in getting high
    performance, open source XML software written. Passivity in this area
    will assure they only have unsuitable implementations.

    If you look at, say, Apache Xerces and Xalan, you can see that
    hyper-efficiency plays little part in the game. The same is true, by and
    large, for the other open source software. Hyper-efficient design is not
    an optimization that can be tacked on after, it has to be the core of the
    design; you cannot expect a general-purpose, cross-platform parser to be
    optimal. (For example, one trick that goes as far back as Mark's
    predecessor in the late 80s (I believe) was for parsers to have two
    parsers: one optimized for the most common case and encoding (for XML
    this would be for an entity-less document), and another to handle all
    the other cases.)
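
    The two-parser trick can be sketched as a dispatcher. This is only an
    illustration under stated assumptions: is_common_case() and
    parse_with_dispatch() are hypothetical names, "common case" is taken
    here to mean pure-ASCII input with no DOCTYPE, and the two parsers are
    passed in as plain callables rather than implemented.

```python
def is_common_case(data: bytes) -> bool:
    """True for pure-ASCII input with no DOCTYPE, hence no declared entities.

    The DOCTYPE test is deliberately conservative: the marker appearing
    inside a comment or CDATA section also sends the document down the
    general path, which costs speed but never correctness.
    """
    return data.isascii() and b"<!DOCTYPE" not in data


def parse_with_dispatch(data, fast_parser, general_parser):
    """Route each document to the specialised or the general parser."""
    parser = fast_parser if is_common_case(data) else general_parser
    return parser(data)
```

    The check itself scans the input, so it only pays off if the fast
    parser saves more than that scan costs.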

    My expectation is that XML parsing can be significantly sped up with
    better use of SSE intrinsics*, integrating parsing and transcoding, also
    validation and type assignment using streaming path-matching rather than
    automata (i.e. transform horizontal grammars into vertical paths), direct
    parsing to native data types for numbers, for example. I am sure many
    other people have a shopping list of good ideas: but there are no parsers
    that implement any of these things AFAIK at the moment. Parser innovation
    has stalled, and it surely should be an issue of serious concern (and by
    serious concern I mean $$$) to high-volume companies to get it restarted.

    The other aspect is that there is no "type aware SAX" API. Without this,
    Open Source or even proprietary versus public parsers are not
    interchangeable. This applies to Java most, but the principle is
    the same: we need agreements at the interfaces (a.k.a. standards).

    Cheers
    Rick Jelliffe

    * See and search for
    Intrinsics. The O'Reilly blog site is being altered, it is a complete mess
    at the moment, so sorry about the odd format for this archive.

  • No.6 | | 1128 bytes | |

    Hello,

    On Thu, 23 Mar 2006 14:30:52 -0000
    "Michael Kay" <mike (AT) saxonica (DOT) com> wrote:

    My expectation is that XML parsing can be significantly sped up with

    I think that UTF-8 decoding is often the bottleneck and the obvious way
    to speed that up is to write the whole thing in assembler. I suspect the
    only way of getting a significant improvement (i.e. more than a
    doubling) in parser speed is to get closer to the hardware. I'm
    surprised no-one has done it. Perhaps no-one knows how to write
    assembler any more (or perhaps, like me, they just don't enjoy it).

    I think the issue is a bit different. An experienced developer can
    implement a very fast parser, for example, in 1 year. But whom he can
    sell it? I just don't see a market for XML parsers.

    Michael Kay
    http://www.saxonica.com/

  • No.7 | | 1709 bytes | |

    A. Paraschenko wrote:

    >I think the issue is a bit different. An experienced developer can
    >implement a very fast parser, for example, in 1 year. But whom he can
    >sell it? I just don't see a market for XML parsers.

    Hence the need for something like a consortium offering a cash prize.
    Kickstart.

    Here is how I would see it working. 15 organizations (banks, vendors,
    etc) get together
    and put $1000 each into a kitty. They announce that they will pay
    $10,000 first prize
    and $5,000 second prize for the two fastest non-viral open source XML
    parsers that meet the bottom line of being twice as fast as libxml (as
    of current version) for a particular suite of ASCII-dominated
    transactions of about 1 to 10K each for non-validating parsing. Contest
    to run for six months.

    What do the sponsors get out of it? Worst case: no one wins; no cost,
    no benefit (though proving we need to go beyond XML does have a value
    actually!) Best case: tiny investment, substantial improvement in
    performance of multi-million dollar assets and transaction rates,
    ability to adopt desirable new architectures. Techniques are open source
    non-viral so they can potentially feed into commercial products (at the
    end of the day, Bill gets all the $$$ no matter what!)

    Any takers? Joseph Chiusano: know anyone?

    Cheers
    Rick

  • No.8 | | 2040 bytes | |

    Fastest doing what? Does the parser need to implement some API in some
    language on some set of platforms? Does it get to define its own API and
    pick a language and platforms?

    Bob

  • No.9 | | 2505 bytes | |

    can we just go back a minute - raw speed is not the only issue, it is
    the way in which it degrades. O(n^2) (order n squared) performance
    will always be bad, just faster bad.

    big documents will degrade badly - and this is the real thing to beat -
    not simply raw speed.

    rick

  • No.10 | | 2616 bytes | |

    Do you know some intrinsic reason why a parser can't scale linearly?
    AFAIK, a parser only needs to retain an element stack and a set of
    entity definitions that is fixed when document parsing begins. Without
    validation, what's the issue?
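
    A minimal sketch of that claim, assuming Python's standard xml.sax
    module: the only state the handler keeps is the stack of open element
    names, so its memory use follows nesting depth rather than document
    size. StackTracker and scan() are hypothetical names.

```python
import io
import xml.sax


class StackTracker(xml.sax.ContentHandler):
    """Keep only the stack of currently open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.elements = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.elements += 1
        self.max_depth = max(self.max_depth, len(self.stack))

    def endElement(self, name):
        self.stack.pop()


def scan(xml_bytes):
    """Return (element count, deepest nesting) from one streaming pass."""
    tracker = StackTracker()
    xml.sax.parse(io.BytesIO(xml_bytes), tracker)
    return tracker.elements, tracker.max_depth
```

    This says nothing about what a particular parser does internally; it
    only shows that the bookkeeping a streaming consumer needs is small.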

    Bob

  • No.11 | | 1888 bytes | |

    Rick Marshall wrote:

    > can we just go back a minute - raw speed is not the only issue, it is
    > the way in which it degrades. O(n^2) (order n squared) performance
    > will always be bad, just faster bad.

    So what?

    Where is the O(N^2) code in the UTF-8 converter? Encoding converters are
    regularly stated as the source of bottlenecks. (Of course any serious
    developer will look at the order of the algorithm: Rick is being prudent
    and correct, but you don't solve problems by denying them!)

    > big documents will degrade badly - and this is the real thing to beat
    > - not simply raw speed.

    And that is the point of a contest. Instead of windbagging in vague
    generalities, developers can write code that actually tests and proves
    their (including my!) theories.

    Last year I think there were reports of good speed ups in parsing in
    Java just from re-using SAX objects. (I think the research was academic
    from Eastern Europe, sorry no references.) Good work. Corporate users of
    XML open source processors (and commercial processors too!) need to
    stimulate research and development of techniques and open source code,
    from all angles.
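
    Object re-use of the kind mentioned above might look like the sketch
    below. The reports concerned Java; this assumes Python's standard
    xml.sax module instead, where the reader and handler objects can be
    shared across messages. Whether it helps at all is something to
    measure, since the expat-based reader sets up fresh internal parser
    state on every parse() call.

```python
import io
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Count start tags across every document fed to this handler."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def parse_all(messages):
    """Parse many small messages with one reader and one handler."""
    handler = CountingHandler()
    reader = xml.sax.make_parser()  # created once, outside the loop
    reader.setContentHandler(handler)
    for message in messages:
        reader.parse(io.BytesIO(message))  # same reader object every time
    return handler.count
```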

    XML is not a technology optimized for maximum throughput: indeed its
    development goal of "terseness is of minimal importance" means that it
    may be regarded as a technology optimised for mediocre throughput! It
    needs R&D, open source and benchmarking to improve the state of the
    basic processing stack to make it a better, cheaper fit for high-volume
    systems than it is now.

    Cheers
    Rick

  • No.12 | | 1372 bytes | |

    Bob Foster wrote:

    > Fastest doing what? Does the parser need to implement some API in some
    > language on some set of platforms? Does it get to define its own API
    > and pick a language and platforms?

    Excellent question.

    I'd leave the language open to the competitor. I'd leave the API open
    too, since the aim is to encourage experimentation. However, it would
    make benchmarking easier if the processor used an existing API: less
    harness code to write. Also, except for SAX and DOM, there is such a
    lack of standard APIs that mandating any particular ones seems useless.

    The other issue is whether you want to have one contest for
    XML-to-stream and a second contest for XML-to-tree. Personally I'd go
    for just XML-to-stream to keep it within the reach of single
    developers. (XML-to-tree could be a subsequent competition.)
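
    The distinction between the two contest categories can be illustrated
    with Python's standard library, assuming xml.sax for XML-to-stream and
    xml.dom.minidom for XML-to-tree. Both functions below answer the same
    question, but the tree version has to build the whole document in
    memory first.

```python
import io
import xml.sax
from xml.dom import minidom


class ElementCounter(xml.sax.ContentHandler):
    """Streaming consumer: sees each start tag once and keeps a counter."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_streaming(xml_bytes):
    """XML-to-stream: events are handled as they arrive, nothing is kept."""
    counter = ElementCounter()
    xml.sax.parse(io.BytesIO(xml_bytes), counter)
    return counter.count


def count_tree(xml_bytes):
    """XML-to-tree: build the full DOM, then query it."""
    document = minidom.parseString(xml_bytes)
    return len(document.getElementsByTagName("*"))
```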

    On your issue of O(N^2), I think Rick is referring to a potential
    problem of XML-to-Tree implementations, or maybe XPath etc traversers,
    rather than XML-to-stream processors necessarily.

    Cheers
    Rick

  • No.13 | | 705 bytes | |

    Rick Jelliffe <rjelliffe (AT) allette (DOT) com.au> writes:

    Last year I think there were reports of good speed ups in parsing in
    Java jsut from re-using SAX objects. (I think the research was
    academic from Eastern Europe, sorry no references.)

    Do you mean the paper "An Adaptive, Fast and Safe XML Parser Based on
    Byte Sequences Memorization" ()
    by Toshiro Takase and others that was at the WWW conference last year?
    Their parser caches the SAX events generated from a sequence of bytes
    and then replays those events when encountering the same bytes (of
    course taking into account context like prefix mappings). They report
    speedups of up to 70% compared to Piccolo.
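
    A much cruder sketch of the record-and-replay idea, assuming Python's
    standard xml.sax module. Unlike the parser described in the paper,
    which matches byte sequences inside documents and tracks context such
    as prefix mappings, this version caches the events for a whole message
    and replays them only when the exact same bytes arrive again. Recorder
    and ReplayingParser are hypothetical names.

```python
import io
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Record SAX events as plain tuples so they can be replayed later."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs)))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


class ReplayingParser:
    """Parse each distinct message once; replay cached events for repeats."""

    def __init__(self):
        self._cache = {}
        self.hits = 0

    def events_for(self, xml_bytes):
        cached = self._cache.get(xml_bytes)
        if cached is not None:
            self.hits += 1
            return cached
        recorder = Recorder()
        xml.sax.parse(io.BytesIO(xml_bytes), recorder)
        self._cache[xml_bytes] = recorder.events
        return recorder.events
```

    A whole-message cache only helps when identical messages recur, and
    the cache grows without bound here; both are simplifications.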
  • No.14 | | 1254 bytes | |

    Just the sort of unexpected approach Rick's proposed contest is intended
    to smoke out! Your reference has convinced me Rick is onto something.

    Bob

    Jaakko Kangasharju wrote:

    Do you mean the paper "An Adaptive, Fast and Safe XML Parser Based on
    Byte Sequences Memorization" ()
    by Toshiro Takase and others that was at the WWW conference last year?
    Their parser caches the SAX events generated from a sequence of bytes
    and then replays those events when encountering the same bytes (of
    course taking into account context like prefix mappings). They report
    speedups of up to 70% compared to Piccolo.

  • No.15 | | 1608 bytes | |

    Tatu Saloranta said:
    > Rick Jelliffe <rjelliffe (AT) allette (DOT) com.au> wrote:
    >
    >> Last year I think there were reports of good speed ups in parsing in
    >> Java just from re-using SAX objects. (I think the research was academic
    >> from Eastern Europe, sorry no references.) Good
    >
    > Huh? So is this not a basic common knowledge?!?
    > Of course proper reusing of components can have impact!
    > And in case of SAX (or StAX, to a lesser degree), it
    > has big impact for startup time, ie. performance when
    > handling small documents.

    It is one thing to be common knowledge, it is another thing
    to be common practise :-)

    > It sounds more like developer education issue, though.
    > An order of magnitude or two simpler than trying to
    > hand-code assembler level ultra-efficient decoding
    > (although, for much bigger audience -- only small
    > number of people need to write libs, compared to hordes
    > of developers using them).

    I don't think it is an education issue. People implement
    as well as they can, given limited resources. Last time
    I looked at the Xalan code, for example, there were notices
    saying "no attempt at optimization has been made".

    So the solution is not sneering, but contribution.

    Cheers
    Rick Jelliffe

  • No.16 | | 1047 bytes | |

    Tatu Saloranta said:

    There's nothing new there I guess; new CS students are
    taught recursion in all the wrong places, and
    generally learn quickly enough not to calculate
    multiplication by recursive addition by one.

    On that subject, for text programming, the use of recursion is almost
    always the sign of an inexperienced or poor programmer. The worst offender
    in this is the Java REGEX package, which is almost useless for large
    document text processing, because of stack growth in some pathological
    cases with some harmless-looking regular expressions.

    Tree-walking and small-stack problems are a different kettle of fish, of
    course. when using languages with the tail-recursion optimization, of
    course.
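
    For illustration only: the kind of harmless-looking pattern usually
    blamed for this is a repeated alternation such as (a|b)*. Whether it
    actually exhausts the stack depends on the JVM version, the stack size
    and the input length, so treat that part as an assumption rather than a
    measurement. A flat loop (the hypothetical onlyAorB below) accepts the
    same strings with constant stack depth.

```java
import java.util.regex.Pattern;

// Hypothetical comparison of a regex check and an equivalent flat loop.
class FlatScan {
    // Regex form; fine for short inputs, but stack use may grow with input length.
    static final Pattern A_OR_B = Pattern.compile("(a|b)*");

    static boolean onlyAorBRegex(CharSequence text) {
        return A_OR_B.matcher(text).matches();
    }

    // Loop form: accepts exactly the strings made only of 'a' and 'b'
    // (including the empty string), using no recursion at all.
    static boolean onlyAorB(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != 'a' && c != 'b') {
                return false;
            }
        }
        return true;
    }
}
```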

    Cheers
    Rick

  • No.17 | | 523 bytes | |

    On Mar 26, 2006, at 7:50 AM, Rick Jelliffe wrote:

    On that subject, for text programming, the use of recursion is almost
    always the sign of an inexperienced or poor programmer.

    Or a sign of using the wrong programming language ;-)

    Stefan

  • No.18 | | 1060 bytes | |

    On reflection, that sounds like something I did not want to say. I
    meant that the use of recursion is perfectly fine in languages that
    are designed to support it properly.

    On Mar 26, 2006, at 9:20 AM, Stefan Tilkov wrote:

    On Mar 26, 2006, at 7:50 AM, Rick Jelliffe wrote:
    >
    >On that subject, for text programming, the use of recursion is almost
    >always the sign of an inexperienced or poor programmer.
    >

    Or a sign of using the wrong programming language ;-)

    Stefan

  • No.19 | | 3551 bytes | |

    Tatu,

    I suspect this is one of the reasons why such things as the
    development of intermediate node trees is such a big facet of XSLT 2.0
    (and why x:node-set() is implemented in just about every 1.0
    implementation) - these make it permissible to effectively optimize
    recursions.
    -- Kurt

    On 3/26/06, Tatu Saloranta <cowtowncoder (AT) yahoo (DOT) com> wrote:
    Stefan Tilkov <info (AT) tilkov (DOT) com> wrote:

    On reflection, that sounds like something I did not want to say. I
    meant that the use of recursion is perfectly fine in languages that
    are designed to support it properly.

    I think there are both cases where using recursion
    would generally be a bad idea (cases where there are
    solutions with lower complexity, such as many toy
    examples used for teaching recursion), and many others
    where recursion may be a reasonable choice (like
    typical tree/graph traversal use cases).
    In the latter category, the choice has to do with things
    already mentioned (if and how efficiently the
    language/runtime can optimize tail recursion; whether
    there are extra penalties for deep call stacks etc),
    as well as clarity of code. But for the former category it
    doesn't really matter: there's no way to fix an
    algorithmically bad solution to scale well.

    Recursion solutions are often cleaner, and I do not
    claim that all uses of recursion are bad. For
    tree/graph traversal they are usually good starting
    points.
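
    As a rough sketch of that trade-off (the TreeWalk and Node names are
    made up for this example, not taken from any library), here is the same
    node count written recursively and with an explicit stack. The
    recursive form is shorter; the explicit-stack form does not depend on
    the depth of the call stack, which matters for very deep trees.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical tree type used only to contrast the two traversal styles.
class TreeWalk {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Recursive form: call-stack depth equals the depth of the tree.
    static int countRecursive(Node node) {
        int total = 1;
        for (Node child : node.children) {
            total += countRecursive(child);
        }
        return total;
    }

    // Explicit-stack form: pending nodes live on the heap, not the call stack.
    static int countIterative(Node root) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        int total = 0;
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            total++;
            for (Node child : node.children) {
                pending.push(child);
            }
        }
        return total;
    }
}
```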

    Regarding XSLT, what I have always wondered is whether
    there are many (enough?) cases where having a stateful
    alternative (with real variables or such to store
    state) would have benefits: it seems as if computing
    certain things just once (or cumulatively) could yield
    significant improvements.
    For example: instead of computing counts of nodes in a
    node set consisting of all previous siblings of a
    certain element type, keeping track of the number of such
    elements encountered. Keeping track would require a bit
    more code (or new constructs to simplify it), but
    would seem likely to perform faster.
    Supporting state would be against XSL as a language
    (wouldn't it?), and would also reduce many
    optimization possibilities (the document would have to be
    traversed and processed in a specific order, no
    parallel processing of sub-trees etc). But for some
    transformations it'd be much faster, possibly
    including lower complexity.

    On one hand, it is certainly true that higher-level
    abstractions give more room for optimization.
    But on the other hand, such optimizations cannot
    compete with better algorithmic solutions: that is,
    even if computing node sets and their counts is as
    efficient as it could be, it still wouldn't match
    the performance of low-overhead counters.
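
    The difference can be sketched outside XSLT. In the hypothetical
    SiblingNumbering class below, a list of element names stands in for a
    run of sibling elements. byRescanning mimics evaluating something like
    count(preceding-sibling::item) at every element with no optimization,
    which is quadratic in the number of siblings; byRunningCounter keeps
    state and does a single pass. A real XSLT processor may well optimize
    the first form, so this only illustrates the algorithmic gap.

```java
import java.util.List;

// Hypothetical model: element names in document order stand in for siblings.
class SiblingNumbering {
    // For each position, rescan everything before it: O(n^2) comparisons overall.
    static int[] byRescanning(List<String> siblings, String name) {
        int[] result = new int[siblings.size()];
        for (int i = 0; i < siblings.size(); i++) {
            int count = 0;
            for (int j = 0; j < i; j++) {
                if (siblings.get(j).equals(name)) {
                    count++;
                }
            }
            result[i] = count;
        }
        return result;
    }

    // Keep a running counter while walking in document order: O(n) overall.
    static int[] byRunningCounter(List<String> siblings, String name) {
        int[] result = new int[siblings.size()];
        int seen = 0;
        for (int i = 0; i < siblings.size(); i++) {
            result[i] = seen; // matching siblings strictly before position i
            if (siblings.get(i).equals(name)) {
                seen++;
            }
        }
        return result;
    }
}
```

    The counter version only works because the siblings are visited in
    document order, which is exactly the ordering constraint mentioned
    above.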

    So would there perhaps be also room for other xml
    transformation approaches, besides dominant functional
    alternatives? (but above 'wrap your own SAX-based
    solution' level)
    Anyone have links to papers on such alternatives? (the
    one regarding SAX event reuse was a very interesting
    one)

    -+ Tatu +-
  • No.20 | | 1080 bytes | |

    Hello Tatu,

    On Sun, 26 Mar 2006 09:42:33 -0800 (PST)
    Tatu Saloranta <cowtowncoder (AT) yahoo (DOT) com> wrote:

    So would there perhaps be also room for other xml
    transformation approaches, besides dominant functional
    alternatives? (but above 'wrap your own SAX-based
    solution' level)
    Anyone have links to papers on such alternatives?

    I'm writing a paper about XSieve <http://xsieve.sourceforge.net/> for
    XTech 2006. Technically, XSieve is just the standard way to extend XSLT,
    with Scheme as the language. But the result is more than the mix, it's the
    .

    (the
    one regarding SAX event reuse was a very interesting
    one)
    -+ Tatu +-

  • No.21 | | 178 bytes | |

    On Sun, Mar 26, 2006 at 08:24:14PM -0600, sterling wrote:
    I'm hard pressed to comprehend what is meant by a bad programmer?
    After they die, programmers go rotten.
    Liam
