XML

  • Namespaces, Xml Schema Whitespace normalization, xs:anyURI, and URILiterals in XPath 2.0


    Michael Kay <mike (AT) saxonica (DOT) com> wrote:
    >>I think that spaces in URIs, as from the RFC,
    >>are not allowed
    >Yes, that's true.
    >>If they are present in the characters of a URI they should
    >>be ignored; they are there just to allow the URI to be
    >>split across multiple lines
    >>(this part comes [historically] from the URL RFC).
    >I don't recall seeing any such statement: can you
    >give a reference?
    In this section they talk about whitespace:

    1.6. Syntax Notation and Common Elements
    This document uses two conventions to describe and define the syntax
    for URI. The first, called the layout form, is a general description
    of the order of components and component separators, as in
    <first>/<second>;<third>?<fourth>
    The component names are enclosed in angle-brackets and any characters
    outside angle-brackets are literal separators. Whitespace should be
    ignored. These descriptions are used informally and do not define
    the syntax requirements.
    Then they say the space character is excluded; we agreed on this.

    2.4.3. Excluded US-ASCII Characters
    The space character is excluded because significant spaces may
    disappear and insignificant spaces may be introduced when URI are
    transcribed or typeset or subjected to the treatment of
    word-processing programs. Whitespace is also used to delimit URI
    in many contexts.

    In Appendix E they say it should be removed:

    E. Recommendations for Delimiting URI in Context
    In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may
    need to be added to break long URI across lines. The whitespace
    should be ignored when extracting the URI.

    I have to go; we can talk about the rest later.
    Regards,
    Michele
    >>So
    >>http://www.example.com/Example with two spaces
    >>is not a valid xs:anyURI
    >You seem to be assuming that because it's not a valid URI then it's
    >not a valid xs:anyURI. This doesn't follow. The schema spec allows an
    >xs:anyURI to contain what I call a "wannabe URI": more formally, it
    >can contain any string that can be mapped to a URI by following the
    >escaping procedure in section 5.4 of XLink. This mapping performs
    >percent-encoding on all "disallowed characters"; a space is a
    >disallowed character that maps to %20; therefore a space is allowed
    >in an xs:anyURI value (even though it is not allowed in an IRI as
    >defined by RFC 3987).
    >As further evidence that space is allowed in an xs:anyURI, you
    >yourself quoted the statement that spaces are discouraged. It
    >wouldn't be necessary to discourage them if they were invalid.
    >Michael Kay
    >http://www.saxonica.com/
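
The mapping described above (percent-encode every "disallowed character", leave the rest alone) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `escape_disallowed` is hypothetical, and the set of disallowed characters assumes that XLink section 5.4 means all non-ASCII characters plus the RFC 2396 excluded characters, except `#`, `%`, `[` and `]`. Check the specification before relying on it.

```python
# Sketch of an XLink-5.4-style escaping step (an assumed reading of the spec).
# ASCII characters treated as disallowed here: space, plus the RFC 2396
# "delims" and "unwise" characters, minus '#', '%', '[' and ']'.
DISALLOWED_ASCII = set(' <>"{}|\\^`')


def escape_disallowed(value: str) -> str:
    """Percent-encode disallowed characters as UTF-8 bytes; keep the rest."""
    out = []
    for ch in value:
        code = ord(ch)
        # Control characters (below 0x20, and DEL) and all non-ASCII
        # characters are treated as disallowed as well.
        if code < 0x20 or code >= 0x7F or ch in DISALLOWED_ASCII:
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(escape_disallowed("http://www.example.com/Example with two spaces"))
```

Under this reading, a value that already contains `%20` passes through unchanged, because `%` itself is not escaped.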
    The xml-dev list is sponsored by XML.org <http://www.xml.org>, an
    initiative of OASIS <http://www.oasis-open.org>
    The list archives are at
    To subscribe or unsubscribe from this list use the subscription
    manager: <>
  • No.1 | | 687 bytes | |

    The other aspect to anyURI is that if you are using a simple type derived
    by list from anyURI, then whitespace will be used as a token delimiter.
    Therefore in XML you cannot use literal spaces in URI-ish strings for
    types derived by list from anyURI. The same issue arises for the
    schemaLocation hint.

    It is probably prudent to simply avoid spaces in URIs altogether.

    Cheers
    Rick Jelliffe
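
The tokenization issue described in this post can be illustrated with a short sketch. The helper names `split_uri_list` and `pair_schema_locations` and the example URIs are hypothetical; the code only assumes that list items are separated by runs of space, tab, CR, or LF, and that a schemaLocation-style value is read as namespace/location pairs.

```python
import re


def split_uri_list(value: str) -> list[str]:
    """Split a whitespace-separated list value into its items."""
    # Items are delimited by runs of space, tab, CR or LF; empty
    # strings from leading/trailing whitespace are dropped.
    return [token for token in re.split(r"[ \t\r\n]+", value) if token]


def pair_schema_locations(value: str) -> list[tuple[str, str]]:
    """Read a schemaLocation-style value as (namespace, location) pairs."""
    tokens = split_uri_list(value)
    if len(tokens) % 2 != 0:
        raise ValueError(f"odd number of tokens: {tokens!r}")
    return list(zip(tokens[0::2], tokens[1::2]))


# A literal space inside the second URI yields three tokens, not two.
print(split_uri_list("http://example.com/ns file:///c:/Program Files/ns.xsd"))
# With the space written as %20 the pairing works as intended.
print(pair_schema_locations("http://example.com/ns file:///c:/Program%20Files/ns.xsd"))
```

Once the value is split, nothing can tell that `Files/ns.xsd` was meant to be part of the previous item, so the literal space has to be escaped before the list is written.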

  • No.2 | | 1290 bytes | |

    Prudence is one thing, responding to the actions of the imprudent is another.

    I'm basically wondering how one should deal with the imprudent in this
    situation.
    Especially if ignoring them is not an option.

    Cheers,
    Bryan Rasmussen

    On 3/29/06, Rick Jelliffe <rjelliffe (AT) allette (DOT) com.au> wrote:
    >The other aspect to anyURI is that if you are using a simple type derived
    >by list from anyURI, then whitespace will be used as a token delimiter.
    >Therefore in XML you cannot use literal spaces in URI-ish strings for
    >types derived by list from anyURI. The same issue arises for the
    >schemaLocation hint.
    >
    >It is probably prudent to simply avoid spaces in URIs altogether.
    >
    >Cheers
    >Rick Jelliffe

  • No.3 | | 711 bytes | |

    >Especially if ignoring them is not an option.

    Is that so? I really wonder why we can't just agree that URIs are opaque
    strings in the context of XML that should be used unchanged.

    I just can't imagine that any user might actually benefit from these
    normalisation rules if even the gurus argue about what is right or
    wrong. You end up with behaviour that is totally unpredictable to the
    user.

    Martin

  • No.4 | | 2024 bytes | |

    I thought the convention with URLs was to replace spaces with %20.

    Bob Irving

    On 29 Mar 2006, at 07:36, bryan rasmussen wrote:

    >Prudence is one thing, responding to the actions of the imprudent
    >is another.
    >
    >I'm basically wondering how one should deal with the imprudent in this
    >situation.
    >Especially if ignoring them is not an option.
    >
    >Cheers,
    >Bryan Rasmussen
    >
    >On 3/29/06, Rick Jelliffe <rjelliffe (AT) allette (DOT) com.au> wrote:
    >>The other aspect to anyURI is that if you are using a simple type
    >>derived by list from anyURI, then whitespace will be used as a token
    >>delimiter.
    >>Therefore in XML you cannot use literal spaces in URI-ish strings for
    >>types derived by list from anyURI. The same issue arises for the
    >>schemaLocation hint.
    >>
    >>It is probably prudent to simply avoid spaces in URIs altogether.
    >>
    >>Cheers
    >>Rick Jelliffe
  • No.5 | | 1112 bytes | |

    >It is probably prudent to simply avoid spaces in URIs altogether.

    I'd agree, but

    file:///c:/Program Files/hmmmm

    No one would be mad enough to put a space in a directory name
    like "Program Files", would they? That would be silly.

    So the choice is: do you make the end user (if it's end users creating
    the XML files) do the %20 escaping, or do you let the end user use a
    space and specify that somewhere along the chain between the XML file
    and the URL resolution the space gets encoded? Just simply avoiding
    them isn't really an option.

    David

  • No.6 | | 1409 bytes | |

    >>It is probably prudent to simply avoid spaces in URIs altogether.
    >
    >I'd agree, but
    >
    >file:///c:/Program Files/hmmmm
    >
    >No one would be mad enough to put a space in a directory name
    >like "Program Files" would they? That would be silly

    Just use the German Windows version; there it's
    file:///c:/Programme/ ;-)

    >So the choice is; do you make the end user (if it's end users creating
    >the xml files) do the %20 escaping, or do you let the end user use a
    >space and specify that somewhere along the chain between the xml file
    >and the URL resolution the space gets encoded. Just simply avoiding
    >them isn't really an option.

    It's probably much too late to fix, but the problem is that people who
    have been deep into XML for years are not entirely sure which part of
    which spec requires a transcoding/escaping step at exactly which point.
    I'd argue that if a guru can't really understand it, a user won't
    either, and he will get results he can't understand or predict. In that
    case I think less is more: just have the user do the escaping.

    Martin
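
One concrete way an unclear escaping point becomes unpredictable is double encoding: if both the author and a later tool apply the %20 step, the `%` itself gets escaped. The sketch below uses Python's `urllib.parse.quote` purely as a stand-in for "some escaping step in the chain"; the path is the example from this thread, and the choice of `safe="/:"` is an assumption made so that the drive colon and slashes stay readable.

```python
from urllib.parse import quote, unquote

raw = "c:/Program Files/hmmmm"

# First escaping step: the space becomes %20.
once = quote(raw, safe="/:")
# A second step applied to already-escaped text escapes the '%' itself.
twice = quote(once, safe="/:")

print(once)   # c:/Program%20Files/hmmmm
print(twice)  # c:/Program%2520Files/hmmmm

# Decoding once no longer recovers the original path.
print(unquote(twice))  # c:/Program%20Files/hmmmm
```

Whichever party does the escaping, it has to happen exactly once, which is the practical argument for fixing that step at a single, documented place.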

  • No.7 | | 1783 bytes | |

    David Carlisle wrote:
    >>It is probably prudent to simply avoid spaces in URIs altogether.
    >
    >I'd agree, but
    >
    >file:///c:/Program Files/hmmmm
    >
    >No one would be mad enough to put a space in a directory name
    >like "Program Files" would they? That would be silly
    >
    >So the choice is; do you make the end user (if it's end users creating
    >the xml files) do the %20 escaping, or do you let the end user use a
    >space and specify that somewhere along the chain between the xml file
    >and the URL resolution the space gets encoded. Just simply avoiding
    >them isn't really an option.

    According to the specs I would say that you should raise an error for
    the URI as it is not legal. Ultimately this means that the user (or the
    editor/tool the user is using to generate the document) should convert
    the spaces to %20. Additionally, I would say that the ":" is illegal
    based on RFC 1738 [1], which says:

    Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
    reserved characters used for their reserved purposes may be used
    unencoded within a URL.

    Under file:// there is no reserved purpose for the ":" character as I
    read it.

    The gray area is that Namespaces in XML and xs:anyURI talk about strings
    that can be turned into URI references. In both cases however, I think
    that the algorithms they cite do not allow for converting " " to "%20".

    [1] RFC 1738, Uniform Resource Locators (URL)

    Cheers,
    Jeff Rafter
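
The "raise an error" approach suggested in this post could look roughly like the character-level check below. It is a sketch only: `illegal_characters` is a hypothetical helper, `%` is assumed to be acceptable as the escape introducer, and the check cannot judge whether a reserved character such as ":" is being used for its reserved purpose, so it does not settle the point about the drive-letter colon.

```python
import string

# Characters the quoted RFC 1738 sentence allows unencoded: alphanumerics,
# the special characters $-_.+!*'(), and the reserved characters ;/?:@=&
# (assumption: '%' is also accepted because it introduces an escape).
ALLOWED = set(
    string.ascii_letters + string.digits + "$-_.+!*'()," + ";/?:@=&" + "%"
)


def illegal_characters(url: str) -> list[str]:
    """Return the characters of url that are outside the allowed set."""
    return [ch for ch in url if ch not in ALLOWED]


def require_legal(url: str) -> str:
    """Raise ValueError if url contains a character outside the allowed set."""
    bad = illegal_characters(url)
    if bad:
        raise ValueError(f"illegal characters in URL: {bad!r}")
    return url


print(illegal_characters("file:///c:/Program Files/hmmmm"))
print(illegal_characters("file:///c:/Program%20Files/hmmmm"))
```

With a check like this, the literal-space form is rejected and the author (or the authoring tool) has to supply the %20 form.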

  • No.8 | | 1208 bytes | |

    David Carlisle wrote:

    >So the choice is; do you make the end user (if it's end users creating
    >the xml files) do the %20 escaping, or do you let the end user use a
    >space and specify that somewhere along the chain between the xml file
    >and the URL resolution the space gets encoded. Just simply avoiding
    >them isn't really an option.
    >

    Of course it is an option. For example, people who use UNIX shells
    manage to live without spaces in file names perfectly well. It is
    prudent not because everyone will do it, but because cautious people
    will. It is prudent not to use a rare encoding too.

    As a side issue, I don't think "but people will always cross the road
    without looking: it is not an option" is an answer to a statement like
    "it is prudent for people to look before walking across the road."

    Cheers
    Rick Jelliffe

