Standards

  • Beta test of the W3C Markup Validator (0.7.0 beta 1)

    15 answers - 3816 bytes

    == Beta test for the W3C Markup Validator - version 0.7.0 beta 1 ==
    I am pleased to announce that we are starting today a Beta test period
    for the W3C Markup Validator, version 0.7.0 (beta 1).
    http://validator.w3.org:8001/
    The W3C Markup Validation Service, also known as "HTML validator", is a
    popular free service and software providing Web content authors a way to
    check their documents against their grammar. The previous stable version
    of the tool was released in July 2004, and we hope that this beta test
    will lead to a release of a new stable version in the weeks to come.
    Please send your feedback (see below for instructions) on this beta
    version through Monday, July 26, 2005.
    ** Changes **
    The new "0.7.0" version brings a number of changes and bug fixes to the
    architecture, interface and documentation of the Markup Validator.
    The changes are listed at
    #t2005-07-12, and include:
    - Templated XHTML output
    - Better feedback mechanisms
    - User Interface improvements
    - The return of the "direct input" validation
    - Validation of documents using "custom DTDs"
    - Global updates to the documentation
    - and overall, more than 30 issues or bugs addressed.
    ** How to Test **
    In order to make the stable release as successful as possible, the tool
    will have to be tested in a variety of conditions by as many people as
    possible.
    * Test the new version online
    In addition to the usual service, a test instance of the Markup
    Validator is available online at the following address:
    http://validator.w3.org:8001/
    * Send Feedback
    When testing the beta version of the validator, you are invited to look
    for, and report, any bug or issue you may encounter. This includes
    validation bugs, software errors, User Interface issues, and other
    suggestions. Bug reports regarding the recognition of document types are
    particularly welcome.
    Instructions for feedback are given at:
    It is recommended to read through these instructions, and check the
    Mail archives and Bug database, before sending a new bug report to the
    publicly archived mailing-list www-validator (AT) w3 (DOT) org.
    * Install the validator locally
    The validator is free software, and it is possible to install it on a
    local Web server. Testing whether the latest version installs and runs
    properly on all systems, and checking that the installation guide is
    correct and up to date (see below) would be valuable. People already
    maintaining a local instance of the validator are especially invited
    to install and test the beta version.
    A tarball of the latest version of the validator, as well as the
    catalogue of grammars it uses, are available:
    The installation guide is online, and distributed with the software:
    * Spread the word
    As mentioned earlier, getting a high number of people to participate
    in this beta test would help make it a success. If you are part of
    a community of Web designers, developers, or other types of users
    of the W3C Markup Validator, you may want to invite others in these
    communities to participate, too. Please refer to this announcement on
    the www-validator mailing-list:
    ** Thank you **
    Many thanks to the large, great community around the validator, for
    making this beta version happen. Thanks and congratulations to the
    volunteers of the QA Tools development group for their contribution,
    thanks to the participants of the www-validator mailing-list for
    providing invaluable feedback, suggestions, and excellent support
    for every user of the tool. And finally, thank you all for participating
    in this beta test.
    olivier
  • No.1 | | 1048 bytes | |

    Produces very odd diagnostics indeed, :

    ->

    Unknown Document Type and Parse Mode!

    The MIME Media Type (text/html) for this document is used to serve
    both SGML and XML based documents, and no DOCTYPE Declaration was
    found to disambiguate it. Parsing will continue in SGML mode and
    with a fallback DOCTYPE similar to HTML 4.01 Transitional.

    This page is not Valid -//RHBNC//DTD HTML 4.01 Augmented//EN!

    1) If there was no DOCTYPE declaration, how does it know that it
    should be -//RHBNC//DTD HTML 4.01 Augmented//EN ?

    I should add that it commences :

    1: <!DOCTYPE HTML PUBLIC "-//RHBNC//DTD HTML 4.01 Augmented//EN"
    2: ""
    3: >

    Below are the results of attempting to parse this document with an SGML parser.

    1. Warning Line 76 column 27: cannot generate system identifier for general entity "nbsp".

    <td> <a target="_top" href="/" onM="MM_swapImg

    2) This diagnostic is not issued by the current validator :

    Philip Taylor
  • No.2 | | 2655 bytes | |

    Hi Philip,

    Thanks for checking the beta validator.

    Jul 12, 2005, at 22:06, Philip TAYLOR wrote:
    ->

    Unknown Document Type and Parse Mode!

    I checked the part of the code that issued this warning. The said
    warning only happens when:
    - the pre-parsing found a Doctype
    - and the content-type cannot disambiguate whether to use SGML or XML
    mode (i.e., text/html)
    - but the doctype is not in our types database with info to
    disambiguate the mode

    so instead of
    [[
    The MIME Media Type (text/html) for this document is used to serve both
    SGML and XML based documents, and no DOCTYPE Declaration was found to
    disambiguate it. Parsing will continue in SGML mode and with a fallback
    DOCTYPE similar to HTML 4.01 Transitional.
    ]]
    I think it should be something like
    [[
    The MIME Media Type (text/html) for this document is used to serve both
    SGML and XML based documents, and it is not possible to disambiguate it
    based on the DOCTYPE Declaration in your document. Parsing will
    continue in SGML mode.
    ]]
    I think Terje initially wrote this, he's really busy these days but
    I'll try to see if he can give it a look.
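
    The three conditions listed above for this warning can be sketched as a
    small decision function. This is only an illustration in Python, not the
    validator's own (Perl) code: the function name, the contents of
    KNOWN_DOCTYPES and the warning strings are all assumptions made for the
    example.

```python
# Media types that are served for both SGML-based and XML-based documents.
AMBIGUOUS_TYPES = {"text/html"}

# Hypothetical stand-in for the validator's "types database": it maps a
# known public identifier to the parse mode that identifier implies.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01 Transitional//EN": "SGML",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XML",
}


def choose_parse_mode(content_type, doctype_fpi):
    """Return (mode, warning); warning is None when nothing is ambiguous."""
    if content_type not in AMBIGUOUS_TYPES:
        # Assumption for this sketch: any other media type settles the
        # mode by itself (types ending in "xml" mean XML, the rest SGML).
        mode = "XML" if content_type.endswith("xml") else "SGML"
        return mode, None
    if doctype_fpi is None:
        # No DOCTYPE at all: the case the original wording described.
        return "SGML", "no DOCTYPE found; continuing in SGML mode"
    if doctype_fpi in KNOWN_DOCTYPES:
        return KNOWN_DOCTYPES[doctype_fpi], None
    # A DOCTYPE was found, but it cannot disambiguate the mode: the case
    # the reworded message is meant to cover.
    return "SGML", "DOCTYPE not in types database; continuing in SGML mode"


mode, warning = choose_parse_mode(
    "text/html", "-//RHBNC//DTD HTML 4.01 Augmented//EN"
)
print(mode, "|", warning)
```

    The point of the sketch is that "no DOCTYPE" and "unknown DOCTYPE" are
    separate branches, so each can carry its own message.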

    Now for the other issue

    I should add that it commences :
    1: <!DOCTYPE HTML PUBLIC "-//RHBNC//DTD HTML 4.01 Augmented//EN"
    2: ""
    3: >
    Error Line 76 column 27: general entity "nbsp" not defined and no
    default entity.
    This diagnostic is not issued by the current validator

    This is SGML territory, so hopefully someone will be able to confirm,
    or correct, my understanding of the situation.

    * You are using a "custom" DTD, based on a copy of the HTML 4.01 DTD,
    and which you're publishing at:

    * In that DTD, the reference to entities is made (as in HTML 4.01) with
    relative URIs, e.g.:
    <!ENTITY % HTMLlat1 PUBLIC
    "-//W3C//ENTITIES Latin1//EN//HTML"
    "HTMLlat1.ent">
    %HTMLlat1;

    But there is nothing at
    Isn't that a mistake?

    Now the reason why the "usual" validator (v0.6.7) does not complain
    about this is that the SGML catalogue it uses knows how to dereference
    the "-//W3C//ENTITIES Latin1//EN//HTML" FPI, whereas the "new"
    validator has a catalogue that only knows "-//W3C//ENTITIES Latin
    1//EN//HTML". This is most likely a victim of a cleanup of the said
    catalogue. The cleanup was a bit zealous and it's possible that this
    removal was a mistake. Hmm, quite probable actually, the DTD in the
    HTML4.01 spec uses the "Latin1" FPI, not "Latin 1". Could anyone among
    our SGML gurus confirm?
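
    To illustrate why a single space matters here, the following is a minimal
    sketch of an exact-match catalogue lookup, written in Python purely for
    illustration. The catalogue excerpt and the system identifier paired with
    the FPI are assumptions for the example, not a copy of the validator's
    real catalogue.

```python
import re

# Hypothetical catalogue excerpt after the cleanup: only the "Latin 1"
# spelling of the FPI has an entry.
CATALOG_TEXT = '''
PUBLIC "-//W3C//ENTITIES Latin 1//EN//HTML" "HTMLlat1.ent"
'''


def load_catalog(text):
    """Parse PUBLIC "fpi" "sysid" entries into a dict keyed by FPI."""
    return dict(re.findall(r'PUBLIC\s+"([^"]+)"\s+"([^"]+)"', text))


def resolve(catalog, fpi):
    """Return the system identifier for an FPI, or None if no entry matches."""
    return catalog.get(fpi)


catalog = load_catalog(CATALOG_TEXT)
print(resolve(catalog, "-//W3C//ENTITIES Latin 1//EN//HTML"))  # prints HTMLlat1.ent
print(resolve(catalog, "-//W3C//ENTITIES Latin1//EN//HTML"))   # prints None
```

    When the lookup fails, the parser is left with only the relative system
    identifier from the DTD, which is why the missing file next to the custom
    DTD suddenly becomes visible.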

    Thanks,
  • No.3 | | 886 bytes | |

    Many thanks for the feedback, I am most grateful
    to you for pointing out the defects in my DTD, which I shall
    fix immediately. As regards the DOCTYPE, however, and the
    disambiguation aspect :

    The MIME Media Type (text/html) for this document is used to serve both
    SGML and XML based documents, and it is not possible to disambiguate it
    based on the DOCTYPE Declaration in your document. Parsing will continue
    in SGML mode.
    - but the doctype is not in our types database with info to
    disambiguate the mode

    this does seem a slightly worrying aspect. Presumably your
    "types database" is hard-coded, and knows only about
    W3C standard DOCTYPEs; do you think there is any mileage
    in allowing some "disambiguation pragmat" in non-standard
    DTDs, and if so, which is the right forum on which to raise
    this issue ?

    Philip Taylor
  • No.4 | | 2925 bytes | |

    Hello, Philip,

    13 Jul 2005, at 20:04, Philip TAYLOR wrote:
    Many thanks for the feedback, I am most grateful
    to you for pointing out the defects in my DTD, which I shall
    fix immediately.

    Great. Note that unless I hear objections in the next few days, I am
    likely to re-add the Latin 1 entities FPI to the SGML catalogue, but
    fixing your DTD will do no harm.

    As regards the DOCTYPE, however, and the
    disambiguation aspect :

    The MIME Media Type (text/html) for this document is used to
    serve both
    SGML and XML based documents, and it is not possible to
    disambiguate it
    based on the DOCTYPE Declaration in your document. Parsing
    will continue
    in SGML mode.

    this does seem a slightly worrying aspect. Presumably your
    "types database" is hard-coded, and knows only about
    W3C standard DOCTYPEs;

    Right.

    do you think there is any mileage
    in allowing some "disambiguation pragmat" in non-standard
    DTDs, and if so, which is the right forum on which to raise
    this issue ?

    This is a tough question, and I am probably by far the worst person
    on this list to answer it, but let's give it a try anyway. Frankly,
    even when talking about standard DTDs, we are in the realm of non-
    normative. So it should not be a surprise that for non-standard DTDs,
    the situation is even fuzzier
    - The text/html RFC is informative and makes no mention of the fact
    that such documents should be parsed as SGML or XML
    - There is no clear identification that a DTD is an SGML or XML one.
    Well, there are, as far as I know, rules that XML DTDs must follow and
    that SGML DTDs need not, so in a way you could use that.
    But I might be wrong. And even if I am right, that's far-fetched.
    - Even for "standard" XHTML document types, I am not aware of a
    normative clarification of how content served as text/html should be
    parsed. And that's beyond the point of this thread, see: http://

    The "informative" consensus, however, seems to be that text/html is
    mostly for SGML applications, and the fact that XHTML can be served
    as such is just a necessary evil ("necessary" and "evil" being, as a
    matter of fact, both subject to endless arguing) - see http://
    #text-html. As a result, I think that
    what the validator does with documents served as text/html and with
    DTDs it doesn't know - parsing them as SGML - is correct.

    But that is still a heuristic.

    And as to whom people should turn to for an actual answer, I guess
    "no one". The HTML WG could say something about it, but frankly,
    they are already busy enough and the text/html situation is already
    thorny enough with just the W3C standard DTDs that I can't imagine
    they'd like to pronounce themselves on non-standard DTDs. But then
    again, I might be wrong.

    Hope this helps,
  • No.5 | | 1039 bytes | |

    Hello, -- Thanks for all further clarification
    and references : one point remains, I think --
    As a result, I think that
    what the validator does with documents served as text/html and with
    DTDs it doesn't know - parsing them as SGML - is correct.

    OK, I'm happy with that, but less happy with what the
    Validator /claimed/ it was about to do, which was

    "Parsing will continue in SGML mode and
    with a fallback DOCTYPE similar to HTML
    4.01 Transitional. "

    which you proposed to re-cast as

    "Parsing will continue in SGML mode."

    Now what the Validator /claims/ to be about to do,
    and what it actually does, are not necessarily
    the same, so could I ask -- if you amend the wording
    to your proposed form -- will the Validator in fact

    "continue in SGML mode and
    with a fallback DOCTYPE similar to HTML
    4.01 Transitional"

    or just

    "continue in SGML mode"

    I'm sure you appreciate the significance of the question!

    ** Phil.
  • No.6 | | 720 bytes | |

    A bug!

    Using direct input, with null content,
    the beta validator reports :

    This page is not Valid (no Doctype found)!

    Below are the results of attempting to parse this document with an SGML parser.

    1. Error Line 1 column 0: character "1" not allowed in prolog.

    1

    2. Error Line 1 column 1: end of document in prolog.

    1

    There /is/ no "1" in null content, so presumably
    the validator is generating it for itself

    The little envelope icon (which only manifested
    itself when I composed this message) is so small
    on the original screen that any chance of feedback
    arriving via that route must be /vanishingly/ small

    Philip Taylor
  • No.7 | | 673 bytes | |

    * Philip TAYLOR wrote:
    >[%3Atext%2Fhtml%2C]
    >
    >There /is/ no "1" in null content, so presumably
    >the validator is generating it for itself


    You mean it should say "Line 0"? Well, perhaps we should simply point
    out that no content was received and the processing model for empty
    documents is yet to be defined by the wtfwg

    >The little envelope icon (which only manifested
    >itself when I composed this message) is so small
    >on the original screen that any chance of feedback
    >arriving via that route must be /vanishingly/ small


    That's intentional
  • No.8 | | 1160 bytes | |

    Bjoern Hoehrmann wrote:

    * Philip TAYLOR wrote:

    >>[%3Atext%2Fhtml%2C]
    >>
    >>There /is/ no "1" in null content, so presumably
    >>the validator is generating it for itself


    You mean it should say "Line 0"? Well, perhaps we should simply point
    out that no content was received and the processing model for empty
    documents is yet to be defined by the wtfwg

    No, I think you missed the point : it quite clearly
    says 'character "1" not allowed in prolog.'. There
    /is/ no 'character "1"' in the source, so there must
    be a genuine bug in the validator code which is
    causing this 'character "1"' to be generated and
    inserted into the parsing stream


    >>The little envelope icon (which only manifested
    >>itself when I composed this message) is so small
    >>on the original screen that any chance of feedback
    >>arriving via that route must be /vanishingly/ small


    That's intentional

    :-)))
  • No.9 | | 514 bytes | |

    * Philip TAYLOR wrote:
    >No, I think you missed the point : it quite clearly
    >says 'character "1" not allowed in prolog.'. There
    >/is/ no 'character "1"' in the source, so there must
    >be a genuine bug in the validator code which is
    >causing this 'character "1"' to be generated and
    >inserted into the parsing stream


    I see, then I am unable to reproduce this, there is no such message
    in <%3Atext%2Fhtml%2C>.
  • No.10 | | 693 bytes | |

    Fascinating : I see the difference, and
    can't explain it as of now

    Bjoern Hoehrmann wrote:

    * Philip TAYLOR wrote:

    >>No, I think you missed the point : it quite clearly
    >>says 'character "1" not allowed in prolog.'. There
    >>/is/ no 'character "1"' in the source, so there must
    >>be a genuine bug in the validator code which is
    >>causing this 'character "1"' to be generated and
    >>inserted into the parsing stream


    I see, then I am unable to reproduce this, there is no such message
    in <%3Atext%2Fhtml%2C>.
  • No.11 | | 534 bytes | |

    No, I really can't explain it. Bjoern,
    what results do you get if you go to

    http://validator.w3.org:8001/

    and click on the "Submit" button below
    the "Validate by Direct Input" box with
    nothing in the latter ?

    Philip TAYLOR wrote:

    >

    Fascinating : I see the difference, and
    can't explain it as of now
    >I see, then I am unable to reproduce this, there is no such message
    >in <%3Atext%2Fhtml%2C>.
  • No.12 | | 595 bytes | |

    * Philip TAYLOR wrote:
    >No, I really can't explain it. Bjoern,
    >what results do you get if you go to
    >
    >http://validator.w3.org:8001/
    >
    >and click on the "Submit" button below
    >the "Validate by Direct Input" box with
    >nothing in the latter ?


    Aah, indeed, with I am able
    to reproduce this. I suspect this comes from CGI.pm which uses 1 to tell
    that the parameter is set. So we need to replace

    $File->{Bytes} = $q->param('fragment');

    by something else. Thanks for your report!
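
    One possible shape for that "something else", combined with the earlier
    suggestion to simply report that no content was received, is sketched
    below. It is written in Python for illustration only and is not the
    validator's Perl code: the function names are hypothetical, and `params`
    is a plain dict standing in for the decoded form fields.

```python
def read_direct_input(params):
    """Return the submitted fragment, or None when nothing usable was sent.

    `params` is a plain dict of decoded form fields, a hypothetical
    stand-in for the CGI query object.
    """
    fragment = params.get("fragment")
    # Treat a missing field, a non-string flag value (such as 1) and an
    # empty or whitespace-only string all as "no content".
    if not isinstance(fragment, str) or fragment.strip() == "":
        return None
    return fragment


def validate_direct_input(params):
    """Report empty input up front instead of handing it to the parser."""
    fragment = read_direct_input(params)
    if fragment is None:
        return "No content was received; nothing to validate."
    return f"Parsing {len(fragment)} characters"


print(validate_direct_input({"fragment": ""}))
print(validate_direct_input({"fragment": "<p>hello</p>"}))
```

    Checking for empty input before parsing means a stray flag value can
    never reach the parser as if it were document content.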
  • No.13 | | 786 bytes | |

    Hi Philip,

    15 Jul 2005, at 00:13, Philip TAYLOR wrote:

    Now what the Validator /claims/ to be about to do,
    and what it actually does, are not necessarily
    the same, so could I ask -- if you amend the wording
    to your proposed form -- will the Validator in fact

    "continue in SGML mode and
    with a fallback DOCTYPE similar to HTML
    4.01 Transitional"

    or just

    "continue in SGML mode"

    I'm sure you appreciate the significance of the question!

    Indeed. My proposed wording is in sync with what the validator
    actually does. In other words, there is no doctype change or fallback
    in this case (or the parser would not have, for instance, complained
    about the undeclared entities).

    Hope this clarification suits you.
  • No.14 | | 1286 bytes | |

    Jul 15, 2005, at 1:52, Bjoern Hoehrmann wrote:
    * Philip TAYLOR wrote:
    >The little envelope icon (which only manifested
    >itself when I composed this message) is so small
    >on the original screen that any chance of feedback
    >arriving via that route must be /vanishingly/ small
    >

    That's intentional

    I agree it *was* intentional. When the "error message feedback"
    mailto: link was first added as a way to get users to participate in
    the improvement of error message explanations, the signal/noise ratio
    on the list plummeted with a flood of empty or awfully worded calls for
    help, and making the icon (actually not an icon, but a character)
    smaller was a "quick" fix to limit the damage while we would work on
    better feedback channels than a mere "mailto:" link.

    We have made this work, and the feedback improvements are actually one
    of the major changes in 0.7.0. The "envelope" now links to e.g:
    %3Atext%2Fhtml%2C;
    errmsg_id=47#errormsg
    with pre-filled search queries, instructions etc. It won't miraculously
    kill all noisy feedback, but as a filtering mechanism, it's hopefully
    much better. I suggest making the "envelope" normal sized again.

    Any thoughts?
  • No.15 | | 162 bytes | |

    olivier Thereaux wrote:
    [snip]

    >Hope this clarification suits you.

    Yes indeed. Thank you,
    ** Phil.
