Beta test of the W3C Markup Validator (0.7.0 beta 1)

== Beta test for the W3C Markup Validator - version 0.7.0 beta 1 ==
I am pleased to announce that we are starting a beta test period today
for the W3C Markup Validator, version 0.7.0 (beta 1).
http://validator.w3.org:8001/
The W3C Markup Validation Service, also known as "HTML validator", is a
popular free service and software providing Web content authors a way to
check their documents against their grammar. The previous stable version
of the tool was released in July 2004, and we hope that this beta test
will lead to a release of a new stable version in the weeks to come.
Please send your feedback (see below for instructions) on this beta
version through Monday, July 26, 2005.
** Changes **
The new "0.7.0" version brings a number of changes and bug fixes to the
architecture, interface and documentation of the Markup Validator.
The changes are listed at
#t2005-07-12, and include:
- Templated XHTML output
- Better feedback mechanisms
- User Interface improvements
- The return of the "direct input" validation
- Validation of documents using "custom DTDs"
- Global updates to the documentation
- and overall, more than 30 issues or bugs addressed.
** How to Test **
In order to make the stable release as successful as possible, the tool
will have to be tested in a variety of conditions by as many people as
possible.
* Test the new version online
In addition to the usual service, a test instance of the Markup
Validator is available online at the following address:
http://validator.w3.org:8001/
* Send Feedback
When testing the beta version of the validator, you are invited to look
for, and report, any bug or issue you may encounter. This includes
validation bugs, software errors, User Interface issues, and other
suggestions. Bug reports regarding the recognition of document types are
particularly welcome.
Instructions for feedback are given at:
It is recommended to read through these instructions, and check the
Mail archives and Bug database, before sending a new bug report to the
publicly archived mailing-list www-validator (AT) w3 (DOT) org.
* Install the validator locally
The validator is free software, and it is possible to install it on a
local Web server. Testing whether the latest version installs and runs
properly on all systems, and checking that the installation guide is
correct and up to date (see below), would be valuable. People already
maintaining a local instance of the validator are especially invited
to install and test the beta version.
A tarball of the latest version of the validator, as well as the
catalogue of grammars it uses, are available:
The installation guide is online, and distributed with the software:
* Spread the word
As mentioned earlier, getting a large number of people to participate
in this beta test would help make it a success. If you are part of
a community of Web designers, developers, or other types of users
of the W3C Markup Validator, you may want to invite others in these
communities to participate, too. Please refer to this announcement on
the www-validator mailing-list:
** Thank you **
Many thanks to the large, great community around the validator, for
making this beta version happen. Thanks and congratulations to the
volunteers of the QA Tools development group for their contribution,
thanks to the participants of the www-validator mailing-list for
providing invaluable feedback, suggestions, and excellent support
for every user of the tool. And finally, thank you all for participating
in this beta test.
olivier
No.1 | | 1048 bytes |
| 
Produces very odd diagnostics indeed, :
->
Unknown Document Type and Parse Mode!
The MIME Media Type (text/html) for this document is used to serve
both SGML and XML based documents, and no DOCTYPE Declaration was
found to disambiguate it. Parsing will continue in SGML mode and
with a fallback DOCTYPE similar to HTML 4.01 Transitional.
This page is not Valid -//RHBNC//DTD HTML 4.01 Augmented//EN!
1) If there was no DOCTYPE declaration, how does it know that it
should be -//RHBNC//DTD HTML 4.01 Augmented//EN ?
I should add that it commences :
1: <!DOCTYPE HTML PUBLIC "-//RHBNC//DTD HTML 4.01 Augmented//EN"
2: ""
3: >
Below are the results of attempting to parse this document with an SGML parser.
1. Warning Line 76 column 27: cannot generate system identifier for general entity "nbsp".
<td> <a target="_top" href="/" onM="MM_swapImg
2) This diagnostic is not issued by the current validator :
Philip Taylor
No.2 | | 2655 bytes |
| 
Hi Philip,
Thanks for checking the beta validator.
Jul 12, 2005, at 22:06, Philip TAYLOR wrote:
->
Unknown Document Type and Parse Mode!
I checked the part of the code that issued this warning. The said
warning only happens when:
- the pre-parsing found a Doctype
- and the content-type cannot disambiguate whether to use SGML or XML
mode (i.e., text/html)
- but the doctype is not in our types database with info to
disambiguate the mode
so instead of
[[
The MIME Media Type (text/html) for this document is used to serve both
SGML and XML based documents, and no DOCTYPE Declaration was found to
disambiguate it. Parsing will continue in SGML mode and with a fallback
DOCTYPE similar to HTML 4.01 Transitional.
]]
I think it should be something like
[[
The MIME Media Type (text/html) for this document is used to serve both
SGML and XML based documents, and it is not possible to disambiguate it
based on the DOCTYPE Declaration in your document. Parsing will
continue in SGML mode.
]]
I think Terje initially wrote this; he's really busy these days, but
I'll try to see if he can give it a look.
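The three conditions listed above can be sketched as a small decision function. This is only an illustration in Python, not the validator's actual Perl code: the names `KNOWN_DOCTYPES`, `UNAMBIGUOUS_TYPES` and `choose_parse_mode`, and the contents of both tables, are assumptions made for the example.

```python
from typing import Optional, Tuple

# Hypothetical excerpt of a "types database": public identifier -> parse mode.
KNOWN_DOCTYPES = {
    "-//W3C//DTD HTML 4.01 Transitional//EN": "SGML",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "XML",
}

# Media types that settle the parse mode on their own (illustrative subset).
UNAMBIGUOUS_TYPES = {
    "application/xhtml+xml": "XML",
    "application/xml": "XML",
}


def choose_parse_mode(
    content_type: str, doctype_fpi: Optional[str]
) -> Tuple[str, Optional[str]]:
    """Return (mode, warning); warning is None when nothing is ambiguous."""
    if content_type in UNAMBIGUOUS_TYPES:
        return UNAMBIGUOUS_TYPES[content_type], None
    if doctype_fpi is None:
        # Ambiguous media type and no DOCTYPE at all.
        return "SGML", "no DOCTYPE declaration found to disambiguate"
    if doctype_fpi in KNOWN_DOCTYPES:
        return KNOWN_DOCTYPES[doctype_fpi], None
    # A DOCTYPE was found, but the database has no mode for it.
    return "SGML", "cannot disambiguate from the DOCTYPE; continuing in SGML mode"
```

With a custom public identifier served as text/html, this sketch takes the last branch, which is the case where the proposed rewording of the warning would apply.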
Now for the other issue:
I should add that it commences :
1: <!DOCTYPE HTML PUBLIC "-//RHBNC//DTD HTML 4.01 Augmented//EN"
2: ""
3: >
Error Line 76 column 27: general entity "nbsp" not defined and no
default entity.
This diagnostic is not issued by the current validator
This is SGML territory, so hopefully someone will be able to confirm,
or correct, my understanding of the situation.
* You are using a "custom" DTD, based on a copy of the HTML 4.01 DTD,
and which you're publishing at:
* In that DTD, the reference to entities is made (as in HTML 4.01) with
relative URIs, e.g.:
<!ENTITY % HTMLlat1 PUBLIC
"-//W3C//ENTITIES Latin1//EN//HTML"
"HTMLlat1.ent">
%HTMLlat1;
But there is nothing at
Isn't that a mistake?
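To illustrate why a relative system identifier depends on where the DTD itself is published, here is a minimal Python sketch of the resolution step. The DTD address below is a made-up example.org URL, not the real location of the custom DTD discussed here.

```python
from urllib.parse import urljoin

# Hypothetical address of a custom DTD (an assumption for this example).
dtd_url = "http://example.org/dtds/html401-augmented.dtd"

# System identifier exactly as written in the ENTITY declaration above.
system_id = "HTMLlat1.ent"

# A relative system identifier is resolved against the DTD's own location.
entity_url = urljoin(dtd_url, system_id)
print(entity_url)  # http://example.org/dtds/HTMLlat1.ent
```

If no file is published at the resolved address, and the catalogue cannot map the public identifier to a local copy, the parser has no way to load the entity set.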
Now the reason why the "usual" validator (v0.6.7) does not complain
about this is that the SGML catalogue it uses knows how to dereference
the "-//W3C//ENTITIES Latin1//EN//HTML" FPI, whereas the "new"
validator has a catalogue that only knows "-//W3C//ENTITIES Latin
1//EN//HTML". This is most likely a victim of a cleanup of the said
catalogue. The cleanup was a bit zealous and it's possible that this
removal was a mistake. Hmm, quite probable actually, the DTD in the
HTML4.01 spec uses the "Latin1" FPI, not "Latin 1". Could anyone among
our SGML gurus confirm?
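The effect of the catalogue cleanup can be pictured as a plain lookup table. The sketch below is a Python illustration only: `CATALOGUE` and `resolve_public_id` are hypothetical names, and a real SGML catalogue is a text file read by the parser, not a dictionary.

```python
from typing import Optional

# Hypothetical catalogue after the cleanup: only the "Latin 1" spelling is left.
CATALOGUE = {
    "-//W3C//ENTITIES Latin 1//EN//HTML": "HTMLlat1.ent",
}


def resolve_public_id(fpi: str) -> Optional[str]:
    """Map a public identifier to a local entity file, or None if unknown."""
    return CATALOGUE.get(fpi)


# The spelling used in the DTD quoted above is not found...
print(resolve_public_id("-//W3C//ENTITIES Latin1//EN//HTML"))
# ...while the spelling with a space is.
print(resolve_public_id("-//W3C//ENTITIES Latin 1//EN//HTML"))
```

When the lookup fails, the parser falls back to the system identifier, which is where the missing HTMLlat1.ent file becomes visible.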
Thanks,
No.3 | | 886 bytes |
| 
Many thanks for the feedback, I am most grateful
to you for pointing out the defects in my DTD, which I shall
fix immediately. As regards the DOCTYPE, however, and the
disambiguation aspect :
The MIME Media Type (text/html) for this document is used to serve both
SGML and XML based documents, and it is not possible to disambiguate it
based on the DCTYPE Declaration in your document. Parsing will continue
in SGML mode.
- but the doctype is not in our types database with info to
disambiguate the mode
this does seem a slightly worrying aspect. Presumably your
"types database" is hard-coded, and knows only about
W3C standard DOCTYPEs; do you think there is any mileage
in allowing some "disambiguation pragmat" in non-standard
DTDs, and if so, which is the right forum on which to raise
this issue ?
Philip Taylor
No.4 | | 2925 bytes |
| 
Hello, Philip,
13 Jul 2005, at 20:04, Philip TAYLOR wrote:
Many thanks for the feedback, I am most grateful
to you for pointing out the defects in my DTD, which I shall
fix immediately.
Great. Note that unless I hear objections in the next few days, I am
likely to re-add the Latin 1 entities FPI to the SGML catalogue, but
fixing your DTD will do no harm.
As regards the DOCTYPE, however, and the
disambiguation aspect :
The MIME Media Type (text/html) for this document is used to
serve both
SGML and XML based documents, and it is not possible to
disambiguate it
based on the DOCTYPE Declaration in your document. Parsing
will continue
in SGML mode.
this does seem a slightly worrying aspect. Presumably your
"types database" is hard-coded, and knows only about
W3C standard DOCTYPEs;
Right.
do you think there is any mileage
in allowing some "disambiguation pragmat" in non-standard
DTDs, and if so, which is the right forum on which to raise
this issue ?
This is a tough question, and I am probably by far the worst person
on this list to answer it, but let's give it a try anyway. Frankly,
even when talking about standard DTDs, we are in the realm of non-
normative. So it should not be a surprise that for non-standard DTDs,
the situation is even fuzzier.
- The text/html RFC is informative and does not say whether such
documents should be parsed as SGML or XML.
- There is no clear identification that a DTD is an SGML or XML one.
Well, there are, as far as I know, rules that XML DTDs must follow and
that SGML DTDs need not, so in a way you could use that.
But I might be wrong. And even if I am right, that's far-fetched.
- Even for "standard" XHTML document types, I am not aware of a
normative clarification of how content served as text/html should be
parsed. And that's beyond the point of this thread, see: http://
The "informative" consensus, however, seems to be that text/html is
mostly for SGML applications, and the fact that XHTML can be served
as such is just a necessary evil ("necessary" and "evil" being, as a
matter of fact, both subject to endless arguing) - see http://
#text-html. As a result, I think that
what the validator does with documents served as text/html and with
DTDs it doesn't know - parsing them as SGML - is correct.
But that is still a heuristic.
And as to whom people should turn to for an actual answer, I guess
"no one". The HTML WG could say something about it, but frankly,
they are already busy enough and the text/html situation is already
thorny enough with just the W3C standard DTDs that I can't imagine
they'd like to pronounce themselves on non-standard DTDs. But then
again, I might be wrong.
Hope this helps,
No.5 | | 1039 bytes |
| 
Hello, -- Thanks for all further clarification
and references : one point remains, I think --
As a result, I think that
what the validator does with documents served as text/html and with
DTDs it doesn't know - parsing them as SGML - is correct.
OK, I'm happy with that, but less happy with what the
Validator /claimed/ it was about to do, which was
"Parsing will continue in SGML mode and
with a fallback DOCTYPE similar to HTML
4.01 Transitional. "
which you proposed to re-cast as
"Parsing will continue in SGML mode."
Now what the Validator /claims/ to be about to do,
and what it actually does, are not necessarily
the same, so could I ask -- if you amend the wording
to your proposed form -- will the Validator in fact
"continue in SGML mode and
with a fallback DOCTYPE similar to HTML
4.01 Transitional"
or just
"continue in SGML mode"
I'm sure you appreciate the significance of the question!
** Phil.
No.6 | | 720 bytes |
| 
A bug!
Using direct input, with null content,
the beta validator reports :
This page is not Valid (no Doctype found)!
Below are the results of attempting to parse this document with an SGML parser.
1. Error Line 1 column 0: character "1" not allowed in prolog.
1
2. Error Line 1 column 1: end of document in prolog.
1
There /is/ no "1" in null content, so presumably
the validator is generating it for itself.
The little envelope icon (which only manifested
itself when I composed this message) is so small
on the original screen that any chance of feedback
arriving via that route must be /vanishingly/ small.
Philip Taylor
No.7 | | 673 bytes |
| 
* Philip TAYLOR wrote:
>[%3Atext%2Fhtml%2C]
>
>There /is/ no "1" in null content, so presumably
>the validator is generating it for itself
You mean it should say "Line 0"? Well, perhaps we should simply point
out that no content was received and the processing model for empty
documents is yet to be defined by the wtfwg.
>The little envelope icon (which only manifested
>itself when I composed this message) is so small
>on the original screen that any chance of feedback
>arriving via that route must be /vanishingly/ small
That's intentional.
No.8 | | 1160 bytes |
| 
Bjoern Hoehrmann wrote:
* Philip TAYLOR wrote:
>>[%3Atext%2Fhtml%2C]
>>
>>There /is/ no "1" in null content, so presumably
>>the validator is generating it for itself
You mean it should say "Line 0"? Well, perhaps we should simply point
out that no content was received and the processing model for empty
documents is yet to be defined by the wtfwg
No, I think you missed the point : it quite clearly
says 'character "1" not allowed in prolog.'. There
/is/ no 'character "1"' in the source, so there must
be a genuine bug in the validator code which is
causing this 'character "1"' to be generated and
inserted into the parsing stream.
>>The little envelope icon (which only manifested
>>itself when I composed this message) is so small
>>on the original screen that any chance of feedback
>>arriving via that route must be /vanishingly/ small
That's intentional
:-)))
No.9 | | 514 bytes |
| 
* Philip TAYLOR wrote:
>No, I think you missed the point : it quite clearly
>says 'character "1" not allowed in prolog.'. There
>/is/ no 'character "1"' in the source, so there must
>be a genuine bug in the validator code which is
>causing this 'character "1"' to be generated and
>inserted into the parsing stream
I see, then I am unable to reproduce this, there is no such message
in <%3Atext%2Fhtml%2C>.
No.10 | | 693 bytes |
| 
Fascinating : I see the difference, and
can't explain it as of now.
Bjoern Hoehrmann wrote:
* Philip TAYLOR wrote:
>>No, I think you missed the point : it quite clearly
>>says 'character "1" not allowed in prolog.'. There
>>/is/ no 'character "1"' in the source, so there must
>>be a genuine bug in the validator code which is
>>causing this 'character "1"' to be generated and
>>inserted into the parsing stream
I see, then I am unable to reproduce this, there is no such message
in <%3Atext%2Fhtml%2C>.
No.11 | | 534 bytes |
| 
No, I really can't explain it. Bjoern,
what results do you get if you go to
http://validator.w3.org:8001/
and click on the "Submit" button below
the "Validate by Direct Input" box with
nothing in the latter ?
Philip TAYLOR wrote:
>
Fascinating : I see the difference, and
can't explain it as of now
>I see, then I am unable to reproduce this, there is no such message
>in <%3Atext%2Fhtml%2C>.
No.12 | | 595 bytes |
| 
* Philip TAYLOR wrote:
>No, I really can't explain it. Bjoern,
>what results do you get if you go to
>
>http://validator.w3.org:8001/
>
>and click on the "Submit" button below
>the "Validate by Direct Input" box with
>nothing in the latter ?
Aah, indeed, with I am able
to reproduce this. I suspect this comes from CGI.pm which uses 1 to tell
that the parameter is set. So we need to replace
$File->{Bytes} = $q->param('fragment');
by something else. Thanks for your report!
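Whatever the exact cause turns out to be, the fix amounts to telling "field submitted but empty" apart from any placeholder value, so that nothing but the user's own bytes reaches the parser. The following is a minimal Python sketch of that distinction using the standard library. It does not reproduce CGI.pm's behaviour, and `read_fragment` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import parse_qs


def read_fragment(query_string: str) -> Optional[str]:
    """Return the submitted 'fragment' text, '' if empty, None if absent."""
    # keep_blank_values=True keeps "fragment=" instead of silently dropping it.
    fields = parse_qs(query_string, keep_blank_values=True)
    values = fields.get("fragment")
    if values is None:
        return None  # the form field was not submitted at all
    return values[0]  # may be the empty string: submitted with no content


submitted = read_fragment("fragment=")
if submitted == "":
    print("No content was received.")
```

Handling the empty case explicitly also makes it possible to report that no content was received, instead of handing an empty document to the parser.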
No.13 | | 786 bytes |
| 
Hi Philip,
15 Jul 2005, at 00:13, Philip TAYLOR wrote:
Now what the Validator /claims/ to be about to do,
and what it actually does, are not necessarily
the same, so could I ask -- if you amend the wording
to your proposed form -- will the Validator in fact
"continue in SGML mode and
with a fallback DOCTYPE similar to HTML
4.01 Transitional"
or just
"continue in SGML mode"
I'm sure you appreciate the significance of the question!
Indeed. My proposed wording is in sync with what the validator
actually does. In other words, there is no doctype change or fallback
in this case (or the parser would not have, for instance, complained
about the undeclared entities).
Hope this clarification suits you.
No.14 | | 1286 bytes |
| 
Jul 15, 2005, at 1:52, Bjoern Hoehrmann wrote:
* Philip TAYLOR wrote:
>The little envelope icon (which only manifested
>itself when I composed this message) is so small
>on the original screen that any chance of feedback
>arriving via that route must be /vanishingly/ small
>
That's intentional
I agree it *was* intentional. When the "error message feedback"
mailto: link was first added as a way to get users to participate in
the improvement of error message explanations, the signal/noise ratio
on the list plummeted with a flood of empty or awfully worded calls for
help, and making the icon (actually not an icon, but a character)
smaller was a "quick" fix to limit the damage while we would work on
better feedback channels than a mere "mailto:" link.
We have made this work, and the feedback improvements are actually one
of the major changes in 0.7.0. The "envelope" now links to e.g:
%3Atext%2Fhtml%2C;
errmsg_id=47#errormsg
with pre-filled search queries, instructions etc. It won't miraculously
kill all noisy feedback, but as a filtering mechanism, it's hopefully
much better. I suggest making the "envelope" normal sized again.
Any thoughts?
No.15 | | 162 bytes |
| 
olivier Thereaux wrote:
[snip]
>Hope this clarification suits you.
Yes indeed. Thank you,
** Phil.