link to the published version in IEEE Computer, September, 1995
HTML CLIENT COMPLIANCE AND THE WORLD WIDE
WEB TEST PATTERN
Hal Berghel
Computer Science, University of Arkansas
hlb -at- berghel dot net
http://berghel.net
Introduction
There is little doubt that the hottest part of the Internet right
now is the World Wide Web. Since its inception in 1990 the Web
has proven itself as the unifying environment for the digital
resources of the Internet. By all measures, it is enormously
successful. Consider that in just a few years the Web has come
to be the leading Internet resource, providing 21.4% of the total
packet count and 26.3% of the total byte count on the NSF
Backbone. This compares with 14.0% and 21.5% for ftp, 8.1% and
8.6% for nntp, 7.5% and 2.5% for telnet, and 1.5% and 1.8% for
Gopher [1]. This is even more remarkable given that the Web
really didn't take off until 1992 when the first
navigator/browsers became available. There is no question that
the World Wide Web has evolved to the point where it is an
indispensable resource to the networking community.
THE WEB'S PROTOCOLS
As with other Internet services, the business part of the Web is
a set of client/server protocols. The first protocol, HyperText
Transfer Protocol (HTTP), provides a uniform handshaking and
format protocol for client/server communication. The client
establishes a connection with the server, makes a request,
receives a response, closes the connection and takes action. In
the simplest of cases a set of files of varied media is
requested from the server to be displayed by the client-side
navigator/browser.
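The connect, request, respond, close cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular Web client is built. The function names (build_request, parse_response, fetch) are hypothetical, and the Host header is an assumption added because many present-day servers expect it.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Format a minimal HTTP/1.0 GET request for `path` on `host`."""
    # The Host header is an assumption; the blank line ends the request.
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host: str, path: str = "/", port: int = 80):
    """Connect, send one request, read until the server closes, then parse."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an empty read means the server closed the connection
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))
```

Calling fetch with a host name and a path performs one complete exchange; what the client then does with the returned body (display it, hand it to a player) is the "takes action" step.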
The second protocol, HyperText Markup Language (HTML) [2],
defines the internal structure of the Web's "documents". It
accomplishes this through a primitive tagging convention which
identifies contained or referenced resources. For example, a
sensitive (clickable) document anchor which points to a uniform
resource locator (URL) would be couched within the tag pair "<A
HREF=....>" and "</A>", an image would be identified by the tag
<IMG SRC="....">, and so forth. While unsophisticated, it works
- at least for the most part.
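To make the tagging convention concrete, the sketch below uses Python's standard html.parser module to pull anchor targets and image sources out of a fragment. The class and function names are hypothetical, and the file name logo.gif is invented for the example.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect anchor targets and in-line image sources from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling us.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def collect_resources(html_text):
    """Return (links, images) found in `html_text`."""
    parser = ResourceCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links, parser.images


# "logo.gif" is a made-up file name used only for illustration.
sample = '<A HREF="http://berghel.net">Home</A> <IMG SRC="logo.gif">'
links, images = collect_resources(sample)
print(links, images)
```

A client does essentially this while rendering: the anchor becomes a clickable region, and the image source becomes a further request to the server.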
The difficulty lies in the inconsistency with which Web client
developers comply with the emerging standards. This
inconsistency translates into headaches for the end user. The
HTML protocol has evolved in stages, or levels, over the past
three years, and it is in this evolution that the discomfort is
to be found. The compliance levels are specified by the World
Wide Web Consortium [3], but developers do not follow the
prescription with a consistent degree of fervor.
HTML level 0 provided specifications for basic HTML structure.
Included in level 0 were support for hypertext links, meager
format control and limited text enhancements. Level 1 defined
extensions for basic image handling, limited text enhancement and
relative resource addressing. Level 2 included specifications
for forms along with incremental gains in the other areas defined
for levels 0 and 1. Level 3 will provide extensions for tables,
a LaTeX-like, ASCII-notation standard for mathematical formulas,
and features for additional multimedia support. That comes to
four compliance levels in just under three years.
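One way to picture the cumulative nature of the levels is as a lookup table: a client is at level N only if it supports everything through level N. The following is a rough sketch under that assumption. The feature labels are informal shorthand for the capabilities listed above, not identifiers from any specification, and compliance_level is a hypothetical helper.

```python
# Informal labels for the capabilities the text attributes to each level.
LEVEL_FEATURES = {
    0: {"hypertext links", "format control", "text enhancements"},
    1: {"images", "relative addressing"},
    2: {"forms"},
    3: {"tables", "math formulas", "multimedia"},
}


def compliance_level(supported):
    """Return the highest level N with all features of levels 0..N supported.

    Returns None when even level 0 is incomplete.
    """
    supported = set(supported)
    highest = None
    for level in sorted(LEVEL_FEATURES):
        if LEVEL_FEATURES[level] <= supported:
            highest = level
        else:
            break  # a gap at this level caps compliance, whatever comes later
    return highest


# A hypothetical client that renders tables but has no forms support.
client = LEVEL_FEATURES[0] | LEVEL_FEATURES[1] | {"tables"}
print(compliance_level(client))
```

The hypothetical client in the example is capped at level 1 despite supporting a level 3 feature, which is the kind of partial, out-of-order support that makes a single compliance number hard to assign.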
To make things worse, Web client conformance is usually discussed
in the context of HTML versions. The HTML version 1 convention
includes levels 0 and 1 standards. HTML versions 2 and 3 include
levels 0-2 and 0-3, respectively. However, the HTML version
numbers are really only discussed in the abstract, for the
typical Web client makes no claims of compatibility - its
developers typically add as many features as they feel they can
manage
before a new release, and let it go at that. Even if the user
understood what was involved in these compliance issues, there
would be no way to relate it to a particular product. But it
doesn't end there.
There are also non-standard extensions which are emerging in
parallel with the orthodox versions. This, together with the
sometimes conflicting interests of the commercial vs. not-for-
profit developers, is the battlefield of a technology skirmish
(cf. [4]). In general, the non-standard extensions apply to the
body of HTML documents and are associated with a particular Web
client, Netscape. Extensions dealing with image alignment and
re-sizing, box graphics and greater control over typesize and
font are commonly used "Netscape" extensions.
We will ignore for the moment the problems of the feature-
imbalance for the same product across multiple platforms, and
implementation bugs, as they relate to the lack of client
navigator/browser uniformity.
HTML COMPLIANCE: Evolution or Revolution
So, quite apart from an orderly evolution, the current state of HTML
compliance also suggests a degree of revolution. This is the
cause of most of the discomfort on the user's side of the Web at
the moment. From the user's point of view, this lack of
uniformity surfaces in improperly rendered media, incorrect
display formatting, forms which aren't seamlessly linked to their
PERL scripts, and so forth.
Figure 1. Anti-Netscape crusade. This particular navigator/browser, Web
Explorer, does not support many of the Netscape extensions. If it did, this
page would be virtually unreadable - which is the author's intention. The
effect is most pronounced when viewing this page with side-by-side
navigator/browsers.
URL=http://www.brandi.org/ralph/netscape.html

To illustrate the scope of
the problem, of the eight primary navigator/browser clients which we use in our
lab, only two fully comply with all HTML level 0 specifications. While the occasional
deficiencies (e.g., the rendering of menu, directory and unordered list element
tags) are not earthshaking, they can be irritating. This problem gets worse as
we escalate HTML levels, until we reach a free-for-all at level 3. Enter into
the mix the fairly widespread acceptance of a few of the Netscape extensions,
and one produces some real confusion over standards and some hard-to-read Web
documents.
This conflict over standards has even become politicized over the
net. At this writing there are actually "digital campaigns" for
and against Netscape extensions (see Figures 1 and 2). While
little of any enduring value will likely follow from this
activity, the fact that it takes place suggests that there are
some important issues which underlie it.
Figure 2. An imaginative attempt to highlight the
potential of Netscape extensions. Be forewarned that non-
Netscape clients may behave strangely.
URL=http://thule.mt.cs.cmu.edu:8001/tools/nutscape/
THE WORLD WIDE WEB TEST PATTERN
The HTML compliance issues will not be resolved anytime soon -
anarchy is always hard to orchestrate. Web clients will come and
go. Within a few years, the descendants of those which survive
will eventually be bundled with operating systems or Internet
connectivity packages, or be seamlessly integrated into the
desktop suites. Perhaps by then we will have de facto if not de
jure standards in place. But between now and then we have
information to process and many of our Web resources are
presented to us in disarray.
Figure 3.
Enter the World Wide Web Test Pattern. This Web site was
conceived as a general-purpose test bench for users and
developers to check for HTML compliance. While still under
construction, it already includes a standard suite of tests for
text, audio, graphics, meta links, animations, forms and tables.
The URL is http://www.uark.edu/~wrg/.
Figures 3 and 4 illustrate how the Web Test Pattern may be used.
Observe that there is a tiled background to the homepage which is
rendered correctly by Netscape version 1.2.b2 (Figure 3) but not
rendered at all by NCSA Mosaic version 2.0.0b4 (Figure 4). Tiled
background is an element of the proposed HTML 3 specifications.
Figure 4.
The subtle change in Figure 5 indicates that there can be
gradations of compliance. In this case, not only is the
background missing, but the superimposed image is not properly
centered. winWeb 1.1 B1.2 is clearly not up to the challenge.
Some of the tests, such as those above, are passive. The user merely
loads the test document and views the result. In other cases,
the tests require direct user involvement. Audio files provide a
case in point, for audio files are never in-line, even though
their players may be integrated into the client. Most modern
clients include user-configurable launch pads, so over time the
importance of the distinction between integrated and spawnable
perusers will vanish.
Figure 5.
Currently, cybermedia tests exist for the Netscape extensions
server push and client pull, as well as MPEG, AVI and QuickTime
animations. It is hoped that the entire HTML level 3 suite will
be operational by the time that this article appears in print.
As it develops, the Web Test Pattern will attempt to include as
rich a variety of media as is to be found on the Web, thereby
enabling both users and developers to test for compliance with
HTML levels.
CONCLUSION
The Web Test Pattern is available for use by both Web users and
developers for monitoring the degree of HTML compliance of Web
clients. A current investigation is being conducted into the
viability of reducing the multiplicity of tests and providing a
standardized report.
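What a standardized report might look like is not settled. As a purely hypothetical sketch, the observations from Figures 3 through 5 could be recorded per client and summarized as below. The test names, the data layout and the summarize function are all assumptions made for illustration, and only results stated in this article are recorded.

```python
# Observations taken from the discussion of Figures 3-5; tests the article
# does not report on for a given client are simply left out.
OBSERVED = {
    "Netscape 1.2.b2": {"tiled background": True},
    "NCSA Mosaic 2.0.0b4": {"tiled background": False},
    "winWeb 1.1 B1.2": {"tiled background": False, "image centering": False},
}


def summarize(results):
    """Return one report line per client: tests passed and tests failed."""
    lines = []
    for client, tests in results.items():
        failed = sorted(name for name, passed in tests.items() if not passed)
        passed_count = len(tests) - len(failed)
        detail = ", ".join(failed) if failed else "none"
        lines.append(
            f"{client}: {passed_count}/{len(tests)} passed; failed: {detail}"
        )
    return lines


for line in summarize(OBSERVED):
    print(line)
```

A report of this shape would let the gradations of compliance noted earlier show up as a count per client rather than as a side-by-side visual comparison.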
ACKNOWLEDGMENTS: The World Wide Web Test Pattern was produced by
the University of Arkansas Web Resources Group which includes the
author, Jon Ashley, Troy Cash, Peter Laws and John Wiggins. We
thank Ron Vetter for encouraging us to write this article for
IEEE Computer.
REFERENCES:
[1] NSFNET Backbone Traffic Distribution Statistics, April,
1995. http://www.cc.gatech.edu/gvu/stats/NSF/merit.html.
[2] World Wide Web Consortium: HyperText Markup Language,
http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html.
[3] World Wide Web Consortium. http://w3.org/.
[4] Berghel, H., "OS/2, UNIX, Windows, and the Mosaic Wars," OS/2
Magazine, May, 1995, pp. 26-35.