

link to the published version in IEEE Computer, September, 1995



HTML CLIENT COMPLIANCE AND THE WORLD WIDE WEB TEST PATTERN

Hal Berghel
Computer Science, University of Arkansas
hlb -at- berghel dot net
http://berghel.net



[figure 1] [figure 2] [figure 3] [figure 4] [figure 5]



INTRODUCTION

There is little doubt that the hottest part of the Internet right now is the World Wide Web. Since its inception in 1990, the Web has proven itself as the unifying environment for the digital resources of the Internet. By all measures, it is enormously successful. Consider that in just a few years the Web has come to be the leading Internet resource, providing 21.4% of the total packet count and 26.3% of the total byte count on the NSF Backbone. This compares with 14.0% and 21.5% for ftp, 8.1% and 8.6% for nntp, 7.5% and 2.5% for telnet, and 1.5% and 1.8% for Gopher [1]. This is even more remarkable given that the Web really didn't take off until 1992, when the first navigator/browsers became available. There is no question that the World Wide Web has evolved to the point where it has become an indispensable resource to the networking community.


THE WEB'S PROTOCOLS

As with other Internet services, the business part of the Web is a set of client/server protocols. The first protocol, HyperText Transfer Protocol (HTTP), provides a uniform handshaking and format protocol for client/server communication. The client establishes a connection with the server, makes a request, receives a response, closes the connection and takes action. In the simplest of cases, a set of files of varied media is requested from the server to be displayed by the client-side navigator/browser.
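The connect, request, response, close, act cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it speaks plain HTTP/1.0 over a TCP socket, the function names (build_request, parse_response, fetch) are hypothetical, and the Host header is included for convenience even though HTTP/1.0 does not require it.

```python
import socket


def build_request(path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.0 GET request (illustrative helper)."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status_code, headers, body).

    Assumes the response starts with a status line such as
    'HTTP/1.0 200 OK'; headerless HTTP/0.9 replies are not handled.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0):
    """Run one complete client/server exchange."""
    # 1. Establish a connection with the server.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # 2. Make the request.
        conn.sendall(build_request(path, host))
        # 3. Receive the response until the server stops sending.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # 4. The connection is closed when the 'with' block exits.
    # 5. Take action: here, simply hand back the parsed response.
    return parse_response(b"".join(chunks))
```

A call such as fetch("www.uark.edu", "/~wrg/") would perform one full exchange; a page with in-line images would require one such exchange per referenced file.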

The second protocol, HyperText Markup Language (HTML) [2], defines the internal structure of the Web's "documents". It accomplishes this through a primitive tagging convention which identifies contained or referenced resources. For example, a sensitive (clickable) document anchor which points to a uniform resource locator (URL) would be couched within the tag pair "<A HREF=....>" and "</A>", an image would be identified by the tag <IMG SRC="....">, and so forth. While unsophisticated, it works - at least for the most part.
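To make the tagging convention concrete, the following sketch uses Python's standard html.parser module to pull anchor targets and image sources out of a fragment of HTML. The class and function names are hypothetical, and the file name logo.gif in the usage note is invented purely for illustration.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collect the URLs referenced by <A HREF=...> and <IMG SRC=...> tags."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # and <a href=...> are treated alike.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.anchors.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])


def collect_resources(html_text):
    """Return (anchor_urls, image_urls) found in html_text."""
    collector = ResourceCollector()
    collector.feed(html_text)
    collector.close()
    return collector.anchors, collector.images
```

For example, collect_resources('<A HREF="http://www.uark.edu/~wrg/">Test Pattern</A> <IMG SRC="logo.gif">') yields one anchor URL and one image source. A navigator/browser does essentially this bookkeeping before deciding which further resources to request.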

The difficulty lies in the inconsistency with which Web client developers comply with the emerging standards. This inconsistency translates into headaches for the end user. The HTML protocol has evolved in stages, or levels, over the past three years, and it is in this evolution that the discomfort is to be found. The compliance levels are specified by the World Wide Web Consortium [3], but developers do not follow the prescription with a consistent degree of fervor.

HTML level 0 provided specifications for basic HTML structure. Included in level 0 was support for hypertext links, meager format control and limited text enhancements. Level 1 defined extensions for basic image handling, limited text enhancement and relative resource addressing. Level 2 included specifications for forms along with incremental gains in the other areas defined for levels 0 and 1. Level 3 will provide extensions for tables, a LaTeX-like, ASCII-notation standard for mathematical formulas, and features for additional multimedia support. That comes to four compliance levels in just under three years.

To make things worse, Web client conformance is usually discussed in the context of HTML versions. The HTML version 1 convention includes the level 0 and 1 standards. HTML versions 2 and 3 include levels 0-2 and 0-3, respectively. However, the HTML version numbers are really only discussed in the abstract, for the typical Web client makes no claims of compatibility - developers typically add as many features as they feel they can manage before a new release, and let it go at that. Even if users understood what was involved in these compliance issues, they would have no way to relate them to a particular product. But it doesn't end there.
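The relationship between levels and versions is easier to see when written down as data. The sketch below simply restates the summary given in the text; the names LEVEL_FEATURES, VERSION_LEVELS and features_for_version are hypothetical, and the feature labels are paraphrases rather than terms taken from the specifications.

```python
# Features introduced at each HTML level, as summarized in the text.
LEVEL_FEATURES = {
    0: ["basic HTML structure", "hypertext links",
        "meager format control", "limited text enhancements"],
    1: ["basic image handling", "relative resource addressing"],
    2: ["forms"],
    3: ["tables", "mathematical formulas", "additional multimedia support"],
}

# Each HTML version bundles every level up to and including its own.
VERSION_LEVELS = {
    1: range(0, 2),  # levels 0-1
    2: range(0, 3),  # levels 0-2
    3: range(0, 4),  # levels 0-3
}


def features_for_version(version):
    """List every feature a client claiming this HTML version should support."""
    features = []
    for level in VERSION_LEVELS[version]:
        features.extend(LEVEL_FEATURES[level])
    return features
```

Under this reading, features_for_version(2) includes forms but not tables, which is the kind of statement a client's documentation would need to make before a user could relate compliance to a particular product.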

There are also non-standard extensions which are emerging in parallel with the orthodox versions. This, together with the sometimes conflicting interests of the commercial vs. not-for-profit developers, is the battlefield of a technology skirmish (cf. [4]). In general, the non-standard extensions apply to the body of HTML documents and are associated with a particular Web client, Netscape. Extensions dealing with image alignment and re-sizing, box graphics and greater control over typesize and font are commonly used "Netscape" extensions.

We will ignore for the moment the problems of feature imbalance for the same product across multiple platforms, and of implementation bugs, as they relate to the lack of client navigator/browser uniformity.



HTML COMPLIANCE: EVOLUTION OR REVOLUTION

So, quite apart from an orderly evolution, the current state of HTML compliance also suggests a degree of revolution. This is the cause of most of the discomfort on the user's side of the Web at the moment. From the user's point of view, this lack of uniformity surfaces in improperly rendered media, incorrect display formatting, forms which aren't seamlessly linked to their Perl scripts, and so forth.


Figure 1. Anti-Netscape crusade. This particular navigator/browser, Web Explorer, does not support many of the Netscape extensions. If it did, this page would be virtually unreadable - which is the author's intention. The effect is most pronounced when viewing this page with side-by-side navigator/browsers.
URL=http://www.brandi.org/ralph/netscape.html

To illustrate the scope of the problem, of the eight primary navigator/browser clients which we use in our lab, only two fully comply with all HTML level 0 specifications. While the occasional deficiencies (e.g., the rendering of menu, directory and unordered list element tags) are not earthshaking, they can be irritating. This problem gets worse as we escalate HTML levels, until we reach a free-for-all at level 3. Enter into the mix the fairly widespread acceptance of a few of the Netscape extensions, and one produces some real confusion over standards and some hard-to-read Web documents.

This conflict over standards has even become politicized over the net. At this writing there are actually "digital campaigns" for and against Netscape extensions (see Figures 1 and 2). While little of any enduring value will likely follow from this activity, the fact that it takes place suggests that there are some important issues which underlie it.


Figure 2. An imaginative attempt to highlight the potential of Netscape extensions. Be forewarned that non-Netscape clients may behave strangely.
URL=http://thule.mt.cs.cmu.edu:8001/tools/nutscape/




THE WORLD WIDE WEB TEST PATTERN

The HTML compliance issues will not be resolved anytime soon - anarchy is always hard to orchestrate. Web clients will come and go. Within a few years, the descendants of those which survive will eventually be bundled with operating systems or Internet connectivity packages, or be seamlessly integrated into the desktop suites. Perhaps by then we will have de facto if not de jure standards in place. But between now and then we have information to process and many of our Web resources are presented to us in disarray.


Figure 3.

Enter the World Wide Web Test Pattern. This Web site was conceived as a general-purpose test bench for users and developers to check for HTML compliance. While still under construction, it already includes a standard suite of tests for text, audio, graphics, meta links, animations, forms and tables. The URL is http://www.uark.edu/~wrg/.

Figures 3 and 4 illustrate how the Web Test Pattern may be used. Observe that there is a tiled background to the homepage which is rendered correctly by Netscape version 1.2.b2 (Figure 3) but not rendered at all by NCSA Mosaic version 2.0.0b4 (Figure 4). Tiled background is an element of the proposed HTML 3 specifications.


Figure 4.

The subtle change in Figure 5 indicates that there can be gradations of compliance. In this case, not only is the background missing, but the superimposed image is not properly centered. winWeb 1.1 B1.2 is clearly not up to the challenge.
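The gradations of compliance noted above lend themselves to a simple tabulation. The following sketch records only the observations stated in the text for Figures 3 through 5 and counts the failures per client. The data structure, the test labels ("tiled background", "centered image") and the function names are assumptions made for illustration; they do not describe how the Web Test Pattern itself is implemented.

```python
from dataclasses import dataclass
from enum import Enum


class Result(Enum):
    PASS = "pass"
    FAIL = "fail"


@dataclass(frozen=True)
class Observation:
    client: str
    test: str
    result: Result


# Only the outcomes explicitly reported for Figures 3-5 are recorded here.
OBSERVATIONS = [
    Observation("Netscape 1.2.b2", "tiled background", Result.PASS),
    Observation("NCSA Mosaic 2.0.0b4", "tiled background", Result.FAIL),
    Observation("winWeb 1.1 B1.2", "tiled background", Result.FAIL),
    Observation("winWeb 1.1 B1.2", "centered image", Result.FAIL),
]


def failures_by_client(observations):
    """Count failed tests per client; clients with no failures map to 0."""
    counts = {}
    for obs in observations:
        counts.setdefault(obs.client, 0)
        if obs.result is Result.FAIL:
            counts[obs.client] += 1
    return counts


def format_report(observations):
    """Render one line per client, fewest failures first."""
    counts = failures_by_client(observations)
    ranked = sorted(counts.items(), key=lambda item: item[1])
    return "\n".join(f"{client}: {failed} failed" for client, failed in ranked)
```

Calling print(format_report(OBSERVATIONS)) lists the three clients in order of increasing failure count, which is one possible shape for a summary of passive test results.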

Some of the tests, such as those above, are passive. The user merely loads the test document and views the result. In other cases, the tests require direct user involvement. Audio files provide a case in point, for audio files are never in-line, even though their players may be integrated into the client. Most modern clients include user-configurable launch pads, so over time the importance of the distinction between integrated and spawnable perusers will vanish.


Figure 5.

Currently, cybermedia tests exist for the Netscape extensions server push and client pull, as well as MPEG, AVI and QuickTime animations. It is hoped that the entire HTML level 3 suite will be operational by the time that this article appears in print.
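As an illustration of what a client pull test involves, the sketch below generates a page that asks the client to load a second document after a delay, using the META HTTP-EQUIV="Refresh" form that Netscape documents for client pull. The function name, the page title and the file name step2.html are hypothetical, and the Web Test Pattern's own test pages may be built quite differently.

```python
from html import escape


def client_pull_page(next_url, delay_seconds, message):
    """Return an HTML page that requests next_url after delay_seconds.

    A client that supports client pull loads next_url on its own;
    a client that does not simply keeps displaying the message.
    """
    refresh = f"{delay_seconds}; URL={escape(next_url, quote=True)}"
    return (
        "<HTML>\n<HEAD>\n"
        f'<META HTTP-EQUIV="Refresh" CONTENT="{refresh}">\n'
        "<TITLE>Client pull test</TITLE>\n"
        "</HEAD>\n<BODY>\n"
        f"<P>{escape(message)}</P>\n"
        "</BODY>\n</HTML>\n"
    )
```

A page produced by client_pull_page("step2.html", 5, "If this page does not change within a few seconds, client pull is not supported.") doubles as its own failure notice, which keeps the test passive from the user's point of view.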

As it develops, the Web Test Pattern will attempt to include as rich a variety of media as is to be found on the Web, thereby enabling both users and developers to test for compliance with HTML levels.



CONCLUSION

The Web Test Pattern is available for use by both Web users and developers for monitoring the degree of HTML compliance of Web clients. We are currently investigating the viability of reducing the multiplicity of tests and providing a standardized report.






ACKNOWLEDGMENTS: The World Wide Web Test Pattern was produced by the University of Arkansas Web Resources Group which includes the author, Jon Ashley, Troy Cash, Peter Laws and John Wiggins. We thank Ron Vetter for encouraging us to write this article for IEEE Computer.

REFERENCES:

[1] NSFNET Backbone Traffic Distribution Statistics, April, 1995. http://www.cc.gatech.edu/gvu/stats/NSF/merit.html.

[2] World Wide Web Consortium: HyperText Markup Language, http://www.w3.org/hypertext/WWW/MarkUp/MarkUp.html.

[3] World Wide Web Consortium. http://w3.org/.

[4] Berghel, H., "OS/2, UNIX, Windows, and the Mosaic Wars," OS/2 Magazine, May, 1995, pp. 26-35.