link to published version: Communications of the ACM, February, 1996





Hal Berghel's Digital Village....

HTML Compliance and the Return of the Test Pattern





ABSTRACT

Test patterns are with us again, and this time they're on the Web.

In their last incarnation, test patterns were associated with vacuum tube televisions. Short-lived and unstable components, noisy broadcast signals and clumsy, manual adjustments made test patterns a necessary ingredient for successful television repair. There was no other way to align the images correctly or provide the proper contrast and brightness.

This time, test patterns are being used to determine the level of HTML compliance of Web client navigator/browsers.

HYPERTEXT MARKUP LANGUAGE

The Web is a pair of "killer protocols." The first, HyperText Transfer Protocol (HTTP), provides a standard for communication between client and server computers. In its most basic form, it defines the procedures necessary for the client to establish a connection with the server, make its request, receive a response and close the connection. HTTP is conceptually simple, if cumbersome to implement correctly for a broad range of computing environments.
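
The four steps just described (connect, request, response, close) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stand-in server, its one-line document, and the names HelloHandler, fetch and demo are our own inventions, and a loopback address is used so that nothing depends on a real Web site.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with a tiny document."""

    def do_GET(self):
        body = b"<title>Hello</title>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch(host, port, path):
    """Perform the four steps by hand and return the raw response bytes."""
    # 1. Establish a connection with the server.
    sock = socket.create_connection((host, port), timeout=5)
    try:
        # 2. Make the request.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # 3. Receive the response until the server stops sending.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        # 4. Close the connection.
        sock.close()
    return b"".join(chunks)


def demo():
    """Start the stand-in server on a free loopback port and fetch "/"."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_address[1], "/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

The raw bytes returned by fetch contain a status line, a few headers, a blank line and then the document itself, which is all that a client needs in order to begin rendering.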

The second protocol is HyperText Markup Language, or HTML. HTML defines the internal structure of the Web's documents by means of a primitive tagging convention which associates a "meaning" or "function" with each document element.

To illustrate, Figure 1 is a screenshot of the ACM's home page on its Web server at http://WWW.ACM.ORG/. The boxed insets in the main graphic are actually "sensitized" parts of the overall image, each one of which is connected to another ACM Web resource by means of a uniform resource locator (URL). We have used the Web navigator/browser to insert the HTML source document beneath this sensitive image.

Figure 1. Typical Homepage as Rendered by a Web Browser with Corresponding HTML Script



The associations between the HTML tags and the document imagery are straightforward, if a bit verbose. Tags are marked with angle brackets which denote the beginning and end of document elements. For example, markers for pre-formatted text (pre) and document heading (head) appear in Figure 1. These are semantic or logical tags which define the function or contribution of the contained expression to the organization of the document.
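
To make the tagging convention concrete, the following sketch uses Python's standard html.parser module to list the opening and closing tags in a small document. The SAMPLE text is our own invention, loosely modeled on the kind of head and pre elements mentioned above, and is not the ACM page's actual source; the names TagLister and list_tags are likewise hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical document, not the actual source shown in Figure 1.
SAMPLE = """<head>
<title>Sample Home Page</title>
<base href="http://www.example.org/">
</head>
<body>
<pre>pre-formatted text</pre>
</body>"""


class TagLister(HTMLParser):
    """Record every opening and closing tag in the order encountered."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))


def list_tags(source):
    """Return a list of ("open" | "close", tag name) pairs for the source."""
    parser = TagLister()
    parser.feed(source)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for kind, tag in list_tags(SAMPLE):
        print(kind, tag)
```

Most elements in the sample appear as an open/close pair that brackets their content, while the base element appears only as an opening tag because it carries its information in an attribute rather than in enclosed text.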

In this case, the document "head" contains only a document title and the base address, the latter of which provides the reference point against which all subsequent relative addresses within the document will be resolved. The directory addresses in quotation marks which appear in the document body a few lines down will be interpreted as offsets from this base address on the ACM server.
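
The resolution of offsets against a base address can be sketched with urljoin from Python's standard urllib.parse module. The base address and the relative addresses below are hypothetical placeholders on example.org, not the values used on the ACM server.

```python
from urllib.parse import urljoin

# Hypothetical base address, standing in for the one in a document's head.
BASE = "http://www.example.org/pubs/"


def resolve(relative, base=BASE):
    """Interpret a relative address as an offset from the base address."""
    return urljoin(base, relative)


if __name__ == "__main__":
    for relative in ("cacm/", "../sigs/", "/about/"):
        print(relative, "->", resolve(relative))
```

Under these assumptions, "cacm/" resolves to http://www.example.org/pubs/cacm/, "../sigs/" climbs one directory to http://www.example.org/sigs/, and "/about/" is taken from the server root as http://www.example.org/about/.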

The main image on the ACM home page is a file named "acmtfs_sbar.gif". There is also a corresponding "imagemap" file which divides the GIF file into sub-regions specified by the regions' [X,Y] coordinates. The server consults this file when it receives the [X,Y] coordinates of the Web client's cursor at the moment of the mouse click. Each destination link is associated with one of these mutually-exclusive geometrical regions, which may be thought of as "super-imposed" over the main image. The details of this arrangement are actually in the document, though they do not appear in the screen shot.
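
The lookup that the server performs can be sketched as follows. This is a simplified illustration, not the format of any real imagemap file: we assume rectangular regions only, and the coordinates, destination paths and names (Region, REGIONS, resolve_click) are all hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A rectangular sub-region of the image and the link it leads to."""
    left: int
    top: int
    right: int
    bottom: int
    url: str


# Hypothetical, mutually exclusive regions laid over a 200 x 100 image.
REGIONS = [
    Region(0, 0, 99, 49, "/pubs/"),
    Region(100, 0, 199, 49, "/membership/"),
    Region(0, 50, 199, 99, "/conferences/"),
]

# Hypothetical destination used when the click falls outside every region.
DEFAULT_URL = "/"


def resolve_click(x, y, regions=REGIONS, default=DEFAULT_URL):
    """Return the destination for a click at image coordinates (x, y)."""
    for region in regions:
        inside_x = region.left <= x <= region.right
        inside_y = region.top <= y <= region.bottom
        if inside_x and inside_y:
            return region.url
    return default


if __name__ == "__main__":
    print(resolve_click(150, 20))
```

Because the regions do not overlap, the order in which they are tested does not affect the result; a click outside all of them falls through to the default destination.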

WYSINWEOS

With a standard document formatting language in place, document preparation and dissemination should be straightforward. Not quite! The problem is not with HTML per se, but with the way in which its standards (or lack thereof) are evolving.

The problem has to do with the different agendas held by the information providers and consumers, the Web client developers, and the two Web standards communities - the World Wide Web Consortium (W3C) (http://www.w3.org) and the Internet Engineering Task Force (IETF) (http://www.ietf.cnri.reston.va.us/home.html). At the moment synchronicity seems elusive.

From the standards side, there is the understandable desire to sustain a smooth, uniform evolution of purposeful and robust HTML conventions for the sake of consistency and uniformity. This translates into a steady stream of well-reasoned proposals and revisions, including the most mature proposed standard, revision 5 for HTML Version 2 (as of August 18, 1995). Before this proposal can become a standard, it must be further circulated as an IETF Request for Comments (RFC) document, so it is unlikely that it will be adopted any time soon.

Therein lies the rub. There is a growing impatience on the part of information providers and consumers to access and deliver newer and more far-reaching cybermedia. To these Web constituencies, three months is an eternity. To make matters worse, client-side developers are trying to position themselves in the marketplace as state-of-the-art service providers, while looking for opportunities to include innovative technology for competitive advantage. The consequence is a bit of Web anarchy amidst what has been called the "Mosaic War."

The problem may be explained with a brief HTML chronology. One way of looking at the evolution of HTML proposed standards since 1990 is in terms of stages - actually four levels and an extension. Level 0 provided specifications for basic HTML structure with support for hypertext links, meager format control and limited text enhancements. Level 1 defined extensions for basic image handling, limited text enhancement and relative resource addressing. Level 2 included specifications for forms along with incremental gains in the other areas defined for levels 0 and 1. Level 3 provided extensions for tables, a LaTeX-like, ASCII-notation standard for mathematical formulas, and features for additional multimedia support. In addition, there are the "Netscape extensions" which deal with a variety of features including image alignment and re-sizing, box graphics and greater control over type size and font. These levels may be somewhat imprecisely matched with HTML Versions in the following way: levels 0 and 1 with Version 1; levels 0-2 with Version 2; and levels 0-3 with Version 3. The Netscape extensions are independent of the W3C and IETF standards initiatives.
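
The correspondence between levels and Versions can be written down as a small lookup table. The sketch below merely restates the chronology above in Python; the feature names are informal labels taken from that paragraph rather than identifiers from any specification, and the matching is, as noted, somewhat imprecise.

```python
# Features introduced at each level, as summarized in the text above.
LEVEL_FEATURES = {
    0: ["basic structure", "hypertext links", "format control",
        "text enhancements"],
    1: ["image handling", "relative resource addressing"],
    2: ["forms"],
    3: ["tables", "mathematical formulas", "multimedia support"],
}

# Levels covered (approximately) by each HTML Version.
VERSION_LEVELS = {1: (0, 1), 2: (0, 1, 2), 3: (0, 1, 2, 3)}

# Independent of the W3C and IETF initiatives, so not tied to any Version.
NETSCAPE_EXTENSIONS = ["image alignment", "image re-sizing", "box graphics",
                       "type size control", "font control"]


def features_for_version(version):
    """Collect the features of every level covered by the given Version."""
    features = []
    for level in VERSION_LEVELS[version]:
        features.extend(LEVEL_FEATURES[level])
    return features


def lowest_version_with(feature):
    """Return the earliest Version covering a feature, or None if none does."""
    for version in sorted(VERSION_LEVELS):
        if feature in features_for_version(version):
            return version
    return None


if __name__ == "__main__":
    for feature in ("hypertext links", "forms", "tables", "box graphics"):
        print(feature, "->", lowest_version_with(feature))
```

A document author could use a table of this kind to judge which Version a given feature presupposes; the Netscape extensions return None because they fall outside the Version scheme altogether.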

Despite the fact that the W3C HTML documents are clearly marked as drafts and carry the warning that "it is inappropriate to use Internet drafts as reference material or to cite them as other than 'work in progress'," they are routinely used to varying degrees as working specifications by both developers and document creators. This results in a confusing mixture of accessible Web documents and a dramatic imbalance in performance from Web clients. We capture this present state in the acronym "wysinweos", pronounced wizinwus, for "what you see isn't necessarily what exists on the server."

Figures 2a, 2b and 2c illustrate the problem. Though rendered differently, these are screen shots of the same Web HTML document.

Figure 2a. Web Document Rendered as Created



Figure 2b. Web Document Rendered without Background



Figure 2c. Web Document Rendered without Background and Image Centering



THE TEST PATTERN RETURNS

To attempt to introduce order amidst HTML disorder, a few Web experimentalists have created Web test sites to help end-users and developers determine the degree to which their client browsers support various HTML features in actual use - irrespective of their status as a standard. As one might suspect, the features of greatest interest deal with the categories of media which are the most variegated in cyberspace. This includes, but is not limited to, audio, graphics and animations, each in a variety of formats.

One rigorous test site is the "WWW Viewer Test Page" developed at Lawrence Livermore Laboratory (http://www-dsed.llnl.gov/documents/WWWtest.html/). This test site lists a variety of media formats ranging from "plain text" through "SGML documents", with special emphasis on recent formats used within the Unix community. This service is particularly useful in testing the robustness of the client's launchpad for spawnable perusers or "helper apps" as they are commonly called.

On an entirely different level, Lycos provides an imaginative test site for purely Netscape typographical extensions at http://agent2.lycos.com:8001/tools/nutscape/. What makes this site unusual is that it interactively interprets existing Web documents according to prescribed Netscape enhancements. To the extent that the browser will support it, this utility will render the source document as if it had used these enhancements.

Our own contribution is the Web Test Pattern (http://www.uark.edu/~wrg/) shown in Figures 2a through 2c. Our approach has been to provide a reasonable range of tests within a GUI organized by feature category and to emphasize the ability to render media internally (vs. through external perusers). As the figures show, a great deal of variation may still be found in even the most recent navigator/browser clients as they render even simple HTML documents. Casual use of the Test Pattern will typically reveal multiple deficiencies of clients. As a related data point, we note that many of the current clients either fail to support or improperly render such level 0 features as unordered list markers and menu, DIR and teletype elements!

It should be noted that while we won't deal with them here, there are also test benches for the server side of the Web. Some of these may be reached through the W3C server via http://www.w3.org/hypertext/WWW/Test.

CONCLUSION

Many authors have referred to the Internet as anarchy that works. The same may be said of the Web. Anarchy is difficult to control, and such is the case with the evolution of de facto and de jure HTML standards. Regrettably, the limitations of the client browser are largely hidden from the user. In the absence of side-by-side comparisons, the user is unaware that the viewed document doesn't appear as it was created. Web test sites such as those mentioned above provide an effective means for Web users to determine the strengths and weaknesses of their clients.

There are several reasons to suggest that the problem with compliance will be with us for a while. For one thing, within just the last year the Web has been transformed from a primarily Unix world to a mostly Microsoft Windows world. Similarly, in just a few years the dominant navigator/browser client has changed from NCSA Mosaic to Netscape. As the use of the Web moved away from the communities who originally developed it, the role of standards and the pacing of the Web's evolution changed dramatically and irrevocably.

Second, there are ever-increasing numbers of new media forms being developed. As the types and variety of cybermedia increase, the demands placed on both the server and client sides will also increase. Client side technology will be continuously stretched to accommodate these new media.

Third, there are now alternatives to the basic Web protocols. Hyper-G and Java are examples. Both of these have their own navigator/browser clients in development: Harmony and Amadeus for Hyper-G, and HotJava for Java. New protocols will continue to extend, re-shape or even supplant those of the Web in new and unforeseen ways. And each new protocol will make its own contribution to the confusion over standards.

Finally, there will be a continuous demand for richer and more variegated resources. The Web now operates at a very limited level of interactivity, largely through interactive forms. Future demands will stretch interactivity to its limits. Virtual reality applications will require the addition of a participatory dimension to the successors of today's navigator/browsers. Current sensory I/O will also be widened to include force - and more.

So, it appears to us that the latest incarnation of the Test Pattern is likely to be with us for quite a while.


For Further Reading:

Berghel, H. "Using the WWW Test Pattern to check HTML Client Compliance." IEEE Computer, 28:9, pp. 63-65. A description of our Web HTML test site which we called the Web Test Pattern. Also available as http://www.uark.edu/~wrg/ under "perspectives".

Berners-Lee, T. and D. Connolly. "Hypertext Markup Language - 2.0". (August, 1995) [ftp://ds.internic.net/internet-drafts/draft-ietf-html-spec-05.txt]. This is the "official", latest revision of the proposal for HTML Version 2.0 standards. Official in this case means that it was produced by the HTML Working Group of the Applications Area of the IETF.

December, J. and N. Randall. The World Wide Web Unleashed. Sams Publishing. Indianapolis. (1994). An excellent overview of Web resources.

Gilster, P. The Internet Navigator. John Wiley and Sons. New York. (1993). A good history of the Internet and the Web with a solid, if somewhat dated, overview of Web resources.

Morris, M. HTML for Fun and Profit. SunSoft Press/Prentice Hall. Englewood Cliffs. (1995). A beginner's guide to HTML editing complete with basic tools on CDROM.

Pitkow, J., et al. "The GVU Center's WWW User Surveys", URL= http://www.cc.gatech.edu/gvu/user_surveys/User_Survey_Home.html. These reports present the results of the three GVU Web surveys taken from late 1993 to the present. The comparisons between these three reports are very revealing of Web evolution.