Yonina Cooper and Hal Berghel
Department of Computer Science
University of Nevada at Las Vegas
The World Wide Web (or, popularly, "the Web") was conceived by Tim Berners-Lee and his colleagues at CERN (the European Laboratory for Particle Physics) in 1989 as a shared information space that would support collaborative work. At that time, Berners-Lee defined the protocol pair, Hypertext Transfer Protocol (HTTP) and Hypertext Markup Language (HTML), that forms the backbone of the Web. Berners-Lee ushered in the World Wide Web in 1990 with the first Web client navigator-browser, developed as a proof of concept and called WorldWideWeb, which "was the only way to see the web" at that time. (Screen shots of that browser-editor can be seen at http://www.w3.org/People/Berners-Lee/WorldWideWeb.html.) Nicola Pellow wrote the first cross-platform browser, which was released in 1991. By 1992, interest in the Web had grown sufficiently to produce four additional browsers: Erwise, Midas, and Viola for X Windows, and Samba for Macintosh.
Early in 1993, Marc Andreessen of the National Center for Supercomputing Applications (NCSA) released the first version of Mosaic for the X Window System, which soon became the browser standard against which all others would be compared. 1993 also saw the release of a number of other browsers, including Cello for Windows, developed at the Cornell Law School, and Lynx 2.0, developed at the University of Kansas. Lynx quickly became the preferred browser for non-graphics mode (or character mode) terminals, while the other Web clients shared the workstation client-side market. In 1994, Andreessen left NCSA to co-found Netscape, whose browser was released late in 1994 and then proceeded to dominate the browser market. When Microsoft released its Windows 95 operating system in August, 1995, it included the web browser Internet Explorer, which had conquered a third of the market by the fall of 1996. Today, Internet Explorer is the leading web browser, having passed Netscape in 1999.
Despite its original design goal of supporting collaborative work, the Web has diverged into many highly variegated uses all evolving from the two protocols: HTML and HTTP. In this paper we examine the current state of HTML and related technologies. HTTP will be examined in a later paper.
Hypertext Markup Language (HTML)
Pioneering and independent visions led to the hypertext orientation of HTML. In 1945, Vannevar Bush described a device which could create and follow links between documents on microfiche. In the 1960's, Douglas C. Engelbart prototyped an online system that, among other things, enabled a browsing environment enhanced with a new rapid cursor movement innovation called a "mouse". In 1965, Ted Nelson coined the term 'hypertext' in a presentation to the 20th National Conference of the Association for Computing Machinery.
From a technical perspective, HTML is a sequence of extensions to the original concept of Berners-Lee, which was text-oriented. The international standard underlying HTML, the Standard Generalized Markup Language (SGML), is based on the Generalized Markup Language (GML), developed at IBM in 1969 by the research team of Charles Goldfarb, Edward Mosher and Raymond Lorie. The markup described the structure of a document, not its appearance. The document structure is written in a Document Type Definition (DTD) that specifies a set of document elements and their relationships together with a set of tags with which to mark up the document. The SGML standard was adopted in the mid-1980's.
HTML started out as an SGML DTD. The tags were adapted for the distributed, hyperlinked environment described above. The first version of HTML of the early 1990's provided only basic structure with rudimentary graphics and hypertext. But by 1993, HTML standards were a moving target. There were two organizations overseeing Web and Internet standards, including HTML: the World Wide Web Consortium (W3C) and the Internet Engineering Task Force (IETF). (Berners-Lee currently serves as the director of the W3C (www.w3.org), which he founded in 1994.) However, Netscape had gone its own way in offering new features which were not endorsed by the W3C/IETF, including some that were actually inconsistent with the purist SGML orientation intended by the designers of HTML. This trend continued as navigator/browser developers, under pressure to gain market share, attempted to add as many useful "extensions" to the HTML standard as could be practicably supported. This competition, initially called the "Mosaic Wars," still remains today, although with diminished impact.
HTML Version 2.0, proposed by the IETF (www.ietf.org) HTML Working Group, was a specification which roughly corresponded "to the capabilities of HTML in common use prior to 1994." Basically, this version added forms and lists. HTML+ and HTML 3.0 were HTML versions which were never standardized. The IETF HTML Working Group closed in 1996, and standards for HTML are now released as W3C Recommendations. HTML 3.2 replaced HTML 2.0 in January, 1997; it aimed to "capture recommended practice as of early 1996" and included tables, applets, scripts, advanced Common Gateway Interface (CGI) programming, security, and text flow around graphics. HTML 4.01, a revision of HTML 4.0, was released in 1999 and "supports a wider range of multimedia options, scripting languages, style sheets, better printing facilities, and documents that are more accessible to users with disabilities." Features added in HTML 4.01 include frames (including inline frames), client-side image maps, advanced forms and tables, TTY and Braille support, compound documents with a hierarchy of alternate rendering strategies, and internationalization; document formatting is achieved via cascading style sheets (CSS). Scripting capabilities have also been added to most of the HTML elements. Specification of the document type such as
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
is now required. This allows validation of the document using the W3C's validator (validator.w3.org), which checks HTML documents for conformity to W3C Recommendations. Similarly, CSS documents can be validated via the W3C CSS Validation Service (http://jigsaw.w3.org/css-validator/). CSS2 became a Recommendation on May 12, 1998, and CSS3 is currently in the works. A quick Web search will identify numerous validators and checkers, whose services are available freely or commercially (e.g., http://www.w3.org/MarkUp/html-test/ from the World Wide Web Consortium).
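As a rough illustration of the first step a checker might take, the sketch below reads the public identifier out of a document's DOCTYPE declaration using Python's standard html.parser module. The names DoctypeReader and declared_doctype are hypothetical, and the sketch only reports which document type is declared; it does not validate the markup against that DTD, which is what the W3C service does.

```python
import re
from html.parser import HTMLParser


class DoctypeReader(HTMLParser):
    """Records the public identifier of the first DOCTYPE declaration seen."""

    def __init__(self):
        super().__init__()
        self.public_id = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE HTML PUBLIC "..."'
        match = re.match(r'DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]+)"', decl, re.IGNORECASE)
        if match and self.public_id is None:
            self.public_id = match.group(1)


def declared_doctype(html_text):
    """Return the public identifier from the DOCTYPE, or None if there is none."""
    reader = DoctypeReader()
    reader.feed(html_text)
    reader.close()
    return reader.public_id


page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">\n'
    "<html><head><title>Example</title></head><body><p>Hello</p></body></html>"
)
print(declared_doctype(page))  # -//W3C//DTD HTML 4.01//EN
print(declared_doctype("<html><body><p>No declaration</p></body></html>"))  # None
```

A document without a declaration yields None, which is the case a validator would have to reject or guess at.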
Many innovations for HTML were invented by the browser vendors, particularly Microsoft and Netscape, as they vied for market share, and many of these innovations were adopted in subsequent HTML standards. In many ways, HTML and the independent browser implementations evolved away from HTML's carefully thought-out roots. The original designers were careful not to confuse form with content. However, HTML became a patchwork of ideas as it quickly evolved, and as a result it muddied the difference between form and content. The new kid on the block, called XML (discussed below), is set to reunite HTML with its SGML root. Yet Web users and Web developers will still be faced with browsers which do not implement the HTML standards, or implement them only partially. The introduction of validators for HTML documents encourages adherence to the standards by Web developers. As a complement to online validators, the World Wide Web Test Pattern (www.uark.edu/~wrg/), developed at the University of Arkansas in the mid 1990's, provided a test bench for determining the level of HTML compliance of arbitrary browsers. The "WWW Viewer Test Page" developed at Lawrence Livermore Laboratory (www-eng.llnl.gov/documents/WWWtest.html) allows the testing of a variety of media formats.
A step toward reuniting HTML with its SGML root is the conversion to the Extensible Hypertext Markup Language (XHTML).
Extensible Hypertext Markup Language (XHTML)
XHTML is a reformulation of HTML 4.0 in XML, thus combining the strength of HTML 4 with the power of XML. XHTML is intended to replace HTML as the primary vehicle for describing Web content. The features of XHTML are richer, more robust and more extensible than those of HTML. W3C seeks to create standards for providing these features on the ever increasing range of browser platforms, e.g. cell phones, televisions, cars, wallet-size wireless communicators, kiosks and desktops. Dave Raggett has created an open source utility, HTML Tidy, to assist Web developers in converting HTML documents to XHTML and, in general, "tidying" up sloppy HTML code, thereby rendering maintenance easier. Currently the tool is only available for UNIX platforms. As an aside, there are mountains of incorrect HTML code which are currently rendered by forgiving Web browsers. Such problems disappear with XHTML, because an XML-based user agent rejects a document that is not well-formed.
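The difference between forgiving HTML and XHTML can be sketched with any XML parser. The example below, which assumes Python's standard xml.etree.ElementTree module and a hypothetical helper named is_well_formed_xhtml, checks only well-formedness and the XHTML namespace on the root element; it does not validate against the XHTML DTDs.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def is_well_formed_xhtml(text):
    """True if text parses as XML and its root is <html> in the XHTML namespace."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Unclosed or mismatched tags are fatal errors for an XML parser.
        return False
    return root.tag == "{%s}html" % XHTML_NS


# Typical "forgiving browser" markup: <br> and <p> are never closed.
sloppy_html = "<html><body><p>First<br><p>Second</body></html>"

# The same content with every element closed and the namespace declared.
tidy_xhtml = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>First<br /></p><p>Second</p></body></html>"
)

print(is_well_formed_xhtml(sloppy_html))  # False
print(is_well_formed_xhtml(tidy_xhtml))   # True
```

A browser would display both documents, but only the second can be handed to generic XML tools.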
XHTML is a family of document types which are XML based and ultimately designed to be used in conjunction with XML based user agents. XHTML 1.0 has three document types, reformulated from HTML 4: XHTML 1.0 STRICT, XHTML 1.0 TRANSITIONAL, and XHTML 1.0 FRAMESET. Each has its own DTD that sets out the rules and regulations for using HTML in a definitive, succinct manner. XHTML 1.1 was released in May, 2001; however, support is not ubiquitous. XHTML 1.1 continues the evolution of separating presentation from content and in this sense is more restrictive than XHTML 1.0. For instance, the XHTML 1.0 TRANSITIONAL and FRAMESET document types contained a number of presentational elements which are now relegated to being handled via style sheets or other mechanisms. For example, frames are in this category. Hence XHTML 1.1 can be considered roughly equivalent to XHTML 1.0 STRICT, except for the removal of certain features as a result of the underlying strategy of providing markup which is rich in structural functionality but leaves presentation to style sheets. To understand this philosophy/strategy requires a study of the Extensible Markup Language (XML).
Extensible Markup Language (XML)
As previously noted, the Web was originally a publishing avenue for scientific documents. Today, it is a full-fledged medium, not only on equal footing with print and television but more importantly, an interactive medium. To accommodate the phenomenal popularity and growth of the Web, HTML was repeatedly extended, introducing new tags. The first version of HTML had only a dozen or so tags while the last version (HTML 4) has nearly a hundred "official" tags, not counting the non-standard, typography and format extensions.
HTML is currently supported by literally thousands of applications: browsers/navigators, editors, e-mail software, spreadsheets, databases, contact managers, word processors, and more. Even with the existing rich set of tags, there is a need for more flexibility as specialized software seeks to utilize the basic Web infrastructure. While on the one hand there is the demand for more tags, there is a conflicting need to simplify in order to make Web use accessible to a broader range of computing devices (e.g., PDAs, Japanese I-mode phones, European WAP phones, convergent technologies, etc.) that may access pages with more markup than content.
XML development began in late 1996 and was completed in early 1998. Almost immediately, it was extended to application domains in mathematics, science and medicine. New applications seem to pop up almost daily. (See http://www.oasis-open.org/cover/sgmlnew.html for up to date information on XML products and releases.) XML treads new territory only where it is appropriate. XML will not replace HTML in the near term, but HTML will converge toward XML through the XHTML standard. Yet the International Organization for Standardization (ISO) has standardized HTML (ISO/IEC 15445) on the conviction that HTML will persist for another 25 years. As such, ISO expects W3C to remain responsible for HTML.
The philosophy behind XML was to answer the conflicting demands being made on HTML. The resolution was particularly simple; essentially, two significant changes were made relative to HTML: XML has no predefined tags, and XML is strict. The first is the eXtensible part: the author creates the tags needed for his/her application. As for the second, HTML was most forgiving in the area of syntax, which is great for lazy authors but taxing on browsers. According to some estimates, 50% or more of the code in a browser handles errors or sloppiness on the part of the author. As a result, browsers are growing in size and becoming slower, which does not bode well for the owners of handheld devices. Thus the decision for a strict syntax will facilitate the development of smaller, faster, lighter browsers.
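The "no predefined tags" half of that design can be sketched as follows. The catalog vocabulary below (catalog, painting, title, opening-bid) is invented purely for illustration, and the helper opening_bids is hypothetical; the point is only that a generic XML parser, here Python's standard xml.etree.ElementTree, handles an author-defined vocabulary without any prior knowledge of it.

```python
import xml.etree.ElementTree as ET

# An invented vocabulary: none of these tags is predefined by XML itself.
catalog_xml = """
<catalog>
  <painting id="p1">
    <title>Harbor at Dawn</title>
    <opening-bid currency="USD">1200</opening-bid>
  </painting>
  <painting id="p2">
    <title>Winter Field</title>
    <opening-bid currency="USD">800</opening-bid>
  </painting>
</catalog>
"""


def opening_bids(xml_text):
    """Map each painting's title to its opening bid, using the author's own tags."""
    root = ET.fromstring(xml_text)
    return {
        painting.findtext("title"): int(painting.findtext("opening-bid"))
        for painting in root.iter("painting")
    }


print(opening_bids(catalog_xml))
# {'Harbor at Dawn': 1200, 'Winter Field': 800}
```

Because the tags name the data rather than its appearance, the same document can feed a display, a search index, or a bidding application.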
XML does not describe how to render the data; it merely indicates the structure and content of the data which, it should be remembered, was the original objective of HTML qua SGML offspring. Thus XML is ideal for document publishing since it is independent of format and delivery medium. As the figure below suggests, documents that are created and maintained in XML may easily be transformed into formats that are optimal for the manner or method of dissemination, be that "smart" telephony or fax, the Web or printing. While this is appealing, the operative word is "automatically," and that is still being addressed. If the tool or platform is not available today, it will be tomorrow. See W3C's website (www.w3.org) for a list of XML processors.
Traditionally the web page was a static HTML document offering minimal interactivity and relying heavily on an overloaded server and CGI scripts. XML is poised to offer web applications as opposed to just web pages. The ability of web sites to do so much more than deliver text, graphics, even multimedia (and without requiring massive amounts of Internet traffic) is the momentum behind the ascendancy of XML. In a keynote address Adam Bosworth (then with Microsoft) demonstrated two such web applications: an art auction which enabled a user to view and bid on pieces of art as well as watch the bidding process with minimal round-trips to the server; and a frequent flyer awards program allowing a user to review their frequent flyer miles, determine their available awards as well as plan future flights in the context of building frequent flyer awards for the future. 
Numerous tools are available for parsing and verifying XML documents. XML processors are typically implemented as Java applications, but regardless of the means of implementation, most still do not conform completely to the standard. The median seems to be about 80% conformance. And because of varying and inconsistent support for XML documents by the current popular browsers, many find it more convenient to ignore the browser and apply style sheets on the Web servers to generate HTML. The usual solution is to use the Extensible Stylesheet Language (XSL) to produce HTML which is then rendered by a current browser, or even a former generation browser. A subset of XSL, XSL Transformations (XSLT), is the standard for transforming data from one XML document to another XML document, usually XHTML. Numerous tools exist for converting many document types using XSL. See http://www.w3.org/Style/XSL. In the most general form, an XML document and its XSLT style sheet are processed by an XSLT processor to produce XSL Formatting Objects (XSL-FO), which are then rendered by an XSL-FO processor. See http://dmoz.org/Computers/Data_Formats/Markup_Languages/XML/Style_Sheets/Implementations/ for a listing of available processors.
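The server-side approach described above can be approximated without an XSLT processor. The sketch below is not XSLT; it is a plain Python stand-in in which a hypothetical RULES table plays the role that template rules play in a style sheet, mapping each source element to an HTML element. The source vocabulary (article, headline, para, term) is invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical rules: source tag -> HTML tag, loosely analogous to XSLT templates.
RULES = {"article": "body", "headline": "h1", "para": "p", "term": "em"}


def transform(node):
    """Recursively render a source element and its children as an HTML string."""
    tag = RULES.get(node.tag, "span")  # unknown tags fall back to <span>
    parts = ["<%s>" % tag, escape(node.text or "")]
    for child in node:
        parts.append(transform(child))
        parts.append(escape(child.tail or ""))  # text that follows the child
    parts.append("</%s>" % tag)
    return "".join(parts)


source = (
    "<article><headline>XML &amp; HTML</headline>"
    "<para>A <term>style sheet</term> maps structure to presentation.</para>"
    "</article>"
)
html = "<html>" + transform(ET.fromstring(source)) + "</html>"
print(html)
```

The browser receives only the generated HTML, so it needs no XML or XSL support of its own; a real deployment would normally use a conforming XSLT processor rather than hand-written rules like these.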
Lastly, XML is a low-level syntax for representing data, intended to support a wide variety of applications. Implemented applications include the Mathematical Markup Language (MathML), the Chemical Markup Language (CML), the Synchronized Multimedia Integration Language (SMIL), the Scalable Vector Graphics (SVG) format, and the Resource Description Framework (RDF) for describing meta-data. RDF is touted to be the vehicle for the Semantic Web in much the way HTML served the original Web. Implemented RDF applications include the Platform for Internet Content Selection (PICS) and the Platform for Privacy Preferences (P3P), among others.
The latest addition to DHTML is the Document Object Model (DOM), which introduces a new concept for event detection and the subsequent calling of event handlers. The later versions of the Microsoft and Netscape browsers both support the DOM, but the support differs dramatically.
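One way to picture event detection and handler invocation over a document tree is the bubbling sketch below. It is an illustration only: Python's standard xml.dom.minidom supplies the DOM tree interfaces (parentNode, tagName, getElementsByTagName) but no event model, so the handlers registry and the add_listener and dispatch functions are hypothetical stand-ins, not part of any browser or library API.

```python
from xml.dom import minidom

# Hypothetical registry: (id of element, event name) -> list of callbacks.
handlers = {}


def add_listener(element, event, callback):
    """Register a callback for an event on one element of the tree."""
    handlers.setdefault((id(element), event), []).append(callback)


def dispatch(target, event):
    """Call handlers from the target element up to the root (bubbling)."""
    results = []
    node = target
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        for callback in handlers.get((id(node), event), []):
            results.append(callback(node))
        node = node.parentNode  # the Document node ends the walk
    return results


doc = minidom.parseString("<form><fieldset><button>Bid</button></fieldset></form>")
button = doc.getElementsByTagName("button")[0]
form = doc.documentElement

add_listener(button, "click", lambda n: "button handler on <%s>" % n.tagName)
add_listener(form, "click", lambda n: "form handler on <%s>" % n.tagName)

print(dispatch(button, "click"))
# ['button handler on <button>', 'form handler on <form>']
```

The handler on the enclosing form runs even though the event was raised on the button, which is the kind of behavior on which browser implementations have differed.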
The past decade has seen great strides in the development of new Web technologies, particularly in the area of hypertext markup. Although many technologies have already been added, more are being added, almost daily it seems, to advance the Web to the point of being fully interactive, participatory and immersive. Advances during the last decade in the programming technologies used for the Web will be examined in Part 2 of this series.
 Adler, Sharon, et al., eds., "Extensible Stylesheet Language (XSL) Version 1.0," REC-xsl-20011015, October 15, 2001. http://www.w3.org/TR/xsl/.
 Altheim, Murray and Shane McCarron, "XHTML 1.1 - Module-based XHTML," REC-xhtml11-20010531, May 31, 2001, http://www.w3.org/TR/xhtml11/
 Ayars, Jeff, et al., eds., "Synchronized Multimedia Integration Language (SMIL 2.0)," REC-smil20-20010807, August 7, 2001. http://www.w3.org/TR/smil20/
 Berghel, Hal, "Using the WWW Test Pattern to check HTML Client Compliance," IEEE Computer, 28:9, pp.63-65.
 Berghel, H.: "The Client Side of the Web," Communications of the ACM, 39:1 (1996), pp. 30-40. See also, revised version of the same name in Kent, A.: "Encyclopedia of Library and Information Science," Marcel Dekker, 64:27, pp 39-51. (1999)
 Berghel, Hal and Yonina Cooper, "The World Wide Web," Encyclopedia of 20th-Century Technology, Fitzroy Dearborn, London [in press].
 Berners-Lee, Tim, Weaving the Web, Harper Collins, San Francisco, 1999.
 Berners-Lee, Tim, "The WorldWideWeb browser", http://www.w3.org/People/Berners-Lee/WorldWideWeb.html
 Berners-Lee, T. and D. Connolly, "Hypertext Markup Language - 2.0", RFC 1866, November 1995. Available at http://www.w3.org/MarkUp/html-spec/ .
 Berners-Lee, Tim, James Hendler, and Ora Lassila, "The Semantic Web," Scientific American, Vol. 284, No. 5 (May 2001), pp. 34-43.
 Bos, Bert, Hakon Wium Lie, Chris Lilley, and Ian Jacobs, editors, "Cascading Style Sheets, level 2: CSS2 Specification," REC-CSS2, May 12, 1998. Available at http://www.w3.org/TR/REC-CSS2/.
 Bosak, Jon and Tim Bray, "XML and the Second-Generation Web," Scientific American, Vol. 280, No. 5(May 1999), pp. 89-93.
 Vannevar Bush, "As we may think," Atlantic Monthly, July, 1945. Available at http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm.
 Carlisle, et al., eds., "Mathematical Markup Language (MathML) Version 2.0," REC-MathML2-20010221, February 21, 2001. http://www.w3.org/TR/MathML2/.
 Clark, James, ed., "XSL Transformations (XSLT) Version 1.0," REC-xslt-19991116, November 16, 1999. http://www.w3.org/TR/xslt.
 D. C. Engelbart, "Augmenting human intellect: a conceptual framework," Stanford Research Institute, Menlo Park, CA, Summary Report, Contract AF 49(638) 1024, SRI Project 3578, October, 1962.
 Douglas C. Engelbart, "The Augmented Knowledge Workshop," in K. Anderson, The History of Personal Workstations: Proceedings of the ACM Conference, 1986, ACM Press, New York, pp. 73-83.
 William K. English, Douglas C. Engelbart, and Melvyn L. Berman, IEEE Transactions on Human Factors in Electronics, March 1967, Vol. HFE-8, No. 1, pp. 5-15.
 Ferraiolo, Jon, ed., "Scalable Vector Graphics (SVG) 1.0 Specification," REC-SVG-2001-0904, September 4, 2001. http://www.w3.org/TR/SVG/
 Charles F. Goldfarb, Edward J. Mosher, and Theodore I. Petersen, "An Online System for Integrated Text Processing," Proceedings of the American Society for Information Science, 7, 1970, pp. 147-150.
 Marchiori, Massimo, "The Platform for Privacy Preferences 1.0 (P3P1.0) Specification," REC-P3P-20020414, April 16, 2002. http://www.w3.org/TR/P3P/
 Eric A. Meyer and Bert Bos, editors, "Introduction to CSS3: W3C Working Draft," WD-css3-roadmap-20010523, May 23, 2001. Available at http://www.w3.org/TR/2001/WD-css3-roadmap-20010523/.
 Ted Nelson, A File Structure for the Complex, the Changing, and the Indeterminate, 20th National Conference, New York, Association of Computing Machinery, 1965.
 Ted Nelson, "The heart of connection: hypermedia unified by inclusion," Communications of the ACM, 38:8, pp. 31-33. 1995
 Raggett, Dave, "HTML 3.2 Reference Specification," REC-html32, January, 1997. Available at http://www.w3.org/TR/REC-html32
 Raggett, Dave, Arnaud Le Hors, and Ian Jacobs, editors, "HTML 4.01 Specification," REC-html401, December 24, 1999. Available at http://www.w3.org/TR/html401/
 "XHTML 1.0: The Extensible HyperText Markup Language," REC-xhtml1-20000126, January 26, 2000, http://www.w3.org/TR/xhtml/