XML and HTML
What is XML? The Extensible Markup Language is a simple text-based format for representing structured information: documents, data, configuration, books, transactions, invoices, and much more. It was derived from an older standard format called SGML (ISO 8879), in order to be more suitable for Web use.
What is HTML?
HTML is the language for describing the structure of Web pages. HTML gives authors the means to:
- Publish online documents with headings, text, tables, lists, photos, etc.
- Retrieve online information via hypertext links, at the click of a button.
- Design forms for conducting transactions with remote services, for use in searching for information, making reservations, ordering products, etc.
- Include spread-sheets, video clips, sound clips, and other applications directly in their documents.
With HTML, authors describe the structure of pages using markup. The elements of the language label pieces of content such as “paragraph,” “list,” “table,” and so on.
What is XML used for? It is one of the most widely-used formats for sharing structured information today: between programmes, between people, between computers and people, both locally and across networks. A short example:
<part number="1976">
<name>Windscreen Wiper</name>
<description>The Windscreen wiper
automatically removes rain
from your windscreen, if it
should happen to splash there.
It has a rubber <ref part="1977">blade</ref>
which can be ordered separately
if you need to replace it.
</description>
</part>
If you are already familiar with HTML, you can see that XML is very similar. However, the syntax rules of XML are strict: XML tools will not process files that contain errors, but instead will give you error messages so that you can fix them. This means that almost all XML documents can be processed reliably by computer software. The main differences from HTML are:
All elements must be closed or marked as empty.
Empty elements can be closed as normal, <happiness></happiness> or you can use a special short-form, <happiness /> instead.
In HTML, you only need to quote an attribute value under certain circumstances (it contains a space, or a character not allowed in a name), but the rules are hard to remember. In XML, attribute values must always be quoted:
<happiness type="joy" />
In HTML there is a built-in set of element names (along with their attributes). In XML, there are no built-in names (although names starting with xml have special meanings).
In HTML, there is a list of some built-in character names like &eacute; for é but XML does not have this. In XML, there are only five built-in character entities: &lt;, &gt;, &amp;, &quot; and &apos; for <, >, &, " and ' respectively. You can define your own entities in a Document Type Definition, or you can use any Unicode character.
In HTML, there are also numeric character references, such as &#38; for &. You can refer to any Unicode character, but the number is decimal, whereas in the Unicode tables the number is usually in hexadecimal. XML also allows hexadecimal references: &#x26; for example.
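The rules above can be tried out with any XML parser. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are illustrative only and are not part of any standard.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Illustrative inputs (assumptions for this sketch), one per rule listed above.
samples = {
    "closed element": '<happiness></happiness>',
    "empty-element short form": '<happiness />',
    "quoted attribute": '<happiness type="joy" />',
    "unclosed element": '<happiness>',
    "unquoted attribute": '<happiness type=joy />',
    "HTML-only character name": '<name>caf&eacute;</name>',
}

results = {label: is_well_formed(text) for label, text in samples.items()}

# The five built-in entities, a decimal reference and a hexadecimal reference.
decoded = ET.fromstring(
    '<t>&lt; &gt; &amp; &quot; &apos; &#38; &#x26;</t>'
).text

if __name__ == "__main__":
    for label, ok in results.items():
        print(f"{label}: {'accepted' if ok else 'rejected'}")
    print(decoded)
```

In this sketch the first three samples are accepted and the last three are rejected with a parse error, which is the strictness described above: the parser reports the problem instead of guessing what was meant.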
XML has a number of advantages over many other formats. For any particular scenario, you might be able to come up with a better format, but then you would have to include costs of converting and processing your format, and of training, and of the XML-specific editing and searching tools that are now very widely available. Some of the advantages of XML include:
Redundancy: XML markup is very verbose. For example, every end tag must be supplied, such as </description> in the example. This lets the computer catch common errors such as incorrect nesting.
Self-describing: The readability of XML (it is a text-based format) and the presence of element and attribute names in XML means that people looking at an XML document can often get a head start on understanding the format.
Network effect and the XML Promise: Any XML document can be read and processed by any XML tool whatsoever. Of course, some XML tools might want specific XML markup, but the XML format itself can be read by any XML parser: you can't say, this XML document is only to be processed by such-and-such a tool. This means that every new XML document increases the value of every other XML document, and of every XML tool, and every new XML tool increases the value of every XML document and hence of every other tool. Today, XML is the most widely-used format of its kind anywhere in the world.
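As a small illustration of a general-purpose tool reading a document whose vocabulary it has never seen, the sketch below feeds the windscreen wiper example to Python's standard xml.etree.ElementTree parser. The choice of Python and the helper name outline are assumptions made for this sketch; any conforming XML parser could be used in the same way.

```python
import xml.etree.ElementTree as ET

# The part description from the example earlier in this section.
PART_XML = """<part number="1976">
<name>Windscreen Wiper</name>
<description>The Windscreen wiper
automatically removes rain
from your windscreen, if it
should happen to splash there.
It has a rubber <ref part="1977">blade</ref>
which can be ordered separately
if you need to replace it.
</description>
</part>"""


def outline(element, depth=0):
    """List every element and its attributes without knowing the vocabulary."""
    lines = ["  " * depth + f"{element.tag} {element.attrib}"]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(PART_XML)
structure = outline(root)

# A tool that does know the vocabulary can pick out specific pieces.
part_number = root.get("number")
part_name = root.findtext("name")
referenced_parts = [ref.get("part") for ref in root.iter("ref")]

if __name__ == "__main__":
    print("\n".join(structure))
    print(part_number, part_name, referenced_parts)
```

The outline function relies only on the element and attribute names carried in the document itself, which is what the self-describing point above refers to.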
Examples:
XML is very widely used today. It is the basis of a great many standards such as the Universal Business Language (UBL); of Universal Plug and Play (UPnP) used for home electronics; word processing formats such as ODF and OOXML; and graphics formats such as SVG. It is used for communication with XML-RPC and Web Services, and it is supported directly by computer programming languages and databases, from giant servers all the way down to mobile telephones.
If you double-click an icon on your computer desktop (the icon may well have been drawn with SVG), chances are that an XML message is sent from one component of the desktop to another. If you take your car to be repaired, the engine's computer sends XML to the mechanic's diagnostic systems. It is the age of XML: it is everywhere.
CSS
What is CSS?
CSS is the language for describing the presentation of Web pages, including colors, layout, and fonts. It allows one to adapt the presentation to different types of devices, such as large screens, small screens, or printers. CSS is independent of HTML and can be used with any XML-based markup language. The separation of HTML from CSS makes it easier to maintain sites, share style sheets across pages, and tailor pages to different environments. This is referred to as the separation of structure (or: content) from presentation.
Examples:
The following very simple example of a portion of an HTML document illustrates how to create a link within a paragraph. When rendered on the screen (or by a speech synthesiser), the link text will be "final report"; when somebody activates the link, the browser will retrieve the resource identified by "http://www.example.com/report":
<p class="moreinfo">For more information see the
<a href="http://www.example.com/report">final report</a>.</p>
The class attribute on the paragraph's start tag ("<p>") can be used, among other things, to add style. For instance, to italicise the text of all paragraphs with a class of "moreinfo," one could write, in CSS:
p.moreinfo { font-style: italic }
By placing that rule in a separate file, the style may be shared by any number of HTML documents.
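The structure that the markup describes is also available to programs, not only to browsers. The sketch below, assuming Python's standard html.parser module, reads the paragraph from the example and collects the link target, the link text, and the class that the CSS rule selects on. The class name LinkCollector is hypothetical and exists only for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs and the class of each paragraph."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.paragraph_classes = []
        self._href = None   # href of the link currently open, if any
        self._text = []     # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "p":
            self.paragraph_classes.append(attributes.get("class"))
        elif tag == "a" and "href" in attributes:
            self._href = attributes["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


# The HTML fragment from the example above.
FRAGMENT = '''<p class="moreinfo">For more information see the
<a href="http://www.example.com/report">final report</a>.</p>'''

collector = LinkCollector()
collector.feed(FRAGMENT)
collector.close()

if __name__ == "__main__":
    print(collector.links)
    print(collector.paragraph_classes)
```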
XHTML
What is XHTML? XHTML is a variant of HTML that uses the syntax of XML, the Extensible Markup Language. XHTML has all the same elements (for paragraphs, etc.) as the HTML variant, but the syntax is slightly different. Because XHTML is an XML application, you can use other XML tools with it (such as XSLT, a language for transforming XML content).
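To illustrate the point about reusing XML tools, the sketch below parses an XHTML version of the earlier paragraph with Python's standard xml.etree.ElementTree, a general XML parser that knows nothing about HTML. The fragment, including the added br element, is an assumption made for this example; the namespace shown is the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# XHTML syntax: every element is closed and the empty br uses the short form.
XHTML_FRAGMENT = f'''<p xmlns="{XHTML_NS}" class="moreinfo">For more information see the
<a href="http://www.example.com/report">final report</a>.<br /></p>'''

paragraph = ET.fromstring(XHTML_FRAGMENT)
link = paragraph.find(f"{{{XHTML_NS}}}a")
paragraph_class = paragraph.get("class")
link_target = link.get("href")
link_text = link.text

# The same content in HTML-style syntax, with an unclosed br, is not
# well-formed XML, so an XML parser rejects it.
try:
    ET.fromstring("<p>line one<br>line two</p>")
    html_style_accepted = True
except ET.ParseError:
    html_style_accepted = False

if __name__ == "__main__":
    print(paragraph_class, link_target, link_text)
    print("HTML-style syntax accepted:", html_style_accepted)
```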