XML Syntax

The syntax rules of XML are simple and strict. They are easy to learn and easy to use.

Because of this, writing software that reads and manipulates XML is straightforward.

An example XML document

XML documents use a self-describing and simple syntax.

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.

The next line describes the root element of the document (as if it were saying: "this document is a note"):

<note>

The next 4 lines describe 4 child elements of the root (to, from, heading, and body):

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

And finally the last line defines the end of the root element:

</note>

Can you detect from this example that the XML document contains a Note to Tove from Jani? Don't you agree that XML is pretty self-descriptive?
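The structure described above can also be read by a program. The following is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module; the names NOTE_XML and fields are only illustrative and are not part of XML itself.

```python
import xml.etree.ElementTree as ET

# The note document from above, encoded as bytes in the declared character set.
NOTE_XML = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    "<note>\n"
    "<to>Tove</to>\n"
    "<from>Jani</from>\n"
    "<heading>Reminder</heading>\n"
    "<body>Don't forget me this weekend!</body>\n"
    "</note>\n"
).encode("iso-8859-1")

# fromstring() returns the root element of the parsed document.
root = ET.fromstring(NOTE_XML)

# Map each child element's tag name to its text content.
fields = {child.tag: child.text for child in root}

print(root.tag)        # note
print(fields["to"])    # Tove
print(fields["from"])  # Jani
```

The program never has to be told what a "note" is. The element names in the document carry that information.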

All XML elements must have a closing tag

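With XML, it is illegal to omit the closing tag. (An element with no content may instead use the empty-element form, such as <br />, which opens and closes the element in a single tag.)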

In HTML some elements do not have to have a closing tag. The following code is legal in HTML:

<p>This is a paragraph
<p>This is another paragraph

In XML all elements must have a closing tag like this:

<p>This is a paragraph</p>
<p>This is another paragraph</p>

Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not an XML element. It is a special construct that may appear only at the very start of the document, and it does not take a closing tag.
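You can check the closing-tag rule with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; is_well_formed is a hypothetical helper written for this example, and the <doc> wrapper is added only so that both samples have a single root.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Wrapped in a <doc> root so that only the missing </p> tags are at fault.
html_style = "<doc><p>This is a paragraph<p>This is another paragraph</doc>"
xml_style = "<doc><p>This is a paragraph</p><p>This is another paragraph</p></doc>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```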

XML tags are case sensitive

Unlike HTML tags, XML tags are case sensitive.

With XML, the tag <Letter> is different from the tag <letter>.

Opening and closing tags must therefore be written with the same case:

<Message>This is incorrect</message>

<message>This is correct</message>
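The two lines above can be tried out in a parser. This is a small sketch, assuming Python's standard xml.etree.ElementTree module; the <doc> wrapper and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Opening and closing tags that differ in case do not match.
try:
    ET.fromstring("<Message>This is incorrect</message>")
    mismatch_accepted = True
except ET.ParseError as err:
    mismatch_accepted = False
    print("rejected:", err)

# The same case on both tags parses without error.
ok = ET.fromstring("<message>This is correct</message>")
print(ok.tag)  # message

# Names that differ only in case are two different elements.
doc = ET.fromstring("<doc><Letter/><letter/></doc>")
tags = [child.tag for child in doc]
print(tags)  # ['Letter', 'letter']
```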

All XML elements must be properly nested

An XML parser rejects improperly nested tags.

In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>
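The difference between the two forms can be checked with a parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; parses is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if `text` is accepted by the XML parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

overlapping = "<b><i>This text is bold and italic</b></i>"
nested = "<b><i>This text is bold and italic</i></b>"

print(parses(overlapping))  # False
print(parses(nested))       # True

# In the nested form, <i> is simply the first child of <b>.
b = ET.fromstring(nested)
print(b[0].tag, b[0].text)  # i This text is bold and italic
```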

All XML documents must have a root tag

The first element in an XML document is the root element.

All XML documents must contain a single tag pair to define the root element. All other elements must be nested within the root element.

All elements can have sub elements (children). Sub elements must be correctly nested within their parent element:

<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>
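The single-root rule and the parent/child structure can both be seen from a program. This is a sketch under the assumption that you use Python's standard xml.etree.ElementTree module; the sample strings and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Two top-level elements: there is no single root, so parsing fails.
two_roots = "<child>one</child><child>two</child>"
try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

# One root with correctly nested sub elements.
one_root = "<root><child><subchild>.....</subchild></child></root>"
root = ET.fromstring(one_root)

# iter() yields the root first, then its descendants in document order.
tags = [el.tag for el in root.iter()]

print(two_roots_ok)  # False
print(tags)          # ['root', 'child', 'subchild']
```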

Attribute values must always be quoted

With XML, it is illegal to omit quotation marks around attribute values.

XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted, using either single or double quotes. Study the two XML documents below. The first one is incorrect, the second is correct:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=12/11/99>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/99">
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The error in the first document is that the date attribute in the note element is not quoted.

This is correct: date="12/11/99". This is incorrect: date=12/11/99.
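A parser makes the same distinction. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the documents are shortened versions of the two notes above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

unquoted = "<note date=12/11/99><to>Tove</to></note>"
quoted = '<note date="12/11/99"><to>Tove</to></note>'

# The unquoted attribute value is a syntax error.
try:
    ET.fromstring(unquoted)
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

# With quotes, the value can be read back by attribute name.
note = ET.fromstring(quoted)
date = note.get("date")

# Single quotes are accepted as well.
single = ET.fromstring("<note date='12/11/99'/>").get("date")

print(unquoted_ok)  # False
print(date)         # 12/11/99
print(single)       # 12/11/99
```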

With XML, White Space is Preserved

With XML, the white space in your document is not truncated.

This is unlike HTML. With HTML, a sentence like this: Hello           my name is Tove, will be displayed like this: Hello my name is Tove, because HTML collapses runs of white space into a single space.

With XML, CR / LF is Converted to LF

With XML, a new line is always stored as LF.

Do you know what a typewriter is? Well, a typewriter is a type of mechanical device they used in the previous century :-)

After you have typed one line of text on a typewriter, you have to manually return the printing carriage to the left margin position and manually feed the paper up one line. These two actions gave their names to the CR (carriage return) and LF (line feed) characters.

In Windows applications, a new line in the text is normally stored as a pair of CR LF (carriage return, line feed) characters. In Unix applications, a new line is normally stored as an LF character. Some applications use only a CR character to store a new line. An XML parser converts all of these to a single LF before it passes the text on to the application.
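The conversion can be observed by parsing text that mixes all three line-ending styles. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which uses the expat parser); the <text> element and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# One Windows-style (CR LF), one CR-only, and one Unix-style (LF) line ending.
raw = "<text>one\r\ntwo\rthree\nfour</text>"

el = ET.fromstring(raw)

# After parsing, every line ending has become a single LF.
print(repr(el.text))          # 'one\ntwo\nthree\nfour'
print("\r" in el.text)        # False
print(el.text.split("\n"))    # ['one', 'two', 'three', 'four']
```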

Comments in XML

The syntax for writing comments in XML is similar to that of HTML:

<!-- This is a comment -->
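A comment is markup, not data. As a sketch, assuming Python's standard xml.etree.ElementTree module with its default parser settings, a comment inside a document does not show up among the parsed elements; the sample document is made up for the example.

```python
import xml.etree.ElementTree as ET

doc = "<note><!-- This is a comment --><to>Tove</to></note>"
root = ET.fromstring(doc)

# The default parser skips the comment, so <to> is the only child.
children = [child.tag for child in root]
print(children)             # ['to']
print(root.findtext("to"))  # Tove
```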

There is Nothing Special about XML

There is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets.

Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially.

In an XML-aware application, however, the XML tags can be handled specially. The tags may or may not be visible, and may or may not have a functional meaning, depending on the nature of the application.
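The two views of the same document can be compared side by side. This is a small sketch, assuming Python and its standard xml.etree.ElementTree module; the sample string is a shortened note, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

text = "<note><to>Tove</to><from>Jani</from></note>"

# Plain-text view: the tags are ordinary characters in a string.
tag_position = text.find("<to>")
print(tag_position)  # 6

# XML-aware view: the tags become structure, and only the content is returned.
recipient = ET.fromstring(text).findtext("to")
print(recipient)     # Tove
```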