Krista's Coding Corner


XML and structure

Hardly anyone outside programming has heard of XML, but it is actually a really common thing. HTML probably rings a bell? HTML, and especially XHTML, is a close relative of XML. Loosely speaking, most HTML is just broken XML, because HTML doesn’t have to obey rules as strict as XML’s. XHTML is a stricter version of HTML that is written as proper XML, so it can’t be broken in the same ways as HTML can (and I think we should be moving towards it).

XML, HTML and XHTML are markup languages. They should not be mixed up with programming languages, as they aren’t the same thing. Markup pretty much means that we can group similar elements together and give structure to information (and store the information at the same time). If you look at a webpage like mine, you can see that it is built from header(s), blocks of text (no matter what the text means), maybe some italic text like quotations, and sometimes pictures. All things that are similar (like blocks of text) are marked in the same way in the code. This gives us a structure, and it tells us whether one piece of text is similar to another (not in meaning but in style!). For example, if I have multiple headers, in the markup language they are put into the same category, and then it is easy and fast to change the style of all headers at the same time. Just like you can change the style of all your headers in MS Word (2010) by changing their settings.

The similar groups are marked with tags. Tags look like <something> and </something>. The first one is the opening tag: after it comes the stuff that belongs under the tag. The second one is the closing tag: it tells us that nothing after it belongs to the tag.
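To make the opening and closing tag idea concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The tag name "something" and the text inside it are just placeholders I made up for illustration.

```python
import xml.etree.ElementTree as ET

# One opening tag, some content, one closing tag (placeholder names).
snippet = "<something>stuff that belongs under the tag</something>"

element = ET.fromstring(snippet)
tag_name = element.tag   # the name shared by the opening and closing tag
content = element.text   # everything between the two tags

print(tag_name, "->", content)
```

The parser hands back the tag name and the content separately, which is exactly the "structure plus stored information" idea from above.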

This post (and page) is written in HTML5 (a version of HTML), and if written well, HTML5 can in practice also be valid XML (HTML5 has brought HTML and XHTML together; its XML form is often called XHTML5). Note that this code is autogenerated, so it isn’t that pretty!

Let’s take my post 29, “Am I stupid...”, as an example. As you see here, there are blocks both in the code and in the appearance:

In HTML we don’t always have to use the closing tag, which would break the code if it were XML. Not using a closing tag is like never telling anyone that you found the grand unified theory, so others keep spending tons of money and time trying to find it... Even though in theory HTML is OK with missing closing tags, it actually causes a lot of problems: people become lazy because HTML doesn’t demand that you place your tags carefully, and even if the code works on one computer, another one might not be lucky enough to read it. Most browsers also interpret HTML in their own way (so pages might look different in different browsers), which can give us quite a headache.
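The difference can be tried out with a small Python sketch. It feeds the same list with missing </li> tags to the standard library's lenient html.parser and to the strict XML parser. The TagLogger class and the is_well_formed_xml helper are names I invented for this illustration, and html.parser only reports tags; it does not behave exactly like any real browser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper: records which tags were opened and closed."""

    def __init__(self):
        super().__init__()
        self.opened = []
        self.closed = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


def is_well_formed_xml(text):
    """Return True if the strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The <li> tags are never closed: fine for HTML, broken as XML.
unclosed = "<ul><li>first<li>second</ul>"
closed = "<ul><li>first</li><li>second</li></ul>"

logger = TagLogger()
logger.feed(unclosed)
logger.close()

print("HTML parser saw opening tags:", logger.opened)
print("HTML parser saw closing tags:", logger.closed)
print("Unclosed version is XML:", is_well_formed_xml(unclosed))
print("Closed version is XML:", is_well_formed_xml(closed))
```

The HTML parser reads the sloppy version without complaining, while the XML parser refuses it outright.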

The most important thing in XML is closing tags correctly. You can’t just close a tag whenever you want. It has to be done a bit like closing Matryoshka dolls: you have to put the smaller ones inside the larger ones, and a smaller doll’s bottom half doesn’t fit a larger doll’s top half. Tags can be completely separate or they can be inside each other. But if a tag is inside another, it also has to end inside that other tag. Tags can’t overlap, just like Matryoshka dolls can’t overlap with each other.
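The Matryoshka rule can be sketched as a stack: every opening tag goes on top of a pile, and a closing tag must always match the tag on top. The Python below is a deliberately simplified illustration, not a real XML parser. It assumes plain tags without attributes, self-closing tags or comments, and the function name tags_nest_properly is my own invention.

```python
import re

# Simplifying assumption: tags look like <name> or </name>, nothing fancier.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][\w-]*)>")


def tags_nest_properly(text):
    """Return True if every tag is closed in reverse order of opening."""
    open_tags = []  # the pile of dolls that are still open
    for match in TAG_PATTERN.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if not is_closing:
            open_tags.append(name)
        elif not open_tags or open_tags.pop() != name:
            # Closing something that isn't the innermost open tag.
            return False
    # Anything still open at the end was never closed.
    return not open_tags


nested = tags_nest_properly("<big><small>doll</small></big>")
overlapping = tags_nest_properly("<big><small>doll</big></small>")
side_by_side = tags_nest_properly("<a>one</a><b>two</b>")

print(nested, overlapping, side_by_side)
```

The overlapping case fails because </big> arrives while <small> is still the innermost open doll.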

It can also be thought of as a tree. Tags can be either next to each other or inside other tag(s).
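One way to see the tree is to let a program draw it. This is a small Python sketch that parses a made-up document (the tag names post, header, body and paragraph are my own examples, not anything predefined) and lists each tag indented by how deep it sits.

```python
import xml.etree.ElementTree as ET

# Made-up document: header and body are next to each other,
# the paragraphs are inside body.
document = """<post>
  <header>Title</header>
  <body>
    <paragraph>First block of text</paragraph>
    <paragraph>Second block of text</paragraph>
  </body>
</post>"""


def outline(element, depth=0):
    """Return the tag names as lines, indented two spaces per nesting level."""
    lines = ["  " * depth + element.tag]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(document))
print("\n".join(tree_lines))
```

Tags at the same indentation are next to each other, and deeper indentation means a tag sits inside the one above it.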

HTML (and XHTML) differs from XML mostly because HTML has predefined tags. For example, <b>jdsljl</b> produces the text “jdsljl” in bold letters. In XML there are no predefined tags, so you have to invent them. XML would be just a strange-looking file format if no other program used it. But it is accepted and used worldwide, so using it is really practical and easy, as many platforms support it or even give you information in XML format.
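Inventing your own tags could look something like the Python sketch below. The recipe vocabulary (recipe, name, ingredients, ingredient) is entirely made up for this example. The point is only that XML lets you pick whatever names fit your data, and any XML-aware program can read them back.

```python
import xml.etree.ElementTree as ET

# Build a document out of invented tags.
recipe = ET.Element("recipe")
ET.SubElement(recipe, "name").text = "Pancakes"
ingredients = ET.SubElement(recipe, "ingredients")
for item in ("flour", "milk", "egg"):
    ET.SubElement(ingredients, "ingredient").text = item

# Turn the structure into XML text ...
xml_text = ET.tostring(recipe, encoding="unicode")
print(xml_text)

# ... and read it back, as some other program might do.
round_trip = ET.fromstring(xml_text)
ingredient_names = [node.text for node in round_trip.iter("ingredient")]
print(ingredient_names)
```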

PS. There are many checkers on the Internet with which you can find out whether you have written proper XML or not. If you want to amuse yourself, run similar checks on HTML and on pages you find on the Internet. You’ll be surprised.

PPS. XML is to programming like geometry is to math: even if it seems like the same stuff, it isn’t quite, and you mostly need a lot of visualizing ;)
