Krista's Coding Corner

25.06.2012

Formats matter

I don’t know if you remember the time when the Internet was full of different kinds of music and video clips. It used to be a real pain to find just the right program that could play the file you found.

Nowadays it isn’t that bad, but you may still bump into these situations: somewhere you have to have Flash, somewhere Silverlight, others demand QuickTime, and so on… And in the end you still just see or hear a clip, and it makes no difference whatsoever which program is playing it. The quality isn’t (most of the time) defined by what program you use. Music stays music and video is still video.

The reason you have to use different programs is that different programs can handle different file formats and different encodings. (I’m actually starting to get a headache from wondering about the difference between a file format and an encoding, and how they are related… So I’ll leave that out until I figure it out.)

I think we could say that encodings and file formats are like different languages. Mostly you can say the same things in English, Portuguese, Hindi… Maybe not in exactly the same manner, but the result is still much the same. But if you don’t know the language, you won’t comprehend what is being said, even if the content itself wouldn't be hard to understand. The same applies to computers and programs: you have to speak the same language, in this case use the same encoding / file format, to be able to get the information correctly. If the languages are close to each other, you may understand something but not everything. Getting some of the info will help you, but it's never nice to notice that someone said "don't do it" when you heard "do it".
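Here is a small Python sketch of what "speaking the wrong language" can look like for a program. The word and the two encodings (UTF-8 and Latin-1) are only picked for illustration: the same bytes are read back once with the encoding they were written in and once with a different one.

```python
# The same bytes, read with the right and the wrong "language" (encoding).
text = "café"
data = text.encode("utf-8")      # the writer speaks UTF-8

right = data.decode("utf-8")     # the reader speaks UTF-8 too
wrong = data.decode("latin-1")   # the reader assumes Latin-1 instead

print(right)  # café
print(wrong)  # cafÃ©
```

The plain letters survive because the two encodings agree on them, so you understand something but not everything.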

Why do we then have so many different kinds of file formats and encodings? Because we have many different needs: we want to save videos, text, images… Sure, it would be nice if the same format and encoding worked for everything, but that isn’t practical. This way we can have small files that are optimized for what we need. Being able to represent many different kinds of information in one format takes space, and a lot of it. The more complicated things get, the more there will be errors that are hard to notice and fix. And there is always something that nobody remembered to take into consideration.

Let's take examples from image formats:

JPEG, these files may also have an extension other than .jpeg, like .jpg, because older versions of Windows (and DOS before it) didn’t support extensions longer than three letters, so people made up a shorter version. What is interesting about JPEG is that its compression loses information. So you get nicely smaller images, but they may have compression artifacts. These look like strange faults in the image, such as blocky squares or smudges around sharp edges.

PNG (Portable Network Graphics), this format won’t lose information, but then again it won’t make images as small, even though compression is used. JPEG makes the image smaller by simplifying it visually, but PNG packs the data with the same kind of lossless compression that a zip file uses. PNG hardly cares that it is an image, unlike JPEG (and that’s why JPEG is better at making images, photos especially, smaller: it was created to do just that).
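The "won't lose information" part can be tried out with Python's built-in zlib module, which implements DEFLATE, the compression method PNG is built on. This sketch does not write a real PNG file; the `pixels` data is a made-up 100 x 100 image with three bytes (red, green, blue) per pixel.

```python
import zlib

# A made-up image: 100 x 100 pixels, 3 bytes per pixel, a simple gradient.
row = bytes(value for x in range(100) for value in (x * 2, 50, 200 - x * 2))
pixels = row * 100

packed = zlib.compress(pixels)       # pack the raw bytes, zip-style
restored = zlib.decompress(packed)   # unpack them again

print(len(pixels), "bytes before,", len(packed), "bytes after packing")
print("identical after unpacking:", restored == pixels)
```

Every byte comes back exactly as it went in. A lossy format like JPEG gives up that guarantee in exchange for smaller files.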

BMP, bitmap images, this format is known for being huge. It usually has no compression and stores the image data pixel by pixel. The image stays as it is, with no faults, but the size isn’t small at all.
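To get a feeling for "huge", the size of the pixel data in an uncompressed BMP can be estimated with a little arithmetic. The function name below is hypothetical, and the sketch ignores the small header at the start of the file. It assumes 24 bits per pixel by default and takes into account that BMP pads every row to a multiple of four bytes.

```python
def bmp_pixel_bytes(width, height, bits_per_pixel=24):
    """Estimate the bytes needed for the pixel data of an uncompressed BMP."""
    # Each row is padded so that its length is a multiple of 4 bytes.
    row_bytes = (width * bits_per_pixel + 31) // 32 * 4
    return row_bytes * height


print(bmp_pixel_bytes(1920, 1080))  # 6220800, roughly 6 MB for one screenful
```

The number depends only on the width and height, not on what the picture shows: a completely white image takes just as much room as a photo.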

Then we have more program-specific file formats like PSD (Photoshop) or XCF (GIMP). These two especially save much more than just the 2D image: there are layers and other information about how the image was put together. Mostly these files can only be opened with the specific program, and they are by no means standards like PNG and JPEG are.

There is one file format I’d like to present to you:

CSV (“comma-separated values”), if you have an Excel table, this is like its simple version. There are no visible columns, just commas telling us where the column separators would be. If you handle data, you will probably encounter this one. It is just plain text, but the text has specific marks that tell us different things. So be careful with commas inside the values themselves, as they will mess up your data unless the value is wrapped in quotes.
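The comma problem is easy to try out with Python's built-in csv module. The table contents here are made up for illustration. The module wraps a value in quotes when it contains a comma, while a naive split on commas cuts that value in two.

```python
import csv
import io

rows = [["name", "city"], ["Smith, John", "Oslo"]]

# Write the table as CSV text; the value with a comma gets quoted.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()
print(text)

# Reading with the csv module gives the original two columns back.
read_back = list(csv.reader(io.StringIO(text)))
print(read_back)

# Splitting the data line on every comma gives three pieces instead of two.
naive = text.splitlines()[1].split(",")
print(naive)
```

So the rule of the format is a bit more than "commas separate values", and a program reading CSV has to know the quoting rule as well.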

Why did I want to show you CSV? Because it is probably quite close to a file format you will one day create yourself. You don’t need your own suffix to make a file format; it is enough that you make up rules (like using commas to separate values) that your program knows. You can save your data in a .txt file, but the real format is inside the file. Maybe there are three “columns” separated by a star (*): the first “column” holds a person’s name, the second their address, and the third their phone number. Now you can save many people’s information in one file, in a good order that is still usable later. This is your own personal standardization that makes life easier later.
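A minimal sketch of such a home-made format could look like this in Python. The function names and the one-person-per-line rule are my assumptions; the three star-separated “columns” come from the description above. Since a star inside a value would break the format, the sketch refuses to save one.

```python
SEPARATOR = "*"  # the rule of this home-made format


def save_people(path, people):
    """Write (name, address, phone) records, one person per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in people:
            name, address, phone = record
            for field in record:
                # A separator or line break inside a value would break the format.
                if SEPARATOR in field or "\n" in field:
                    raise ValueError(f"field not allowed in this format: {field!r}")
            f.write(SEPARATOR.join(record) + "\n")


def load_people(path):
    """Read the records back as a list of (name, address, phone) tuples."""
    people = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:  # skip empty lines
                name, address, phone = line.split(SEPARATOR)
                people.append((name, address, phone))
    return people
```

Calling `save_people("people.txt", [("Ann", "Main Street 1", "555-0100")])` and then `load_people("people.txt")` should give the same record back. The file is an ordinary .txt file, and the format lives in these two functions.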

In conclusion: use the right file format. Formats can be really different and have very different features. Some suit your case better than others. And if there is no good ready-made one, make your own, especially if that just means saving data in some specific order and not creating your own suffix.

Remember that if you are doing anything big, it’s better not to reinvent the wheel! We already have a massive number of file formats, and with some we just seem to be incapable of agreeing on which one to use. That sucks big time, because then programs either have to cope with many formats (this takes programmers’ time, because all of it has to be coded into the programs) or we need to use many programs (this takes users’ time, as they have to figure out which program to use and how to use it, and these programs use memory for no reason…).

At least I would prefer having just a couple of different formats and just a couple of different programs. It feels a bit silly to have at least four different text editors or music/video players…
