JUNE 2008

Unicode Takes Over the Internet

In the last three years, Unicode has gone from a curiosity on the web to the dominant character set of web pages. In practice, this means that any character you might see printed in a magazine can also be displayed on a web page.

The significance of Unicode, and specifically of UTF-8, its most widely used encoding, is as great as the significance of the web itself. With UTF-8, you can create a document, such as a web page, and show it to anyone in the world. They’ll see the same characters you wrote, no matter what language you’re writing in or what kind of computer they’re using.
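To make that concrete, here is a small Python sketch of the idea. The sample sentence is only an illustration, not something taken from the graph or the blog post mentioned here. It mixes three scripts in one string, turns the string into UTF-8 bytes, and reads the bytes back.

```python
# A minimal sketch: one document, several languages, one encoding.
# The sample text is an illustrative assumption.
text = "Norsk: blåbærsyltetøy / 日本語: こんにちは / Русский: привет"

# Encode to the bytes that would be stored in a file or sent over the network.
data = text.encode("utf-8")

# Any computer that decodes those bytes as UTF-8 gets the same characters back.
roundtrip = data.decode("utf-8")

print(roundtrip == text)
print(len(text), "characters stored as", len(data), "bytes")
```

The byte count is larger than the character count because UTF-8 uses one byte for plain ASCII letters and two or more bytes for most other characters.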

In a blog post, a Google engineer shows a graph of how rapidly Unicode has caught on. Its share of web pages passed 5 percent around the end of 2005 and has doubled in each of the years since.

It used to be that people who wrote web pages in different languages, whether Norwegian, Japanese, or Russian, had to select a character encoding that specifically supported that language. The problem was that many computers around the world wouldn’t display those character encodings correctly. A third of the web pages out there are still done this way. But Unicode and UTF-8 avoid this problem. Documents no longer need special character encodings, and the use of those encodings is declining rapidly.
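The old problem can be sketched in a few lines of Python. KOI8-R (a Russian encoding) and Latin-1 (a Western European one) are used here only as representative examples of language-specific encodings. The sample words are assumptions chosen for illustration.

```python
# A minimal sketch of why language-specific encodings caused trouble.
russian = "привет"

# The author saves the page in an encoding made for Russian.
legacy_bytes = russian.encode("koi8_r")

# A computer that assumes a Western European encoding misreads the same bytes.
misread = legacy_bytes.decode("latin-1")
print(misread == russian)  # the reader sees the wrong characters

# The Russian encoding also has no room for Japanese text at all.
try:
    "こんにちは".encode("koi8_r")
    japanese_fits = True
except UnicodeEncodeError:
    japanese_fits = False
print(japanese_fits)

# With UTF-8, both languages fit in one document and survive the round trip.
mixed = russian + " こんにちは"
print(mixed.encode("utf-8").decode("utf-8") == mixed)
```

The same bytes mean different characters under different legacy encodings, so the reader’s computer has to guess correctly. With UTF-8 there is nothing to guess.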

Unicode is a boon for email messages for the same reasons. Email is a few years behind the web in technology, but it seems fair to guess that Unicode will be the standard for email within two years too.

The Unicode transition has been nearly effortless for most of us. The horizons of our personal communication are expanding dramatically, yet when we look at a web page, we don’t have to notice that anything is changing.

Fish Nation Information Station | Rick Aster’s World | Rick Aster