

Get ready for a fracturing of the Internet domain name space and the global Internet looking far less homogeneous and more like the ancient town of Babel. And a lot of it has to do with Internationalizing Domain Names in Applications, or IDNA.

The Internet and its user applications have long been dominated by the ASCII character set that is familiar to English speakers – or at least a combination of characters that are designed to be spoken and understood.

With ICANN set to implement internationalized domain names (officially Internationalizing Domain Names in Applications, or IDNA) in the root of the DNS in the near future, your favorite applications (email, browser, IM, etc.) are about to become multilingual. That could fracture the Internet domain name space and leave the global Internet looking far less homogeneous and more like the ancient town of Babel.

I first learned of internationalized domain names from a recent conversation with David Conrad, the General Manager of the Internet Assigned Numbers Authority (IANA) at the Internet Corporation for Assigned Names and Numbers (ICANN). As David explained, IDNA is a mechanism that lets domain names appear to contain non-ASCII characters.

This means that a business that operates in a geography (or wants to reach a community from a geography) that does not use ASCII characters would be able to represent itself on the Internet using its native character set. Without diving into the social and political issues that are far beyond my areas of expertise, this intuitively makes sense to me, but it has global significance for how we use domain names.

For example, if a local Chinese bakery wants to build a website, why should it have to come up with an ASCII character representation of its domain name, especially when most of its customers may speak Chinese and have a Chinese character keyboard? Users with a Chinese input device would type the proper characters into their browser and be directed to the bakery’s website.

For users without a Chinese input device who want to go to the same bakery, things get more complex. IDNA describes a way for users to enter Punycode into their applications: an ASCII string that encodes the non-ASCII characters of a name, as defined by RFC 3492. This mechanism is beneficial because it removes any concerns about backwards compatibility with existing Internet applications and services.

However, this means that the ASCII character user has to type a string that begins with “xn--” into a browser or email client if they want to reach the Chinese bakery (Wikipedia gives a good example of Punycode encoding: the Swiss domain bücher.ch becomes xn--bcher-kva.ch). Today, both IE and Firefox support Punycode, as do other browsers and some email servers.
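
To make the translation concrete, here is a minimal sketch of converting a domain name to and from its “xn--” form. It assumes Python's built-in "idna" codec, which follows the older IDNA 2003 rules; the helper names to_ascii and to_unicode are illustrative and not part of any standard.

```python
# Sketch: converting between a Unicode domain name and its ASCII ("xn--") form.
# Assumption: Python's built-in "idna" codec (IDNA 2003 rules) is acceptable here.
# The helper names below are hypothetical, chosen only for this illustration.


def to_ascii(domain: str) -> str:
    """Encode each label of a domain name into its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII-compatible domain name back into native characters."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    # The example from the text: the Swiss domain and its Punycode form.
    print(to_ascii("bücher.ch"))           # xn--bcher-kva.ch
    print(to_unicode("xn--bcher-kva.ch"))  # bücher.ch
    # Plain ASCII names pass through unchanged.
    print(to_ascii("example.com"))         # example.com
```

Only labels that contain non-ASCII characters pick up the “xn--” prefix, which is why existing ASCII names keep working as they are.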

And here is where the Internet starts to resemble Babel. If IDNA takes root (no pun intended) and every country starts to use domain names with non-ASCII characters, how will those of us with ASCII input devices find these domain names and their associated Punycodes to enter into our applications?

How will we know that the Chinese bakery even exists? Better yet, how will someone with a Chinese input device be able to reach a website or email address in a different non-ASCII domain (such as Cyrillic, for example) if they cannot enter a Punycode string with the “xn--” characters?

In the end, I think that international domains make intuitive and practical sense even if they will force a change in user behavior. IDNA is leading us toward a fractured domain name space based on native-language character sets, and while I am not a sociologist, this feels like the globally correct thing to do.

Anyone have a business plan for a Punycode Babel fish that I can fund?

Allan Leinwand is a venture partner with Panorama Capital and founder of Vyatta. He was also the CTO of Digital Island.

  1. If Distributed Hash Table (DHT) mechanisms continue to develop (beyond, say, the torrents they are currently used with), won’t they just replace the application (content) functionality that DNS provides today? By analogy, isn’t DNS as meaningless (in such a future) as area codes are today (with number portability)? The topic BEGS this question.

  2. This makes sense when you first think about it. But once you realize the global implications, it’s neither possible nor practical to implement.

    Consider this situation:
    -Currently, Google owns localized domain names to target certain viewership, like Google.uk [The UK, duh], Google.com [primarily the U.S.], Google.cn [China], etc.
    -These localized domains are all provided in the native language of the area
    -These localized domains all have content specific to the area

    So if we have a current and working solution now, why implement a complicated one that will cause problems all over?

    Would Google, Yahoo, or any other company have to go and register the translation of their name in every single language in order to maintain a stronghold on that copyright/trademark? That could mean a large sum of money being wasted on purely domain registration.

    Also, once the domains are non-ASCII’d, will the designer have to develop using different methods and standards? Right now the W3 Consortium does a very good job… but can that be maintained once the GLOBAL internet fractures into many LOCAL intranets?

    This is not worth the hassle, as it is problematic from the start. I sincerely hope this topic goes nowhere and is forgotten.

    –Kyle

  3. “Better yet, how will someone with a Chinese input device be able to reach a website or email address in a different non-ASCII domain (such as Cyrillic, for example) if they cannot enter a Punycode string with the ‘xn--’ characters?”

    Most non-Latin input methods use the standard ASCII keyboard. For instance, most Chinese IMEs work as some variation of this: you type the sound of the word on a standard ASCII keyboard, and the software then shows you the characters that match that sound.

    In other words, the above quote is not an issue at all. Everyone still types initially in ASCII and the IME will always allow you to switch back to that.

    The internet is already Babel. Does sina.com make any more sense to a non-Chinese speaker just because the domain name is easy to read in English? I don’t think so.

  4. J – You bring up a good point, but why does sina.com even need an ASCII domain name representation of their business? Why should the domain name not be in Chinese characters?

    It’s true that there are multilingual pages on the web today, though not yet to the point where you need a multilingual input device or Punycode to navigate. That day is coming.

  5. Kyle – I think you make some good arguments, but IDNA is being implemented. As you can see from this timeline, the lab tests were just this March and we appear to be moving toward implementation. Get ready to practice your “xn--” URLs and email addresses….

  6. great news for phishers… who’ll be first to register

    http://www.e“bay.com etc etc.

  7. Allan, as you noted briefly in your post, the primary motivation for the current IDNA design (the use of Punycode) is that no plan to support international domain names would have succeeded if it required changes to DNS servers all over the world.

    As far as typing xn--* in the browser is concerned, whether a website uses an international hostname or not will depend on the market it is catering to. For example, a website with Chinese URL will be concerned with people not knowing how to type Chinese. I expect that websites which have both Chinese and non-Chinese users will continue to have ASCII URLs and, with IDNA support in browsers, will have the option of an additional Chinese URL for people who only understand Chinese.

    As for phishing, the browsers have all adopted interesting and different strategies to combat the problem. How it works out in the long run remains to be seen.

    My views…

    -Vishu

  8. How many times do I actually use a URL to go to a website? For me personally, it would be for google.com, bloglines.com, etc. Most of the time I just google it. But yes, IDNA would be helpful, as browsers will have better support for URLs containing Unicode characters. Consider the two similar URLs below:

    http://ms.pnarula.com/200704/%e0%a4%ae%e0%a5%87%e0%a4%b0%e0%a5%80-%e0%a4%9a%e0%a4%be%e0%a4%af-%e0%a4%b2%e0%a5%88%e0%a4%ac/

    http://ms.pnarula.com/200704/मेरी-चाय-लैब/

    Both of them point to the same page.

    Pankaj

  9. Correction to previous comment:

    For example, a website with Chinese URL
    will be concerned with people not knowing
    how to type Chinese.

    For example, a website with only a Chinese hostname will probably not be concerned with people not knowing how to type Chinese.

  10. I’ve bought hundreds of these “IDNs.” If you guys could see the traffic increase in terms of percentages from October of 2006 compared to April 2007… it might just give someone a stroke.

    Let’s just say I’m sitting pretty right now.
