Every time one of my posts on this journal ends up somewhere on Reddit,
Whenever you launch your favorite browser and
type in e.g.
soylentnews.org, your web browser needs to look up the server
from which it can request the website. On the internet there are literally
a gazillion servers running 24/7, each with different websites on them.
Unfortunately, those servers don’t have actual names, like
soylentnews.org, but instead listen to something called an IP address. It’s sort of an
internet phone number. Sort of. Anyway.
In order for the browser to know a website’s IP address, it needs to look it
up first, using something called the Domain Name System, or DNS for short.
The DNS is basically the phone book of the internet and contains a huge
table in which every website’s domain is assigned to the IP address of its
server(s). The DNS basically translates a website’s domain, e.g.
soylentnews.org, into its IP address, e.g.
When this internet’s phone book (DNS) was initially created, it only allowed
for a limited set of ASCII characters to be used in host names (e.g.
soylentnews.org). With the growth of the internet and its reach
to non-English speaking countries, however, the need arose for international domain
names that could contain Unicode characters – like á, ț or even す.
In order to represent these characters throughout the ASCII-based services that form the internet, an ASCII representation of such Unicode domains was needed that could be used by systems like the DNS. This representation is called Punycode, an instance of the more general algorithm called Bootstring, as described in RFC 3492. Punycode “allows strings composed from a small set of ‘basic’ code points to uniquely represent any string of code points drawn from a larger set.”
So how does this look in practice? Assuming you type málaga.es, which is the
website of the Málaga municipality in Spain, then your browser will convert this to
xn--mlaga-xqa.es, which is a valid ASCII-character domain,
and it will continue to send the converted domain to the DNS in order to
retrieve its IP address and open the website.
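The conversion described above can be roughly reproduced with Python's standard library. This is only a sketch: the built-in `idna` codec implements the older IDNA 2003 rules, which may differ from what a given browser does, and the helper name `to_ascii_domain` is made up for this example.

```python
def to_ascii_domain(domain: str) -> str:
    """Convert a Unicode domain into its ASCII (Punycode) form.

    Hypothetical helper for illustration. The "idna" codec works label by
    label (the parts between the dots) and prefixes every label that
    contains non-ASCII characters with "xn--".
    """
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    # The example from the text: the accented domain of the Málaga municipality.
    print(to_ascii_domain("málaga.es"))  # xn--mlaga-xqa.es
    # Plain ASCII domains pass through unchanged.
    print(to_ascii_domain("soylentnews.org"))  # soylentnews.org
```

Only the label containing the "á" changes; the ".es" part stays as it is.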
Rule of thumb: Punycode prefixes international domain names with
xn--, meaning that whenever you see a domain starting with
xn-- you can be sure that it’s an international domain name that contains non-ASCII characters.
You could, however, manually type
xn--mlaga-xqa.es into your browser’s address
bar, in which case the browser would detect that you’ve provided an already
normalized Unicode domain and simply proceed with the DNS request and page load.
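To illustrate the rule of thumb and the reverse direction, here is a small Python sketch that checks for the xn-- prefix and decodes such a domain back into its Unicode form. The function names are hypothetical, and the standard library's `idna` codec (IDNA 2003) is an assumption on my part, not necessarily what browsers use internally.

```python
ACE_PREFIX = "xn--"  # prefix carried by Punycode-encoded labels


def is_punycode_label(label: str) -> bool:
    """Return True if a single label (the part between dots) is Punycode-encoded."""
    return label.lower().startswith(ACE_PREFIX)


def to_display_form(domain: str) -> str:
    """Decode an already-ASCII domain back to Unicode if any label uses xn--.

    Hypothetical helper: domains without an xn-- label are returned untouched.
    """
    if not any(is_punycode_label(label) for label in domain.split(".")):
        return domain
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_display_form("xn--mlaga-xqa.es"))  # málaga.es
    print(to_display_form("malaga.es"))         # malaga.es
```

Note that the check is done per label, since only the labels that contain non-ASCII characters get the prefix.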
Browsers have different default settings for displaying Unicode domain names.
Some browsers choose to display the actual Unicode representation, e.g.
málaga.es, while others will display the ASCII domain
instead. Some browsers offer an option for you to choose what to display. The
main reason for browsers to not display the Unicode representations and instead
go with the ASCII format is the fraud potential that Unicode domains bring.
Unicode domains allow for URL spoofing. For example, the letter “h” is virtually indistinguishable from the Unicode character “һ” (Cyrillic Shha, U+04BB). This makes it possible for fraudsters to register e.g. the domain mcbseycһelles.com, in an attempt to impersonate the actual Mauritius Commercial Bank in the Seychelles, and try to trick customers into logging in to the fake bank with their actual credentials.
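A crude way to spot this kind of trick is to check whether a single label mixes characters from different scripts. The sketch below does that by looking at the first word of each character's Unicode name; it is a heuristic for illustration only (it would also flag legitimate mixes such as kana plus kanji), and the function names are made up.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Collect the (approximate) scripts of all letters in one label.

    Unicode character names begin with the script, e.g.
    "LATIN SMALL LETTER H" or "CYRILLIC SMALL LETTER SHHA".
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(domain: str) -> bool:
    """Return True if any label of the domain mixes more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


if __name__ == "__main__":
    spoofed = "mcbseyc\u04bbelles.com"  # U+04BB is the Cyrillic Shha from the text
    print(scripts_in_label(spoofed.split(".")[0]))
    print(looks_mixed_script(spoofed))              # True
    print(looks_mixed_script("mcbseychelles.com"))  # False
```

Displaying the xn-- form, as some browsers do, sidesteps the problem entirely because the lookalike character can no longer hide among the Latin letters.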
Effectively this means that the Unicode representation of international domain
names is actually more dangerous, and that the Punycode representation
(xn--...) should be preferred for the sake of clarity.
PS: In the case of málaga.es it just so happens that the web server automatically redirects people to malaga.es, which is a separate domain that the municipality also owns, and that it uses as its primary domain. Obviously for non-Spanish speakers, the plain ASCII domain malaga.es is easier to read, remember and type – even though that might be irrelevant these days (see below).
“But why are you using such an impossible to remember Punycode domain, that only Japanese speakers can possibly type out?!”
I’m a software architect and engineer by trade and a hacker by heart; I like to provoke chaos in order to see how systems react. Even though Punycode is probably older than the average reader of this site, it turns out that many modern, widely used systems still cannot handle it properly.
For example, I cannot log into my Patreon account anymore, ever since I had support change my email address from a regular ASCII domain to this Unicode domain, because the login form doesn’t recognize my email address as a valid address.
Similarly it was impossible for a company that I purchased something from to create an order for a replacement part in their system with this email address added to my account. Their system would simply not allow for the order to be created.
These are just two of many comical situations that I keep running into, and that show flaws that sometimes puzzle even me, leading me to consider cases within the infrastructures and systems I’m building for my clients that I otherwise might not have considered.
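I don't know how these particular systems validate addresses, but one plausible way such a failure happens is an ASCII-only pattern applied to the raw input. The sketch below contrasts that with a check that first converts the domain part to its Punycode form. The pattern, the function names and the address `hello@マリウス.com` are all made up for illustration.

```python
import re

# A deliberately naive, ASCII-only pattern (an assumption about how such
# forms might be written, not taken from any real system).
ASCII_EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def naive_is_valid(email: str) -> bool:
    """Validate the address exactly as typed, ASCII characters only."""
    return ASCII_EMAIL.match(email) is not None


def idn_aware_is_valid(email: str) -> bool:
    """Convert the domain part to its xn-- form first, then validate."""
    local, sep, domain = email.rpartition("@")
    if not sep:
        return False
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Raised for empty, overlong or otherwise unencodable labels.
        return False
    return naive_is_valid(f"{local}@{ascii_domain}")


if __name__ == "__main__":
    address = "hello@マリウス.com"  # hypothetical address on a Unicode domain
    print(naive_is_valid(address))      # False
    print(idn_aware_is_valid(address))  # True
```

The second function only changes the domain part; the local part before the @ is still held to the same ASCII pattern.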
Okay but aren’t you making it impossible for people to find you?
Not at all. Domain names are dead. Unless you’re one of the conglomerates
running what is today’s internet, there’s no point in spending any effort in
finding a short and representative domain name. The majority of internet users
don’t type domain names anymore. They follow the URL that was linked in a post
on a social media app, or they use their browser’s search engine to look for a
specific topic, or they ask ChatGPT. People don’t even type conglomerates’
domain names anymore. Adding the
.com part has
already become too much of an effort, even with smartphone keyboards offering
.com buttons and the overall round-trip time being significantly
shorter than going through a search engine.
Especially with the content that I’m creating, there’s no real benefit in having a memorable domain name. The average Joe simply isn’t interested in this site, while people searching for awkwardly specific things will usually find it right away, regardless of the domain name. Whether they will actually click a URL that says マリウス.com or even xn--gckvb8fzb.com is a whole other story. However, judging by my privacy-friendly analytics tool, this site is not doing too badly.
Last but not least, I’m happy to partake in making the average internet user more conscious of the fact that there are different types of writing systems and that large parts of the world use e.g. logographic systems and not solely Latin script. Not only does the domain name I’m using stir up a bit of discussion, it also brings the opportunity for people to learn something new, especially with great explanatory comments like this one:
My Japanese is rusty, but it should display as: マリウス “ma” “ri” “u” “su”, which looks to be a foreign name (katakana), “Marius.” That matches the username on this user’s github email address, also at the same domain, which means their email address should be pronounced, “Marius at Marius dot com”
Well done, Marius. Well done.