Ask HN: Why do Greek or Russian characters get URL encoded?

1 point by greeklish 4 years ago · 3 comments


Why don't browsers URL-encode only the special characters, such as ?, &, and whitespace, when a URL is copied?

The resulting URLs would be much cleaner and easier to understand for native speakers.
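A minimal Python sketch of the policy the question proposes, next to the full percent-encoding that browsers actually put on the clipboard. The `SPECIAL` set and both function names are hypothetical. Note that the output of `minimal_encode` is closer to an IRI (RFC 3987) than to a strict RFC 3986 URI, because it keeps non-ASCII characters as they are.

```python
from urllib.parse import quote, unquote

# Hypothetical policy from the question: escape only characters that would
# change how the URL is parsed, plus "%" so that decoding stays unambiguous.
SPECIAL = {" ", "?", "&", "#", "%"}


def minimal_encode(segment: str) -> str:
    """Percent-encode only the characters in SPECIAL; leave the rest untouched."""
    return "".join(quote(ch, safe="") if ch in SPECIAL else ch for ch in segment)


def full_encode(segment: str) -> str:
    """Percent-encode every UTF-8 byte outside the RFC 3986 unreserved set."""
    return quote(segment, safe="")


if __name__ == "__main__":
    title = "Χρόνος & χώρος"  # illustrative Greek text, not from the thread
    print(minimal_encode(title))  # Greek letters stay readable
    print(full_encode(title))     # Greek letters become %XX escapes
    # Both forms decode back to the same text.
    assert unquote(minimal_encode(title)) == unquote(full_encode(title)) == title
```

The trade-off is that the minimal form is easier to read, but tools that strictly expect RFC 3986 input may reject it.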

ignoranceprior 4 years ago

I think that, technically, a URL/URI is only supposed to contain ASCII characters, and some programs that expect URL input will accept only ASCII. However, all modern browsers can convert from Unicode to Punycode (in the domain name) and to percent-encoding (in the path). So I don't really understand why browsers only let you easily copy the percent-encoded form.
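The two conversions mentioned here can be sketched with Python's standard library: the built-in `idna` codec produces the Punycode (`xn--`) form of the host, and `urllib.parse.quote` percent-encodes the path. `to_ascii_url` is a hypothetical helper. It assumes the input is not already percent-encoded and drops any user:password part, and Python's `idna` codec implements the older IDNA 2003 rules, which can differ from what browsers use.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_ascii_url(url: str) -> str:
    """Convert a Unicode URL to an ASCII-only one (hypothetical helper).

    Assumes the input is not already percent-encoded; userinfo is dropped.
    """
    parts = urlsplit(url)
    # Host: each non-ASCII label becomes an "xn--" Punycode label.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Path, query, fragment: non-ASCII text becomes %XX escapes of its UTF-8
    # bytes, while the delimiters that structure each part are left alone.
    path = quote(parts.path, safe="/")
    query = quote(parts.query, safe="=&")
    fragment = quote(parts.fragment, safe="")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    print(to_ascii_url("https://bücher.example/Χ"))
    # -> https://xn--bcher-kva.example/%CE%A7
```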

WizardOfLight 4 years ago

It’s to guard against malware domains and similar kinds of malfeasance. In short, non-ASCII characters can be used to disguise the real destination of a copied URL, such as fасеbook (this has Cyrillic characters in it that you can’t otherwise notice).
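A rough Python sketch of why such a name is suspicious: it mixes letters from two scripts, which becomes obvious once the label is converted to Punycode. The script check (the first word of each letter's Unicode name) is a simplified, hypothetical heuristic; browsers apply their own, more detailed IDN display policies, for example drawing on Unicode Technical Standard #39.

```python
import unicodedata


def letter_scripts(label: str) -> set[str]:
    """Approximate the scripts used by the letters in a label.

    Simplified heuristic: take the first word of each letter's Unicode name,
    e.g. "LATIN SMALL LETTER F" -> "LATIN", "CYRILLIC SMALL LETTER ES" -> "CYRILLIC".
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(letter_scripts(label)) > 1


if __name__ == "__main__":
    spoof = "f\u0430\u0441\u0435book"  # Cyrillic а, с, е in place of Latin a, c, e
    for label in ("facebook", spoof):
        # The Punycode form of the spoof starts with "xn--", exposing it.
        print(label, sorted(letter_scripts(label)), looks_mixed_script(label),
              label.encode("idna").decode("ascii"))
```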

  • ignoranceprior 4 years ago

    Try going to e.g. https://el.wikipedia.org/wiki/%CE%A7%CF%81%CE%BF%CE%BD%CE%BF...

    When visiting the page (at least on Firefox), the location bar displays the intended characters, but copying the URL gives you the percent-encoded form. If the reason for percent-encoding were just to make scams more obvious, shouldn't the form shown in the browser interface also be URL-encoded?

    By the way, that argument seems much stronger when applied to domain names than when applied to the part of the URL after the slash.
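A minimal Python sketch of the split this reply argues for: always show the host in its ASCII (Punycode) form, where spoofing matters most, but percent-decode the path so it reads like the location bar. `copy_form` and the `KEEP_ESCAPED` set are hypothetical, not what any browser actually does. Path segments whose escapes are not valid UTF-8 are left as they are, and characters that would change the URL's structure (such as an encoded `/` or `?`) or that the question wants kept encoded (whitespace) are re-escaped after decoding.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Characters that keep their escapes even in the readable path (hypothetical choice).
KEEP_ESCAPED = {"/", "?", "#", "%", " "}


def _readable_segment(segment: str) -> str:
    """Decode one path segment, re-escaping characters that would change its meaning."""
    try:
        text = unquote(segment, errors="strict")
    except UnicodeDecodeError:
        return segment  # not valid UTF-8: keep the original escapes
    return "".join(quote(ch, safe="") if ch in KEEP_ESCAPED else ch for ch in text)


def copy_form(url: str) -> str:
    """Hypothetical copy policy: ASCII (Punycode) host, human-readable path."""
    parts = urlsplit(url)
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    path = "/".join(_readable_segment(seg) for seg in parts.path.split("/"))
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(copy_form("https://bücher.example/wiki/%CE%A7%CF%81%CE%BF%CE%BD%CE%BF%20x"))
    # -> https://xn--bcher-kva.example/wiki/Χρονο%20x
```

Splitting on the literal `/` before decoding is what keeps an encoded `%2F` inside a segment from turning into a new path separator.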
