In search of the perfect URL

41 points by owksley 11 years ago · 29 comments


Xvideos uses the same technique to good effect as well. All of their video links follow the format:

www xvideos com/videoNNNNNNN/description_of_activity_which_can_be_easily_updated (NSFW)

I wanted to mention a mystery concerning Xvideos. Here's a business that is very much in-your-face (i.e. it is not a defense contractor or an organization that wants to be discreet), but its ownership is totally unknown.

I researched it. There are literally zero articles or information about who owns it. No interviews with the founders. Nothing. I haven't been able to even figure out what country it's based in.

Somewhere out there is a very rich person whose family and friends probably don't realize that he founded a major Internet business.

Yes, a major Internet business: they have several million videos (far more than competing "tube" sites), hundreds or thousands of fast servers, and an Alexa rank of 47, which is higher than imdb.com and only a couple of steps below microsoft.com.

But in this age of little privacy, they've managed to be super private.

icebraining 11 years ago

Their cover was actually blown recently (Aug. 15th), thanks to a lawsuit:
> Another infringement suit has been waged by the MetArt Network against a well-known online adult brand. (...) This time around the target is adult tube site XVideos.com and two related web properties (...) along with defendant owners Stephane and Malorie Pacaud of France
http://www.xbiz.com/news/197942
Of course, those names could just be covers.
eli 11 years ago

Or maybe some corporate conglomerate that doesn't want the association to harm the reputation of other holdings. I once worked for a company that also operated an adult brand, and they went out of their way to obscure their ownership.
camillomiller 11 years ago

Hey, you should try and ramble off-topic sometimes!

lwf 11 years ago

It also means you can trick people:

http://www.amazon.com/Intel-Quantum-Computing-Module/dp/B001...

nmjohn 11 years ago
Amazon's URLs are actually quite interesting:
```
    Original: 
        http://www.amazon.com/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871/

    Equivalent:
        http://www.amazon.com/dp/0262510871
        http://www.amazon.com/dp/0262510871/something-else
        http://www.amazon.com/something/dp/0262510871
        http://www.amazon.com/something/dp/0262510871/something-else
```
It appears that as long as 'dp/0262510871' is in the URL (with no other dp/# appearing before it; a second one after it is fine), it works.
- cpeterso 11 years ago
  
  Or simply http://amzn.com/0262510871
  - rawdisk 11 years ago
    
    HTTP/1.1 301 Moved Permanently
    This is a URL shortener that just redirects to the full URL with the same number. It's easier to type but otherwise accomplishes nothing: the server with the content still needs the full URL. All this shorter URL gets you is the full URL.
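nmjohn's observation can be reproduced with a small path matcher. This is a minimal sketch, not Amazon's actual routing: it assumes, hypothetically, that the server keys on the first `dp/<ID>` segment, that IDs are 10 digits or uppercase letters, and that every other path segment is ignored.

```python
import re
from typing import Optional

# Assumption: IDs are 10 characters of digits/uppercase letters (like 0262510871).
# The first "dp/<ID>" path segment wins; everything around it is ignored.
_DP_SEGMENT = re.compile(r"(?:^|/)dp/([0-9A-Z]{10})(?:/|$)")


def product_id(path: str) -> Optional[str]:
    """Return the ID from the first dp/<ID> segment of a URL path, or None."""
    match = _DP_SEGMENT.search(path)
    return match.group(1) if match else None


for path in (
    "/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871/",
    "/dp/0262510871/something-else",
    "/something/dp/0262510871/something-else",
):
    print(path, "->", product_id(path))
```

Because only the ID is consulted, the descriptive text before `dp/` can say anything at all, which is exactly what makes lwf's trick possible.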
ademarre 11 years ago

You can trick people, and possibly even search engines. I've wondered if blackhat SEOs could abuse such URLs to discredit content on competitors' sites.
I believe it can be a negative signal when sites stuff too many keywords in their URLs, especially if those keywords aren't relevant to the page's content. A server accepting arbitrary URLs is in a way blindly sanctioning loaded URLs.
Granted, Google's algorithms are surely very sophisticated in this regard, but fighting web spam is hard.
- MartijnHoutman 11 years ago
  
  Surely, a correct canonical URL will prevent this from happening.

X-Istence 11 years ago

Stack Overflow does this too:

http://stackoverflow.com/questions/32672492/python-3-5-start...

Is the same as:

http://stackoverflow.com/questions/32672492/

adventured 11 years ago

I find it interesting the author mentions making an effort to remove the numeric ID from the URL.

I love using numeric IDs in the URL, for one specific reason: perma-short-link.

http://qz.com/365810/whats-missing-from-this-13-year-old-gir...

Becomes:

http://qz.com/365810

Which then redirects to the proper full url. Total effort: almost nil.
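The perma-short-link behavior amounts to a lookup keyed only on the leading numeric ID, followed by a 301 to the current full URL. A minimal sketch, assuming a hypothetical in-memory `ARTICLES` table and a made-up slug (the real qz.com slug is truncated above and is not reproduced here):

```python
from typing import Optional, Tuple

# Hypothetical article table: numeric ID -> current slug.
ARTICLES = {365810: "example-article-slug"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Map a request path to (HTTP status, Location header or None).

    /<id> and /<id>/<any other text> redirect (301) to the canonical
    /<id>/<slug>/ URL; the canonical URL itself is served directly (200).
    """
    parts = [p for p in path.split("/") if p]
    if not parts or not parts[0].isdecimal():
        return 404, None
    article_id = int(parts[0])
    slug = ARTICLES.get(article_id)
    if slug is None:
        return 404, None
    canonical = f"/{article_id}/{slug}/"
    if path == canonical:
        return 200, None
    return 301, canonical


print(resolve("/365810"))  # (301, '/365810/example-article-slug/')
```

Since the slug never takes part in the lookup, editors can retitle the article freely; old links and short links keep working because the ID never changes.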

userbinator 11 years ago

Not only numeric but alphanumeric IDs; they also work as a nice shorthand in communication. I've seen plenty of people referring to e.g. "video jI3i9Lq4BcX on YouTube" on sites which would otherwise censor actual URLs.
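One common way to get such alphanumeric IDs is to encode the numeric primary key in a larger alphabet. A minimal base62 sketch; the alphabet and its ordering are assumptions, and this is not a claim about how YouTube generates its IDs:

```python
# Assumed alphabet: digits, then uppercase, then lowercase (62 symbols).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)


def encode(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode(s: str) -> int:
    """Invert encode(); raises ValueError on characters outside ALPHABET."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(365810))  # 1XAA
```

The six-digit ID from the qz.com example shrinks to four characters, and decoding them recovers the original key for the database lookup.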

franze 11 years ago

my battle-proven URL rules. important: rule 1 is more important than rules 2 to 6 added up, rule 2 is more important than rules 3 to 6 totaled, rule 3 is more important than rules 4 to 6 together, rule 4 is more important than rules 5 + 6, and rules 5 and 6 are a tradeoff (it's a short URL, not the shortest possible URL).

the targeted phrase is the term(s) you want to be found for (e.g. in Google search)

URL-Rule 1: unique (1 URL == 1 resource, 1 resource == 1 URL)

URL-Rule 2: permanent (they do not change, no dependencies to anything)

URL-Rule 3: manageable (measurable, 1 logic per site section, no complicated exceptions, no exceptions)

URL-Rule 4: easily scalable logic

URL-Rule 5: short

URL-Rule 6: with a variation of the targeted phrase

most common mistake: rule 6 (least important) invalidates rule 1 (most important)

i stand by these URL rules: every time you compromise on them, or change their priority relative to each other, you (your company/startup/business/website/webapp) will regret it in the long term.

regarding:

> This is the sort of solution that I really like. The SEO folks can fiddle with the URL until the cows come home, the engineers have the luxury of a straightforward rule, and the user never sees a broken link. Is this simple structure enough to keep everybody happy?

every redirect has a cost:

- server resources

- (web) performance, a.k.a. speed

- long-term project costs: redirects need to be maintained (they will not be) and documented (they are not)

- added complexity (redirect complexity adds up fast; for more info see https://news.ycombinator.com/item?id=8891553 )
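To make the "rule 6 invalidates rule 1" point concrete: a router that keys only on the ID and accepts any slug serves one resource under unlimited URLs, while a strict router serves it under exactly one. A hypothetical sketch (page ID 42 and its slug are made up):

```python
import re
from typing import Optional

# Hypothetical content store: page ID -> canonical slug.
PAGES = {42: "perfect-url"}
_ID_SLUG = re.compile(r"^/([0-9]+)/([^/]+)$")


def lenient_lookup(path: str) -> Optional[int]:
    """Serve the page for ANY slug as long as the ID exists (rule 1 broken)."""
    m = _ID_SLUG.match(path)
    if m and int(m.group(1)) in PAGES:
        return int(m.group(1))
    return None


def strict_lookup(path: str) -> Optional[int]:
    """Serve the page only at its single canonical URL (1 URL == 1 resource)."""
    m = _ID_SLUG.match(path)
    if m and PAGES.get(int(m.group(1))) == m.group(2):
        return int(m.group(1))
    return None


urls = ["/42/perfect-url", "/42/perfect-url-v2", "/42/anything-at-all"]
print([lenient_lookup(u) for u in urls])  # [42, 42, 42]
print([strict_lookup(u) for u in urls])   # [42, None, None]
```

What the strict router should do with the rejected variants (return a 404, or redirect and pay the costs listed above) is the trade-off the rest of this thread argues about.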

lwf 11 years ago

> every redirect has a cost:
If you are actually just keying your content lookup on the ID and don't redirect the user, what's the performance problem?
And use rel=canonical so search engines do the right thing.
- franze 11 years ago
  
  no
  simplified, google works like this:
  discovery (queue) -(quality check)-> crawling (optional) -(quality check)-> indexing
  google does not "follow" canonicals. whenever google discovers a canonical (during crawling), it pushes it back to the discovery queue -> it needs to crawl again -> it needs to figure out indexing
  canonical is an indexing directive
  so basically there are two quality checks before google can actually apply the indexing directive after it has discovered the canonical during crawling. also, you can never be sure when (if ever) it will fetch the canonical URL or choose to canonicalize it.
  for small sites this is not a big issue (google will see internal duplicate pages for an unknown amount of time, but at some point they will probably be canonicalized). for big sites with millions and millions of URLs this is a big issue. basically, your example is the worst case: URL rule 6 (least important) breaks rule 1. then why do it at all?
  additionally, you communicate different URLs to users (depending on how they came to your site), which is just bad UX.
  don't do it.
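For comparison, lwf's alternative (look the content up by ID, serve it under whatever slug was requested without redirecting, and point search engines at one URL with `rel=canonical`) could be sketched like this. The `/posts/<id>/<slug>` layout, `BASE` and the sample post are hypothetical:

```python
import html
from typing import Dict, Tuple

BASE = "https://example.com"  # hypothetical site root
# Hypothetical store: post ID -> (canonical slug, title).
POSTS: Dict[int, Tuple[str, str]] = {123: ("example-post", "Example post")}


def handle(path: str) -> Tuple[int, str]:
    """Return (status, HTML) for /posts/<id>/<any slug>, without redirecting."""
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2 or parts[0] != "posts" or not parts[1].isdecimal():
        return 404, ""
    post_id = int(parts[1])
    if post_id not in POSTS:
        return 404, ""
    slug, title = POSTS[post_id]
    canonical = f"{BASE}/posts/{post_id}/{slug}"
    # Every URL variant gets the same body, including the same canonical hint.
    body = (
        "<!doctype html><html><head>"
        f"<title>{html.escape(title)}</title>"
        f'<link rel="canonical" href="{html.escape(canonical)}">'
        f"</head><body><h1>{html.escape(title)}</h1></body></html>"
    )
    return 200, body
```

This avoids the redirect costs franze lists, but, per franze's reply, each variant is still a separate URL in the search engine's eyes until it gets around to honoring the canonical hint.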

ckluis 11 years ago

I like this solution.

Essentially, qz.com/122345/{anything-here} will redirect to the canonical URL, allowing for experimentation with article titles and URLs.

thephyber 11 years ago

I thought this was fairly common knowledge.

Looking up a row by its DB primary key (PKID) is faster than looking it up by a text slug, and the integer key uses much less storage space in the DB.

For SEO / URL permanence reasons, the PKID is always the authoritative key while the slug can be updated to represent the current content of the URL.
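A minimal sketch of that split using Python's built-in `sqlite3` module, with a hypothetical `articles` table: the integer primary key is the only lookup key, and the slug is an ordinary column that can be rewritten at any time. (In SQLite, an `INTEGER PRIMARY KEY` column is an alias for the rowid, so the lookup goes straight to the table's B-tree key.)

```python
import sqlite3
from typing import Optional, Tuple

conn = sqlite3.connect(":memory:")
# Hypothetical schema: id is the permanent key, slug is cosmetic.
conn.execute(
    "CREATE TABLE articles ("
    " id INTEGER PRIMARY KEY,"
    " slug TEXT NOT NULL,"
    " title TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO articles (id, slug, title) VALUES (?, ?, ?)",
    (1001, "old-headline", "Old headline"),
)


def lookup(article_id: int) -> Optional[Tuple[int, str, str]]:
    """Fetch an article by primary key; the slug plays no part in the lookup."""
    return conn.execute(
        "SELECT id, slug, title FROM articles WHERE id = ?", (article_id,)
    ).fetchone()


def retitle(article_id: int, slug: str, title: str) -> None:
    """Change the human-readable parts; the key (and so the URL's ID) stays put."""
    conn.execute(
        "UPDATE articles SET slug = ?, title = ? WHERE id = ?",
        (slug, title, article_id),
    )


retitle(1001, "new-headline", "New headline")
print(lookup(1001))  # (1001, 'new-headline', 'New headline')
```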

jjsewell-ff 11 years ago

When building content management systems, we've taken an approach similar to this to keep URLs constant when the names of articles, posts or objects get changed by a site admin. The first time I noticed this approach was on Trello.

Here's an example trello URL: https://trello.com/x/1234567/203-make-the-buttons-bigger

If you change the name of the card, the ID (203) stays the same, but the friendly part of the URL changes. When directing you to the card, the system ignores everything after the ID.
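Pulling the ID out of a `<number>-<friendly-text>` segment like `203-make-the-buttons-bigger` takes one pattern. A sketch assuming, as in the example URL, that the number comes first and is joined to the text by a hyphen; how Trello actually routes requests is not described here:

```python
import re
from typing import Optional

# The leading digits are the ID; anything after the first hyphen is ignored.
_SEGMENT = re.compile(r"^([0-9]+)(?:-.*)?$")


def card_id(segment: str) -> Optional[int]:
    """Return the numeric ID at the start of a URL segment, or None."""
    m = _SEGMENT.match(segment)
    return int(m.group(1)) if m else None


print(card_id("203-make-the-buttons-bigger"))  # 203
print(card_id("203-buttons-are-huge-now"))     # 203
```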

giancarlostoro 11 years ago

Interestingly enough, I think I tried the same thing when I saw a link from the same site. It is indeed a great workaround for the changing-URL dilemma.

ambirex 11 years ago

We have reversed it to be example.com/seo-go-nuts/%d/ to bring the text closer together.

Walkman 11 years ago

The problem with that is that if the user chops off the last bits (e.g. when pasting it somewhere it cannot fit), the ID is lost and you can't look it up. It happens more than you would think. It's important to have it early.
- eli 11 years ago
  
  We use this scheme as well. If only the last part with the ID is cut off and you keep the slug text unique, you can still redirect to the correct article.
  - Walkman 11 years ago
    
    Then you don't need the ID at all :) because you use the slug
    
    eli 11 years ago
    
    I want the slug to be able to change and I'd prefer not to have to keep track of every variation ever assigned to that piece of content.
    
    Walkman 11 years ago
    
    if you go with /id/slug, you can redirect anything that is not exactly the same as the current URL, so any older links still work: no matter what the slug was, you can redirect, because the ID doesn't change.
    
    eli 11 years ago
    
    right, same as /slug/id
    
    Walkman 11 years ago
    
    no :D /id/slug is safer
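The disagreement is easier to see side by side. A hypothetical sketch (the `ARTICLES` table, ID 1001 and its slug are made up) that resolves both layouts to an article ID; a caller would then redirect to the canonical URL whenever the requested path differs from it. With `/id/slug` the ID survives any cut to the tail, while with `/slug/id` a missing ID forces a fallback on the slug, which only works if slugs are unique and, unless old slugs are tracked, only for the current one.

```python
from typing import Dict, Optional

# Hypothetical data: ID -> current slug; slugs kept unique so they can be reversed.
ARTICLES: Dict[int, str] = {1001: "seo-go-nuts"}
SLUG_TO_ID = {slug: article_id for article_id, slug in ARTICLES.items()}


def _known_id(segment: str) -> Optional[int]:
    """Interpret a path segment as an article ID, if it is one we know."""
    if segment.isdecimal() and int(segment) in ARTICLES:
        return int(segment)
    return None


def resolve_id_first(path: str) -> Optional[int]:
    """/id/slug: only the first segment matters, so truncating or changing
    the slug never breaks the link."""
    parts = [p for p in path.split("/") if p]
    return _known_id(parts[0]) if parts else None


def resolve_slug_first(path: str) -> Optional[int]:
    """/slug/id: use the trailing ID when present; if it was cut off, fall
    back to the slug, which only matches the current, complete slug."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return None
    article_id = _known_id(parts[-1])
    if article_id is not None:
        return article_id
    return SLUG_TO_ID.get(parts[0])


print(resolve_id_first("/1001/seo-go"))    # 1001 (slug cut short)
print(resolve_slug_first("/seo-go-nuts/"))  # 1001 (ID cut off, slug is current)
print(resolve_slug_first("/old-title/"))   # None (ID cut off, slug outdated)
```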
