In search of the perfect URL
Xvideos uses the same technique to good effect as well. All of their video links follow the format:
www xvideos com/videoNNNNNNN/description_of_activity_which_can_be_easily_updated (NSFW)
I wanted to mention a mystery concerning Xvideos. Here's a business that is very much in-your-face (i.e. it is not a defense contractor or an organization that wants to be discreet), but its ownership is totally unknown.
I researched it. There are literally zero articles or information about who owns it. No interviews with the founders. Nothing. I haven't been able to even figure out what country it's based in.
Somewhere out there is a very rich person whose family and friends probably don't realize that he founded a major Internet business.
Yes, a major Internet business: they have several million videos (far more than competing "tube" sites), hundreds or thousands of fast servers, and an Alexa rank of 47 which is higher than imdb.com and only a couple steps below microsoft.com.
But in this age of little privacy, they've managed to be super private.
It was actually blown recently (Aug. 15th) thanks to a lawsuit:
Another infringement suit has been waged by the MetArt Network against a well-known online adult brand. (...) This time around the target is adult tube site XVideos.com and two related web properties (...) along with defendant owners Stephane and Malorie Pacaud of France
http://www.xbiz.com/news/197942
Of course, those names could just be covers.
Or maybe some corporate conglomerate that doesn't want the association to harm the reputation of other holdings. I once worked for a company that also operated an adult brand, and they went out of their way to obscure their ownership.
Hey, you should try and ramble off-topic sometimes!
It also means you can trick people:
http://www.amazon.com/Intel-Quantum-Computing-Module/dp/B001...
Amazon's URLs are actually quite interesting -
It appears so long as 'dp/0262510871' is in the url (without dp/# appearing before it, but a second one after is fine) it works.
Original:
http://www.amazon.com/Structure-Interpretation-Computer-Programs-Engineering/dp/0262510871/
Equivalent:
http://www.amazon.com/dp/0262510871
http://www.amazon.com/dp/0262510871/something-else
http://www.amazon.com/something/dp/0262510871
http://www.amazon.com/something/dp/0262510871/something-else
Or simply http://amzn.com/0262510871
HTTP/1.1 301 Moved Permanently
This is a URL shortener that just redirects to the full URL that has the same number. Easier to type but otherwise accomplishes nothing. The server with the content still needs the full URL. All this shorter URL gets you is the full URL.
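The Amazon pattern described above (any path containing `dp/<id>` resolves to the same product) can be sketched roughly as follows. This is only an illustration, not Amazon's actual routing: the function name `extract_dp_id` is made up, and the assumption that the ID is exactly 10 alphanumeric characters is inferred from the example URLs.

```python
import re
from urllib.parse import urlparse

# Assumption: the product ID is the 10-character alphanumeric path segment
# that directly follows the first "dp" segment.
_DP_PATTERN = re.compile(r"(?:^|/)dp/([A-Za-z0-9]{10})(?:/|$)")

def extract_dp_id(url):
    """Return the ID after the first 'dp/' in the URL path, or None."""
    path = urlparse(url).path
    match = _DP_PATTERN.search(path)
    return match.group(1) if match else None
```

Under these assumptions, every "Equivalent" URL listed above yields the same ID, so the rest of the path is free text as far as the lookup is concerned.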
You can trick people, and possibly even search engines. I've wondered if blackhat SEOs could abuse such URLs to discredit content on competitors' sites.
I believe it can be a negative signal when sites stuff too many keywords in their URLs, especially if those keywords aren't relevant to the page's content. A server accepting arbitrary URLs is in a way blindly sanctioning loaded URLs.
Granted, Google's algorithms are surely very sophisticated in this regard, but fighting web spam is hard.
Surely, a correct canonical URL will prevent this from happening.
Stack Overflow does this too:
http://stackoverflow.com/questions/32672492/python-3-5-start...
Is the same as:
I find it interesting the author mentions making an effort to remove the numeric ID from the URL.
I love using numeric IDs in the URL, for one specific reason: perma-short-link.
http://qz.com/365810/whats-missing-from-this-13-year-old-gir...
Becomes:
Which then redirects to the proper full url. Total effort: almost nil.
Not only numeric but alphanumeric IDs; they also work as a nice shorthand in communication. I've seen plenty of people referring to e.g. "video jI3i9Lq4BcX on YouTube" on sites which would otherwise censor actual URLs.
my battle-proven URL rules. important: rule 1 is more important than rules 2 to 6 added up, rule 2 is more important than rules 3 to 6 totaled, rule 3 is more important than rules 4 to 6 together, rule 4 is more important than 5 + 6, rule 5 and rule 6 are a tradeoff (it's a short URL, not the shortest possible URL).
the targeted phrase is the term(s) you want to get found for (e.g. in google search)
URL-Rule 1: unique (1 URL == 1 resource, 1 resource == 1 URL)
URL-Rule 2: permanent (they do not change, no dependencies to anything)
URL-Rule 3: manageable (measurable, 1 logic per site section, no complicated exceptions, no exceptions)
URL-Rule 4: easily scalable logic
URL-Rule 5: short
URL-Rule 6: with a variation of the targeted phrase
most common mistake: rule 6 (least important) invalidates rule 1 (most important)
i stand by these url-rules. every time you compromise on them, or change the priority between them, you (your company/startup/business/website/webapp) will regret it in the long term.
about: >This is the sort of solution that I really like. The SEO folks can fiddle with the URL until the cows come home, the engineers have the luxury of a straightforward rule, and the user never sees a broken link. Is this simple structure enough to keep everybody happy?
NO
every redirect has a cost:
- server resources
- (web)performance a.k.a. speed
- long term project costs: redirects need to be maintained (they will not be) and documented (they are not)
- added complexity (redirect complexity adds up fast, for more info see https://news.ycombinator.com/item?id=8891553 )
> every redirect has a cost:
If you are actually just keying your content lookup on the ID and don't redirect the user, what's the performance problem?
And use rel=canonical so search engines do the right thing.
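A rough sketch of what this comment proposes, under stated assumptions: the in-memory `ARTICLES` store, the `render_page` and `canonical_url` helpers, and the `example.com` domain are all hypothetical. The content lookup is keyed on the ID alone, nothing is redirected, and the page carries a `rel=canonical` link pointing at the current slug.

```python
from html import escape

# Hypothetical content store keyed by numeric ID.
ARTICLES = {
    365810: {"slug": "current-title", "body": "Article text"},
}

BASE_URL = "https://example.com"  # placeholder domain

def canonical_url(article_id):
    """Build the one URL we want search engines to index."""
    return f"{BASE_URL}/{article_id}/{ARTICLES[article_id]['slug']}"

def render_page(path):
    """Serve /<id>/<anything> without redirecting; return (status, html)."""
    first_segment = path.strip("/").split("/")[0]
    if not (first_segment.isascii() and first_segment.isdigit()):
        return 404, ""
    article_id = int(first_segment)
    if article_id not in ARTICLES:
        return 404, ""
    href = escape(canonical_url(article_id), quote=True)
    body = escape(ARTICLES[article_id]["body"])
    html = (
        f'<html><head><link rel="canonical" href="{href}"></head>'
        f"<body>{body}</body></html>"
    )
    return 200, html
```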
no
simplified google works like this
discovery (queue) -(quality check)-> crawling(optional) -QC-> indexing
google does not "follow" canonicals, but whenever google discovers (during crawling) a canonical it pushes it back to the discovery queue -> needs to crawl again -> needs to figure out indexing
canonical is an indexing directive
so basically there are two quality checks before google can actually apply the indexing directive after it has discovered the canonical during crawling. also you can never be sure when, if ever, it will fetch the canonical URL or choose to canonicalize it.
for small sites this is not a big issue (you will have internal duplicate pages for google for an unknown amount of time, but at some point they will probably be canonicalized). for big sites with millions and millions of URLs this is a big issue. basically your example is the worst case: URL rule 6 (least important) breaks rule 1. then why do it at all
additionally you communicate different URLs to the users (based on the way they came to your site), which is just bad UX.
don't do it.
I like this solution.
Essentially qz.com/122345/{anything-here} will redirect to the canonical url allowing for experimentation on the title of articles and urls.
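For illustration, the behaviour described here could look roughly like this. It is a sketch, not qz.com's implementation: the `SLUGS` table and the `resolve` function are hypothetical, and the slug value is invented.

```python
# Hypothetical store: numeric ID -> current slug.
SLUGS = {122345: "current-headline"}

def resolve(path):
    """Return (status, location) for a request path like /<id>/<slug>."""
    segments = [s for s in path.split("/") if s]
    if not segments or not (segments[0].isascii() and segments[0].isdigit()):
        return 404, None
    article_id = int(segments[0])
    slug = SLUGS.get(article_id)
    if slug is None:
        return 404, None
    canonical = f"/{article_id}/{slug}"
    if path == canonical:
        return 200, canonical  # already canonical: serve the content
    return 301, canonical      # any other slug (or none) redirects
```

Changing the headline then only means updating the entry in `SLUGS`; every previously shared link keeps resolving because only the ID is used for the lookup.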
I thought this was fairly common knowledge.
Using a DB PKID is a faster lookup than a text slug and uses much less storage space in the DB.
For SEO / URL permanence reasons, the PKID is always the authoritative key while the slug can be updated to represent the current content of the URL.
When building content management systems, we've taken an approach similar to this to keep URLs constant when names of articles, posts, or objects might get changed by a site admin. The first time I noticed this approach was Trello.
Here's an example Trello URL: https://trello.com/x/1234567/203-make-the-buttons-bigger
If you change the name of the card, the ID (203) stays the same, but the friendly part of the URL changes. When directing you to the card, the system doesn't care past the ID.
Interestingly enough I think I tried the same thing when I saw a link from the same site. It is indeed a great workaround to the changing-URLs dilemma.
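As a sketch of how such a URL segment might be built and read back (not Trello's actual code; `slugify`, `friendly_segment` and `card_number_from_segment` are hypothetical names, and the slug rules are an assumption):

```python
import re

def slugify(name):
    """Lowercase the name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")

def friendly_segment(card_number, name):
    """Build the '<id>-<friendly-text>' segment from the current card name."""
    return f"{card_number}-{slugify(name)}"

def card_number_from_segment(segment):
    """Read only the leading ID; whatever follows it is ignored."""
    match = re.match(r"([0-9]+)", segment)
    return int(match.group(1)) if match else None
```

Renaming the card changes the output of `friendly_segment`, while `card_number_from_segment` returns the same number for the old and the new segment.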
We have reversed it to be example.com/seo-go-nuts/%d/ to bring the text closer together.
The problem with that is if the user chops off the last bits (e.g. pasting it somewhere where it cannot fit), the ID is lost and you can't look it up. It happens more than you would think. It's important to have it early.
We use this scheme as well. If only the last part with the ID is cut off and you keep the slug text unique, you can still redirect to the correct article.
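A minimal sketch of that fallback for a /slug/id scheme, assuming slugs are unique and only the current slug is stored. The data and the function name `resolve_slug_first` are hypothetical.

```python
# Hypothetical data: ID -> current slug; slugs are assumed to be unique.
ARTICLES_BY_ID = {42: "seo-go-nuts"}
IDS_BY_SLUG = {slug: article_id for article_id, slug in ARTICLES_BY_ID.items()}

def resolve_slug_first(path):
    """Return (status, location) for paths shaped like /<slug>/<id>/."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return 404, None
    # Preferred lookup: the numeric ID in the second segment.
    if len(segments) >= 2 and segments[1].isascii() and segments[1].isdigit():
        article_id = int(segments[1])
        if article_id in ARTICLES_BY_ID:
            canonical = f"/{ARTICLES_BY_ID[article_id]}/{article_id}/"
            return (200 if path == canonical else 301), canonical
    # Fallback: the ID was cut off or is unknown, so try the unique slug.
    article_id = IDS_BY_SLUG.get(segments[0])
    if article_id is None:
        return 404, None
    return 301, f"/{segments[0]}/{article_id}/"
```

The fallback only helps while the pasted slug is still the current one. A link with an outdated slug and a missing ID ends in a 404 unless older slugs are tracked as well.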
Then you don't need the ID at all :) because you use the slug
I want the slug to be able to change and I'd prefer not to have to keep track of every variation ever assigned to that piece of content.
if you go with /id/slug, you can redirect anything that is not exactly the same as the current, so any older links would still work because no matter what the slug was, you can redirect because the ID doesn't change.
right, same as /slug/id
no :D /id/slug is safer