LinkedIn, HiQ spat presents big questions for freedom, innovation
sfchronicle.com
As someone who once oversaw the operation of a web crawler, I can tell you it's pretty simple: if crawling is "okay," then the robots.txt file will tell you it's allowed. If you look at the LinkedIn robots.txt (https://linkedin.com/robots.txt), you will see it is carefully groomed to allow various search engines to look through specific sections of their website; the rest are disallowed.
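For anyone who hasn't written a crawler, here is a minimal sketch of the check a well-behaved one does before fetching a page, using Python's standard-library urllib.robotparser. The robots.txt text below is made up to mimic the pattern described above (named search engines get a few sections, everyone else gets nothing). It is NOT LinkedIn's actual file, and the host example.com, the paths /in/ and /company/, and the agent name HypotheticalScraperBot are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a named search-engine crawler may visit a couple of
# sections, and every other user agent is disallowed from the whole site.
# Illustrative only; this is not any real site's file.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /in/
Allow: /company/
Disallow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed robots.txt permits user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Googlebot", "HypotheticalScraperBot"):
        for url in ("https://www.example.com/in/some-profile",
                    "https://www.example.com/jobs/"):
            print(agent, url, may_crawl(rp, agent, url))
```

One design note: Python's parser applies the first rule that matches within an agent's group, while RFC 9309 (and Google) use the most specific match. Listing the Allow lines before the catch-all Disallow keeps both interpretations in agreement for this file.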
Pretty much all of the case law comes down to this: there is a perfectly valid copyright on the 'collection' of a website regardless of ownership of particular pieces, and robots.txt is a well-known and well-understood mechanism for communicating 'authorization'.
There is "value" to LinkedIn in letting Google and other search engines crawl it: you get to see pages pointing at LinkedIn in your search results, so LinkedIn lets them crawl its pages.
At the end of the day this is exactly a question of value. Microsoft knows that the collection of information in LinkedIn is valuable for a number of uses. If you want to pay them some of that value to get access to it, fine; if not, then don't use it.
Here is one possible outcome: Microsoft will tell them what it will cost to use their info; HiQ will probably not be able to meet it because they've built their existing pricing structure around "free" access; and then, as they are going down the drain, Microsoft will buy their assets and technology, and LinkedIn will get a new service you can buy from them to help you find and retain people.
From what I've been told, if the data is factual, such as current employment information, then it doesn't fall under copyright.
Interpretation of that factual data would fall under copyright though.
Kind of, kind of not. It is true that the language of Copyright law calls out 'facts' as something that is not protected by copyright, but how you get the facts has a large bearing on whether or not you can reproduce them.
There is a lot of case law around this stuff, as you might imagine. I certainly haven't followed all of it, but my interest in information economics has led me to read fairly extensively about it. And I'm not a lawyer, and especially not a Copyright lawyer, so it is entirely possible that everything I have come to know is pure bollocks; consider yourself so warned :-).
Generally, in reading about these things, two questions come out: the 'facts' themselves and 'how you got access to them'. There are lots of cases where the "collection" of facts has been upheld as protected. For example, the "Machinists Handbook" is a collection of facts about machining, and the handbook is protected by copyright even though the specific dimensions of various thread pitches are just 'facts'. Perhaps more interesting have been cases involving national sports leagues against companies and fans who do things like "live tweet" a sports event. The leagues have argued successfully that by buying a ticket to the event you have agreed to the terms of that admission, which expressly prohibit you from reproducing those facts in any form. So while it may be a "fact" that Buster Posey just struck out, if you learned that fact by sitting in AT&T Park at a game, you can't legally "tweet" it without violating the agreement with Major League Baseball that you accepted when you bought the ticket.
It has similarly been held (look at the many Craigslist vs. a-bunch-of-people cases) that automating access to a website through scraping is a form of access you have to be explicitly allowed. That allowance comes from the terms of service of the website and is expressed by the robots.txt file (and the terms of service contracts available on the site).
What it boils down to is that the collection of facts in a website IS protected by Copyright. Further, in exchange for granting you access to the information, the Copyright owner CAN put restrictions on how you may further use the facts you discover there. If you wish to use the information in a way the Copyright owner objects to, you must get the 'facts' from some other source and not from the Copyright owner's collection.
And yes, getting it out of Google's cache of the pages does not count. See the Craigslist v. 3Taps (https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc.) dispute to get a feel for how the courts view things. The simplest interpretation I can make from those events is that Google's caching of pages counts as fair use (it makes results faster), but people taking a page from Google's cache commit either a CFAA or a Copyright violation, so it is disallowed.
It's interesting to think about how this differs from Craigslist going after scrapers. LinkedIn is objecting on the basis of the DMCA (copyright) and the Computer Fraud and Abuse Act (alleging unlawful access of its public website).
>Nate Cardozo, a senior staff attorney for the Electronic Frontier Foundation in San Francisco, said copyright law doesn’t apply to this case because information from LinkedIn profiles, like when someone worked at a particular company, are facts, not creative works like music or films.
I wonder if copyright applied to Craigslist posts or if the fact that a house is for rent is just a fact.
There's a fine line between limiting the free speech of these personal information aggregators and violating the privacy of people. Right now, there seems to be a lawless and limitless environment for the aggregators. They freely present people's physical address history, phone numbers, relatives, date of birth, etc. to the open web.
I hope some reasonable restraints will be put in place. Something like: aggregated address history can only be displayed on official government websites, with rate limits.
Seems identical to PadMapper/3Taps v. Craigslist, which ended up in a settlement of something like a million USD to Craigslist.
The CFAA is such a horribly written statute. It needs to be completely rethought.
On my phone and not sure how to format a quote. Can anyone explain how this, from their terms of service, affects copyright claims? It appears that they do not own the content on user profiles and shouldn't be able to sue based on it:
> You own all of the content, feedback, and personal information you provide to us, but you also grant us a non-exclusive license to it.
I had the very unfortunate pleasure of being neighbors with HiQ when we shared a coworking space in downtown SF. They were aggressively selling and marketing HR software that alerted employers when their employees updated their LinkedIn profiles, indicating an intent to quit.
I've got no problem with the business, and LinkedIn's suit is probably an overreach, but by God were the HiQ people annoying. I remember the CEO pretend-boxing with their salesmen to get them amped up (in our rather open, shared office). Quintessential white good ol' boys' club mentality.