Google is funding the creation of software that writes local news stories (techcrunch.com)

I ran across this article while researching a stock and as I read, I kept thinking, "This was not written by a person. This was written by software." [0]
I checked the attribution, and there is a person's name on it. Sure, any hack can write and publish, and this is probably just another example. But the odd style doesn't even strike me as 'writing the way I think' or as writing and publishing quickly without editing. For example, from the 2nd paragraph: "The corresponding low also paints a picture and suggests that the low is nothing but a 97.89% since 11/14/16." I can't gather any meaning from that statement, yet it has oddly specific details.
I am not glad to see this trend and not glad that Google is embarking on this path. I suppose it is inevitable, but unless there is expertise built into this AI that can extract meaning from data on my behalf and present it in a way more insightful and interesting than I could manage myself, it will become yet another source of chaff I'll have to filter.
Can we at least, please, flag AI-generated prose as such?
[0] https://www.nystocknews.com/2017/07/05/tesla-inc-tsla-showca...
That "author" published 87 such articles on July 7 2017[0], and a total of over 8,500 so far this year[1].
As far as I can tell: none of the authors are real people, the website is registered behind an anonymization service, there is no company registered with their name, the address of their office doesn't exist, and the phone number connects to a 'subscriber not in service' message...
If you look at their Google Ad ID, it was used in the past on the now-defunct "TheSportsTruth.com" -- which looks like it primarily existed to shuffle people to a supplement site. From there, there are a ton of links to other random affiliate schemes involving sports, 'internet marketing', etc. No sense outing anyone, but I believe I figured out who's behind a few dozen of these shitty sites. The NYStockNews site seems to make its money by referrals to some penny-stock scam sites.
It's crazy how much 'content' on the internet exists solely to get people to click on links to supplements & penny stock scams.
How did you find out that stuff about the Google ad ID? Especially associating it with past websites.
In the source of every page with Google ads, there's a "Publisher ID" which is a unique identifier for the account. In the case of NYStockNews.com, it shows up as:
> google_ad_client: "ca-pub-6009540024781990"
From there, there are specialty services that keep track of these over time; otherwise, you can just search for the trailing digits on Google.
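For the curious, the extraction step is easy to script. A minimal sketch in Python, assuming the requests package is installed (the URL is just the site under discussion):

    # Fetch a page and extract any AdSense publisher IDs from its source.
    import re
    import requests

    def find_publisher_ids(url):
        html = requests.get(url, timeout=10).text
        # AdSense publisher IDs are "ca-pub-" followed by 16 digits.
        return set(re.findall(r"ca-pub-\d{16}", html))

    print(find_publisher_ids("https://www.nystocknews.com/"))
    # e.g. {'ca-pub-6009540024781990'}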
When doing so for "6009540024781990", a few sites come up: GDPInsider.com, another bot-written stock site, and then a dead link with a Google cache:
https://webcache.googleusercontent.com/search?q=cache:Ccudn6...
Using various other tools, you can see the domain registration information over time, or identify which servers hosted it, or just find out who was linking to a domain earliest. Reddit is a great site for the latter. Oftentimes, when a 'marketer' sets up a site like these, they immediately run to social media to try to promote it. If you can find the first time it's linked publicly, you can often find out who posted it.
That last part is actually one of the ways they tied Ross Ulbricht to The Silk Road -- they found the first public mention of The Silk Road online (the post: https://www.shroomery.org/forums/showflat.php/Number/1386099...), which was written by 'Altoid' and directed users to a Wordpress page that had been set up a few days earlier. They then found a series of posts on BitcoinTalk by 'Altoid' looking for an IT pro in the Bitcoin community, with instructions to email Ross.Ulbricht@gmail.com if they were interested in a job... He was doing deeply illegal stuff and couldn't be bothered to mask his identity; imagine how easy it is to find rando affiliate marketers.
I work for Automated Insights, a company that makes a SaaS platform very similar to what's in the article. Here's an example "in the wild" of the content we produce - http://www.thenewstribune.com/news/business/article158774809...
Many of your criticisms are totally valid. Lots of the phrasing is awkward - even the lede is really bad ("Tesla, Inc. (TSLA) has been having a set of eventful trading activity"...wat). And it feels really deceptive to put a human byline on an automated article.
We're pretty open about the fact that our solution to this problem is not "magical" at all [1, 2] - it's good, old-fashioned automation. This approach allows our customers to QA their content heavily before pushing it to production, which eliminates many of the problems with awkward/incorrect phrasing that people who rely more heavily on machine learning tend to run into. And the news articles we publish always have a note at the end saying that they were generated by Automated Insights, and don't include a human byline.
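As a toy illustration of what template-driven generation looks like in general (this is a sketch, not our actual platform; the company names and numbers are invented):

    # Phrasing branches on the data; the finite set of templates can be
    # QA'd by a human before anything is published.
    def earnings_lede(company, ticker, eps, eps_expected):
        if eps > eps_expected:
            verb = "beat"
        elif eps < eps_expected:
            verb = "fell short of"
        else:
            verb = "matched"
        return (f"{company} ({ticker}) reported earnings of ${eps:.2f} per share, "
                f"which {verb} analyst expectations of ${eps_expected:.2f}.")

    print(earnings_lede("Example Corp", "EXMP", 1.32, 1.25))
    # Example Corp (EXMP) reported earnings of $1.32 per share, which beat
    # analyst expectations of $1.25.

Because every sentence the system can emit comes from a reviewable template, awkward or incorrect phrasing can be caught before publication rather than after.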
There is real value in this type of reporting - a recent study [3] found that the articles we produce for less well-known publicly-traded companies have increased the trading volume for those companies. The idea is that, yes, the content is fairly formulaic, but there's now reporting on companies that had very little coverage before we existed. There are similar arguments for the mass personalization work we've done for companies like Activision and Yahoo - having prose that describes raw data (even if it is formulaic to an extent) is often better than not having prose.
[1] https://automatedinsights.com/blog/the-state-of-artificial-i...
[2] https://automatedinsights.com/blog/creating-great-automated-...
[3] https://insights.ap.org/industry-trends/study-news-automatio...
I don't understand what value the prose provides over spending the same amount of effort producing clear, easy to read infographics.
Instead of producing awkward and difficult-to-read English sentences, why not use the same content generator to produce completely accurate and easier to read dynamic data visualizations?
If you do automated content well, it's not awkward and difficult to read ;)
As far as visuals vs prose, I see it as "both-and" rather than "either-or". And in addition to our journalism and personalization work, we also integrate with interactive visualization tools like Tableau.
Increased the trading volume? Seriously, you call that value? What are you smoking? That's called pump-and-dump and is illegal, my friend, and if it isn't actually being dumped, it's downright sleazy-car-salesman to me. Was that recent study also automated?
Increased trading volume generally just means better price discovery. Why do you think increased trading volume means it's "pump and dump?" When I trade SPY, I increase the trading volume in the underlying S&P 500 components -- am I pumping and dumping then?
Increased trading volume driven by bot-written blogspam produced by the company PR department with the express intent of pumping their share price definitely isn't "better price discovery"
It's a good idea to follow the links before accusing someone of illegal behavior.
Here's the link to the study again: [1]. This is specifically in reference to the reporting on quarterly earnings reports that we automate for the Associated Press. It's an objective summary of the financial performance of these companies that appears in news outlets across the country (for example, [2,3,4,5,6]). The companies being reported on have no influence over the content of the articles.
From the summary:
>These articles synthesize information from firms’ press releases, analyst reports and stock performance, and are widely disseminated by major news outlets within hours of publication...This study found a positive effect between the public dissemination of objective information and market efficiency.
[1] https://insights.ap.org/industry-trends/study-news-automatio...
[2] http://www.thenewstribune.com/news/business/article158779784...
[3] http://wtop.com/business-finance/2017/05/yum-beats-street-1q...
[4] http://www.foxbusiness.com/markets/2017/04/27/dominos-sales-...
[5] http://www.businessinsider.com/ap-fedex-beats-street-1q-fore...
[6] https://www.usnews.com/news/business/articles/2017-04-26/her...
Fair response and I withdraw and apologise for the implicit accusation against your company specifically. I'm sure you would agree less benign actors exist, which is why I'm reflexively sceptical of the idea that a link between more reporting and more trading volume is an indication of its merit.
I guess if I'd taken the time to read your original link we could have a more interesting discussion on whether pretty basic earnings information in a format more friendly and available to non-professional investors was adding noise to the market or providing a useful counterweight to the amount of free publicity that more prominent companies' earnings get. But I've probably already poisoned the well on this one.
> "The corresponding low also paints a picture and suggests that the low is nothing but a 97.89% since 11/14/16." I can't gather any meaning from that statement, yet it has oddly specific details.
Maybe it was a horribly sleep-deprived person on Wall Street at 2am who made a cut-and-paste error while half-asleep.
I've also noticed this on financial/stock news articles. I've seen a few of them use full 4-word names for corporations ("The Coca-Cola Company") dozens of times in one article, and multiple times per sentence.
That's where companies like Arria (https://www.arria.com/), which do natural language generation, step in, so you can't tell the difference between machine- and human-generated text. Well, that's the promise.
Yes, finance and sports and weather and other metrics reporting is already highly automated.
Automated Insights https://automatedinsights.com - writes all the routine corporate earnings stories for Associated Press so they can cover more companies and let journalists actually investigate and write more nuanced pieces. Some earnings reports start with the software writing and get augmented by human journalists.
These are indeed frequently generated automatically.
I'd guess that about a third to half of the "financial news" articles I see on finance.google.com are generated from some sort of template, with a script filling in the details. I'd love to see Google identify and remove these sites from their search results (and any sites that link to them), but I think they either don't think it's a high priority or they don't know how to solve the problem.
What will be creepy is if the auto-generated story algorithms get good enough that you can't tell what's written by a human and what isn't: there will no longer be a human filter between what some powerful institution wants a news article to say and what makes it into print. Most journalists have a sense of journalistic ethics or at least a reputation to defend; an algorithm has neither of those.
Bloomberg has these articles as well. If a service that costs $20K a year is promoting these (and not removing), it's a safe bet "free" Google Finance will be showing more rather than less of these in the future.
This is definitely machine generated.
http://www.npr.org/sections/money/2015/05/20/406484294/an-np...
That's a great example, because I much prefer the robot version of the Denny's article to the human-created version.
For me, neither was worth reading, and since the machine-generated version was shorter, it was less not-worth-reading than the longer one... but the longer one contained the logical possibility of being worth reading, whereas the machine-written one was as good as it could possibly be.
Or to put it another way: while it was not worth the human effort to write the story, it wasn't worth the CPU cycles to write the machine-generated version either. The story was not worth writing or publishing or reading at all, because nobody cares, including the author (which is why a machine can write it).
> The story was not worth writing or publishing or reading
Of course, these are the wrong metrics.
The correct metrics are: ad impressions vs. cost to generate content.
That's only for the publishing side. The reader's utility calculus matters too, and I agree with the other poster that both stories are garbage.
Ad impressions read by whom? Triggering which action?
In that vein and thinking about click-bots, I'd favor revenue over ad impressions.
They both had their advantages. The human one wasted words on being cutesy but had the Las Vegas analysis.
You call it analysis, I call it bullshit.
Doesn't every bit of research we have show how hopeless all this analysis is?
https://duckduckgo.com/?q=%22When+you+combine+the+technical+...
Seems to be at least some sort of copy and paste going on...
edit: This is so bizarre; one of the sites has a section with editor "bios", but they read like very poor oDesk/Fiverr profiles. Wouldn't be surprised if that's what they are...
This line from those "bios" caught my attention:
I’d affection to help you with your written work, altering and substance needs!
...because I mentally "autocorrected" the latter half to "and mind-altering substance needs"... Looks like they used a "thesauriser" on it. Not hard to see love->affection, editing->altering, and content->substance.
Of course, if you are under the influence of a mind-altering substance, you would probably not notice anything wrong with that page. ...and unfortunately, so would many people who aren't.
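For anyone curious how text like that gets produced: a "thesauriser" (an article spinner) is just blind word-for-word synonym substitution, with no sense disambiguation. A toy sketch in Python - the synonym table is invented for illustration:

    # Naive article "spinner": swap words for synonyms with no regard
    # for sense, producing exactly this kind of garbled prose.
    SYNONYMS = {
        "love": "affection",
        "writing": "written work",
        "editing": "altering",
        "content": "substance",
    }

    def spin(text):
        return " ".join(SYNONYMS.get(word, word) for word in text.split())

    print(spin("I'd love to help you with your writing editing and content needs!"))
    # I'd affection to help you with your written work altering and substance needs!

The point of spinning is to dodge duplicate-content detection, which is why the output only has to fool a machine, not a reader.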
Ha, and the "de-thesaurised" version of that phrase is indeed from this upwork profile!
https://www.upwork.com/o/profiles/users/_~012388b8f7c8ed8aa2...
The whole thing is full of rehashed phrases:
(link to google for "A deeper exploration of the setup is sure to yield a clear picture"):
https://www.google.com/search?q=%22A+deeper+exploration+of+t...
Craziness. Auto-generated soup to farm SEO?
I am curious though, will these systems have pen names? A simple name easy enough to recognize as machine written without the need for disclaimer? Could competition be more easily obtained between different companies based on which pen name attracts the most viewers?
The one concern I have is that someone has to give the system enough information to create a story; what prevents a fake-news machine?
I vote for them all to be called Writey McWriterson.
My vote is Botty McBotface :)
From the article (mistakes highlighted):
> Human news writers regularly point out that AIs tend to lack nuance and a _flare_ for language in the stories they churn out. That’s probably a _fare_ criticism [...]
Maybe they used speech-to-text transcription for this, given that the mistakes are homophones? It seems very unlikely that either a human typing this, or a computerized system would make these mistakes (if it learns word associations from a corpus).
PS: the article also claims to be human generated:
> This story was not generated by an AI, but to be fair, I haven’t had my coffee yet.
EDIT: Oops, I might have misunderstood which article you were referring to, since the reference was not placed next to "this".
> It seems very unlikely that either a human typing this, or a computerized system would make these mistakes (if it learns word associations from a corpus).
You underestimate people's ability to make language errors, including spelling ones. Every time I see somebody I suspect is a native English speaker using "it's" for "its", I grind my teeth. (Another instance is somebody using a phrase like "as a programmer, the data bus should be written..." to mean "I, as a programmer, think that..."; this phrasing makes me simply furious.) With those errors they make reading in my second language so much harder, and I can't even point their bad spelling or writing style out, because I'm seen as nitpicking or something.
and I can't even point their bad spelling or writing style out
There is a certain delicious irony that you managed to contrive such a perfect example of a dangling preposition in the very next sentence after your complaint about a dangling modifier.
Skitt's Law in effect once again!
I believe you overlooked the "out" word at the end of the sentence ("[...] I can't even point [it] out"). Or am I mistaken and you meant something else? How should I have written the sentence?
Remember that English is not my native language, and having an already established career, I don't have many opportunities to learn the grammar further. I'm bound to make errors and not even know about them, because there's nobody who would point them out.
It should be "I can't even point out their bad spelling or writing style". When the preposition gets separated from its object, it's referred to as "dangling". There's a famous (probably apocryphal) example where Winston Churchill humorously wrote, "This is a situation up with which I will not put." - the humour being, of course, that the arguably more grammatical phrasing sounds absurdly unidiomatic.
If you cc me on all your work correspondence, I'll be happy to point out any grammatical errors I find (for a fee, obviously).
OK, Wikipedia has a nice article about it. https://en.wikipedia.org/wiki/Preposition_stranding
My sentence was grammatically and semantically correct, but the sentence I was complaining about was semantically invalid, so it was a little too much of you to point that out (stranding here fully intended).
> If you cc me on all your work correspondence, I'll be happy to point out any grammatical errors I find (for a fee, obviously).
A "nice" offer, but I'll pass. First, I'm not in a position to copy my work correspondence to a random dude from the internets. Second, my work correspondence is mainly in my native language.
Your sentence was unidiomatic to this native English speaker - "point" and "out" always belong together in a construct like that.
My offer was a joke, but never mind.
> Your sentence was unidiomatic to this native English speaker
[emphasis mine]
Fair enough, but I obviously picked up this manner somewhere, and from what I remember, it was long before the internet got crowded with people from all over the world, so I think it was from some other native English speakers.
> My offer was a joke, but never mind.
Your joke went over my head so high that I recoiled from the superiority vibe I got from reading it as an actual offer. (I just wanted you to know that, now that it's clear what it was.)
I think we probably win the award for the most worthless exchange on HackerNews today. Well done us!
Not with that attitude!
FWIW, I absolutely believe that a human would make those kinds of typos while typing... I myself didn't realize "flair for criticism" was spelled like that (and the top hit searching on Google for that, without quotes, is actually a book title using the other spelling, though it may very well have been a purposeful pun...). It would be one thing if those weren't themselves "correctly spelled words" (and so a text editor might catch it), but both "flare" and "fare" could easily slip by unnoticed. I will often even make much more interesting typos, where the word just "sounds sort of like the other word but no one would ever confuse the two", as I tend to speak to myself in my head as I type (and as I read) and I swear all language in my brain is at some point represented as audio... I'm not coming up with any examples right now, but trust me that when they come up they are incredibly strange.
Only slightly tangential, have you come across the term eggcorn[1] before?
I always get a chuckle out of thinking about it.
1. In linguistics, an eggcorn is an idiosyncratic substitution of a word or phrase for a word or words that sound similar or identical in the speaker's dialect (sometimes called oronyms). The new phrase introduces a meaning that is different from the original but plausible in the same context, such as "old-timers' disease" for "Alzheimer's disease". - https://en.wikipedia.org/wiki/Eggcorn
"AP's robot journalists are writing their own stories now"
https://www.theverge.com/2015/1/29/7939067/ap-journalism-aut...
I would strongly prefer that robo-written news not exist, not appear in the results of any searches I make, and not appear in any feed that I read. It is pollution that makes real information and insight harder to find. Does anyone actually like this stuff?
I feel differently. I think there's a place for information that people would like to read as prose but the economics don't make sense to pay someone to write it. I feel like this is an advance that's increasing the wealth of the world because we're going to get more text while spending fewer man-hours on it.
And it's just going to get better over time. It's obvious now when something was written by a bot, but I doubt that's going to be true for much longer.
But, it should probably be labeled as such. Giving such text a human name as the author is indefensible.
> Giving such text a human name as the author is indefensible.
Why is it indefensible in your opinion? I think it is only fair that if I have a job of writing some output based on some inputs, and I define a process to do that that can be automated, the end product is still my labor, and I can sign it.
That's a tough question, and now I'm not so sure I can defend my opinion. It just comes from the fact that (today) the output will not be as good as a human writer. It's disingenuous to pass off a low quality product as a higher quality product, when it will take my time and attention reading it to differentiate them based on quality. Think of it in terms of branding and counterfeiting.
Later, after general AI, I will argue the opposite, that human-written articles should be labeled as such so I don't waste my time on them. ;)
This is actually creating an opening for "human-only" papers, which will be easier to distinguish from the noise. An independent non-profit governing body is a must, though, to issue certification.
Like craft beer or artisanal cheese?
I expect that a couple years from now, this viewpoint (and most of this thread!) will be seen as really offensive.
Color me extremely skeptical that current-generation AI can ever write decent-quality news articles, even on the "easiest" subjects (i.e. non-emotional topics that may be most amenable to computerization). Sure, an AI might be able to produce the type of contradictory and fundamentally meaningless strings-of-words that characterize Trumpian speech, but even that would lack a unified agenda behind it. Even Trump seems to appeal to some raw masculine #MAGA emotion pretty consistently, and I doubt an AI would be able to do even that. If an AI could produce decent news articles with consistently meaningful statements, I think it would have huge implications for linguistic theory. Currently, we have no way of representing abstract semantic meaning in a computer. For example, when I say the word "justice", you all know what I'm talking about because you all have embodied experiences of "justice" (or its opposite) in your personal lives (e.g. bullying). An AI simply never had access to this kind of embodied, experiential input. It only has access to patterns of strings that we humans produce. So why should we expect an AI to be able to produce the same output that we produce when it has access to less input? One might argue that news articles are not novels, and so require a lower threshold of understanding to produce. I don't think so. We tend to underestimate the embodied nature of even the most basic use of language [0].
[0] See The Embodied Mind by Varela, Thompson, and Rosch.
The irony is that this is how radio news got its start. Ronald Reagan became an actor after being a "sports announcer" -- what he really did was read the ticker tape of a game in progress ("smith at plate. first ball strike no swing. Second ball base hit") and create an exciting story to go with it: "And Smith steps up to the plate. He flexes his muscles, kicks the ground and takes his stance. He passes on the first ball... strike! Here's the next pitch... he swings... solid towards third base. Is it a foul? NO!! AND HE'S SAFE ON FIRST BASE!!!"
Really most "news" articles are only a couple of paragraphs long anyway and could be expanded or contracted on the spot to match the interest of the reader.
What would those article-writing robots use as their primary source of information?
If they write local news, will they use social media as their data source? Other sources?
Don't most reporters start out with obscure/niche stories so they can hone their writing styles on relatively "unimportant" or filler pieces? If machines do all of that work, how do reporters develop the experience to be able to write an organized, in-depth important story?
I don't think reporters exist in this scenario.
Ironically, the AdSense "valuable inventory" policy prohibits showing Google ads on automatically generated content [1]. I wonder if they will follow their own rules and refuse to show ads on content generated by this tool.
Adsense policy: "Examples of unacceptable pages include ... Automatically generated content without manual review or curation"
Article: "People will be involved in the curation and editing of the stories"
Random thoughts:
* Facts delivered with arbitrary fluff words are pointless even when written by a human - the fluff obfuscates the real purpose, which is the data.
* Companies pay humans to deliver articles in most cases, and the bias of the writer or of the institution that paid for it shines through. I cannot find a real difference between intentional angling by payment or by algorithm.
* When the day arises that computers can generate actually new, intelligent and thoughtful pieces, I for one will be very interested in reading them. Sadly, there would be millions of variations occurring at an astounding pace. We'd then need algorithms to filter the generated content for the things that are really noteworthy.
* News at its core is a sequence of facts, which raises the question of whether we really need the cruft around those facts, which can often lead to misinterpretation.

True, but think about this: Hunter S. Thompson started out as a sports writer. Sports is a big area where this is used currently. Gonzo journalism wouldn't have even been a thing if it was all robo-writing at the time.
I think it comes down to flavor/style. Even in food for instance, yes a robot can make a meal but a chef can make a dish (maybe later reproduced by robots but still). There will always be a need for style, which is really hard to automate.
Unless the writer with style is the one training the AI...
What guarantee is there that the published news isn't fake? This might start something like viral fake Facebook posts. We already have enough of those. Now we have an automated fake-news generator where you can post your own fake news for free.
This is what it will become one day. I hope they have something to stop it.
I am more afraid of the options that this tool would have. Like fine-tuning it to be 5% friendlier to candidate 1 than to candidate 2. Reporting news on XXX 10% less than news on YYY. Making people happy. Making people sad. Making people compliant. All this by adjusting a few options and changing the wording of articles.
But media companies could already do this with human writers if they wanted to. Maybe they do?
It is much easier to fine-tune a few variables than to fine-tune human writing. Thus with human writing it will be easy to determine which side it is siding with.
I don't really see what this has to do with fake news. Traditionally produced news stories can be fake, and algorithmically generated stories can be fake (when fed bad data).
The only solution to fake news is news organisations that people can trust. Historically, local news organisations have always been the most trusted, but their income has been absolutely decimated in the last decade or so. This feels like a desperate cost-cutting measure, not something that will help the overall problem.
This decreases the barrier to entry. Sometimes that's all it takes to encourage the wrong things. The overall system is failing because it cannot adapt to the new internet age. Hopefully something changes and we get reliable news.
Ability to automate creation of news = ability to automate creation of propaganda = ability to automate censorship of news.
Why would this even be on Google's radar? They make money either way.
There are two things I hope Google or other AI companies focus on.
1) Making board papers more readable. There are a bunch of trusts in the NHS with a stream of very complex board papers. Something to reduce unneeded complexity would save a lot of time and potentially money.
2) Converting all important documents to an Easy Read version. There are a bunch of writing styles for people with learning disabilities, low IQ, or low literacy. Easy Read is one. A company like Google focusing on this would be good because they'd improve the evidence base, bring a bit more standardisation, and improve access to information for many people.
At least in the near future, this has the potential to make facts-and-figures based news less biased (less influenced by author idiosyncrasies). Personally, I would rather news not be laden with personal flourishes that authors add either as filler or due to personal opinion.
I do imagine further into the future, the automated systems will be "improved" with tone and bias to better fit the tastes of the individual reader, to the detriment of us writ large.
Presentation of facts alone doesn't really preclude bias, though. Certainly all journalists carry bias - and styleguides and journalistic standards are intended to mitigate this - but any software intended to encapsulate some facts into some allotted space will also carry bias. Source/quote selection and omission will always lend some bias and news by nature has a finite space. Print news in particular imposes bias through strict space requirements, driven editorially. And then there's the weight of the sources quoted, defining a "side" or "angle" to a story and making sure there's balance in opposing voices, etc.
I think a lot of people see bias as overt when it can be quite negligible and minor. But then they also often conflate news commentary with news. It's a pretty blurred line.
That said, local news (politics, business, crime) tends to skew less toward prescribed narrative and more toward facts and points because it's often very dry.
Understood. Yet this would still be an improvement over news stories which try to read insight into something, where what they do is more or less projection and speculation without saying so.
"Amazon buys WF". vs "Jeff Bezos buys WF so you never have to talk to a cashier"
Or, "Physician runs over pedestrian" vs. "Physician accused of insurance fraud runs over pedestrian"
I'm confused. Your first example seems to be adding pointless false commentary, while the second is adding real information about the person.
Right, but that information is irrelevant to the cause, it's there to color opinion. Accused (not even proven) fraudster, therefore adds to possibility of fault thru negative association.
If it's the most notable bit of public info about the person, go for it. If they're cherrypicking out of dozens of factoids to make them sounds bad, then it's a problem. The same headline could be fair or unfair depending on how it came about.
Also choosing what to report or what not to report—even as consumers choosing what news to read—introduces bias into the system.
I would think that an automated system would have less ad-hoc bias.
It would report on whatever triggers newsworthiness [1] rather than whatever gains the attention of a reporter (which has fewer, if any, stated minimum or maximum requirements).
Concretely speaking it's the difference between human-curated search engines and automated search engines like Google.
[1] Since such a system would require thought and design, there would be less impulsive influence.
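To make the difference concrete: a designed trigger is an explicit, auditable rule applied uniformly to every company, unlike a reporter's ad-hoc attention. A sketch with invented thresholds:

    # Explicit newsworthiness rule: the criteria are written down and
    # applied the same way to every stock, however obscure.
    def is_newsworthy(price_change_pct, volume_ratio):
        # Flag a story on an unusual price move or unusual trading volume.
        return abs(price_change_pct) >= 5.0 or volume_ratio >= 3.0

    print(is_newsworthy(price_change_pct=6.2, volume_ratio=1.1))  # True
    print(is_newsworthy(price_change_pct=0.4, volume_ratio=1.0))  # False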
What about developing a counter-bot that detects and flags algorithmic content?
Edit: come to think about it, isn't it what Google should be rather doing?
Yeah. Google has spent years taking the position that extremely thin content generated at a massive scale involving no genuine research or insight is spam that should be penalised. Now it's writing software to do just this.
I think this may be a misconception about Google's stance on content generated at a massive scale. They dock you for text-spinner-type content: stuff with little variation, nonsense phrasing, and identical sentences pieced together in different articles from the same source. Content generated at massive scale that is actually useful and disseminated from multiple publishers (like local news sources) is what they don't punish.
Maybe they're doing it to train their spam story detection algorithms?
I recently found a YouTube channel with news videos that seem generated mostly programmatically, with a robot voice-over and a combined 44M+ views on the channel:
https://www.youtube.com/channel/UCzhc-N5YynO_shpHhzP2zuw/vid...
I wonder who's behind these and similar channels.
This submission is at least tangentially relevant: https://news.ycombinator.com/item?id=14673489.
Combining these presents an interesting opportunity to create "future news" (news that is technically fake until it isn't) thereby owning the news cycle by always being first.
About 15 years ago I had the good luck of covering a news story before it happened and got a truly tremendous amount of traffic as a result.
Can you tell us more about it?
Sure.
There was a series of anti-war protests at a mall around Albany NY in the run up to the second gulf war.
I had been covering the story on a regular basis, but it had gotten very little attention because the media had decided that anti-war protests were not newsworthy.
Well, finally at one of these protests they arrest a judge and that is newsworthy. People start looking for names and places and these match the story I'd written at the last protest (where there had not been any arrests.)
So you didn't write about a future event
They said they covered a news story before "it" happened.
If we want to take a considerate interpretation- they wrote about the "news" before the "real news" decided it was "news".
No, a similar set of events happened repeatedly. A group of protesters assembled at a mall, just the last time, the judge got arrested.
Prior to that the exact same judge would make some statement, as would other people. People searching for the story about the arrest would get a story about a previous protest without an arrest and find that everything, except the arrest itself, was pretty much the same.
I saw a very interesting documentary [0] on just that thesis.
Google is the problem. I thought they didn't want to be evil.
Somewhat ironic as Google has been fighting link spammers that use autogenerated content for years. Software like this is popular in that space: https://wordai.com
I think the disgust factor will go away in a few years (maybe less) when most content is written by machines with slanting that the models say you will enjoy. Or that will cause you to spend money. Or click ads.
You think you won't succumb to their influence now, but it'll happen and there will even be "journalists" who are machines that you like. The filter bubble will completely adapt to your every need to make you feel fantastic about reading their copy, humans won't be able to compete.
I am not sure. Do you read news for entertainment, or do you read it for truth?
The algorithm will adapt to your every need.
Why? This is horrible. Why not just publish the raw data reporters got?
A relevant and inspiring project with the statement:
"Only Robot Can Free Information"
https://medium.com/rosenbridge/only-robot-can-free-informati...
Focusing on building robots for readers instead of for news providers would be the future.
For the reporter friends I have, I'm not sure how I feel about that - if I were a reporter, I'd want software which enhances and improves my job experience and reporting ability, rather than flat out replacing it. (Not to criticize Google; I'm sure any company or startup could be doing the same.)
Ugh. The last thing the world needs is more formulaic news stories. We need to move past the idea that the web is a virtual newspaper.
News sites don't even use hyperlinks effectively, let alone audio/video/interaction. We should use AI to replace newspapers, not reporters.
I was just thinking that journalism needed less on the ground reporting and critical thought.
More mindless aggregation and repeating of existing data custom tailored to the views of the people reading it is really whats missing.
Having just finished a play-through of Deus Ex: Mankind Divided, this immediately makes me think of Eliza Kassan... it really is odd how many ideas in that game don't seem especially far-fetched these days.
The question is: How are those news items going to be named: Robo news?
Or maybe "fake news", until 'elevated' by Google curators?
Maybe Microsoft's AI bot experiment offers a cautionary tale.
It's not actually Google doing it; they're just funding it. Here's a better article: https://www.recode.net/2017/7/7/15937436/google-news-media-r...
Without machines acquiring true understanding of what is happening, this is going nowhere. I applaud their effort but it is misguided.
We still haven't solved concerns about computers controlling our news feed, much less writing our news...
IIRC there was a small-town paper in the early 1990s that wrote high school sports stories with a HyperCard stack.
Print journalism is pretty formulaic but one of the bigger challenges is finding and interviewing sources. Most news organizations have requirements on the minimum # per story and who those people should be. It's laborious.
Sports, on the other hand, can be presented as a narrative of pre-defined, linear events. Those and crime stories are probably the easiest forms to automate. Pro and college games typically warrant some quotes, but preps are so often written by a stringer who just details the game.
Good! I'm sure a bot is a lot better writer than the high school graduates my local paper employs.
So what happens when these bots are manipulated into writing fake news (in the same way that the far better funded Google search is still manipulated for SEO purposes)?
Google makes a lot of money
Then they are just as good/bad as bloggers today :)
Welcome to the Ministry of Truth
Very Relevant: https://www.youtube.com/watch?v=K2Ut5GqQ1f4
Google will one day be the arbiter of news. If something doesn't fit their worldview, whether it's true or not, it will be removed from the results.
I think now is the time to set up a different model and remove their monopoly. Internet freedoms are at stake here.
Do no evil? Yeah right.
Google doesn't have any good options here. They can either ignore the problem of fake news, or they can become the arbiters of what is or is not fake news. If the latter, the boundary between filtering out misinformation and actively manipulating public opinion is rather fuzzy.
As a news consuming public, our best option seems to be to not use Google as our primary news filter. Long term, we probably need an entirely different kind of news aggregator that isn't under the control of any single entity as you're suggesting, but I'm not sure what that would look like and how it would work.
Imagine if various organizations you cared about had a feed (RSS or something) of links to stories and/or important search terms.
Your reader polls those feeds and uses some weighted algorithm to produce a set of custom news, possibly by consulting a public index like Google (or even a news outlet directly, redirecting into it with the search terms of choice).
It's still an echo chamber, but if you've got fake news pushing fiends on the list of sources you trust you've already got problems.
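A minimal sketch of such a reader in Python, assuming the feedparser package is installed; the feed URLs and trust weights are invented for illustration:

    # Poll trusted feeds and rank entries by a per-source weight.
    # A real reader would score on more than source weight alone.
    import feedparser

    TRUSTED_FEEDS = {
        "https://example-local-paper.com/rss": 1.0,   # hypothetical
        "https://example-aggregator.org/rss": 0.4,    # hypothetical
    }

    def custom_news(limit=10):
        scored = []
        for url, weight in TRUSTED_FEEDS.items():
            for entry in feedparser.parse(url).entries:
                scored.append((weight, entry.title, entry.link))
        # Higher-weight sources float to the top of the reading list.
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:limit]

    for weight, title, link in custom_news():
        print(f"{weight:.1f}  {title}  {link}")

The key design point is that the weights live with the reader, not with any central aggregator.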
Also relevant: https://www.youtube.com/watch?v=GvtNyOzGogc
Sinclair Broadcast Group is doing basically this already, just in a low-tech way, requiring its local TV stations to promote its political agenda.
I disagree.
There has always been a definable difference between fact and fiction. Both have a place on the net, but where fiction masquerades as fact with the intent to deceive, we have a duty to use all the tools at our disposal to destroy that ruse and choose more factual sources on which to base our decisions.
I for one welcome our new news overlords.
This is a naive viewpoint I see pop up a lot. People pushing "fiction" think it's "fact", and there's no foolproof way to convince them otherwise or even perfectly distinguish the two. People will tell you that it's a fact that vaccines cause autism or Iraq has WMDs or America is a white supremacist country. In reality, "facts" are linguistic simplifications of reality that inherently omit information, and the distinction of "fact" from "fiction" is itself a simplistic way to attempt to describe the accuracy of a statement. On a personal note, the people most convinced that they are pushing facts are the ones I'm most skeptical of.
To insist that all points of view are valid, and that truth is somehow unknowable is a viewpoint very common in academia.
Using the scientific method, meta-studies and suchlike, our species has built an enormous corpus of knowledge; I'm happy to let that evolving consensus be the basis of our decisions.
For example, the fact that vaccines do more good than harm by an order of magnitude is no longer up for debate. There is a lunatic fringe who disagree; their unfounded fiction should not be shown to curious first-time parents on Google.
To insist that everyone can pick and choose reality and that we should all be wary of "facts" is to cause real harm.
You're bringing up the other extreme (postmodernism), which I wouldn't agree with either. I'm arguing against certainty. We are stumbling along trying to communicate information as accurately as we can in order to produce predicted outcomes we deem desirable. Hopefully the evidence on an issue becomes so overwhelming that we can broadly reach a reasonable consensus. However, that consensus is subjective and dynamic. There is no objective process that will perfectly delineate "fact" from "fiction" as our Google News overlords would have you believe.
What makes you believe that Google will destroy this deceptive content?
Even more worrisome would be that the more they go down this path, the more governments will come to them to censor "offensive" stuff.
Same with advertisers. How long until the recent "advertiser-friendly" policies, which have been implemented for Youtube and stop monetization for any Youtuber that might offend an advertiser in any way, will be implemented for news, too?
Picus News - The global leader in fair, unbiased, and impartial reporting.
We also do security, surveillance and operate 100.5% of the satellites around Earth. (Looks familiar yet?)
I read Apple news exclusively. It's pretty awesome and hopefully will soon support micropayments.
One day? That day is now.