Automating the detection of fake news assumes we humans know how to distinguish real from fake in the first place, so that we can train machine learning models on accurately labeled examples. That’s not a safe assumption, especially if we use the perceived credibility of the source to decide what’s real and what isn’t. Many fake news stories are repeated widely by so-called “credible” news outlets, and this is the fatal flaw in most approaches to fake news detection: these stories are widely believed not because they are particularly convincing or supported by hard evidence, but simply because they come from “trustworthy” sources. That trust is unfounded and unearned across the board.
Let’s first define what we mean by “fake news”. Fake news refers to stories where the core elements are completely made up. Fake news is fiction presented as non-fiction. It doesn’t matter who made it up, why they made it up, or how they made it up. It doesn’t matter whether we like the stories or not, or whether the goal of the deception is morally good or bad. It also doesn’t matter who in the supply chain of news production and distribution believes it’s true. What matters, in this context, is that it’s fake.
If we instead take the misguided approach of defining fake news as “news that doesn’t come from a credible source”, we’ll build models on data that rely far too heavily on the source rather than on the information itself and its supporting evidence – real evidence, not just “verification” from other “credible” sources or authorities. Repeating a fake story a thousand times doesn’t make it true.
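To make the circularity concrete, here is a minimal sketch in Python, using entirely hypothetical data, of what goes wrong when labels are derived from source credibility: a “model” that looks only at the source scores perfectly against those labels while telling us nothing about whether the stories are actually true.

```python
# Hypothetical toy data. "actually_true" is ground truth we normally
# would not have; it is included here only to expose the flaw.
articles = [
    {"source": "bignews.example", "actually_true": True},
    {"source": "bignews.example", "actually_true": False},  # fabricated story in a "credible" outlet
    {"source": "fringe.example",  "actually_true": True},   # accurate story from a "fringe" outlet
    {"source": "fringe.example",  "actually_true": False},
]

# The misguided labeling step: "real" is defined as "from a credible source".
CREDIBLE = {"bignews.example"}
for a in articles:
    a["label_real"] = a["source"] in CREDIBLE

# A "classifier" that only memorizes the source...
def predict(article):
    return article["source"] in CREDIBLE

# ...looks flawless against the source-derived labels,
label_accuracy = sum(predict(a) == a["label_real"] for a in articles) / len(articles)

# ...but is no better than a coin flip on the actual truth.
truth_accuracy = sum(predict(a) == a["actually_true"] for a in articles) / len(articles)
```

With these four hypothetical articles, `label_accuracy` comes out to 1.0 while `truth_accuracy` is 0.5: the model has learned our prejudice about sources, not anything about fakeness.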
Fake news finds its way into traditional media in a variety of ways, serving many different agendas. It can take the form of stories invented by ambitious or lazy reporters, satire presented as real, state-sponsored psychological operations, public relations stunts, or elaborate hoaxes perpetrated to show how easy it is to fool the media and large segments of the public.
Here are some examples of fake news from traditional outlets:
In 1980, Janet Cooke of The Washington Post wrote a completely fabricated story about an eight-year-old heroin addict, for which she won the Pulitzer Prize. From the Wikipedia article:
“In a September 28, 1980, article in the Post, titled ‘Jimmy’s World’, Cooke wrote a profile of the life of an eight-year-old heroin addict. … Marion Barry, then mayor of Washington, D.C… organized an all-out police search for the boy, which was unsuccessful and led to claims that the story was fraudulent. Barry, responding to public pressure, lied and claimed that Jimmy was known to the city and receiving treatment; Jimmy was announced dead shortly after. Although some within the Post doubted the story’s veracity, the paper defended it and assistant managing editor Bob Woodward submitted the story for the Pulitzer Prize. Cooke was awarded the Pulitzer Prize for Feature Writing on April 13, 1981.”
Leading up to the 1991 Gulf War, Pentagon officials likely made up a story that Iraq was massing troops on the Saudi border, preparing to invade. This claim was repeated by politicians and virtually every major news outlet as absolute, unquestioned fact, leading to Operation Desert Shield and ultimately the first Gulf War. Only one mass media reporter decided to check the claim. To this day, no evidence has been provided that such a threatening troop build-up had taken place.
During the same time frame, the public relations firm Hill and Knowlton worked with the Kuwaiti government to promote a fabricated story about Iraqi soldiers throwing babies out of incubators to die on the cold hospital floor. The “nurse” who tearfully conveyed the story to Congress turned out to be the daughter of the Kuwaiti ambassador to the US.
In the 1990s, Stephen Glass wrote dozens of fabricated stories published in the prestigious magazine The New Republic. He made up companies, names, events – anything he needed for a story. “Fact checkers” verified the “facts” by comparing Glass’s stories to his own notes. Unsurprisingly, they matched, so the stories were deemed true and fit for publication.
In 2003, The New York Times published an article outlining the “long trail of deception” left by Jayson Blair in their own newspaper. The article admits “He fabricated comments. He concocted scenes… He selected details from photographs to create the impression he had been somewhere or seen someone, when he had not.”
During the peak of the US conflict with ISIS, several sensational and frightening news stories were found to be completely bogus, many of which were quietly retracted. The fake articles were published in a wide range of major media outlets.
There are many more examples like these. I chose these examples because of their serious implications and because there is little or no dispute about their fakeness.
Identifying fake stories like these is just one step toward removing human bias from detection models. After all, it would be fair to ask why I believe the stories above are fake. It’s at least in part because they were acknowledged as fake by the very media outlets that have shown themselves unworthy of total trust – so even their retractions and “debunking” must be questioned.
But assuming they are fake – and they are – how many more stories like these are out there? Does the discovery of these examples prove that fake news in major outlets is rare and always eventually exposed? Or does the boldness of these lies hint that these may just be the proverbial tip of the iceberg? And how many stories from questionable news outlets have important true elements (mixed with unproven assertions) that are shockingly under-reported in traditional media?
Let’s seek to train our AI models properly and find out. We can start with training sets that include fake stories from “real” outlets and real stories from “fake” outlets. Ultimately, models need to be much more sophisticated – somehow learning to evaluate evidence based on the strength of the evidence itself.
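As a minimal sketch of the content-only direction, here is a deliberately tiny Naive Bayes classifier built from word counts, trained on hypothetical examples (not real stories or a real dataset). The point is structural: the training set mixes labels across outlets, and the source never appears as a feature – only the text does.

```python
import math
from collections import Counter

# Hypothetical labeled examples. The key property of this training set:
# it includes fabricated stories from "credible" outlets and accurate
# stories from disreputable ones, and the outlet name is never a feature.
train = [
    ("an eight year old heroin addict profiled in a major paper", "fake"),
    ("soldiers removed babies from incubators a nurse testified", "fake"),
    ("city council approves budget after public hearing", "real"),
    ("court records confirm the settlement was paid in march", "real"),
]

def fit(examples):
    """Count word occurrences per class."""
    counts = {"fake": Counter(), "real": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the class with the higher Laplace-smoothed log-likelihood."""
    vocab = len({w for c in counts.values() for w in c})
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = sum(
            math.log((c[w] + 1) / (total + vocab)) for w in text.split()
        )
    return max(scores, key=scores.get)

model = fit(train)
print(predict(model, "records confirm the budget was approved"))
```

This toy model only matches surface vocabulary; the sophistication the article calls for – evaluating the strength of the evidence itself – would require far richer features than word counts. But even this sketch shows what it means to make content, rather than source, the basis for the label.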
When we let these kinds of well-trained models loose on all media, what will we discover? We might be very surprised.