Hunting for Mexico’s mass graves with machine learning


As many as 300,000 victims of Mexico’s drug war could be hidden in fosas clandestinas.

Over the last decade, Mexican drug cartels have been fighting each other, along with corrupt police and military units, for control of the lucrative drug trade, plunging the country into chaos. Outsiders might think of Mexico as sunny and tequila-soaked, but beyond the beach resorts of Cancun and Mazatlan there hides a grimmer tale: rates of murder, rape, and kidnapping are reaching levels rarely seen outside hotspots in Africa, Asia, and South America.

So grim is the tale that when 43 college students went missing in Mexico’s southern state of Guerrero in 2014, investigators found 129 other bodies in 60 fosas clandestinas (mass graves) before stumbling on badly burned remains in a mass grave they think might (possibly, maybe) contain what’s left of the missing students. Mexico’s attorney general says the local mayor conspired with the town’s police force to abduct the students and turn them over to a local gang, who murdered them, burned the bodies, and dumped the charred corpses into a river.

And just a few weeks ago, in March 2017, the Mexican press reported that more than 250 human skulls had been found so far (excavations are not complete) in a mass grave in the state of Veracruz.

The situation is so bad that, after six decades of gains, the average life expectancy in Mexico has decreased, according to recent research.

Now one group of human rights researchers believes it has a high-tech way of discovering mass graves: machine learning. They hope that finding many more burial sites with clever software will expose the true cost of the conflict, help grieving family members find peace, and perhaps stem the tide of desaparecidos, the disappeared.

Murder and mayhem in Mexico

Jorge Ruiz Reyes is a grave hunter, a bespectacled, bearded human rights researcher at the Universidad Iberoamericana in Mexico City who radiates intensity. He speaks in rapid-fire Spanish. More than 30,000 people are missing in Mexico, he says, many dead and buried in mass graves. He wants to find those graves.

“The war on drugs has resulted in a high level of murders and disappearances, and other violations of human rights in Mexico,” he tells Ars. “The official numbers don’t tell the whole story.”

Official numbers are a conflicting hodge-podge of what many consider to be under-reported horrors. According to the attorney general’s office in Mexico, 662 bodies were found in 201 hidden graves between 2005 and 2015. Mexico’s armed forces counted 534 bodies in 246 graves between 2001 and 2015.

But, Ruiz Reyes says, the Mexican press frequently report on cases not included in the official numbers, including the 43 missing college students, and the mass grave found recently in Veracruz.

Officially more than 30,000 people are missing in Mexico. That tally is mostly from people who went missing last decade, Ruiz Reyes says, and only includes people whose families have filed a missing person report—which, given the high level of violence in the country, many are afraid to do. One Mexican NGO disputes the official tally, though, saying the number could be as high as 300,000.

“This is what’s happening in Mexico,” Ruiz Reyes says. “The number of disappeared is very high.”

It’s impossible to know how many of those missing people lie undiscovered in a mass grave, he says. That’s why, with the help of the Human Rights Data Analysis Group (HRDAG) in San Francisco, he’s using a machine learning model to predict where grieving family members are most likely to find the bodies of their loved ones.

Grave hunting, meet machine learning

Patrick Ball, director of research at HRDAG, working with Ruiz Reyes and other researchers at the Universidad Iberoamericana and Data Cívica in Mexico City, analysed data from Mexico’s 2,547 municipios (“counties”), and built a machine learning model that predicts where similar graves are likely to be found.

“If we know which counties have some discovered mass graves, we can predict which other counties have mass graves that are similar,” he tells Ars.

The model uses obvious predictor variables, Ball says, such as whether or not a drug lab has been busted in that county, or if the county borders the United States, or the ocean, but also includes less-obvious predictor variables such as the percentage of the county that is mountainous, the presence of highways, and the academic results of primary and secondary school students in the county.

Using records from 2013, Ball mixed together data from counties with observed graves, and from counties researchers were confident did not have graves, and then randomly divided the resulting set into two groups. He used one half of the data to train a random forest algorithm, he says, then used the resulting model to see if it could predict the second half of the data.

The random forest method uses random subsets of the training data to create a “forest” of decision “trees,” and then averages the predictions of the individual trees. The algorithm takes the predictor variables for each county and assigns the county a numerical score, the likelihood that it contains a mass grave similar to those previously seen, much as a spam filter scores an email.
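To make the idea concrete, here is a minimal sketch of a random forest in plain Python. It is not the researchers’ code: the three predictor variables, the synthetic counties, and every function name are invented for illustration, and each “tree” is cut down to a single split. The toy data is deliberately built so that counties with and without graves look very different.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_stump(rows, labels, features):
    """Pick the single (feature, threshold) split with the lowest weighted impurity."""
    best = None
    for f in features:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring observed values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                # Each leaf predicts its share of grave counties.
                best = (impurity, f, t, sum(left) / len(left), sum(right) / len(right))
    return best

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train one-split trees, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # draw with replacement
        features = rng.sample(range(n_features), k)
        stump = best_stump([rows[i] for i in sample], [labels[i] for i in sample], features)
        if stump is not None:  # None means the sampled features had a single value
            forest.append(stump[1:])
    if not forest:
        raise ValueError("no usable splits found in the training data")
    return forest

def grave_score(forest, row):
    """Average the trees' predictions into one score between 0 and 1."""
    votes = [p_left if row[f] <= t else p_right for f, t, p_left, p_right in forest]
    return sum(votes) / len(votes)

def make_county(has_grave, rng):
    """Synthetic county: [drug_lab_busted, pct_mountainous, highway_km] (all invented)."""
    if has_grave:
        return [1, rng.uniform(60, 90), rng.uniform(60, 100)]
    return [0, rng.uniform(0, 30), rng.uniform(0, 20)]

rng = random.Random(42)
labels = [1] * 30 + [0] * 30
rows = [make_county(y, rng) for y in labels]

# Shuffle, then train on one half and test on the half the model has never seen.
order = list(range(len(rows)))
rng.shuffle(order)
train, test = order[:30], order[30:]

forest = train_forest([rows[i] for i in train], [labels[i] for i in train])
predictions = [1 if grave_score(forest, rows[i]) >= 0.5 else 0 for i in test]
false_pos = sum(p == 1 and labels[i] == 0 for p, i in zip(predictions, test))
false_neg = sum(p == 0 and labels[i] == 1 for p, i in zip(predictions, test))
print("false positives:", false_pos, "false negatives:", false_neg)
```

Because the invented counties are so cleanly separated, this sketch should make no mistakes on its held-out half. That says nothing about the real Mexican data; it only shows the mechanics of training on one half and scoring the other.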

“How good a job does the model do at predicting the data that it’s never seen before?” Ball asks. “Our answer is it perfectly predicted the counties that have graves and those that didn’t. It had no false positives and no false negatives.”

Alarmed at such perfect results, Ball says he repeated the experiment 500 times, each time splitting the data randomly into testing and training data.

The machine learning model perfectly predicted the testing data every time, he says. “So the result is not an accident of how the testing and training data were split, but it still might be an artifact of how really really different the kinds of counties are.”
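The repeated-split check Ball describes can be sketched in a few lines. This is an illustration under stated assumptions, not his procedure: the data is synthetic, there is a single invented predictor, and a simple cutoff rule stands in for the random forest so the example stays short.

```python
import random

def fit_threshold(rows, labels):
    """Stand-in model: a cutoff halfway between the two class means of feature 0."""
    graves = [r[0] for r, y in zip(rows, labels) if y == 1]
    others = [r[0] for r, y in zip(rows, labels) if y == 0]
    if not graves or not others:
        raise ValueError("training half needs counties of both kinds")
    return (sum(graves) / len(graves) + sum(others) / len(others)) / 2

def predict(cutoff, row):
    """Predict 1 (grave) when the feature is above the cutoff, else 0."""
    return 1 if row[0] > cutoff else 0

def repeated_holdout(rows, labels, n_repeats=500, seed=0):
    """Re-split the data at random n_repeats times and total the test errors."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    half = len(rows) // 2
    false_pos = false_neg = 0
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a fresh random split on every repetition
        train, test = indices[:half], indices[half:]
        cutoff = fit_threshold([rows[i] for i in train], [labels[i] for i in train])
        for i in test:
            guess = predict(cutoff, rows[i])
            false_pos += guess == 1 and labels[i] == 0
            false_neg += guess == 0 and labels[i] == 1
    return false_pos, false_neg

# Invented data: one predictor (say, percent mountainous), widely separated by class.
rng = random.Random(7)
labels = [1] * 30 + [0] * 30
rows = [[rng.uniform(60, 90)] if y else [rng.uniform(0, 30)] for y in labels]

errors = repeated_holdout(rows, labels)
print("false positives, false negatives over 500 splits:", errors)
```

If errors stay at zero across every split, the result cannot be blamed on one lucky division of the data. As Ball notes, it can still reflect how different the two groups of counties are, which is exactly how this toy data was constructed.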

Ball repeated the experiment using 2014 data, and got the same results. “There’s a really big difference between the counties that have graves and the ones that don’t,” he says. “It’s easy for the model to disambiguate them.”

“This is,” he admits, “not a huge surprise to people who are looking at patterns of violence in Mexico.”

The real value of the machine learning model is not just confirming existing intuition among grave hunters, he says, but guiding the search strategy by predicting which counties are most likely to have hidden graves.

“It would be interesting to know which of those counties are most like the counties that have graves, ‘like’ in this complicated statistical sense,” Ball explains. “This would essentially provide you with a ranking of 2,200 counties in terms of the ones most like the ones where we’ve found mass graves.”
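Turning model scores into the kind of ranking Ball describes is the simple part. The sketch below assumes each county already has a score between 0 and 1 from a trained model; the county names, the scores, and the function name are all made up for illustration.

```python
def rank_unsearched(scores, known_grave_counties):
    """Order counties without a known grave from highest to lowest model score."""
    candidates = {
        name: score
        for name, score in scores.items()
        if name not in known_grave_counties
    }
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)

# Invented scores for five hypothetical counties.
scores = {
    "county_a": 0.12,
    "county_b": 0.91,
    "county_c": 0.47,
    "county_d": 0.78,
    "county_e": 0.99,
}
known = {"county_e"}  # a grave has already been found here, so it is left out

ranking = rank_unsearched(scores, known)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Searchers would start at the top of such a list, where the counties most resemble those with known graves.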

Some caveats

The machine learning model can’t predict graves that are not ‘like’ ones it has seen before. So far, most hidden graves in Mexico are barely hidden at all: a handful of corpses dumped in a shallow grave. The model is not capable of unearthing graves hidden by more sophisticated perpetrators, Ball points out, comparing the Mexico case to his time investigating war crimes in Serbia.

“The Serbian army after Kosovo moved the bodies,” he says. “They put the victims’ bodies on big trucks and moved them a long way from where the killing happened, and buried them in places that the public does not have access to, army bases and so forth.”

“This model is not going to find graves that are created in this sort of hypothetically sophisticated way,” he adds. “We’ve never seen graves that sophisticated in Mexico.”

“The outlook is not good”

Ruiz Reyes is eager to apply the machine learning model to 2016 and even early 2017 data. The murder rate in Mexico in the first quarter of 2017 is the worst it’s ever been, he explains, and there is no shortage of corpses needing to be found, or relatives to mourn the remains.

“The outlook is not good,” he says.

But he hopes that finding where the bodies are buried, and exposing the horrors committed in the name of the war on drugs, will help end the cycle of violence afflicting Mexico.

“Around the country, groups of family members are using whatever resources they have to find their missing loved ones,” he says. “The predictive strength of the machine learning model is very strong.”

Grave hunters—forensic anthropologists, law enforcement, even church groups—scour the countryside looking for clues. Machine learning gives them one more tool to help find the desaparecidos, and so let the families of the victims find peace.

“The model gives us a high probability of where a good place might be to start the search,” Ruiz Reyes says.


J.M. Porup Contributing Security Reporter

J.M. Porup is a Contributing Security Reporter who lives in Toronto, but bounces back and forth to work with the Berkman Klein Assembly at Harvard University. When he dies his epitaph will read simply, "Assume breach." Follow him on Twitter at @toholdaquill.
