Gonca Tokyol
Dec 3, 2020

Non-existent zeros: Mystery of Turkish Ministry’s coronavirus statistics

Turkey introduced a nationwide weekend curfew from 8 p.m. to 10 a.m. in an effort to curb the increasing number of COVID-19 patients across the country. (Photo by Osman Örsal)

“Not every case is a patient,” Turkish Health Minister Fahrettin Koca said on Sep. 30, revealing to the nation that the government had, in fact, been reporting merely the number of symptomatic COVID-19 patients since July 29.

A month after this initial revelation, for the first time in five months, Minister Koca once again announced the number of all COVID-19 cases in Turkey on Nov. 25.

The Health Ministry reported 193 deaths from COVID-19 in the past 24 hours, and 31,923 diagnoses were made as a result of 183,624 tests conducted.

A host of critics, including the country’s leading medical professional organization Turkish Medical Association (TTB), opposition parties, and local politicians, claim that the state’s numbers under-represent the reality of the pandemic. At the same time, Turkey’s total death toll stands at more than 14,000.

Data analyst Nick Brown, a resident of Spain’s Mallorca island, says that the Turkish Health Ministry’s data has been inconsistent since the start of the pandemic. The expert believes that Ankara’s data shows a pattern that is unlikely to arise naturally, unless the ministry can provide a credible explanation for it.

Brown’s analysis, which is methodologically consistent with established statistical principles, strongly supports the suspicion that Ankara has been downplaying the numbers of deaths and cases nationwide, according to Professor Cem Başlevent, an academic from the economics department of Bilgi University.

The Health Ministry refused to make comments about the claims, and ministry officials turned down questions about the subject on the grounds that Minister Koca is the only one who is authorized to comment on the coronavirus subject.

In previous statements, Minister Koca had said that all the necessary data validation had been conducted as official reports were made of the numbers and added that “everyone should be making an effort to carry out their own part and focus on the results.”

Turkey’s COVID-19 numbers never ending in zeros

Brown’s interest in Turkey’s state data began with a Twitter user’s recommendation, after the data analyst had mentioned validating countries’ COVID-19 numbers by checking whether the data is statistically sound.

The first thing Brown noticed was a technical inconsistency in the reporting of the share of patients with pneumonia: The separator between the ministry’s data alternated between commas and periods.

This is a common style error resulting from the fact that a period is the decimal separator in English, but Turkish people use a comma for that purpose. Confusion of the two is common in bilingual individuals, but data software on a computer would have corrected the error, Brown noted.

“The Health Ministry’s alternation between a comma and a period could be an indication that the official numbers come from a hand-written document, rather than a digital database,” he says.

Turkey closed schools for the remainder of the semester in an effort to curb the increasing number of COVID-19 patients across the country. (Photo by Osman Örsal)

Another point that piqued Brown’s curiosity was the fact that very few of the numbers released by the Health Ministry each day ended in zeros. Before the publication of this article, the last time the official number of cases ended in a zero was on June 5, and the number of tests conducted ended in a zero for the last time on Sep. 29. The number of patients who died hasn’t ended in a zero since Aug. 26 and the number of recovered patients since Nov. 13.

Between September and November, only four of the 364 daily figures released by the ministry in those four categories ended in a zero. Three of them were counts of recovered patients and one was the number of tests conducted.

After the publication of this article, which the Ministry refused to comment on, two of the four daily figures published on Dec. 1 ended in zeros.
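To put that count in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each of the 364 figures is independent and that every last digit from 0 to 9 is equally likely; neither assumption comes from Brown's own code, and the function name is hypothetical.

```python
from math import comb

def prob_at_most(k, n, p):
    """Binomial probability of k or fewer successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n_figures = 364      # daily figures across four categories, as reported above
observed_zeros = 4   # figures that ended in a zero, as reported above
p_zero = 0.1         # assumption: each last digit is equally likely

# Under the assumption, about one figure in ten should end in a zero.
expected_zeros = n_figures * p_zero
# Chance of seeing this few zeros (or fewer) if the assumption held.
tail_probability = prob_at_most(observed_zeros, n_figures, p_zero)

print(f"Expected figures ending in 0: {expected_zeros:.1f}")
print(f"Chance of {observed_zeros} or fewer: {tail_probability:.2e}")
```

Under these simplifying assumptions roughly 36 of the 364 figures would be expected to end in a zero, so four is far below what chance alone would suggest.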

Source: The Health Ministry

Arguing that the last digit of a number that counts tens of thousands of tests should be arbitrary, Brown assessed the randomness of the last digits in the Health Ministry’s data set using a chi-squared test informed by Benford’s Law. His analysis revealed that the Health Ministry’s numbers didn’t display a uniform distribution.

Benford’s Law, also called the “first-digit law,” says that in naturally occurring sets of numbers, lower leading digits appear more often than higher ones. The number “1” emerges as the first digit about 30.1 percent of the time in a set of naturally occurring data, while the number “9” appears as a first digit only about 4.6 percent of the time.
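The first-digit probabilities follow from the standard Benford formula, P(d) = log10(1 + 1/d). The short sketch below simply evaluates that formula for each digit; the function and variable names are illustrative only.

```python
from math import log10

def benford_first_digit(d):
    """Benford probability that the leading digit equals d (1 through 9)."""
    return log10(1 + 1 / d)

first_digit_probs = {d: benford_first_digit(d) for d in range(1, 10)}

for digit, prob in first_digit_probs.items():
    print(f"{digit}: {prob:.1%}")
```

The formula gives roughly 30.1 percent for a leading 1 and 4.6 percent for a leading 9, and the nine probabilities add up to 1.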

Upheld by naturally occurring phenomena like rainfall and populations, Benford’s Law is more often referred to in relation to first digits, but it also makes predictions about last digits and can be useful in detecting inconsistencies in many areas of daily life: While Australia employs Benford’s Law to validate customs declarations in anti-smuggling efforts, Ukraine implemented it as an election safety method to authenticate ballots.

Benford’s Law holds that the skew flattens with each successive digit: the second digit still leans somewhat toward lower values, while the third, fourth, and later digits are very nearly uniform. Although the probabilities of each numeral appearing in the third and later positions are close to one another, 0, 1, 2, and 3 remain slightly more likely to be found in these positions than 6, 7, 8, and 9.

The Health Ministry’s data, on the other hand, rarely shows numbers ending in 0 and often shows death tolls ending in 7, 8, and 9, when Benford’s Law predicts that last digits should be distributed almost evenly.

The chi-squared test that Brown used to validate Health Ministry data compares a dataset’s theoretical distribution to its empirical distribution, measuring their compatibility and, as a result, the randomness of the actual occurrence. A chi-squared value above 20 or a p-value below 0.01 indicates an unusual distribution, Brown said, adding that both these thresholds were crossed by Turkey’s official data.

While the number of deaths each day displayed the fewest anomalies, the fact that the inconsistencies appeared across four different categories indicates the data wasn’t naturally occurring, Brown noted.
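How quickly the skew fades can be checked with the generalized Benford formula, which sums log10(1 + 1/(10k + d)) over every possible run of preceding digits k. This is a minimal sketch for illustration; the helper name is hypothetical and the code is not taken from Brown's analysis.

```python
from math import log10

def benford_nth_digit(n, d):
    """Benford probability that the n-th digit (n >= 2) equals d (0 through 9)."""
    start = 10 ** (n - 2)  # smallest possible prefix made of n - 1 digits
    return sum(log10(1 + 1 / (10 * k + d)) for k in range(start, 10 * start))

second = [benford_nth_digit(2, d) for d in range(10)]
third = [benford_nth_digit(3, d) for d in range(10)]
fourth = [benford_nth_digit(4, d) for d in range(10)]

for name, probs in (("second", second), ("third", third), ("fourth", fourth)):
    print(f"{name} digit: P(0) = {probs[0]:.4f}, P(9) = {probs[9]:.4f}")
```

For the second digit a 0 is noticeably more likely than a 9, while by the fourth digit the two probabilities are almost indistinguishable from 10 percent each.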
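A last-digit chi-squared statistic of this kind can be sketched in a few lines. The version below assumes a uniform expected distribution across the ten digits, which may differ from the exact setup in Brown's code, and the two samples are invented for illustration; they are not Health Ministry figures.

```python
from collections import Counter

def last_digit_chi_squared(values):
    """Chi-squared statistic comparing last-digit counts to a uniform split."""
    counts = Counter(abs(v) % 10 for v in values)
    expected = len(values) / 10  # assumption: each digit equally likely
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))

# Hypothetical samples, not real data.
even_sample = list(range(1000, 1100))                      # every last digit 10 times
no_zero_sample = [v for v in even_sample if v % 10 != 0]   # same, with zeros removed

even_stat = last_digit_chi_squared(even_sample)
no_zero_stat = last_digit_chi_squared(no_zero_sample)

print(f"Evenly spread last digits: {even_stat:.1f}")
print(f"No figure ending in 0:     {no_zero_stat:.1f}")
```

The statistic grows with the size of the dataset, so the same absence of zeros that yields a modest value over 90 numbers would yield a much larger one over several hundred. Turning the statistic into a p-value needs the chi-squared distribution with nine degrees of freedom, which a statistics library such as SciPy (`scipy.stats.chisquare`) provides.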

Although Brown’s findings aren’t related to the conflict around Ankara veiling the number of asymptomatic patients, they speak to the overall validity of official data.

“I believe it’s important to differentiate between those two. The findings show that the number of cases, tests, deaths, and other statistics all have been inconsistent since April. That’s another problem,” Brown said.

People wearing protective masks on Istiklal Avenue, amid the spread of coronavirus. (Photo by Osman Örsal)

‘No issues revealed in Turkey’s daily case count, but ‘new case’ statistics reveal anomalies’

Meanwhile, another study titled “Truth or Dare? Detecting Systematic Manipulation of COVID-19 Statistics” by Duke University’s Fatih Serkant Adıgüzel, Oxford University’s Aslı Cansunar and Kadir Has University’s Gözde Çörekçioğlu İshakoğlu also revealed that Turkey’s COVID-19 data was “problematic.”

Dissecting COVID-19 data from nine countries including the United States, China and Russia, the scholars based their work on a 2012 study by Bernd Beber and Alexandra Scacco, where they developed a statistical method to determine whether election results were manipulated by human beings, or whether they were naturally occurring, Cansunar said.

In what is dubbed a “last-digit analysis,” the Turkish researchers focused their study on both the total number of cases and the number of new cases revealed by Ankara daily, Cansunar added.

“We didn’t detect any issues with Turkey’s total case counts, but there were anomalies in the numbers of new cases that were released daily. We noticed very few zeros in the last digits, which hints at Beber and Scacco’s finding that human beings avoid zeros ‘because they don’t feel random.’ It’s also predictable that anomalies should appear in the numbers of new cases, as this is the statistic that most news outlets focus on in Turkey, as well as a majority of the population.”

Cansunar said that their study was similar to Brown’s in terms of the tests run on the data, but that the timeframe and datasets they focused on were different. She added that last-digit analysis doesn’t point to what the anomalies in a dataset are, but rather to an abnormality in the creation of datasets.

Turkey introduced further restrictions on elderly citizens in November. (Photo by Osman Örsal)

‘The official numbers may have been set to not end in zeros to seem realistic’

Bilgi University’s Prof. Başlevent said that Brown’s analysis was comprehensive enough as it used all the data the Health Ministry has released since the start of the pandemic.

“He says that with this many large numbers, the last digits should be spread uniformly across 0 to 9, but that this wasn’t the case in the official dataset, adding that very few of the data points ended in zeros. The statistical test he conducted also points to an irregular distribution.”

Brown’s analysis is an interesting one as it applies the chi-squared test that statistics students are familiar with to a real-life situation, Başlevent said, adding that the irregular distribution of last digits in Health Ministry data points could be interpreted as an aversion to using zero in the last digit to make the statistics seem more realistic.

“When that effort was kept up for a long time, it created a different type of anomaly.”

‘An unusual last digit doesn’t always imply fraud’

Meanwhile, Nick Brown maintains that many different factors could explain a dataset’s divergence from randomness, adding that “an unusual last digit doesn’t mean we can always say there’s been fraud.”

“When you look at a receipt for groceries, you’ll notice that many numbers end in either a 0 or a 9. This is because using round numbers or prices ending in .99 is a sales strategy. However, in the Netherlands, the same receipt would include a bunch of 5s and 0s because they don’t use one-cent coins.”

“That being said, in light of a lack of a logical explanation for the anomalies, these could imply that a dataset was handmade, and not collected from a sample. In a dataset from multiple samples, it’s unlikely to have anomalies even if all the samples were handmade.”

“The daily COVID-19 numbers released by Turkey seem off. I don’t know why, there could be a host of reasons for it. I couldn’t tell you why this is happening, but I would love to hear an explanation if anybody has one.”

Brown obtained a bachelor’s in engineering from Cambridge University, but ended up as an executive in the informatics industry because he “wasn’t great at math.” After meeting British psychologist Richard Wiseman at a human resources conference, he took an interest in his field and went on to get a master’s in psychology. His psychology studies greatly increased his interest in statistics, and he has since produced a multitude of studies in statistical validation.

‘People tend to avoid 0 for the last digit of a number they made up’

While Prof. Başlevent said that Health Ministry data points rarely ending in 0 might be an indication that they were handmade, Brown says that his analysis didn’t indicate the fabrication of the datasets in question. However, he does point to an alternative explanation based on findings from a 2009 study titled “The value of the last digit: statistical fraud detection with digit analysis.”

“I’d like to restate that my analysis doesn’t indicate the datasets were created by one person. However, research shows that when you ask a person to make up a number, they avoid using 0 as the last digit. If I asked you to guess how many people were in a football stadium, and there were 27,714 people in there, and the club manager asked you to overestimate the number, you probably wouldn’t say 35,000. Instead, you’d say something like 35,406, but the possibility of the real number being 27,714 is the same as the possibility it’s 28,000 or 35,000.”

“Kids won’t do this if you ask them for a large number. They don’t know about last digits and they don’t have any prejudice about last digits. They might say 300, 500, or 3,000. They don’t yet know the manipulations that adults do.”

“This isn’t just about zeros either. People trying to come up with ‘random’ numbers are prejudiced toward all digits in one way or another. So I conducted chi-squared tests to see the distribution of all digits in the last position, but a lack of 0 is usually my first criterion to check.”

Turkey’s uphill battle against the coronavirus pandemic took another turn as the nation added weekday curfews and expanded weekend restrictions into lockdowns. (Photo by Osman Örsal)

“People who aren’t great at doing something often also lack the knowledge that they’re not good at it”

If we accept at this point that the inconsistencies in Health Ministry data are in fact a result of the datasets being handmade, then is the lack of 0s that Brown observed simply a rookie mistake?

Noting once again that his analysis doesn’t directly imply that the data is handmade, Brown says that it’s theoretically very possible that anyone who made the mistake of leaving out 0s also failed to foresee that someone would notice it.

“I’m going to adapt the popular psychological phenomenon of Dunning-Kruger here: People who aren’t great at doing something often also lack the knowledge that they’re not good at it. If you’re not good at collecting the correct data, you will also fail at gathering acceptable fake data. I would say that if anyone were theoretically tasked with making up numbers, they would need to be bad with numbers as well. Because if you’re good with numbers, you shouldn’t be in a position where you’re asked to make up data.”

“Here’s my code and the data; take a look and tell the world why my analysis is wrong.”

Brown noted that officials are welcome to question his qualifications as well, and that he would find that appropriate.

“I conducted this analysis as an independent researcher who has no involvement with the inner debate in Turkey. I have no arrogance as to my qualification for such an analysis, but here’s my code and the data. These numbers look off. I don’t know what may have caused that, because a lot of things could have, and I wouldn’t know which is the case. But here’s my code and the data; take a look and tell the world why my analysis is wrong.”

Turkey is looking forward to mass vaccination as the ultimate solution for the pandemic. Health Minister Fahrettin Koca announced on Tuesday that a Chinese vaccine would be delivered first to health care workers after Dec. 11 and to millions more the following months. (Photo by Osman Örsal)

Nick Brown’s analysis of official COVID-19 data from Turkey has been reported to the Health Ministry, which failed to respond to Brown’s results before this story was published. An official who wished to remain anonymous said that it’s likely the ministry will not respond, as Minister Koca has made all the necessary statements.

Sefer Aycan, a medical professor and member of the Parliamentary Health, Family and Social Work Committee, said that he believes the official data to be clean and credible. Aycan is also a deputy of the Nationalist Movement Party (MHP), a member of the ruling alliance.

“There’s not much to this debate. What does it matter if the numbers are a few digits higher or lower? Now is the time to fight in unity. We should talk about what else can be done, instead of worrying about the numbers,” Aycan said, adding that the suspicion around official numbers is confined to opposition factions of society.

“Distrust doesn’t serve anyone, including the opposition and the country. It’s geared to create chaos. What does it matter if the numbers were a few more or few less?”

*This piece was originally published in Turkish.
**Special thanks to Osman Örsal for the photos he provided for the English version of the article.
***And special thanks to others who shared their knowledge with me while I was reporting this piece, and to Azra Ceylan for the translation.

Gonca Tokyol

Freelance journalist, former senior editor and reporter at T24. Covered a wide range of issues - from terrorist attacks to protests, elections, refugee crisis.