How to Verify Your Data and Data Sources

COVID-19 presented the world with conflicting tidal waves of information that were so difficult to parse at first glance that several pieces of misinformation made it into mainstream news coverage. Reports about using Vitamin C to prevent what was then dubbed the “Wuhan coronavirus” by the Chinese government made it to the news in China, and some of that misinformation traveled overseas even more quickly than the virus.

This information can be maddening to wade through. Early on, even finding sources that don’t repeat some piece of misinformation can be challenging, since the truth takes longer to circulate than panic does. How can journalists sort fact from fiction in the early days of covering a story, any story?

REPUTABLE SOURCES WILL BE TRANSPARENT ABOUT HOW THEY OBTAIN AND COMPILE THEIR DATA

They will also be transparent about the processes, algorithms, and technology they use to collect and maintain that data. Where they get the data matters too, because a reliable database will most likely rely on primary or official sources. Websites like FiveThirtyEight also grade the reliability of their data sources.

THE METHODOLOGY AND APPLICATION OF THE DATASETS WILL BE CLEAR

Again, to use FiveThirtyEight as a reference, the site has published a very detailed redistricting tracker for the United States House of Representatives. To help readers figure out how to use it, the site includes a companion metadata piece explaining how and why it collects this data and presents it to the public. It acknowledges biases and automated steps within the system, as well as the room for, and likelihood of, error.

CONTEXT

Correlation does not always equal causation, so the context of the applied data is essential. For example, a 2017 study compared two groups of women, those who had owned horses for five years or longer and those who did not own horses, to find out which group lived longer. The researchers concluded that women with horses lived longer. However, they later acknowledged that women with horses were more likely to get outdoor exercise, lead a more active lifestyle, and have greater wealth, and therefore better access to medical and health care. The conclusion that horses make people live longer is therefore flawed; upon further examination, wealth and amount of exercise appear to be the greater underlying factors.
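
For readers comfortable with a little code, the idea of checking for an underlying factor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records below are entirely made up (they are not figures from the study), and the field layout and the lifespan_gap helper are hypothetical names chosen for this example. The sketch compares the two groups overall, then compares them again within each activity level.

```python
from statistics import mean

# Entirely made-up records for illustration, NOT data from the study:
# (owns_horse, active_lifestyle, age_at_death)
records = [
    (True, True, 86), (True, True, 86), (True, True, 86), (True, False, 78),
    (False, True, 86), (False, False, 78), (False, False, 78), (False, False, 78),
]

def lifespan_gap(rows):
    """Mean age at death of horse owners minus that of non-owners."""
    owners = [age for owns, _, age in rows if owns]
    others = [age for owns, _, age in rows if not owns]
    return mean(owners) - mean(others)

# Naive comparison: horse owners versus everyone else.
overall_gap = lifespan_gap(records)

# Same comparison, but only between people with the same activity level.
gap_by_activity = {
    active: lifespan_gap([row for row in records if row[1] == active])
    for active in (True, False)
}

print("Overall gap in years:", overall_gap)
print("Gap within each activity level:", gap_by_activity)
```

In this invented dataset the overall gap is four years, yet it drops to zero once active people are compared only with active people and inactive with inactive. Real data will rarely split this cleanly, but the same question applies: does the difference survive once the groups are made comparable?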

CROSS REFERENCE

A reliable source will almost always have its data matched by another reliable source. Finding a second source that meets all of the above criteria and cross-referencing it with your first source will add to the credibility of the data. Check a few datasets to see whether they all match, more or less, and read each one's metadata piece to see whether their methodologies match up and, if not, how they differ.
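
A "more or less" match can be made concrete with a small script. The following is a minimal sketch, assuming each source can be reduced to a simple mapping from a label to a number. The cross_reference function, the 5 percent tolerance, and the county figures are all hypothetical and chosen only for illustration.

```python
from math import isclose

def cross_reference(source_a, source_b, rel_tol=0.05):
    """Compare two {label: value} datasets and sort the labels by outcome."""
    report = {"match": [], "mismatch": [], "only_in_a": [], "only_in_b": []}
    for key in sorted(set(source_a) | set(source_b)):
        if key not in source_b:
            report["only_in_a"].append(key)
        elif key not in source_a:
            report["only_in_b"].append(key)
        elif isclose(source_a[key], source_b[key], rel_tol=rel_tol):
            # Values agree within the chosen relative tolerance.
            report["match"].append(key)
        else:
            report["mismatch"].append(key)
    return report

# Made-up figures standing in for two independent sources.
agency_counts = {"County A": 1200, "County B": 845, "County C": 310}
tracker_counts = {"County A": 1188, "County B": 990, "County D": 57}

report = cross_reference(agency_counts, tracker_counts)
for outcome, labels in report.items():
    print(outcome, labels)
```

Here County A counts as a match, County B is flagged as a mismatch, and Counties C and D each appear in only one source. A mismatch or a missing entry is not proof that either source is wrong; it is a prompt to go back to the methodology notes and find out why the numbers differ.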

ASK QUESTIONS

You’ll need to do lots and lots of research in order to verify, interpret, and present data, and the best way to do this is to constantly ask yourself questions: how to make sure that what you’re looking at is accurate, how to understand what it is you’re looking at, and how that relates to your other questions. Journalists ask questions incessantly, and this is just another application of that skill. Skepticism is also useful when dealing with data, as it tends to lead to more questions.

Data can be really overwhelming, especially when torrents of it come in and the headlines are flooded with reductive or destructive interpretations that ignore context, or skip any of the other steps above, before making it to publication. Journalists have the instincts to cut to the bottom line and find the truth, as long as they give themselves some guidelines to fall back on.