
Module 8: Introduction to Data Journalism


Understanding and Delivering Data

Data literacy

Just as literacy refers to "the ability to read for knowledge, write coherently, and think critically about printed material," data literacy is the ability to consume for knowledge, produce coherently, and think critically about data.

Data literacy includes statistical literacy, but also understanding how to work with large datasets, how they were produced, how to connect various datasets, and how to interpret them.

Math Concepts

Some US universities offer math classes for journalists, in which reporters get help with concepts such as percentage changes and averages. That journalists need help with math topics normally covered before high school shows how far newsrooms are from being data-literate. This is a problem.

• How can a data journalist make use of a bunch of numbers on climate change if she doesn't know what a confidence interval means?

• How can a data reporter write a story on income distribution if he cannot tell the mean from the median?
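
The two concepts named above, averages and percentage changes, can be illustrated with a minimal sketch. The income figures are made up for illustration and `percentage_change` is a hypothetical helper, not part of any library.

```python
from statistics import mean, median

# Hypothetical monthly incomes for ten households (made-up numbers).
incomes = [1200, 1300, 1400, 1500, 1500, 1600, 1700, 1800, 2000, 26000]

mean_income = mean(incomes)      # pulled upward by the single very high income
median_income = median(incomes)  # the middle of the distribution, unaffected by the outlier


def percentage_change(old, new):
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# A rise from 40 to 50 is a 25 percent increase, not a 10 percent one.
change = percentage_change(40, 50)

print(f"mean: {mean_income}, median: {median_income}, change: {change}%")
```

With one very rich household in the sample, the mean is more than twice the median, which is why a story on income distribution usually needs both figures.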

A journalist certainly does not need a degree in statistics to become more efficient when dealing with data. When faced with numbers, a few simple tricks can help her get a much better story. As Max Planck Institute professor Gerd Gigerenzer says, “better tools will not lead to better journalism if they are not used with insight.”

Even if you lack any knowledge of math or stats, you can easily become a seasoned data journalist by asking three very simple questions:

1. How was the data collected?
2. What's in there to learn?
3. How reliable is the information?
Tips for Working with Data

• Data can appear forbidding; don't allow it to intimidate you

• Don't confuse skepticism about data with cynicism

• If you believe in data, try to let it speak before you slap on your own mood, beliefs, or expectations

• There can often be more than one legitimate way of cutting the data. Numbers don't have to be either true or false

• Think around the data: look at the real-life complications and the comparisons over time, group, or geography; in short, context.

Reliability

Articles about the benefits of tea-drinking are commonplace. However, many pieces of research fail to take into account lifestyle factors, such as diet, occupation, or sports. The math behind correlations and error margins is certainly correct, at least most of the time. But if researchers don't look for co-correlations (e.g., drinking tea correlates with playing sports), their results are of little value.

As a journalist, it makes little sense to challenge the numerical results of a study, such as the sample size, unless there are serious doubts about them. However, it is easy to see if researchers failed to take relevant pieces of information into account.
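
To see why an overlooked lifestyle factor matters, here is a small sketch built on invented counts (they come from no real study). It compares the share of healthy people among tea drinkers and non-drinkers, first with everyone pooled together and then separately for people who do and do not play sports. The `groups` table and the `healthy_rate` helper are assumptions made for the example.

```python
# Made-up counts for illustration only:
# (plays_sports, drinks_tea) -> (number of people, number of healthy people)
groups = {
    (True, True): (40, 32),
    (True, False): (10, 8),
    (False, True): (10, 4),
    (False, False): (40, 16),
}


def healthy_rate(drinks_tea, plays_sports=None):
    """Share of healthy people among tea drinkers (or non-drinkers).

    If plays_sports is None, both sports groups are pooled together.
    """
    people = healthy = 0
    for (sports, tea), (n, n_healthy) in groups.items():
        if tea == drinks_tea and (plays_sports is None or sports == plays_sports):
            people += n
            healthy += n_healthy
    return healthy / people


# Pooled: tea drinkers look much healthier.
pooled_tea = healthy_rate(True)
pooled_no_tea = healthy_rate(False)

# Within each sports group the difference disappears.
sporty_tea = healthy_rate(True, plays_sports=True)
sporty_no_tea = healthy_rate(False, plays_sports=True)

print(pooled_tea, pooled_no_tea, sporty_tea, sporty_no_tea)
```

In this toy table tea drinkers are mostly people who play sports, so the pooled comparison credits tea with a difference that belongs to sports.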


What can you Learn?

When writing about an average, always think "an average of what?" Is the reference population homogeneous?

Always take the distribution and base rate into account. Checking the mean and median, as well as the mode (the most frequent value in the distribution), helps you gain insights into the data. Knowing the order of magnitude makes contextualization easier.

Finally, reporting in natural frequencies (1 in 100) is much easier for readers to understand than using percentages (1%).
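
As a rough sketch of these checks, the snippet below summarizes a made-up list of values with the mean, median, and mode, and converts a percentage into a natural frequency. The data and the `natural_frequency` helper are hypothetical.

```python
from fractions import Fraction
from statistics import mean, median, mode

# Hypothetical number of burglaries reported in seven neighborhoods.
burglaries = [1, 1, 1, 2, 3, 4, 40]

summary = {
    "mean": mean(burglaries),      # inflated by the one neighborhood with 40
    "median": median(burglaries),  # the middle value
    "mode": mode(burglaries),      # the most frequent value
}


def natural_frequency(percent):
    """Turn a percentage into a '1 in N' phrase, with N rounded to a whole number.

    Assumes percent is greater than zero.
    """
    share = Fraction(str(percent)) / 100
    return f"about 1 in {round(1 / share)}"


print(summary)
print(natural_frequency(1))
print(natural_frequency(2.5))
```

When the three summary figures disagree this much, the "average" neighborhood does not describe any real neighborhood, and the distribution itself becomes part of the story.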

Data Collection

The easiest way to show off with spectacular data is to fabricate it. It sounds obvious, but data as commonly commented upon as government figures can very well be phony. When in doubt about a number's credibility, always double-check, just as you would if it were a quote from a politician.

For police data, sociologists often carry out victimization studies, in which they ask people whether they have been victims of crime. These studies are much less volatile than police data.

Other tests let you assess precisely the credibility of the data, but none will replace your own critical thinking.

In many ways, working with data is like interviewing a live source. You ask questions of the data and get it to reveal the answers. But just as a source can only give answers about which he has information, a data-set can only answer questions for which it has the right records and the proper variables.

There are at least three key concepts you need to understand when starting a data project:

1. Data requests should begin with a list of questions you want to answer
2. Data often is messy and needs to be cleaned
3. Data may have undocumented features

Request all Variables

It is also a good idea to request all the variables and records in the database, rather than the subset that could answer the questions for the immediate story. You can always subset the data on your own, and having access to the full data-set will let you answer new questions that may come up in your reporting and even produce new ideas for follow-up stories.

It may be that confidentiality laws or other policies mean that some variables, such as the identities of victims or the names of confidential informants, can't be released. But even a partial database is much better than none, as long as you understand which questions the redacted database can and can't answer.

Know the Questions

You should consider carefully what questions you need to answer even before you acquire your data. Basically, you work backward. First, list the data-evidenced statements you want to make in your story. Then decide which variables and records you would have to acquire and analyze in order to make those statements.

Consider an example involving crime patterns in your city, where the statements you want to make involve the time of day, the day of the week, and the area in which crimes occur, as well as the various crime categories.

So date, time, crime category, and address are the minimum variables you need to answer those questions.
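
Working backward from the statements to the variables can be sketched in a few lines. The records below are invented, and the field names (`date`, `time`, `category`, `address`) are assumptions standing in for whatever the real database calls them.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Hypothetical records holding only the four minimum variables.
records = [
    {"date": "2024-03-01", "time": "23:15", "category": "burglary", "address": "12 Elm St"},
    {"date": "2024-03-02", "time": "23:40", "category": "burglary", "address": "98 Oak Ave"},
    {"date": "2024-03-02", "time": "14:05", "category": "theft", "address": "5 Main St"},
    {"date": "2024-03-04", "time": "09:30", "category": "theft", "address": "5 Main St"},
]

by_hour = Counter()     # statement: "burglaries peak late at night"
by_weekday = Counter()  # statement: "thefts are spread across the week"
by_address = Counter()  # statement: "this location is a hot spot"

for record in records:
    when = datetime.strptime(f"{record['date']} {record['time']}", "%Y-%m-%d %H:%M")
    by_hour[(record["category"], when.hour)] += 1
    by_weekday[(record["category"], WEEKDAYS[when.weekday()])] += 1
    by_address[(record["category"], record["address"])] += 1

print(by_hour.most_common(1))
print(by_address.most_common(1))
```

Each counter corresponds to one statement you might want to make. If a variable were missing from the data you received, the matching counter could not be built and the statement could not be supported.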

Cleaning Data

One of the biggest problems in database work is that you will often be analyzing data that was gathered for bureaucratic reasons.

The problem is that the standards of accuracy for those two purposes are quite different.

Errors can skew a data journalist's attempts to discover the patterns in the database. For that reason, the first big piece of work to undertake when you acquire a new data-set is to examine how messy it is and then clean it up.

Dirty Data

The data often is "dirty," with values that aren't standardized. Sometimes you will receive data that doesn't match up to the supposed file layout and data dictionary that accompany it. Also, some agencies will insist on giving you the data in awkward formats like PDF, which then have to be converted.

A good quick way to look for messiness is to create frequency tables of the categorical variables, the ones that would be expected to have a relatively small number of different values. (When using Excel, for instance, you can do this by using Filter or Pivot Tables on each categorical variable.)
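
Outside a spreadsheet, the same frequency-table check can be sketched with Python's standard library. The rows and the `city` column are invented for illustration.

```python
from collections import Counter

# Hypothetical rows from a dirty file; "city" is the categorical variable.
rows = [
    {"city": "Omaha"},
    {"city": "omaha"},
    {"city": "OMAHA "},
    {"city": "Lincoln"},
    {"city": "Lincon"},
    {"city": "Omaha"},
]

# Frequency table of the raw values: variants of the same city show up as separate entries.
raw_counts = Counter(row["city"] for row in rows)

# A first cleaning pass: trim whitespace and normalize capitalization.
clean_counts = Counter(row["city"].strip().title() for row in rows)

for value, count in raw_counts.most_common():
    print(repr(value), count)
print(clean_counts)
```

The cleaning pass merges the capitalization and spacing variants, but the misspelled "Lincon" survives it. The frequency table is what brings such cases to your attention so you can correct them by hand.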


Undocumented Features

The Rosetta Stone of any database is the so-called data dictionary.

This file (it may be a text document, a PDF, or a spreadsheet) will tell you how the data file is formatted, the order of the variables, the names of each variable, and the data type of each variable.

You will use this information to help you properly import the data file into the analysis software you intend to use (Excel, Access, etc.). The other key element of a data dictionary is an explanation of any codes being used by particular variables.
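
A minimal sketch of how a data dictionary drives an import is shown below. The dictionary, the column names, the codes, and the sample file contents are all hypothetical; a real dictionary would come from the agency.

```python
import csv
import io

# Hypothetical data dictionary: column order, names, data types, and code meanings.
DATA_DICTIONARY = [
    {"name": "incident_id", "type": int, "codes": None},
    {"name": "category", "type": str, "codes": {"B": "burglary", "T": "theft"}},
    {"name": "loss_usd", "type": float, "codes": None},
]

# Stand-in for a headerless file delivered by an agency.
raw_file = io.StringIO("1001,B,250.00\n1002,T,80.50\n")


def load(file_obj, dictionary):
    """Read a headerless CSV, naming, typing, and decoding columns per the dictionary."""
    rows = []
    for values in csv.reader(file_obj):
        row = {}
        for column, value in zip(dictionary, values):
            typed = column["type"](value)
            if column["codes"] is not None:
                # Unknown codes are kept as-is so they can be investigated later.
                typed = column["codes"].get(typed, typed)
            row[column["name"]] = typed
        rows.append(row)
    return rows


rows = load(raw_file, DATA_DICTIONARY)
print(rows)
```

Without the dictionary, the second column would be a list of unexplained letters and the third a string rather than a number.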

Check Data

But even with a data dictionary in hand, there can be problems. Always ask the agency giving you data if there are any undocumented elements in the data, whether they are newly created codes that haven't been included in the data dictionary, changes in the file layout, or anything else.

Also, always examine the results of your analysis and ask the question "Does this make sense?"
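
Both checks can be partly automated. The sketch below flags codes that appear in the data but not in the data dictionary, and values that fall outside a plausible range. The codes, the ages, and the `out_of_range` helper are invented for the example.

```python
from collections import Counter

# Hypothetical codes documented in the data dictionary for a "category" column.
documented_codes = {"B": "burglary", "T": "theft", "A": "assault"}

# Hypothetical values actually found in that column of the delivered file.
observed = ["B", "T", "T", "X", "A", "b", "X"]

# Any value missing from the dictionary is a question for the agency.
undocumented = Counter(code for code in observed if code not in documented_codes)


def out_of_range(values, low, high):
    """Return the values that fall outside a plausible range."""
    return [v for v in values if not low <= v <= high]


# Hypothetical victim ages; a "Does this make sense?" check flags -1 and 212.
suspicious_ages = out_of_range([34, 19, -1, 52, 212], low=0, high=120)

print(dict(undocumented))
print(suspicious_ages)
```

Neither check decides what the odd values mean. They only produce the list of questions to take back to the agency.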

Data Visualization

Data journalism can sometimes give the impression that it is mainly about the presentation of data, such as visualizations that convey an understanding of an aspect of the figures, or interactive searchable databases that allow individuals to look up places like their own local street or hospital.

It is unrealistic to expect that data visualization tools and techniques will unleash a barrage of ready-made stories from datasets. There are no rules, no protocol, that will guarantee a story.

Analyze Data

Once you have visualized your data, the next step is to analyze and interpret what you see and to learn something from the picture you created. Ask yourself:

• What can I see in this image?
• Is it what I expected?
• Are there any interesting patterns?
• What does this mean in the context of the data?

Sometimes you might end up with a visualization that, in spite of its beauty, might seem to tell you nothing of interest.

Visualize Data

Visualization provides a unique perspective on the data-set. You can visualize data in four different ways.

• Tables show labels and amounts in a structured and organized fashion, and let you sort and filter the data.

• Charts allow you to map dimensions in your data to visual properties of geometric shapes.

• Graphs are all about showing the interconnections in your data points.

• Maps can reveal geographic relations within the data (trends).

Using Visualizations

Data visualization can be attention-getting, which is valuable social currency for sharing and attracting readers. It also leverages a powerful cognitive advantage: half of the human brain is devoted to processing visual information.

When you present a user with an infographic, you are reaching them through the mind's highest-bandwidth pathway.

A well-designed data visualization can give viewers an immediate and profound impression, and cut through the clutter of a complex story to get right to the point.

Facts

Data visualization is also deeply rooted in measurable facts. While aesthetically engaging, it is less emotionally charged. In an era of narrowly-focused media that is often tailored towards audiences with a particular point of view, data visualization (and data journalism in general) offers the tantalizing opportunity for storytelling that is above all driven by facts, not fanaticism.

Moreover, like other forms of narrative journalism, data visualization can be effective both for breaking news (quickly imparting new information such as the location of an accident and the number of casualties) and for feature stories, where it can go deeper into a topic and offer a new perspective.

Presenting Data

There are lots of different ways to present your data to the public, from publishing raw datasets with stories, to creating visualizations and interactive web applications.

There are times when data can tell a story better than words or photos.

Tools like Google Fusion Tables, Tableau, Dipity, and others make it easier than ever to create maps, charts, graphs, or even full-blown data applications. The question facing journalists is now less whether you can turn your data-set into a visualization than whether you should.

Publishing

You can embed the data on your site in a visualization and in a form that allows for easy download of the data-set. Readers can then explore the data behind the stories by interacting with the visualization or by using the data themselves in other ways.

This is important because it shows readers the same data that was used to draw powerful conclusions. By making the data available, we can also enlist tips from critics and general readers on what may have been missed and what more could be explored.


Data Interaction

News applications are windows into the data behind a story. They might be searchable databases, sleek visualizations, or something else altogether. But no matter what form they take, news apps encourage readers to interact with data in a context that is meaningful to them.

More than just high-tech infographics, the best news apps are durable products. They live outside the news cycle, often by helping readers solve real-world problems or by answering questions in a useful or novel way, and so become enduring resources.

Providing such an important and relevant service creates a relationship with users that reaches far beyond what a narrative story can do alone.

The Future

News application development has come a long way in a very short time. News apps used to be a lot like infographics: interactive data visualizations, mixed with searchable databases, designed to advance the narrative of the story.

Now, many of those apps can be designed by reporters on deadline using open source tools, freeing up developers to think bigger thoughts.

The industry is headed toward combining the storytelling and public service strengths of journalism with the product development discipline and expertise of the technology world. The result will be an explosion of innovation around ways to make data relevant, interesting, and useful to the audience.

Expense

Building high-end news apps can be time-consuming and expensive. That's why it always pays to ask about the payoff. How do you elevate a one-hit wonder into something special?

Creating an enduring project that transcends the news cycle is one way. But so is building a tool that saves you time down the road (and open sourcing it), or applying advanced analytics to your app to learn more about your audience.

Commitment

Building news apps means balancing the daily needs of a newsroom against the long-term commitment it takes to build a great product.

For example, your editor comes to you with an idea: the City Council is set to have a vote about whether to demolish several historic properties. He suggests building a simple application that allows users to see the buildings on a map.

As a developer, you have a few options. You can build a map using custom software, or use existing tools like Google Fusion Tables or open source mapping libraries and finish the job in a couple of hours. The first option will give you a better app, but the second might give you more time to build something else with a better chance of having a lasting impact.

Audience Needs

News apps don't serve the story for its own sake; they serve the user. That user might be a dialysis patient who wants to know about the safety record of her clinic, or even a homeowner unaware of earthquake hazards. No matter who it is, any discussion about building a news app, like any good product, should start with the people who are going to use it.

A single app might serve many users. For example, a project called “Curbwise”, built by the Omaha (Nebraska) World-Herald, serves homeowners who believe they are being overtaxed; residents interested in property values; and real estate workers keeping track of recent sales. In each of those cases, the app meets a specific need that keeps users coming back.