The Hidden Bias of Big Data

At Strong Towns, we're all about doing the math. In the following article, republished with permission from City Observatory, Joe Cortright suggests the need for caution when using data to assess the safety of our streets. It's not enough to do the math; you also have to use it wisely.


Streetsblog recently highlighted a new report from Houston’s Kinder Institute that evaluates bike and pedestrian road safety based on user-reported near misses. Kinder got 187 cyclists and pedestrians to record their travel for a week in March, and to identify and describe situations in which they narrowly avoided a collision. The idea behind the report is that crashes, injuries and deaths capture only a small portion of the dangerous situations that active transportation users encounter, and that if we cast our net more widely to look at near misses, we’ll have a better idea of where danger lies. It’s not a bad idea: waiting for crashes or deaths to find out that a road is really dangerous is unfortunate.

But the Streetsblog article highlights an important caveat to relying on this kind of data. Because pedestrians and cyclists self-censor their route choices to generally avoid the scariest and most dangerous road segments, even this methodology can produce a kind of “false-positive,” giving the impression that a roadway is safe because there are few or no reported crashes or near misses. The report’s author, Dian Nostrikasari, writes:

The near-misses that bicyclists and pedestrians sometimes experience may affect their future travel decisions and prompt them to avoid roads they know are dangerous. That, in turn, could reduce the number of collisions at particular intersections. On paper, that could make areas seem safe, even if they aren’t.
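To make that mechanism concrete, here is a minimal sketch in Python. Every number in it is invented for illustration and none comes from the Kinder Institute report; the `Street` class, the street names, the trip counts and the per-trip risk figures are all hypothetical assumptions.

```python
# Hypothetical illustration of how avoidance hides danger in raw counts.
# All figures are invented; none come from the Kinder Institute report.
from dataclasses import dataclass


@dataclass
class Street:
    name: str
    weekly_trips: int          # assumed bike and pedestrian trips actually made
    near_miss_per_trip: float  # assumed underlying chance of a near miss per trip

    def expected_near_misses(self) -> float:
        # The raw count a survey would tend to record: exposure times risk.
        return self.weekly_trips * self.near_miss_per_trip


# People mostly use the calm street and mostly avoid the arterial.
calm = Street("calm side street", weekly_trips=400, near_miss_per_trip=0.01)
scary = Street("fast arterial", weekly_trips=10, near_miss_per_trip=0.20)

for street in (calm, scary):
    print(
        f"{street.name}: {street.expected_near_misses():.0f} expected near misses "
        f"per week, {street.near_miss_per_trip:.0%} risk per trip"
    )
```

Under these made-up assumptions, the arterial is twenty times riskier per trip yet produces fewer near misses than the side street, simply because almost nobody is willing to walk or ride there. A map built from the raw counts alone would rank the more dangerous street as the safer one.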

While it may seem like this is a minor technical issue, it isn’t. It reveals a fundamental problem with our over-reliance on data-driven planning methodologies. As we’ve noted at City Observatory, while we have copious metrics for assessing car travel (traffic volumes, speeds, level of service, number of crashes, delay), we have precious few measures of bike or pedestrian use (or safety). This imparts a subtle but pervasive bias to planning processes, and creates the illusion that we’re managing scientifically, when in fact we’re ignoring many aspects of the system for which we essentially have no data. But more importantly, even the bike and ped measures we do have reflect the very constrained levels of activity that occur on a system that for decades has been optimized for vehicles and is hostile to people walking and cycling. As we wrote at City Observatory last year:

An exacting count of existing patterns of activity will only further enshrine a status quo where cars are dominant. For example, a perfectly instrumented count of pedestrians, bicycles, and cars in Houston would show, correctly, little to no bike or pedestrian activity. And no amount of calculation of vehicle flows will reveal whether a city is providing a high quality of life for its residents, much less meeting their desires for the kinds of places they really want to live in.

This is not a problem that can be solved by more or better data. In our view, that’s a fundamental flaw in the “Smart City” visions we hear so much about. When we gather data about vehicle movement and parking, and even current bike and pedestrian activity, we largely miss the opportunity to talk about radically different alternatives. As we discussed earlier this year, rather than relying exclusively, or even primarily, on this kind of data, we ought to be talking more about the kind of places we want to live in and the way we want to enjoy them. That’ll be a much better guide to the future than an excessive reliance on data.

A hat-tip to Streetsblog’s Stephen Miller for excellent reporting here.

(Top photo source: Johnny Sanphillippo)

