A driver hits the brakes.
He changes speed.
The vehicle’s fuel economy shifts.
The car slows to a halt. Stopped at a red light, the driver looks down to his phone to check on his route on Google’s Waze app. Thanks to crowdsourced data produced in real time by other drivers, Waze identifies an accident a half-mile ahead, and the app redirects his route. The light turns green and he accelerates.
Every event that this 2015 car experiences, and every computation the app performs while using the driver’s GPS location, are all recorded and collected as valuable and powerful data. Multiply the data from this one car by many millions and we have arrived at the state of big data in the auto-tech industry.
“The amount of data we generate in our industry is tremendous,” Jim Buczkowski told an audience at the Stanford Alumni Center. Buczkowski is the director of electronics systems at Ford Motor Company, and he spoke as a panelist at the Data Driven conference. Hosted by the Revs Program at Stanford and the Stanford Journalism Program, “Data Driven: Coding and Writing Transportation’s Future” was a series of panel discussions about the opportunities and challenges emerging from a new landscape of vehicular data. A distinguished lineup of 15 panelists came to Stanford on Feb. 13 to talk about how data is impacting the auto, tech and journalism industries, as well as society as a whole.
The first panel of the conference focused on data that is coming directly out of cars. Ford is ramping up its effort to innovate its cars, recently announcing the opening of a new research and development center in Silicon Valley. Data plays an essential role in what comes next.
“All that data that resides in the vehicle, now we have access to it through connectivity, embedded modems and smartphones,” Buczkowski said. “The opportunity to operate on that data to create better experiences for customers is massive.”
According to Buczkowski, the average Ford F-150 runs on 132 million lines of software processed by 21 different CPUs. With car connectivity, real-time data can be transmitted out of the car and sent immediately to Ford to be analyzed. Buczkowski said that a vehicle’s brake system by itself produces 11 gigabytes of data per year. He noted that this kind of information can be harvested to give owners new services, such as a customized maintenance schedule based on their driving style.
When car data gets aggregated, even more innovation becomes possible. Buczkowski mentioned future plans related to vehicle-to-vehicle communication, autonomous driving and other interesting inventions, such as simplifying every driver’s worst nightmare: finding a parking spot. “We want to go from taking the car from Point A to Point B, to taking it from Point A to the parking spot,” Buczkowski said. “We are thinking beyond the car, toward how the car interacts with the environment.”
Ford is not the only automaker acting on the connected car revolution. Nearly every major car company in the world is scrambling to capitalize on connectivity and big data.
“It’s hard to understate the extent of the revolution that is taking place,” New York Times auto journalist Aaron Kessler said. “Cars used to be the equivalent of an air-gap computer. They had all this stuff but it was never connected to anything. Now that’s completely changing.”
Liz Jensen is the founder of Road Rules, an app that will use the sensors on the smartphone to automate tasks while driving. Her presentation outlined the complex and somewhat inconsistent network of car data streams that apps like hers must navigate to succeed in that niche. Road Rules and others like it are operating in a contested space. While car data is available now and ripe for innovation, there is no guarantee that car company data streams will continue to be open. Jensen explained that as automakers begin to pair with tech giants like Apple and Google, “access is starting to close a little to this data that’s coming directly from the car.”
“I’d like to see more access to data,” Jensen said. “When you keep the API (Application Programming Interface) open, it allows more room for innovation.” One idea Jensen likes is to use the car as a weather sensor to detect microclimates. “I love the idea of the car as a sensor,” she said.
While one vehicle’s data can be useful for bringing value back to the customer, aggregating the data can help with addressing larger societal issues like traffic congestion, air pollution and auto safety. In the second panel of the day, Di-Ann Eisnor of Google’s navigation app Waze; Charlie Catlett, leader of the Array of Things project at the University of Chicago; and Adam Altman, head of product at Automatic, discussed the ways that mapping and sharing data can benefit communities and cities.
Catlett has been busy working on a project that will install computer sensors on streetlights across Chicago to measure air quality among other things.
“We want to try to understand the city from a variety of standpoints,” he said. “The basic idea is to provide a way for the city to give information to itself, to residents and to drivers about the state of the city.”
Currently, only a handful of Catlett’s sensors are mounted in Chicago in private locations. But over the next few years, Catlett says that the City of Chicago will be installing 500 state-of-the-art sensors across the city that will eventually begin recording real-time data about air quality, weather and traffic flow every thirty seconds.
Altman of Automatic presented another angle for how big data can be used to benefit a larger community. Automatic is a piece of hardware that plugs into the diagnostics port of the car, reads vehicle data as it comes and then, in communication with its smartphone app, sends helpful alerts about the driver’s habits. Altman said that Automatic’s robust database can help with reducing traffic, predicting traffic patterns and identifying dangerous roads and intersections where data shows the most instances of drivers “hard braking.”
“We’re able to see that and visualize that with data,” Altman explained. “This is something that cities can be very interested in to help them understand how things are flowing.”
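As a rough illustration of the kind of aggregation Altman described, the sketch below counts hard-braking events per location from a list of speed samples. It is not Automatic’s method: the sample format, the location labels and the 7 mph-per-second threshold are all assumptions made for this example.

```python
from collections import Counter

# Illustrative threshold only; the article does not say how Automatic
# defines "hard braking".
HARD_BRAKE_MPH_PER_SEC = 7.0

def hard_brake_counts(samples):
    """Count hard-braking events per location.

    samples: list of (location, time_seconds, speed_mph) tuples in time
    order. This record layout is hypothetical.
    """
    counts = Counter()
    for (_, t_a, v_a), (loc_b, t_b, v_b) in zip(samples, samples[1:]):
        dt = t_b - t_a
        if dt <= 0:
            continue  # skip duplicate or out-of-order samples
        deceleration = (v_a - v_b) / dt  # mph lost per second
        if deceleration >= HARD_BRAKE_MPH_PER_SEC:
            counts[loc_b] += 1
    return counts

# Made-up trip data for demonstration.
trip = [
    ("Main & 1st", 0, 30.0),
    ("Main & 1st", 1, 20.0),  # 10 mph lost in 1 s: counted
    ("Main & 2nd", 5, 22.0),
    ("Main & 2nd", 6, 19.0),  # 3 mph lost in 1 s: not counted
]
counts = hard_brake_counts(trip)
print(counts.most_common())
```

Summed over many drivers, the locations with the highest counts would be the candidates a city might want to examine.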
Waze’s Di-Ann Eisnor is also familiar with cities’ interest in this type of aggregated transportation data. Waze is an app where users self-report incidents while driving, such as accidents, heavy traffic, police cars and road closures. Eisnor recently announced Waze’s Connected Citizens program, a partnership between Waze and state and local governments around the world to share data with the goal of helping make cities run more efficiently.
While there are great opportunities out in the vehicular data world, there are also some serious risks. With so much private information on drivers and cars now out in the open, many panelists and audience members shared concerns over consumer privacy and security.
“Most of the automakers don’t do a great job at securing their data,” Aaron Kessler said, citing a recent congressional study that found weak security measures in cars that have wireless systems. “What guarantees do I have as a driver that my data is going to be used properly, that it’s not going to be stolen and it’s not going to be misappropriated in some way?”
“It’s kind of the Wild West right now,” he added. “Everywhere you have been, everywhere you have parked, your driving habits – someone is going to have all of it.”
“I hope that everyone that works in this sector is transparent and responsible about how they use their data,” Jensen said. “As a company, if you explain to the customer how you’re going to use that data to add value back to them, you’ll create a relationship of trust.”
Buczkowski emphasized that all of Ford’s new services will have a transparent and consumer-friendly “opt-in” format.
The final panel of the day shifted the conversation toward the use of transportation data as a resource for investigative journalists. Moderated by Stanford Journalism lecturer Cheryl Phillips, this panel featured seven expert data journalists, including two Pulitzer Prize winners. The journalists present were Robert Benincasa (NPR), Danielle Ivory (New York Times), Maurice Tamman (Reuters), John Maines (Florida Sun Sentinel), Michael Morisy (MuckRock) and Paul Ingrassia (Reuters).
“A lot of people in journalism don’t necessarily know that they can use data as evidence for their stories,” Robert Benincasa said. “Every event is a data point that can be used to cover the world.”
Each panelist explained how he or she used public data on transportation to aid their reporting. In 2013, John Maines won a Pulitzer Prize at the Florida Sun Sentinel for an investigative story uncovering excessive and reckless off-duty police speeding. Unlike many reporters, Maines did not have to rely on eyewitness accounts to support his story. He was able to prove the violations with hard data. Using public data from bridge tollbooths, Maines could identify time-stamped logs of exactly when a police car entered and exited a bridge. Then, knowing the exact length of the bridge, he could calculate the vehicle’s average speed, which was consistently between 80 and 120 miles per hour.
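The arithmetic behind that approach is simple: average speed is distance divided by elapsed time. The sketch below shows the calculation under stated assumptions. The timestamp format, the function name and the sample record (a 3-mile span crossed in 90 seconds) are invented for illustration and are not taken from Maines’ data.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed log format

def average_speed_mph(entry_time, exit_time, distance_miles):
    """Average speed between two time-stamped toll records."""
    entered = datetime.strptime(entry_time, TIMESTAMP_FORMAT)
    exited = datetime.strptime(exit_time, TIMESTAMP_FORMAT)
    hours = (exited - entered).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("exit time must be after entry time")
    return distance_miles / hours

# Hypothetical record: 3 miles covered in 90 seconds.
speed = average_speed_mph("2012-01-15 02:10:00", "2012-01-15 02:11:30", 3.0)
print(f"{speed:.0f} mph")  # prints "120 mph"
```

Because the result is an average over the whole span, it is a lower bound on the vehicle’s peak speed during the crossing.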
Danielle Ivory has been busy this year reporting on the General Motors ignition switch scandal and showing how the National Highway Traffic Safety Administration (NHTSA) did not take proper action after receiving years of complaints about faulty ignition switches. Much of her work for this project, she said, has gone into working with NHTSA fatalities, recalls and complaints datasets to uncover the true number of deaths the ignition defect was responsible for, as well as evaluating what exactly NHTSA was doing along the way.
“NHTSA collects and produces this incredible amount of data, but it is messy and some of it is very unwieldy,” Ivory said. “What we went about doing was taking these databases and stitching them together ourselves. Right away, we found that NHTSA had received thousands of complaints about ignition problems in these cars as soon as the cars hit the market. It’s unclear to us what that data was doing other than sitting in a database.”
In some cases, data journalism means reporting on the government’s handling of its own data. Michael Morisy, founder of MuckRock.com, discovered through his reporting on an intrusive license plate scanning program at the Boston Police Department that many government agencies are unable to manage sensitive data. “Most agencies didn’t seem to understand the implications of the data,” Morisy said. “They don’t have a chief privacy officer or a data specialist. This is very tricky stuff and it is very easy to do something wrong. Most agencies don’t have any plan for properly securing the data.”
While there is a surplus of data to work with in the private sector, the opposite is true for public sector data. The lack of transparency, the difficulty of accessing records through Freedom of Information Act requests and ultimately the incomplete nature of the data are constant frustrations for journalists. All the panelists at the conference agreed that it seems like more and more public data is being withheld, delayed or privatized.
“We’re seeing the privatization of public data,” Maurice Tamman said. “When data is moved from a public entity to a private entity, if that information was intended to be public, it should remain public. I think this is a fundamental threat to Americans’ access to their government.”
Maines agreed and emphasized that the hardest aspect of using the bridge toll database was obtaining it in the first place. “We had more trouble with the bureaucrats than with the data itself.”
Despite the challenges of accessing public data for journalists, the work of these panelists shows that data in the right hands can be an indispensable component of journalism. And while the current landscape of data may be sparse, the connected car revolution provides some hope for a more transparent and prolific future.
“The data that comes in right now is not great [from a journalist’s perspective],” Aaron Kessler said. “But when you think about the future of a connected car, it opens up a whole new world of possibilities. To be able to capture consistently detailed, rich data that is available to the public, to journalists, to researchers and to automakers could be a game changer.”
Joining the effort to make public data more transparent, Stanford Journalism lecturer Dan Nguyen capped off Data Driven with the unveiling of a new online repository of transportation data sources, which includes example visualizations of data. Check out what’s available at www.DataDrivenStanford.org.
Watch the full conference here: https://www.youtube.com/playlist?list=PLpGHT1n4-mAsIxVxtpTQov5A8onvr4JKA
Learn more about the Data Driven panelists: www.journalism.stanford.edu/datadriven-conf/
A repository of transportation data sources courtesy of the Stanford Journalism Program: www.DataDrivenStanford.org