What Are Big Data Analytics in Transportation?

Transportation data analytics increasingly power mobility information and insights – transforming transportation planning and operations by making it easier, faster, cheaper, and safer to collect and understand critical information.

In recent years, the transportation industry has been disrupted by multiple forces, including the COVID-19 pandemic, an ongoing road safety crisis, and a growing push for decarbonization. Meanwhile, many public agencies are facing tight budgets and a looming “fiscal cliff,” adding pressure to do more with less. These and other changes place new demands on transportation experts.

Facing these demands, more and more cities, transit organizations, departments of transportation (DOTs), and other localities are using transportation data analytics to solve problems, prioritize investments, and win stakeholder support. But how do these analytics work, and how can you choose a reliable transportation data analytics source?

Transportation Data Analytics Capture the Speed of Change

Happily, we are no longer limited to sensors and surveys. Transportation data analytics can provide complete end-to-end trip information, including trip origins and destinations, routes, trip distances, travel time, and even real-time data on how vehicles are moving. When data is aggregated from multiple sources, transportation data analytics become even more valuable, providing transportation experts with details including home and work locations, trip purpose, aggregated traveler demographics, and more.

Using transportation data analytics, transportation professionals can quickly access accurate data for every road in the country, every day of the year — even in real time.
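To make the idea of end-to-end trip information concrete, here is a minimal sketch of how trip records might be rolled up into origin-destination metrics. The `Trip` fields, zone names, and the `od_summary` helper are hypothetical, chosen only for illustration; they do not describe any particular provider's schema or method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trip:
    """One hypothetical, already-anonymized trip record."""
    origin_zone: str
    dest_zone: str
    distance_km: float
    duration_min: float  # assumed to be greater than zero


def od_summary(trips):
    """Return {(origin, dest): (trip_count, mean_speed_kmh)}."""
    counts = Counter()
    total_km = Counter()
    total_min = Counter()
    for t in trips:
        key = (t.origin_zone, t.dest_zone)
        counts[key] += 1
        total_km[key] += t.distance_km
        total_min[key] += t.duration_min
    # Mean speed = total distance / total time, converted to km/h.
    return {k: (counts[k], 60.0 * total_km[k] / total_min[k]) for k in counts}


trips = [
    Trip("A", "B", 10.0, 20.0),
    Trip("A", "B", 14.0, 28.0),
    Trip("B", "C", 5.0, 15.0),
]
summary = od_summary(trips)
for pair, (count, speed) in sorted(summary.items()):
    print(pair, count, round(speed, 1))
```

The output describes zone pairs rather than individual travelers, which is the level at which planners typically work with this kind of data.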

Not all transportation data is created equal. When collecting and using data, it is important to understand the factors at play. These factors drive transportation data coverage, depth, and accuracy:

Data Sets and Sources

The landscape of mobility data sources is constantly changing. Any transportation data provider must take advantage of emerging data sources while following data privacy regulations and best practices.

Among these emerging data sources is Connected Vehicle data, which, along with GPS data, is paired with contextual data points from road network data, census data, and physical counters to offer a full picture of how people move.

Once filtered through a set of complex machine-learning-based algorithms, transportation data analytics can be used to analyze trips from the moment journeys begin to the moment they end, via any mode, on all roads and paths. They can even be used to monitor roadway conditions in real time. But not all transportation data is created equal, and not all data leads to powerful insights.

StreetLight’s data sources have grown and adapted over time to take advantage of industry innovations while protecting privacy.

Key questions for evaluating data sets and providers include:

  1. How big is the sample size? The larger the sample size, the lower the margin of error in the data.
  2. How many data sources? The most accurate and unbiased data sets draw from multiple sources.
  3. How frequently is the data updated? Regular updates allow for more granularity in studies.
  4. What does the data cover? Ideally it should be able to drill down to rural areas, small streets, and individual intersections. And it should capture historical travel data as well as recent data.
  5. What modes does the data include? Can it identify cyclists, pedestrians, transportation network company driving, transit, and more?
  6. Does the data include trip characteristics? Look for datasets that include speed information, trip purpose, trip length, and more.
  7. Does the data offer date-specific measurement? Specifying dates allows the data to measure movements during historical events and create before-and-after studies.
  8. Does the data offer real-time insights? When data is needed to monitor ongoing conditions and enable quick responses to emerging traffic issues (such as during construction or special events operations), look for a data platform that offers real-time or near real-time data.
  9. How is the data accessed? Look for an on-demand platform with access to run multiple studies, rather than a one-time download of a single analysis.
  10. Is the data specifically designed to support your use case? Look for a platform that aggregates and visualizes data outputs in a way that is purpose-built for your needs. For example, can it highlight dangerous vehicle speeds when analyzing safety, or visualize atypical delay when monitoring the impact of road construction?
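The first question on the list, sample size, can be illustrated with a short calculation. The sketch below uses the standard textbook formula for the margin of error of an estimated proportion at roughly 95% confidence, and it assumes a simple random sample. Real mobility data sets are rarely simple random samples, so treat this as a rough guide to how sample size affects precision, not as a measure of any specific data set.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for an estimated proportion.

    Assumes a simple random sample; z=1.96 corresponds to about 95%
    confidence, and proportion=0.5 is the most conservative case.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


for n in (100, 1_000, 10_000):
    print(n, round(margin_of_error(n), 4))
```

Under these assumptions, the margin of error shrinks with the square root of the sample size: going from 100 to 10,000 observations cuts it from about 9.8 percentage points to about 1. A large sample does not correct for bias, however, which is why the number of data sources also matters.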

Algorithms and Machine Learning

Transportation data analytics rely on computer algorithms and, sometimes, on machine learning. Software engineering and data science expertise is increasingly important for understanding and evaluating transportation data sources.

At a minimum, transportation data providers should be able to explain the modeling behind a transportation data algorithm, including the data sources, how the data is handled, and the algorithm’s capabilities. Transparency is critical for evaluating today’s complex data sets.

Machine learning is an increasingly important element of transportation data analytics, and one that offers less clarity than a conventional computer algorithm. With machine learning, data scientists essentially “feed” a computer program actual data, and the computer “learns” to recognize that type of data and pick it out of a larger data set. Over time, the machine’s accuracy grows, but it becomes harder to see which details the machine identifies and how it evaluates them.
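As a toy illustration of "feeding" a program labeled data so that it "learns" to recognize similar data, the sketch below trains a very simple nearest-centroid classifier that guesses travel mode from average trip speed. The training values, the single speed feature, and the function names are all assumptions made for this example. Production systems use far richer features and models, and this is not a description of how any provider's algorithm works.

```python
from collections import defaultdict


def train_centroids(labeled_speeds):
    """Learn the mean speed for each mode from (avg_speed_kmh, mode) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for speed, mode in labeled_speeds:
        totals[mode] += speed
        counts[mode] += 1
    return {mode: totals[mode] / counts[mode] for mode in totals}


def predict_mode(centroids, speed):
    """Pick the mode whose learned mean speed is closest to the given speed."""
    return min(centroids, key=lambda mode: abs(centroids[mode] - speed))


# Hypothetical labeled examples: (average speed in km/h, known mode).
training = [
    (4.5, "walk"), (5.5, "walk"),
    (14.0, "bike"), (18.0, "bike"),
    (40.0, "car"), (60.0, "car"),
]
centroids = train_centroids(training)

for speed in (6.0, 20.0, 35.0):
    print(speed, predict_mode(centroids, speed))
```

Even in this tiny example, the decision rule is a by-product of the training data and is not written by hand, which hints at why more complex models are harder to inspect.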

StreetLight’s proprietary processing algorithm, Route Science®, transforms data from multiple sources into the metrics transportation professionals use every day.

Data scientists may not explain in detail how this process works (in order to protect intellectual property), but they should be able to share the degree of accuracy with which the process works. Sometimes the data will be more directional than definitive – which can still be helpful, as long as users understand where the gaps are.

Overall, a company with an effective data set and process should be able to point to multiple proven uses of its metrics by actual customers, not simply theoretical applications.

Privacy Protection

Stakeholders sometimes have very real concerns about this level of transportation data analytics – how data is collected, protected, and shared. Fortunately, best practices are emerging to ensure privacy protection. At StreetLight, we operate at or above established guidelines to set the tone for the industry.

Data should never enable the tracking of individuals or the sending of marketing messages targeted to individual devices (such as cellphones). Instead, analytics should describe patterns in the movement of composite groups of people.

Transportation data analytics companies should not receive, process, or use personally identifiable information in the creation of customer products. Throughout the product-creation process, they should employ multi-step, multi-layered technical safeguards, including automated privacy and coverage checks that ensure sufficient aggregation based on dimensions such as time, space, and land use.
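One common way to implement an automated aggregation check is a minimum-count threshold: any cell (for example, a zone and hour) observed from too few distinct devices is suppressed before results are released. The sketch below is a minimal illustration of that idea. The threshold of 10, the `(device_id, zone, hour)` record layout, and the function name are assumptions for this example and do not describe any specific company's safeguards.

```python
from collections import defaultdict

MIN_DEVICES = 10  # hypothetical threshold; real policies vary


def aggregate_with_suppression(observations, min_devices=MIN_DEVICES):
    """Count distinct devices per (zone, hour) and drop sparse cells.

    observations: iterable of (device_id, zone, hour) tuples.
    Returns {(zone, hour): device_count} with no device identifiers.
    """
    devices = defaultdict(set)
    for device_id, zone, hour in observations:
        devices[(zone, hour)].add(device_id)
    return {
        cell: len(ids)
        for cell, ids in devices.items()
        if len(ids) >= min_devices
    }


# 12 devices seen downtown at 8:00, but only 3 in a rural zone.
observations = [(f"d{i}", "downtown", 8) for i in range(12)]
observations += [(f"r{i}", "rural", 8) for i in range(3)]

result = aggregate_with_suppression(observations)
print(result)
```

Here the rural cell is withheld because too few devices contribute to it, and the released output contains only counts. A real pipeline would layer further checks on top, such as the time, space, and land-use dimensions mentioned above.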

Data storage and processing should take place in a secure data repository protected by multi-layered network security architecture, and supported by system audits and controls. An additional step is to build in administrative safeguards and employee training.

Currently, the General Data Protection Regulation (GDPR) created by the European Union is stricter than U.S. privacy law, and therefore many data companies choose to follow the GDPR. Some also follow “Privacy by Design” practices, building privacy practices into their technical infrastructure and business operations.