Public Transportation
Challenge provided by LxDataLab

Determining the main mobility flows in Lisbon based on mobile device data

Predicting traffic flows due to events can improve public transportation availability and aid in reducing rush hour traffic by incentivizing commuters to travel during off-peak periods using rewards and personalized offers.

Over the last decades, the city of Lisbon has lost inhabitants from its downtown to the surrounding metropolitan area. From the early 1980s until 2017, the city's population decreased from 800,000 to 500,000. At the same time, car use in daily commuting between the city and the metropolitan area clearly increased.

This has resulted in an overload of the road network and parking spaces in the city and a decrease in safety and quality of life for the city's inhabitants and users. By 2017, the use of public transport had decreased from about 46% of journeys that start and end in the city to only 22%.

Reversing the current modal split, freeing up public space for citizens, and ensuring convergence with the goals of the Paris Agreement, namely carbon neutrality by 2050, is the biggest challenge for Lisbon's mobility policies. A change of paradigm is therefore necessary.

The city of Lisbon proposed this challenge to better understand and visualize how people move between grid cells during rush hours (7:00 AM to 10:00 AM and 5:00 PM to 8:00 PM), to obtain a model that can predict those movements, and to identify potential interventions that improve the commuting experience of people in Lisbon and favor sustainable modes of mobility.

For this challenge, the LxDataLab team stressed that a predictive model was the core desired outcome, while also welcoming both broader solutions and more specialized solutions to individual aspects of the problem.


The goal was to extract inputs that could support the planning and execution of the actions needed to improve mobility in the city of Lisbon and the quality of life of its citizens, and to meet sustainability goals.

United Nations SDG 

GOAL 11: Sustainable Cities and Communities

  • Target 11.2: Provide access to safe, affordable, accessible, and sustainable transport systems for all.


The following datasets were provided to the participants:

  • Number of mobile phones entering, remaining in, and exiting each 200 m × 200 m grid square per 15-minute interval - Lisbon City Grid (September to November 2022)
  • Number of mobile phones entering and exiting the city every 15 minutes on the 11 main axes of entry into the city of Lisbon - Axes of the city of Lisbon (September to November 2022)
  • Identification of the 11 points of entry and exit of Lisbon
  • Data on the road network of the city of Lisbon
  • Traffic level data (from the Waze platform) and traffic conditions
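As a minimal sketch of how the 15-minute grid counts might be restricted to the challenge's rush-hour windows (7:00 AM to 10:00 AM and 5:00 PM to 8:00 PM), the Python snippet below sums entering and exiting counts per grid cell. The record layout `(grid_id, interval_start, entering, exiting)` and the function names are hypothetical rather than the provided datasets' actual schema, and treating each window's end as exclusive is an assumption.

```python
from datetime import datetime, time

# Rush-hour windows from the challenge brief. The end of each window is
# treated as exclusive (an assumption), so 9:45 is the last morning slot.
RUSH_HOURS = ((time(7, 0), time(10, 0)), (time(17, 0), time(20, 0)))


def is_rush_hour(interval_start: datetime) -> bool:
    """Return True if a 15-minute interval starts inside a rush-hour window."""
    t = interval_start.time()
    return any(start <= t < end for start, end in RUSH_HOURS)


def rush_hour_totals(records):
    """Sum entering/exiting counts per grid cell over rush-hour intervals.

    `records` is an iterable of hypothetical
    (grid_id, interval_start, entering, exiting) tuples.
    """
    totals = {}
    for grid_id, interval_start, entering, exiting in records:
        if not is_rush_hour(interval_start):
            continue  # skip off-peak intervals
        prev_in, prev_out = totals.get(grid_id, (0, 0))
        totals[grid_id] = (prev_in + entering, prev_out + exiting)
    return totals


if __name__ == "__main__":
    sample = [
        ("cell_a", datetime(2022, 9, 5, 8, 0), 120, 40),
        ("cell_a", datetime(2022, 9, 5, 12, 0), 80, 75),  # midday: ignored
        ("cell_a", datetime(2022, 9, 5, 17, 15), 30, 110),
    ]
    print(rush_hour_totals(sample))  # {'cell_a': (150, 150)}
```

With real data, the same filter would typically be applied in a dataframe library; plain tuples keep the sketch dependency-free.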


This challenge came with a collection of large, highly granular, and well-documented datasets that already offered many possibilities for different approaches and data products. Most teams worked on a selection of the datasets provided and combined it with relevant publicly available external data to gain more insight into specific issues.

Several teams supplemented the datasets with data from Lisbon’s Open Data portal on public transport, road networks, and environmental and meteorological conditions. One team added bus data (GTFS) as a proxy for public transportation, another team used data on the cycling infrastructure from OpenStreetMap, and yet another team collected points of interest in Lisbon, such as restaurants, from Google Maps.

Methods and Techniques

One team decided to focus on traffic jams and benchmarked several time-series models on custom metrics they had derived from the provided datasets. This team defined a Flow metric measuring the movement of individuals within grid cells in Lisbon and a Jam level metric assessing traffic intensity within grid cells. The models this team evaluated for predicting traffic jams included the naive Last Value Repeating and Last Cycle Repeating benchmarks, a neural network built from dense layers, and a recurrent neural network (RNN) with Long Short-Term Memory (LSTM) layers trained on the whole history of the training data.
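The two naive benchmarks named above are simple to state in code. The sketch below is an illustration, not the team's implementation: it assumes a one-dimensional series of a jam metric sampled every 15 minutes (so one daily cycle would be 96 steps) and scores forecasts with mean squared error (MSE) and mean absolute error (MAE). The function names and the toy series are hypothetical.

```python
import numpy as np


def last_value_forecast(history: np.ndarray, horizon: int) -> np.ndarray:
    """Last Value Repeating: carry the latest observation forward."""
    return np.full(horizon, history[-1], dtype=float)


def last_cycle_forecast(history: np.ndarray, horizon: int, period: int) -> np.ndarray:
    """Last Cycle Repeating: replay the most recent full cycle (e.g. one day)."""
    if len(history) < period:
        raise ValueError("history must contain at least one full cycle")
    last_cycle = np.asarray(history[-period:], dtype=float)
    repeats = -(-horizon // period)  # ceiling division
    return np.tile(last_cycle, repeats)[:horizon]


def mse(y_true, y_pred) -> float:
    """Mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


if __name__ == "__main__":
    # Hypothetical jam-level series: 15-minute steps, so one day = 96 steps.
    steps_per_day = 96
    t = np.arange(steps_per_day * 7)
    series = 1 + np.sin(2 * np.pi * t / steps_per_day)  # toy daily cycle
    train, test = series[:-steps_per_day], series[-steps_per_day:]
    for name, forecast in [
        ("last value", last_value_forecast(train, len(test))),
        ("last cycle", last_cycle_forecast(train, len(test), steps_per_day)),
    ]:
        print(name, "MSE", round(mse(test, forecast), 3), "MAE", round(mae(test, forecast), 3))
```

On a strongly periodic series, the cycle-repeating benchmark is hard to beat, which is why it is a useful baseline for rush-hour data.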

Understanding peak rush-hour movement patterns was the focus of another team. They used the OpenCV library to analyze movement between grid cells with the Wasserstein distance, also known as the earth mover’s distance (EMD), inspired by Balzotti et al. (2018). Their predictive model of choice was NeuralProphet, a hybrid forecasting framework based on PyTorch. The same team also used a LISA (Local Indicators of Spatial Association) / Local Moran map to detect spatially extended clusters and diagnose local instability (Figure 3) in the number of distinct terminals per grid square during rush hour.
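The team relied on OpenCV for the earth mover’s distance. To make the metric concrete without depending on OpenCV, the sketch below swaps in SciPy's `linprog` and solves the underlying transportation problem directly: it measures how far, in metres, the distribution of phones in one grid snapshot must be moved to match another. Normalizing both snapshots to unit mass and using Euclidean distances between 200 m cell centres are assumptions made for illustration, and the dense formulation only suits small grids.

```python
import numpy as np
from scipy.optimize import linprog

CELL_SIZE_M = 200.0  # side length of one grid square in metres


def grid_emd(before: np.ndarray, after: np.ndarray, cell_size: float = CELL_SIZE_M) -> float:
    """Earth mover's distance (in metres) between two grids of phone counts."""
    if before.shape != after.shape:
        raise ValueError("both snapshots must use the same grid")
    a = np.asarray(before, dtype=float).ravel()
    b = np.asarray(after, dtype=float).ravel()
    if a.sum() <= 0 or b.sum() <= 0:
        raise ValueError("each snapshot needs a positive total count")
    # Normalize to unit mass so supply equals demand (balanced transport problem).
    a = a / a.sum()
    b = b / b.sum()
    # Ground distance: Euclidean distance between cell centres.
    rows, cols = np.indices(before.shape)
    centres = np.column_stack([rows.ravel(), cols.ravel()]) * cell_size
    cost = np.linalg.norm(centres[:, None, :] - centres[None, :, :], axis=-1)
    n = a.size
    # Variable f[i, j] (flattened row-major) is the mass moved from cell i to cell j.
    a_eq = np.vstack([
        np.kron(np.eye(n), np.ones(n)),  # row sums: mass leaving cell i equals a[i]
        np.kron(np.ones(n), np.eye(n)),  # column sums: mass arriving in cell j equals b[j]
    ])
    b_eq = np.concatenate([a, b])
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return float(result.fun)


if __name__ == "__main__":
    morning = np.array([[4, 0, 0], [0, 0, 0], [0, 0, 0]])
    evening = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 4]])
    print(grid_emd(morning, evening))  # ≈ 565.7: all mass moves diagonally two cells
```

A small EMD between consecutive snapshots indicates little redistribution of people; a large one flags intervals with strong movement across the city.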

A third team focussed on predicting the key flows of people in the city of Lisbon using an attention temporal graph convolutional network (A3T-GCN). They chose this model because it combines graph convolutions, which capture spatial relationships, with gated recurrent units (GRUs), a layer commonly used to model complex temporal dynamics.
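For readers unfamiliar with graph convolutions, the sketch below shows only the spatial building block, using the standard propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W) on a grid whose cells are linked to their four direct neighbours. The grid adjacency, toy features, and weight matrix are hypothetical. In an A3T-GCN, graph-convolved features like these feed the GRU gates, and an attention layer then weights the hidden states across time steps; neither part is implemented here.

```python
import numpy as np


def grid_adjacency(rows: int, cols: int) -> np.ndarray:
    """Adjacency matrix linking each grid cell to its four direct neighbours."""
    n = rows * cols
    adj = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:  # neighbour below
                j = (r + 1) * cols + c
                adj[i, j] = adj[j, i] = 1.0
            if c + 1 < cols:  # neighbour to the right
                j = i + 1
                adj[i, j] = adj[j, i] = 1.0
    return adj


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(adj: np.ndarray, features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(normalized_adjacency(adj) @ features @ weights, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = grid_adjacency(3, 3)                # 9 grid cells
    features = rng.poisson(50, size=(9, 2))   # toy counts: entering, exiting
    weights = rng.normal(size=(2, 4))         # hypothetical learned weights
    print(graph_conv(adj, features.astype(float), weights).shape)  # (9, 4)
```

Each output row mixes a cell's own features with those of its neighbours, which is what lets the model learn spatial spill-over between adjacent grid cells.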

Main Insights from Data

One team focussed on predicting the overall number of traffic jams over time, which they visualized using plotly (Figure 1).

Figure 1 - Visualization of the number of traffic jams in Lisbon over time.

They noted the cyclical impact of rush hours on traffic, reflected in the strong performance of the Last Cycle Repeating benchmark, and found that, given the available data, the most successful approach was to use only time-based features with an LSTM model (Figure 4). This model reached an MSE loss of 0.24 and an MAE of 0.31, clearly outperforming the baseline models.
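The write-up does not specify which time-based features were used. As one common option, offered here only as an assumption, a timestamp can be encoded cyclically with sine and cosine terms so that 23:45 and 00:00 end up close together in feature space. The function and feature names below are hypothetical.

```python
import math
from datetime import datetime


def time_features(ts: datetime) -> dict[str, float]:
    """Hypothetical cyclical time features for one 15-minute timestamp."""
    minutes = ts.hour * 60 + ts.minute
    day_angle = 2 * math.pi * minutes / (24 * 60)  # position within the day
    week_angle = 2 * math.pi * ts.weekday() / 7    # Monday = 0
    return {
        "tod_sin": math.sin(day_angle),
        "tod_cos": math.cos(day_angle),
        "dow_sin": math.sin(week_angle),
        "dow_cos": math.cos(week_angle),
        "is_weekend": float(ts.weekday() >= 5),
    }


if __name__ == "__main__":
    # 23:45 and 00:00 are one step apart, and the encoding keeps them close.
    late = time_features(datetime(2022, 9, 5, 23, 45))
    midnight = time_features(datetime(2022, 9, 6, 0, 0))
    print(round(late["tod_cos"], 3), round(midnight["tod_cos"], 3))
```

Feature vectors like these, computed for each step of an input window, are the kind of input an LSTM could consume alongside past values of the target metric.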

The team focussing on rush-hour movement patterns performed extensive exploratory data analysis and visualization using hvplot and geopandas in addition to plotly (Figure 2).

Figure 2 - Heatmap of the average number of distinct terminals in the grid over time.

Figure 3 - A Local Moran map of Lisbon using the LISA cluster analysis on the density of distinct mobile phone terminals in a grid during a typical rush hour. Red areas show “high-high” groupings, meaning these grids showed high values and were surrounded by high values; conversely, blue areas are “low-low” groupings, areas with few mobile phones surrounded by areas with few mobile phones.
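The high-high and low-low classification in the caption can be reproduced in simplified form. The NumPy sketch below computes local Moran's I with row-standardized rook weights (each cell's four direct neighbours) and labels every cell HH, LL, HL, or LH. Unlike a full LISA analysis, it skips the permutation-based significance test, so it labels every cell rather than only statistically significant clusters; libraries such as PySAL's `esda` provide the complete procedure. The toy density grid is hypothetical.

```python
import numpy as np


def local_moran(grid: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Local Moran's I and quadrant labels for a 2-D grid of values.

    Uses row-standardized rook weights; no significance testing.
    """
    x = np.asarray(grid, dtype=float)
    z = x - x.mean()          # deviations from the global mean
    m2 = np.mean(z ** 2)      # variance used for scaling
    if m2 == 0:
        raise ValueError("grid values are constant; Moran's I is undefined")
    rows, cols = x.shape
    lag = np.zeros_like(z)    # spatial lag: mean deviation of the neighbours
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                z[rr, cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
            lag[r, c] = np.mean(neighbours)
    local_i = z * lag / m2
    # Quadrant of the Moran scatterplot: own value vs. neighbourhood value.
    labels = np.where(z > 0, np.where(lag > 0, "HH", "HL"),
                      np.where(lag > 0, "LH", "LL"))
    return local_i, labels


if __name__ == "__main__":
    density = np.zeros((4, 4))
    density[:2, :2] = 10  # toy cluster of busy cells in one corner
    _, quadrant = local_moran(density)
    print(quadrant)
```

In a map like Figure 3, the HH cells would be drawn red and the LL cells blue.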

In their EDA, this team determined that the Avenidas Novas, Arroios, São Domingos de Benfica, Santo António, and Parque das Nações neighborhoods show the highest density of people during rush hours, and that the highways with the most incoming and outgoing traffic were the IC19, IC2, and IC16. Traffic jam occurrence generally correlated with areas of high terminal density. Overall, the flow of people appeared to be directed to and from the central area of Lisbon and seemed to stem mainly from Portuguese phone numbers during common commuting times.


The products proposed by teams were as diverse as their approaches.

One team designed a tool for urban planning professionals that aims to predict high traffic impact (Figure XX) and the possible effects of changes in transportation infrastructure and strategies. Users with no modeling experience would be able to quickly assess and visualize the impact of proposed traffic interventions, guiding decision-making and governance.

Figure 4 - Testing predictions for traffic jams in Lisbon with an LSTM model.

Another team proposed an analytics web app, “Peak Analytics”, to help forecast the movement of people during events that commonly increase traffic, such as music festivals or soccer games. The product was aimed at government officials, allowing them to compare plans with past events and obtain predictions of expected traffic and air quality when planning the reallocation of buses and deciding when to incentivize commuters to travel off-peak by public transport.

Another team, focussing on analyzing existing traffic flows, proposed a publicly available dashboard to assist with individual commute planning, giving the citizens of Lisbon agency and the tools to avoid high-flow times and areas. They also floated the idea of gamification: incorporating their insights into a traffic app that gives points to users who choose to travel on less crowded roads.

Social Impact

Following the city of Lisbon’s overall goal of promoting more sustainable modes of transportation and improving the quality of life of its citizens, the proposed products shared a common aim: a shift from individual to public transportation and a reduction of traffic-related friction, such as traffic jams, in the city.

One team elaborated that predicting traffic jams could aid in planning the intensification of public transit routes. According to the team, targeted public transit could reduce CO2 emissions by 20% per passenger-kilometer. Additionally, a predictive model could help identify the ideal locations for traffic flow measures, reducing the time spent in jams and CO2 emissions while improving quality of life and air quality.

Predicting event-driven traffic flows with a tool like “Peak Analytics” could lead both to better-targeted extra public transportation and to a better understanding of rush-hour traffic load in general, helping to design campaigns that shift commuters to off-peak travel with incentives such as random rewards and personalized offers.

A publicly available dashboard or app could help Lisbon residents and tourists make informed decisions that make their commutes more efficient and reduce the travel time spent by each individual user.

Open-source code
