Managing road safety through the use of linked data and heat maps

Road traffic injuries are a critical public health challenge that requires concerted efforts for effective and sustainable prevention. Worldwide, an estimated 1.2 million people are killed in road crashes each year and as many as 50 million are injured. An analysis of data provided by authoritative sources can be valuable for understanding which points on the road network are the most critical. The aim of this paper is to discover data about road accidents in Italy and to provide useful visualizations for improving road safety. Starting from the annual report on road accidents of the Automobile Club of Italy, we transform the original data into an RDF dataset according to the Linked Open Data principles and connect it to external datasets. Then, an integration with OpenStreetMap allows the accident data to be displayed on a map. Here, the final user is able to identify which road sections are most critical based on the number of deaths, injuries or accidents.


INTRODUCTION
Road traffic injuries claim more than 1.2 million lives each year worldwide and slightly over 25 thousand within the European Union 1. Road accidents are the leading cause of death among young people, and cost governments approximately 3% of GDP [14]. Of all the systems with which people have to deal every day, road traffic systems are the most complex and the most dangerous. The road death toll in 2014 was more than 23 times the total number of fatalities in rail and air transport combined. Increasing safety in transport is also one of the priorities identified by the European Commission for H2020 projects 2. Indeed, the European Commission has adopted an ambitious Road Safety Programme 3 which aims to cut road deaths in Europe in half between 2010 and 2020.
The risk of losing one's life in a road traffic accident is best expressed as the number of fatalities per billion vehicle-kilometers. Unfortunately, traffic performance data is not available for all EU countries. Therefore, the ratio per million inhabitants has been taken as a proxy by the EU Commission. The number of road fatalities per million inhabitants for each European country and the European average are reported in Figure 1. In Italy, road safety is in line with the European average (56.4 fatalities per million inhabitants in 2016, as reported in Figure 1). Road traffic accidents decreased by 17% between 2010 and 2015. However, from 2013 to 2015 this downward trend came to a standstill, as the total number of fatalities registered in 2014 and 2015 remained at the same level as in 2013 4.
The use of technology to improve safety is one major goal of future smart cities. Finding and displaying accurate road accident data, and showing its year-to-year evolution on a road map, can be a strategic decision-making tool for local public administrators to increase the safety of the road network.
The aim of the paper is twofold. First, we want to identify accurate and detailed data about Italian road accidents and deliver this information in a way that is machine readable, easily accessible and reusable. Linked Open Data (LOD) principles offer a solution for standardization and a method of publishing structured data using standard Web technologies. Second, we focus on providing good visualizations that allow a better understanding of the factors that affect crash occurrence and severity.
For this reason, we developed an application that displays street heat maps highlighting which road sections are more critical. On a map, users can easily deduce information such as whether motorways are safer than secondary roads, which streets are safer in a given area, which routes are safer when traveling from city A to city B, where administrations should focus to improve the quality of road infrastructure, and so on.
The implementation process has been divided into four steps: (1) data collection, (2) ontology creation, (3) integration, and (4) data visualization. Section 2 introduces related work, and Section 3 describes the datasets of interest. Section 4 describes the refinement of a road accident dataset according to the Linked Open Data principles. The integration with other datasets and the visualization of the information on street heat maps are presented in Sections 5 and 6, respectively. Section 7 reports our conclusions and insights for future improvements.

RELATED WORK
The need to explore and analyze transport data is reflected in several actions and research efforts in the computer science community. Smart mobility solutions have been developed with the aim of reducing congestion and fostering faster, greener and cheaper transportation options. Among these solutions, we find systems that use advances in wireless networking and sensor technologies for preventing or automatically detecting incidents [10, 11]. The huge amount of data about the movement of vehicles provided by GPS and RFID can also be used for creating interactive visual analytics dashboards [8], for monitoring road safety [6], for increasing the understanding of road accidents [4], and also for predicting road accidents based on historical data [7].
On the other hand, the production of RDF data requires capable tools. In the literature, we found the Linked Data Integration Framework (LDIF) 5 [12] and Open Data Clean Store (ODCS) [9], which are able to manage the entire process of cleaning, linking, assessing the quality and provenance of data, and providing aggregated views.

DATA COLLECTION
In our project, we first focused on identifying datasets of interest. A thorough analysis of the data quality and granularity of each available data source was conducted to verify whether it could contribute to the project. Two national datasets that collect information on road traffic fatalities were analyzed: ISTAT and ACI. LinkedGeoData and GeoNames were studied for the creation of links to external sources. Finally, OpenStreetMap was investigated to produce street heat map visualizations of the data.

The ISTAT statistics
In Italy, ISTAT, the Italian National Institute of Statistics (Istituto nazionale di statistica), collects data on all traffic accidents that occurred on roads open to public traffic, in which at least one person was killed or injured and at least one moving vehicle was involved. The police officers who arrive at the site of the accident are responsible for filling in the official structured form and sending it to ISTAT. The information collected by ISTAT includes the time and place of the accident, the characteristics of the vehicles involved, and the sex and age of drivers, injured passengers and pedestrians.
The ISTAT statistics are organized by region and province and cover the years 2001 to 2014. The main limitation of the dataset is the granularity of the data: the information about accidents is grouped by region. Therefore, it is not possible to have a precise view of the data, i.e. to know on which street the accidents occurred. This limitation is not present in the ACI dataset, which is described next.

The ACI open data
The ACI, Automobile Club of Italy (Automobile Club d'Italia), is a non-profit public organization, which institutionally represents and protects the general interests of Italian motoring.
ACI has a twofold mission:
• Protecting all aspects of mobility: by adapting to the needs of the automotive world, in all its environmental, social and economic forms. ACI provides protection, experience and professionalism to citizens in defending the right to mobility;
• Spreading a new car culture: by promoting and disseminating a new approach to mobility that highlights the responsibilities of each citizen and encourages ethical and sustainable ways of moving.
The ACI annual report shows the number of road accidents, deaths and injuries that occurred on each road at the provincial or municipal level, as well as other specific characteristics, such as the type of vehicles, the presence of heavy vehicles or two-wheelers, time slot, and month. These statistical data are available as ODS (OpenDocument Spreadsheet) files and are freely usable by anyone under the Creative Commons CC-BY 3.0 license 6. A temporal classification (by month, day, and time) and a spatial classification (by province, municipality and region) are also provided.
In Table 1, a sample of the ACI report is shown. The "cReg", "cProv" and "cCom" fields identify, respectively, the region, the province and the municipality according to the ISTAT classification, while the fields "Inc", "IM", "Mor" and "Fer" represent the number of accidents, fatal accidents, fatalities and injuries, respectively. These data depict the accident situation on each street within a municipality.
For the project, we used the spatial data classification of the ACI dataset available for the period 2011-2014. In particular, we focus our analysis on the Emilia-Romagna region, although the implemented process can be replicated for the entire dataset.

OpenStreetMap
OpenStreetMap (OSM) 7 is a collaborative project that aims to create a free editable map of the world. The maps are created by using data from portable GPS devices, aerial photography and other free sources. Registered users can upload and edit the vector data by using a number of editing tools developed by the OSM community.
The OSM dataset is available for downloading under a Creative Commons Attribution-ShareAlike 2.0 license.
The OSM data can be accessed, queried and edited through a REST API, which uses HTTP GET, PUT and DELETE requests. The REST API limits the amount of data that can be requested in a single query. We therefore used the open database geofabrik.de, a service that extracts and collects free geodata from OpenStreetMap, to extract the road network of the Emilia-Romagna region. In the Geofabrik database, the information about a country is collected in a single large file. The geographical information is described by a series of tags that identify the type of entity; for example, a road accessible by vehicles is identified by the "highway" tag. By using Osmfilter, a free tool provided by OSM, and by entering the latitudes and longitudes that characterize the Emilia-Romagna boundaries, we were able to extract the road network of the region of interest.
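The extraction step can be illustrated with a minimal sketch, assuming the OSM elements have already been parsed into plain Python dictionaries. The element layout, the helper names `in_bbox` and `extract_roads`, and the bounding-box coordinates are illustrative assumptions; this is not the Osmfilter implementation, nor the exact boundaries used in this work.

```python
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]

# Hypothetical bounding box (min_lat, min_lon, max_lat, max_lon); rough, illustrative values.
BBOX: BBox = (43.7, 9.2, 45.2, 12.8)


def in_bbox(lat: float, lon: float, bbox: BBox) -> bool:
    """Return True if the coordinate lies inside the bounding box."""
    min_lat, min_lon, max_lat, max_lon = bbox
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def extract_roads(ways: List[dict], nodes: Dict[int, Tuple[float, float]], bbox: BBox) -> List[dict]:
    """Keep the ways tagged "highway" that have at least one node inside bbox."""
    roads = []
    for way in ways:
        if "highway" not in way.get("tags", {}):
            continue  # not a road
        if any(in_bbox(*nodes[n], bbox) for n in way["nodes"] if n in nodes):
            roads.append(way)
    return roads
```

Given a dictionary of node coordinates and a list of ways, `extract_roads(ways, nodes, BBOX)` returns only the road elements that touch the area of interest.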

LinkedGeoData
LinkedGeoData 8 [13] uses the information collected by the OpenStreetMap project and makes it available as an RDF knowledge base according to the Linked Data principles. The LinkedGeoData dataset is a large spatial knowledge base derived from OpenStreetMap. The latest version was released in January 2016 with more than 1.2 billion triples based on the OpenStreetMap full planet file of November 2015.

GeoNames
The GeoNames 9 geographical database covers all countries in the world and contains over eleven million place names. The GeoNames database is accessible through various web services and is also available for download as a daily database export. It is free of charge under a Creative Commons Attribution license. It contains over 10 million geographical names and consists of over 9 million unique features, of which 2.8 million are populated places and 5.5 million are alternate names. Each feature of the database is represented as a web resource identified by a stable URI that provides access to either the HTML page or the RDF (Resource Description Framework) description of the feature. Features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes.
GeoNames integrates geographical data such as names of places in various languages, elevation, population and others from various sources. All lat/long coordinates are in the standard WGS84 (World Geodetic System 1984). Users may manually edit, correct and add new names using a user-friendly wiki interface.
Linking the ACI dataset to GeoNames is necessary in order to create a proper 5-star dataset.

REALIZING A 5-STARS DATASET
This section explains the modeling principles we followed in order to refine the ACI dataset according to the 5-star deployment scheme for the Linked Open Data proposed by Berners-Lee [3].
The first star requires making data available on the Web under an open license. The second star asks for the published data to be structured. The third star requires a non-proprietary open format. The fourth star demands the use of URIs to denote things. The fifth star requires linking the data to other datasets to provide context.
According to the star system, the original ACI dataset is rated three stars for its open license and its structured, non-proprietary format. In the new version, the use of URIs and the links to external data grant the fourth and fifth stars.
We call the new dataset "ACI 5-stars dataset". This is an RDF dataset that contains all the original statistical data published as open data by ACI.
In the first subsection, we describe the structure of the ACI statistics, which has been modeled through an ontology. In the second subsection, we explain the choice of links that connect the new dataset to the LOD cloud.

Ontology Creation
The data and vocabulary of the ACI 5-stars dataset are designed to facilitate use, reusability, and interoperability. To promote uptake, querying the data should be as straightforward as possible. For this reason, the core of the model is a direct translation of the structure of the statistics in the ACI dataset, and intuitive names are chosen for properties and classes.
To create the ontology, we used Protégé 10, a well-known ontology editor. The locations are organized according to the ISTAT classification of administrative units (regions, provinces, municipalities). The dataset contains the road accident statistics collected and published by ACI from 2011 to 2014.
As depicted in Figure 5, the classes defined in the ACI ontology are:
• "Place", which is a location connected to GeoNames and DBpedia via the "owl:sameAs" property. It has three subclasses: "Region", "Province" and "Municipality". The property "is_in" is used to connect two places; for example, the municipality of Modena "is_in" the province of Modena, which "is_in" the Emilia-Romagna region;
• "Street", which refers to a street in the ACI dataset;
• "Section", which refers to a street section in the ACI dataset. Each Section can be linked to a Way in LinkedGeoData. Since sections in the ACI dataset do not exactly match the instances of LinkedGeoData, we used the "skos:related" property to connect these instances;
• "Data", which collects the statistics related to a particular street, via the property "of_road", and to a specific municipality, via the property "in_municipality".
The metrics of the resulting ontology are shown in Figure 6. Currently, the ACI 5-stars dataset is stored locally 11; we are working to use OpenLink Virtuoso 12 as a triple store and SPARQL query service.
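As a rough illustration of how the "is_in" property chains places together, the sketch below represents a few triples as plain Python tuples and walks the hierarchy upward. The namespace `http://example.org/aci/`, the resource names and the helper functions are hypothetical, since the actual URIs of the dataset are not given in the text.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace; the real URIs of the ACI 5-stars dataset are not stated here.
ACI = "http://example.org/aci/"


def place_triples() -> List[Triple]:
    """A tiny example graph: a municipality inside a province inside a region."""
    return [
        (ACI + "Modena_municipality", "rdf:type", ACI + "Municipality"),
        (ACI + "Modena_province", "rdf:type", ACI + "Province"),
        (ACI + "Emilia-Romagna", "rdf:type", ACI + "Region"),
        (ACI + "Modena_municipality", ACI + "is_in", ACI + "Modena_province"),
        (ACI + "Modena_province", ACI + "is_in", ACI + "Emilia-Romagna"),
    ]


def containers(place: str, triples: List[Triple]) -> List[str]:
    """Follow "is_in" links upward from place, nearest container first."""
    parent = {s: o for s, p, o in triples if p == ACI + "is_in"}
    chain = []
    seen = {place}  # guards against accidental cycles
    while place in parent and parent[place] not in seen:
        place = parent[place]
        seen.add(place)
        chain.append(place)
    return chain
```

Calling `containers(ACI + "Modena_municipality", place_triples())` yields the province followed by the region, mirroring the Modena example above.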

Links to the LOD cloud
In order to achieve the fifth star, the new dataset must be connected to the LOD cloud with links to external knowledge sources. So far, we have linked each place in the dataset to the corresponding URI in GeoNames, using the owl:sameAs property to connect the instances. The linking process is not finished yet: we are searching for other datasets that might be good connections for the ACI 5-stars dataset.
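A link of this kind can be written as a single N-Triples statement. The following sketch is only an assumption about how such links might be generated: the place URI is hypothetical, the GeoNames identifier is a placeholder, and the `http://sws.geonames.org/<id>/` URI pattern is assumed rather than taken from the text.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Assumed URI pattern for GeoNames features.
GEONAMES_BASE = "http://sws.geonames.org/"


def same_as_ntriple(place_uri: str, geonames_id: int) -> str:
    """Serialize one owl:sameAs link to GeoNames as an N-Triples line."""
    return f"<{place_uri}> <{OWL_SAME_AS}> <{GEONAMES_BASE}{geonames_id}/> ."


# Placeholder identifier, not the real GeoNames id of any place.
line = same_as_ntriple("http://example.org/aci/Modena_municipality", 1234567)
```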

INTEGRATION
Visualizing road accident statistics as a street heat map is possible only if the ACI and OSM data are joined together. Several problems arose during the integration of the two datasets.
First, the ACI data consist of statistics on road sections within municipalities, while the street data in OSM are not split by municipality. To deal with this issue, we added polygons describing the municipal boundaries to the integration process (see Figure 4). We mapped each street in OSM to its municipalities, cut the street routes into portions according to the municipal boundaries, and assigned the respective municipality to each street portion. In the end, we obtained the OSM data enriched with the municipality feature.
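The idea of assigning street portions to municipalities can be sketched as follows. This is a simplification, not the procedure actually used: instead of cutting a route exactly at the boundary, each segment between two consecutive points is assigned to the municipality whose polygon contains the segment midpoint, using a standard ray-casting test. All names and the data layout are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray-casting test: count crossings of a horizontal ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_segments(
    street: List[Point], municipalities: Dict[str, List[Point]]
) -> List[Tuple[Tuple[Point, Point], Optional[str]]]:
    """Label each street segment with the municipality containing its midpoint."""
    result = []
    for (x1, y1), (x2, y2) in zip(street, street[1:]):
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        owner = next(
            (name for name, poly in municipalities.items() if point_in_polygon(mx, my, poly)),
            None,  # midpoint outside every known boundary
        )
        result.append((((x1, y1), (x2, y2)), owner))
    return result
```

Consecutive segments with the same label can then be grouped into one street portion per municipality.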
The second issue regards conflicts when mapping street codes from the ACI dataset to OSM. These inconsistencies were caused by renaming procedures that have affected some streets in Italy over the years: in the ACI dataset (last updated in 2014), some roads appear with an "old" identification name that differs from the one in OSM. Fortunately, for some streets, OSM reports the historical name in the "old_name" field. For each street in the ACI dataset, a Python script checked whether there were corresponding streets in OSM (by looking at both current and old street names) and, at the end, returned a list of "old" ACI roads. Some of them match the old street name in OSM (roughly 10%), while others are not present in OSM at all (roughly 5%). We manually resolved the missing streets by consulting additional sources such as Wikipedia and ISTAT.
Finally, the integration is completed by merging the ACI statistics with the geographic data of the OSM streets split by municipality. The ACI statistics are obtained by querying the ACI 5-stars dataset. Listing 2 shows the query used to extract the accident statistics of 2014. For each street code and municipality, it reports the number of accidents, fatal accidents, fatalities and injuries.
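The name check described above might look like the following minimal sketch. It is not the original script: the record layout, the normalization rule and the street names used in the usage note are hypothetical; only the "name" and "old_name" keys correspond to OSM tags mentioned in the text.

```python
from typing import Dict, Iterable, List


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so that trivial differences do not block a match."""
    return " ".join(name.lower().split())


def match_streets(aci_names: Iterable[str], osm_streets: List[dict]) -> Dict[str, str]:
    """Classify each ACI street name as 'current', 'old' or 'missing' with respect to OSM."""
    current = {normalize(s["name"]) for s in osm_streets if s.get("name")}
    old = {normalize(s["old_name"]) for s in osm_streets if s.get("old_name")}
    result = {}
    for name in aci_names:
        key = normalize(name)
        if key in current:
            result[name] = "current"
        elif key in old:
            result[name] = "old"  # renamed street, matched through old_name
        else:
            result[name] = "missing"  # needs manual resolution
    return result
```

For example, with OSM records `{"name": "SP 413", "old_name": "SS 413"}` and `{"name": "SS 9"}`, the ACI names "SS 9", "SS 413" and "SS 999" would be classified as current, old and missing, respectively.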
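The merge itself amounts to a join on the pair (street code, municipality). The sketch below illustrates this under assumed record layouts: the keys `street_code`, `municipality` and `geometry` are hypothetical, while "Inc", "IM", "Mor" and "Fer" are the ACI fields for accidents, fatal accidents, fatalities and injuries.

```python
from typing import Dict, List, Tuple

STAT_FIELDS = ("Inc", "IM", "Mor", "Fer")


def merge_stats(stats: List[dict], segments: List[dict]) -> List[dict]:
    """Attach ACI statistics to OSM street portions sharing street code and municipality."""
    index: Dict[Tuple[str, str], dict] = {
        (row["street_code"], row["municipality"]): row for row in stats
    }
    merged = []
    for seg in segments:
        row = index.get((seg["street_code"], seg["municipality"]))
        if row is None:
            continue  # no statistics available for this street portion
        merged.append({**seg, **{field: row[field] for field in STAT_FIELDS}})
    return merged
```

Each merged record keeps the geometry of the street portion and gains the four counters, which is the shape of data a heat map renderer needs.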

DATA VISUALIZATION
Information Visualization is defined as "The use of computer supported, interactive, visual representations of abstract data to amplify cognition" [5]. Heat maps use color as a data visualization tool. A street heat map is a map visualization in which the color of each street represents a third dimension of the data.
We developed our application 13 to visualize the most critical sections of the road network of the Emilia-Romagna region. The application, as shown in Figure 8, presents a map where a street's heat is a function of the number of road accidents that occurred on it. We implemented the visualization with Polymaps 14, a JavaScript library for making dynamic, interactive maps that can be displayed in browsers. With Polymaps, it is possible to set colors for routes and paths according to the data coming from the ACI dataset; the visualization is displayed in an HTML page containing a map of the Emilia-Romagna region. Figure 8 shows an example of a visualization obtained from the query shown in Listing 2. As can be seen, different road sections of the region's street network are drawn in different colors, ranging from yellow through orange to red to indicate an increasing rate of accidents on each road section.
Figure 7 shows how a particular area can be analyzed by using the four street heat maps of the corresponding statistics (number of accidents, fatal accidents, fatalities, injuries). By clicking on a colored street in the map (as shown in Figure 7, part (a)), the name of the street is shown. To aid understanding, we also report a map of the main roads around the city of Modena in Figure 9. This figure highlights the highway, the ring road and one of the main streets, called "Via Emilia", which crosses the city center. Looking at Figure 7, part (a), it is easy to recognize that there are fewer accidents on the highway and on "Via Emilia" than on the ring road around Modena. However, the number of fatal accidents (Figure 7, part (b)) is higher on the highway than on the ring road or "Via Emilia". This means that, even though accidents happen more often on the ring road than on the highway, they are less severe and the people involved are not seriously injured. Indeed, looking at parts (c) and (d) of the same figure, we see more fatalities on the highway, while the number of injured persons is higher on the ring road around the city.
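The mapping from a statistic to a street color can be sketched independently of the rendering library. The function below is an illustrative assumption, written in Python rather than with the Polymaps API: it splits the range between zero and the maximum observed value into three equal bands, which is one possible choice and not necessarily the thresholds used in the application.

```python
def heat_color(value: float, max_value: float) -> str:
    """Map a count to 'yellow', 'orange' or 'red' using three equal-width bands."""
    if max_value <= 0:
        return "yellow"  # nothing to compare against
    ratio = value / max_value
    if ratio < 1 / 3:
        return "yellow"
    if ratio < 2 / 3:
        return "orange"
    return "red"
```

With a maximum of 30 accidents on any section, a section with 1 accident would be drawn in yellow, one with 15 in orange, and one with 30 in red.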

CONCLUSION AND FUTURE WORK
This paper has reported the process of refining a road accident open dataset into a Linked Open Data version and a procedure for displaying these data as street heat maps. The produced ACI 5-stars dataset can be easily queried and integrated with other data thanks to its links to external knowledge sources. The heat map visualization application is a useful tool for identifying which road sections are most critical based on the number of deaths, injuries or accidents. It might be used by citizens and also by public authorities to analyze road accident issues and take appropriate action.
Much work remains to be done. The process should be applied to the entire national ACI dataset, in order to produce a complete dataset and to provide a national visualization of road accident statistics.
Moreover, the integration and visualization can be refined by adding useful indicators, such as the ratio of the number of fatalities per billion vehicle-kilometers or, in an approximation, if the traffic performance is not available at local level, the ratio per million inhabitants.
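Both indicators are simple normalizations of the fatality count. The sketch below shows the arithmetic; the function names are hypothetical and any input figures would have to come from the actual population and traffic performance data.

```python
def fatalities_per_million_inhabitants(fatalities: int, inhabitants: int) -> float:
    """Fatalities normalized by population, expressed per million inhabitants."""
    return fatalities * 1_000_000 / inhabitants


def fatalities_per_billion_vehicle_km(fatalities: int, vehicle_km: float) -> float:
    """Fatalities normalized by traffic performance, per billion vehicle-kilometers."""
    return fatalities * 1_000_000_000 / vehicle_km
```

For instance, 50 fatalities in an area with one million inhabitants correspond to 50 fatalities per million inhabitants.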
Exposing a SPARQL endpoint and providing navigation and visual querying functionalities would allow all kinds of users to query the ACI 5-stars dataset [1, 2].

Figure 1: Number of fatalities per million inhabitants in 2016 in EU28. Source: European Transport Safety Council (ETSC) annual PIN report. Year 2016.

Figure 2: Road fatalities in the EU since 2001. Source: CARE (EU road accidents database).

Figure 4: The integration and visualization of the ACI dataset.

Figure 7: Road accident statistics in the area around the city of Modena in 2014.

Figure 8: Accident rate visualizations on a map.

Figure 9: The main roads around the city of Modena.

• Ways are ordered sequences of nodes. Depending on whether the first node equals the last one, a way is called closed or open, respectively. Closed ways are used to represent buildings or land areas.
• Relations relate nodes, ways and potentially other relations to each other, thereby forming complex objects. Each entity participating in a relation plays a certain role in it. Examples of relations are multipolygons.