Internationales Verkehrswesen
iv
0020-9511
expert verlag Tübingen
10.24053/IV-2023-0093
101
2023
75Collection
Unlocking the potential of Google’s mobility data
Benno Benjamin Bock
Robert Schönduwe
The widespread adoption of smartphones has facilitated the collection of multimodal mobility data. Google Location History (GLH) has gained considerable popularity and has a large user base. This article discusses the importance of GLH data and illustrates its value by identifying specific use cases. It also presents ongoing initiatives in which individuals donate GLH data for research purposes. In particular, the adequacy of the collected data is validated, demonstrating their reliability and suitability for rigorous analyses.
International Transportation | Collection 2023 | BEST PRACTICE | Mobility data
Keywords: Tracking, Travel survey, Mobility demand, Google location history

“Our mission is to organise the world’s information and make it universally accessible and useful” [1]: this is Google’s mission statement. One domain with a pressing need for high-quality information is the transportation sector. Smartphones enable the continuous collection of multimodal travel data, which makes them a valuable tool. Google in particular collects such data, as evidenced by its Timeline function. Meanwhile, transportation analysis still lacks reliable and representative knowledge of individual mobility patterns. Traditionally, such data is mainly collected through irregular surveys using diverse questionnaire-based approaches. [2] Smartphone-based surveys have been conducted in only a few cases. [3] It is therefore crucial to explore the potential of the treasure trove of mobility data held by Google (and other big tech companies). This paper showcases data donation approaches and provides an initial assessment of the data quality.

Continuous GPS-based data collection, exemplified by monomodal methods such as floating car data, demonstrates the effectiveness of passive data collection. [4] Companies like Inrix and TomTom show how such data can be made accessible to a wider group of practitioners.
Mobile network data extends coverage to further trips, but its mode detection comes with high uncertainty. [5] However, achieving the objectives of the mobility transition (‘Verkehrswende’ in German) requires the continuous collection of multimodal data covering all transport modes. A multimodal mobility monitor is needed to gather and process such data and to make it universally accessible and usable. In principle, such a monitor would align seamlessly with Google’s mission statement and could likely be built on its databases.

On big tech’s trail

Google’s data has vast potential for providing comprehensive information on mobility demand. It offers detailed insights into individual mobility, including origins, destinations, routes and mode types. This data can be used to compute key performance indicators such as modal split values for cities (cf. Table 1) or origin-destination tables. Moreover, the spatial and temporal coverage of Google’s data is unparalleled: given the continuous and widespread availability of the Google Maps Timeline function, it is likely the most comprehensive source of actual mobility demand data available.

Currently, access to Google’s mobility data for research and planning purposes is limited. The Covid-19 open data repository, available until 2022, provided some mobility data. [6] Furthermore, the Google Environmental Insights Explorer [7] offers modal split data for various cities worldwide from 2018 to 2022. However, access to this data is restricted to council employees and requires an application process. Given this limited accessibility, what can we say about data collection and quality? Many readers of this article probably have a Google account and can access their personal mobility timeline via Google Maps. Surveys suggest that approximately
50 % of smartphone users have Google Location History (GLH) activated. [8] When the Google Maps app is installed and location history recording is enabled, it continuously captures movement profiles linked to the user’s Google account. The data moves with the account, ensuring continuity when switching devices or logging in on different devices. GLH data offers detailed and accurate insights into mobility demand collected via smartphone. In principle, GLH data can be considered continuous and comprehensive tracking, comparable to dedicated smartphone tracking apps. [9, 10, 11] It serves as a valuable basis for generating movement profiles and travel/activity diaries.

Users access their data through Google Maps, either via the mobile app (see Figure 1) or the website (see Figure 2). The data is presented as a personalised travel diary showing activities, locations and routes. Users can also view their trajectories on maps and in tabular form, showing origins, destinations, routes and any intermediate stops such as transit stations. Additional insights are available through tabs like ‘Statistics’ and ‘Places’, which give overviews of the distances travelled by mode type or of the locations visited.

GLH data has been collected globally since 2012 [12], primarily from Android users through an opt-out mechanism. In Germany, explicit opt-in consent is required for data collection. Users give this consent when installing Google Maps or when adjusting the settings on their smartphones or Google accounts. The comparatively long lifespan of the service lets users access data from the previous decade, a time before services like Uber, e-scooters and the Deutschlandticket were introduced in Germany. Users can export their GLH data via Google Takeout, a website designed specifically for exporting Google data.
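To illustrate what working with a Takeout export can look like, the following Python sketch flattens one monthly file of the exported semantic location history into movement legs. It is a minimal sketch under stated assumptions, not Google’s specification: it assumes the legacy export layout documented by Bergillos [13], i.e. a top-level `timelineObjects` list whose `activitySegment` entries carry an `activityType`, a `distance` in metres, a `duration` with start and end timestamps, and coordinates stored as integers scaled by 10^7 (`latitudeE7`, `longitudeE7`). Google has changed this format over time, so field names should be checked against an actual export; `Segment`, `load_segments` and the sample record are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class Segment:
    """One movement leg from the (assumed) semantic location history layout."""
    activity_type: str                # Google 'activity type', e.g. "CYCLING"
    start: datetime                   # leg start (UTC)
    end: datetime                     # leg end (UTC)
    distance_km: float                # distance reported by Google, in km
    origin: tuple[float, float]       # (lat, lon) in degrees
    destination: tuple[float, float]  # (lat, lon) in degrees


def _parse_time(duration: dict[str, Any], key: str) -> datetime:
    # Assumption: newer files use ISO strings ("startTimestamp"),
    # older ones epoch milliseconds ("startTimestampMs").
    if key in duration:
        return datetime.fromisoformat(duration[key].replace("Z", "+00:00"))
    return datetime.fromtimestamp(int(duration[key + "Ms"]) / 1000, tz=timezone.utc)


def _latlon(loc: dict[str, Any]) -> tuple[float, float]:
    # Coordinates are assumed to be integers scaled by 1e7 ("E7" format).
    return loc["latitudeE7"] / 1e7, loc["longitudeE7"] / 1e7


def load_segments(monthly_json: str) -> list[Segment]:
    """Flatten the activity segments of one monthly export file (hypothetical helper)."""
    data = json.loads(monthly_json)
    segments = []
    for obj in data.get("timelineObjects", []):
        seg = obj.get("activitySegment")
        if seg is None:  # skip 'placeVisit' entries (stays at a location)
            continue
        segments.append(Segment(
            activity_type=seg.get("activityType", "UNKNOWN_ACTIVITY_TYPE"),
            start=_parse_time(seg["duration"], "startTimestamp"),
            end=_parse_time(seg["duration"], "endTimestamp"),
            distance_km=seg.get("distance", 0) / 1000,
            origin=_latlon(seg["startLocation"]),
            destination=_latlon(seg["endLocation"]),
        ))
    return segments


# Made-up record in the assumed layout: one cycling leg and one place visit.
example = """{"timelineObjects": [
  {"activitySegment": {
     "startLocation": {"latitudeE7": 525200000, "longitudeE7": 134050000},
     "endLocation": {"latitudeE7": 525000000, "longitudeE7": 134200000},
     "duration": {"startTimestamp": "2022-09-01T07:30:00.000Z",
                  "endTimestamp": "2022-09-01T07:45:00.000Z"},
     "distance": 2500, "activityType": "CYCLING"}},
  {"placeVisit": {}}
]}"""
segs = load_segments(example)
print(len(segs), segs[0].activity_type, segs[0].distance_km)  # 1 CYCLING 2.5
```

Keeping the parser tolerant (skipping place visits, accepting both timestamp variants) is a deliberate choice, since Google may change the export format without notice.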
The exported GLH data is spread across multiple JSON files comprising over 100 attributes and 38 mode types (referred to as ‘activity types’ by Google). Bergillos provides a comprehensive overview of the export format. [13]

GLH data use cases

GLH data, containing geospatial information, time data and modes of transport, holds significant potential for transport policy and planning. However, discussion of aggregated GLH data is currently confined mainly to research and academia. To assess the data’s potential, it is essential to distinguish between three levels: (1) individual GLH datasets, (2) aggregated GLH data from sample populations, and (3) hypothetical aggregated GLH data from the entire user population. This distinction clarifies the data’s scope and its implications for research and analysis in the field.

The use cases relevant to this audience include urban and transportation planning as well as the social sciences. The data makes it possible to decode traffic patterns, investigate congestion hotspots and monitor travel behaviour, and it deepens our understanding of how people move and live in urban and rural spaces. Ruktanonchai et al. also highlight its potential for fighting infectious diseases and responding to catastrophic events. [8] Commercial use in market research also seems logical, as the data can provide insights into consumer behaviour, shopping patterns, and the location or mobility preferences of specific target groups. This is the most likely path for Google to generate added value in-house.

Small-scale projects showcase the insights that can be derived from individual GLH data and offer a glimpse of its possibilities. [14, 15] Others have provided technical assistance in formatting and processing the data, such as the GitHub user GmoncayoCodes. [16] Just a few years after GLH data became available, attention turned to how smartphone tracking could establish an individual mobility feedback system. [17]

Figures 3 and 4 show the potential of the data for longitudinal surveys of mobility patterns. The left chart shows all trips made by one of the authors over the past year, categorised by the four main mode types. It reveals mixed use of all four modes and a seasonal preference for cycling, with a minimum in December. Notably, trip volume drops significantly in September 2022, corresponding to a Covid-19-related quarantine period. This example supports the assumption that the data can be used to estimate the impact of the recent pandemic on mobility. The right chart presents the same data as passenger-kilometres per mode type and fluctuates more overall.

Figure 1: Screenshot of Google’s Timeline on an Android system (sources of all pictures: Catchment 2023)

Similar to studies involving data donations from tracking [18, 19, 20], GLH data can be collected from a group of individuals willing to share their data. In German mobility research, there are very few published studies of this kind. The authors conducted proof-of-concept studies in collaboration with a local transport authority and with a university class. The resulting GLH datasets allowed us to address important questions: How did the sample react to certain events? What does the use of transport modes look like in certain areas or at certain times? How can the accessibility and transport connections of places be described? The sample size can be tailored to the research question: small surveys provide insights into specific sites or behaviours, while large (representative) studies offer general insights such as the development of the modal split. Aggregating GLH data from consenting users can yield robust mobility-related KPIs.
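The trip and passenger-kilometre charts in Figures 3 and 4, and KPIs such as the modal split in Table 1, require collapsing Google’s 38 activity types into a few main modes. The Python sketch below shows one way to do this for the four modes of Table 1 (car, pedestrian, bicycle, PT) and to pool trips across consenting participants. The mapping `MAIN_MODE` is a hypothetical, deliberately incomplete assumption (type names as they typically appear in exports; anything unmapped is counted as ‘other’), the trips are made up, and this is neither the authors’ nor Google’s aggregation method.

```python
from collections import Counter, defaultdict

# Hypothetical, incomplete mapping from Google activity types to the four
# main modes of Table 1. Types not listed here are counted as "other".
MAIN_MODE = {
    "IN_PASSENGER_VEHICLE": "car",
    "WALKING": "pedestrian",
    "RUNNING": "pedestrian",
    "CYCLING": "bicycle",
    "IN_BUS": "pt",
    "IN_TRAM": "pt",
    "IN_SUBWAY": "pt",
    "IN_TRAIN": "pt",
}


def monthly_by_mode(trips):
    """Aggregate trips given as (month 'YYYY-MM', activity_type, distance_km).

    Returns trip counts and kilometres (= passenger-km for one person),
    both keyed by (month, main mode), as plotted in Figures 3 and 4.
    """
    counts = Counter()
    pkm = defaultdict(float)
    for month, activity_type, km in trips:
        mode = MAIN_MODE.get(activity_type, "other")
        counts[(month, mode)] += 1
        pkm[(month, mode)] += km
    return counts, dict(pkm)


def modal_split(values_by_mode):
    """Percentage share per mode from {mode: trips or km}; empty input -> {}."""
    total = sum(values_by_mode.values())
    if total == 0:
        return {}
    return {mode: 100 * value / total for mode, value in values_by_mode.items()}


def sample_modal_split(participants):
    """Trip-based modal split pooled over {participant_id: list of trips}."""
    pooled = Counter()
    for trips in participants.values():
        for _month, activity_type, _km in trips:
            pooled[MAIN_MODE.get(activity_type, "other")] += 1
    return modal_split(pooled)


# Made-up trips of a single participant.
trips = [
    ("2022-09", "CYCLING", 4.0),
    ("2022-09", "WALKING", 1.0),
    ("2022-09", "IN_BUS", 6.0),
    ("2022-10", "IN_PASSENGER_VEHICLE", 20.0),
]
counts, pkm = monthly_by_mode(trips)
print(counts[("2022-09", "bicycle")], pkm[("2022-10", "car")])  # 1 20.0
print(sample_modal_split({"p1": trips}))
# {'bicycle': 25.0, 'pedestrian': 25.0, 'pt': 25.0, 'car': 25.0}
```

The pooled split weights every trip equally, so frequent travellers dominate; a representative estimate would additionally require weighting the sample, which this sketch does not attempt.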
Such aggregation has the potential to address challenges such as revenue sharing for the Deutschlandticket on a nationwide scale, as demonstrated in the xMND research project with mobile network data. [21]

Our approach to gathering GLH data

Currently, external access to Google’s GLH data is challenging. A potential workaround is to design a data donation process. The GLH data donation procedure can be divided into the following steps:
1. Survey design, including a data protection concept
2. Recruitment and onboarding of participants
3. Preparation of GLH data collection
4. Optional: definition of a survey period
5. Export of GLH data
6. Evaluation and further analysis of GLH data

For this approach, developing a data protection concept in collaboration with a data protection officer is crucial. A participant declaration that complies with data protection requirements must be prepared. Once these prerequisites are met, the recruitment and onboarding of participants can begin. If the optional survey period is chosen, participants are asked to adjust their Google settings for that period. If needed, the users’ settings can be configured in a joint workshop.

Participant engagement is crucial for the success of the study. Given the numerous services and functionalities in the Google ecosystem, clear instructions are necessary for user interactions such as activation or data export. Separate guidance for Android and iOS users on enabling the location history function is also important. Once the function is activated, users can easily access the generated data via the Google Maps app, which offers simple menu navigation and a user-friendly experience. Visualising and editing proposed routes is also straightforward, although the complexity of the editing options may require additional instructions for survey participants to ensure consistent data quality.
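For step 4 (optional survey period) and step 6 (evaluation), a useful first check after export is whether each participant actually recorded data during the survey period. The Python sketch below filters donated legs to the survey window and reports the share of survey days with at least one recorded leg. It assumes the legs have already been parsed from the exported JSON files into (start time, activity type, distance) records; the `Leg` type, the function names and the 50 % threshold are hypothetical choices for illustration, not part of the procedure described above.

```python
from datetime import date, datetime
from typing import NamedTuple


class Leg(NamedTuple):
    """One donated movement leg (hypothetical, simplified record)."""
    start: datetime     # leg start time, ideally already in local time
    activity_type: str  # Google activity type, e.g. "CYCLING"
    km: float           # distance in kilometres


def within_survey_period(legs: list[Leg], first: date, last: date) -> list[Leg]:
    """Keep legs whose start date lies in the inclusive window [first, last]."""
    return [leg for leg in legs if first <= leg.start.date() <= last]


def coverage(legs: list[Leg], first: date, last: date) -> float:
    """Share of survey days on which at least one leg was recorded."""
    n_days = (last - first).days + 1
    days_with_data = {leg.start.date() for leg in within_survey_period(legs, first, last)}
    return len(days_with_data) / n_days


def flag_low_coverage(participants: dict[str, list[Leg]], first: date, last: date,
                      threshold: float = 0.5) -> list[str]:
    """Return IDs of participants whose day coverage falls below `threshold`."""
    return sorted(pid for pid, legs in participants.items()
                  if coverage(legs, first, last) < threshold)


# Made-up example: one participant, one survey week.
legs = [
    Leg(datetime(2023, 5, 2, 8, 0), "CYCLING", 3.2),
    Leg(datetime(2023, 5, 2, 17, 30), "CYCLING", 3.1),
    Leg(datetime(2023, 5, 4, 9, 15), "IN_BUS", 5.0),
    Leg(datetime(2023, 6, 1, 12, 0), "WALKING", 0.8),  # outside the survey week
]
period = (date(2023, 5, 1), date(2023, 5, 7))
print(len(within_survey_period(legs, *period)))  # 3
print(round(coverage(legs, *period), 2))         # 0.29
print(flag_low_coverage({"p1": legs}, *period))  # ['p1']
```

If the exported timestamps are in UTC (as the trailing ‘Z’ in typical exports suggests), converting them to the participants’ local time zone before taking `.date()` avoids assigning late-evening trips to the wrong survey day.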
It might be beneficial to schedule two exports, one of ‘raw’ and one of ‘edited’ data, with an intermediate validation step. This would provide two comparable datasets and reveal the changes made during validation. During the survey, various experiments can be conducted depending on the research question. However, for traffic-related surveys, it is important that the participants’ smartphone use and mobility behaviour remain unchanged.

Figure 2: Screenshot from the Google Maps web application
Figures 3 and 4: Modal split chart of one person over one year (left: trips, right: pkm)

Data quality

The meaningful use of GLH data depends on data quality, which can be categorised into the following areas:
• Quality of geolocation
• Quality of route detection, incl. location recognition
• Quality of means-of-transport recognition
• Quality of temporal assignment

It is important to note that this assessment of data quality is limited to a snapshot in time. Google, as the producer and owner of the data, is presumably working constantly on improving the system. The use of AI holds potential for significant advances in data quality, particularly because the confirmations and corrections made by users serve as a valuable training dataset. Among the international studies assessing the quality of GLH data, the study conducted by the Netherlands Forensic Institute is noteworthy. [22] It advises that “Google locations and their accuracies should not be used in a definite way to determine the location of a mobile device”. In a walkability study, Lindquist found that approximately two-thirds of participants contributed valuable datasets, while the remaining participants provided only infrequent GLH data due to variations in smartphone settings. [23] In public transport data, certain gaps or stochastic changes can be observed, particularly for individual modes such as ‘bus’.
Google acknowledges: “These changes are the result of improved inference models that better distinguish between modes. Overall, these changes improve the accuracy and usability of the emission estimates in the long term.” [7] Similar issues affect data completeness: bus or motorbike shares, for instance, are not displayed for every year. In a recent comparison with modal split information from the German travel survey Mobilität in Deutschland (MiD) (see Table 1), one of the authors found that the data from the Google Environmental Insights Explorer consistently underestimates bicycle mode shares compared with MiD, while walking appears more frequently in the Google data. [24]

Overall assessment of Google data

Based on the information available for this article, the value of the mobility demand data that Google could provide is undeniable. It is unfortunate that the company restricts access to this valuable information to government employees. Meanwhile, stakeholders dedicate considerable resources to obtaining mobility data through travel surveys. The unmatched spatial and historical coverage of Google’s GLH data presents an opportunity for innovation in the transport sector, and the use cases show that interest extends beyond transport-related questions.

Like other data sources for mobility demand, GLH data has advantages and disadvantages, along with its own peculiarities. The main concern the authors highlight is the limited understanding of how trips and legs are defined, how mode types and activities are identified, and which aggregation methods are used to estimate global values such as the modal split. As early as 2016, Lindquist summarised that “[…] researchers relying on these data must be prepared for unanticipated changes in the data collection process […]”. [23] At the time of writing, little has changed in this regard. In summary, the potential gains from exploring GLH data outweigh the potential drawbacks.
We encourage the mobility community to explore this data source further for its own purposes, and we call on tech giants to provide open access to their data for the benefit of research and society. ■

SOURCES
[1] https://about.google.com (access: 18th July 2023).
[2] Lanzendorf, M.; Schönduwe, R. (2018): Datenerhebungen zur Erfassung des Mobilitätsverhaltens. In: Handbuch der kommunalen Verkehrsplanung, pp. 1-24.
[3] Pronello, C.; Kumawat, P. (2021): Smartphone Applications Developed to Collect Mobility Data: A Review and SWOT Analysis. In: Arai, K.; Kapoor, S.; Bhatia, R. (Eds.): Intelligent Systems and Applications. Springer, pp. 449-467.
[4] Bock, B.; Schönduwe, R. (2021): Black-Box Mobility. In: WZB Discussion Paper, https://bibliothek.wzb.eu/pdf/2021/iii21-601.pdf (access: July 2023).
[5] Harrison, F. D.; Duke, W.; Eldred, J.; Pack, M.; Ivanov, N.; Crosset, J.; Chan, L. (2019): Management and Use of Data for Transportation Performance Management: Guide for Practitioners.
[6] https://google.com/covid19/mobility/ (access: 18th July 2023).
[7] https://insights.sustainability.google/ (access: 18th July 2023).
[8] Ruktanonchai, N. W.; Ruktanonchai, C. W.; Floyd, J. R.; Tatem, A. J. (2018): Using Google Location History data to quantify fine-scale human mobility. In: Int. J. Health Geogr., 17, p. 28.
[9] https://motion-tag.com (access: 18th July 2023).
[10] https://posmo.coop (access: 18th July 2023).
[11] https://www.trivectorsystem.se (access: 18th July 2023).
[12] MacLean, D.; Komatineni, S.; Allen, G. (2015): Exploring Maps and Location-Based Services. In: Pro Android 5. Apress, Berkeley, CA, pp. 405-449.
[13] https://locationhistoryformat.com/ (access: 18th July 2023).
[14] https://www.achim-tack.org/coronayear (access: 18th July 2023).
[15] https://medium.com/@ggonzalezzabala/graph-your-own-googlelocation-history-in-tableau-e362d1d8f18d (access: 18th July 2023).
[16] https://github.com/GmoncayoCodes/ActivityPointLocationGenerator (access: 18th July 2023).
[17] Sengupta, R.; Walker, J. L. (2015): Quantified traveler. Travel feedback meets the cloud to change behaviour. Access 47, pp. 3-7.
[18] https://movinglab.dlr.de/en/ (access: 18th July 2023).
[19] https://www.freemove.space/ (access: 18th July 2023).
[20] Kapp, A. (2022): Collection, usage and privacy of mobility data in the enterprise and public administrations. In: Proceedings on Privacy Enhancing Technologies.
[21] MotionTag (2021): Gemeinsamer Endbericht Extended Mobile Network Data. Projektendbericht xMND-Projekt.
[22] Rodriguez, A.; Tiberius, C.; van Bree, R.; Geradts, Z. (2018): Google timeline accuracy assessment and error prediction. In: Forensic Sciences Research, 3(3), pp. 240-255, DOI: 10.1080/20961790.2018.1509187.
[23] Lindquist, M.; Galpern, P. (2016): Crowdsourcing (in) Voluntary Citizen Geospatial Data from Google Android Smartphones. In: Journal of Digital Landscape Architecture, 1, pp. 263-272.
[24] https://catchment.de/blog_google_s_modal_split_de.html (access: 18th July 2023).

Modal split (shares in %)
City            Source   Year   Car   Pedestrian   Bicycle   PT
Berlin          Google   2018    28           39         7   27
Berlin          MiD      2017    34           27        15   25
Hamburg         Google   2018    36           36         7   21
Hamburg         MiD      2017    36           27        15   22
Bremen (city)   Google   2018    45           28        12   15
Bremen (state)  MiD      2017    39           25        21   14
München         Google   2018    35           29         8   29
München         MiD      2017    34           24        18   24
Stuttgart       Google   2018    35           34         3   28
Stuttgart       MiD      2017    40           29         8   23
Table 1: Synopsis of Google modal split shares 2018 and modal split figures from MiD 2017

Benno Benjamin Bock, Founder and CEO, Catchment GmbH, Berlin (DE), benno@catchment.de
Dr. Robert Schönduwe, Guest lecturer, Technical University, Berlin (DE), schoenduwe@h2-mobility.de
