Abstract: Ubiquitous location-aware mobile devices have contributed to the rapid growth of individual-level location data. Location-based service platforms usually collect such data as training data to improve the performance of their predictive models, but this collection may raise public concerns about privacy. In this study, we introduce a privacy-preserving location recommendation framework based on a decentralized collaborative machine learning approach: federated learning. Unlike traditional centralized learning frameworks, our framework keeps users’ data on their own devices and trains the model locally so that their data remain private. The local model parameters are aggregated and updated through secure multiparty computation to achieve collaborative learning among users while preserving privacy. Our framework also integrates information about transportation infrastructure, place safety, and flow-based spatial interaction to further improve recommendation accuracy. We further design two attack cases to examine the privacy protection effectiveness and robustness of the framework. The results show that our framework achieves a better privacy–utility trade-off than traditional centralized learning methods. The results and ensuing discussion offer new insights into privacy-preserving geospatial artificial intelligence and promote geoprivacy in location-based services.
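To make the aggregation idea concrete, here is a minimal sketch of averaging local model parameters behind pairwise additive masks, one common building block of secure aggregation. It is not the framework's actual protocol: the function names, the toy parameter vectors, and the use of floating-point masks from a shared seed are all assumptions for illustration (practical schemes use key agreement and modular integer arithmetic).

```python
import random


def pairwise_masks(n_clients, dim, seed=0):
    """Build one mask per client from random vectors shared by each client pair.

    Client i adds the shared vector and client j subtracts it, so the masks
    cancel when all masked updates are summed.
    """
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for k in range(dim):
                masks[i][k] += shared[k]
                masks[j][k] -= shared[k]
    return masks


def mask_update(update, mask):
    """What a client would upload: its parameters hidden behind its mask."""
    return [u + m for u, m in zip(update, mask)]


def aggregate(masked_updates):
    """Server-side average; it only ever sees masked vectors."""
    n = len(masked_updates)
    return [sum(column) / n for column in zip(*masked_updates)]


# Hypothetical local model parameters from three users' devices.
local_updates = [[0.2, 1.0], [0.4, 0.8], [0.6, 1.2]]
masks = pairwise_masks(n_clients=3, dim=2, seed=42)
uploads = [mask_update(u, m) for u, m in zip(local_updates, masks)]
global_params = aggregate(uploads)
print(global_params)  # close to the plain average of local_updates, up to float rounding
```

Each upload on its own looks like noise, yet the average matches what a central server would have computed from the raw parameters, which is the property the abstract relies on.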
ACKNOWLEDGMENT: We acknowledge the funding support provided by the American Family Insurance Data Science Institute Funding Initiative at the University of Wisconsin-Madison. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funder.
The COVID-19 pandemic is a global threat presenting health, economic, and social challenges that continue to escalate. Meta-population epidemic modeling studies in the susceptible–exposed–infectious–removed (SEIR) style have played important roles in informing public health policy making to mitigate the spread of COVID-19. These models typically rely on a key assumption that the population is homogeneous. This assumption cannot be expected to hold in real situations; various geographic, socioeconomic, and cultural environments affect the behaviors that drive the spread of COVID-19 in different communities. Moreover, variation in intracounty environments creates spatial heterogeneity of transmission in different regions (e.g., varying peak infection timing). To address this issue, we develop a human mobility flow-augmented stochastic SEIR-style epidemic modeling framework with the ability to distinguish different regions and their corresponding behaviors. This modeling framework is then combined with data assimilation and machine learning techniques to reconstruct the historical growth trajectories of COVID-19 confirmed cases in two counties in Wisconsin. The associations between the spread of COVID-19 and business foot traffic, race and ethnicity, and age structure are then investigated. The results reveal that, in a college town (Dane County), the most important heterogeneity is age structure, while, in a large city area (Milwaukee County), racial and ethnic heterogeneity becomes more apparent. Scenario studies further indicate a strong response of the spread rate to various reopening policies, which suggests that policy makers may need to take these heterogeneities into account very carefully when designing policies for mitigating the ongoing spread of COVID-19 and reopening.
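As a rough illustration of what a mobility-coupled stochastic SEIR model looks like, the sketch below simulates two regions whose infection pressure depends on a contact-mixing matrix. This is a simplified stand-in, not the paper's model: the binomial chain formulation, the `mixing` matrix, and every parameter value are assumptions chosen only to show region-specific behavior.

```python
import math
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) with n Bernoulli trials (fine for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def seir_step(regions, beta, mixing, sigma, gamma, rng):
    """Advance every region by one day.

    regions: list of dicts with integer counts S, E, I, R
    beta:    per-region transmission rate (lets regions behave differently)
    mixing:  mixing[i][j] = share of region i's contacts made with region j,
             e.g. derived from row-normalised mobility flows (an assumption)
    """
    prevalence = [r["I"] / max(1, sum(r.values())) for r in regions]
    updated = []
    for i, r in enumerate(regions):
        force = beta[i] * sum(m * p for m, p in zip(mixing[i], prevalence))
        new_exposed = binomial(rng, r["S"], 1.0 - math.exp(-force))
        new_infectious = binomial(rng, r["E"], 1.0 - math.exp(-sigma))
        new_removed = binomial(rng, r["I"], 1.0 - math.exp(-gamma))
        updated.append({
            "S": r["S"] - new_exposed,
            "E": r["E"] + new_exposed - new_infectious,
            "I": r["I"] + new_infectious - new_removed,
            "R": r["R"] + new_removed,
        })
    return updated


def simulate(regions, beta, mixing, sigma, gamma, days, seed=0):
    """Run the chain for a number of days and return the daily states."""
    rng = random.Random(seed)
    history = [regions]
    for _ in range(days):
        history.append(seir_step(history[-1], beta, mixing, sigma, gamma, rng))
    return history


# Two hypothetical regions; only the first starts with infectious people.
start = [
    {"S": 990, "E": 0, "I": 10, "R": 0},
    {"S": 500, "E": 0, "I": 0, "R": 0},
]
history = simulate(
    start,
    beta=[0.5, 0.3],
    mixing=[[0.9, 0.1], [0.2, 0.8]],
    sigma=1 / 3,
    gamma=1 / 5,
    days=60,
)
print(history[-1])
```

Giving each region its own `beta` and its own row of `mixing` is what lets the simulated epidemic curves peak at different times, the kind of spatial heterogeneity the paragraph describes.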
Since its outbreak in December 2019, the novel coronavirus 2019 (COVID-19) has spread to 191 countries and caused millions of deaths. Many countries have experienced multiple epidemic waves and faced containment pressures from both domestic and international transmission. In this study, we conduct a multiscale geographic analysis of the spread of COVID-19 in a policy-influenced dynamic network to quantify COVID-19 importation risk under different policy scenarios using evidence from China. Our spatial dynamic panel data (SDPD) model explicitly distinguishes the effects of travel flows from the effects of transmissibility within cities, across cities, and across national borders. We find that within-city transmission was the dominant transmission mechanism in China at the beginning of the outbreak and that all domestic transmission mechanisms were muted or significantly weakened before importation posed a threat. We identify effective containment policies by matching the change points of domestic and importation transmissibility parameters to the timing of various interventions. Our simulations suggest that importation risk is limited when domestic transmission is under control, but that cumulative cases would have been almost 13 times higher if domestic transmissibility had resurged to its precontainment level after importation and 32 times higher if domestic transmissibility had remained at its precontainment level since the outbreak. Our findings provide practical insights into infectious disease containment and call for collaborative and coordinated global suppression efforts.
Abstract: The availability and use of geographic information technologies and data for describing the patterns and processes operating on or near the Earth’s surface have grown substantially during the past fifty years. The number of geographic information systems software packages and algorithms has also grown quickly during this period, fueled by rapid advances in computing and the explosive growth in the availability of digital data describing specific phenomena. Geographic information scientists therefore increasingly find themselves choosing between multiple software suites and algorithms to execute specific analysis, modeling, and visualization tasks in environmental applications today. This is a major challenge because it is often difficult to assess the efficacy of the candidate software platforms and algorithms when used in specific applications and study areas, which often generate different results. The subtleties and issues that characterize the field of geomorphometry are used here to document the need for (1) theoretically based software and algorithms; (2) new methods for the collection of provenance information about the data and code along with application context knowledge; and (3) new protocols for distributing this information and knowledge along with the data and code. This article discusses the progress and enduring challenges connected with these outcomes.
New Protocols for Distributing the Data and Code of Geospatial Research
Here, we propose a five-star practical guide for sharing data and code in geospatial research, modeled after the five-star system offered by Berners-Lee (2009) for publishing linked open data on the Web. Instead of asking researchers to share all pieces of data and code, this five-star guide encourages a simple start of data and code sharing, and researchers can move to a higher level when time and other resources allow.
Abstract: The Huff model has been widely used in location‐based business analysis to delineate a trade area containing a store’s potential customers. Calibrating the Huff model and its extensions requires empirical location visit data. Many studies rely on labor‐intensive surveys. With the increasing availability of mobile devices, users in location‐based platforms share rich multimedia information about their locations at a fine spatio‐temporal resolution, which offers opportunities for business intelligence. In this research, we present a time‐aware dynamic Huff model (T‐Huff) for location‐based market share analysis and calibrate this model using large‐scale store visit patterns based on mobile phone location data across the 10 most populated US cities. By comparing the hourly visit patterns of two types of stores, we demonstrate that the calibrated T‐Huff model is more accurate than the original Huff model in predicting the market share of different types of business (e.g., supermarkets versus department stores) over time. We also identify the regional variability where people in large metropolitan areas with a well‐developed transit system show less sensitivity to long‐distance visits. In addition, several socioeconomic and demographic factors (e.g., median household income) that potentially affect people’s visit decisions are examined and summarized.
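For readers unfamiliar with the Huff model, the sketch below computes the classic visit probabilities, where a store's utility grows with its attractiveness and decays with distance. The optional `time_weights` argument is only one plausible way to make the model time-aware and is an assumption here, not the calibrated T-Huff formulation; the exponents and store values are likewise hypothetical.

```python
def huff_probabilities(attractiveness, distances, alpha=1.0, beta=2.0,
                       time_weights=None):
    """Probability that a customer at one origin visits each candidate store.

    utility_j = w_j * A_j**alpha * d_j**(-beta), normalised over all stores.
    time_weights (w_j) could hold each store's share of visits in a given
    hour; leaving it as None gives the original, static Huff model.
    """
    if time_weights is None:
        time_weights = [1.0] * len(attractiveness)
    utilities = [
        w * (a ** alpha) * (d ** -beta)
        for a, d, w in zip(attractiveness, distances, time_weights)
    ]
    total = sum(utilities)
    return [u / total for u in utilities]


# Two hypothetical stores of equal size, one twice as far away as the other.
static = huff_probabilities([100.0, 100.0], [1.0, 2.0])
print(static)  # [0.8, 0.2]

# Same stores at an hour when the nearer one receives few visits.
evening = huff_probabilities([100.0, 100.0], [1.0, 2.0],
                             time_weights=[0.1, 0.9])
print(evening)
```

With `beta=2`, doubling the distance cuts a store's utility to a quarter, which is why the nearer store captures 80% of the static market share; a smaller calibrated `beta` would correspond to the lower distance sensitivity reported for large metropolitan areas.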
Reference: Rao, J., Gao, S., Kang, Y., & Huang, Q. (2020). LSTM-TrajGAN: A Deep Learning Approach to Trajectory Privacy Protection. In Proceedings of the 11th International Conference on Geographic Information Science (GIScience 2021), No. 12; pp. 12:1–12:17. DOI: 10.4230/LIPIcs.GIScience.2021.12
Abstract: The prevalence of location-based services contributes to the explosive growth of individual-level trajectory data and raises public concerns about privacy issues. In this research, we propose a novel LSTM-TrajGAN approach, which is an end-to-end deep learning model to generate privacy-preserving synthetic trajectory data for data sharing and publication. We design a loss metric function, TrajLoss, to measure the trajectory similarity losses for model training and optimization. The model is evaluated on the trajectory-user-linking task on a real-world semantic trajectory dataset. Compared with other common geomasking methods, our model can better prevent users from being re-identified, and it also preserves essential spatial, temporal, and thematic characteristics of the real trajectory data. The model better balances the effectiveness of trajectory privacy protection and the utility for spatial and temporal analyses, which offers new insights into GeoAI-powered privacy protection.
There has not been a time in the history of GIScience when movement analytics and mobility insights have played such an important role in policymaking as in today’s global responses to the COVID-19 crisis. This special section further builds on previous efforts by the editorial team and others from the GIScience community and beyond to advance the body of knowledge in Computational Movement Analysis (CMA). CMA generally refers to a series of methods and analytical approaches to process, structure, visualize and analyze tracking data and movement patterns to facilitate knowledge discovery and modeling of movement. Specifically, this special section was proposed as part of a pre-conference workshop on Analysis of Movement Data (AMD 2018) at the GIScience 2018 meeting, 28 August 2018, Melbourne, Australia. The focus of this special section is on three aspects of CMA: (1) representation and modeling of movement; (2) urban mobility analytics; and (3) movement analytics using social media data. With the papers presented in the special section, we highlight recent advancements in CMA with the development of methods and techniques for big movement data analytics and utilization of trajectories constructed using user-generated crowdsourced contents such as geo-tagged social media posts. Traditional CMA methods were often developed and evaluated using a smaller set of movement data involving smaller numbers of individuals and contextual variables.
As the momentum to generate more geo-enriched movement data at large volumes, high frequencies and for longer durations continues, this is a timely and significant achievement towards movement data science. As the papers of this special section illustrate, movement data science leverages the advancements in big data analytics, cyberinfrastructure, parallel computing and data fusion to enhance the analysis of large, multi-faceted and multi-sourced movement data. Below are the editorial and the six original papers presented in this special section in the International Journal of Geographical Information Science (IJGIS).
Moving forward, we see a clear need for more reproducible research in CMA, following a growing mega-trend in data-driven sciences. Data quality and privacy challenges as well as uncertainty in data, analytics, and modeling have been largely overlooked in the CMA literature so far. For a more responsible movement data science, careful considerations should be given to the quality, uncertainty and representativeness of ‘large’ mobility data that are being used for generating important mobility insights for policymaking. Lastly, with the recent exciting developments in data access, as a community, we should think about leveraging this advantage to make movement data science more relevant to real-world problems for the mitigation of societal and environmental challenges such as disease outbreaks, population mobility, natural hazards and human-wildlife conflicts.
As efforts to mitigate and suppress COVID-19 continue, many decision makers are asking if digital contact tracing—a method for determining contact between an infected individual and others using tracking systems commonly based on mobile devices—can help us safely transition from population-wide social distancing to targeted case-based interventions such as individualized self-quarantine. In response, the Spatial Analysis Research Center (SPARC) at Arizona State University organized a panel of national experts to discuss the use of geospatial technologies in digital contact tracing and identify the practical challenges researchers can address to make digital contact tracing as effective as possible.
The major themes of the discussion included (i) the capabilities and limitations of geospatial technology, (ii) privacy, and (iii) future research directions. Key takeaways from each of these areas include:
Capabilities and limitations of geospatial technology: There are many geospatial technologies (e.g., GPS, Bluetooth, cellular, WiFi) embedded in mobile devices that can be leveraged for digital contact tracing. However, GPS technology in smartphones lacks the accuracy to map interactions in the detailed way one might expect. For instance, the horizontal accuracy of GPS is 15 m, and the vertical accuracy is insufficient to pick up which floor of a building a person is on. Indoor accuracy is particularly poor, which is problematic given that people spend 87% of their time indoors. However, information about the absolute location of an individual may not be as important to digitally tracing epidemiologically meaningful contacts as identifying the types of interactions most likely to result in the spread of the virus. The importance of tracing interactions creates an opportunity to use Bluetooth-based exchange of encrypted keys to record person-to-person contacts that can then be analyzed within the space-time prism framework. This approach does not require storing all individuals’ movement data, which reduces computational complexity. Geotargeted and geotagged social media are useful for tracking transmission between cities or within cities, detecting large gatherings, and helping individuals recall location and contact history during contact tracing interviews. Social media can also provide useful context, such as check-in locations and textual content, to reduce false positives in interactions identified through other forms of digital contact tracing.
Privacy: Digital contact tracing raises numerous privacy concerns. By creating some record of the location history or contacts of an individual, digital contact tracing creates an opportunity to identify an individual without their consent. At present, the privacy implications of digital contact tracing are unclear because these systems have yet to be fully developed or deployed in the US. An evaluation of pros and cons in the existing digital contact tracing plans operating in other countries can inform policy makers on privacy mediation during and after contact tracing. While companies and officials working on this issue have made statements that preserving privacy is an important goal, the details of how privacy will be preserved and the safeguards that will be put in place are not yet available. If any privacy protections are lifted to enable contact tracing, a plan should be put in place to restore protections once the pandemic subsides.
Future Research: To support digital contact tracing and surveillance, several research areas must be advanced. Key technical areas include increasing the accuracy of indoor positioning; developing approaches for reducing false positives of potential exposure (not to be confused with false negatives, which are more common in COVID-19 diagnostic tests); ensuring a focus on high accuracy in relative positioning; addressing computational complexities; developing group- or bubble-based approaches to surveillance; and developing a system for creating and distributing high-resolution risk data that enables self-determination of the need for quarantine and testing based on possible exposure. Research into how digital contact tracing systems link with existing contact tracing infrastructure and with other digital contact tracing systems also needs to be conducted. The implications of digital contact tracing for society and privacy will emerge along with these systems. Researchers need to study these issues as they emerge to ensure that we have the ability to hold an informed public debate about the effectiveness and costs of digital contact tracing.
Tracing travel and close contacts to and from infected communities is useful for identifying potential hotspots and assessing risk across different places. A recent study published in Science showed that “substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)”. Understanding human movement patterns and social contacts is key to saving lives, since one may be surrounded by exposed people in the latent stage who do not show COVID-19 symptoms. Human mobility patterns and their changes can therefore serve as one indicator of the status of physical social distancing. Below are the neighborhood mobility patterns and the Spring 2019 and March 2020 travel patterns for US cities and counties, derived from anonymized and aggregated mobile phone location big data in collaboration with SafeGraph, which covers over 3.6 million points of interest (POIs) and business venues with visit patterns. Meanwhile, we are working on the 2020 census block data for the whole US and monitoring newly infected areas reported by the CDC and a list of coronavirus dashboards in response to COVID-19.
You can find out how people from those POIs, neighborhoods, or a county connect with other neighborhoods and counties across the US. By comparing POI visits between March 2019 and March 2020, we can summarize the changes and visualize the patterns on maps to understand whether people in each county or state have reacted to (physical) social distancing.
In addition, the maps below show the origin-destination (OD) flows larger than a travel frequency threshold at different spatial scales. The one at the urban scale can help understand the potential spread and hotspots in a city/metropolitan area.
Spring Travel Risk
Using the county-level Spring travel data for March, we can see thousands of trips originating from U.S. counties in the Spring season and spreading widely across the U.S., which may help explain the rapid growth of infection cases across the whole country. Our travel-augmented SEIR epidemic modeling results showed that only about 20% of infected cases were reported (with testing) at the state level in the US.
The following table shows the top 20 counties that residents of King County traveled to in March 2019.
Using the country-to-US-county flow data from March 2019, we can assess how global travel from countries outside the US may influence the potential coronavirus outbreak and spread in the US.
Acknowledgment: We would like to thank all individuals and organizations for collecting and updating the COVID-19 observation data and reports. Dr. Song Gao acknowledges the funding support provided by the National Science Foundation (Award No. 2027375). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Abstract: With the booming economy in China, many studies have pointed out that the improvement of regional transportation infrastructure, among other factors, has had an important effect on economic growth. Utilizing a large-scale dataset that includes 3.5 billion entry and exit records of vehicles along highways generated from toll collection systems, we attempt to establish the relevance of mid-distance land transport patterns to regional economic status through transportation network analyses. We apply standard measurements of complex networks to analyze the highway transportation networks. A set of traffic flow features is computed and correlated with the regional economic development indicator. The multi-linear regression models explain about 89% to 96% of the variation in cities’ GDP across three provinces in China. We then fit gravity models using annual traffic volumes of cars, buses, and freight trucks between pairs of cities for each province separately as well as for the whole dataset. We find temporal changes in the distance-decay effects on spatial interactions between cities in transportation networks, which link to the economic development patterns of each province. We conclude that transportation big data reveal the status of regional economic development and contain valuable information about human mobility, production linkages, and logistics for regional management and planning. Our research offers insights into the investigation of regional economic development status using highway transportation big data.
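The gravity-model step can be illustrated with a small log-linear fit that estimates a distance-decay exponent from flows between city pairs. This is a sketch under stated assumptions, not the study's specification: the functional form T = k * M_i * M_j / d^beta, the use of a generic city "mass" (for example population or GDP), and the synthetic data are all chosen here for illustration.

```python
import math


def fit_distance_decay(flows):
    """Estimate (k, beta) in T = k * M_i * M_j / d**beta by least squares.

    flows: list of (mass_i, mass_j, distance, flow) tuples with positive values.
    Taking logs gives log(T / (M_i * M_j)) = log(k) - beta * log(d), which is
    an ordinary straight-line fit.
    """
    xs = [math.log(d) for _, _, d, _ in flows]
    ys = [math.log(t / (mi * mj)) for mi, mj, _, t in flows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic city pairs generated from known parameters, to check the fit.
true_k, true_beta = 2.0, 1.5
pairs = [(10.0, 20.0, 50.0), (15.0, 5.0, 120.0),
         (30.0, 8.0, 200.0), (12.0, 12.0, 80.0)]
flows = [(mi, mj, d, true_k * mi * mj / d ** true_beta) for mi, mj, d in pairs]

k_hat, beta_hat = fit_distance_decay(flows)
print(k_hat, beta_hat)  # recovers roughly 2.0 and 1.5 on this noise-free data
```

Fitting the same form separately for each year or vehicle type and comparing the resulting `beta_hat` values is one simple way to look for the temporal changes in distance decay that the abstract reports.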
Reference: Song Gao, Jinmeng Rao, Xinyi Liu, Yuhao Kang, Qunying Huang, Joseph App. (2019) Exploring the effectiveness of geomasking techniques for protecting the geoprivacy of Twitter users. Journal of Spatial Information Science. 19, 105-129. DOI: 10.5311/JOSIS.2019.19.510
Abstract: With the ubiquitous use of location-based services, large-scale individual-level location data have been widely collected through location-aware devices. Geoprivacy concerns arise around user identity de-anonymization and location exposure. In this work, we investigate the effectiveness of geomasking techniques for protecting the geoprivacy of active Twitter users who frequently share geotagged tweets in their home and work locations. By analyzing over 38,000 geotagged tweets of 93 active Twitter users in three U.S. cities (Los Angeles, Madison, and Washington D.C.), the two-dimensional Gaussian masking technique with proper standard deviation settings is found to be more effective at protecting users’ location privacy, at the cost of geospatial analytical resolution, than the random perturbation masking method and aggregation to traffic analysis zones. Furthermore, a three-dimensional theoretical framework considering privacy, spatial analytics, and uncertainty factors simultaneously is proposed to assess geomasking techniques. Our research offers insights into geoprivacy concerns of social media users’ georeferenced data sharing for future development of location-based applications and services.
Broader Impacts: Twitter removed support for precise geotagging in June 2019. However, the metadata of historical tweets prior to the policy change may still reveal precise GPS coordinates. In addition, when a user deletes a geotagged tweet, Twitter does not guarantee that the information will be completely removed from all copies of the data on third-party applications or in external search results. Even though the precise GPS location is no longer available, Twitter users are still able to add place tags (e.g., a city, office building, apartment, landmark, and many other types of places) to their tweets, which can be converted to GPS coordinates (often using the centroid as a representative location). This is similar to the aforementioned aggregation-based masking approach, so it may still be possible to infer users’ sensitive locations from fine-scale place tags. People should be aware that sharing or publishing this kind of location data involves geoprivacy issues, and that geomasking provides a way to help mitigate the problem not only for Twitter users but also for other telematics and social media platforms such as Facebook, Flickr, Weibo, and Instagram where geotagging or place-tagging is accessible, as well as for mobile applications that track individual locations.
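A minimal sketch of two-dimensional Gaussian masking may help make the technique concrete: each point is displaced by independent normally distributed offsets east-west and north-south. The metres-to-degrees conversion below is a common spherical approximation, and the function name, coordinates, and standard deviation are hypothetical choices, not the settings evaluated in the study.

```python
import math
import random

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def gaussian_mask(lat, lon, sigma_m, rng):
    """Displace a point by Gaussian noise with standard deviation sigma_m metres.

    Offsets are drawn independently along the east-west (dx) and north-south
    (dy) axes, then converted to degrees with a local approximation.
    """
    dx = rng.gauss(0.0, sigma_m)
    dy = rng.gauss(0.0, sigma_m)
    masked_lat = lat + dy / METERS_PER_DEGREE_LAT
    masked_lon = lon + dx / (METERS_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return masked_lat, masked_lon


# Hypothetical geotagged point; a larger sigma_m gives more privacy but
# coarser spatial analysis.
home = (43.07, -89.40)
masked = gaussian_mask(home[0], home[1], sigma_m=200.0, rng=random.Random(7))
print(masked)
```

The standard deviation is the knob behind the privacy and analytical-resolution trade-off discussed above: with `sigma_m=0` the point is returned unchanged, and increasing it spreads masked points further from the true location.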