Detecting Observation ‘Hot-Spots’ in Massive Citizen-Contributed Geographic Data

Author(s):

Guiming Zhang, PhD

Publications:

Zhang, G., Zhu, A.X., and Huang, Q., 2017. A GPU-accelerated adaptive kernel density estimation approach for efficient point pattern analysis on spatial big data. International Journal of Geographical Information Science 31(10): 2068-2097. doi: 10.1080/13658816.2017.1324975.

Zhang, G., Huang, Q., Zhu, A.X., and Keel, J., 2016. Enabling point pattern analysis on spatial big data using cloud computing: Optimizing and accelerating Ripley’s K function. International Journal of Geographical Information Science 30(11): 2230–2252. doi: 10.1080/13658816.2016.1170836.

Many ordinary citizens (non-professionals) now contribute geo-referenced observations of geographic phenomena. For example, the global-scale eBird citizen science project documents bird species by pooling records from birders around the world (eBird had accumulated 500 million records as of March 26, 2018). Such large-scale citizen-contributed data create new opportunities for scientific investigation and discovery (e.g., examining the impacts of global climate change on species distributions and migration). To use such data well, however, it is essential to first understand the spatial distribution of the observation efforts underlying them. Are the observation locations randomly distributed over space, or clustered in certain geographic areas? The answers to such questions have important implications for the choice of data analysis approach. For instance, clustered observation efforts imply sampling bias in the data, so measures should be taken to correct for that bias during analysis.

This project aims to develop a computational framework for detecting observation ‘hot-spots’ (clusters) in massive citizen-contributed geographic data. For this purpose, the kernel density estimation (KDE) approach to point pattern analysis is adopted. A parallel KDE algorithm is implemented to leverage the parallel computing power of multi-core CPUs (central processing units) or GPUs (graphics processing units) and so accelerate the computationally intensive density estimation. The parallel algorithm is then used to analyze massive citizen-contributed data (e.g., eBird data) to detect observation ‘hot-spots’ at multiple spatial scales.
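To make the KDE idea concrete, here is a minimal sketch in Python. It is not the project's implementation: it assumes a fixed-bandwidth Gaussian kernel (the cited 2017 paper uses an adaptive bandwidth), runs serially in pure Python, and uses hypothetical names throughout (`gaussian_kde_grid`, `hot_spots`, and the `threshold` rule are illustrative only). Each grid cell's density depends only on the input points, which is the property that lets a parallel version distribute cells across CPU cores or GPU threads.

```python
import math


def gaussian_kde_grid(points, bandwidth, x_min, y_min, cell_size, n_cols, n_rows):
    """Evaluate a fixed-bandwidth Gaussian KDE at the center of every grid cell.

    points: list of (x, y) observation locations (assumed projected coordinates).
    Returns a list of rows; grid[r][c] is the estimated probability density.
    """
    # Normalizing constant of a 2-D Gaussian kernel, averaged over n points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    two_h2 = 2.0 * bandwidth ** 2
    grid = []
    for r in range(n_rows):
        cy = y_min + (r + 0.5) * cell_size
        row = []
        for c in range(n_cols):
            cx = x_min + (c + 0.5) * cell_size
            # Each cell is independent of every other cell, so this inner
            # sum is the unit of work a parallel version would distribute.
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                total += math.exp(-d2 / two_h2)
            row.append(norm * total)
        grid.append(row)
    return grid


def hot_spots(grid, threshold):
    """Return (row, col) indices of cells whose density is >= threshold.

    Thresholding is an illustrative assumption, not the project's rule.
    """
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value >= threshold
    ]


# Toy data: four observations clustered near (2.5, 2.5), one isolated record.
observations = [(2.4, 2.5), (2.5, 2.5), (2.6, 2.4), (2.5, 2.6), (8.5, 8.5)]
density = gaussian_kde_grid(observations, bandwidth=1.0, x_min=0.0, y_min=0.0,
                            cell_size=1.0, n_cols=10, n_rows=10)
peak = max(max(row) for row in density)
clusters = hot_spots(density, threshold=0.5 * peak)
```

Re-running the sketch with different `bandwidth` values gives a rough feel for multi-scale analysis: a small bandwidth highlights local clusters, while a large one smooths them into regional patterns. The cost grows with the number of points times the number of cells, which is why parallel hardware matters at eBird scale.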
