Efficient Grid-GIS Framework for Spatial Data

Abstract

Geographic Information System (GIS) data is voluminous and demands large storage capacity. Grid computing in the GIS domain lets services cooperate and integrate to implement complex spatial functions, and consequently yields a significant performance gain. The combination of GIS and grid computing, known as Grid-GIS, has become an active research direction. Existing Grid-GIS architectures and frameworks, such as OGSI-based Grid-GIS with WSDL and XML, OGSA-based Grid-GIS with WSRF, and parallel-processing-oriented, mobile-agent-based Grid-GIS, suffer from inefficient data access and retrieval and from complex procedures for achieving fault tolerance, availability and scalability. Grid computing is best suited to bag-of-tasks workloads with few inputs/outputs (I/Os). The analysis of voluminous spatial data, which qualifies as big data, involves few complex tasks, but the number of intermediate I/Os remains very high. MapReduce computations are well suited to data-intensive spatial processing that involves a large number of inputs and a large amount of intermediate data. MapReduce is also preferable to mobile agent technology because it provides built-in support for parallel processing and fault tolerance, hiding the complexity of these operations from the user. Integrating grid computing and MapReduce for GIS data lets the two complement each other by providing data analysis and a computational environment together. First, the integration provides high utilization of the resource pool. Second, the strong data-analytic capability of MapReduce (Hadoop) is complemented by the comprehensive accounting, resource utilization control, and policy management features of the grid. However, little published research integrates MapReduce with Grid-GIS. Considering the benefits of MapReduce, the proposed architecture and framework therefore integrates MapReduce into Grid-GIS.
Three parallel spatial indexing algorithms based on MapReduce are also included as significant components of the proposed architecture and framework; they strengthen spatial data access and retrieval through indexing. The implemented indexes are the H-bucket PMR Quadtree spatial index, the parallel Hilbert TGS R-Tree spatial index and the parallel Priority R-Tree spatial index. The H-bucket PMR Quadtree index is designed specifically for spatial data consisting of lines. The other two indexes take Minimum Bounding Rectangles (MBRs) as approximations of the spatial data. The parallel Priority R-Tree performs well in worst cases, such as non-uniformly distributed data (skewed data, data rectangles with high aspect ratio, clustered data, etc.). Finally, an architecture and framework that integrates spatially indexed MapReduce with Grid-GIS, "QUiPSHoT Grid-GIS", has been proposed. The proposed architecture and framework has been implemented and tested by running it in an academic institution. Its performance has been measured in terms of the cost of bulk-loading the spatial indexes, execution time, scalability and availability. The experimental results validate the competitive performance and usability of the proposed framework.
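
To illustrate the idea of approximating geometries by MBRs and bulk-loading an index from them, the following is a minimal single-machine sketch in Python. It is not the thesis implementation: it uses plain Hilbert-order packing of leaf nodes rather than the Hilbert TGS or Priority R-Tree algorithms, and it does not use MapReduce. All names (`mbr_of_points`, `hilbert_index`, `pack_leaves`), the grid resolution `n` and the `extent` parameter are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr_of_points(points: Sequence[Point]) -> MBR:
    """Minimum bounding rectangle of a geometry given as a list of vertices."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbr_union(boxes: Sequence[MBR]) -> MBR:
    """Smallest rectangle that encloses all the given rectangles."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two; this is the standard iterative conversion.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_key(box: MBR, extent: MBR, n: int) -> int:
    """Hilbert position of the grid cell containing the centre of an MBR."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    col = int((cx - extent[0]) / (extent[2] - extent[0]) * n)
    row = int((cy - extent[1]) / (extent[3] - extent[1]) * n)
    # Clamp so that centres on the upper boundary fall in the last cell.
    col = max(0, min(n - 1, col))
    row = max(0, min(n - 1, row))
    return hilbert_index(n, col, row)


def pack_leaves(boxes: Sequence[MBR], extent: MBR,
                capacity: int, n: int = 16) -> List[MBR]:
    """Sort MBRs along the Hilbert curve and pack them into leaf MBRs."""
    ordered = sorted(boxes, key=lambda b: hilbert_key(b, extent, n))
    return [mbr_union(ordered[i:i + capacity])
            for i in range(0, len(ordered), capacity)]
```

Calling `pack_leaves` on the MBRs of a dataset, with `extent` set to the bounding box of the whole dataset, returns one enclosing rectangle per leaf node; repeating the packing on those rectangles would build the upper levels. In a MapReduce setting, one plausible arrangement (again an assumption, not a description of the thesis) is for the map phase to emit each MBR keyed by its Hilbert value and for the reduce phase to pack each key range into leaves.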

Description

PhD thesis