Most of the time, looking at data without its location is like looking at only
part of a story. Including location and geography in analysis reveals patterns
and associations that would otherwise be missed. As Big Data emerges as a new
frontier for analysis, including location in Big Data is becoming increasingly
important.
Data that includes location, and that is enhanced with geographic information
in a structured form, is often referred to as spatial data. Analyzing spatial
data requires an understanding of geometry and of the operations that can be
performed on it. Enabling Hadoop to handle spatial data and spatial analysis is
the goal of this Esri open source effort.
GIS Tools for Hadoop is an open source toolkit intended for Big Spatial Data
analytics. The toolkit provides three libraries:
- Esri Geometry API for Java: A generic geometry library that can be used to
extend Hadoop core with vector geometry types and operations, enabling
developers to build MapReduce applications for spatial data.
- Spatial Framework for Hadoop: Extends Hive, based on the Esri Geometry API,
so that Hive Query Language (HQL) users can leverage a set of analytical
functions and geometry types. It also includes some utilities for the JSON
formats used in ArcGIS.
- Geoprocessing Tools for Hadoop: Contains a set of ready-to-use ArcGIS
geoprocessing tools, based on the Esri Geometry API and the Spatial Framework
for Hadoop. Developers can download the source code of the tools and
customize them; they can also create new tools and contribute them to the
open source project. Through these tools, ArcGIS users can move their spatial
data into Hadoop and execute predefined workflows inside it.
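The JSON utilities mentioned above deal with the geometry JSON format used by ArcGIS (often called Esri JSON), in which a polygon is stored as an array of rings plus a spatial reference. As a rough illustration only, the Java sketch below serializes one ring into that shape by hand. The class and method names are hypothetical, no Esri library is used (in practice the libraries' own JSON support would do this), and the sketch closes an open ring but does not enforce ring orientation (ArcGIS expects outer rings to be clockwise).

```java
import java.util.List;

// Hypothetical helper that writes one polygon ring in the Esri JSON geometry
// shape: {"rings":[[[x,y],...]],"spatialReference":{"wkid":N}}.
// Illustration only: it uses no Esri library and does not check ring orientation.
class EsriJsonSketch {

    // Serializes a single ring; appends the first vertex at the end if the ring is open.
    static String polygonJson(List<double[]> ring, int wkid) {
        StringBuilder sb = new StringBuilder("{\"rings\":[[");
        int n = ring.size();
        for (int i = 0; i < n; i++) {
            appendVertex(sb, ring.get(i), i > 0);
        }
        boolean open = n > 0 && (ring.get(0)[0] != ring.get(n - 1)[0]
                || ring.get(0)[1] != ring.get(n - 1)[1]);
        if (open) {
            appendVertex(sb, ring.get(0), true); // close the ring explicitly
        }
        sb.append("]],\"spatialReference\":{\"wkid\":").append(wkid).append("}}");
        return sb.toString();
    }

    // StringBuilder.append(double) uses Double.toString, so output is locale-independent.
    private static void appendVertex(StringBuilder sb, double[] v, boolean comma) {
        if (comma) {
            sb.append(',');
        }
        sb.append('[').append(v[0]).append(',').append(v[1]).append(']');
    }

    public static void main(String[] args) {
        // Unit square listed clockwise (the orientation ArcGIS uses for outer rings).
        List<double[]> square = List.of(
                new double[] {0, 0}, new double[] {0, 1},
                new double[] {1, 1}, new double[] {1, 0});
        System.out.println(polygonJson(square, 4326));
        // {"rings":[[[0.0,0.0],[0.0,1.0],[1.0,1.0],[1.0,0.0],[0.0,0.0]]],"spatialReference":{"wkid":4326}}
    }
}
```

WKID 4326 is the well-known ID for WGS 84 longitude/latitude, the usual choice for GPS-style point data.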
The GIS Tools for Hadoop toolkit allows users who want to leverage the Hadoop
framework to perform spatial analysis on spatial data. For example, they can:
1. Run filter and aggregate operations on billions of spatial data records
inside Hadoop, based on spatial criteria.
2. Define new areas represented as polygons, and run point-in-polygon analysis
on billions of spatial data records inside Hadoop.
3. Visualize analysis results on a map with rich styling capabilities and a
rich set of basemaps.
4. Integrate their maps into reports, or publish them online as map
applications.
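Items 1 and 2 above come down to a spatial predicate evaluated for each record. Here is a minimal sketch of that idea, assuming polygons given as plain coordinate arrays: a cheap bounding-box (envelope) check discards most points, and the classic even-odd ray-casting test decides the rest. This is a textbook technique swapped in for illustration, not the Esri Geometry API's implementation, and the class name is hypothetical.

```java
import java.util.Arrays;

// Sketch: envelope prefilter + even-odd (ray casting) point-in-polygon test.
// Textbook technique used for illustration; not the Esri Geometry API's code.
// Points exactly on an edge may be classified either way.
class PointInPolygonSketch {

    // Even-odd rule: cast a ray from (px, py) toward +x and count edge crossings.
    // The ring is given as parallel coordinate arrays and is implicitly closed.
    static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Only edges that straddle the horizontal line y = py can be crossed.
            if ((ys[i] > py) != (ys[j] > py)) {
                double xCross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
                if (px < xCross) {
                    inside = !inside; // each crossing to the right flips the state
                }
            }
        }
        return inside;
    }

    // Counts points inside the polygon (assumed to have at least three vertices),
    // skipping the exact test for points outside its bounding box.
    static int countInside(double[] xs, double[] ys, double[][] points) {
        double minX = Arrays.stream(xs).min().getAsDouble();
        double maxX = Arrays.stream(xs).max().getAsDouble();
        double minY = Arrays.stream(ys).min().getAsDouble();
        double maxY = Arrays.stream(ys).max().getAsDouble();
        int count = 0;
        for (double[] p : points) {
            boolean inBox = p[0] >= minX && p[0] <= maxX && p[1] >= minY && p[1] <= maxY;
            if (inBox && contains(xs, ys, p[0], p[1])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] xs = {0, 4, 0}; // triangle (0,0), (4,0), (0,4)
        double[] ys = {0, 0, 4};
        double[][] points = {{1, 1}, {3, 3}, {5, 1}, {0.5, 3}};
        System.out.println(countInside(xs, ys, points)); // 2
    }
}
```

The envelope check is the "filter" step in miniature: it is cheap enough to run on every record, so the more expensive exact test only runs on candidates.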
Getting started
Developers can get started with the Spatial Framework for Hadoop.
ArcGIS users can get started with the Geoprocessing Tools for Hadoop.
How does it all work?
Overall, four GitHub projects make up the toolkit.
First, the Esri Geometry API for Java project.
This is a generic library that includes geometry objects, spatial operations,
and spatial indexing; it can be used to spatially enable Hadoop. By deploying
the Esri Geometry API library (as a JAR) within Hadoop, developers can build
spatially enabled MapReduce applications that use the Esri Geometry API
alongside the other Hadoop APIs.
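The shape of such a spatially enabled MapReduce job can be sketched in plain Java without a cluster: the map step emits (zone, 1) for every zone that contains a point, a shuffle groups the pairs by key, and the reduce step sums them, which is the point-in-polygon aggregation pattern described earlier. To keep the sketch self-contained, ordinary methods stand in for Hadoop's Mapper and Reducer classes, and hypothetical axis-aligned rectangles stand in for Esri Geometry API polygons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a spatial MapReduce job that counts points per zone.
// Plain methods stand in for Hadoop's Mapper/Reducer, and hypothetical
// rectangular zones stand in for Esri Geometry API polygons.
class MapReduceSketch {

    static final class Zone {
        final String id;
        final double minX, minY, maxX, maxY;

        Zone(String id, double minX, double minY, double maxX, double maxY) {
            this.id = id;
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        // Boundary points count as inside, so a point on a shared edge hits both zones.
        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    }

    // Map step: emit (zoneId, 1) for every zone that contains the point.
    static List<Map.Entry<String, Integer>> map(double[] point, List<Zone> zones) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Zone z : zones) {
            if (z.contains(point[0], point[1])) {
                out.add(Map.entry(z.id, 1));
            }
        }
        return out;
    }

    // Reduce step: sum all values emitted for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<double[]> points, List<Zone> zones) {
        // Shuffle: group map output by key, as Hadoop does between the two phases.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (double[] p : points) {
            for (Map.Entry<String, Integer> kv : map(p, zones)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((key, values) -> counts.put(key, reduce(values)));
        return counts;
    }

    public static void main(String[] args) {
        List<Zone> zones = List.of(new Zone("A", 0, 0, 10, 10), new Zone("B", 10, 0, 20, 10));
        List<double[]> points = List.of(
                new double[] {1, 1}, new double[] {5, 5},
                new double[] {15, 2}, new double[] {25, 5});
        System.out.println(run(points, zones)); // {A=2, B=1}
    }
}
```

Replacing the rectangle test with a real polygon containment check is the part a geometry library would supply; the map, shuffle, and reduce structure stays the same.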
Second, the Spatial Framework for Hadoop project.
This library includes the user-defined functions and objects that extend Hive
with the capabilities of the Esri Geometry API. By enabling this library in
Hive, users can write SQL-like queries in HQL. In this case, users don't have
to write a MapReduce application; they can interact with Hive, write their
SQL-like queries, and get answers directly from Hadoop. These queries can
include spatial operations and values.
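Hive calls user-defined functions through Java classes; a classic Hive UDF exposes one or more public `evaluate` methods that Hive invokes for each row. The sketch below mimics that shape for a point constructor in the spirit of the framework's `ST_Point` function. The class name is hypothetical, it does not extend Hive's UDF base class (so it runs without Hive on the classpath), and it returns WKT text rather than the framework's own geometry representation. Returning `null` for missing input mirrors how SQL functions commonly propagate NULL.

```java
// Hypothetical stand-in for a Hive function such as ST_Point. A real Hive UDF
// would extend Hive's UDF base class; this sketch only mirrors the
// evaluate(...) shape so that it runs without Hive.
class StPointSketch {

    // Returns the point as WKT, or null (SQL NULL) if either coordinate is missing.
    public String evaluate(Double x, Double y) {
        if (x == null || y == null) {
            return null;
        }
        return "POINT (" + x + " " + y + ")";
    }

    public static void main(String[] args) {
        StPointSketch stPoint = new StPointSketch();
        // Roughly what an HQL call like ST_Point(longitude, latitude) does per row
        // (column names are hypothetical).
        System.out.println(stPoint.evaluate(-117.2, 34.05)); // POINT (-117.2 34.05)
        System.out.println(stPoint.evaluate(null, 34.05));   // null
    }
}
```

In Hive itself, a real function class is registered with a `CREATE TEMPORARY FUNCTION` statement before HQL queries can call it.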
Third, the Geoprocessing Tools for Hadoop project.
These tools are designed for use in ArcGIS, and through them users can connect
to Hadoop from ArcGIS. This connection is really useful to toolkit users,
since they can import their analysis results into ArcGIS for visualization.
Having narrowed their data down to a specific subset, they can also perform
more complex and sophisticated analysis there. Additionally, users can
leverage the ArcGIS platform's capabilities to publish their maps to web and
mobile apps, and to integrate them with BI reports.
Finally, the GIS Tools for Hadoop project.
This project is intended as a place for samples that leverage the toolkit.
The samples can use the low-level libraries or the geoprocessing (GP) tools.
A couple of samples are available to help you test the deployment of the
spatial libraries with Hadoop and Hive, and to make sure everything runs
without issues before you start using the setup from your HQL queries or from
the GP tools. To check your deployment for Hive and GP tools usage, use the
point-in-polygon-aggregation-hive sample. The sample relies on the data and
lib directories located on the same path.