Hadop Reklam

Sponsor Reklam

Wednesday, November 27, 2013

Pigeon

Pigeon
Pigeon is a spatial extension to Pig that allows it to process spatial data. All functionalities in Pigeon are introduced as user-defined functions (UDFS) which makes it unobtrusive and allows it to work with your existing systems. All the spatial functionality is supported by ESRI Geometry API a native Java open source library for spatial functionality licensed under  Apache Public License
Our target is to have something like Postgis but for Pig instead of PostgreSQL. We use the same function names to make it easier for existing users to use Pigeon. Here is an example the computes the union of all ZIP codes in each city.
zip_codes = LOAD 'zips' AS (zip, city, geom);
zip_by_city = GROUP zip_codes BY city;
zip_union = FOREACH zip_by_city
GENERATE group AS city, ST_Union(geom);

Data types
Currently, Pig does not support the creation of custom data types. This is not the best thing for Pigeon because we wanted to have our own data type (Geometry) similar to PostGIS. As a work around, we use the more generic type bytearray as our main data type. All conversions happen from bytearray to Geometry and vice-verse on the fly while the function is executed. If a function expects an input of type Geometry, it receives a bytearray and converts it Geometry. If the output is of type Geometry, it computes the output, converts it to bytearray, and returns that bytearray instead. This is a little bit cumbersome, but the Pig team is able to add custom data types so that we have a cleaner extension.

How to compile
Pigeon requires ESRI Geometry API to compile. You need to download a recent version of the library and add it to the classpath of your Java compiler to be able to compile the code. Of course you also need Pig classes to be available in the classpath. The current code is tested against ESRI Geometry API 1.0 and Pig 0.11.1. Currently you have to do the compilation manually but we are planning to create an ANT build file to automate the compilation. Once you compile the code, you can create a jar file out of it and REGISTER it in your Pig scripts.

How to use
To use Pigeon in your Pig scripts, you need to REGISTER the jar file in your Pig script. Then you can use the spatial functionality in your script as you cdo with normal functionality. Here are some simple examples on how to use Pigeon.

Let's say you have a trajectory in the form (latitude, longitude, timestamp). We need to for a Linestring out of this trajectory when points in this linestring are sorted by timestamp.

points = LOAD 'trajectory.tsv' AS (time: datetime, lat:double, lon:double);
s_points = FOREACH points GENERATE ST_MakePoint(lat, lon) AS point, time;
points_by_time = ORDER s_points BY time;
points_grouped = GROUP points_by_time ALL;
lines = FOREACH points_grouped GENERATE ST_AsText(ST_MakeLine(points_by_time));
STORE lines INTO 'line';
Supported functions
Here is a list of all functions that are currently supported.

Basic Spatial Functions

ST_AsHex Converts a shape to its Well-Known Binary (WKB) format encoded as Hex string
ST_AsText Converts a shape to its Well-Known Text (WKT) format
ST_MakePoint Creates a geometry point given two numeric coordinates
ST_Area Calculates the area of a surface shape (e.g., Polygon)
ST_Envelope Calculates the envelope (MBR) of a shape
ST_Buffer Computes a buffer with the specified distance around a geometry.
ST_Size Returns number of points in a linestring
Spatial Predicates

ST_Crosses Checks if one polygon crosses another polygon
ST_IsEmpty Tests whether a shape is empty or not.
Spatial Analysis

ST_Buffer Computes a buffer with the specified distance around a geometry.
ST_ConvexHull Computes the minimal convex polygon of a shape.
Aggregate functions

ST_MakeLine Creates a line string given a bag of points
ST_MakePolygon Creates a polygon given a circular list of points
ST_ConvexHull Computes the convex hull from a bag of shapes
ST_Union Computes the spatial union of a set of surfaces (e.g., Polygons)
ST_Extent Computes the minimal bounding rectangle (MBR) of a set of shapes

No comments:

Post a Comment