Hadop Reklam

Tuesday, November 26, 2013

SpatialHadoop

Usage examples


Once you have SpatialHadoop configured correctly, you are ready to run some sample programs. The following steps will generate a random file, index it using a Grid index, and run some spatial queries on the indexed file. The classes needed for this example are all contained in the spatialhadoop*.jar shipped with the binary release. You can type 'bin/hadoop jar spatialhadoop*.jar' to get the usage syntax for the available operations.
To generate a file of random rectangles, enter the following command:
$ bin/hadoop jar spatialhadoop*.jar generate test mbr:0,0,1000000,1000000 size:1.gb shape:rect
This generates a 1 GB file named 'test' in which every rectangle lies inside the rectangle with its corner at (0,0) and dimensions 1,000,000 x 1,000,000 units.

If you have your own file that needs to be processed, you can upload it to HDFS the same way you would with traditional Hadoop, by typing:
$ bin/hadoop fs -copyFromLocal <local file path> <HDFS file path>
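For scripted uploads it can help to fail early when the local file is missing. The following is a minimal sketch and not part of SpatialHadoop: the function name `upload_to_hdfs` and the `HADOOP` variable are hypothetical, and only the `fs -copyFromLocal` call is taken from the command above.

```shell
# Hypothetical helper: upload a local file to HDFS, failing early if it is missing.
# HADOOP defaults to bin/hadoop, the launcher used throughout this post.
HADOOP="${HADOOP:-bin/hadoop}"

upload_to_hdfs() {
  local_path="$1"
  hdfs_path="$2"
  if [ ! -f "$local_path" ]; then
    echo "local file not found: $local_path" >&2
    return 1
  fi
  # Same command as shown above, with the two paths filled in.
  "$HADOOP" fs -copyFromLocal "$local_path" "$hdfs_path"
}
```

It would be called as `upload_to_hdfs <local file path> <HDFS file path>`, with the same two arguments as the plain command.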

Once the file is in HDFS, whether generated or uploaded, it can be indexed. To index the generated file 'test' using a Grid index:
$ bin/hadoop jar spatialhadoop*.jar index test test.grid mbr:0,0,1000000,1000000 sindex:grid
To see how the grid index partitions this file, type:
$ bin/hadoop jar spatialhadoop*.jar readfile test.grid
This shows the list of partitions in the file, each defined by its boundaries, along with the number of blocks in each partition.
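
To build intuition for what a grid partition is, the sketch below maps a point to a cell of a uniform grid laid over the 1,000,000 x 1,000,000 space used in this example. It is an illustration only: the `grid_cell` function and the default 4x4 grid size are assumptions, and SpatialHadoop chooses its own partitioning, which is what the 'readfile' command reports.

```shell
# Illustrative only: map a point (x, y) to a cell of a uniform cols x rows grid
# over the space with corner (0,0) and size 1000000 x 1000000.
# The grid size is an assumption; SpatialHadoop decides the real partitioning.
grid_cell() {
  awk -v x="$1" -v y="$2" -v cols="${3:-4}" -v rows="${4:-4}" 'BEGIN {
    w = 1000000 / cols; h = 1000000 / rows   # cell width and height
    c = int(x / w); r = int(y / h)           # zero-based column and row
    if (c >= cols) c = cols - 1              # points on the far edge go to the last cell
    if (r >= rows) r = rows - 1
    print c, r
  }'
}

grid_cell 250000 750000   # prints: 1 3
```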

To run a range query operation on this file:
$ bin/hadoop jar spatialhadoop*.jar rangequery test.grid rq_results rect:500,500,1000,1000
This runs a range query over this file with the query range set to the rectangle at (500,500) with dimensions 1000x1000. The results are stored in an HDFS file named 'rq_results'.
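
On a small sample, the answer of a range query can be cross-checked locally by brute force. The sketch below makes two assumptions that are not from SpatialHadoop itself: the local file holds one rectangle per line as `x,y,width,height` (a hypothetical format chosen for the example), and a record matches when it overlaps the query rectangle. The function name `range_query` is likewise made up.

```shell
# Brute-force range query over a local text file (hypothetical format:
# one rectangle per line as x,y,width,height).
# Prints every rectangle that overlaps the query rectangle qx,qy,qw,qh.
range_query() {
  awk -F, -v qx="$2" -v qy="$3" -v qw="$4" -v qh="$5" '
    # Two rectangles overlap when their extents overlap on both the x and y axes.
    $1 < qx + qw && $1 + $3 > qx && $2 < qy + qh && $2 + $4 > qy
  ' "$1"
}

# Same query range as above: corner (500,500), dimensions 1000x1000.
# range_query sample_rects.csv 500 500 1000 1000
```

Every record is scanned here, which is only practical for small files; the point of the indexed query above is to avoid exactly that full scan.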

To run a kNN (k nearest neighbors) query operation on this file:
$ bin/hadoop jar spatialhadoop*.jar knn test.grid knn_results point:1000,1000 k:1000
This runs a kNN query where the query point is at (1000,1000) and k=1000. The results are stored in the HDFS file 'knn_results'.
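
To make the meaning of the query concrete, here is a brute-force kNN over a small local file. It is a sketch under stated assumptions: the input is a file of points, one `x,y` pair per line (a hypothetical format, simpler than the rectangles used above), and the function name `knn` is made up for the example.

```shell
# Brute-force kNN over a local text file of points (hypothetical format: x,y per line).
# Prints the k points closest to (qx, qy), nearest first.
knn() {
  awk -F, -v qx="$2" -v qy="$3" '{
    dx = $1 - qx; dy = $2 - qy
    # Squared distance gives the same ordering as the true distance.
    printf "%.6f %s\n", dx * dx + dy * dy, $0
  }' "$1" | sort -n -k1,1 | head -n "$4" | cut -d" " -f2-
}

# Same query as above: point (1000,1000), k=1000.
# knn sample_points.csv 1000 1000 1000
```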

To run a spatial join operation, first generate another file and have it indexed on the fly using the command:
$ bin/hadoop jar spatialhadoop*.jar generate test2.grid mbr:0,0,1000000,1000000 size:100.mb sindex:grid
Now, join the two files via the Distributed Join algorithm using the command:
$ bin/hadoop jar spatialhadoop*.jar dj test.grid test2.grid sj_results
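
The commands in this walkthrough can also be collected into a single script. The following is a sketch, not an official SpatialHadoop script: the `run`, `sh_op` and `spatial_pipeline` functions and the `HADOOP`, `JAR` and `DRY_RUN` variables are hypothetical names. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch: the walkthrough above as one script. With DRY_RUN=1 (the default here)
# commands are printed instead of executed; set DRY_RUN=0 to run them.
HADOOP="${HADOOP:-bin/hadoop}"
JAR="${JAR:-spatialhadoop*.jar}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

sh_op() {
  # $JAR is left unquoted on purpose so the shell expands the wildcard.
  run "$HADOOP" jar $JAR "$@"
}

spatial_pipeline() {
  sh_op generate test mbr:0,0,1000000,1000000 size:1.gb shape:rect || return 1
  sh_op index test test.grid mbr:0,0,1000000,1000000 sindex:grid || return 1
  sh_op rangequery test.grid rq_results rect:500,500,1000,1000 || return 1
  sh_op knn test.grid knn_results point:1000,1000 k:1000 || return 1
  sh_op generate test2.grid mbr:0,0,1000000,1000000 size:100.mb sindex:grid || return 1
  sh_op dj test.grid test2.grid sj_results
}

spatial_pipeline
```

Printing the commands first makes it easy to review the parameters before committing cluster time to a 1 GB generate and index run.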
