Make the Elephant Fly. Real-time Big Data with SQLstream
‘Real-time’ and ‘Hadoop’ have often been treated as synonymous, yet Hadoop is not as real-time as many have hoped. Hadoop has many strengths, but it was never intended for low-latency, real-time analytics over high-velocity machine data streams. With SQL emerging as the key enabler for mainstream adoption of Hadoop, executing streaming SQL queries over Hadoop extends the platform out to the edge of the network. This makes it possible to query unstructured log file, sensor and network machine data sources on the fly and in real time.
Real-time Operational Intelligence on Hadoop
SQLstream accelerates Hadoop to process live, high-velocity unstructured data streams, delivering the low-latency, streaming operational intelligence demanded by today’s real-time businesses.
SQLstream for Hadoop combines SQLstream’s real-time operational intelligence over high-velocity machine data with the power of Hadoop for high-volume data storage and ongoing analysis. SQLstream for Hadoop enables:
·
Stream persistence – Hadoop HBase as an active archive for streaming data
and derived intelligence using the Flume API. SQLstream also performs continuous
aggregation to support high velocity streams without data loss.
·
Stream replay – restream the complete history of persisted streams from
HBase for ‘fast forwarding’ of time-based and spatial analytics. Various
interfaces can be utilized, including Cloudera’s Impala.
·
Streaming data queries – joining streaming real-time data with historical
streams and intelligence persisted in HBase.
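The continuous aggregation mentioned under stream persistence can be illustrated with a small Python sketch. It is not SQLstream's implementation. The event tuple layout, the 60-second tumbling window, the `tumbling_aggregate` helper and the dictionary standing in for an HBase table (with an illustrative `sensor#window` row key) are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event layout: (timestamp in seconds, sensor_id, value)
Event = Tuple[int, str, float]


def _flush(window_start: int, acc: Dict[str, List[float]]) -> Iterator[dict]:
    """Turn the running [count, sum] accumulators into one summary row per sensor."""
    for sensor in sorted(acc):
        count, total = acc[sensor]
        yield {"window_start": window_start, "sensor": sensor,
               "count": int(count), "avg": total / count}


def tumbling_aggregate(events: Iterable[Event], window_s: int = 60) -> Iterator[dict]:
    """Continuously aggregate a time-ordered stream into fixed, non-overlapping windows.

    A window's summaries are emitted as soon as the first event of the next
    window arrives, so only compact rows need to reach the archive.
    """
    current = None
    acc: Dict[str, List[float]] = defaultdict(lambda: [0, 0.0])
    for ts, sensor, value in events:
        window_start = ts - ts % window_s
        if current is not None and window_start != current:
            yield from _flush(current, acc)
            acc.clear()
        current = window_start
        acc[sensor][0] += 1
        acc[sensor][1] += value
    if current is not None:  # flush the final, still-open window when the input ends
        yield from _flush(current, acc)


# A plain dict stands in for an HBase table; the "sensor#window" row key is illustrative.
archive = {}
events = [(0, "s1", 1.0), (10, "s1", 3.0), (30, "s2", 5.0), (65, "s1", 7.0)]
for row in tumbling_aggregate(events, window_s=60):
    archive[f"{row['sensor']}#{row['window_start']}"] = row
```

Four raw events become three summary rows. On a truly unbounded stream the last window would be closed by the next event or a timer rather than by the end of input.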
The first phase of Hadoop and Big Data saw the emergence of NoSQL data storage platforms, which sought to overcome the rigidity of normalized schemas. However, as the technology enters the mainstream, the need for simpler, high-performance and reliable queries is driving a resurgence of SQL as the de facto language for Big Data processing (for example, Cloudera Impala and Google BigQuery). What is now apparent is that SQL is also the ideal language for processing data streams using real-time, window-based queries. Normalization and rigid schemas are a non-issue for a streaming data platform: there are no tables, and data is processed without first being stored.
What is Streaming SQL?
SQL was developed to process stored data in a traditional RDBMS. It has a massive existing skills base, proven scalability and sophisticated dynamic query optimization. It also works equally well, if not better, as a real-time stream computing query language. SQLstream’s ANSI SQL:2008 streaming SQL queries are exactly that: standards compliant. We test our SQL queries for standards compliance against the leading RDBMS SQL platforms. There are, however, two differences. First, SQLstream’s core s-Server stream computing platform does not persist any data before processing it (Hadoop HBase is the default storage platform for stream persistence, although any data storage platform can be supported). Second, streaming SQL queries execute continuously, processing new data as it is created. So why SQL as a stream computing language?
·
Proven scalability with sophisticated query optimization.
·
Rapid development – a few lines of SQL can express powerful processing logic.
·
SQL skills are readily available in the marketplace worldwide.
·
Supports direct migration of SQL applications to and from existing
databases and data warehouses.
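The difference between a static query and one that "executes continuously, processing new data as it is created" can be shown in plain Python. In the sketch below, a generator stands in for an unbounded data source. The function names and the `sensor_feed` source are hypothetical, and this only illustrates the execution model, not SQLstream's engine.

```python
import itertools
from typing import Callable, Iterable, Iterator, List

Row = dict


def static_query(table: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Traditional query: runs once over stored rows and returns a finished result set."""
    return [row for row in table if predicate(row)]


def continuous_query(stream: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Streaming query: never completes on its own; each match is pushed out as it arrives."""
    for row in stream:
        if predicate(row):
            yield row


def sensor_feed() -> Iterator[Row]:
    """Hypothetical unbounded source: new readings keep arriving forever."""
    for i in itertools.count():
        yield {"id": i, "temp": 20 + i % 10}


stored = [{"id": 0, "temp": 30}, {"id": 1, "temp": 10}]
print(static_query(stored, lambda r: r["temp"] >= 28))  # [{'id': 0, 'temp': 30}]

# The same predicate over an infinite feed: take the first three results as they are produced.
hot = continuous_query(sensor_feed(), lambda r: r["temp"] >= 28)
first_three = list(itertools.islice(hot, 3))
print([r["id"] for r in first_three])  # [8, 9, 18]
```

The predicate is identical in both cases. Only the execution model changes: the static version returns once, while the continuous version keeps producing results for as long as the source keeps producing rows.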
A Streaming SQL Example
The following query is a basic example of a streaming SQL query. The query finds Orders from New York that ship within one hour. Unlike a traditional static SQL query, this query executes continuously, processing new data as it arrives on every stream in the join and pushing out results as soon as the query condition is met. The STREAM keyword maintains standards compatibility: without it, the query would return a table rather than a stream of results that continues indefinitely.
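The semantics described above can be sketched in Python without guessing at SQLstream syntax: a continuous join of an order stream and a shipment stream, restricted to a one-hour window. Everything here is an assumption for illustration. That includes the tuple layouts, the function name, the rule that a shipment exactly 3600 seconds after its order still counts, and the requirement that each stream arrives in timestamp order. It is not SQLstream's engine.

```python
import heapq
from typing import Dict, Iterable, Iterator, Tuple

ONE_HOUR = 3600  # join window, in seconds

# Hypothetical stream layouts; each stream must arrive in timestamp order.
Order = Tuple[int, int, str]   # (ts, order_id, city)
Shipment = Tuple[int, int]     # (ts, order_id)


def orders_shipped_within_hour(orders: Iterable[Order], shipments: Iterable[Shipment],
                               city: str = "New York") -> Iterator[Tuple[int, int, int]]:
    """Continuously join two streams, emitting (order_id, order_ts, ship_ts) per match."""
    pending: Dict[int, int] = {}  # order_id -> order ts, kept in arrival (= time) order
    tagged = heapq.merge(
        (("order", ts, oid, c) for ts, oid, c in orders),
        (("ship", ts, oid, None) for ts, oid in shipments),
        key=lambda event: event[1],  # interleave both streams by timestamp
    )
    for kind, ts, oid, order_city in tagged:
        # Expire orders that can no longer ship within the hour (window eviction).
        while pending:
            oldest_id, oldest_ts = next(iter(pending.items()))
            if ts - oldest_ts <= ONE_HOUR:
                break
            del pending[oldest_id]
        if kind == "order":
            if order_city == city:
                pending[oid] = ts
        elif oid in pending:
            # The result is pushed out the moment the join condition is met.
            yield oid, pending.pop(oid), ts


orders = [(0, 1, "New York"), (100, 2, "Boston"), (200, 3, "New York"), (300, 4, "New York")]
shipments = [(1800, 1), (1900, 2), (3900, 4), (4000, 3)]
matches = list(orders_shipped_within_hour(orders, shipments))
print(matches)  # [(1, 0, 1800), (4, 300, 3900)]
```

Order 2 never matches because it is from Boston. Order 3 is evicted from the window before its shipment arrives 3800 seconds later. The window keeps memory bounded even though the inputs never end.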
Streaming SQL supports all standard SQL operations for data streams, including:
·
Stream Select, Insert and Update
·
Stream Join
·
Streaming Partition By and Group By
·
Full set of arithmetic, string, logical, date and timestamp operators
·
Support for User Defined Functions (UDXes)
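To make "Streaming Partition By" concrete, here is a Python sketch of a sliding-window count computed separately for each partition key. A row is emitted for every arriving event, rather than one row per group at the end of a window. The event layout, the half-open window boundary and the function name are assumptions for illustration only.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple


def sliding_count_by_key(events: Iterable[Tuple[int, str]],
                         window_s: int) -> Iterator[Tuple[int, str, int]]:
    """For each (ts, key) event, emit (ts, key, n), where n counts that key's events
    with timestamps in the half-open interval (ts - window_s, ts]."""
    windows: Dict[str, Deque[int]] = defaultdict(deque)  # one window per partition key
    for ts, key in events:
        win = windows[key]
        win.append(ts)
        while ts - win[0] >= window_s:  # evict events that have slid out of the window
            win.popleft()
        yield ts, key, len(win)


events = [(0, "a"), (10, "b"), (30, "a"), (70, "a")]
print(list(sliding_count_by_key(events, window_s=60)))
# [(0, 'a', 1), (10, 'b', 1), (30, 'a', 2), (70, 'a', 2)]
```

Each key keeps its own deque of timestamps, so eviction work stays proportional to the events that actually leave that key's window.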
Streaming SQL queries over Hadoop
SQLstream s-Server, our core stream computing platform, operates both as a streaming Big Data engine and as a streaming SQL language extension for Hadoop HBase. In Hadoop mode, Hadoop HBase is used as the default platform for stream persistence.
Data can be streamed directly into Hadoop HBase in real time. This includes the raw machine data as it is collected from log files, applications and sensors, filtered and enhanced versions of the same streams, and any pre-aggregated and analytical intelligence. SQLstream’s streaming SQL language support for Hadoop offers:
·
Real-time operational intelligence on Hadoop without low-level coding
·
Stream persistence for all raw machine data and derived intelligence
information
·
SQLstream Connector for Hadoop HBase – maintain and use your Big Data
storage platforms in real time.
·
Streaming integration between Big Data storage platforms.
·
Replay persisted streams for time-based and geospatial analysis of existing
stored data.
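Replaying persisted streams amounts to reading stored rows back in event-time order and feeding them to the same continuous queries, optionally "fast forwarding" by compressing the original gaps between events. The Python sketch below assumes that rows are stored in several partitions, each already sorted by timestamp. The row layout, the `replay` function and its `speedup` parameter are hypothetical. The `sleep` hook is injectable so the pacing can be checked without waiting.

```python
import heapq
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[float, str, float]  # hypothetical persisted row: (event_ts, sensor_id, value)


def replay(partitions: List[Iterable[Row]], speedup: float = 0.0,
           sleep: Callable[[float], None] = time.sleep) -> Iterator[Row]:
    """Re-stream persisted rows in event-time order.

    Each partition must already be sorted by timestamp. With speedup=0 rows are
    replayed as fast as possible; otherwise the original gaps between events
    are reproduced, divided by the speedup factor ("fast forward").
    """
    previous_ts = None
    for row in heapq.merge(*partitions, key=lambda r: r[0]):
        if speedup > 0 and previous_ts is not None:
            sleep((row[0] - previous_ts) / speedup)
        previous_ts = row[0]
        yield row


p1 = [(0.0, "s1", 1.0), (20.0, "s1", 2.0)]
p2 = [(5.0, "s2", 9.0), (30.0, "s2", 8.0)]
delays = []  # record requested pauses instead of actually sleeping
rows = list(replay([p1, p2], speedup=10, sleep=delays.append))
print([r[0] for r in rows], delays)  # [0.0, 5.0, 20.0, 30.0] [0.5, 1.5, 1.0]
```

Because the replayed rows come out as an ordinary time-ordered stream, any windowed query written for live data can run over them unchanged.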
A key advantage of SQLstream is the ability to extract and replay processed data from Big Data storage platforms and join it with the incoming, live data streams. Comparing real-time data against known trends enhances operational intelligence results, eliminating false alarms and enabling longer-term comparisons. Extraction and data processing in SQLstream use standards-based SQL queries, so powerful real-time queries can be deployed over stored data as it is streamed back.
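How a join with historical trends can suppress false alarms is sketched below in Python, under stated assumptions. Replayed history is reduced to a per-sensor mean and population standard deviation. A live reading above a static threshold raises an alert only if it is also more than k standard deviations above that sensor's historical mean. The threshold of 100.0, k = 3, the function names and the mean/standard-deviation baseline are all hypothetical choices, not SQLstream's method.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, Iterator, List, Tuple

Reading = Tuple[str, float]  # hypothetical layout: (sensor_id, value)


def build_baseline(history: Iterable[Reading]) -> Dict[str, Tuple[float, float]]:
    """Summarize replayed history into (mean, population std dev) per sensor."""
    by_sensor: Dict[str, List[float]] = {}
    for sensor, value in history:
        by_sensor.setdefault(sensor, []).append(value)
    return {s: (mean(vals), pstdev(vals)) for s, vals in by_sensor.items()}


def alerts(live: Iterable[Reading], baseline: Dict[str, Tuple[float, float]],
           threshold: float = 100.0, k: float = 3.0) -> Iterator[Reading]:
    """Flag live readings above a static threshold, unless history says they are normal."""
    for sensor, value in live:
        if value <= threshold:
            continue
        if sensor not in baseline:
            yield sensor, value  # no history to compare against: keep the alert
            continue
        mu, sigma = baseline[sensor]
        if value > mu + k * sigma:  # unusual for this sensor, not just above the threshold
            yield sensor, value


history = [("pump1", 120.0), ("pump1", 130.0), ("pump1", 125.0),
           ("pump2", 50.0), ("pump2", 52.0), ("pump2", 48.0)]
live = [("pump1", 128.0), ("pump2", 110.0), ("pump1", 160.0),
        ("pump3", 101.0), ("pump2", 90.0)]
print(list(alerts(live, build_baseline(history))))
# [('pump2', 110.0), ('pump1', 160.0), ('pump3', 101.0)]
```

A threshold-only rule would also have flagged pump1 at 128.0, which is within the normal range of that pump's history. The baseline join suppresses that alarm while still reporting the genuinely unusual readings.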