Features of Spark SQL

Author: Narayana Reddy
Posted: Mar 07, 2019

Let’s look at the features that make Spark SQL so popular for data processing.

Integrated – SQL queries can be mixed freely with Spark programs. Structured data can be queried inside a Spark program using either SQL or the DataFrame API. This tight integration makes it easy to run SQL queries alongside analytic algorithms.

Hive compatibility – Existing Hive queries can be run unmodified, because Spark SQL supports HiveQL along with Hive UDFs (user-defined functions) and Hive SerDes. This gives access to existing Hive warehouses.

Unified data access – Data can be loaded and queried from a variety of sources. A single interface for working with structured data is all that is needed, and SchemaRDDs (renamed DataFrames in Spark 1.3) provide it.

Standard connectivity – Spark SQL includes a server mode that offers industry-standard JDBC and ODBC connectivity.

Performance and scalability – To keep queries fast while scaling to hundreds of nodes on the Spark engine, Spark SQL incorporates a code generator, a cost-based optimizer and columnar storage. The Spark engine also provides full mid-query fault tolerance. Note that, as discussed earlier under the limitations of Hive, this kind of fault tolerance is lacking in Hive. The Spark SQL interfaces give Spark more information about the structure of the data and the type of computation being performed, and Spark SQL uses that information internally for extra optimization. Hive queries also run faster because Spark SQL can read directly from multiple sources such as HDFS, Hive and existing RDDs.

Use cases

There is a lot to learn about how Spark SQL is applied in industry, but the three use cases below give a good idea:

Twitter sentiment analysis – Initially, all the data is collected through Spark Streaming. Spark SQL is then used to analyse everything about a topic, say Narendra Modi. Every tweet about Modi is gathered, and Spark SQL is used to classify the tweets as very positive, positive, neutral, negative or very negative. This is just one of the ways sentiment analysis can be done. It is useful in targeted marketing, crisis management and service adjustment.
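
The five-way classification can be pictured as a simple bucketing query. The sketch below is only an illustration, not how any particular pipeline does it: it assumes each tweet already carries a polarity score between -1 and 1 from some upstream scorer, the thresholds and the tweet table are hypothetical, and Python's built-in sqlite3 module stands in for Spark SQL so the example runs without a cluster. The CASE and GROUP BY constructs it relies on are standard SQL.

```python
import sqlite3

# Hypothetical sample data: (tweet_id, polarity score in [-1.0, 1.0]).
# In the scenario from the text, the scores would come from an upstream scorer.
TWEETS = [
    ("t1", 0.9),
    ("t2", 0.3),
    ("t3", 0.0),
    ("t4", -0.4),
    ("t5", -0.8),
    ("t6", 0.05),
]

# Assumed thresholds; a real system would tune these.
CLASSIFY_QUERY = """
    SELECT
        CASE
            WHEN score >=  0.6 THEN 'very positive'
            WHEN score >=  0.2 THEN 'positive'
            WHEN score >  -0.2 THEN 'neutral'
            WHEN score >  -0.6 THEN 'negative'
            ELSE 'very negative'
        END AS sentiment,
        COUNT(*) AS tweets
    FROM tweet
    GROUP BY sentiment
"""


def count_by_sentiment(rows):
    """Load the tweets into an in-memory table and count them per class."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweet (tweet_id TEXT, score REAL)")
    conn.executemany("INSERT INTO tweet VALUES (?, ?)", rows)
    counts = dict(conn.execute(CLASSIFY_QUERY).fetchall())
    conn.close()
    return counts


sentiment_counts = count_by_sentiment(TWEETS)
for label, total in sorted(sentiment_counts.items()):
    print(label, total)
```

In a Spark setting the same kind of query would run over a table fed by the streaming job rather than over an in-memory list.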

Stock market analysis – Once data is streaming in real time, it can also be processed in real time. Stock and market movements generate huge volumes of data, and traders need an edge: an analytics framework that processes all of that data in real time and identifies the most rewarding stock or contract in the nick of time. As said earlier, when a real-time analytics framework is needed, Spark and its components are the technology to consider.

Banking – Credit card fraud detection requires real-time processing. Suppose a card is swiped in Bangalore for a purchase of 4,000 rupees, and within 5 minutes the same card is swiped in Kolkata for a purchase of 10,000 rupees. Banks can use the real-time analytics provided by Spark SQL to detect such fraud.
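
The rule in this scenario (same card, different cities, within 5 minutes) can be written as a self-join. What follows is a minimal sketch under stated assumptions: the txn table, its columns and the sample rows are hypothetical, timestamps are plain epoch seconds, and Python's built-in sqlite3 module is used in place of Spark SQL so the example runs on its own. A Spark SQL version would use its own timestamp functions and a streaming source.

```python
import sqlite3

# Hypothetical transactions: (card_id, city, amount_inr, epoch_seconds).
TRANSACTIONS = [
    ("card-1", "Bangalore", 4000, 1000),
    ("card-1", "Kolkata", 10000, 1000 + 4 * 60),   # 4 minutes later, other city
    ("card-2", "Bangalore", 2500, 1000),
    ("card-2", "Bangalore", 900, 1000 + 2 * 60),   # same city, so not flagged
    ("card-3", "Mumbai", 700, 1000),
    ("card-3", "Delhi", 1200, 1000 + 3 * 3600),    # 3 hours later, so not flagged
]

WINDOW_SECONDS = 5 * 60  # the "within 5 minutes" window from the text

# Pair each transaction with any later one on the same card in another city.
SUSPICIOUS_QUERY = """
    SELECT a.card_id, a.city, b.city, b.ts - a.ts
    FROM txn AS a
    JOIN txn AS b
      ON a.card_id = b.card_id
     AND a.city <> b.city
     AND b.ts > a.ts
     AND b.ts - a.ts <= ?
    ORDER BY a.card_id, a.ts
"""


def find_suspicious_pairs(rows, window_seconds=WINDOW_SECONDS):
    """Return (card_id, first_city, second_city, gap_seconds) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (card_id TEXT, city TEXT, amount INTEGER, ts INTEGER)"
    )
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", rows)
    pairs = conn.execute(SUSPICIOUS_QUERY, (window_seconds,)).fetchall()
    conn.close()
    return pairs


suspicious = find_suspicious_pairs(TRANSACTIONS)
for card_id, first_city, second_city, gap in suspicious:
    print(f"{card_id}: {first_city} then {second_city} after {gap} seconds")
```

Only the Bangalore and Kolkata pair on the first card is flagged; the other cards fail either the city or the time condition. A production rule would also weigh factors such as amount and merchant type, which this sketch ignores.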

Conclusion

The Apache Software Foundation has provided a carefully thought-out component for real-time analytics. When the analytics world starts seeing the shortcomings of Hadoop in providing real-time analytics, migrating to Spark will be the obvious outcome. Similarly, as the limitations of Hive become more and more apparent, users will shift to Spark SQL. Note that processing which takes 10 minutes in Hive can be done in less than a minute with Spark SQL. On top of that, migration is easy because Spark SQL provides Hive support. This is a great opportunity for those who want to learn Spark SQL and DataFrames. Currently there aren’t many professionals who can work with Hadoop. The demand for Spark is higher still, and those who learn it and gain hands-on experience will be in great demand as the technology is used more and more.

About the Author

Narayana has been a Python developer since 2015. She was a member of her company's core Python development team and is enthusiastic about Python.

Member since: Jan 04, 2019
Published articles: 18