Author: Muhammad Asif Abbasi
ISBN: 9781785889585
Publisher: Packt Publishing
Publication: March 28, 2017
Imprint: Packt Publishing
Language: English
Learn about the fastest-growing open source project in the world, and find out how it revolutionizes big data analytics
This guide is for big data engineers, analysts, architects, software engineers, and technical managers who need to perform efficient, real-time data processing on Hadoop. Basic familiarity with Java or Scala will be helpful.
Readers are expected to come from mixed backgrounds, typically engineering or data science, with no prior Spark experience, and to want to understand how Spark can help them on their analytics journey.
The Spark juggernaut keeps rolling, gaining more momentum every day. Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML, and GraphX, all accessible from Java, Scala, Python, and R. Deploying these capabilities correctly is crucial, whether on a standalone cluster or as part of an existing Hadoop installation configured with YARN or Mesos.
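To make this concrete, here is a minimal sketch (not taken from the book) of starting a SparkSession in Scala and running a Spark SQL query over a small in-memory DataFrame. The table and column names are hypothetical, and `local[*]` stands in for the YARN or Mesos master URL you would pass via spark-submit on a real cluster.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local mode for illustration; on a cluster the master comes from
    // spark-submit (e.g. --master yarn) rather than being hard-coded.
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // A tiny in-memory dataset stands in for real input such as Parquet on HDFS.
    val sales = Seq(("widget", 3, 9.99), ("gadget", 1, 24.50), ("widget", 2, 9.99))
      .toDF("product", "quantity", "price")

    // Register the DataFrame as a temporary view and query it with Spark SQL.
    sales.createOrReplaceTempView("sales")
    spark.sql(
      "SELECT product, SUM(quantity * price) AS revenue FROM sales GROUP BY product"
    ).show()

    spark.stop()
  }
}
```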
After installation, the next part of the journey is working with the key components: the core APIs, clustering, the machine learning APIs, data pipelines, and parallel programming. It is important to understand why each framework component matters, how widely it is used, how stable it is, and which use cases it suits.
Once we understand the individual components, we will work through a couple of real-life advanced analytics examples, such as building a recommendation system and predicting customer churn.
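As a hedged illustration of the first of those examples, the sketch below trains the spark.ml ALS estimator for collaborative filtering. The (user, item, rating) triples and column names are invented for demonstration and are not the book's dataset or code.

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

object RecommendationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("als-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical (user, item, rating) triples; a real job would load these
    // from a ratings file or table.
    val ratings = Seq(
      (0, 10, 4.0), (0, 11, 1.0),
      (1, 10, 5.0), (1, 12, 2.0),
      (2, 11, 3.0), (2, 12, 4.0)
    ).toDF("userId", "itemId", "rating")

    // Collaborative filtering via alternating least squares.
    val als = new ALS()
      .setUserCol("userId")
      .setItemCol("itemId")
      .setRatingCol("rating")
      .setRank(5)
      .setMaxIter(10)
      .setRegParam(0.1)

    val model = als.fit(ratings)

    // Predict ratings for the known pairs; a real pipeline would hold out a
    // test split and evaluate the predictions before serving recommendations.
    model.transform(ratings).show()

    spark.stop()
  }
}
```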
The objective of these examples is to give the reader confidence in using Spark for real-world problems.
With the help of practical examples and real-world use cases, this guide will take you from scratch to building efficient data applications using Apache Spark.
You will learn all about this excellent data processing engine in a step-by-step manner, taking one aspect of it at a time.
This highly practical guide covers how to work with data pipelines, DataFrames, clustering, Spark SQL, parallel programming, and related topics with the help of real-world use cases.
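For instance, a small spark.ml pipeline that assembles feature columns and clusters rows with k-means might look like the following sketch; the customer metrics and parameter choices are illustrative assumptions, not content from the book.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object ClusteringPipelineSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("clustering-pipeline-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical customer metrics; real input would come from tables or files.
    val customers = Seq(
      (1, 120.0, 3.0), (2, 80.0, 1.0), (3, 400.0, 12.0), (4, 390.0, 10.0)
    ).toDF("id", "monthlySpend", "visits")

    // Assemble raw columns into a feature vector, then cluster with k-means;
    // chaining the two stages is what spark.ml calls a Pipeline.
    val assembler = new VectorAssembler()
      .setInputCols(Array("monthlySpend", "visits"))
      .setOutputCol("features")
    val kmeans = new KMeans()
      .setK(2)
      .setFeaturesCol("features")
      .setPredictionCol("cluster")

    val pipeline = new Pipeline().setStages(Array(assembler, kmeans))
    val model = pipeline.fit(customers)

    // Each customer is assigned to one of the two clusters.
    model.transform(customers).select("id", "cluster").show()

    spark.stop()
  }
}
```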