Apache Spark Projects

  • Packt
  • All Levels
  • Book
  • English
  • 8h 25m


Explore the potential of Apache Spark and its ecosystem through real-world applications.

Covered topics:

  • Explore the Spark ecosystem and learn to deploy it on large-scale clusters
  • Perform basic Spark operations through MovieLens data analysis
  • Learn how to analyze data using Spark Streaming and Spark SQL
  • Understand how to predict flight delays with MLlib
  • Learn how to forecast sales with SparkR
  • Write PySpark code to build a recommendation engine


Apache Spark is one of the most popular Big Data tools, used today across industries ranging from e-commerce and entertainment to travel and retail. This book demonstrates how to leverage the capabilities of Apache Spark in practical projects built on real-world scenarios. It begins with a quick introduction to the components of the Spark ecosystem and then teaches readers how to use them in practice. It demonstrates how to use each component of the Apache Spark ecosystem, i.e. Spark SQL, Spark Streaming, Spark MLlib, and PySpark, to build an efficient, end-to-end Big Data processing pipeline. Projects covered include sales forecasting using SparkR and a recommendation engine using PySpark. Readers will learn about libraries such as MLlib, Spark SQL, GraphX, and Spark Streaming. Throughout the book, readers will gain knowledge of the different components of the Spark ecosystem and will be able to manage their Big Data pipelines using Apache Spark. By the end of the book, you will have mastered all the aspects of Apache Spark and be able to use them in your own Big Data projects without any hassle.