Spark for Python Developers

A concise guide to implementing Spark big data analytics for Python developers, and building a real-time, insightful, data-intensive trend tracker app

Covered topics:

  • Create a Python development environment powered by Spark (PySpark), Blaze, and Bokeh
  • Build a real-time, data-intensive trend tracker app
  • Visualize the trends and insights gained from data using Bokeh
  • Generate insights from data using machine learning through Spark MLlib
  • Manipulate data using Blaze
  • Create training datasets and train the machine learning models
  • Test the machine learning models on test datasets
  • Deploy the machine learning algorithms and models and scale them for real-time events


Looking for a cluster computing system that provides high-level APIs? Apache Spark is your answer: an open source, fast, and general-purpose cluster computing system. Spark's multi-stage in-memory primitives provide performance up to 100 times faster than Hadoop MapReduce for certain applications, and it is also well suited to machine learning algorithms. Are you a Python developer inclined to work with the Spark engine? If so, this book will be your companion as you create a data-intensive app using Spark as a processing engine, Python visualization libraries, and web frameworks such as Flask. To begin with, you will learn the most effective way to install the Python development environment powered by Spark, Blaze, and Bokeh. You will then find out how to connect with data stores such as MySQL, MongoDB, Cassandra, and Hadoop. You'll expand your skills throughout, becoming familiar with the various data sources (GitHub, Twitter, Meetup, and blogs), their data structures, and solutions for tackling their complexities. You'll explore datasets using IPython Notebook and discover how to optimize the data models and pipeline. Finally, you'll learn how to create training datasets and train the machine learning models. By the end of the book, you will have created a real-time, insightful, data-intensive trend tracker app with Spark.