Set up your Hadoop clusters and integrate them with processing tools such as Pig, Hive, and Spark
- Insert data into Hadoop Distributed File System (HDFS)
- Compute algorithms with MapReduce
- Delete and transport data to test Erasure Coding and the Balancer
- Build a YARN application for the MapReduce workflow
- Create Resilient Distributed Dataset (RDD) analytics for Twitter tags and visualize your data using Python
- Configure multiple permission cases to see how secured access from Sqoop to Hive works
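The hashtag-analytics task from the RDD bullet above can be sketched in plain Python without a Spark cluster; the sample tweets and the `extract_hashtags` helper here are illustrative assumptions, not the book's own code:

```python
from collections import Counter

def extract_hashtags(tweet):
    """Return the #tags found in one tweet (hypothetical helper for illustration)."""
    return [word.lower() for word in tweet.split() if word.startswith("#")]

# Hypothetical sample data standing in for a collected Twitter feed.
tweets = [
    "Learning #Hadoop and #Spark today",
    "#Spark RDDs make parallel analytics simple",
    "Visualizing #Hadoop metrics in Python",
]

# A local stand-in for an RDD flatMap + reduceByKey pipeline:
# flatten all tags out of the tweets, then count occurrences per tag.
tag_counts = Counter(tag for tweet in tweets for tag in extract_hashtags(tweet))
print(tag_counts.most_common(2))
```

In actual PySpark, the same shape would be a `flatMap` over the tweet RDD followed by a `reduceByKey`, with the counts then handed to a Python plotting library for visualization.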
Apache Hadoop is an open source distributed processing framework that processes, manages, and stores big data for applications. Hadoop Fundamentals begins by covering how distributed file systems in Hadoop work and how they are managed with YARN, a Hadoop management layer. You’ll understand the MapReduce paradigm, the basic paradigm of data processing and analytics in parallelized systems. You’ll then delve into Apache Spark, a super-fast cluster computing technology that extends the Hadoop MapReduce functionality to efficiently perform a variety of computations. As you advance, you’ll explore data resources in the Hadoop ecosystem built by the big data community and enterprise users to find out what lies beyond MapReduce computations. This Hadoop book also takes you through frameworks such as Flume, Sqoop, Hive, and HBase for ingesting and warehousing different types of data. Finally, you’ll learn about the solutions that Hadoop systems implement to address security issues. By the end of this book, you’ll understand what big data is and have the skills necessary to work with Hadoop systems.
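The MapReduce paradigm described above can be illustrated with a minimal, single-process word count in Python; a real Hadoop job distributes the map and reduce phases across a cluster, and the function names and input lines below are assumptions made for the sketch:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    return [(word, 1) for word in line.split()]

def reduce_phase(word, counts):
    # Reduce: sum all counts emitted for a single word.
    return (word, sum(counts))

lines = ["big data big systems", "data at scale"]

# Shuffle: collect intermediate pairs and group them by key,
# as the MapReduce framework would do between the two phases.
pairs = sorted((p for line in lines for p in map_phase(line)), key=itemgetter(0))
results = [reduce_phase(word, (count for _, count in group))
           for word, group in groupby(pairs, key=itemgetter(0))]
print(dict(results))
```

The split into a stateless map step, a framework-managed shuffle, and a per-key reduce step is what lets Hadoop parallelize the same logic over many machines.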