Apache Hadoop is an open-source software framework that uses a cluster of networked computers to store and process large data sets with simple programming models. It is designed for problems that involve analyzing large amounts of data, from gigabytes to petabytes (one petabyte is one million gigabytes). The framework is written in Java and is based on Google’s MapReduce programming model. This course begins with an introduction to big data and Hadoop. It will teach you the features, types, and sources of big data, as well as the various ways of analyzing big data and the benefits of doing so. You will then get an overview of Apache Hadoop, its framework, its history, and the Hadoop ecosystem. In the practice section, you will study how to download, start, and connect to the Cloudera virtual machine using the Docker platform. You will also study the architecture of the Hadoop Distributed File System (HDFS), including the building blocks of Hadoop, its components, and its workflow. Finally, you will see some useful HDFS shell commands for managing files on an HDFS cluster, including how to create directories and how to move, delete, and read files.
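To give a feel for the HDFS architecture mentioned above, here is a minimal Python sketch of how a file is divided into blocks and replicated across the cluster. The 128 MB block size and replication factor of 3 are assumed values (they are common defaults, but both are configurable per cluster), and the function name `hdfs_footprint` is hypothetical, chosen only for this illustration.

```python
import math

# Assumed values: 128 MB blocks and 3 replicas are common HDFS defaults,
# but both settings are configurable on a real cluster.
BLOCK_SIZE_MB = 128
REPLICATION = 3


def hdfs_footprint(file_size_mb):
    """Return (number of blocks, raw storage in MB) for a single file."""
    # A file is split into fixed-size blocks; the last block may be partial.
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    # Every block is stored REPLICATION times on different nodes.
    raw_storage_mb = file_size_mb * REPLICATION
    return blocks, raw_storage_mb


blocks, raw_mb = hdfs_footprint(1024)  # a 1 GB file
print(blocks, raw_mb)  # prints: 8 3072
```

Under these assumptions, a 1 GB file occupies 8 blocks and about 3 GB of raw disk space across the cluster.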
Next, you will be introduced to MapReduce, studying its architecture and seeing how it works. You will also learn about the data flow of MapReduce, the YARN (Yet Another Resource Negotiator) architecture, and the differences between traditional relational database management systems (RDBMS) and MapReduce. Thereafter, you will be taught the architecture of Sqoop and how to import and export data using the Sqoop command-line interface. Two practice sections explain the syntax for importing data from an RDBMS to HDFS and from an RDBMS to Hive with Sqoop import, and for exporting data from HDFS to an RDBMS and from Hive to an RDBMS with Sqoop export. Then, you will study Hive, its architecture, components, and data types. The types of tables in Hive, the Hive schema, and data storage will be highlighted. The course also considers the Impala MPP SQL query engine, its features, and the differences between Impala, Hive, and a traditional RDBMS. The practice section covers creating external Hive tables, creating managed Hive tables, and running HQL and Impala queries to analyze the data.
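The MapReduce data flow described above can be sketched in plain Python with the classic word-count example. This is only an in-memory illustration of the map, shuffle, and reduce phases, not Hadoop's Java API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen for this sketch.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all values by key, which the framework does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "Big Hadoop"])
print(counts)  # prints: {'big': 2, 'data': 1, 'hadoop': 1}
```

On a real cluster, the map and reduce functions run in parallel on many nodes and the shuffle moves data over the network, but the logical flow is the same.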
Next, you will study Pig scripting in Hadoop. You will learn the Pig data types, their uses, and how Pig scripts are executed by the engine. How to load data into Pig and how to filter data will also be explained. The practice section outlines how to create and execute different Pig Latin scripts and how to use different functions to perform ETL (extract, transform, and load) with Pig. Then, you will be introduced to Oozie, a workflow scheduling system for managing Hadoop jobs. The types of jobs in Oozie, its architecture, features, and actions will be reviewed, and Oozie parameterization and flow control in an Oozie workflow will be examined. In the practice section, you will learn how to create different Oozie actions for Sqoop, Hive, and Pig. This course is for database and data warehouse developers, big data developers, data analysts, and any technical personnel who are interested in learning and exploring the various features of Hadoop and its tools. What are you waiting for? Enroll now and start learning today!
All Alison courses are free to enrol in, study, and complete. To successfully complete this course and become an Alison Graduate, you need to achieve 80% or higher in each course assessment.