Fundamentals of Hadoop

Understand the basics of the Apache Hadoop ecosystem with hands-on exercises in this free analytics training course.

Publisher: Proton Expert Systems and Solutions
In this short course, you will be introduced to the components and tools of Apache Hadoop. Learn how to store and process large datasets, ranging in size from gigabytes to petabytes. The HDFS (Hadoop distributed file system) architecture, data processing using MapReduce, and importing and exporting data using SQOOP will be covered. The course also includes practice sections with hands-on activities.
  • Duration: 1.5-3 Hours
  • Students: 226
  • Accreditation: CPD

Description

Apache Hadoop is an open-source software framework that uses a cluster of networked computers to store and process large data sets with simple programming models. It is designed for problems that involve analyzing large amounts of data, ranging from gigabytes to petabytes (one petabyte is one million gigabytes). The framework is written in Java and is based on Google’s MapReduce programming model. This course begins with an introduction to big data and Hadoop. It will teach you the features, types, and sources of big data, along with the various ways of analyzing it and the benefits of doing so. An overview of Apache Hadoop, its framework, history, and ecosystem follows. Then, in the practice section, you will learn how to download, start, and connect to the Cloudera virtual machine using the Docker platform. You will also study the architecture of the Hadoop distributed file system (HDFS), including the building blocks of Hadoop, its components, and its workflow. Finally, the course highlights useful HDFS shell commands for managing files on an HDFS cluster: creating directories and moving, deleting, and reading files.

Next, you will be introduced to MapReduce, studying its architecture and seeing how it works. You will also learn about the data flow of MapReduce, the YARN (Yet Another Resource Negotiator) architecture, and the differences between traditional relational database management systems (RDBMS) and MapReduce. Thereafter, you will be taught the architecture of SQOOP and how to import and export data using the SQOOP command-line interface. Two practice sections explain the syntax for importing data from an RDBMS into HDFS and into Hive with SQOOP import, and for exporting data from HDFS and from Hive to an RDBMS with SQOOP export. Then, you will study Hive, its architecture, components, and data types. The types of tables in Hive, the Hive schema, and data storage will be highlighted. Furthermore, the Impala MPP (massively parallel processing) SQL query engine, its features, and the differences between Impala, Hive, and a traditional RDBMS will be considered. The practice section covers creating external and managed Hive tables and running HQL and Impala queries to analyze data.
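
To make the MapReduce data flow mentioned above more concrete, here is a minimal sketch in plain Python. It is not Hadoop's Java API and is not taken from the course material. It only imitates the three stages (map, shuffle, reduce) on a word-count task, in a single process. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values under their shared key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the grouped values for one key into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; each string stands in for one input record.
    sample = ["big data big cluster", "data node"]
    print(word_count(sample))
```

In a real Hadoop job, the map and reduce steps would run in parallel on different nodes, and the framework would perform the shuffle across the network. This sketch runs everything in one process, only to show how the data changes shape between stages.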

Next, you will study Pig scripting in Hadoop. You will learn the Pig data types, their uses, and how the Pig engine executes scripts. How to load data into Pig and filter it will also be explained. The practice section outlines how to write and execute Pig Latin scripts and how to use different functions to perform ETL (extract, transform, and load) with Pig. Then, you will be introduced to Oozie, a workflow scheduling system for managing Hadoop jobs. The types of jobs in Oozie, its architecture, features, and actions will be reviewed, along with Oozie parameterization and how flow control operates in an Oozie workflow. In the practice section, you will learn how to create different workflow actions for SQOOP, Hive, and Pig. This course is for database and data warehouse developers, big data developers, data analysts, and any technical personnel interested in learning and exploring the various features of Hadoop and its tools. What are you waiting for? Enroll now and start learning today!
