
Fundamentals of Hadoop

Understand the fundamentals of the Hadoop ecosystem with hands-on practices in this free online training course.

Publisher: Proton Tech
In this free online course, you will be introduced to the components and tools of Apache Hadoop. You will learn how to store and process large datasets, ranging in size from gigabytes to petabytes, using Big Data tools. The HDFS architecture, data processing using MapReduce, and importing and exporting data using Sqoop, among other topics, will be covered. The course also has a practice section that provides you with practical knowledge and hands-on activities. Enrol now!
Fundamentals of Hadoop
  • Duration

    1.5-3 Hours
  • Students

    29
  • Accreditation

    CPD


Description

Apache Hadoop is an open-source software framework that facilitates the use of a network of computers to store and process large data sets using simple programming models. Hadoop is designed to solve problems that involve analyzing large data sets ranging from gigabytes to petabytes in size. The framework is written in the Java programming language and is based on Google’s MapReduce programming model. This course begins with an introduction to Hadoop and the Big Data software utility. The course will teach you the features, types, and sources of data in Big Data. The various ways of analyzing Big Data and its benefits will also be covered. An overview of Apache Hadoop, its framework, its history, and the Hadoop ecosystem will be discussed. Then, in the practice section, you will study how to download, start, and connect to the Cloudera virtual machine using the Docker platform. Furthermore, you will study the architecture of the Hadoop Distributed File System (HDFS). The building blocks of Hadoop, its components, and its workflow will be explained. Some useful HDFS shell commands used to manage files on HDFS clusters, including how to create directories and how to move, delete, and read files, will also be highlighted.
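The file-management commands mentioned above follow the `hdfs dfs` shell interface. A few representative examples are sketched below; the paths and file names are illustrative, and the commands assume a running HDFS cluster such as the Cloudera virtual machine used in the practice section:

```shell
# Create a directory on HDFS (-p creates parent directories as needed)
hdfs dfs -mkdir -p /user/cloudera/data

# Copy a file from the local file system into HDFS
hdfs dfs -put sales.csv /user/cloudera/data/

# List, move (rename), read, and delete files on the cluster
hdfs dfs -ls /user/cloudera/data
hdfs dfs -mv /user/cloudera/data/sales.csv /user/cloudera/data/sales_2020.csv
hdfs dfs -cat /user/cloudera/data/sales_2020.csv
hdfs dfs -rm /user/cloudera/data/sales_2020.csv
```

The `hdfs dfs` commands deliberately mirror familiar Unix utilities (`ls`, `mv`, `cat`, `rm`), which makes them easy to pick up once you know the local equivalents.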

Next, you will be introduced to the MapReduce programming model. You will study its architecture and see an illustration of how it works. You will also learn about the data flow of MapReduce, the YARN architecture, and the differences between a traditional Relational Database Management System (RDBMS) and MapReduce. Thereafter, you will be taught the architecture of Sqoop and how to import and export data using the Sqoop command-line interface. The syntax for importing data from an RDBMS to HDFS and from an RDBMS to Hive through Sqoop import, and for exporting data from HDFS to an RDBMS and from Hive to an RDBMS through Sqoop export, will be explained in two practice sections. Then, you will study Hive, its architecture, components, and data types. The types of tables in Hive, the Hive schema, and data storage will be highlighted. Furthermore, the Impala MPP SQL query engine, its features, and the differences between Impala, Hive, and a traditional RDBMS will be considered. Creating external Hive tables, creating managed Hive tables, and running HQL and Impala queries to analyze the data will also be covered in the practice section.
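The MapReduce data flow described above (map, then shuffle and sort, then reduce) can be mimicked with ordinary Unix pipes, which is a common way to build intuition for the model before writing real Hadoop jobs. In this sketch the input sentence is just sample data:

```shell
# map: split the input into one (word) record per line
# sort: plays the role of the shuffle phase, grouping identical keys
# uniq -c: plays the role of the reduce phase, aggregating a count per key
echo "the quick fox and the lazy dog" \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
```

Each stage here corresponds to one phase of a word-count MapReduce job: the mapper emits (word, 1) pairs, the framework groups pairs by key, and the reducer sums the counts for each word.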

Furthermore, you will study an overview of Pig scripting in Hadoop. You will learn the Pig data types, their uses, and how Pig scripts are executed by the Pig execution engine. How to load data into Pig, as well as how to filter data in Pig, will also be explained. How to create different Pig Latin scripts, execute them, and use different functions to perform ETL with Pig will be outlined in the practice section. Then, you will be introduced to Oozie, a workflow scheduling system for managing Hadoop jobs. The types of jobs in Oozie, the Oozie architecture, its features, and its actions will be reviewed. Oozie parameterization and how flow control in an Oozie workflow operates will be analyzed in detail. In the practice section, you will learn how to create different Sqoop, Hive, and Pig actions. This course is for database and data warehouse developers, big data developers, data analysts, and any technical personnel who are interested in learning about and exploring the various features of Hadoop and its tools. What are you waiting for? Enrol now and start learning!
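To give a flavour of the loading and filtering covered in the Pig section, the sketch below feeds a small Pig Latin script to the Pig shell. It assumes a Hadoop and Pig installation such as the course's Cloudera environment; the file path, schema, and field names are illustrative:

```shell
# Run an inline Pig Latin script (PigStorage loads tab-delimited text by default)
pig <<'EOF'
-- Load a tab-delimited file from HDFS with an explicit schema
sales = LOAD '/user/cloudera/data/sales.tsv'
        AS (id:int, product:chararray, amount:double);

-- Keep only the rows whose amount exceeds 100, then store the result
big_sales = FILTER sales BY amount > 100.0;
STORE big_sales INTO '/user/cloudera/output/big_sales';
EOF
```

LOAD, FILTER, and STORE are the basic building blocks of an ETL flow in Pig; the execution engine translates the script into one or more MapReduce jobs behind the scenes.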
