Hadoop

Course Outline

Module 1

Understanding Hadoop

      • The Three Vs of Big Data, Six Key Hadoop Data Types, Sentiment Use Case
      • Getting Twitter Feeds into Hadoop, Use HCatalog to Define a Schema, Use Hive to Determine Sentiment, View Spikes in Tweet Volume, View Sentiment by Country, Geolocation Use Case
      • The Geolocation Data, Getting the Raw Data into Hadoop, The Truck Data, Getting the Truck Data into Hadoop, HCatalog Stores a Shared Schema
      • Data Analysis, Use Hive to Compute Truck Mileage, About Hadoop, Relational Databases vs. Hadoop, About Hadoop 2.x
      • New in Hadoop 2.x, The Hadoop Ecosystem, The Hortonworks Data Platform (HDP), The Path to ROI
      • Lab: Start an HDP 2.1 Cluster

Module 2

The Hadoop Distributed File System (HDFS)

      • About HDFS, Differences Between Hadoop and RDBMS, HDFS Components, The NameNode, The DataNodes, DataNode Failure, HDFS Commands

Module 3

Inputting Data into HDFS

      • Examples of HDFS Commands, HDFS File Permissions, Options for Data Input, The Hadoop Client, Web HDFS, A Flume Example
      • Overview of Sqoop, The Sqoop Import Tool, Importing a Table, Importing Specific Columns, Importing from a Query, The Sqoop Export Tool, Exporting to a Table
      • Lab: Importing RDBMS Data into HDFS
      • Lab: Exporting HDFS Data to an RDBMS

Module 4

The MapReduce Framework

      • Understanding MapReduce, The Key/Value Pairs of MapReduce, WordCount in MapReduce
      • Demo: Understanding MapReduce
      • Lab: Running a MapReduce Job
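
As a rough preview of the WordCount topic above, the sketch below imitates the map, shuffle, and reduce phases in plain Python on a single machine. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration, and a real job would run the mapper and reducer across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) key/value pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog"]))
```

Each mapper call sees only one line and each reducer call sees only one key, which is what lets the real framework spread the work across many nodes.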

Module 5

Introduction to Pig

      • About Pig, Pig Latin
      • The Grunt Shell
      • Demo: Understanding Pig
      • Pig Latin Relation Names
      • Pig Latin Field Names & Data Types
      • Pig Complex Types
      • Defining a Schema
      • Lab: Getting Started with Pig
      • The GROUP Operator, GROUP ALL, Relations without a Schema, The FOREACH…GENERATE Operator, Specifying Ranges in FOREACH, Field Names in FOREACH, FOREACH with Groups, The FILTER Operator, The LIMIT Operator
      • Lab: Exploring Data with Pig

Module 6

Advanced Pig Programming

      • The ORDER BY Operator, The CASE Operator, Parameter Substitution, DISTINCT, PARALLEL, The FLATTEN Operator, Performing Inner and Outer Joins, Invoking a UDF, Tips for Optimizing Pig Scripts
      • Lab: Joining Datasets
      • Preparing Data for Hive

Module 7

Hive Programming

      • About Hive, Comparing Hive to SQL, Hive Architecture, Submitting Hive Queries, Defining a Hive-Managed Table, Defining an External Table, Defining a Table LOCATION, Loading Data into Hive, Performing Queries
      • Understanding Hive Tables, Hive Partitions, Hive Buckets, Skewed Tables, Demo: Understanding Partitions and Skew, Using Distribute By, Storing Results to a File, Specifying MapReduce Properties
      • Lab: Analyzing Big Data with Hive
      • Lab: Understanding MapReduce in Hive
      • Hive Join Strategies, Shuffle Joins, Map (Broadcast) Joins, Sort-Merge-Bucket Joins, Invoking a Hive UDF, Computing ngrams in Hive
      • Demo: Computing ngrams

Module 8

Using HCatalog

      • About HCatalog, HCatalog in the Ecosystem
      • Defining a New Schema
      • Using HCatLoader with Pig
      • Using HCatStorer with Pig, The Pig SQL Command
      • Lab: Using HCatalog with Pig

Module 9

Advanced Hive Programming

      • Performing a Multi-Table/File Insert
      • Understanding Views, Defining Views, Using Views, The TRANSFORM Clause, The OVER Clause, Using Windows, Hive Analytics Functions
      • Lab: Advanced Hive Programming
      • Hive File Formats, Hive SerDes, Hive ORC Files, Computing Table Statistics, Hive Cost-Based Optimization (CBO), Using Hive CBO, Vectorization, Using HiveServer2, Understanding Hive on Tez, Using Tez for Hive Queries
      • Demo: Hive Optimizations
      • Hive Optimization Tips, Hive Query Tuning
      • Lab: Streaming Data with Hive and Python

Module 10

Hadoop 2 and YARN

      • About HDFS Federation, Multiple Federated NameNodes, Multiple Namespaces
      • Overview of HDFS HA, Quorum Journal Manager, Configuring Automatic Failover
      • About YARN, Open-source YARN Use Cases
      • The Components of YARN
      • Lifecycle of a YARN Application
      • A Cluster View Example

Module 11

Defining Workflow with Oozie

      • Submitting a Workflow Job, Fork and Join Nodes
      • Defining an Oozie Coordinator Job
      • Schedule a Job Based on Time
      • Schedule Based on Data Availability
      • Lab: Defining an Oozie Workflow
