080-42091111, +91-8892499499

Hadoop Development


Trainers are certified, real-time working professionals

Main focus on hands-on sessions

Course aligned to Cloudera Certification

Flexible timings for working people

Real-time projects on Hadoop

Guidance in Resume Preparation

100% placement assistance

Post training support

Life validity for re-attending classes

Regular Batch (45-50 days): MON-FRI (7-9 AM) & alternate weekend classes (2-6 PM)
Weekend Batch (2 months): SAT & SUN (2-6 PM)
Course Fee: Rs. 16,000/-
New Regular Batch starts on:
Free Demo Session scheduled on:
Ph: 8892499499 | Web: www.dvstechnologies.in | Mail: dvs.training@gmail.com


Big Data Job Roles

Data Scientist
Big Data Visualizer
Big Data Research Analyst
Big Data Engineer
Big Data Architect
Big Data Analyst

Introduction to Big Data

  1. What is Big Data?
  2. Examples of Big Data
  3. Reasons for Big Data generation
  4. Why Big Data deserves your attention
  5. Use cases of Big Data
  6. Different options for analyzing Big Data

Introduction to Hadoop & HDFS

  1. What is Hadoop?
  2. History of Hadoop
  3. Problems with Traditional Large-Scale Systems and the Need for Hadoop
  4. Understanding Hadoop Architecture
  5. Fundamentals of HDFS (Blocks, Name Node, Data Node, Secondary Name Node)
  6. Block Placement & Rack Awareness
  7. HDFS Read/Write/Delete
  8. Under/Over/Missing Replication
  9. Types of Scaling (Horizontal/Vertical)
  10. Drawbacks of Hadoop 1.x
  11. Introduction to Hadoop 2.x
  12. HDFS Federation and High Availability

Hadoop Installation & Cluster Setup

  1. Setting up a single-node Hadoop cluster (pseudo-distributed mode)
  2. Understanding Hadoop configuration files
  3. Hadoop Components - HDFS, MapReduce
  4. Overview of Hadoop Processes
  5. Overview of the Hadoop Distributed File System
  6. The Building Blocks of Hadoop
  7. Hands-On Exercise: Using HDFS Commands
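
A few of the HDFS shell commands typically covered in such a hands-on session. This is only a sketch: it assumes a running (pseudo-distributed) cluster, and the paths and file names are illustrative, not from the course material.

```shell
# List the contents of an HDFS directory
hdfs dfs -ls /user/hadoop

# Copy a local file into HDFS (sample.txt is an assumed example file)
hdfs dfs -put sample.txt /user/hadoop/sample.txt

# Display the file's contents
hdfs dfs -cat /user/hadoop/sample.txt

# Check block and replication information for the file
hdfs fsck /user/hadoop/sample.txt -files -blocks

# Remove the file
hdfs dfs -rm /user/hadoop/sample.txt
```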

MapReduce

  1. Understanding MapReduce
  2. Job Tracker and Task Tracker
  3. Architecture of MapReduce
  4. Data Flow of MapReduce
  5. Hadoop Writable, Comparable & comparison with Java data types
  6. Creation of local files and directories with the Hadoop API
  7. Creation of HDFS files and directories with the Hadoop API
  8. Map Function & Reduce Function
  9. How MapReduce Works
  10. Anatomy of a MapReduce Job
  11. Submission & Initialization of a MapReduce Job
  12. Monitoring & Progress of a MapReduce Job
  13. Understanding the Difference Between a Block and an Input Split
  14. Role of the Record Reader, Shuffler and Sorter
  15. File Input Formats
  16. Getting Started with the Eclipse IDE
  17. Setting up the Eclipse Development Environment
  18. Creating MapReduce Projects
  19. Configuring the Hadoop API in the Eclipse IDE
  20. Differences Between the Hadoop Old and New APIs
  21. Life Cycle of a Job
  22. Identity Reducer
  23. MapReduce Program Flow with Word Count
  24. Combiner & Partitioner, Custom Partitioner with Example
  25. Joining Multiple Datasets in MapReduce
  26. Map-Side and Reduce-Side Joins with Examples
  27. Distributed Cache with a Practical Example
  28. Stragglers & Speculative Execution
  29. Schedulers (FIFO Scheduler, Fair Scheduler, Capacity Scheduler)
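
The map/shuffle/reduce data flow covered above can be sketched as a single-JVM word-count toy. This is plain Java with no Hadoop dependency: a simulation of the three phases for intuition, not the real Hadoop Mapper/Reducer API used in class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy simulation of the MapReduce word-count flow on a single JVM:
// map emits (word, 1) pairs, shuffle groups them by key, reduce sums each group.
public class WordCountSketch {

    // Mapper: split a line into tokens and emit (token, 1) for each.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) pairs.add(new SimpleEntry<>(token, 1));
        }
        return pairs;
    }

    // Shuffle: group all values emitted for the same key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reducer: sum the grouped counts for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(reduce(shuffle(map("the quick brown fox jumps over the lazy dog"))));
    }
}
```

In real Hadoop the shuffle also sorts keys and partitions them across reducers; here a single HashMap stands in for both steps.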

YARN

  1. Limitations of the MapReduce 1.x Architecture
  2. YARN Architecture
  3. Application Master, Node Manager & Resource Manager
  4. Developing a MapReduce Application Using YARN

Apache Hive

  1. Introduction to Apache Hive
  2. Architecture of Hive
  3. Installing Hive
  4. Hive Data Types
  5. Exploring Hive Metastore Tables
  6. Types of Tables in Hive
  7. Partitions (Static & Dynamic)
  8. Buckets & Sampling
  9. Indexes & Views
  10. Developing Hive Scripts
  11. Parameter Substitution
  12. Difference Between ORDER BY & SORT BY, CLUSTER BY & DISTRIBUTE BY
  13. Different Compression Formats in Hive
  14. File Input Formats (Text File, RCFile, ORC, SequenceFile, Parquet)
  15. Optimization Techniques in Hive
  16. Impala & Hue (Cloudera Web UI)
  17. Creating UDFs
  18. Hands-On Exercise
  19. Assignment on Hive
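
Static vs. dynamic partitions, as covered above, can be sketched in HiveQL. The table and column names here are assumed examples, not from the course material.

```sql
-- A partitioned, ORC-backed table (names are illustrative)
CREATE TABLE sales (
  order_id INT,
  amount   DOUBLE
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

-- Static partition: the partition value is stated explicitly in the INSERT
INSERT INTO TABLE sales PARTITION (sale_date = '2016-01-01')
SELECT order_id, amount FROM staging_sales WHERE sale_date = '2016-01-01';

-- Dynamic partition: Hive derives partition values from the last SELECT column
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT INTO TABLE sales PARTITION (sale_date)
SELECT order_id, amount, sale_date FROM staging_sales;
```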

Apache Pig

  1. Introduction to Apache Pig
  2. Building Blocks (Bag, Tuple & Field)
  3. Installing Pig
  4. Pig Terminology & Data Types
  5. Different Modes of Executing Pig
  6. Working with Various Pig Commands, Covering the Built-in Functions in Pig
  7. Developing Pig Scripts
  8. Parameter Substitution
    1. Command-line arguments
    2. Passing parameters through a param file
  9. Joins (Left Outer, Right Outer, Full Outer)
  10. Nested Queries
  11. Specialized Joins in Pig (Replicated, Skewed & Merge Join)
  12. HCatalog (Moving Data Between Hive and Pig)
  13. Working with Unstructured Data
  14. Working with Semi-Structured Data like XML, JSON
  15. Optimization Techniques
  16. Creating UDFs
  17. Hands-On Exercise
  18. Assignment on Pig
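
A small Pig Latin sketch of the load/filter/group/join pattern taught above. File names, schemas, and thresholds are assumed examples.

```pig
-- Assumed input: orders.csv with (id, customer, amount)
orders = LOAD 'orders.csv' USING PigStorage(',')
         AS (id:int, customer:chararray, amount:double);

-- Filter, then group and aggregate
big_orders   = FILTER orders BY amount > 100.0;
by_customer  = GROUP big_orders BY customer;
totals       = FOREACH by_customer GENERATE group AS customer,
               SUM(big_orders.amount) AS total;

-- A replicated (map-side) join against a small lookup relation
regions = LOAD 'regions.csv' USING PigStorage(',')
          AS (customer:chararray, region:chararray);
joined  = JOIN totals BY customer, regions BY customer USING 'replicated';

DUMP joined;
```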

Apache Sqoop

  1. Introduction to Sqoop & Its Architecture
  2. Importing Data from an RDBMS to HDFS
  3. Importing Data from an RDBMS to Hive
  4. Exporting Data from Hive to an RDBMS
  5. Handling Incremental Loads Using Sqoop
  6. Hands-On Exercise
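
The import, Hive import, incremental load, and export cases above look roughly like this on the command line. The JDBC URL, credentials, and table names are assumed examples, and a reachable database and cluster are required.

```shell
# Plain import from an RDBMS table into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hadoop/orders

# Import directly into a Hive table
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username dbuser -P \
  --table orders \
  --hive-import --hive-table orders

# Incremental append: only rows whose id exceeds the last recorded value
sqoop import \
  --connect jdbc:mysql://dbhost/shop \
  --username dbuser -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append --check-column id --last-value 1000

# Export warehouse data back to the RDBMS
sqoop export \
  --connect jdbc:mysql://dbhost/shop \
  --username dbuser -P \
  --table order_totals \
  --export-dir /user/hive/warehouse/order_totals
```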

Apache HBase

  1. Introduction to HBase
  2. Exploring the HBase Master & Region Server
  3. Exploring ZooKeeper
  4. CRUD Operations in HBase with Examples
  5. Hive Integration with HBase (HBase-Managed Hive Tables)
  6. Hands-On Exercise on HBase
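
The CRUD operations above map onto HBase shell commands roughly as follows (run inside `hbase shell`; the table, row key, and column names are assumed examples):

```shell
create 'users', 'info'                    # Create a table with one column family
put 'users', 'u1', 'info:name', 'Asha'    # Create/update a cell
get 'users', 'u1'                         # Read one row
scan 'users'                              # Read all rows
delete 'users', 'u1', 'info:name'         # Delete a cell
disable 'users'                           # A table must be disabled before dropping
drop 'users'                              # Drop the table
```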

Apache Oozie

  1. What is Oozie & Why Oozie
  2. Features of Oozie
  3. Job Types in Oozie
  4. Control Nodes & Action Nodes
  5. Oozie Workflow Process Flow
  6. Oozie Parameterization
  7. Oozie Command-Line Examples - Developer
  8. Oozie Web Console
  9. Hands-On Exercise on Oozie
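
Control nodes (start, kill, end), an action node, and parameterization come together in a workflow definition. A minimal sketch of a workflow.xml with a single Hive action — the app name, script path, and properties are illustrative, not from the course material:

```xml
<workflow-app name="daily-report" xmlns="uri:oozie:workflow:0.5">
  <start to="run-hive"/>
  <action name="run-hive">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>report.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The `${jobTracker}` and `${nameNode}` placeholders are resolved from a job.properties file at submission time, which is what the parameterization topic above covers.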

Real-Time Project

We will provide the raw data and requirements for the project, and you will work on it yourselves. Finally, there will be one project-execution session where we explain the steps for executing the project, along with the code.
Workshops on Spark & Scala are held once every two months; interested participants can attend.