Course Outline
Section 1: Introduction to Hadoop
- Hadoop history, concepts
- ecosystem
- distributions
- high level architecture
- Hadoop myths
- Hadoop challenges
- hardware / software
- lab : first look at Hadoop
Section 2: HDFS
- Design and architecture
- concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons : NameNode, Secondary NameNode, DataNode
- communications / heartbeats
- data integrity
- read / write path
- NameNode High Availability (HA), Federation
- labs : Interacting with HDFS
Section 3: MapReduce
- concepts and architecture
- daemons (MRv1) : JobTracker / TaskTracker
- phases : driver, mapper, shuffle/sort, reducer
- MapReduce Version 1 and Version 2 (YARN)
- Internals of MapReduce
- Introduction to writing a MapReduce program in Java
- labs : Running a sample MapReduce program
Section 4: Pig
- Pig vs Java MapReduce
- Pig job flow
- Pig Latin language
- ETL with Pig
- Transformations & Joins
- User defined functions (UDF)
- labs : writing Pig scripts to analyze data
Section 5: Hive
- architecture and design
- data types
- SQL support in Hive
- Creating Hive tables and querying
- partitions
- joins
- text processing
- labs : various labs on processing data with Hive
Section 6: HBase
- concepts and architecture
- HBase vs RDBMS vs Cassandra
- HBase Java API
- Time series data on HBase
- schema design
- labs : Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
Requirements
- comfortable with the Java programming language (most programming exercises are in Java)
- comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi / nano)
Lab environment
Zero Install : There is no need to install Hadoop software on students’ machines! A working Hadoop cluster will be provided for students.
Students will need the following:
- an SSH client (Linux and Mac already have SSH clients; for Windows, PuTTY is recommended)
- a browser to access the cluster. We recommend Firefox.
Testimonials (6)
Trainer's preparation & organization, and quality of materials provided on github.
Mateusz Rek - MicroStrategy Poland Sp. z o.o.
Course - Impala for Business Intelligence
I thought he did a great job of tailoring the experience to the audience. This class is mostly designed to cover data analysis with Hive, but my co-worker and I are doing Hive administration with no real data analytics responsibilities.
Ian Reif - Franchise Tax Board
Course - Data Analysis with Hive/HiveQL
I genuinely enjoyed the many hands-on sessions.
Jacek Pieczątka
Course - Administrator Training for Apache Hadoop
I liked the VM very much. The teacher was very knowledgeable about the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Course - Big Data Analytics in Health
The fact that all the data and software were ready to use on an already prepared VM, provided by the trainer on external disks.
vyzVoice
Course - Hadoop for Developers and Administrators
The practical, hands-on work; the theory was also presented well by Ajay.