Hadoop is one of the most important frameworks for working with Big Data in a distributed environment. Hadoop administrators maintain and troubleshoot Hadoop clusters in production and development environments. In this training, trainees will learn about Hadoop clusters, including planning, deployment, monitoring, performance tuning, security using Kerberos, HDFS high availability, and HCatalog/Hive administration. This course covers the fundamental concepts of Apache Hadoop and Hadoop cluster administration.
Data has become an integral part of every organization, small or large, and maintaining it in a usable form has become difficult. Hadoop is a revolutionary open-source software framework that takes data storage and processing to the next level: it structures data at scale and solves formatting problems for subsequent analytics. Hadoop administration is one of the specialization areas of the Hadoop framework, covering Hadoop installation, Hadoop security, setting up Hadoop clusters, managing log files, and designing, testing, and building Hadoop environments.
After completing this course, trainees will:
- Understand how Hadoop solves Big Data problems, and learn the Hadoop cluster architecture, its core components, and its ecosystem
- Know the different Hadoop components and understand the working of HDFS, Hadoop cluster modes, and configuration files
- Gain expertise in Hadoop 1.0 cluster setup and configuration, set up Hadoop clients on Hadoop 1.0, and resolve problems simulated from real-time environments
- Work with the Secondary NameNode and distributed Hadoop clusters, enable rack awareness, use the maintenance mode of a Hadoop cluster, and add or remove cluster nodes ad hoc
- Learn day-to-day cluster administration tasks: balancing data in the cluster, protecting data by enabling trash, attempting a manual failover, creating backups within or across clusters, safeguarding metadata, and performing metadata recovery or manual NameNode failover
- Be able to plan a cluster: cluster sizing; hardware, network, and software considerations; popular Hadoop distributions; workload and usage patterns; and industry recommendations for a Hadoop 2.0 environment
Prepare for Certification
Our training and certification program gives you a solid understanding of the key topics covered by the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam. In addition to boosting your income potential, getting certified in Hadoop administration demonstrates your knowledge of the skills necessary to be an effective Hadoop professional. The certification validates your ability to produce reliable, high-quality results with increased efficiency and consistency.
Unit 1: What is Big Data
- Need for a different technique for Data Storage
- Need for a different paradigm for Data Analysis
- The 3 V’s of Big Data
- Different distributions of Hadoop
Unit 2: The Case for Apache Hadoop
- A Brief History of Hadoop
- Core Hadoop Components
- Fundamental Concepts
- Hadoop Eco-Systems – Overview
Unit 3: The Hadoop Distributed File System
- HDFS Features
- HDFS Design Assumptions
- Overview of HDFS Architecture
- Writing and Reading Files
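As a preview of the HDFS storage model covered in this unit, here is a minimal Python sketch of how a file is divided into blocks and replicated across DataNodes. The 128 MB block size and replication factor of 3 are the common defaults, not values specific to any cluster:

```python
# Sketch of how HDFS splits a file into blocks and replicates them.
# Assumes the common defaults: 128 MB block size, replication factor 3.
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the usual HDFS default
REPLICATION = 3                  # default replication factor

def hdfs_block_count(file_size_bytes):
    """Number of HDFS blocks needed to store a file of the given size."""
    return max(1, math.ceil(file_size_bytes / BLOCK_SIZE))

def raw_storage_used(file_size_bytes):
    """Total bytes consumed across the cluster once fully replicated."""
    return file_size_bytes * REPLICATION

# A 1 GB file occupies 8 blocks, each stored on 3 different DataNodes.
one_gb = 1024 * 1024 * 1024
print(hdfs_block_count(one_gb))   # 8 blocks
print(raw_storage_used(one_gb))   # 3 GB of raw disk across the cluster
```

This is why cluster sizing must account for replication: every gigabyte of user data consumes roughly three gigabytes of raw disk.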
Unit 4: MapReduce
- What Is MapReduce?
- Features of MapReduce
- Basic MapReduce Concepts
- Architectural Overview
- What is a Combiner?
- What is a Partitioner?
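The concepts in this unit can be sketched in a single-process Python program: map, combine, partition, and reduce on the classic WordCount problem. This is purely illustrative, not Hadoop's actual implementation; the combiner pre-aggregates locally, and the partitioner mimics the idea of Hadoop's default hash partitioning:

```python
# A minimal, single-process sketch of the MapReduce flow:
# map -> combine -> partition -> reduce. For illustration only.
from collections import defaultdict

def mapper(line):
    # Emit (word, 1) for every word, like a WordCount mapper.
    for word in line.split():
        yield word.lower(), 1

def combiner(pairs):
    # Pre-aggregate mapper output locally to cut shuffle traffic.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return counts.items()

def partitioner(word, num_reducers):
    # Route each key to one reducer, mimicking hash partitioning.
    return hash(word) % num_reducers

def run(lines, num_reducers=2):
    # "Shuffle": group combined pairs by partition and key.
    partitions = [defaultdict(int) for _ in range(num_reducers)]
    for line in lines:
        for word, n in combiner(mapper(line)):
            partitions[partitioner(word, num_reducers)][word] += n
    # "Reduce": merge the per-partition counts into the final result.
    result = {}
    for p in partitions:
        result.update(p)
    return result

print(run(["the cat sat", "the dog sat"]))
```

Running this yields the combined word counts; in a real job, each partition's counts would be produced by a separate reducer task.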
Unit 5: An Overview of the Hadoop Ecosystem
- What is the Hadoop Ecosystem?
- Integration Tools
- Analysis Tools
- Data Storage and Retrieval Tools
Unit 6: Planning your Hadoop Cluster
- General planning Considerations
- Choosing the Right Hardware
- Network Considerations
- Configuring Nodes
Unit 7: Hadoop Installation
- Deployment Types
- Installing Hadoop
- Basic Configuration Parameters
- Hands-On Exercise on a Pseudo – Cluster
- Hands-On Exercise on a Multi-Node Cluster
Unit 8: Advanced Configuration
- Advanced Parameters
- core-site.xml parameters
- mapred-site.xml parameters
- hdfs-site.xml parameters
- Configuring Rack Awareness
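To give a feel for the parameters covered in this unit, here are illustrative configuration fragments using Hadoop 2.x property names. The hostname, paths, and values are placeholders, not recommendations (Hadoop 1.x uses different names for some properties, e.g. `fs.default.name`):

```xml
<!-- Illustrative fragments only; hostname, paths, and values are placeholders. -->

<!-- core-site.xml: where clients find the NameNode -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>

<!-- hdfs-site.xml: block size and replication -->
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>  <!-- 128 MB -->
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>

<!-- rack awareness: a script that maps a host or IP to a rack ID -->
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```

With rack awareness configured, HDFS places replicas across racks so that the loss of a whole rack does not lose all copies of a block.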
Unit 9: Hadoop Security
- Why Hadoop Security Is Important
- Hadoop’s Security System Concepts
- What Kerberos Is and How it Works
- Integrating a Secure Cluster with Other Systems
Unit 10: Managing and Scheduling Jobs
- Managing Running Jobs
- The FIFO Scheduler
- The Fair Scheduler
- The Capacity Scheduler
- Configuring the Fair Scheduler
- Evaluating the different schedulers
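The difference between the FIFO and Fair schedulers, evaluated in this unit, can be sketched conceptually in Python. This is not Hadoop's scheduler code, just the core idea of each policy applied to toy job records:

```python
# Conceptual sketch of FIFO vs. fair scheduling (not actual Hadoop code).
# Each job is a dict: name, submission time, and tasks already serviced.

def fifo_next(jobs):
    # FIFO: the earliest-submitted job with work left runs next,
    # so a large early job can monopolize the cluster.
    return min(jobs, key=lambda j: j["submitted"])["name"]

def fair_next(jobs):
    # Fair: the job that has received the least service runs next,
    # so a small late-arriving job is not starved by a big early one.
    return min(jobs, key=lambda j: j["tasks_done"])["name"]

jobs = [
    {"name": "big-etl",     "submitted": 0, "tasks_done": 500},
    {"name": "small-adhoc", "submitted": 5, "tasks_done": 2},
]
print(fifo_next(jobs))  # big-etl
print(fair_next(jobs))  # small-adhoc
```

This is the trade-off the unit explores: FIFO is simple and predictable, while fair and capacity scheduling protect short interactive jobs on shared clusters.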
Unit 11: Cluster Maintenance
- Checking HDFS Status
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- Name Node Metadata Backup
- Cluster Upgrading
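The idea behind rebalancing, one of the maintenance tasks above, can be sketched as follows: the HDFS balancer moves blocks from DataNodes whose utilization is above the cluster average by more than a threshold to nodes below it. The node names and utilization figures here are made up for illustration:

```python
# Sketch of the decision at the heart of HDFS rebalancing: which
# DataNodes are over- or under-utilized relative to the cluster average.

def over_and_under_utilized(nodes, threshold=0.10):
    """Split nodes into candidates to move blocks from / to.

    nodes: dict of node name -> utilization fraction (used / capacity).
    threshold: allowed deviation from the cluster average, as a fraction
    (0.10 corresponds to the common `hdfs balancer -threshold 10`).
    """
    avg = sum(nodes.values()) / len(nodes)
    over = [n for n, u in nodes.items() if u > avg + threshold]
    under = [n for n, u in nodes.items() if u < avg - threshold]
    return over, under

# Cluster average here is 55%; dn1 is over-utilized, dn3 under-utilized.
nodes = {"dn1": 0.90, "dn2": 0.55, "dn3": 0.20}
print(over_and_under_utilized(nodes))  # (['dn1'], ['dn3'])
```

Newly added nodes start empty, which is why rebalancing is typically run after adding cluster nodes.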
Unit 12: Cluster Monitoring and Troubleshooting
- General System Monitoring
- Managing Hadoop’s Log Files
- Using the Name Node and Job Tracker Web UIs
- Cluster Monitoring with Ganglia
- Common Troubleshooting Issues
- Benchmarking Your Cluster
Unit 13: Installing and Managing Other Hadoop Projects
“The course provides adequate knowledge and information on business intelligence which helps to improve business efficiency and management” – Mehul Thakkar
“This course is best for any business enthusiast, the course explains in detail data reporting and warehousing methods” – Ankit Doshi