
Best Big Data Training Institute in Dilsukhnagar

Big Data Course Content

Module 1

  • Big Data Introduction and Hadoop Fundamentals
  • Data Storage & Analysis
  • Comparison with RDBMS
  • HDFS ARCHITECTURE
  • Basic Terminologies
  • HDFS Block Concepts
  • Replication Concepts
  • Basic reading & writing of files in HDFS
  • Basic processing concepts in MapReduce
  • Data Flow
  • Anatomy of file READ and WRITE

Module 2

  • HADOOP ADMINISTRATOR
  • HADOOP GEN 1 VS HADOOP GEN 2 (YARN)
  • Linux commands
  • Single and Multinode cluster installation (HADOOP Gen 2)
  • AWS (EC2, RDS, S3, IAM and CloudFormation)
  • Cloudera and Hortonworks distribution installation on AWS
  • Cloudera Manager and Ambari
  • Hadoop security; commissioning and decommissioning of nodes
  • Sizing of a Hadoop cluster and NameNode High Availability

Module 3

  • DATA INGESTION
  • Sqoop:
  • Migration of data from MySQL/Oracle to HDFS.
  • Creating a Sqoop job.
  • Scheduling and monitoring Sqoop jobs using Oozie and crontab.
  • Incremental imports in Sqoop (append and lastmodified modes).
  • Talend:
  • Installation of Talend Big Data Studio on Windows Server.
  • Creating and scheduling Talend jobs.
  • Components: tMap, tMSSqlInput, tMSSqlOutput, tFileInputDelimited, tFileOutputDelimited, tMSSqlOutputBulkExec, tUniqRow, tFlowToIterate, tIterateToFlow, tLogCatcher, tFlowMeterCatcher, tFileList, tAggregateRow, tSortRow, tHDFSInput, tHDFSOutput, tFilterRow, tHiveLoad.
  • Flume:
  • Flume Architecture
  • Data ingestion into HDFS with Flume
  • Flume Sources
  • Flume Sinks
  • Topology Design Considerations

Module 4

  • DATA PROCESSING
  • MapReduce:
  • Env Setup
  • Tool and ToolRunner
  • Mapper
  • Reducer
  • Driver program
  • How to package the job?
  • MapReduce WebUI
  • How does a MapReduce job run?
  • Shuffle & Sort
  • Speculative Execution
  • InputFormats
  • Input Splits and Record Reader
  • Default Input Formats
  • Implement Custom Input Format
  • OutputFormats
  • Default Output formats
  • Output Record Writer
  • Compression
  • Map Output
  • Final Output
  • Data types – default
  • Writable vs WritableComparable
  • Custom Data types – Custom Writable/Comparable
  • File Based Data structures
  • Sequence file
  • Reading and Writing into Sequence file
  • Map File
  • Tuning MapReduce Jobs
  • Advanced MapReduce
  • Sorting
  • Partial Sort
  • Total Sort
  • Secondary Sort
  • Joins
  • Hive:
  • Comparison with RDBMS
  • HQL
  • Data types
  • Tables
  • Importing and Exporting
  • Partitioning and Bucketing – Advanced.
  • Joins and Join Optimization.
  • Functions – Built-in & User Defined
  • Advanced Optimization of HQL
  • Storage File Formats – Advanced
  • Loading and Storing Data
  • SerDes – Advanced
  • Pig:
  • Important basics
  • Pig Latin
  • Data types
  • Functions – Built-in, User Defined
  • Loading and Storing Data
  • Spark:
  • Spark introduction
  • Spark vs MapReduce
  • Intro to Spark libraries (Spark SQL, Spark Streaming, Spark Core)

Module 5

  • 1: An Introduction to Python
  • 1.1. Brief about the course
  • 1.2. History and timeline of Python
  • 1.3. What is Python?
  • 1.4. What can Python do?
  • 1.5. How Python got its name
  • 1.6. Why Python?
  • 1.7. Who is using Python
  • 1.8. Features of Python
  • 1.9. Python installation
  • 1.10. Hello world
  • 1. Using cmd
  • 2. IDLE
  • 3. By .py script
  • 4. Python command line
  • 2: Beginning Python Basics
  • 2.1. The print statement
  • 2.2. Comments
  • 2.3. Python Data Structures
  • 2.4. Variables & Data Types
  • 1. Rules for variables
  • 2. Declaring variables
  • 3. Assignment in variables
  • 4. Operations with variables
  • 5. Reserved keywords
  • 2.5. Operators in Python
  • 2.6. Simple Input & Output
  • 2.7. Examples for variables, data types, operators
  • 3: Python Program Flow
  • 3.1. Indentation
  • 3.2. The if statement and its related statements
  • 3.4. The while loop
  • 3.5. The for loop
  • 3.6. The range statement
  • 3.7. Break
  • 3.8. Continue
  • 3.9. pass
  • 3.10. Examples for looping
  • 4: Functions & Modules
  • 4.1. System-defined functions (number system and its system-defined functions, strings and their system-defined functions)
  • 4.2. Create your own functions (user-defined functions)
  • 4.3. Function Parameters
  • 4.4. Variable Arguments
  • 4.5. An Exercise with functions
  • 5: Exceptions
  • 5.1. Errors
  • 5.2. Exception Handling with try
  • 5.3. Handling Multiple Exceptions
  • 5.4. raise
  • 5.5. finally
  • 5.6. else
  • 6: File Handling
  • 6.1. File Handling Modes
  • 6.2. Reading Files
  • 6.3. Writing & Appending to Files
  • 6.4. Handling File Exceptions
  • 7: Data Structures and Data Structure Functions
  • 7.1. Lists and their system-defined functions
  • 7.2. Tuples and their system-defined functions
  • 7.3. Dictionaries and their system-defined functions
  • 7.4. Sets and their system-defined functions
  • 7.5. use cases and practical examples
  • 8: Casting
  • 8.1. Intro to casting

Module 6

  • NOSQL
  • Cassandra:
  • Cassandra cluster installation
  • Cassandra Architecture
  • cqlsh
  • Replication strategy
  • Tools: OpsCenter, nodetool and CCM
  • Cassandra use cases
  • Labs:
  • Real-time use cases and datasets covered (10+ real-time datasets)
  • Word count, weather sensor dataset, and analysis of social media datasets such as YouTube and Twitter

Register for Big Data training today! Questions? Contact us at +91 7995920133