A Level: Big Data Analytics Using Hadoop (A9.1-R5, NIELIT / DOEACC)



    The purpose of this module is to give students the skills to analyze and process large volumes of data using appropriate tools and techniques. It provides the theoretical background as well as in-depth knowledge of the software packages used to analyze voluminous data.


    After completing the module, the incumbent will be able to:

    • Collect and combine data obtained from different sources and in different formats into a uniform format that helps in analyzing the data.
    • Understand the basics of data and databases, the need to analyze data, how to analyze data using mathematical and statistical techniques, and how to represent data in tabular and graphical form.
    • Understand the concept and usefulness of a cluster environment for processing voluminous data.
    • Analyze data using the Hadoop framework and its sub-project HIVE.

    120 Hours – (Theory: 48 hrs + Practical: 72 hrs)

    Detailed Syllabus

    (i) Analyze and Define Business Requirement

    Introduction to Business Intelligence, Business Analytics, Data, Information, how information hierarchy can be improved/introduced, understanding Business Analytics, Introduction to OLAP, OLTP, data mining, and data warehouse. Difference between OLAP and OLTP.
    Introduction to a database, characteristics of data in a database, DBMS, advantages of DBMS, file-oriented approach versus Database-oriented approach to Data Management, disadvantages of the file-oriented approach. A brief overview of the relational model. Definition of relation, properties of the relational model, Concept of keys: candidate key, primary key, alternate key, foreign key, Fundamental integrity rules: entity integrity, referential integrity. SQL statements: Insert, delete, update, and select. Join, union.

    (ii) Introduction to Operating System

    Introduction to the Ubuntu Operating System, managing files and folders through the command line and the desktop. Basic Ubuntu commands like ls, mkdir, clear, rm. Creating users and groups in Ubuntu. User privileges and roles (chown and chmod commands), gedit editor. Secure shell configuration, configuring .bashrc and environment files.

    (iii) Java Programming

    OOP Principles, an Overview of Java Object-Oriented Programming, Data Types, Variables, and Arrays, Operators: Arithmetic Operators, Bitwise Operators, Relational Operators, Boolean Logical Operators, Programming Constructs, Methods and Inheritance, the basic Java I/O Classes, and String Handling.
    Exception-Handling Fundamentals, Exception Types, Uncaught Exceptions, Using try and catch, Displaying a Description of an Exception, Multiple catch Clauses, Nested try Statements, throw, throws, finally, Java’s Built-in Exceptions. Packages, Access Protection, Importing Packages, and Interfaces.
    Java Swing and its controls like JTextField, JLabel, JComboBox, JTable, JButton, JScrollBar, JOptionPane, and JMenu. Java Database Connectivity (JDBC): JDBC-ODBC Bridge, JDBC Drivers, Creating a DSN, DriverManager, Connection, Statement, ResultSet. Connecting Java with a database.
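
    As an illustration of the exception-handling topics listed in this unit (try and catch, multiple catch clauses, finally, throw, and throws), a minimal sketch follows. The class name ExceptionSketch and its methods divide and checkAge are hypothetical names chosen for this example only; they are not part of the syllabus.

```java
public class ExceptionSketch {

    // Parses two strings as integers and divides them.
    // Shows multiple catch clauses and a finally block that always runs.
    static String divide(String a, String b) {
        StringBuilder log = new StringBuilder();
        try {
            int x = Integer.parseInt(a);   // may throw NumberFormatException
            int y = Integer.parseInt(b);
            log.append("result=").append(x / y);   // may throw ArithmeticException
        } catch (NumberFormatException e) {
            log.append("bad number");
        } catch (ArithmeticException e) {
            // getMessage() gives a description of the exception
            log.append("arithmetic error: ").append(e.getMessage());
        } finally {
            log.append("; done");
        }
        return log.toString();
    }

    // The throws clause declares a checked exception; throw raises it.
    static void checkAge(int age) throws Exception {
        if (age < 0) {
            throw new Exception("age cannot be negative: " + age);
        }
    }

    public static void main(String[] args) {
        System.out.println(divide("10", "2"));   // result=5; done
        System.out.println(divide("10", "0"));   // arithmetic error: / by zero; done
        System.out.println(divide("x", "2"));    // bad number; done
        try {
            checkAge(-1);
        } catch (Exception e) {
            System.out.println(e.getMessage());  // age cannot be negative: -1
        }
    }
}
```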

    (iv) Hadoop Framework and Map-Reduce Programming Technique

    Big Data Concepts, Need for analyzing Big Data, its roles in Business Intelligence, and decision making.
    Big Data, Hadoop Architecture, Hadoop ecosystem components, storage, Hadoop Distributed File System (HDFS), single-node installation, multi-node installation. Cluster architecture, cluster configuration files, Hadoop commands, Hadoop server roles: NameNode, Secondary NameNode, DataNode; file write and read.
    Shell commands, accessing files on HDFS and the local machine, Map-Reduce framework, developing Map-Reduce programs, the structure of a Map-Reduce program.
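
    The structure of a Map-Reduce program can be sketched in plain Java before moving to a cluster. The example below is only an in-memory imitation of the map, shuffle, and reduce phases for a word count. It does not use the Hadoop API, and the class name WordCountSketch and its method names are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: runs map over every line, then shuffle, then reduce per key.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            counts.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data big cluster", "data node");
        System.out.println(wordCount(lines));
        // prints {big=2, cluster=1, data=2, node=1}
    }
}
```

    In a real Hadoop job the map and reduce steps would be written as Mapper and Reducer classes and the framework would perform the shuffle across nodes; here everything runs in one process to keep the structure visible.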

    (v) Analysing Data Using HIVE

    Introduction to HIVE, installing HIVE, Data types, HIVE shell, HIVE commands, HIVE SQL, creating databases and tables, bulk loading of data, SQL DML statements, SQL Join, HIVE Functions, Complex Data types, UDFs in HIVE using Java.

    (vi) Basics of R Programming and RHIVE

    R Overview, Basic Syntax, Data types, R Control constructs like loop and conditional, R Function. Connecting R with Hive.

    (vii) HIVE JDBC Connectivity

    Starting HIVE in client-server mode, beeline, mapping HIVE datatype with Java datatypes, Connecting Java with HIVE. Integrating Java Swing, HIVE, and JDBC for developing front end applications.

    (viii) Introduction to HBase, PIG and JAQL

    HBase introduction, integration with Hadoop, HBase Shell, introduction to JAQL data model, JAQL shell, introduction to JSON files, and accessing JSON files through JAQL. Introduction to PIG.
