UNIT 1:
Data Science – Fundamentals and Components
Terminologies Used in Big Data Environments
Introduction to Big Data – Characteristics of Data
Big Data Analytics – Classification of Analytics
Challenges Facing Big Data – Importance of Big Data Analytics
UNIT 2:
Introduction to Essential Data Science Packages
NumPy – Creation of Arrays, Indexing and Slicing Operations
SciPy – Manipulation of Mathematical Functions Using the Special Package
Statsmodels and Pandas Packages
UNIT 3:
Introducing Hadoop – Hadoop Overview
HDFS (Hadoop Distributed File System)
Components and Block Replication
Processing Data with Hadoop
Introduction to MapReduce
UNIT 4:
Data Munging: Introduction to Data Munging
Data Pipeline and Machine Learning in Python
Data Visualization Using Matplotlib
Interactive Visualization
UNIT 5:
Key-Value Store, Document Store, Column Family, Graph Store
Sharding – Types of Sharding
Introduction to Hive – Hive Architecture – Hive Query Language