UNIT 1:
Introduction to Big Data Platform
Challenges of conventional systems
Statistical concepts: Sampling distributions
Resampling, statistical inference
Web data – Evolution of Analytic scalability, analytic processes and tools
Modern data analytic tools
UNIT 2:
Regression modeling, Multivariate analysis
Bayesian modeling, inference and Bayesian networks
Support vector and kernel methods
Fuzzy logic: extracting fuzzy models from data
Stochastic search methods
Neural networks: learning and generalization
Analysis of time series: linear systems analysis
Principal component analysis and neural networks
UNIT 3:
Filtering streams – Counting distinct elements in a stream
Introduction to Streams Concepts – Stream data model and architecture
Stream Computing, Sampling data in a stream
Counting ones in a window
Real-Time Analytics Platform (RTAP) applications
Case studies – real-time sentiment analysis
UNIT 4:
Market basket model – Apriori Algorithm
Handling large data sets in Main memory
Clustering Techniques – Hierarchical – K-Means
Counting frequent itemsets in a stream
Clustering high dimensional data – CLIQUE and PROCLUS
Frequent pattern based clustering methods
Clustering in non-Euclidean space – Clustering for streams and Parallelism
UNIT 5:
S3 – Hadoop Distributed File System
Visual data analysis techniques, interaction techniques