UNIT 1:
Introduction to Big Data Platform
Evolution of Analytic Scalability,
Challenges of conventional systems - Web data
analytic processes and tools, Analysis vs reporting
Modern data analytic tools,
Statistical concepts: Sampling distributions
resampling, statistical inference
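Sampling distributions and resampling can be illustrated with the bootstrap. It resamples one observed sample with replacement many times, treats the resampled means as an approximation of the sampling distribution of the mean, and reads a percentile confidence interval off them. This is a minimal sketch: the function names, sample data and fixed seed are illustrative assumptions, not part of the syllabus.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Return the means of n_resamples bootstrap resamples of `sample`.

    Each resample draws len(sample) values *with replacement*, so the
    collected means approximate the sampling distribution of the mean.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def bootstrap_ci(sample, level=0.95, n_resamples=1000, seed=0):
    """Percentile bootstrap confidence interval for the mean (a simple variant)."""
    means = sorted(bootstrap_means(sample, n_resamples, seed))
    alpha = (1 - level) / 2
    lo = means[int(alpha * n_resamples)]
    hi = means[int((1 - alpha) * n_resamples) - 1]
    return lo, hi


if __name__ == "__main__":
    data = [12, 15, 9, 20, 14, 11, 18, 16, 13, 17]  # hypothetical sample
    print("sample mean:", statistics.mean(data))
    print("95% bootstrap CI:", bootstrap_ci(data))
```

The percentile interval is the simplest bootstrap interval; refinements such as bias-corrected intervals exist but are beyond this sketch.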
UNIT 2:
Regression modeling, Multivariate analysis
Bayesian modeling, inference and Bayesian networks,
Support vector and kernel methods
nonlinear dynamics - Rule induction
competitive learning, principal component analysis
data, fuzzy decision trees
Stochastic search methods.
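For regression modeling, the simplest case is ordinary least squares with one predictor: the slope is b1 = Sxy / Sxx, where Sxy = sum((x - x_bar)(y - y_bar)) and Sxx = sum((x - x_bar)^2), and the intercept is b0 = y_bar - b1 * x_bar. The minimal sketch below computes both plus the coefficient of determination R^2. It assumes plain Python lists of numbers, and the function names are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x.

    b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
    b0 = y_bar - b1 * x_bar
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def r_squared(xs, ys, b0, b1):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("y values must not all be equal")
    return 1 - ss_res / ss_tot
```

For example, fitting xs = [1, 2, 3, 4, 5] against ys = [3, 5, 7, 9, 11] recovers b0 = 1 and b1 = 2 with R^2 = 1, since the points lie exactly on a line.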
UNIT 3:
Introduction to Streams Concepts – Stream data model and architecture
Stream Computing, Sampling data in a stream
Filtering streams – Counting distinct elements in a stream
Counting ones in a window
Case studies - real-time sentiment analysis
stock market predictions.
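Sampling data in a stream is commonly done with reservoir sampling (Algorithm R). It keeps a uniform random sample of fixed size k without knowing the stream length in advance: the first k items fill the reservoir, and item i (0-based, i >= k) then replaces a random slot with probability k / (i + 1). This is one standard technique for the topic, shown as a minimal sketch; the function name and seed parameter are illustrative assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Algorithm R: the first k items fill the reservoir; after that, item i
    (0-based) replaces a random slot with probability k / (i + 1).
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:              # happens with probability k / (i + 1)
                reservoir[j] = item
    return reservoir
```

Only k items are held in memory at any time, and the stream is read once, which is the constraint the stream data model imposes.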
UNIT 4:
Counting frequent itemsets in a stream
Mining Frequent itemsets - Market basket model
Handling large data sets in Main memory – Limited Pass algorithm
Clustering Techniques – Hierarchical
K-Means – Clustering high dimensional data
Frequent pattern based clustering methods
Clustering in non-Euclidean space – Clustering for streams and Parallelism.
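For mining frequent itemsets in the market basket model, the A-Priori algorithm finds frequent pairs in two passes over the data. Pass 1 counts single items. Pass 2 counts only pairs whose members are both frequent, which is valid because a pair cannot occur more often than either of its items. The minimal sketch below assumes baskets are a re-iterable list of item collections and support is an absolute count; the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori_frequent_pairs(baskets, support):
    """Two-pass A-Priori for frequent pairs in market-basket data.

    `baskets` must be re-iterable (e.g. a list), since it is read twice.
    Returns {(item_a, item_b): count} for pairs with count >= support.
    """
    # Pass 1: count individual items (set() ignores duplicates within a basket).
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {item for item, c in item_counts.items() if c >= support}

    # Pass 2: count only candidate pairs built from frequent items.
    pair_counts = Counter()
    for basket in baskets:
        candidates = sorted(set(basket) & frequent_items)  # sorted -> canonical pair order
        pair_counts.update(combinations(candidates, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= support}


if __name__ == "__main__":
    baskets = [
        ["milk", "bread", "beer"],
        ["milk", "bread"],
        ["milk", "diapers", "beer"],
        ["bread", "diapers"],
        ["milk", "bread", "diapers"],
    ]
    print(apriori_frequent_pairs(baskets, support=3))
```

With support 3, only ("bread", "milk") survives; beer is pruned after pass 1, so no pair containing it is ever counted. That pruning is what keeps the pass-2 counts small enough for main memory.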
UNIT 5:
Sharding – NoSQL Databases
Visual data analysis techniques
interaction techniques; Systems
S3 - Hadoop Distributed File System (HDFS)
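Sharding spreads the records of a NoSQL database across nodes according to a shard key. The minimal hash-based sketch below is not the scheme of any particular database product; shard_for and partition are hypothetical names, and SHA-256 is used only as a stable hash, because Python's built-in hash() is salted per process for strings.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in [0, num_shards) by hashing.

    SHA-256 serves only as a stable, well-mixed hash; it gives the same
    shard for the same key across processes and machines.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards):
    """Group keys by shard, as a router in front of sharded nodes might."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards
```

One design caveat: with plain modulo placement, changing num_shards relocates most keys, which is why consistent hashing or range-based sharding is often preferred when the cluster is expected to grow.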