UNIT 1:
Introduction to Big Data Platform
Challenges of conventional systems - Web data
Evolution of Analytic Scalability,
analytic processes and tools, Analysis vs reporting
Modern data analytic tools,
Statistical concepts: Sampling distributions
resampling, statistical inference
UNIT 2:
UNIT 3:
Introduction to Streams Concepts – Stream data model and architecture
Stream Computing, Sampling data in a stream
Filtering streams – Counting distinct elements in a stream
Counting ones in a window
Real-Time Analytics Platform (RTAP) applications
stock market predictions.
UNIT 4:
Mining Frequent itemsets – Market basket model
Handling large data sets in main memory – Limited Pass algorithm
Counting frequent itemsets in a stream
Clustering Techniques – Hierarchical
K-Means – Clustering high dimensional data
Frequent pattern based clustering methods
Clustering in non-Euclidean space – Clustering for streams and Parallelism.
UNIT 5:
Sharding – NoSQL Databases
S3 – Hadoop Distributed file systems
Visualizations – Visual data analysis techniques
interaction techniques; Systems – applications: