Hadoop - Data Serialization - Columnar Storage - Messaging Systems - NoSQL - Distributed SQL Query Engine. Programming in Scala: Functional Programming (FP) - Scala Fundamentals - A Standalone Scala Application
Overview - High-level Architecture - Application Execution - Data Sources - Application Programming Interface (API) - Lazy Operations - Caching - Spark Jobs - Shared Variables
Getting Started - REPL Commands - Using the Spark Shell as a Scala Shell - Number Analysis - Log Analysis
Unit – IV
Hello World in Spark - Compiling and Running the Application - Monitoring the Application - Debugging the Application
Introducing Spark Streaming - Application Programming Interface (API) - A Complete Spark Streaming Application
Reference Books:
1. Philipp K. Janert, “Data Analysis with Open Source Tools”, O’Reilly Media, Inc., First Edition, ISBN – 978-0-596-80235-6
2. Wes McKinney, “Python for Data Analysis”, O’Reilly Media, Inc., First Edition, ISBN – 978-1-449-31979-3
3. Wes McKinney, “Python for Data Analysis”, O’Reilly Media, Inc., Second Edition, ISBN – 978-1-491-95766-0
4. Alice Zheng & Amanda Casari, “Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists”, O’Reilly Media, Inc., First Edition, ISBN – 978-1-491-95324-2
Textbook:
Mohammed Guller, “Big Data Analytics with Spark”, Apress Media, First Edition, ISBN – 978-1-4842-0965-3