SPARK Course


Data Scientist
Big Data Visualizer
Big Data Research Analyst
Big Data Engineer
Big Data Architect
Big Data Analyst

  1. What is Apache Spark & Why Spark?
  2. Spark History
  3. Unification in Spark
  4. Spark Ecosystem vs Hadoop
  5. Spark with Hadoop

  1. Introduction to Functional Programming
  2. Interactive Shell – REPL, Data types, Variables, Expressions, Conditional statements, Loops – For comprehension
  3. Pattern Matching in Scala with Match expression
  4. Simple Functions and their variants, Tail Recursion, Functions as Objects (aka Anonymous Functions), Higher Order Functions
  5. Scala Collections and the usage of higher order methods on Collections
  6. Classes and Objects, Class Constructors in Scala, Case classes, Abstract and Generic Class
  7. Exception Handling in Scala
  8. Traits in Scala, Properties of Traits
  9. Magic Apply method
  10. Singleton and Companion objects
  11. Implicits in Scala – Implicit parameters, def, classes

  1. Installing Spark
  2. Introduction to Spark’s Python and Scala Shells
  3. Spark Standalone Cluster Architecture and its application flow
  4. Spark on YARN and its application flow

  1. RDD Basics and its characteristics, Creating RDDs
  2. RDD Operations
  3. Transformations
  4. Actions
  5. RDD Types
  6. Lazy Evaluation
  7. Persistence (Caching)

  1. Accumulators and Fault Tolerance
  2. Broadcast Variables
  3. Custom Partitioning

  1. Dealing with different file formats (Text, CSV, JSON, etc.)
  2. Hadoop Input and Output Formats
  3. Connecting to diverse Data Sources (HDFS, Hive, S3, RDBMS, NoSQL, etc.)

  1. Linking with Spark SQL
  2. Initializing Spark SQL
  3. DataFrames & Caching
  4. Case Classes, Inferred Schema
  5. Loading and Saving Data
  6. Apache Hive
  7. Data Sources/Parquet
  8. JSON
  9. JDBC/ODBC Server
  10. Spark SQL User Defined Functions (UDFs)
  11. Hive UDFs

  1. Batch vs Streaming
  2. Architecture and Abstraction
  3. DStreams, DStreams vs RDD
  4. Transformations
  5. Input Streams (Socket, HDFS, Twitter, Kafka)
  6. Checkpointing, Persistence and Caching
  7. Batch and Window Sizes
  8. Level of Parallelism

  1. Machine Learning Basics and terminology
  2. Apache Spark MLlib Algorithms
  3. Examples implementing Machine Learning algorithms using Spark MLlib – Linear Regression



Overview sessions on Cassandra, Kafka

Project with Spark SQL and Spark Streaming using Kafka & Cassandra