
SPARK Course

  1. What is Apache Spark & Why Spark?
  2. Evolution of Distributed Systems & Challenges Faced
  3. Need for a New Generation
  4. Hardware/Software Evolution in the Last Decade
  5. Spark History
  6. Unification in Spark
  7. Spark Ecosystem vs Hadoop
  8. Spark with Hadoop
  9. Who is Using Spark?

  1. Introduction to Functional Programming
  2. Interactive Shell – REPL, Data types, Variables, Expressions, Simple Functions, Loops
  3. Classes and Objects, Traits – OOP in Scala
  4. Functions as Objects, Nested Functions
  5. Pattern Matching, Scala Collections with their Functions

  1. Downloading Spark
  2. Introduction to Spark’s Python and Scala Shells
  3. Spark Standalone
  4. Spark on YARN

  1. RDD Basics and its characteristics, Creating RDDs
  2. RDD Operations
  3. Transformations
  4. Actions
  5. RDD Types
  6. Lazy Evaluation
  7. Persistence (Caching)
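
The relationship between transformations, actions, lazy evaluation, and persistence in the list above can be illustrated with a small pure-Python sketch. It does not use Spark: `LazyDataset` and `traced_square` are hypothetical names invented for this illustration, and the class only mimics the idea that transformations are recorded while actions trigger the work.

```python
class LazyDataset:
    """Toy stand-in for an RDD (not Spark's API)."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)   # recorded transformations, not yet run
        self._persist = False
        self._cached = None

    # Transformations: return a new dataset and do no work yet.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def persist(self):
        # Ask for the result of the first action to be kept for reuse.
        self._persist = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached
        data = self._source
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        if self._persist:
            self._cached = data
        return data

    # Actions: trigger the recorded transformations.
    def collect(self):
        return list(self._compute())

    def count(self):
        return len(self._compute())


calls = []  # records every element the map function actually touches


def traced_square(x):
    calls.append(x)
    return x * x


squares = LazyDataset(range(1, 6)).map(traced_square).filter(lambda v: v % 2 == 1).persist()
calls_before_action = len(calls)  # nothing has run yet
result = squares.collect()        # first action runs the pipeline
n = squares.count()               # second action reuses the persisted result
```

In this sketch the map function is not called until `collect()` runs, and because the dataset was persisted, `count()` does not call it again.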

  1. Dealing with Different File Formats (text, CSV, JSON, etc.)
  2. Hadoop Input and Output Formats
  3. Connecting to Diverse Data Sources (HDFS, Hive, S3, RDBMS, NoSQL, etc.)

  1. Accumulators and Fault Tolerance
  2. Broadcast Variables
  3. Custom Partitioning
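
As a rough illustration of the custom partitioning topic above, the following pure-Python sketch assigns key-value pairs to partitions using a user-supplied function. It does not use Spark; `partition_by` and `first_letter_partitioner` are hypothetical helpers written for this example, loosely modeled on the idea behind PySpark's `RDD.partitionBy(numPartitions, partitionFunc)`.

```python
from collections import defaultdict


def partition_by(pairs, num_partitions, partition_func):
    """Group (key, value) pairs by partition_func(key) modulo num_partitions."""
    partitions = defaultdict(list)
    for key, value in pairs:
        index = partition_func(key) % num_partitions
        partitions[index].append((key, value))
    return dict(partitions)


def first_letter_partitioner(key):
    # Hypothetical rule: keys starting with the same letter share a partition.
    return ord(key[0].lower()) - ord("a")


pairs = [("apple", 1), ("avocado", 2), ("banana", 3), ("cherry", 4), ("blueberry", 5)]
layout = partition_by(pairs, 2, first_letter_partitioner)
```

With two partitions, keys starting with "a" or "c" land in partition 0 and keys starting with "b" land in partition 1, so records that will be processed together are co-located.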

  1. Linking with Spark SQL
  2. Using Spark SQL in Applications
  3. Initializing Spark SQL
  4. DataFrames & Caching
  5. Case Classes, Inferred Schema
  6. Loading and Saving Data
  7. Apache Hive
  8. Data Sources/Parquet
  9. JSON
  10. JDBC/ODBC Server
  11. Spark SQL User Defined Functions (UDFs)
  12. Hive UDFs

  1. Batch vs Streaming
  2. Architecture and Abstraction
  3. DStreams, DStreams vs RDDs
  4. Transformations
  5. Input Streams (Socket, HDFS, Twitter, Kafka)
  6. Checkpointing, Persistence, and Caching
  7. Batch and Window Sizes
  8. Level of Parallelism
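
To make the batch and window size topic above concrete, here is a small pure-Python sketch of windowed counting over micro-batches. It does not use Spark Streaming; `windowed_counts` is a hypothetical helper, and both the window length and the slide interval are expressed as a number of batches, reflecting the rule that in Spark Streaming both must be multiples of the batch interval.

```python
from collections import Counter


def windowed_counts(batches, window_len, slide):
    """Count items over a sliding window.

    batches    -- list of micro-batches, one per batch interval
    window_len -- window length, in number of batches
    slide      -- how often a result is emitted, in number of batches
    """
    results = []
    for end in range(slide, len(batches) + 1, slide):
        start = max(0, end - window_len)
        window = Counter()
        for batch in batches[start:end]:
            window.update(batch)
        results.append(dict(window))
    return results


# Four micro-batches of words; window of 3 batches, sliding every 2 batches.
batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
out = windowed_counts(batches, window_len=3, slide=2)
```

The first result covers only the two batches seen so far; the second covers the last three batches, so the second batch is counted in both windows.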

  1. Machine Learning Basics
  2. Example: Spam Classification
  3. Data Types and Working with Vectors
  4. Apache Spark MLlib Algorithms

Project with Spark Streaming and Spark SQL