Course Outline
Introduction
Scala Programming: In-Depth Review
- Syntax and structure
- Flow control and functions
Spark Internals
- Resilient Distributed Datasets (RDD)
- Spark script to graph to cluster
Overview of Spark Streaming
- Streaming architecture
- Intervals in streaming
- Fault tolerance
Preparing the Development Environment
- Installing and configuring Apache Spark
- Installing and configuring the Scala IDE
- Installing and configuring JDK
Spark Streaming Beginner to Advanced
- Working with key/value RDDs
- Filtering RDDs
- Improving Spark scripts with regular expressions
- Sharing data on a cluster
- Working with network data sets
- Implementing BFS algorithms
- Creating Spark driver scripts
- Tracking in real time with scripts
- Writing continuous applications
- Streaming linear regression
- Using the Spark Machine Learning Library (MLlib)
Spark and Clusters
- Bundling dependencies and Spark scripts using the sbt build tool
- Using Amazon EMR to illustrate clusters
- Optimizing by partitioning RDDs
- Using Spark logs
Integration in Spark Streaming
- Integrating Apache Kafka and working with Kafka topics
- Integrating Apache Flume and working with pull-based/push-based Flume configurations
- Writing a custom receiver class
- Integrating Cassandra and exposing data as real-time services
In Production
- Packaging an application and running it with spark-submit
- Troubleshooting, tuning, and debugging Spark jobs and clusters
Summary and Conclusion
Requirements
- Programming and scripting experience
Audience
- Software Engineers
Testimonials (5)
I liked that it was practical. Loved to apply the theoretical knowledge with practical examples.
Aurelia-Adriana - Allianz Services Romania
Course - Python and Spark for Big Data (PySpark)
This is one of the best hands-on programming courses with exercises I have ever taken.
Laura Kahn
Course - Artificial Intelligence - the most applied stuff - Data Analysis + Distributed AI + NLP
A lot of practical examples, different ways to approach the same problem, and sometimes not-so-obvious tricks for improving the current solution.
Rafał - Nordea
Course - Apache Spark MLlib
I liked the VM very much. The teacher was very knowledgeable regarding the topic as well as other topics, and he was very nice and friendly. I liked the facility in Dubai.
Safar Alqahtani - Elm Information Security
Course - Big Data Analytics in Health
Sufficient hands-on, trainer is knowledgeable.