お問い合わせを送信いただきありがとうございます!当社のスタッフがすぐにご連絡いたします。
予約を送信いただきありがとうございます!当社のスタッフがすぐにご連絡いたします。
コース概要
Introduction:
- Apache Spark in Hadoop Ecosystem
- Short intro for python, scala
Basics (theory):
- Architecture
- RDD
- Transformation and Actions
- Stage, Task, Dependencies
Using Databricks environment understand the basics (hands-on workshop):
- Exercises using RDD API
- Basic action and transformation functions
- PairRDD
- Join
- Caching strategies
- Exercises using DataFrame API
- SparkSQL
- DataFrame: select, filter, group, sort
- UDF (User Defined Function)
- Looking into DataSet API
- Streaming
Using AWS environment understand the deployment (hands-on workshop):
- Basics of AWS Glue
- Understand differencies between AWS EMR and AWS Glue
- Example jobs on both environment
- Understand pros and cons
Extra:
- Introduction to Apache Airflow orchestration
要求
Programing skills (preferably python, scala)
SQL basics
21 時間
お客様の声 (3)
Having hands on session / assignments
Poornima Chenthamarakshan - Intelligent Medical Objects
コース - Apache Spark in the Cloud
1. Right balance between high level concepts and technical details. 2. Andras is very knowledgeable about his teaching. 3. Exercise
Steven Wu - Intelligent Medical Objects
コース - Apache Spark in the Cloud
Get to learn spark streaming , databricks and aws redshift