Course title | Duration | Schedule | Start date | Tuition | Instructor | Delivery mode | Registration status | Register | First-session video | |
---|---|---|---|---|---|---|---|---|---|---|
Advanced Big Data Analytics | 15 sessions, 45 hours | Mondays 18:00 to 21:00; Wednesdays 18:00 to 21:00 | Monday, 16 Mehr 1403 | 7,500,000 Toman | Eng. Hassan Ahmadkhani | Online | - |
Syllabus and content of the course: Advanced big data analytics with modern architecture
Course introduction and objectives:
This course covers the key tools and techniques for preparing, cleaning, analyzing, and managing big data.
Its goal is to cover advanced topics in big data analytics in order to meet the requirements of the following roles:
- Data Engineer
- Data Scientist
- Analytics Engineer
Course content at a glance
Kafka and streaming advances – 6 hours
Real-time stream processing with Apache Flink – 15 hours
Data transformation with data build tool (dbt) – 3 hours
Advanced topics in data modeling – 3 hours
Machine learning and advanced analytics with Spark ML – 6 hours
Data governance and quality in practice – 3 hours
Advanced analytics in public cloud platforms (AWS and CDP) – 9 hours
Course duration: 45 hours
Course prerequisites:
The Applied Big Data Fundamentals course, or two years of work experience in big data, and familiarity with a programming language
Course content in detail
Kafka and streaming advances – 6 hours
Real-time data pipeline development with Kafka Connect
Stream processing with ksqlDB
ksqlDB features, limitations and best practices
Transactional producers
Kafka tiered storage
Dead letter queues and exception handling
Real-time stream processing with Apache Flink – 15 hours
Stream processing with industry gold standards
Apache Flink overview
Development environment setup
Building and deploying the project
DataStream API
Table and SQL API
Flink SQL
Data sources and sinks
Flink Kafka source and sink
Flink ClickHouse sink
Flink MongoDB sink
Flink ScyllaDB sink
Flink Iceberg sink
Stateless operations
Stateful operations
Checkpointing and exactly-once delivery
Handling of time
State recovery
Data transformation with data build tool (dbt) – 3 hours
dbt essentials and features
dbt resources and project structure
Building transformation DAGs
dbt Spark adapter
dbt Trino adapter
dbt Flink adapter
Advanced topics in data modeling – 3 hours
Dimensional modeling challenges in the big data ecosystem
Layered data lakehouse and data modeling tips and tricks
Data Vault 2.0 methodology and overview
DV2 logical data flow
Business vault
Information marts
Record source tracking
Point-in-time and bridge tables
Column comparison and hash differences
Handling null business keys
Staging loads
DV2.0 loading templates
Real-time loading
Machine learning and advanced analytics with Spark ML – 6 hours
Machine learning and ML in a distributed manner
ML algorithms in Spark ML
Spark ML pipelines
Case studies and examples for clustering, classification and recommendation
Deep learning with Spark
RAG and LLM essentials
Vector DB
Spark NLP and LLM
Model deployment
Data governance and quality in practice – 3 hours
Data quality management essentials
Data completeness, accuracy, consistency, validity, uniqueness in practice
Why data governance matters
DAMA DMBOK and data governance knowledge areas
GDPR and CCPA in practice
Advanced analytics in public cloud platforms (AWS and CDP) – 9 hours
AWS features and options for big data storage and processing
AWS analytics pillars review
Data visualization in AWS
Implementation of real-time, ML and advanced analytics scenarios in AWS
Cloudera Data Platform and CDP data services
CDP analytics pillars review
Data visualization in CDP
Implementation of real-time, ML and advanced analytics scenarios in CDP