Applied Big Data Fundamentals Training Course

	Course title	Duration	Schedule	Start date	Tuition	Instructor	Delivery mode	Registration status
	Applied Big Data Fundamentals Training Course	15 sessions, 45 hours	Mondays 17:30 to 20:30, Wednesdays 17:30 to 20:30	Thursday, 05 Shahrivar 1405	15,000,000 Toman	Eng. Hassan Ahmadkhani	Online	✓ Open for registration

Course syllabus and content: fundamental and applied concepts in big data analytics with modern methods

Distributed ingestion, storage and processing with modern data stack

Course introduction and objectives:
This course on the fundamentals of big data processing and management examines modern tools and techniques for collecting, preparing, cleaning, analyzing and managing big data.
The course aims to cover the foundational topics in distributed tools and methods needed to meet the requirements of Data Engineer, Data Scientist and Analytics Engineer roles, and to cover the big data fundamentals required for Platform Engineer and Data Administration roles.
By the end of the course, students are expected to understand big data and its challenges, to have a thorough grasp of the core tools and techniques for collecting, processing and storing big data, and to be able to use those tools for work in data engineering, data analytics and data science.
During the course, a project will be defined at the scale of the requirements and challenges of production environments, based on the topics covered in the syllabus, and it will be carried out as part of the course.

Course content at a glance:
Big Data, distributed storage and distributed processing concepts – 1.5 hours
Data lake, Lakehouse and Streamhouse concepts and challenges – 1.5 hours
Distributed storage with Hadoop HDFS – 5 hours
Distributed storage with MinIO – 6 hours
Distributed storage and processing with ClickHouse – 7 hours
Data ingestion and integration with Apache NiFi – 5 hours
Distributed processing on data lake and lakehouse with Trino – 5 hours
Distributed processing on data lake and lakehouse with Spark and Spark SQL – 8 hours
Kafka and streaming fundamentals – 6 hours

Course duration: 45 hours

Prerequisites: experience working with an RDBMS, familiarity with a programming language, basic familiarity with the Linux operating system

Course content, details:

Big Data, distributed storage and distributed processing concepts – 1.5 hours

Big data definition

Distributed storage definition and challenges

Distributed processing definition and challenges

NoSQL databases focus area

Big data platforms focus area

Big data in the AI era

What is the modern data stack?

How to modernize our data platform and keep it up to date?

Big data processing at scale, lessons learned

Data lake, Lakehouse and Streamhouse concepts and challenges – 1.5 hours

Data lake definition and limitations

Data lakehouse definition and features

Streamhouse definition, features and challenges

Data pipelines and modern data pipeline features

Distributed storage with Hadoop HDFS – 5 hours

Apache Hadoop architecture and components

HDFS cluster deployment

Working with Hadoop HDFS

Small files issue in HDFS

HDFS erasure coding

HDFS balancing, HDFS federation, quota, rack awareness, …

HDFS tips and tricks

Distributed storage with MinIO – 6 hours

Object storage vs block storage vs file storage

Object storage platforms use cases

AWS S3-compatible object storage

MinIO architecture and components

MinIO vs S3 vs Apache Ozone vs Ceph

MinIO cluster deployment

Working with MinIO

MinIO Command Line

Lifecycle Management

Object Retention

Replication and erasure coding (EC)

Buckets and bucket notifications

Users and quotas

Distributed storage and processing with ClickHouse – 7 hours

Distributed OLAP engines

Pinot vs Druid vs ClickHouse

ClickHouse cluster components

ClickHouse cluster deployment

Data modeling: features and limitations

Working with ClickHouse and ClickHouse SQL review

Data types

Sharding and replication strategy in ClickHouse

Table engines

User management and roles

Partitioning and ordering tips and tricks

ClickHouse best practices

Data ingestion and integration with Apache NiFi – 5 hours

NiFi use cases and data ingestion with NiFi

Processors and data flow design: multiple scenarios and data flows

Data ingestion into HDFS, MinIO, ClickHouse, RDBMS and NoSQL DBs

Process groups, remote process groups and ports

Custom processors

Ingestion into lakehouse with NiFi

NiFi Registry

NiFi best practices

Distributed processing on data lake and lakehouse with Trino - 5 hours

Trino architecture

Apache Hive Metastore and other metastore services for Trino

Trino functions and SQL

HDFS, MinIO, ClickHouse, RDBMS, Hive and Iceberg catalogs

Trino user-defined functions

Deployment best practices

Administration tips and tricks

Distributed processing on data lake and lakehouse with Spark and Spark SQL - 8 hours

Apache Spark architecture

Spark job deployment on YARN and K8s

Working with Spark data structures

Reading data from and writing data to HDFS, ClickHouse, MinIO, RDBMS, files, …

Implementing multiple scenarios and developing batch jobs with Spark

Spark user-defined functions

Spark execution plan

Spark job and task monitoring

Spark external shuffle service

Spark parameter tuning

Kafka and streaming fundamentals – 6 hours

Apache Kafka architecture

Cluster components and deployment

Working with topics, partitions, replication, producers, consumers and consumer groups, …

Message structure and message formats in Kafka

Message ordering

Kafka Connect

Kafka monitoring

Kafka storage internals

Kafka integration with HDFS, ClickHouse, MinIO, NiFi and Spark

Contact Faratar az Danesh

Address: Tehran, North Kargar Street, north of the Fatemi intersection (Kargar metro station), Didgah Alley, No. 26, 3rd floor, Faratar az Danesh Company
Phone: 02188989781 - 02188989782
WhatsApp: 09396839678
Fax: 02188989780
Email: info@fad.ir