O'Reilly – Data Analysis with Python and PySpark 2022-3

Data Analysis with Python and PySpark is a course that introduces you to big data analysis. Using PySpark, the Python API for Apache Spark, a distributed data processing engine, together with Python, a popular and versatile language, you will learn to run complex analyses on large data sets and extract useful results.

What you will learn:

  • Big data management: Learn efficient methods for managing and organizing data that is distributed across multiple machines.
  • Scaling data analysis applications: Make sure data analysis programs run correctly and efficiently on very large data sets.
  • Reading and writing data: Read and write data from a variety of sources and formats.
  • Irregular data processing: Handle the challenges of irregular data and prepare it for analysis.
  • Data mining: Discover new patterns and insights in data using data mining techniques.
  • Building automated pipelines: Create automated processes that transform, summarize, and extract insights from data.
  • Troubleshooting common errors in PySpark: Diagnose and fix common problems that occur when working with PySpark.
  • Creating stable, long-running jobs: Build jobs that run consistently and reliably.

This course is suitable for people who:

  • Are familiar with the Python programming language.
  • Are interested in data analysis and machine learning.
  • Want to increase their ability to process large amounts of data.
  • Are looking for powerful tools to carry out data analysis projects.

Data Analysis with Python and PySpark course details

  • Publisher: O'Reilly
  • Instructor: Jonathan Rioux
  • Training level: Beginner to advanced
  • Training duration: 10 hours and 31 minutes

Course headings

  • Chapter 1. Introduction
  • Part 1. Get acquainted: First steps in PySpark
  • Chapter 2. Your first data program in PySpark
  • Chapter 3. Submitting and scaling your first PySpark program
  • Chapter 4. Analyzing tabular data with pyspark.sql
  • Chapter 5. Data frame gymnastics: Joining and grouping
  • Part 2. Get proficient: Translate your ideas into code
  • Chapter 6. Multidimensional data frames: Using PySpark with JSON data
  • Chapter 7. Bilingual PySpark: Blending Python and SQL code
  • Chapter 8. Extending PySpark with Python: RDDs and UDFs
  • Chapter 9. Big data is just a lot of small data: Using pandas UDFs
  • Chapter 10. Your data under a different lens: Window functions
  • Chapter 11. Faster PySpark: Understanding Spark’s query planning
  • Part 3. Get confident: Using machine learning with PySpark
  • Chapter 12. Setting the stage: Preparing features for machine learning
  • Chapter 13. Robust machine learning with ML Pipelines
  • Chapter 14. Building custom ML transformers and estimators
  • Appendix C. Some useful Python concepts

Installation Guide

After extracting the files, play the videos with your preferred media player.

Subtitles: None

Quality: 720p

Download link

Download Part 1 – 1 GB

Download Part 2 – 513 MB

File(s) password: www.downloadly.ir

File size

1.5 GB