O'Reilly – Data Analysis with Python and PySpark 2022-3

Data Analysis with Python and PySpark is a course that introduces you to big data analysis. Using PySpark, the Python API for Apache Spark, a widely used big data processing engine, together with Python, a popular and versatile language, you will learn to run complex analyses on large data sets and extract useful results.
What you will learn:
- Big Data Management: Learn efficient methods for managing and organizing data that is distributed across multiple machines.
- Scalability of data analysis applications: Ensure that data analysis applications run correctly and efficiently on very large data sets.
- Reading and Writing Data: Master methods for reading and writing data from various sources and formats.
- Irregular Data Processing: Address the challenges of irregular data and prepare it for analysis.
- Data mining: Discover new patterns and insights in data using data mining techniques.
- Building automated pipelines: Create automated processes to transform, summarize, and extract insights from data.
- Troubleshooting Common Errors in PySpark: Diagnose and fix common problems that may occur when working with PySpark.
- Create stable, long-lasting tasks: Build tasks that run consistently and reliably.
This course is suitable for people who:
- Are familiar with the Python programming language.
- Are interested in data analysis and machine learning.
- Intend to increase their ability to process large amounts of data.
- Are looking for powerful tools to carry out data analysis projects.
Data Analysis with Python and PySpark course details
- Publisher: O'Reilly
- Instructor: Jonathan Rioux
- Training level: Beginner to advanced
- Training duration: 10 hours and 31 minutes
Course headings
- Chapter 1. Introduction
- Part 1. Get acquainted: First steps in PySpark
- Chapter 2. Your first data program in PySpark
- Chapter 3. Submitting and scaling your first PySpark program
- Chapter 4. Analyzing tabular data with pyspark.sql
- Chapter 5. Data frame gymnastics: Joining and grouping
- Part 2. Get proficient: Translate your ideas into code
- Chapter 6. Multidimensional data frames: Using PySpark with JSON data
- Chapter 7. Bilingual PySpark: Blending Python and SQL code
- Chapter 8. Extending PySpark with Python: RDDs and UDFs
- Chapter 9. Big data is just a lot of small data: Using pandas UDFs
- Chapter 10. Your data under a different lens: Window functions
- Chapter 11. Faster PySpark: Understanding Spark’s query planning
- Part 3. Get confident: Using machine learning with PySpark
- Chapter 12. Setting the stage: Preparing features for machine learning
- Chapter 13. Robust machine learning with ML Pipelines
- Chapter 14. Building custom ML transformers and estimators
- Appendix C. Some useful Python concepts
Installation Guide
After extracting the archive, play the videos with your preferred media player.
Subtitles: None
Quality: 720p
Download link
File(s) password: www.downloadly.ir
File size: 1.5 GB