O'Reilly – Spark, Ray, and Python for Scalable Data Science 2021-6

Spark, Ray, and Python for Scalable Data Science teaches you how to implement data science and machine learning projects at scale using Spark, Ray, and Python. With the growing need for big data analytics, the course covers the core concepts of distributed computing and provides hands-on exercises for mastering its tools. It opens with data science workflows and why frameworks like Spark and Ray are needed, then examines the basics of distributed systems and how these tools work. Spark is taught through practical examples, including the RDD data structure and MapReduce operations, followed by its application to natural language processing (NLP) and exploratory data analysis (EDA) at scale. The section on Ray compares it with Spark and explains how to distribute functions and objects. Finally, the course explores using Ray to scale machine learning model training, tune hyperparameters, and deploy models to production. Combining theory and practice, the course provides the skills needed to manage large-scale data projects.
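To give a feel for the MapReduce pattern mentioned above, here is a minimal word-count sketch. It is not taken from the course materials and does not use the Spark API: it relies only on the Python standard library, and the function names `map_phase` and `reduce_phase` and the sample lines are hypothetical. In Spark, the same map and reduce steps would typically run in parallel over the partitions of an RDD instead of a local list.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: turn one line of text into per-word counts."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


# Hypothetical sample data standing in for a large distributed dataset.
lines = [
    "spark and ray scale python",
    "ray scales python functions",
    "spark scales data processing",
]

# Each line is mapped independently, which is what makes the step parallelizable.
partial_counts = map(map_phase, lines)

# The partial results are then combined pairwise into a single total.
word_counts = reduce(reduce_phase, partial_counts, Counter())

print(word_counts.most_common(3))
```

Because the map step treats every line independently and the reduce step is associative, the work can be split across machines and merged in any order, which is the property distributed frameworks rely on.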

What you will learn:

  • Integrating Python and distributed computing
  • Scaling data processing with Spark
  • Performing exploratory data analysis with PySpark
  • Using parallel computing with Ray
  • Scaling machine learning and AI applications with Ray

Who is this course suitable for?

  • This course is suitable for anyone who needs to improve their fundamental understanding of scalable, integrated data processing with Python for use in machine learning or artificial intelligence applications.

Course details

  • Publisher: O'Reilly
  • Instructor: Jonathan Dinu
  • Training level: Beginner to advanced
  • Training duration: 7 hours and 1 minute

Course topics

  • Introduction
  • Spark, Ray, and Python for Scalable Data Science: Introduction
  • Lesson 1: Introduction to Distributed Computing in Python
  • Topics
    1.1 Introduction and Materials
    1.2 The Data Science Process
    1.3 A Brief Historical Diversion
    1.4 Distributed Systems Primer
    1.5 Python Distributed Computing Frameworks
    1.6 The What and Why of Spark
    1.7 The Spark Platform
    1.8 Spark versus Ray
  • Lesson 2: Scaling Data Processing with Spark
  • Topics
    2.1 Course Coding Setup
    2.2 Your First PySpark Job
    2.3 Introduction to RDDs
    2.4 Transformations versus Actions
    2.5 RDD Deep Dive
    2.6 The Spark Execution Context
    2.7 Spark versus Hadoop
    2.8 Spark Application Lifecycle
  • Lesson 3: Exploratory Data Analysis with PySpark
  • Topics
    3.1 Introduction to Exploratory Data Analysis
    3.2 A Quick Tour of Jupyter Notebooks
    3.3 Parsing Data at Scale
    3.4 Spark DataFrames: Integration into Existing Workflows
    3.5 Scaling Exploratory Data Analysis with Spark
    3.6 Making Sense of Data: Summary Statistics and Data Visualization
    3.7 Working with Text: Introduction to NLP
    3.8 Tokenization and Vectorization with MLlib
  • Lesson 4: Parallel Computing with Ray
  • Topics
    4.1 The What and Why of Ray
    4.2 The Ray Programming Model
    4.3 Parallelizing Functions with Ray Tasks
    4.4 Asynchronous Programming with Actors
    4.5 Cellular Automata and the Game of Life
    4.6 Distributed Agent-Based Models with Ray
  • Lesson 5: Scaling AI Applications with Ray
  • Topics
    5.1 Introduction to Model Evaluation
    5.2 Serializing Data for Machine Learning Applications
    5.3 Cross Validation with scikit-learn
    5.4 Strategies for Tuning Machine Learning Models
    5.5 Grid Search in Python
    5.6 Distributed Hyperparameter Optimization with Ray Tune
    5.7 Resource Efficient Search with Principled Early Stopping
    5.8 Diving Deeper into Ray’s Internals
    5.9 Serving Machine Learning Models
    5.10 Deploying AI Applications with Ray Serve
    5.11 Monitoring Model Performance in Production
  • Summary
  • Spark, Ray, and Python for Scalable Data Science: Summary

Course prerequisites

  • A basic understanding of programming in Python (variables, basic control flow, simple scripts).
  • Familiarity with the vocabulary of data processing at scale, machine learning (dataset, training set, test set, model), and AI.

Installation Guide

After extracting the files, play the videos with your preferred media player.

Subtitles: None

Quality: 720p

Download link

Download Part 1 – 2 GB

Download Part 2 – 732 MB

File(s) password: www.downloadly.ir

File size

2.7 GB