O'Reilly – Evaluating Large Language Models (LLMs) 2025-2


Evaluating Large Language Models (LLMs) is a course that introduces the main methods for evaluating large language models. Whether you are a data scientist, a machine learning engineer, or an AI enthusiast, it aims to give you a solid understanding of how these models are evaluated, from the basics of evaluation through to practical applications.

What you will learn:

  • LLM evaluation fundamentals: why evaluation matters, the difference between generative and understanding tasks, and key metrics for common tasks
  • Evaluating generative tasks: multiple-choice tasks, free-text responses, and using LLMs as judges of other models' output
  • Evaluating understanding tasks: embedding and classification tasks, and building a classifier with BERT and GPT
  • Using benchmarks effectively: the role of benchmarks, a critical look at common benchmarks, and evaluating LLMs with benchmarks
  • Probing LLMs for a world model: probing the knowledge stored in LLMs and their ability to play games
  • Evaluating LLM fine-tuning: fine-tuning objectives, metrics for success, and a practical demonstration
  • Case studies: evaluating AI agents, retrieval-augmented generation (RAG) systems, recommendation engines, and more
  • The future of LLM evaluation: upcoming trends in LLM evaluation

This course is suitable for people who:

  • Want a deep understanding of large language models
  • Want to learn how to evaluate these models
  • Work in machine learning, natural language processing, or artificial intelligence
  • Want to use large language models in a range of applications

Evaluating Large Language Models (LLMs) Course Details

  • Publisher: O'Reilly
  • Instructor: Sinan Ozdemir
  • Level: Beginner to advanced
  • Duration: 7 hours and 56 minutes

Course outline

  • Introduction
  • Evaluating Large Language Models (LLMs): Introduction
  • Lesson 1: Foundations of LLM Evaluation
    Learning objectives
    1.1 Introduction to Evaluation: Why It Matters
    1.2 Generative versus Understanding Tasks
    1.3 Key Metrics for Common Tasks
  • Lesson 2: Evaluating Generative Tasks
    Learning objectives
    2.1 Evaluating Multiple-Choice Tasks
    2.2 Evaluating Free Text Response Tasks
    2.3 AIs Supervising AIs: LLM as a Judge
  • Lesson 3: Evaluating Understanding Tasks
    Learning objectives
    3.1 Evaluating Embedding Tasks
    3.2 Evaluating Classification Tasks
    3.3 Building an LLM Classifier with BERT and GPT
  • Lesson 4: Using Benchmarks Effectively
    Learning objectives
    4.1 The Role of Benchmarks
    4.2 Interrogating Common Benchmarks
    4.3 Evaluating LLMs with Benchmarks
  • Lesson 5: Probing LLMs for a World Model
    Learning objectives
    5.1 Probing LLMs for Knowledge
    5.2 Probing LLMs to Play Games
  • Lesson 6: Evaluating LLM Fine-Tuning
    Learning objectives
    6.1 Fine-Tuning Objectives
    6.2 Metrics for Fine-Tuning Success
    6.3 Practical Demonstration: Evaluating Fine-Tuning
    6.4 Evaluating and Cleaning Data
  • Lesson 7: Case Studies
    Learning objectives
    7.1 Evaluating AI Agents: Task Automation and Tool Integration
    7.2 Measuring Retrieval-Augmented Generation (RAG) Systems
    7.3 Building and Evaluating a Recommendation Engine Using LLMs
    7.4 Using Evaluation to Combat AI Drift
    7.5 Time-Series Regression
  • Lesson 8: Summary of Evaluation and Looking Ahead
    Learning objectives
    8.1 When and How to Evaluate
    8.2 Looking Ahead: Trends in LLM Evaluation
  • Summary
  • Evaluating Large Language Models (LLMs): Summary

Installation Guide

After extracting the files, watch the videos with your preferred media player.

Subtitles: None

Quality: 720p

Download link

Download Part 1 – 1 GB

Download Part 2 – 831 MB

File(s) password: www.downloadly.ir

File size

1.8 GB