O'Reilly – Evaluating Large Language Models (LLMs) 2025-2

Evaluating Large Language Models (LLMs) is an O'Reilly course that introduces the main methods for evaluating LLMs. Whether you are a data scientist, a machine learning engineer, or an AI enthusiast, it will help you build a deep understanding of how these models are evaluated, covering everything from the fundamentals of evaluation to practical applications.
What you will learn:
- Foundations of LLM evaluation: why evaluation matters, the difference between generative and understanding tasks, and key metrics for common tasks
- Evaluating generative tasks: scoring multiple-choice and free-text responses, and using LLMs to judge one another's output (see the first sketch after this list)
- Evaluating understanding tasks: evaluating embedding and classification tasks, and building an LLM classifier with BERT and GPT (see the second sketch after this list)
- Using benchmarks effectively: the role of benchmarks, a look inside common benchmarks, and evaluating LLMs against them (see the third sketch after this list)
- Probing LLMs for a world model: exploring the knowledge stored in LLMs and probing them to play games
- Evaluating LLM fine-tuning: fine-tuning objectives, metrics for success, and a practical demonstration
- Case studies: evaluating AI agents, retrieval-augmented generation (RAG) systems, recommendation engines, and more
- The future of LLM evaluation: emerging trends in how LLMs are evaluated
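For readers who want a concrete picture of the "LLM as a judge" idea before taking the course, here is a minimal sketch (not taken from the course material) in which one model grades another model's answer against a reference. The judge model, prompt wording, and 1-5 rubric are all illustrative assumptions.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are grading an answer to a question.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Rate the candidate from 1 (wrong) to 5 (fully correct).
Reply with the number only."""

def judge(question: str, reference: str, candidate: str) -> int:
    # Ask the judge model for a 1-5 score; temperature=0 for repeatable grading.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice of judge model
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, reference=reference, candidate=candidate)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())

print(judge("What is 2 + 2?", "4", "The answer is 4."))  # expected: 5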
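Likewise, a minimal sketch of scoring an LLM-based classifier, again assumed rather than taken from the course: once the model's predicted labels are collected, standard metrics such as accuracy and F1 apply. The labels below are made-up illustration data.

from sklearn.metrics import accuracy_score, f1_score, classification_report

y_true = ["spam", "ham", "spam", "ham", "spam"]   # gold labels
y_pred = ["spam", "ham", "ham",  "ham", "spam"]   # labels predicted by the LLM

print("accuracy:", accuracy_score(y_true, y_pred))
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
print(classification_report(y_true, y_pred))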
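Finally, a benchmark evaluation in its simplest form is a loop over question-answer pairs with an aggregate score. The sketch below is illustrative only; ask_model is a hypothetical stand-in for a real LLM call, and the two-item "benchmark" is made up.

def ask_model(question: str) -> str:
    # placeholder: replace with a real LLM call
    return {"What is the capital of France?": "Paris",
            "What is 2 + 2?": "4"}.get(question, "")

benchmark = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
]

# Exact-match accuracy: fraction of questions answered verbatim correctly.
correct = sum(ask_model(q).strip() == a for q, a in benchmark)
print(f"exact-match accuracy: {correct / len(benchmark):.2f}")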
This course is suitable for people who:
- Want a deep understanding of large language models
- Want to learn how to evaluate these models
- Work in machine learning, natural language processing, or artificial intelligence
- Plan to use large language models in a variety of applications
Evaluating Large Language Models (LLMs) Course Details
- Publisher: O'Reilly
- Instructor: Sinan Ozdemir
- Training level: Beginner to advanced
- Training duration: 7 hours and 56 minutes
Course headings
- Introduction
- Evaluating Large Language Models (LLMs): Introduction
- Lesson 1: Foundations of LLM Evaluation
Learning objectives
1.1 Introduction to Evaluation: Why It Matters
1.2 Generative versus Understanding Tasks
1.3 Key Metrics for Common Tasks
- Lesson 2: Evaluating Generative Tasks
Learning objectives
2.1 Evaluating Multiple-Choice Tasks
2.2 Evaluating Free Text Response Tasks
2.3 AIs Supervising AIs: LLM as a Judge
- Lesson 3: Evaluating Understanding Tasks
Learning objectives
3.1 Evaluating Embedding Tasks
3.2 Evaluating Classification Tasks
3.3 Building an LLM Classifier with BERT and GPT
- Lesson 4: Using Benchmarks Effectively
Learning objectives
4.1 The Role of Benchmarks
4.2 Interrogating Common Benchmarks
4.3 Evaluating LLMs with Benchmarks
- Lesson 5: Probing LLMs for a World Model
Learning objectives
5.1 Probing LLMs for Knowledge
5.2 Probing LLMs to Play Games
- Lesson 6: Evaluating LLM Fine-Tuning
Learning objectives
6.1 Fine-Tuning Objectives
6.2 Metrics for Fine-Tuning Success
6.3 Practical Demonstration: Evaluating Fine-Tuning
6.4 Evaluating and Cleaning Data
- Lesson 7: Case Studies
Learning objectives
7.1 Evaluating AI Agents: Task Automation and Tool Integration
7.2 Measuring Retrieval-Augmented Generation (RAG) Systems
7.3 Building and Evaluating a Recommendation Engine Using LLMs
7.4 Using Evaluation to Combat AI Drift
7.5 Time-Series Regression
- Lesson 8: Summary of Evaluation and Looking Ahead
Learning objectives
8.1 When and How to Evaluate
8.2 Looking Ahead: Trends in LLM Evaluation
- Summary
- Evaluating Large Language Models (LLMs): Summary
Installation Guide
After extracting, watch with your preferred video player.
Subtitles: None
Quality: 720p
Download link
File(s) password: www.downloadly.ir
File size: 1.8 GB