1
Introduction to AI Agents
AI agents are changing the way we work. Traditional software has typically been deterministic (if X, then Y) and rigid, unable to handle ambiguity or adapt to different goals, but this is changing. With advances in large language models (LLMs), intelligent systems are being built that can independently reason through steps and take actions to complete a goal. These AI agents are taking on a growing share of work previously thought to require a human, and this is just the beginning.
By the end of this book, you will become a master at creating AI agents with the OpenAI Agents SDK. The best way to learn is to get your hands dirty and start building AI agent systems using that framework. Before we do this, however, we need to start at the most basic level by answering the question, "What is an AI agent?"
This chapter goes through everything you need to know to answer that question and, more importantly, lays the foundation we'll build on in the rest of the book. We will explain exactly what an AI agent is and how it differs from traditional systems. This matters because readers often confuse AI agents with other sophisticated applications, such as chatbots or fraud detection systems, and because we need to understand how AI agent systems work before we start building them. We will then explore AI agents' practical applications beyond productivity. Finally, we will go through the different design patterns and frameworks available when building an AI agent, and understand why the OpenAI Agents SDK is a pragmatic choice for most production systems.
Here is what we will cover in this first chapter:
- Overview of the AI agent system and its strengths and weaknesses compared to more traditional systems
- Practical applications of AI agents
- How AI agents are built, by understanding their anatomy and different design/framework patterns used to build them
By the end of this opening chapter, we will have a strong mental blueprint for how every real-world AI agent is assembled, which will serve as our compass for when we start building our own.
Technical requirements
This chapter will be an overview of AI agents from a theoretical point of view to set a good foundation before we start building them. As a result, we will not be writing any code or developing any applications in this chapter. However, to follow along and complete the exercises and projects discussed throughout the rest of the book, make sure you have the following set up in your development environment:
- Operating system: Windows 10/11, macOS, or Linux-based distribution (Ubuntu recommended).
- Python version: Python 3.8 or later. You can verify your Python version by running python --version in your terminal or Command Prompt.
- OpenAI account: Sign up at https://platform.openai.com/signup.
- OpenAI API key: Obtained by creating an account with OpenAI. You will require this to utilize OpenAI Agents SDK.
- Code editor: VS Code, PyCharm, or any IDE/editor you prefer.
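The checklist above can also be verified from Python itself. The following is an optional, minimal sketch and not part of the book's repository: the helper name check_environment is hypothetical, the minimum version is a parameter that mirrors the requirement listed above, and OPENAI_API_KEY is the environment variable that the OpenAI Python client reads by default.

```python
import os
import sys


def check_environment(min_version=(3, 8), env=None, version=None):
    """Return a list of human-readable problems; an empty list means ready.

    `env` and `version` default to the real environment and interpreter,
    and can be overridden to try out other scenarios.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if tuple(version) < tuple(min_version):
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # The OpenAI client looks for the API key in this variable by default.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks ready." if not issues else "\n".join(issues))
```

Running the script prints either a confirmation or one line per missing requirement, which makes it easy to rerun after each setup step.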
Throughout this book, practical examples and the complete code from each chapter will be made available via the accompanying GitHub repository at https://github.com/PacktPublishing/Building-Agents-with-OpenAI-Agents-SDK.
You are encouraged to clone the repository, reuse and adapt the provided code samples, and refer to it as needed while progressing through the chapters.
Overview of AI agents
Before exploring AI agents in depth, we must first establish an intuitive understanding of what an AI agent actually is, how it fundamentally differs from traditional software, and what advantages and disadvantages this brings. This is difficult because definitions vary and often evolve with technological advancements. By clearly defining the key concepts upfront, including the benefits of AI agents such as intelligent autonomy, reasoning abilities, and adaptive problem-solving, we can set the stage for understanding their practical applications and the approaches used to build them.
What is an AI agent?
An AI agent is an intelligent system that can operate independently to accomplish a specific goal by perceiving the world around it and taking action. Key distinguishing features of an AI agent include its ability to think and reason from a broad and sometimes ambiguous goal, its ability to create a plan to accomplish that goal, and its ability to autonomously complete that goal using a set of tools at its disposal that interact with the world.
This is in direct contrast to conventional software systems, which are deterministic (i.e., they follow a strict set of instructions based on a predefined plan) and cannot reason about situations that fall outside that plan. AI agents, on the other hand, can observe their environment, reason about what needs to be done, and act upon it in a continuous loop.
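The observe, reason, act cycle can be sketched in a few lines of plain Python. This is an illustrative toy under stated assumptions, not how any particular framework implements it: the thermostat scenario, the tool names heat and cool, and the functions reason and run_agent are all hypothetical, and the rule-based reason function is only a stand-in for the decision an LLM would make in a real agent.

```python
from typing import Callable, Dict, List

Environment = Dict[str, int]


# Hypothetical tools: each one changes the environment in some way.
def heat(env: Environment) -> None:
    env["temperature"] += 1


def cool(env: Environment) -> None:
    env["temperature"] -= 1


TOOLS: Dict[str, Callable[[Environment], None]] = {"heat": heat, "cool": cool}


def reason(observation: int, goal: int) -> str:
    """Stand-in for the LLM: choose the next action from the observation."""
    if observation < goal:
        return "heat"
    if observation > goal:
        return "cool"
    return "done"


def run_agent(env: Environment, goal: int, max_steps: int = 20) -> List[str]:
    """Observe, reason, and act repeatedly until the goal is met."""
    history: List[str] = []
    for _ in range(max_steps):  # step limit so the loop always terminates
        observation = env["temperature"]    # observe
        action = reason(observation, goal)  # reason
        if action == "done":
            break
        TOOLS[action](env)                  # act
        history.append(action)
    return history


room = {"temperature": 18}
print(run_agent(room, goal=21))  # ['heat', 'heat', 'heat']
print(room["temperature"])       # 21
```

The loop itself knows nothing about temperatures. It only passes observations to the reasoning step and executes whichever tool comes back, which is the part that stays the same when the rule-based stand-in is replaced by a model.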
AI agents achieve this by combining the intelligence and reasoning abilities of LLMs with actions through standardized API calls. Let's explore the concepts and strengths of AI agents through a simple analogy to cement our understanding and differentiate them from classical software automation frameworks.
Understanding AI agents with a simple analogy
Imagine you are the head chef of a five-star restaurant, and you need to train two junior chefs, Carlos and Adam. Carlos is like a conventional automation software system or model, whereas Adam is like an AI agent. The way you would train these two chefs and the way that these two chefs operate are completely different.
Carlos requires you to teach him exactly what to do to prepare every dish. If you're teaching him to make an omelet, you must teach him how to open the fridge, take an egg, turn on the stove, pour some oil, crack the egg, and so on. Each step must be meticulously defined and shown to Carlos. When asked to make an omelet, Carlos performs the task exactly as taught, to perfection.
Adam works a different way, more like a human. Instead of giving him predefined steps, you show him how to perform actions around the kitchen - this is how you grab ingredients from the fridge, this is how you operate a stove, these are the basics of gastronomy, and so on. When asked to make an omelet, Adam relies on his reasoning ability and the set of tools/knowledge he's been given to accomplish that task, rather than following predefined steps.
Both Carlos and Adam are amazing chefs but have different strengths and weaknesses. In particular, Adam can embrace complexity and ambiguity. Because he can reason and is taught how to perform general actions, he can cook more than just an omelet - he can theoretically cook all kinds of foods as they all use the same actions.
This is a useful analogy for the difference between AI agents and classical automation software/models. In short, the intelligent autonomy afforded to an AI agent enables it to perform a diverse set of ambiguous tasks that classical automation cannot replicate.
Note
It's important to mention that intelligent autonomy comes with the need for safeguards. An autonomous agent might make a poor decision if its "brain" (the AI model) is misinformed. We will later discuss how to guide and constrain agents (through prompt instructions and guardrails) to ensure their autonomy is exercised responsibly. The key takeaway here is that AI agents bring a level of smart, goal-directed independence that sets them apart from traditional automated systems.
Strengths and weaknesses of AI agents versus traditional systems
The preceding analogy describes the key differences between AI agents and other systems, and the advantages agents have beyond their ability to embrace complexity. Adam has goal-directed autonomy, which enables him to cook more than just an omelet; he can make scrambled eggs, poached eggs, and even sunny-side-up eggs. In fact, Adam can come up with novel dishes that he has not been explicitly trained on, as long as his set of actions is sufficient to perform the task. Adam can also complete the steps of a task in a different order if appropriate.
Adam exhibits reasoning, which means he can perform adaptive problem-solving. This enables him to do the following, which would be impossible for Carlos:
- Vary his cooking style to meet customer requests - Adam can cook an omelet more or less runny because he knows that leaving food on the stovetop for longer makes it drier.
- If there is an ingredient missing, Adam can compromise and see whether there are any substitutions that he can make. He can handle real-world ambiguity and thrive on it.
Carlos would find these tasks impossible because he has been taught to cook only one way and cannot reason about alternatives. If anything external prevents him from opening the fridge or turning on the stove, Carlos cannot proceed and stalls, whereas Adam could adapt.
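The contrast between the two chefs can be sketched in code. This is a toy illustration under stated assumptions: the names PANTRY, CARLOS_RECIPE, SUBSTITUTES, carlos_make_omelet, and adam_make_omelet are all hypothetical, and Adam's substitution table is only a stand-in for the reasoning an LLM would supply. A real agent would not need every substitution listed in advance.

```python
PANTRY = {"egg", "oil", "salt"}  # butter has run out today

# Carlos: a fixed recipe in which every ingredient is decided in advance.
CARLOS_RECIPE = ["egg", "butter", "salt"]


def carlos_make_omelet(pantry):
    """Follow the recipe exactly; stall as soon as anything is missing."""
    for ingredient in CARLOS_RECIPE:
        if ingredient not in pantry:
            raise RuntimeError(f"cannot proceed: {ingredient} is missing")
    return list(CARLOS_RECIPE)


# Adam: general kitchen knowledge (stand-in for LLM reasoning) lets him adapt.
SUBSTITUTES = {"butter": ["oil"], "milk": ["water"]}


def adam_make_omelet(pantry, wanted=("egg", "butter", "salt")):
    """Use each wanted ingredient, or the first available substitute."""
    used = []
    for ingredient in wanted:
        if ingredient in pantry:
            used.append(ingredient)
            continue
        options = SUBSTITUTES.get(ingredient, [])
        choice = next((s for s in options if s in pantry), None)
        if choice is None:
            raise RuntimeError(f"no substitute available for {ingredient}")
        used.append(choice)
    return used


try:
    carlos_make_omelet(PANTRY)
except RuntimeError as error:
    print("Carlos:", error)  # Carlos: cannot proceed: butter is missing

print("Adam:", adam_make_omelet(PANTRY))  # Adam: ['egg', 'oil', 'salt']
```

Both functions succeed when the pantry is fully stocked. The difference only shows when the environment deviates from the plan: Carlos raises an error, while Adam completes the goal with what is available.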
There are, however, weaknesses with the AI agent model that, for certain use cases, may be significant enough that an agent is not the best option. Adam's brain...