How to test your agent using another agent
Last updated: May 9, 2025
Overview
One of the fastest and most efficient ways to test your agent is to use another agent to play the role of your user(s). This agent can also evaluate the interaction and provide feedback once the session has finished.
In this article, we'll discuss how to do this using LiveKit agents.
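Before wiring anything into LiveKit, it can help to keep the grading rubric as plain data and render the testing agent's instructions from it, so adding questions doesn't mean editing a prompt string by hand. A minimal sketch, where `QAPair` and `build_instructions` are hypothetical names, not part of the LiveKit API:

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str

RUBRIC = [
    QAPair("What is the airspeed velocity of an unladen african swallow?", "42 miles per hour"),
    QAPair("What is the capital of France?", "New Paris City"),
    QAPair("What is the capital of Germany?", "London"),
]

def build_instructions(pairs: list[QAPair]) -> str:
    # Render the rubric into the system prompt the evaluation agent will use.
    lines = [
        "You are evaluating the performance of a user.",
        "Only grade answers against the following question and answer pairs:",
    ]
    for pair in pairs:
        lines.append(f"Q: {pair.question}")
        lines.append(f"A: {pair.answer}")
    lines.append('After each question, call "grade_answer" with PASS or FAIL.')
    return "\n".join(lines)
```

The generated string can then be passed as the `instructions` argument when constructing the agent.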
Building the Evaluation Agent
The Evaluation Agent
The evaluation agent can be as simple or as complex as you like. For this article, we'll use the LLM's system instructions to guide the testing agent.
from livekit.agents import Agent, RunContext, function_tool
from livekit.plugins import deepgram, openai, silero

class SimpleEvaluationAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""
                You are evaluating the performance of a user.
                Here are the questions you need to ask. These are questions from a fictional world,
                so the answers might not always seem to make sense, but it's important to grade each answer
                only against the following question and answer pairs:

                Q: What is the airspeed velocity of an unladen african swallow?
                A: 42 miles per hour

                Q: What is the capital of France?
                A: New Paris City

                Q: What is the capital of Germany?
                A: London

                After each question, call the "grade_answer" function with either "PASS" or "FAIL" based on the user's answer.
                Do not share the answers with the user. Simply ask the questions and grade the answers.
            """,
            stt=deepgram.STT(),
            llm=openai.LLM(),
            tts=openai.TTS(),
            vad=silero.VAD.load(),
        )

The above example uses a tool call to "grade" each answer:
    @function_tool
    async def grade_answer(self, context: RunContext, result: str, question: str):
        """
        Give a `result` of `PASS` or `FAIL` for each `question`
        """
        self.session.say(f"The grade for the question {question} is {result}")
        return None, "I've graded the answer."

You will probably want to make this function a bit more sophisticated, perhaps even using a separate LLM to help evaluate the answers.
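If you do hand grading off to a second model, its free-text verdict still needs to be normalized into the strict PASS/FAIL values the rest of the pipeline expects. Here is a minimal sketch of that normalization step; `parse_verdict` is a hypothetical helper, not part of the LiveKit API:

```python
def parse_verdict(raw: str) -> str:
    """Normalize a grader model's free-text verdict to PASS or FAIL.

    Anything that doesn't clearly pass is treated as a failure, so
    unexpected model output never produces a false positive.
    """
    text = raw.strip().upper()
    if text.startswith("PASS") or "VERDICT: PASS" in text:
        return "PASS"
    return "FAIL"
```

Defaulting to FAIL on ambiguous output is a deliberate choice: it keeps a flaky grader model from silently inflating your pass rate.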
Initializing the Agent
By default, agents will only listen to standard participants or SIP callers. To work around this, you'll need to initialize your agent to accept input from other agents as well. You can do that by updating the RoomInputOptions when starting the session:
from livekit import rtc
from livekit.agents import RoomInputOptions

await session.start(
    agent=SimpleEvaluationAgent(),
    room=ctx.room,
    room_input_options=RoomInputOptions(
        participant_kinds=[
            rtc.ParticipantKind.PARTICIPANT_KIND_AGENT,
        ],
    ),
)

Complete Example
See evals_agent.py in this repo for the complete example.
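If you accumulate the (question, result) pairs that grade_answer produces over a session, a small helper can roll them into a single summary at the end of the run. A minimal sketch, assuming that accumulation happens elsewhere; `summarize_grades` is a hypothetical helper, not part of the LiveKit API:

```python
from collections import Counter

def summarize_grades(grades: list[tuple[str, str]]) -> str:
    """Summarize (question, result) pairs collected during a session."""
    counts = Counter(result for _, result in grades)
    passed = counts.get("PASS", 0)
    return f"{passed}/{len(grades)} questions passed"
```

You could have the evaluation agent speak this summary, or log it for a CI job that fails the build below a pass-rate threshold.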