
Building a Multi Agent System to Track Real Madrid Matches Using AWS Strands and Ollama


I like soccer and Real Madrid, but I’m not the diehard kind who can recite every fixture by heart. So naturally, I end up missing mid-week games, and those usually turn out to be the best matches. I wanted to solve for this…

That way, at least I know which match is worth following, so even if I can’t watch it live, I can still track the score online.

And I had three options…

🔺Google the fixtures manually (Zero points for style).

🔺Write a cron job to ping an API and send a dry text (Functional, but boring).

🔺Build a multi-agent system that finds the game, decides if the "hype factor" is worth it, and notifies me with a reason to watch.

Since we’re well past 2023, I chose the obvious one. 🤷🏽‍♂️

I spent the weekend building a local agentic system using AWS Strands and Ollama.

This project uses:

  • AWS Strands for agent orchestration

  • Ollama for running language models locally

  • A Football API for match data

  • Telegram API for notifications

What Is AWS Strands

AWS Strands is a lightweight agent orchestration framework that allows you to define agents, connect them to language models, and expose structured tools that those agents can call. Instead of writing manual glue code between LLM calls and functions, Strands lets you describe:

  • The agent’s role

  • The model it uses

  • The tools it can call

  • The rules it must follow

Strands handles the tool calling loop internally. When the model decides to call a tool, Strands executes the function, captures the output, and feeds it back into the model until the task completes.
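To make that loop concrete, here is a minimal sketch of a Strands agent with one tool, written against the strands-agents Python SDK and its Ollama provider. The tool body is a stub and the fixture string is invented purely for illustration.

# Minimal sketch of a Strands agent with one tool.
# Assumes the strands-agents package (with the Ollama extra) is installed.
from strands import Agent, tool
from strands.models.ollama import OllamaModel

@tool
def get_todays_fixture(team: str) -> str:
    """Return today's fixture for a team (stubbed; the real project calls a Football API)."""
    return f"{team} vs Barcelona, La Liga, 21:00"

analyzer = Agent(
    model=OllamaModel(host="http://localhost:11434", model_id="gemma:2b"),
    tools=[get_todays_fixture],
    system_prompt="You check Real Madrid fixtures and decide if a match is worth watching.",
)

# Strands runs the tool-calling loop: the model requests the tool,
# Strands executes it, and the result is fed back until the answer is ready.
print(analyzer("Is there a Real Madrid match today worth watching?"))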

In this project, Strands acts as the controller. It ensures the supervisor calls the analyzer first and the communication step second. It enforces order without requiring complex orchestration code.

What Is Ollama and Why Use It

Ollama is a local runtime that allows you to run open source language models on your own machine. It exposes a simple HTTP server, typically at: http://localhost:11434

Instead of sending prompts to a cloud provider, your application sends them to Ollama, which runs the model locally.

This gives you:

  • Full local control

  • No external inference costs

  • Faster iteration during development

  • No dependency on external model APIs

In this project, Ollama runs small models such as Gemma 2B to handle structured reasoning.

Installing Ollama on Windows

Step 1: Download

Go to the official website: https://ollama.com

Download the Windows installer.

Step 2: Install

Run the installer and follow the default setup. Ollama installs as a background service.

Step 3: Verify

Open PowerShell and run:

ollama --version

If installed correctly, it prints the version.

Pulling and Running a Model

To download a model such as Gemma 2B:

ollama pull gemma:2b

To see all installed models:

ollama list

To run the model interactively:

ollama run gemma:2b

To call the model from your application, send an HTTP request to:

http://localhost:11434/api/generate

Example:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "What can you do?"
}'
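
The same call from Python looks like this (a small sketch using the requests library; /api/generate streams by default, so the example turns streaming off to get a single JSON response):

import requests

# Equivalent of the curl call above; "stream": False returns one JSON object
# with the full completion in the "response" field.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma:2b", "prompt": "What can you do?", "stream": False},
    timeout=120,
)
print(resp.json()["response"])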

Your Strands agent internally calls this endpoint when configured with OllamaModel.
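
In code, that configuration is roughly the following (a sketch; the host and model id match the local setup described above):

from strands.models.ollama import OllamaModel

# Point the Strands agent at the local Ollama server and the pulled model.
ollama_model = OllamaModel(
    host="http://localhost:11434",
    model_id="gemma:2b",
)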

High Level Design: Multi Agent Workflow

The system follows a simple multi step flow:

  1. A supervisor agent receives a task.

  2. An analyzer component fetches match data from a Football API.

  3. The analyzer evaluates whether the match is worth watching.

  4. If the match crosses a defined threshold, a communication component sends a message using the Telegram API.

There are no complex distributed services. Everything runs locally except the external APIs.

The intelligence sits in the analysis step. The supervisor enforces order. The communication layer focuses only on delivery.
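
A minimal sketch of how that wiring can look in Strands, with the analyzer and comms steps exposed as tools the supervisor may call. The tool names check_match and send_alert and the prompt wording are mine, not part of Strands:

from strands import Agent, tool
from strands.models.ollama import OllamaModel

@tool
def check_match() -> str:
    """Analyzer step: fetch and score today's Real Madrid fixture (stubbed)."""
    return "Real Madrid vs Atletico Madrid, La Liga, city derby, worth watching"

@tool
def send_alert(message: str) -> str:
    """Comms step: deliver the notification (stub for the Telegram call)."""
    print(f"Would send: {message}")
    return "sent"

supervisor = Agent(
    model=OllamaModel(host="http://localhost:11434", model_id="gemma:2b"),
    tools=[check_match, send_alert],
    system_prompt=(
        "You are a supervisor. Always call check_match first. "
        "Only if the match is worth watching, call send_alert with a one-line reason."
    ),
)

supervisor("Check today's Real Madrid match and notify me if it is worth watching.")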

Understanding the Sequence Diagram

The sequence diagram above shows key components of the system and how the query flows. The flow is linear and disciplined.

Step 1: Supervisor Initiates the Task

The Supervisor starts the process by asking the Analyzer to check whether there is a Real Madrid match today. The Supervisor does not fetch data itself. It delegates.

This separation ensures orchestration logic stays clean.

Step 2: Analyzer Calls the Football API

The Analyzer sends a request to the Football API to retrieve match data. This includes opponent, competition, timing, and other relevant details.

The Football API responds with structured match data.
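
The post does not name the exact Football API, so as a sketch, here is what the fetch could look like against football-data.org, with team id 86 (Real Madrid) and the FOOTBALL_API_TOKEN environment variable as assumptions:

import os
from datetime import date
import requests

def fetch_todays_matches() -> list[dict]:
    """Ask the Football API for Real Madrid matches scheduled today."""
    today = date.today().isoformat()
    resp = requests.get(
        "https://api.football-data.org/v4/teams/86/matches",
        headers={"X-Auth-Token": os.environ["FOOTBALL_API_TOKEN"]},
        params={"dateFrom": today, "dateTo": today},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("matches", [])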

Step 3: Analyzer Performs Reasoning

Once the Analyzer receives match data, it evaluates whether there is a compelling reason to watch the match. This step uses a language model running locally through Ollama.

The reasoning can include:

  • Competition type

  • Rivalry level

  • Importance of the fixture

  • Context around standings

If the match meets the criteria, the Analyzer returns a formatted result to the Supervisor.

If not, it simply returns nothing significant.
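
As a rough sketch, the reasoning step can be a prompt like the one below, fed to the local model through the analyzer agent defined earlier; the wording and the WATCH/SKIP convention are my choices, not fixed by the project:

def build_analysis_prompt(match: dict) -> str:
    """Turn structured match data into a reasoning prompt for the local model."""
    return (
        "You rate Real Madrid fixtures.\n"
        f"Opponent: {match['opponent']}\n"
        f"Competition: {match['competition']}\n"
        f"Kickoff: {match['kickoff']}\n"
        "Considering the competition type, rivalry level, importance of the fixture, "
        "and context around the standings, answer WATCH or SKIP with one sentence of reasoning."
    )

verdict = analyzer(build_analysis_prompt(
    {"opponent": "Barcelona", "competition": "La Liga", "kickoff": "21:00"}
))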

Step 4: Supervisor Triggers Notification

If the Analyzer returns a positive result, the Supervisor calls the Comms component.

Step 5: Comms Calls Telegram API

The Comms component sends the final message to the Telegram API, which posts the notification.
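
The delivery itself is a single call to the Telegram Bot API’s sendMessage method. A sketch, assuming TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID environment variables:

import os
import requests

def send_telegram_alert(text: str) -> None:
    """Post the notification to a Telegram chat via the Bot API."""
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    chat_id = os.environ["TELEGRAM_CHAT_ID"]
    resp = requests.post(
        f"https://api.telegram.org/bot{token}/sendMessage",
        json={"chat_id": chat_id, "text": text},
        timeout=30,
    )
    resp.raise_for_status()

send_telegram_alert("Real Madrid vs Barcelona at 21:00 tonight. El Clasico, worth watching.")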

Why the System Struggled on My Laptop

The architecture was simple but the limitation was hardware.

My laptop runs on an older Intel i5 processor with only 2GB of VRAM. Even small models require consistent memory allocation. When GPU memory is insufficient, the system offloads computation to system RAM, which increases latency and reduces stability.
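
If you want to confirm where a model ended up, you can list the loaded models and see whether they are running on the GPU or have spilled over to the CPU:

ollama ps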

What I observed:

  • Slow inference times

  • Occasional freezing

  • Inconsistent reasoning output

  • Reduced reliability when chaining multiple steps

Multi step workflows amplify instability because each step depends on the previous output. When inference becomes slow or inconsistent, the whole pipeline suffers.

Building locally forced me to understand how models are loaded, served, and executed. That insight is difficult to gain when everything runs behind a managed API.