<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Cerulean Cloud Blog]]></title><description><![CDATA[Cerulean Cloud Blog aims to share cloud engineering concepts that work for people of all levels, from beginners to advanced engineers.]]></description><link>https://blog.ceruleancloud.ca</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1742600784271/24d7d8e0-6575-4404-8455-10f78cae1ebc.png</url><title>Cerulean Cloud Blog</title><link>https://blog.ceruleancloud.ca</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 15:13:05 GMT</lastBuildDate><atom:link href="https://blog.ceruleancloud.ca/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Optimizing AWS Costs: NAT Gateways, S3 Storage Classes, and EBS Lifecycle Management]]></title><description><![CDATA[Most AWS environments carry somewhere between 20% and 35% in avoidable spend. This is not a controversial claim. Industry reports from Flexera and Gartner have consistently placed cloud waste in that ]]></description><link>https://blog.ceruleancloud.ca/optimizing-aws-costs-nat-gateways-s3-storage-classes-and-ebs-lifecycle-management</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/optimizing-aws-costs-nat-gateways-s3-storage-classes-and-ebs-lifecycle-management</guid><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Tue, 31 Mar 2026 14:30:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6722707bc8ef419b2aa32d93/375f6a56-44b3-4ab6-b524-1ee77920522e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most AWS environments carry somewhere between 20% and 35% in avoidable spend. This is not a controversial claim. 
Industry reports from Flexera and Gartner have consistently placed cloud waste in that range, and the pattern holds regardless of whether the organization is a startup running a handful of services or an enterprise managing hundreds of accounts. The waste rarely comes from one large, obvious mistake. It accumulates through a series of small architectural decisions that made sense at the time but were never revisited as workloads evolved. What makes AWS cost waste particularly difficult to catch is that the billing model is designed around granularity. You are charged per hour, per gigabyte, per request, and per data transfer path. Each individual charge looks reasonable in isolation. It is only when you trace the full path of a request through your infrastructure, accounting for every service it touches along the way, that the compounding effect becomes visible. This post examines three of the most common cost leaks we see in AWS environments. Rather than listing surface-level tips, we will walk through the mechanics of each one: why it happens, how to identify it in your own account, and how to address it with specific AWS tools and configuration changes.</p>
<h2>NAT Gateway data processing charges</h2>
<p>NAT Gateways are one of the most commonly misunderstood cost centers in AWS networking. When you provision a NAT Gateway, AWS charges you in two dimensions simultaneously: a flat rate of $0.045 per hour (in us-east-1), and a data processing charge of $0.045 for every gigabyte that flows through the gateway in either direction. These charges apply on top of any standard data transfer fees that AWS levies for cross-AZ or internet-bound traffic. The hourly charge alone works out to roughly $32.40 per month for a single NAT Gateway running continuously. Since AWS recommends deploying one NAT Gateway per Availability Zone for high availability, a standard three-AZ production architecture carries a baseline cost of approximately $97 per month before a single byte of data is processed. This is the cost of the NAT Gateways simply existing. The data processing charge is where the bill compounds.</p>
<p>Consider a common scenario: your application runs in private subnets and makes regular API calls to AWS services like S3, DynamoDB, SQS, or CloudWatch. By default, all of this traffic routes through the NAT Gateway, and every gigabyte is charged at $0.045. A workload pulling 500 GB of data from S3 per month through a NAT Gateway incurs $22.50 in processing charges alone, for traffic that could flow entirely for free through a properly configured VPC endpoint. The compounding gets worse when you factor in container workloads. ECS and EKS tasks running in private subnets pull container images from Amazon ECR through the NAT Gateway. A 500 MB container image pulled 100 times per month represents 50 GB of NAT Gateway traffic, adding $2.25 per month per image. Across a fleet of microservices with frequent deployments, this accumulates into a meaningful line item.</p>
<p><strong>How to identify it:</strong> Open AWS Cost Explorer and filter by service for "VPC" or "EC2-Other." Look for the line item labeled "NatGateway-Bytes" under usage type. If you are processing more than a few gigabytes per month, you are likely paying for traffic that could be routed more efficiently. You can also enable VPC Flow Logs and analyze them to understand which services and endpoints are generating the most NAT Gateway traffic.</p>
<p><strong>How to fix it:</strong> The most impactful change is deploying VPC Gateway Endpoints for S3 and DynamoDB. Gateway Endpoints are completely free. There is no hourly charge and no data processing charge. Traffic routes over AWS's private network backbone instead of traversing the NAT Gateway. The setup takes minutes: you create the endpoint in your VPC, associate it with the relevant route tables, and the traffic is redirected automatically. No application code changes are required.</p>
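<p>As a sketch, the whole setup is a single CLI call. The VPC and route table IDs below are placeholders, and the service name varies by region:</p>

```shell
# Create a free S3 Gateway Endpoint and attach it to a route table.
# vpc-... and rtb-... are placeholder IDs; adjust the region in the
# service name to match your VPC.
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0abc1234def567890 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-0abc1234def567890
```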
<p>For other AWS services like SQS, SNS, CloudWatch, ECR, and Secrets Manager, you can deploy VPC Interface Endpoints (powered by AWS PrivateLink). These do carry a cost of $0.01 per hour plus $0.01 per GB of data processed, but this is still significantly cheaper than the $0.045 per GB you would pay through the NAT Gateway. The cost difference becomes substantial at any meaningful traffic volume. For ECR specifically, deploying an Interface Endpoint also eliminates the NAT Gateway charges incurred during container image pulls, which can represent a surprisingly large portion of total NAT traffic in containerized environments. It is worth noting that deploying these endpoints is not an all-or-nothing decision. You can start by deploying the free S3 and DynamoDB Gateway Endpoints, monitor the impact on your NAT Gateway data processing charges for a billing cycle, and then evaluate whether Interface Endpoints for other services are justified based on your traffic patterns.</p>
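<p>The break-even is easy to estimate. This sketch compares the two paths using the rates above, for one endpoint in a single AZ and ignoring standard data transfer fees; note that at very low volumes the endpoint's hourly fee can outweigh the savings, which is why it pays to check your traffic patterns first:</p>

```python
# Monthly cost of pushing traffic through a NAT Gateway vs. a VPC
# Interface Endpoint (single AZ), using the rates quoted in the post.
HOURS_PER_MONTH = 720

def nat_processing(gb: float) -> float:
    return round(gb * 0.045, 2)          # $0.045 per GB

def interface_endpoint(gb: float) -> float:
    hourly = 0.01 * HOURS_PER_MONTH      # $0.01 per endpoint-hour
    return round(hourly + gb * 0.01, 2)  # plus $0.01 per GB

for gb in (100, 500, 2000):
    print(gb, nat_processing(gb), interface_endpoint(gb))
```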
<h2>S3 storage class mismatches</h2>
<p>Amazon S3 pricing is structured around storage classes, each designed for a different access pattern. S3 Standard, the default class, costs $0.023 per GB per month in us-east-1. This is appropriate for data that is accessed frequently and requires low-latency retrieval. The problem is that most teams store everything in S3 Standard regardless of how often the data is actually accessed, and they rarely revisit this decision as data accumulates over time. The cost difference between storage classes is significant. S3 Standard-Infrequent Access (Standard-IA) costs $0.0125 per GB per month, roughly half the price of Standard. S3 Glacier Instant Retrieval, which still provides millisecond-level access latency, costs $0.004 per GB per month, which is about 83% less than Standard. S3 Glacier Deep Archive, designed for compliance and regulatory data that is accessed very rarely, costs just $0.00099 per GB per month.</p>
<p><strong>To put these numbers in practical terms:</strong> an organization storing 10 TB of data entirely in S3 Standard pays approximately $235 per month in storage costs. If 70% of that data is archival (log files, old backups, historical exports, compliance records) and could be moved to Glacier Instant Retrieval, the storage cost for that 7 TB drops from $164 to $28 per month. That is a saving of $136 per month, or over $1,600 per year, for a single storage optimization on a relatively modest data footprint. As data volumes grow into the tens or hundreds of terabytes, these savings scale proportionally. The reason this waste persists is partly behavioral. Teams create S3 buckets, upload data, and move on. There is no built-in mechanism that alerts you when the majority of objects in a bucket have not been accessed in months. The data just sits there, billed at the Standard rate, growing quietly with every new upload.</p>
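<p>The 10 TB scenario is easy to reproduce yourself. This is a sketch using the us-east-1 rates quoted in this post, with 1 TB = 1,024 GB:</p>

```python
# Monthly S3 storage cost per class for the 10 TB example in the post.
RATES = {
    "STANDARD": 0.023,       # $ per GB-month
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}

def monthly_cost(gb: float, storage_class: str) -> float:
    return round(gb * RATES[storage_class], 2)

total_gb = 10 * 1024
archival_gb = total_gb * 0.7  # 70% of the data is archival
all_standard = monthly_cost(total_gb, "STANDARD")
optimized = (monthly_cost(total_gb - archival_gb, "STANDARD")
             + monthly_cost(archival_gb, "GLACIER_IR"))
print(all_standard, optimized)
```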
<p><strong>How to identify it:</strong> AWS provides two tools that make this analysis straightforward. The first is S3 Storage Lens, which gives you an account-wide or organization-wide view of your S3 usage broken down by storage class, bucket, region, and access patterns. It will show you exactly how much data sits in each storage class and highlight buckets where a large percentage of objects have not been accessed recently. The second tool is S3 Storage Class Analysis, which you can enable on individual buckets. It monitors object-level access patterns over a 30-day period and generates recommendations for which objects would benefit from transitioning to a lower-cost storage class.</p>
<p><strong>How to fix it:</strong> The simplest and most broadly applicable fix is enabling S3 Intelligent-Tiering on buckets where access patterns are unpredictable or mixed. Intelligent-Tiering automatically monitors each object's access frequency and moves it between a frequent access tier and an infrequent access tier after 30 days of no access. If the object is not accessed for 90 days, it can optionally be moved to an archive access tier, and after 180 days, to a deep archive tier.</p>
<p>The key advantage of Intelligent-Tiering is that there are no retrieval fees when objects move back to the frequent access tier, so you do not pay a penalty if an archived object is suddenly needed. The trade-off is a small monthly monitoring fee of $0.0025 per 1,000 objects, which is negligible for most workloads but can add up if you have millions of very small objects (under 128 KB each).</p>
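<p>That monitoring fee is worth sanity-checking against your own object counts before enabling Intelligent-Tiering bucket-wide (a sketch using the rate above):</p>

```python
# Intelligent-Tiering monitoring fee: $0.0025 per 1,000 monitored objects.
def monitoring_fee(num_objects: int) -> float:
    return round(num_objects / 1000 * 0.0025, 2)

print(monitoring_fee(100_000))     # a modest bucket
print(monitoring_fee(50_000_000))  # millions of small objects add up
```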
<p>For data that you know is archival from the start, such as log files or compliance backups, the more direct approach is configuring S3 Lifecycle Rules on the relevant buckets. A lifecycle rule can automatically transition objects to Standard-IA after 30 days, to Glacier Instant Retrieval after 90 days, and to Glacier Deep Archive after 365 days. You can also configure lifecycle rules to expire (delete) objects after a defined retention period, which prevents old data from accumulating indefinitely.</p>
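<p>A lifecycle configuration implementing that exact schedule might look like the following sketch (the <code>logs/</code> prefix and the 7-year expiration are illustrative; apply it with <code>aws s3api put-bucket-lifecycle-configuration</code>):</p>

```json
{
  "Rules": [
    {
      "ID": "archive-then-expire",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER_IR" },
        { "Days": 365, "StorageClass": "DEEP_ARCHIVE" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```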
<p>One important caveat: both Standard-IA and One Zone-IA enforce a minimum storage duration of 30 days. If you delete or overwrite an object before the 30-day mark, you are still charged for the full 30 days at the IA storage rate. This means lifecycle transitions should be configured with retention requirements in mind, not applied indiscriminately to all buckets.</p>
<h2>Unattached EBS volumes and orphaned snapshots</h2>
<p>This is perhaps the most straightforward cost leak on this list, and also one of the most persistent. When you terminate an EC2 instance, the attached EBS volumes are not always deleted along with it. Whether the volume is deleted depends on the "Delete on Termination" attribute, which is set at the time the volume is attached. For root volumes, this attribute defaults to true, but for additional data volumes, it often defaults to false. This means that terminating an EC2 instance can leave behind one or more "orphaned" EBS volumes that are no longer attached to any running instance but continue to incur storage charges.</p>
<p>The cost of a single orphaned volume depends on its type and size. A 100 GB gp3 volume costs approximately $8 per month. That does not sound like much in isolation, but in environments where instances are created and terminated frequently, such as development and testing environments, CI/CD build fleets, or auto-scaling groups, orphaned volumes accumulate over time. It is not uncommon to find dozens of unattached volumes in an account that has been active for a year or more.</p>
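<p>The arithmetic scales linearly, which is why "dozens of volumes" matters. This sketch assumes gp3's us-east-1 baseline rate of $0.08 per GB-month, excluding any extra provisioned IOPS or throughput:</p>

```python
# Monthly cost of orphaned gp3 storage at the us-east-1 baseline rate.
GP3_RATE = 0.08  # $ per GB-month (assumption; check your region)

def orphaned_cost(gb_per_volume: int, volume_count: int = 1) -> float:
    return round(gb_per_volume * GP3_RATE * volume_count, 2)

print(orphaned_cost(100))      # the single 100 GB volume from the post
print(orphaned_cost(100, 24))  # two dozen forgotten volumes
```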
<p>EBS snapshots follow a similar pattern but can be even harder to catch. Snapshots are incremental backups of EBS volumes, and teams often configure automated snapshot schedules using Amazon Data Lifecycle Manager (DLM) or custom Lambda functions as a backup strategy. This is good practice for data protection. The problem arises when snapshot retention policies are not configured, or when they are configured with overly generous retention periods. Without a retention policy, every snapshot that is created is retained indefinitely, and each one incurs storage charges based on the amount of changed data it contains. Over months and years, the cumulative cost of forgotten snapshots can exceed the cost of the volumes they were meant to protect.</p>
<p><strong>How to identify it:</strong> In the EC2 console, navigate to the Volumes section and filter by the "Available" state. Every volume listed as "Available" is not attached to any instance and is almost certainly unnecessary. For snapshots, sort by creation date and cross-reference against your retention requirements. Any snapshot older than your retention policy dictates is a candidate for deletion. AWS Trusted Advisor also includes a check for underutilized EBS volumes and can surface these findings automatically. You can also run a quick audit from the AWS CLI.</p>
<p>The following command lists all unattached EBS volumes in your current region:</p>
<pre><code class="language-plaintext">aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query "Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType,Created:CreateTime}" \
  --output table
</code></pre>
<p>For snapshots, a similar approach works. You can list all snapshots owned by your account and review their age and associated volume:</p>
<pre><code class="language-plaintext">aws ec2 describe-snapshots \
  --owner-ids self \
  --query "Snapshots[].{ID:SnapshotId,VolumeId:VolumeId,Size:VolumeSize,Started:StartTime}" \
  --output table
</code></pre>
<p><strong>How to fix it:</strong> For orphaned volumes, the fix is simply deletion after verifying that no critical data resides on them. If you are uncertain, you can create a final snapshot of the volume before deleting it, and then set a lifecycle policy on that snapshot to expire it after a defined period. For snapshots, the recommended approach is configuring Amazon Data Lifecycle Manager with explicit retention rules.</p>
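<p>The snapshot-then-delete flow for a single orphaned volume might look like this sketch (the volume ID is a placeholder; confirm the snapshot completes before deleting):</p>

```shell
# Take a final snapshot of the orphaned volume (placeholder ID),
# wait for it to complete, then delete the volume.
aws ec2 create-snapshot \
  --volume-id vol-0abc1234def567890 \
  --description "Final snapshot before deleting orphaned volume"

aws ec2 wait snapshot-completed \
  --filters Name=volume-id,Values=vol-0abc1234def567890

aws ec2 delete-volume --volume-id vol-0abc1234def567890
```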
<p>DLM allows you to define automated snapshot policies that specify how many snapshots to retain (for example, keep the last 7 daily snapshots and the last 4 weekly snapshots) and automatically deletes older snapshots when the retention limit is reached. This eliminates the manual overhead of snapshot cleanup and ensures that backup storage costs remain predictable over time. Going forward, it is also worth reviewing the "Delete on Termination" attribute for EBS volumes in your launch templates and AMIs.</p>
<p>Setting this attribute to true for non-persistent data volumes ensures that future instance terminations do not leave behind orphaned storage. For volumes that contain data you need to retain, a better pattern is snapshotting the volume before termination (using a lifecycle hook in an Auto Scaling Group, for example) and then letting DLM manage the snapshot retention.</p>
<h2>Building cost awareness into your operational rhythm</h2>
<p>The three cost leaks described above share a common characteristic: none of them are the result of a single bad decision. They are the product of reasonable initial configurations that were never revisited as the environment grew and workloads changed. NAT Gateways are deployed because private subnets need internet access.</p>
<p>S3 data is stored in Standard because it is the default. EBS volumes are left behind because the termination behavior was not explicitly configured. The underlying lesson is that AWS cost optimization is not a one-time audit. It is an ongoing discipline that needs to be embedded into your operational practices.</p>
<p>A monthly review of Cost Explorer anomalies, a quarterly check of S3 storage distribution across classes, and a periodic sweep for orphaned resources will catch most waste before it compounds.</p>
<p>AWS Budgets, which is free to use, can be configured to send alerts when spending in a particular service or account exceeds a threshold, giving you early visibility into unexpected cost increases. The tools exist. The pricing data is transparent. What most teams lack is not information but the habit of looking.</p>
]]></content:encoded></item><item><title><![CDATA[When to Use ECS vs EKS vs Lambda: A Decision Framework]]></title><description><![CDATA[You're building on AWS.
You know your workload needs compute. And now you're staring at three options. ECS, EKS, and Lambda. Each with its own ecosystem of blog posts telling you it's the right choice]]></description><link>https://blog.ceruleancloud.ca/when-to-use-ecs-vs-eks-vs-lambda-a-decision-framework</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/when-to-use-ecs-vs-eks-vs-lambda-a-decision-framework</guid><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sun, 29 Mar 2026 15:27:42 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6722707bc8ef419b2aa32d93/f67b0249-3bd3-4a67-85cb-43f3dd68182a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You're building on AWS.</p>
<p>You know your workload needs compute. And now you're staring at three options. ECS, EKS, and Lambda. Each with its own ecosystem of blog posts telling you it's the right choice.</p>
<p>Here's the truth: there's no universally correct answer. But there <em>is</em> a structured way to decide. At Cerulean Cloud, we walk folks through this decision regularly, and we've found that most teams overcomplicate it. The right choice usually comes down to four factors.</p>
<h2>The Four Factors That Actually Matter</h2>
<p>Before comparing services feature-by-feature, zoom out. Every compute decision on AWS comes down to these questions:</p>
<p><strong>1. Operational ownership</strong>: How much infrastructure management does your team want to own?</p>
<p><strong>2. Workload profile</strong>: Is your workload long-running, event-driven, or somewhere in between?</p>
<p><strong>3. Team expertise</strong>: Does your team already know Kubernetes? Docker? Neither?</p>
<p><strong>4. Growth trajectory</strong>: Where will this workload be in 18 months?</p>
<p>Let's run each service through these lenses.</p>
<h2>Lambda: When You Want AWS to Handle Everything</h2>
<p>Lambda is the right starting point for workloads that are <strong>event-driven, short-lived, and unpredictable in volume</strong>. Think API backends that spike during business hours and flatline at night. Think file processing triggers, webhook handlers, or scheduled ETL jobs that run for a few minutes and disappear.</p>
<p><strong>Choose Lambda when:</strong></p>
<ul>
<li><p>Your functions complete in under 15 minutes (hard limit).</p>
</li>
<li><p>Traffic is bursty or unpredictable and you don't want to pay for idle capacity.</p>
</li>
<li><p>Your team is small and you'd rather spend engineering cycles on product, not infrastructure.</p>
</li>
<li><p>You're building event-driven architectures with SQS, SNS, EventBridge, or S3 triggers.</p>
</li>
</ul>
<p><strong>Think twice about Lambda when:</strong></p>
<ul>
<li><p>You need persistent connections (WebSockets, long-polling) at scale.</p>
</li>
<li><p>Cold starts are a dealbreaker for your latency requirements, although provisioned concurrency can mitigate this.</p>
</li>
<li><p>Your application has complex dependency trees or large container images.</p>
</li>
<li><p>You're running compute-heavy workloads that consistently need 10+ minutes per invocation.</p>
</li>
</ul>
<p>Lambda's superpower is that there's genuinely nothing to manage. No clusters, no capacity planning, no patching. But that simplicity comes with constraints. The moment your workload starts pushing against those constraints, you're fighting the platform instead of building on it.</p>
<h2>ECS: When You Want Containers Without the Kubernetes Tax</h2>
<p>ECS is AWS's own container orchestration service, and it's quietly become the best default choice for teams that need containers but don't need Kubernetes.</p>
<p>Pair ECS with <strong>Fargate</strong> and you get serverless containers. No EC2 instances to manage, no cluster capacity to worry about. Pair it with <strong>EC2 launch type</strong> and you get full control over the underlying hosts when you need it (GPU workloads, specific instance types, cost optimization through Reserved Instances).</p>
<p><strong>Choose ECS when:</strong></p>
<ul>
<li><p>Your workload is long-running and needs to be containerized (APIs, background workers, microservices).</p>
</li>
<li><p>Your team is comfortable with Docker but doesn't have Kubernetes expertise.</p>
</li>
<li><p>You want tight, native integration with the AWS ecosystem (ALB, CloudWatch, IAM, Service Connect) without glue code.</p>
</li>
<li><p>You value simplicity and fast time-to-production over ecosystem portability.</p>
</li>
</ul>
<p><strong>Think twice about ECS when:</strong></p>
<ul>
<li><p>Multi-cloud or hybrid-cloud portability is a hard requirement today.</p>
</li>
<li><p>You need advanced scheduling, custom controllers, or operators that only exist in the Kubernetes ecosystem.</p>
</li>
<li><p>Your team already runs Kubernetes elsewhere and wants a consistent operational model across environments.</p>
</li>
</ul>
<p>Here's what we tell folks candidly: ECS is underrated. It does 80% of what Kubernetes does with 30% of the operational complexity. For most mid-market teams running purely on AWS, ECS with Fargate is the fastest path to production-grade container workloads.</p>
<h2>EKS: When You Genuinely Need Kubernetes</h2>
<p>EKS is managed Kubernetes on AWS. It gives you the full Kubernetes API, the massive open-source ecosystem, and the ability to run the same workload definition on any cloud or on-premises cluster.</p>
<p>But Kubernetes is not free. Not in cost, and certainly not in operational complexity. EKS is the right choice when the power of the Kubernetes ecosystem solves a problem that ECS cannot.</p>
<p><strong>Choose EKS when:</strong></p>
<ul>
<li><p>You have a team that already knows Kubernetes and operates it confidently.</p>
</li>
<li><p>Portability is a real, current requirement and you're running workloads across AWS, GCP, Azure, or on-prem.</p>
</li>
<li><p>You need the Kubernetes ecosystem: Istio for service mesh, Argo for GitOps, custom operators for stateful workloads, Karpenter for intelligent autoscaling.</p>
</li>
<li><p>You're running a platform team that serves multiple internal development teams and needs namespace-level isolation, RBAC, and self-service deployment.</p>
</li>
</ul>
<p><strong>Think twice about EKS when:</strong></p>
<ul>
<li><p>"We might go multi-cloud someday" is the only reason Kubernetes is on the table. Hypothetical portability is expensive portability.</p>
</li>
<li><p>Your team doesn't have Kubernetes experience and you'd be learning it in production.</p>
</li>
<li><p>You're a team of 3–10 engineers and you'd be spending 20–30% of your time on cluster operations instead of product development.</p>
</li>
<li><p>Your workload is simple enough that ECS or Lambda would handle it with less overhead.</p>
</li>
</ul>
<p>We've seen this pattern repeatedly: a startup adopts EKS because it feels like the "serious" choice, then spends six months building platform tooling before shipping a single feature. Kubernetes is powerful. But power you don't need is just overhead.</p>
<h2>The Decision Flow</h2>
<p>Here's the simplified framework we use at Cerulean Cloud during architecture engagements:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6722707bc8ef419b2aa32d93/d438da45-c6fa-4561-8613-db5a7c97d89e.png" alt="" style="display:block;margin:0 auto" />

<p><strong>Start with Lambda.</strong> If your workload is event-driven, runs under 15 minutes, and doesn't need persistent compute, Lambda is your answer. Stop here.</p>
<p><strong>If Lambda doesn't fit, default to ECS on Fargate.</strong> Long-running containers, microservices, background workers. ECS handles all of it with minimal operational burden. Use EC2 launch type only if you need GPU, specific instance families, or want to optimize cost with Reserved Instances.</p>
<p><strong>Graduate to EKS only when you have a Kubernetes-specific reason.</strong> Portability requirements, ecosystem tooling (service mesh, GitOps, custom operators), or a platform engineering team serving multiple tenants. If you can't name the specific Kubernetes capability you need, you probably don't need EKS.</p>
<h2>The Hybrid Reality</h2>
<p>In practice, most mature AWS environments use more than one of these services. Lambda handles event-driven aspects like processing S3 uploads, running scheduled tasks, powering lightweight APIs. ECS runs the core application services. EKS exists when there's a genuine platform engineering need.</p>
<p>The mistake is treating this as a one-size-fits-all decision. It's not. It's a per-workload decision, and the best architectures are intentional about which workloads land where.</p>
<h2>What This Looks Like in Practice</h2>
<p>When we run architecture engagements at Cerulean Cloud, the compute decision is never made in isolation. It connects directly to networking (VPC design, service discovery), observability (CloudWatch vs. third-party), CI/CD (how you deploy shapes what you deploy to), and cost (Fargate vs. EC2, Lambda pricing at scale).</p>
]]></content:encoded></item><item><title><![CDATA[Why I built stressllm?]]></title><description><![CDATA[A few weeks ago I wrote about building a multi agent system on my old personal laptop. You can read more about it here. This project failed miserably because of my hardware limitations. My hardware wa]]></description><link>https://blog.ceruleancloud.ca/why-i-built-stressllm</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/why-i-built-stressllm</guid><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sat, 07 Mar 2026 02:03:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/6722707bc8ef419b2aa32d93/570c56ee-bddb-454f-bdb3-1f8850b117bd.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few weeks ago I wrote about building a multi agent system on my old personal laptop. You can read more about it <a href="https://blog.ceruleancloud.ca/building-a-multi-agent-system-to-track-real-madrid-matches-using-aws-strands-and-ollama">here</a>. This project failed miserably because of my hardware limitations. My hardware was theoretically sufficient, but the system was unusable in practice. I realized I was flying blind against the <strong>KV Cache</strong>; the "working memory" that scales with context. I needed a way to know exactly where my hardware would "choke" before I spent hours debugging a lagging agent.</p>
<p>So, I spent the weekend <strong>vibe-coding</strong> <a href="https://pypi.org/project/stressllm/"><strong>stressllm</strong></a>: a Python CLI tool that performs context saturation tests to find your hardware's breaking point.</p>
<h2>The Core Bottleneck: VRAM vs. Context</h2>
<p>Most people only look at "Static VRAM", i.e. the memory needed to load the model weights. But the real killer is <strong>Dynamic VRAM</strong>.</p>
<p>As your conversation grows, the model needs to store the "Keys" and "Values" (KV Cache) of every token to avoid re-processing the entire prompt. This grows linearly. On my 6-year-old laptop with 2GB of VRAM, the moment that cache spills over into system RAM, performance falls off a cliff.</p>
<h2>Under the Hood: The Architecture</h2>
<p>I built StressLLM with a clean separation of concerns: <strong>The Watcher</strong> and <strong>The Worker</strong>.</p>
<p>I split the tool into two main components: <code>probe.py</code> (the observer) and <code>engine.py</code> (the executor).</p>
<h3><code>probe.py</code>: Hardware Telemetry &amp; Constraints</h3>
<p>This module is responsible for heartbeat-style monitoring of your system.</p>
<ul>
<li><p><strong>NVIDIA Only:</strong> The tool currently relies on <code>pynvml</code> (NVIDIA Management Library). It specifically looks for <code>nvml.dll</code> on Windows or the standard drivers on Linux.</p>
</li>
<li><p><strong>Graceful Fallback:</strong> When the GPU isn't available, the tool automatically pivots to reporting system-wide <strong>RAM</strong> and <strong>CPU</strong> usage via <code>psutil</code>.</p>
</li>
<li><p><strong>Error Isolation:</strong> Each sensor (VRAM, Temperature, CPU) is independent. If a single telemetry call fails, the rest of the stats are still yielded.</p>
</li>
</ul>
<h3><code>engine.py</code>: The Saturation Logic</h3>
<p>This is the "stress" part of the tester. It uses a specific strategy to find the breaking point:</p>
<ul>
<li><p><strong>Synthetic Pressure:</strong> Instead of asking the model to "think," it uses a <code>wordpool.txt</code> to generate a random prompt of a specific token length. The instruction is simple: <em>"Read this and say 'done'."</em> This isolates <strong>Prompt Evaluation</strong> speed and <strong>KV Cache</strong> overhead from actual generation.</p>
</li>
<li><p><strong>The Generator Pattern:</strong> The test is written as a Python generator (<code>yield</code>). This is a safety feature. If the model causes a total system hang or an Out-of-Memory (OOM) error at 128k context, the tool has already "yielded" the successful results for 2k, 8k, and 32k.</p>
</li>
<li><p><strong>Direct GGUF vs. Ollama:</strong> It supports two paths. It can call the <strong>Ollama API</strong> (testing your local server) or load a <strong>.gguf</strong> file directly via <code>llama-cpp-python</code>, which allows for manual control over <code>n_gpu_layers</code>.</p>
</li>
<li><p><strong>The Verdict:</strong> The code maps Tokens-Per-Second (TPS) to a status. Anything under 5 TPS is flagged as the "Cliff", meaning the KV cache has likely spilled into your slower system RAM.</p>
</li>
</ul>
<h3>The Verdict: Mapping the Performance Cliff</h3>
<p>The tool doesn't just give you numbers; it gives you a "vibe check" based on the <code>_verdict</code> logic:</p>
<ul>
<li><p><strong>✅ Smooth (&gt;15 TPS):</strong> Native GPU speeds. Your agents will feel snappy.</p>
</li>
<li><p><strong>⚠️ Slowing (5-15 TPS):</strong> You’ve likely hit the "Knee" where the KV cache is spilling into system RAM.</p>
</li>
<li><p><strong>💀 Cliff (&lt;5 TPS):</strong> Total saturation. Time to lower your <code>num_ctx</code> or buy a better GPU.</p>
</li>
</ul>
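<p>Taken together, the saturation loop and the verdict mapping reduce to a few lines of Python. This is a simplified sketch rather than the tool's exact internals; <code>run_probe</code> stands in for the real measurement code:</p>

```python
def verdict(tps: float) -> str:
    # Thresholds from the post: >15 TPS smooth, 5-15 slowing, <5 cliff.
    if tps > 15:
        return "Smooth"
    if tps >= 5:
        return "Slowing"
    return "Cliff"

def saturation_test(context_sizes, run_probe):
    # Yield each result as soon as it is measured, so a hang or OOM at a
    # large context does not lose the data points already collected.
    for n_ctx in context_sizes:
        tps = run_probe(n_ctx)
        yield n_ctx, tps, verdict(tps)

# Demo with a fake probe that slows down as the KV cache grows:
fake_tps = {2_000: 22.0, 8_000: 12.5, 32_000: 3.1}
for n_ctx, tps, status in saturation_test(fake_tps, fake_tps.get):
    print(n_ctx, tps, status)
```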
<h2>Getting Started</h2>
<p>I’ve pushed the tool to PyPI and GitHub. You can test your own local setup in seconds:</p>
<pre><code class="language-plaintext">#install via pip
pip install stressllm

#list available models 
stressllm models

#run test
stressllm run &lt;modelname&gt; --depth 2
</code></pre>
<p>Whether you are building agentic frameworks or just running local LLMs for privacy, you need to know your limits. Stop guessing and start stressing.</p>
<p><strong>Find it on PyPI:</strong> <a href="https://pypi.org/project/stressllm/">pypi.org/project/stressllm/</a></p>
]]></content:encoded></item><item><title><![CDATA[Building a Multi Agent System to Track Real Madrid Matches Using AWS Strands and Ollama]]></title><description><![CDATA[I like soccer and Real Madrid, but I’m not the diehard kind who can recite every fixture by heart. So naturally, I end up missing mid-week games and that usually end up to be the best matches. I wante]]></description><link>https://blog.ceruleancloud.ca/building-a-multi-agent-system-to-track-real-madrid-matches-using-aws-strands-and-ollama</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/building-a-multi-agent-system-to-track-real-madrid-matches-using-aws-strands-and-ollama</guid><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Tue, 24 Feb 2026 01:10:31 GMT</pubDate><enclosure url="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/6722707bc8ef419b2aa32d93/e914966c-dc30-4d4e-baf0-90059f4ea064.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I like soccer and Real Madrid, but I’m not the diehard kind who can recite every fixture by heart. So naturally, I end up missing mid-week games, and those usually turn out to be the best matches. I wanted to solve this…</p>
<p>Then at least I know which match is worth following, so even if I cannot watch it, I can still track the score online.</p>
<p>And I had three options…</p>
<p>🔺Google the fixtures manually (Zero points for style).</p>
<p>🔺Write a cron job to ping an API and send a dry text (Functional, but boring).</p>
<p>🔺Build a multi-agent system that finds the game, decides if the "hype factor" is worth it, and notifies me with a reason to watch.</p>
<p>Since we’re well past 2023, I chose the obvious one. 🤷🏽‍♂️</p>
<p>I spent the weekend building a local agentic system using AWS Strands and Ollama.</p>
<p>This project uses:</p>
<ul>
<li><p>AWS Strands for agent orchestration</p>
</li>
<li><p>Ollama for running language models locally</p>
</li>
<li><p>A Football API for match data</p>
</li>
<li><p>Telegram API for notifications</p>
</li>
</ul>
<h2>What Is AWS Strands</h2>
<p>AWS Strands is a lightweight agent orchestration framework that allows you to define agents, connect them to language models, and expose structured tools that those agents can call. Instead of writing manual glue code between LLM calls and functions, Strands lets you describe:</p>
<ul>
<li><p>The agent’s role</p>
</li>
<li><p>The model it uses</p>
</li>
<li><p>The tools it can call</p>
</li>
<li><p>The rules it must follow</p>
</li>
</ul>
<p>Strands handles the tool calling loop internally. When the model decides to call a tool, Strands executes the function, captures the output, and feeds it back into the model until the task completes.</p>
<p>In this project, Strands acts as the controller. It ensures the supervisor calls the analyzer first and the communication step second. It enforces order without requiring complex orchestration code.</p>
<h2>What Is Ollama and Why Use It</h2>
<p>Ollama is a local runtime that allows you to run open source language models on your own machine. It exposes a simple HTTP server, typically at <a href="http://localhost:11434">http://localhost:11434</a>.</p>
<p>Instead of sending prompts to a cloud provider, your application sends them to Ollama, which runs the model locally.</p>
<p>This gives you:</p>
<ul>
<li><p>Full local control</p>
</li>
<li><p>No external inference costs</p>
</li>
<li><p>Faster iteration during development</p>
</li>
<li><p>No dependency on external model APIs</p>
</li>
</ul>
<p>In this project, Ollama runs small models such as Gemma 2B to handle structured reasoning.</p>
<h2>Installing Ollama on Windows</h2>
<h3>Step 1: Download</h3>
<p>Go to the official website: <a href="https://ollama.com">https://ollama.com</a></p>
<p>Download the Windows installer.</p>
<h3>Step 2: Install</h3>
<p>Run the installer and follow the default setup. Ollama installs as a background service.</p>
<h3>Step 3: Verify</h3>
<p>Open PowerShell and run:</p>
<pre><code class="language-plaintext">ollama --version
</code></pre>
<p>If installed correctly, it prints the version.</p>
<h2>Pulling and Running a Model</h2>
<p>To download a model such as Gemma 2B:</p>
<pre><code class="language-plaintext">ollama pull gemma:2b
</code></pre>
<p>To see all installed models:</p>
<pre><code class="language-plaintext">ollama list
</code></pre>
<p>To run the model interactively:</p>
<pre><code class="language-plaintext">ollama run gemma:2b
</code></pre>
<p>To call the model from your application, send an HTTP request to:</p>
<pre><code class="language-plaintext">http://localhost:11434/api/generate
</code></pre>
<p>Example:</p>
<pre><code class="language-plaintext">curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "What can you do?"
}'
</code></pre>
<p>Your Strands agent internally calls this endpoint when configured with <code>OllamaModel</code>.</p>
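<p>For illustration, here is a minimal way to call that endpoint from Python using only the standard library (the helper names are mine, not from this project). Setting <code>"stream": False</code> asks Ollama to return a single JSON object whose <code>response</code> field holds the generated text:</p>
<pre><code class="lang-python">import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_payload(model: str, prompt: str) -> bytes:
    # "stream": False asks Ollama for one JSON object instead of streamed chunks
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
</code></pre>
<p>Strands does this plumbing for you when configured with <code>OllamaModel</code>, but seeing the raw request makes the agent’s model calls less magical.</p>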
<h2>High Level Design - Multi agent workflow</h2>
<p>The system follows a simple multi step flow:</p>
<ol>
<li><p>A supervisor agent receives a task.</p>
</li>
<li><p>An analyzer component fetches match data from a Football API.</p>
</li>
<li><p>The analyzer evaluates whether the match is worth watching.</p>
</li>
<li><p>If the match crosses a defined threshold, a communication component sends a message using the Telegram API.</p>
</li>
</ol>
<p>There are no complex distributed services. Everything runs locally except the external APIs.</p>
<p>The intelligence sits in the analysis step. The supervisor enforces order. The communication layer focuses only on delivery.</p>
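<p>To make the threshold idea concrete, here is a hedged sketch of a decision rule the Analyzer could apply. The factor names, weights, and threshold are invented for illustration and are not the project’s actual logic:</p>
<pre><code class="lang-python"># Illustrative hype-score heuristic; weights and threshold are made up for this sketch
HYPE_WEIGHTS = {
    "champions_league": 3,   # competition type
    "el_clasico": 3,         # rivalry level
    "title_decider": 2,      # importance of the fixture
    "close_standings": 1,    # context around standings
}
HYPE_THRESHOLD = 3

def is_worth_watching(match_flags: set) -> bool:
    """Return True when the combined hype factors cross the notification threshold."""
    score = sum(HYPE_WEIGHTS.get(flag, 0) for flag in match_flags)
    return score >= HYPE_THRESHOLD
</code></pre>
<p>In the real system, the local language model supplies this judgment, but a deterministic fallback like this is handy when inference is slow or flaky.</p>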
<h2>Understanding the Sequence Diagram</h2>
<img src="https://cloudmate-test.s3.us-east-1.amazonaws.com/uploads/covers/6722707bc8ef419b2aa32d93/07305b8b-307c-4e11-91ec-6328dbc607a6.jpg" alt="" style="display:block;margin:0 auto" />

<p>The sequence diagram above shows key components of the system and how the query flows. The flow is linear and disciplined.</p>
<h3>Step 1: Supervisor Initiates the Task</h3>
<p>The Supervisor starts the process by asking the Analyzer to check whether there is a Real Madrid match today. The Supervisor does not fetch data itself. It delegates.</p>
<p>This separation ensures orchestration logic stays clean.</p>
<h3>Step 2: Analyzer Calls the Football API</h3>
<p>The Analyzer sends a request to the Football API to retrieve match data. This includes opponent, competition, timing, and other relevant details.</p>
<p>The Football API responds with structured match data.</p>
<h3>Step 3: Analyzer Performs Reasoning</h3>
<p>Once the Analyzer receives match data, it evaluates whether there is a compelling reason to watch the match. This step uses a language model running locally through Ollama.</p>
<p>The reasoning can include:</p>
<ul>
<li><p>Competition type</p>
</li>
<li><p>Rivalry level</p>
</li>
<li><p>Importance of the fixture</p>
</li>
<li><p>Context around standings</p>
</li>
</ul>
<p>If the match meets the criteria, the Analyzer returns a formatted result to the Supervisor.</p>
<p>If not, it simply returns nothing significant.</p>
<h3>Step 4: Supervisor Triggers Notification</h3>
<p>If the Analyzer returns a positive result, the Supervisor calls the Comms component.</p>
<h3>Step 5: Comms Calls Telegram API</h3>
<p>The Comms component sends the final message to the Telegram API, which posts the notification.</p>
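<p>For reference, delivery through the public Telegram Bot API boils down to one HTTPS call to the <code>sendMessage</code> method. This is a minimal standard-library sketch; the token comes from @BotFather and <code>chat_id</code> identifies your chat:</p>
<pre><code class="lang-python">import json
import urllib.parse
import urllib.request

def build_send_url(token: str, chat_id: str, text: str) -> str:
    # Telegram Bot API endpoint: https://api.telegram.org/bot{token}/sendMessage
    query = urllib.parse.urlencode({"chat_id": chat_id, "text": text})
    return f"https://api.telegram.org/bot{token}/sendMessage?{query}"

def notify(token: str, chat_id: str, text: str) -> dict:
    with urllib.request.urlopen(build_send_url(token, chat_id, text)) as resp:
        return json.loads(resp.read())
</code></pre>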
<h2>Why the System Struggled on My Laptop</h2>
<p>The architecture was simple but the limitation was hardware.</p>
<p>My laptop runs on an older Intel i5 processor with only 2GB of VRAM. Even small models require consistent memory allocation. When GPU memory is insufficient, the system offloads computation to system RAM, which increases latency and reduces stability.</p>
<p>What I observed:</p>
<ul>
<li><p>Slow inference times</p>
</li>
<li><p>Occasional freezing</p>
</li>
<li><p>Inconsistent reasoning output</p>
</li>
<li><p>Reduced reliability when chaining multiple steps</p>
</li>
</ul>
<p>Multi step workflows amplify instability because each step depends on the previous output. When inference becomes slow or inconsistent, the whole pipeline suffers.</p>
<p>Building locally forced me to understand how models are loaded, served, and executed. That insight is difficult to gain when everything runs behind a managed API.</p>
]]></content:encoded></item><item><title><![CDATA[Three AWS MCP Servers you should use today]]></title><description><![CDATA[In the rapidly evolving landscape of Generative AI, giving your LLM (Large Language Model) access to real-time data and specialized tools is the difference between a generic chatbot and a powerful AI assistant.
AWS has embraced the Model Context Prot...]]></description><link>https://blog.ceruleancloud.ca/three-aws-mcp-servers-you-should-use-today</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/three-aws-mcp-servers-you-should-use-today</guid><category><![CDATA[AI]]></category><category><![CDATA[mcp]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sat, 27 Dec 2025 19:35:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1766864035988/4df3094a-8354-475e-ba85-9e59822e036d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the rapidly evolving landscape of Generative AI, giving your LLM (Large Language Model) access to real-time data and specialized tools is the difference between a generic chatbot and a powerful AI assistant.</p>
<p>AWS has embraced the <strong>Model Context Protocol (MCP)</strong>, providing a suite of servers that allow AI assistants to interact directly with AWS infrastructure, documentation, and architecture tools. Learn more about AWS MCP servers <a target="_blank" href="https://awslabs.github.io/mcp/">here</a>.</p>
<h2 id="heading-what-is-mcp">What is MCP?</h2>
<p>The <strong>Model Context Protocol (MCP)</strong> is an open-source standard, originally developed by Anthropic, that acts as a "universal translator" between AI models and external data sources.</p>
<p>Historically, connecting an AI to a specific tool required custom code, unique API integrations, and complex data parsing. MCP solves this by providing a standardized client-server architecture:</p>
<ul>
<li><p><strong>MCP Client:</strong> Your AI interface (e.g., Kiro, Claude Desktop or Amazon Q Developer).</p>
</li>
<li><p><strong>MCP Server:</strong> A lightweight program that exposes specific tools or data (e.g., AWS Docs, GitHub, or a Database).</p>
</li>
</ul>
<p>By using MCP, you can "plug in" expert capabilities to your AI assistant without writing any integration glue code.</p>
<h2 id="heading-connecting-to-aws-mcp-servers">Connecting to AWS MCP Servers</h2>
<p>Most AWS MCP servers are hosted on <strong>GitHub (awslabs/mcp)</strong> and can be run locally or via remote managed endpoints. To connect them, you generally follow these steps:</p>
<ol>
<li><p><strong>Install Prerequisites:</strong> Most servers require <strong>Python 3.10+</strong> and the <strong>uv</strong> package manager.</p>
</li>
<li><p><strong>Configure Credentials:</strong> Ensure your local environment has active AWS credentials (via <code>aws configure</code> or environment variables) with the necessary IAM permissions. <em>You only need this to work with your AWS environment.</em></p>
</li>
<li><p><strong>Update Client Config:</strong> Add the server details to your MCP client’s configuration file (usually <code>mcp.json</code>).</p>
</li>
</ol>
<h3 id="heading-example-configuration-snippet">Example Configuration Snippet</h3>
<pre><code class="lang-json"><span class="hljs-string">"mcpServers"</span>: {
  <span class="hljs-attr">"aws-documentation"</span>: {
    <span class="hljs-attr">"command"</span>: <span class="hljs-string">"uvx"</span>,
    <span class="hljs-attr">"args"</span>: [<span class="hljs-string">"awslabs.aws-documentation-mcp-server"</span>]
  }
}
</code></pre>
<h2 id="heading-aws-mcp-servers-you-should-use-today">AWS MCP Servers You Should Use Today</h2>
<p>While AWS offers a growing list of servers (including servers for Lambda, S3, and CloudWatch), these three provide the most immediate value for developers and architects.</p>
<h3 id="heading-1-aws-documentation-mcp">1. AWS Documentation MCP</h3>
<p>The <strong>AWS Documentation MCP</strong> exposes AWS documentation via MCP. Instead of the LLM relying on its training data (which might be outdated), this server allows it to fetch the latest official AWS docs in real time.</p>
<p>Your AI assistant can programmatically fetch and read up-to-date AWS documentation, so whether you are learning AWS or troubleshooting a project, it always has access to current information. This prevents the AI from hallucinating old API syntax or deprecated service limits.</p>
<h3 id="heading-2-aws-knowledge-mcp">2. AWS Knowledge MCP</h3>
<p>The <strong>AWS Knowledge MCP</strong> is a remote, fully managed server designed for deep architectural research and regional service availability. It can quickly check which services or specific features (like a specific EC2 instance type) are available in a given AWS Region, provide full-stack development guidance, and access the latest CDK and CloudFormation documentation for better IaC (infrastructure-as-code) development.</p>
<h3 id="heading-3-aws-diagram-mcp">3. AWS Diagram MCP</h3>
<p>The <strong>AWS Diagram MCP</strong> allows your AI to act as a cloud architect. It bridges the gap between a text-based conversation and a visual architecture diagram. It uses the Python <code>diagrams</code> package DSL to generate professional-looking architecture images. It can generate AWS architecture diagrams, sequence diagrams, and flowcharts. You can ask the AI to <em>"draw a highly available web app"</em> and it will generate the code and render the image for you to review.</p>
<p>The shift toward MCP marks a transition from AI assistants that talk <em>about</em> AWS to assistants that work <em>on</em> and <em>with</em> AWS. By integrating these three servers, you create a comprehensive environment where your AI assistant can research, plan, and visualize your cloud infrastructure in one continuous workflow.</p>
]]></content:encoded></item><item><title><![CDATA[JARK Stack for Gen AI on Kubernetes]]></title><description><![CDATA[You should have heard of LAMP, MEAN, MERN or even JAM stack. But what is this new JARK stack?
Let’s unpack that. JARK stands for Jupyter, Argo, Ray, and Kubernetes. It can help teams launch gen‑AI pipelines in production-ready infrastructure.

Let’s ...]]></description><link>https://blog.ceruleancloud.ca/jark-stack-for-gen-ai-on-kubernetes</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/jark-stack-for-gen-ai-on-kubernetes</guid><category><![CDATA[gen ai]]></category><category><![CDATA[EKS]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sun, 27 Jul 2025 17:26:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753637097812/7e10e6b4-4337-48a7-8ce9-fec8acff6d4c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>You should have heard of LAMP, MEAN, MERN or even JAM stack. But what is this new JARK stack?</p>
<p>Let’s unpack that. JARK stands for <strong>J</strong>upyter, <strong>A</strong>rgo, <strong>R</strong>ay, and <strong>K</strong>ubernetes. It can help teams launch gen‑AI pipelines in production-ready infrastructure.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753631083983/d8c78e59-b5d3-4a76-9b78-e11636f1a547.png" alt class="image--center mx-auto" /></p>
<p>Let’s unpack this further!</p>
<p>The JARK stack brings together four open-source building blocks to manage end‑to‑end generative AI workloads:</p>
<ul>
<li><p><strong>JupyterHub</strong>: Multi-user notebook environment for experimentation, fine‑tuning, and analysis.</p>
</li>
<li><p><strong>Argo Workflows</strong>: YAML‑defined DAG pipelines connecting notebook stages to training, packaging, and serving.</p>
</li>
<li><p><strong>Ray</strong> &amp; <strong>Ray Serve</strong>: Scalable distributed compute across GPU/CPU; Ray Serve handles high‑throughput inference.</p>
</li>
<li><p><strong>Kubernetes (EKS)</strong>: The orchestrator; handles scheduling, scaling, GPU node provisioning (often via Karpenter), multi-tenancy, and resilience.</p>
</li>
</ul>
<p>This stack originated from AWS’s re:Invent 2023 presentation "Deploying Generative Models on Amazon EKS".</p>
<h3 id="heading-why-teams-are-building-jark-on-eks">Why Teams Are Building JARK on EKS</h3>
<h4 id="heading-experimentation-production-in-one-place">Experimentation + Production in One Place</h4>
<p>JupyterHub serves as a collaboration point for data scientists and ML engineers. They can interactively prototype models, fine-tune hyperparameters, and version notebooks, all inside the same Kubernetes cluster used for production workloads.</p>
<h4 id="heading-declarative-pipelines-that-scale">Declarative Pipelines That Scale</h4>
<p>With Argo Workflows, you can "define once, run anywhere." Want to run fine-tuning, validation, packaging, and deployment as steps? Argo manages it as a directed graph, running in containers with GPU scheduling support.</p>
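<p>As a hedged sketch of that pattern (the step names and images are placeholders, not from a real pipeline), a two-step DAG might look like:</p>
<pre><code class="lang-plaintext">apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: finetune-pipeline-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: train
            template: train
          - name: validate
            template: validate
            dependencies: [train]    # validate runs only after train succeeds
    - name: train
      container:
        image: my-train-image:latest
        resources:
          limits:
            nvidia.com/gpu: 1       # schedules the step onto a GPU node
    - name: validate
      container:
        image: my-eval-image:latest
</code></pre>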
<h4 id="heading-distributed-compute-amp-scalable-serving">Distributed Compute &amp; Scalable Serving</h4>
<p>For model training and inference, Ray lets tasks scale across nodes. Ray Serve adds autoscaling, streaming, and robust APIs for serving large language models or vision engines.</p>
<h4 id="heading-cloud-native-and-cost-aware">Cloud-Native and Cost-Aware</h4>
<p>EKS nodes can autoscale thanks to Karpenter or managed nodegroups. GPU nodes can spin up on demand, minimizing idle costs. Kubernetes also provides namespaces, RBAC, monitoring, and resource isolation.</p>
<h2 id="heading-sample-jark-stack-deployment-with-eks-auto-mode">Sample JARK stack deployment with EKS Auto Mode</h2>
<p>The following is the broad overview of EKS Auto Mode cluster architecture.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753637034004/a8d8d390-9748-4c13-bc98-c5f7805c81d9.jpeg" alt class="image--center mx-auto" /></p>
<p>Deploy EKS Auto Mode Cluster with <code>autoModeConfig</code>:</p>
<pre><code class="lang-plaintext">apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: jark-autocluster
  region: us-west-2
  version: "1.27"
autoModeConfig:
  enabled: true
  nodeRoleARN: arn:aws:iam::&lt;account‑id&gt;:role/EKSAutoNodeRole
  nodePools:
    - general-purpose
    - system
managed: false
vpc:
  nat:
    gateway: Single
</code></pre>
<p>Then:</p>
<pre><code class="lang-plaintext">create cluster -f eks-cluster-aut-mode.yaml
</code></pre>
<p>This provisions a cluster where Karpenter auto-provisions node pools and handles node upgrades (roughly every 21 days) automatically.</p>
<h2 id="heading-custom-node-pool-creation">Custom Node Pool Creation</h2>
<p>In Auto Mode, node pools are replaced by provisioners configured via Karpenter. To handle GPU workloads and isolate system pods, define two provisioner CRDs.</p>
<pre><code class="lang-plaintext">apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: jark-gpu
spec:
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["g5.xlarge","g4dn.xlarge"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["amd64"]
  provider:
    subnetSelector:
      karpenter.sh/discovery: &lt;your-vpc&gt;
    securityGroupSelector:
      kubernetes.io/cluster/jark-autocluster: shared
  taints:
    - key: "ray.io/node-type"
      value: "worker"
      effect: "NoSchedule"
  limits:
    resources:
      cpu: "1000"
      memory: "2000Gi"
  ttlSecondsAfterEmpty: 60
</code></pre>
<p>And CPU/general-purpose:</p>
<pre><code class="lang-plaintext">apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: jark-cpu
spec:
  requirements:
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values: ["m5.xlarge","m5.2xlarge"]
  ttlSecondsAfterEmpty: 30
</code></pre>
<p>Apply them:</p>
<pre><code class="lang-plaintext">bashCopyEditkubectl apply -f gpu-provisioner.yaml
kubectl apply -f cpu-provisioner.yaml
</code></pre>
<h2 id="heading-install-jupyterhub-argo-and-ray">Install JupyterHub, Argo and Ray</h2>
<h3 id="heading-install-jupyterhub">Install JupyterHub</h3>
<p>Use the Helm chart from the AWS Data-on-EKS blueprint to deploy JupyterHub:</p>
<pre><code class="lang-plaintext">helm repo add data-on-eks https://awslabs.github.io/data-on-eks
helm repo update
helm install jhub data-on-eks/jupyterhub \
  --namespace jupyterhub --create-namespace \
  --set singleuser.image.name=jupyter/minimal-notebook \
  --set proxy.service.type=LoadBalancer
</code></pre>
<p>This sets up a multi-user JupyterHub with a public LoadBalancer endpoint.</p>
<h3 id="heading-install-argo-workflows">Install Argo Workflows</h3>
<pre><code class="lang-plaintext">create namespace argo
kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo-workflows/stable/manifests/install.yaml
</code></pre>
<p>Verify that Argo is deployed, and port-forward the server to reach the UI:</p>
<pre><code class="lang-plaintext">kubectl get pods -n argo
kubectl port-forward svc/argo-server -n argo 2746:2746
</code></pre>
<h3 id="heading-install-ray-operator-and-ray-serve">Install Ray Operator and Ray Serve</h3>
<pre><code class="lang-plaintext">helm repo add ray https://ray-project.github.io/kuberay-helm/
helm repo update
helm install ray-operator ray/ray-operator --namespace ray-system --create-namespace
</code></pre>
<p>Deploy a Ray cluster manifest (<code>ray-cluster.yaml</code>) to run Serve workers on GPU nodes and Ray head:</p>
<pre><code class="lang-plaintext">apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: jark-ray
  namespace: default
spec:
  rayVersion: "2.9.1"
  headGroupSpec:
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:latest
            ports:
              - containerPort: 8265
  workerGroupSpecs:
    - groupName: gpu-workers
      minReplicas: 0
      maxReplicas: 2
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:latest
              resources:
                limits:
                  nvidia.com/gpu: 1
              args: ["ray", "start", "--address=$(RAY_HEAD_IP):6379"]
          tolerations:
            - key: "ray.io/node-type"
              operator: "Exists"
              effect: "NoSchedule"
</code></pre>
<p>Apply:</p>
<pre><code class="lang-plaintext">kubectl apply -f ray-cluster.yaml
</code></pre>
<h2 id="heading-sample-model-from-hugging-face-amp-serve-inference">Sample Model from Hugging Face &amp; Serve Inference</h2>
<h3 id="heading-create-a-ray-serve-application">Create a Ray Serve application</h3>
<p>In <code>serve_app.py</code>:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> ray <span class="hljs-keyword">import</span> serve
<span class="hljs-keyword">from</span> transformers <span class="hljs-keyword">import</span> pipeline

print(<span class="hljs-string">"Loading HuggingFace model..."</span>)
model = pipeline(<span class="hljs-string">"text-generation"</span>, model=<span class="hljs-string">"gpt2"</span>)

serve.start()
<span class="hljs-meta">@serve.deployment(route_prefix="/generate")</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">GenModel</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__call__</span>(<span class="hljs-params">self, request</span>):</span>
        data = request.json()
        <span class="hljs-keyword">return</span> {<span class="hljs-string">"result"</span>: model(data[<span class="hljs-string">"prompt"</span>], max_length=<span class="hljs-number">50</span>)[<span class="hljs-number">0</span>][<span class="hljs-string">"generated_text"</span>]}

GenModel.deploy()
</code></pre>
<h3 id="heading-build-docker-image-and-push-to-ecr-inside-a-notebook-or-ci-pipeline">Build Docker image and push to ECR (inside a notebook or CI pipeline)</h3>
<pre><code class="lang-plaintext">bashCopyEditdocker build -t ghserve:latest -f Dockerfile .
aws ecr create-repository --repository-name ghserve
docker tag ghserve:latest &lt;account&gt;.dkr.ecr.us-west-2.amazonaws.com/ghserve:latest
docker push &lt;URI&gt;/ghserve:latest
</code></pre>
<h3 id="heading-deploy-via-kubernetes-manifest-ray-serve-appyaml">Deploy via Kubernetes manifest <code>ray-serve-app.yaml</code>:</h3>
<pre><code class="lang-plaintext">apiVersion: apps/v1
kind: Deployment
metadata:
  name: serve-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: serve-app
  template:
    metadata:
      labels:
        app: serve-app
    spec:
      containers:
      - name: serve-app
        image: &lt;URI&gt;/ghserve:latest
        ports:
        - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: serve-service
spec:
  type: LoadBalancer
  selector:
    app: serve-app
  ports:
    - port: 80
      targetPort: 8000
</code></pre>
<p>Apply:</p>
<pre><code class="lang-plaintext">kubectl apply -f ray-serve-app.yaml
</code></pre>
<h3 id="heading-test-inference">Test inference</h3>
<pre><code class="lang-plaintext">curl http://$(kubectl get svc serve-service -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Hello world"}'
</code></pre>
<p>In summary, we now have:</p>
<ul>
<li><p>An <strong>EKS cluster in Auto Mode</strong>, scaling nodes automatically via Karpenter.</p>
</li>
<li><p><strong>Custom provisioners</strong> for GPU worker isolation and cost‑effective scaling.</p>
</li>
<li><p>A functional <strong>JARK stack</strong>: JupyterHub, Argo, and Ray Serve.</p>
</li>
<li><p>A <strong>sample Hugging Face model served via Ray</strong>, with inference exposed through LoadBalancer.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Build an AI powered customer service voice agent on AWS]]></title><description><![CDATA[A few weeks ago, I found myself in Vegas, trying to update a dinner reservation on a packed Saturday night. Calling the restaurant was useless—there was no chance of getting through. That got me thinking: how quickly could I spin up an AI-powered res...]]></description><link>https://blog.ceruleancloud.ca/build-an-ai-powered-customer-service-voice-agent-on-aws</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/build-an-ai-powered-customer-service-voice-agent-on-aws</guid><category><![CDATA[AWS]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sun, 23 Mar 2025 16:48:23 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742748464534/0507ffbc-3957-40bb-b701-2af45b6efb44.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few weeks ago, I found myself in Vegas, trying to update a dinner reservation on a packed Saturday night. Calling the restaurant was useless—there was no chance of getting through. That got me thinking: how quickly could I spin up an AI-powered reservation assistant?</p>
<p>So, I put it to the test. In under an hour, I built a crude proof of concept (PoC) using AWS services. The setup was simple:</p>
<ul>
<li><p><strong>Amazon Connect</strong> to handle incoming calls</p>
</li>
<li><p><strong>Amazon Lex</strong> for voice interaction and capturing reservation details</p>
</li>
<li><p><strong>AWS Lambda</strong> to process and store the reservation</p>
</li>
<li><p><strong>Amazon DynamoDB</strong> for storing reservations</p>
</li>
<li><p><strong>Amazon SES</strong> to send confirmation emails</p>
</li>
</ul>
<p>This isn’t about cutting-edge AI. It’s about solving real problems—fast. AI isn’t useful unless it’s applied to the right problem, and once you do that, impact follows almost instantly.</p>
<p>Now, let’s dive into how I built it.</p>
<h2 id="heading-architecture"><strong>Architecture</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1742563599444/6223826e-03b3-47a3-b0c3-9516aa73c87e.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-setting-up-amazon-connect-for-call-routing"><strong>Setting Up Amazon Connect for Call Routing</strong></h2>
<h3 id="heading-create-an-amazon-connect-instance"><strong>Create an Amazon Connect Instance</strong></h3>
<ol>
<li><p>Go to the <strong>AWS Console</strong> → <strong>Amazon Connect</strong> → <strong>Create an instance</strong>.</p>
</li>
<li><p>Set up an administrator account and choose a name for the instance.</p>
</li>
<li><p>Follow the instance creation flow to claim a phone number and use it for incoming calls.</p>
</li>
</ol>
<p>Now, whenever someone calls this number, we can define what happens next using a <strong>contact flow</strong>.</p>
<h3 id="heading-configure-a-contact-flow-to-invoke-lex"><strong>Configure a Contact Flow to Invoke Lex</strong></h3>
<p>A <strong>contact flow</strong> is a set of actions Amazon Connect follows when handling a call. In the flow, add a block that hands the call to a Lex bot. After configuring the bot, create an “intent” to capture the user’s request to make a reservation.</p>
<p>In my contact flow, I also configured a welcome prompt and set the call to terminate after capturing the required information.</p>
<h2 id="heading-creating-an-amazon-lex-bot-for-reservations"><strong>Creating an Amazon Lex Bot for Reservations</strong></h2>
<h3 id="heading-define-the-lex-bot-and-intent"><strong>Define the Lex Bot and Intent</strong></h3>
<p>Amazon Lex is AWS’s conversational AI service, handling both text and voice inputs.</p>
<ol>
<li><p>In <strong>Amazon Lex</strong>, create a new bot called <code>ReservationBot</code>.</p>
</li>
<li><p>Under <strong>Intents</strong>, create a new intent called <code>MakeReservation</code>.</p>
</li>
<li><p>Add sample utterances like:</p>
<ul>
<li><p><em>I want to make a reservation</em></p>
</li>
<li><p><em>Can I book a table for tonight?</em></p>
</li>
<li><p><em>Reserve a table for two at 7 PM</em></p>
</li>
</ul>
</li>
</ol>
<p><strong>Add Slots to Capture User Data</strong></p>
<p>Slots are variables that store user inputs. Add the following slots to <code>MakeReservation</code>:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Slot Name</td><td>Type</td><td>Prompt Example</td><td>Required?</td></tr>
</thead>
<tbody>
<tr>
<td><code>firstName</code></td><td>AMAZON.FirstName</td><td>What name should I book under?</td><td>Yes</td></tr>
<tr>
<td><code>phoneNumber</code></td><td>AMAZON.PhoneNumber</td><td>Can I get your phone number?</td><td>Yes</td></tr>
<tr>
<td><code>partySize</code></td><td>AMAZON.Number</td><td>How many people are in your party?</td><td>Yes</td></tr>
<tr>
<td><code>date</code></td><td><a target="_blank" href="http://AMAZON.Date">AMAZON.Date</a></td><td>What date do you need the reservation for?</td><td>Yes</td></tr>
<tr>
<td><code>time</code></td><td>AMAZON.Time</td><td>What time would you like the reservation?</td><td>Yes</td></tr>
<tr>
<td><code>confirmation</code></td><td>AMAZON.YesNo</td><td>Should I confirm your reservation?</td><td>Yes</td></tr>
</tbody>
</table>
</div><p><strong>Configure Fulfillment with AWS Lambda</strong></p>
<p>Once Lex collects user input, we need to process the reservation and store it in DynamoDB.</p>
<ol>
<li><p>In Lex, go to the <strong>Fulfillment</strong> section.</p>
</li>
<li><p>Select <strong>AWS Lambda function</strong> and create a new function.</p>
</li>
<li><p>Use the following Python code:</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> boto3
<span class="hljs-keyword">import</span> uuid

dynamodb = boto3.resource(<span class="hljs-string">"dynamodb"</span>)
table = dynamodb.Table(<span class="hljs-string">"Reservations"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">lambda_handler</span>(<span class="hljs-params">event, context</span>):</span>
    slots = event[<span class="hljs-string">"currentIntent"</span>][<span class="hljs-string">"slots"</span>]

    reservation_id = str(uuid.uuid4())
    reservation_data = {
        <span class="hljs-string">"ReservationID"</span>: reservation_id,
        <span class="hljs-string">"FirstName"</span>: slots[<span class="hljs-string">"firstName"</span>],
        <span class="hljs-string">"PhoneNumber"</span>: slots[<span class="hljs-string">"phoneNumber"</span>],
        <span class="hljs-string">"PartySize"</span>: slots[<span class="hljs-string">"partySize"</span>],
        <span class="hljs-string">"Date"</span>: slots[<span class="hljs-string">"date"</span>],
        <span class="hljs-string">"Time"</span>: slots[<span class="hljs-string">"time"</span>]
    }

    table.put_item(Item=reservation_data)

    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">"dialogAction"</span>: {
            <span class="hljs-string">"type"</span>: <span class="hljs-string">"Close"</span>,
            <span class="hljs-string">"fulfillmentState"</span>: <span class="hljs-string">"Fulfilled"</span>,
            <span class="hljs-string">"message"</span>: {
                <span class="hljs-string">"contentType"</span>: <span class="hljs-string">"PlainText"</span>,
                <span class="hljs-string">"content"</span>: <span class="hljs-string">f"Your reservation is confirmed, <span class="hljs-subst">{slots[<span class="hljs-string">'firstName'</span>]}</span>! Your confirmation ID is <span class="hljs-subst">{reservation_id}</span>."</span>
            }
        }
    }
</code></pre>
<p>This function:<br />✅ Extracts user inputs from Lex<br />✅ Generates a unique reservation ID<br />✅ Stores the reservation in DynamoDB<br />✅ Returns a confirmation message</p>
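<p>Before wiring the function up in the console, it helps to sanity-check the slot handling with a hand-built event. The payload below is a hypothetical example of the V1-style event shape the handler reads; the intent name and slot values are made up:</p>
<pre><code class="lang-python"># Hypothetical Lex V1-style test event (values are illustrative only)
sample_event = {
    "currentIntent": {
        "name": "BookReservation",
        "slots": {
            "firstName": "Ada",
            "phoneNumber": "5551234567",
            "partySize": "4",
            "date": "2026-03-01",
            "time": "19:00"
        }
    }
}

# Same extraction the handler performs
slots = sample_event["currentIntent"]["slots"]
print(f"Your reservation is confirmed, {slots['firstName']}!")
</code></pre>
<p>You can paste an event like this into the Lambda console test tool to exercise the handler before connecting it to Lex.</p>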
<h3 id="heading-sending-confirmation-emails-with-amazon-ses"><strong>Sending Confirmation Emails with Amazon SES</strong></h3>
<p>Once the reservation is stored, we want to send a confirmation email. Modify the Lambda function to include this:</p>
<pre><code class="lang-python">ses = boto3.client(<span class="hljs-string">"ses"</span>)  <span class="hljs-comment"># SES client (boto3 is already imported above)</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">send_email</span>(<span class="hljs-params">to_address, first_name, reservation_id</span>):</span>
    subject = <span class="hljs-string">"Your Reservation is Confirmed!"</span>
    body = <span class="hljs-string">f"Hello <span class="hljs-subst">{first_name}</span>,\n\nYour reservation is confirmed. Your confirmation ID is <span class="hljs-subst">{reservation_id}</span>.\n\nThank you!"</span>

    response = ses.send_email(
        Source=<span class="hljs-string">"your-email@example.com"</span>,
        Destination={<span class="hljs-string">"ToAddresses"</span>: [to_address]},
        Message={
            <span class="hljs-string">"Subject"</span>: {<span class="hljs-string">"Data"</span>: subject},
            <span class="hljs-string">"Body"</span>: {<span class="hljs-string">"Text"</span>: {<span class="hljs-string">"Data"</span>: body}}
        }
    )
    <span class="hljs-keyword">return</span> response
</code></pre>
<p>Update the <code>lambda_handler</code> function to call <code>send_email()</code>:</p>
<pre><code class="lang-python">send_email(slots[<span class="hljs-string">"phoneNumber"</span>] + <span class="hljs-string">"@example.com"</span>, slots[<span class="hljs-string">"firstName"</span>], reservation_id)
</code></pre>
<p>Now, once a reservation is made, an email confirmation is sent to the customer. Note that the snippet above builds a placeholder address from the phone-number slot; in a real bot you would collect a dedicated email slot, and SES requires the sender (and, in sandbox mode, the recipient) address to be verified.</p>
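<p>If you want to unit-test the message text without calling SES, you can factor the body string into a small helper. This is only a sketch mirroring the string used in <code>send_email</code> above:</p>
<pre><code class="lang-python">def build_body(first_name, reservation_id):
    # Mirrors the body text used in send_email above
    return (f"Hello {first_name},\n\n"
            f"Your reservation is confirmed. Your confirmation ID is {reservation_id}.\n\n"
            "Thank you!")

print(build_body("Ada", "1234-abcd"))
</code></pre>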
<h2 id="heading-testing-the-full-setup"><strong>Testing the Full Setup</strong></h2>
<p><strong>Test the Call Flow in Amazon Connect</strong></p>
<ul>
<li><p>Call the phone number assigned to Amazon Connect.</p>
</li>
<li><p>Try making a reservation by voice.</p>
</li>
<li><p>Ensure the data is stored in DynamoDB and a confirmation email is sent.</p>
</li>
</ul>
<p>This setup isn’t complex, and that’s the beauty of it. In under an hour, we built a fully functional AI-powered reservation system using AWS services. The key takeaways:</p>
<p>✅ <strong>Amazon Connect + Lex</strong> makes voice call automation easy.<br />✅ <strong>Lambda + DynamoDB</strong> handles backend processing and storage.<br />✅ <strong>Amazon SES</strong> automates email confirmations.</p>
<p>This solution can be expanded in many ways—multi-language support, SMS confirmations via SNS, or even integrating with restaurant POS systems. But the core idea remains: AI is most effective when applied to the right problems.</p>
]]></content:encoded></item><item><title><![CDATA[Building a Live Stream Infrastructure on AWS]]></title><description><![CDATA[A few days ago, Netflix made history by streaming the Screen Actors Guild (SAG) Awards live for the first time. This marked a major shift in how award shows and other large-scale events are broadcasted, proving that live streaming is no longer just f...]]></description><link>https://blog.ceruleancloud.ca/building-a-live-stream-infrastructure-on-aws</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/building-a-live-stream-infrastructure-on-aws</guid><category><![CDATA[AWS architecture]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Fri, 28 Feb 2025 19:18:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742748611013/cd43e071-daf6-41c3-a9d8-0fc2f858cd1d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days ago, Netflix made history by streaming the <strong>Screen Actors Guild (SAG) Awards</strong> live for the first time. This marked a major shift in how award shows and other large-scale events are broadcasted, proving that live streaming is no longer just for sports and gaming—it's now a mainstream entertainment medium. This got me thinking: <strong>What does it take to build a scalable, high-quality live streaming architecture?</strong></p>
<p>In this blog, I’ll break down how AWS provides a robust solution for live streaming, using a combination of services like <strong>AWS Elemental MediaLive, MediaPackage, Amazon S3, and CloudFront</strong> to ensure seamless content delivery. <strong>Now, have in mind - these services are purpose built by AWS to serve such media usecases.</strong></p>
<p>Whether you're streaming a major award show, a corporate event, or even a personal live broadcast, understanding the building blocks of AWS live streaming can help you create a reliable and scalable streaming experience.</p>
<p>Let’s dive into how each AWS service plays a role in making this happen.</p>
<h2 id="heading-architecture">Architecture</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1740769047075/e0aabe3f-be94-4f2a-a282-ffc4d38d038c.jpeg" alt class="image--center mx-auto" /></p>
<p>This architecture is for <strong>Live Streaming on AWS</strong>, leveraging various AWS services to efficiently deliver video content to end-user devices. Here's a detailed breakdown of the services involved and the overall workflow:</p>
<p>1. <strong>Video Source</strong></p>
<ul>
<li>This represents the camera or any other video capture device that provides the live video feed. It acts as the input source for the streaming process.</li>
</ul>
<p>2. <strong>AWS Elemental Link</strong></p>
<ul>
<li><p><strong>Role:</strong> AWS Elemental Link is a device that connects the video source (like a camera or production equipment) to AWS Elemental MediaLive.</p>
</li>
<li><p><strong>Function:</strong> It encodes and streams the video content to AWS, ensuring high-quality transmission. Link supports standard video inputs and outputs that can easily be connected to production systems.</p>
</li>
<li><p><strong>Benefit:</strong> It provides a reliable and scalable way to ingest live video streams, bridging the gap between the on-premise hardware and AWS cloud services.</p>
</li>
</ul>
<p>3. <strong>AWS Elemental MediaLive</strong></p>
<ul>
<li><p><strong>Role:</strong> MediaLive is a broadcast-grade live video processing service. It ingests the video from Elemental Link and processes it to create the live stream.</p>
</li>
<li><p><strong>Function:</strong> It transcodes the video into different formats for various devices and resolutions. MediaLive is capable of adaptive bitrate streaming, ensuring the best quality video delivery under varying network conditions.</p>
</li>
<li><p><strong>Benefit:</strong> AWS Elemental MediaLive is highly scalable, ensuring that the stream can reach a wide audience without degrading the quality. It also integrates with other AWS services to enhance performance and security.</p>
</li>
</ul>
<p>4. <strong>AWS Elemental MediaPackage</strong></p>
<ul>
<li><p><strong>Role:</strong> MediaPackage is used to create live streaming packages, enabling the stream to be formatted and delivered across multiple devices and platforms.</p>
</li>
<li><p><strong>Function:</strong> It packages the content delivered by MediaLive into formats like HLS (HTTP Live Streaming) and DASH (Dynamic Adaptive Streaming over HTTP) for device compatibility. MediaPackage also supports features like encryption and DRM (Digital Rights Management).</p>
</li>
<li><p><strong>Benefit:</strong> This service helps deliver high-quality video to devices regardless of their platform, and ensures smooth, secure, and efficient streaming. It allows for failover strategies, ensuring reliability.</p>
</li>
</ul>
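<p>To make the packaging step concrete: for HLS, MediaPackage serves a multivariant playlist that lists the available renditions, and players choose among them. A minimal illustrative playlist (the bitrates, resolutions, and paths here are hypothetical) looks like this:</p>
<pre><code class="lang-plaintext">#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
high/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
medium/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=854x480
low/index.m3u8
</code></pre>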
<p>5. <strong>Amazon S3</strong></p>
<ul>
<li><p><strong>Role:</strong> Amazon Simple Storage Service (S3) is used to store the video files, including both live and archived content.</p>
</li>
<li><p><strong>Function:</strong> MediaPackage stores the processed live streams in S3, ensuring that the video content is secure, scalable, and easily accessible for future use or playback.</p>
</li>
<li><p><strong>Benefit:</strong> S3 provides durable and low-latency storage, making it a reliable place for hosting video content for both live and on-demand applications.</p>
</li>
</ul>
<p>6. <strong>Amazon CloudFront</strong></p>
<ul>
<li><p><strong>Role:</strong> CloudFront is a global content delivery network (CDN) that accelerates the delivery of content to end-user devices.</p>
</li>
<li><p><strong>Function:</strong> CloudFront caches content from Amazon S3, ensuring that the live video feed is quickly delivered to users regardless of their geographical location.</p>
</li>
<li><p><strong>Benefit:</strong> CloudFront helps reduce latency by delivering content from the closest edge location to the user. It optimizes the delivery of media content, ensuring seamless streaming across various devices.</p>
</li>
</ul>
<p>7. <strong>End User Devices</strong></p>
<ul>
<li><p><strong>Role:</strong> These are the devices used by the viewers to watch the live stream.</p>
</li>
<li><p><strong>Function:</strong> End-user devices like smartphones, tablets, desktops, and smart TVs consume the live stream via their respective media players or apps.</p>
</li>
<li><p><strong>Benefit:</strong> CloudFront ensures that the video stream is optimized for these devices, regardless of the bandwidth or device specifications.</p>
</li>
</ul>
<h3 id="heading-how-this-architecture-works"><strong>How This Architecture Works</strong></h3>
<ol>
<li><p><strong>Capture &amp; Transmission:</strong> The video source captures the live feed, and AWS Elemental Link encodes and sends this video stream to MediaLive.</p>
</li>
<li><p><strong>Processing &amp; Transcoding:</strong> MediaLive processes the incoming stream, converting it into different formats and resolutions for adaptive streaming.</p>
</li>
<li><p><strong>Packaging &amp; Security:</strong> MediaPackage formats the video into compatible packages (HLS/DASH) for delivery across different platforms, while also providing encryption and security features.</p>
</li>
<li><p><strong>Storage:</strong> Processed streams are stored in Amazon S3 for archival purposes.</p>
</li>
<li><p><strong>Distribution:</strong> CloudFront delivers the video to end-user devices with low latency, ensuring a smooth viewing experience for a global audience.</p>
</li>
</ol>
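<p>Step 2 is what makes playback resilient: the encoder produces a ladder of renditions, and the player continuously picks the best one its measured bandwidth can sustain. The sketch below is illustrative player-side logic, not MediaLive code, and the ladder values are hypothetical:</p>
<pre><code class="lang-python"># Hypothetical rendition ladder: (name, bitrate in kbps), highest first
RENDITIONS = [
    ("1080p", 5000),
    ("720p", 3000),
    ("480p", 1500),
    ("360p", 800),
]

def pick_rendition(measured_kbps, headroom=0.8):
    """Pick the highest rendition that fits the measured bandwidth,
    keeping some headroom for network fluctuation."""
    budget = measured_kbps * headroom
    for name, kbps in RENDITIONS:
        if kbps <= budget:
            return name
    return RENDITIONS[-1][0]  # fall back to the lowest rung

print(pick_rendition(4500))  # a 4.5 Mbps connection lands on 720p
</code></pre>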
<h3 id="heading-key-benefits-of-this-architecture"><strong>Key Benefits of This Architecture</strong></h3>
<ul>
<li><p><strong>Scalability:</strong> AWS services like MediaLive, MediaPackage, and CloudFront automatically scale based on viewer demand, providing a reliable service even during high traffic.</p>
</li>
<li><p><strong>Global Reach:</strong> CloudFront’s edge locations ensure that the video stream is delivered to users worldwide with minimal latency.</p>
</li>
<li><p><strong>Security:</strong> Encryption and access control are built into the architecture, ensuring that content is secure both in transit and at rest.</p>
</li>
<li><p><strong>Cost Efficiency:</strong> This setup leverages the pay-as-you-go pricing model, which means you only pay for the services you use, making it cost-effective for both small and large-scale live streaming events.</p>
</li>
</ul>
<p>That’s it! We just walked through how to reliably live stream to end-user devices using AWS managed infrastructure and services.</p>
]]></content:encoded></item><item><title><![CDATA[Serve a HTML Resume Site via a Docker Container]]></title><description><![CDATA[In this blog, we'll walk you through the steps to serve a simple HTML website using Apache2 in a Docker container. This guide assumes you have an Ubuntu environment with Docker installed.
Assumptions

Ubuntu Environment: Ensure you're using an Ubuntu...]]></description><link>https://blog.ceruleancloud.ca/serve-a-html-resume-site-via-a-docker-container</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/serve-a-html-resume-site-via-a-docker-container</guid><category><![CDATA[containers]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Fri, 28 Feb 2025 18:53:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742530339464/776417d4-1dcb-4dad-9a58-4b7e78e63500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog, we'll walk you through the steps to serve a simple HTML website using Apache2 in a Docker container. This guide assumes you have an Ubuntu environment with Docker installed.</p>
<p><strong>Assumptions</strong></p>
<ul>
<li><p><strong>Ubuntu Environment</strong>: Ensure you're using an Ubuntu-based system.</p>
</li>
<li><p><strong>Docker Installed</strong>: Verify Docker is installed and running using the command <code>docker --version</code>.</p>
</li>
</ul>
<p><strong>Create the HTML File</strong></p>
<p>Create a directory to store your HTML file and your Dockerfile.</p>
<p><strong><em>Dockerfile will contain instructions on how to build the image.</em></strong></p>
<pre><code class="lang-plaintext">mkdir my-resume-site
cd my-resume-site
vi index.html
</code></pre>
<p>Create the <code>index.html</code> file inside this directory with the following content:</p>
<pre><code class="lang-xml"><span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">html</span> <span class="hljs-attr">lang</span>=<span class="hljs-string">"en"</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charset</span>=<span class="hljs-string">"UTF-8"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"viewport"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"width=device-width, initial-scale=1.0"</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>My Resume<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">style</span>&gt;</span><span class="css">
        <span class="hljs-selector-tag">body</span> {
            <span class="hljs-attribute">font-family</span>: Arial, sans-serif;
            <span class="hljs-attribute">margin</span>: <span class="hljs-number">20px</span>;
            <span class="hljs-attribute">line-height</span>: <span class="hljs-number">1.6</span>;
        }
        <span class="hljs-selector-tag">h1</span> {
            <span class="hljs-attribute">text-align</span>: center;
        }
        <span class="hljs-selector-class">.section</span> {
            <span class="hljs-attribute">margin-bottom</span>: <span class="hljs-number">20px</span>;
        }
    </span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
<span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>John Doe<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Email: john.doe@example.com | Phone: (123) 456-7890<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">hr</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"section"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Professional Summary<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Motivated and detail-oriented professional with experience in web development, system administration, and cloud technologies. Passionate about delivering impactful solutions and learning new technologies.<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"section"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Experience<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h3</span>&gt;</span>Software Engineer, ABC Corp<span class="hljs-tag">&lt;/<span class="hljs-name">h3</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Jan 2020 - Present<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">ul</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Developed and maintained scalable web applications using modern frameworks.<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
            <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Implemented CI/CD pipelines to automate deployments.<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
        <span class="hljs-tag">&lt;/<span class="hljs-name">ul</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
    <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"section"</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Education<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">h3</span>&gt;</span>Bachelor of Science in Computer Science<span class="hljs-tag">&lt;/<span class="hljs-name">h3</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>XYZ University, 2019<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
<p><strong>Create a Dockerfile</strong></p>
<p>To serve the HTML site, create a <code>Dockerfile</code> to set up the Apache2 web server.</p>
<pre><code class="lang-plaintext">vi Dockerfile
</code></pre>
<p>Add the following content to the file and save it.</p>
<pre><code class="lang-plaintext"># Use the official Apache2 image as the base
FROM httpd:2.4

# Copy the HTML file into the container
COPY ./index.html /usr/local/apache2/htdocs/
</code></pre>
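<p>As your site grows beyond a single file, you can copy the whole build context instead of one file. A slightly extended sketch (same base image; the extra lines are optional) might look like:</p>
<pre><code class="lang-plaintext"># Use the official Apache2 (httpd) image as the base
FROM httpd:2.4

# Copy the entire build context (index.html plus any CSS or images) into the docroot
COPY ./ /usr/local/apache2/htdocs/

# Document the port Apache listens on inside the container
EXPOSE 80
</code></pre>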
<hr />
<p><strong>Build the Docker Image</strong></p>
<p>Build the Docker image using the <code>Dockerfile</code>.</p>
<pre><code class="lang-plaintext">docker build -t my-resume-site .
</code></pre>
<p>This command tags the image as <code>my-resume-site</code>.</p>
<p>The <code>.</code> at the end sets the build context to the current directory, where Docker automatically looks for a file named <code>Dockerfile</code>.</p>
<p><strong>Run the Docker Container</strong></p>
<p>Run the container and map it to port 80 on your host machine.</p>
<pre><code class="lang-plaintext">docker run -d -p 80:80 --name resume-site my-resume-site
</code></pre>
<ul>
<li><p><code>-d</code>: Runs the container in detached mode.</p>
</li>
<li><p><code>-p 80:80</code>: Maps port 80 of the host to port 80 of the container.</p>
</li>
<li><p><code>--name resume-site</code>: Names the container <code>resume-site</code>.</p>
</li>
</ul>
<p><strong>Access the Site</strong></p>
<p>Open your browser and navigate to <code>http://localhost:80</code>. You should see your resume displayed.</p>
<p><strong>Verify the Container is Running</strong></p>
<p>Check if the container is running:</p>
<pre><code class="lang-plaintext">docker ps
</code></pre>
<p>The output should list the <code>resume-site</code> container.</p>
<p><strong>Cleanup - Stop and Remove the Container</strong></p>
<p>To stop the container:</p>
<pre><code class="lang-plaintext">docker stop resume-site
</code></pre>
<p>To remove the container:</p>
<pre><code class="lang-plaintext">docker rm resume-site
</code></pre>
<p><strong>Conclusion</strong></p>
<p>You've successfully served a simple HTML resume site using Apache2 in a Docker container!</p>
]]></content:encoded></item><item><title><![CDATA[Git Introduction]]></title><description><![CDATA[Git is the most popular version control system used by developers around the world. It helps track changes in your code, collaborate with others, and manage project versions efficiently. Whether you're working solo or as part of a team, Git ensures y...]]></description><link>https://blog.ceruleancloud.ca/git-introduction</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/git-introduction</guid><category><![CDATA[Git]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Wed, 01 Jan 2025 19:23:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742530590402/27085b15-79c6-4701-b034-0b5b6c2b337a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Git is the most popular version control system used by developers around the world. It helps track changes in your code, collaborate with others, and manage project versions efficiently. Whether you're working solo or as part of a team, Git ensures your work is organized and never accidentally lost.</p>
<p>In this blog, we'll walk you through the following:</p>
<ul>
<li><p><strong>Understanding Git basics.</strong></p>
</li>
<li><p><strong>Installing Git on Ubuntu.</strong></p>
</li>
<li><p><strong>Creating a GitHub repository.</strong></p>
</li>
<li><p><strong>Pushing the</strong> <code>index.html</code> <strong>file</strong> <a target="_blank" href="https://blog.ceruleancloud.ca/deploy-static-resume-on-an-apache-server"><strong>that was shown in this blog</strong></a> <strong>to a GitHub repository.</strong></p>
</li>
</ul>
<h3 id="heading-introduction-to-git"><strong>Introduction to Git</strong></h3>
<h3 id="heading-what-is-git"><strong>What is Git?</strong></h3>
<p>Git is a distributed version control system designed to manage changes in files, especially code, efficiently. It allows developers to:</p>
<ul>
<li><p>Track changes over time.</p>
</li>
<li><p>Collaborate with multiple people without conflicts.</p>
</li>
<li><p>Revert to previous versions if needed.</p>
</li>
</ul>
<h3 id="heading-why-use-git"><strong>Why Use Git?</strong></h3>
<ul>
<li><p><strong>Versioning:</strong> Always have access to previous versions of your code.</p>
</li>
<li><p><strong>Collaboration:</strong> Multiple developers can work on the same project simultaneously.</p>
</li>
<li><p><strong>Backup:</strong> Your codebase is stored safely in remote repositories like GitHub.</p>
</li>
</ul>
<h3 id="heading-installing-git-on-ubuntu"><strong>Installing Git on Ubuntu</strong></h3>
<p>To use Git, you need to install and configure it on your machine.</p>
<p>To follow along with the rest of this blog, I am assuming the following:</p>
<ul>
<li><p><strong>You have a</strong> <a target="_blank" href="https://github.com/"><strong>Github</strong></a> <strong>account</strong></p>
</li>
<li><p><strong>You have a working Ubuntu environment</strong></p>
</li>
</ul>
<p><strong>Update the Package Manager</strong></p>
<p>Open your terminal and update the package manager to ensure you have the latest package information:</p>
<pre><code class="lang-plaintext">sudo apt update
</code></pre>
<p><strong>Install Git</strong></p>
<p>Install Git using the following command:</p>
<pre><code class="lang-plaintext">sudo apt install git
</code></pre>
<p><strong>Verify the Installation</strong></p>
<p>Check if Git is installed correctly by running:</p>
<pre><code class="lang-plaintext">git --version
</code></pre>
<p>You should see something like:</p>
<pre><code class="lang-plaintext">git version 2.x.x
</code></pre>
<p><strong>Configure Git</strong></p>
<p>Set up your identity, which will be attached to your commits. Use the name and email associated with your GitHub account (this is not your password).</p>
<pre><code class="lang-plaintext">git config --global user.name "Your Name"  
git config --global user.email "youremail@example.com"
</code></pre>
<p>To confirm your configuration, run:</p>
<pre><code class="lang-plaintext">git config --list
</code></pre>
<h3 id="heading-creating-a-github-repository"><strong>Creating a GitHub Repository</strong></h3>
<p><strong>GitHub is a cloud-based service that hosts Git repositories. Here’s how to create one:</strong></p>
<p><strong>Log in to GitHub</strong></p>
<p>Go to Github and log in or sign up for an account.</p>
<p><strong>Create a New Repository</strong></p>
<p>Click on the <strong>+</strong> icon in the top-right corner and select <strong>New Repository</strong>.</p>
<ol>
<li><p>Provide a repository name (e.g., <code>my-resume-site</code>).</p>
</li>
<li><p>Choose visibility:</p>
<ul>
<li><p><strong>Public:</strong> Anyone can view it.</p>
</li>
<li><p><strong>Private:</strong> Only you and your collaborators can view it.<br />  <strong><em>PS: I recommend using Private</em></strong></p>
</li>
</ul>
</li>
<li><p>Check “Initialize this repository with a README” option.</p>
</li>
<li><p>Click <strong>Create repository</strong>.</p>
</li>
</ol>
<h3 id="heading-adding-and-pushing-your-indexhtml-file"><strong>Adding and Pushing Your</strong> <code>index.html</code> <strong>File</strong></h3>
<p>Now that your GitHub repository is ready, let’s add and push your <code>index.html</code> file to it.</p>
<h3 id="heading-create-a-local-project-directory"><strong>Create a Local Project Directory</strong></h3>
<ol>
<li><p>Move to your home directory</p>
<pre><code class="lang-plaintext"> cd ~
</code></pre>
</li>
<li><p>Create a new folder for your project and move into it:</p>
<pre><code class="lang-plaintext"> mkdir my-resume-site 
 cd my-resume-site
</code></pre>
</li>
</ol>
<h3 id="heading-create-the-indexhtml-file"><strong>Create the</strong> <code>index.html</code> File</h3>
<ol>
<li><p>Create a simple <code>index.html</code> file:</p>
<pre><code class="lang-plaintext"> vi index.html
</code></pre>
<p> Copy paste the index.html file contents from <a target="_blank" href="https://blog.ceruleancloud.ca/deploy-static-resume-on-an-apache-server">this blog</a> to this file and save it.</p>
</li>
<li><p>Confirm file exists</p>
<pre><code class="lang-plaintext"> cat index.html
</code></pre>
</li>
</ol>
<h3 id="heading-initialize-git"><strong>Initialize Git</strong></h3>
<p>Initialize Git in your project directory:</p>
<pre><code class="lang-plaintext">git init
</code></pre>
<h3 id="heading-stage-your-file"><strong>Stage Your File</strong></h3>
<p>Add the <code>index.html</code> file to Git’s staging area:</p>
<pre><code class="lang-plaintext">git add index.html
</code></pre>
<h3 id="heading-commit-your-changes"><strong>Commit Your Changes</strong></h3>
<p>Save your changes to Git with a meaningful message:</p>
<pre><code class="lang-plaintext">git commit -m "Initial commit: Add index.html"
</code></pre>
<h3 id="heading-link-to-your-github-repository"><strong>Link to Your GitHub Repository</strong></h3>
<p>Link your local repository to the GitHub repository you created earlier. Replace <code>&lt;yourusername&gt;</code> with your GitHub username, and swap <code>my-resume-site</code> for your repository name if you chose a different one:</p>
<pre><code class="lang-plaintext">git remote add origin https://github.com/&lt;yourusername&gt;/my-resume-site.git
</code></pre>
<h3 id="heading-push-your-code-to-github"><strong>Push Your Code to GitHub</strong></h3>
<p>Push your code to the main branch of your GitHub repository:</p>
<pre><code class="lang-plaintext">git branch -M main  
git push -u origin main
</code></pre>
<h3 id="heading-recap-on-what-we-did-so-far">Recap on what we did so far</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734801058605/5785230a-2b03-4db8-885d-32abc627cf17.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-verifying-your-upload"><strong>Verifying Your Upload</strong></h3>
<ol>
<li><p>Open your GitHub repository in a browser.</p>
</li>
<li><p>You should see your <code>index.html</code> file.</p>
</li>
<li><p>Click on the file to view its contents.</p>
</li>
</ol>
<h3 id="heading-going-further"><strong>Going further</strong></h3>
<p>Now that your code is on GitHub, you can:</p>
<ul>
<li><p>Update the file and push changes using <code>git add</code>, <code>git commit</code>, and <code>git push</code>.</p>
</li>
<li><p>Clone the repository to other machines using <code>git clone</code>.</p>
</li>
<li><p>Collaborate with others by inviting them to your repository.</p>
</li>
</ul>
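<p>To rehearse that update loop end to end without touching your real project, you can run it in a throwaway repository (the directory and commit messages here are just examples; <code>git push</code> would follow once a remote is configured):</p>
<pre><code class="lang-plaintext"># Rehearse the add/commit cycle in a temporary repository
tmpdir=$(mktemp -d)
cd "$tmpdir"
git init -q
git config user.name "Your Name"
git config user.email "youremail@example.com"

echo "resume v1" > index.html
git add index.html
git commit -q -m "Initial commit: Add index.html"

# Edit the file, then stage and commit the change
echo "resume v2" > index.html
git add index.html
git commit -q -m "Update resume content"
git log --oneline
</code></pre>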
<h3 id="heading-git-cheatsheet">Git Cheatsheet</h3>
<p>The table below covers the Git commands you are most likely to reach for in day-to-day development.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Command</strong></td><td><strong>Description</strong></td><td><strong>Example Usage</strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>git init</code></td><td>Initialize a new Git repository</td><td><code>git init</code></td></tr>
<tr>
<td><code>git clone &lt;repo_url&gt;</code></td><td>Clone a repository</td><td><code>git clone</code> <a target="_blank" href="https://github.com/user/repo.git"><code>https://github.com/user/repo.git</code></a></td></tr>
<tr>
<td><code>git status</code></td><td>Show the status of your working directory</td><td><code>git status</code></td></tr>
<tr>
<td><code>git add &lt;file&gt;</code></td><td>Stage changes to be committed</td><td><code>git add file.txt</code></td></tr>
<tr>
<td><code>git add .</code></td><td>Stage all changes in the current directory</td><td><code>git add .</code></td></tr>
<tr>
<td><code>git commit -m "&lt;message&gt;"</code></td><td>Commit staged changes with a message</td><td><code>git commit -m "Added new feature"</code></td></tr>
<tr>
<td><code>git push</code></td><td>Push commits to the remote repository</td><td><code>git push origin main</code></td></tr>
<tr>
<td><code>git pull</code></td><td>Fetch and merge changes from the remote</td><td><code>git pull origin main</code></td></tr>
<tr>
<td><code>git branch</code></td><td>List branches</td><td><code>git branch</code></td></tr>
<tr>
<td><code>git branch &lt;branch_name&gt;</code></td><td>Create a new branch</td><td><code>git branch feature-branch</code></td></tr>
<tr>
<td><code>git checkout &lt;branch&gt;</code></td><td>Switch to a branch</td><td><code>git checkout feature-branch</code></td></tr>
<tr>
<td><code>git checkout -b &lt;branch&gt;</code></td><td>Create and switch to a new branch</td><td><code>git checkout -b feature-branch</code></td></tr>
<tr>
<td><code>git merge &lt;branch&gt;</code></td><td>Merge a branch into the current branch</td><td><code>git merge feature-branch</code></td></tr>
<tr>
<td><code>git log</code></td><td>Show commit history</td><td><code>git log</code></td></tr>
<tr>
<td><code>git log --oneline</code></td><td>Show concise commit history</td><td><code>git log --oneline</code></td></tr>
<tr>
<td><code>git diff</code></td><td>Show changes in tracked files</td><td><code>git diff</code></td></tr>
<tr>
<td><code>git diff &lt;branch1&gt; &lt;branch2&gt;</code></td><td>Compare two branches</td><td><code>git diff main feature-branch</code></td></tr>
<tr>
<td><code>git stash</code></td><td>Temporarily save changes without committing</td><td><code>git stash</code></td></tr>
<tr>
<td><code>git stash pop</code></td><td>Apply the last stashed changes</td><td><code>git stash pop</code></td></tr>
<tr>
<td><code>git reset &lt;file&gt;</code></td><td>Unstage a file</td><td><code>git reset file.txt</code></td></tr>
<tr>
<td><code>git reset --hard &lt;commit&gt;</code></td><td>Reset working directory to a specific commit</td><td><code>git reset --hard abc1234</code></td></tr>
<tr>
<td><code>git remote -v</code></td><td>Show remote repositories</td><td><code>git remote -v</code></td></tr>
<tr>
<td><code>git remote add &lt;name&gt; &lt;url&gt;</code></td><td>Add a remote repository</td><td><code>git remote add origin</code> <a target="_blank" href="https://github.com/user/repo.git"><code>https://github.com/user/repo.git</code></a></td></tr>
<tr>
<td><code>git tag &lt;tag_name&gt;</code></td><td>Create a tag for a specific commit</td><td><code>git tag v1.0.0</code></td></tr>
<tr>
<td><code>git fetch</code></td><td>Fetch changes from remote without merging</td><td><code>git fetch origin</code></td></tr>
<tr>
<td><code>git rebase &lt;branch&gt;</code></td><td>Reapply commits on top of another branch</td><td><code>git rebase main</code></td></tr>
<tr>
<td><code>git cherry-pick &lt;commit&gt;</code></td><td>Apply a specific commit to the current branch</td><td><code>git cherry-pick abc1234</code></td></tr>
</tbody>
</table>
</div><p>By mastering Git, you’ll streamline your development workflow and enhance collaboration on any project.</p>
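<p>To see how these commands fit together, here is a minimal sketch of a feature-branch workflow, run in a throwaway directory (the file name, branch name, and commit messages are placeholders):</p>

```shell
# A minimal feature-branch workflow using the commands from the table above.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "demo@example.com"   # local identity so commits work anywhere
git config user.name "Demo User"
echo "hello" > file.txt
git add file.txt                           # stage the new file
git commit -q -m "Initial commit"
git checkout -q -b feature-branch          # create and switch to a new branch
echo "feature work" >> file.txt
git commit -q -am "Add feature"
git checkout -q -                          # back to the previously checked-out branch
git merge -q feature-branch                # fast-forward merge of the feature work
git log --oneline                          # both commits are now on the main line
```

<p>Here <code>git checkout -</code> returns to the previous branch, which keeps the sketch independent of whether your default branch is named <code>main</code> or <code>master</code>.</p>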
<p>Happy coding!</p>
]]></content:encoded></item><item><title><![CDATA[Kubernetes Introduction]]></title><description><![CDATA[In modern application deployment, Kubernetes has emerged as the go-to platform for container orchestration. Whether you’re deploying microservices, scaling workloads, or managing clusters, Kubernetes offers a robust framework to simplify these tasks....]]></description><link>https://blog.ceruleancloud.ca/kubernetes-introduction</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/kubernetes-introduction</guid><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sun, 29 Dec 2024 00:26:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742530732348/920b755e-0fba-4e70-81a7-595f5075d2fb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In modern application deployment, <a target="_blank" href="https://kubernetes.io/">Kubernetes</a> has emerged as the go-to platform for container orchestration. Whether you’re deploying microservices, scaling workloads, or managing clusters, Kubernetes offers a robust framework to simplify these tasks. Let’s dive into the basics of Kubernetes and understand its core concepts.</p>
<p>Before you begin, you should be familiar with the following:</p>
<ul>
<li><p>Linux Fundamentals</p>
</li>
<li><p>Networking Fundamentals</p>
</li>
<li><p>Basics of application deployment</p>
</li>
</ul>
<p><strong>What is Kubernetes?</strong></p>
<p>Kubernetes, often abbreviated as K8s, is an open-source platform designed to automate deploying, scaling, and operating containerized applications. Initially developed by Google, it’s now maintained by the <a target="_blank" href="https://www.cncf.io/">Cloud Native Computing Foundation</a> (CNCF).</p>
<p>Key highlights of Kubernetes:</p>
<ul>
<li><p><strong>Portable</strong>: Runs on any infrastructure—on-premises, cloud, or hybrid.</p>
</li>
<li><p><strong>Scalable</strong>: Effortlessly handles increased loads by scaling applications up or down.</p>
</li>
<li><p><strong>Self-Healing</strong>: Automatically restarts failed containers, replaces unresponsive pods, and ensures desired application states.</p>
</li>
</ul>
<h3 id="heading-broad-kubernetes-architecture"><strong>Broad Kubernetes Architecture</strong></h3>
<p>A Kubernetes cluster broadly consists of two planes:</p>
<ul>
<li><p>Control Plane</p>
</li>
<li><p>Data Plane</p>
</li>
</ul>
<p><strong>Control Plane:</strong> Consists of the components that manage and control the cluster</p>
<p><strong>Data Plane:</strong> Consists of the worker nodes that actually run your application workloads</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734986240007/d14c1ea5-de12-4359-a71f-b1ea88905fa0.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-core-components-of-kubernetes"><strong>Core Components of Kubernetes</strong></h3>
<p>Understanding the following core components is crucial for working effectively with Kubernetes:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734987846985/2725c8c8-94f8-4159-b079-588cd05653cc.png" alt class="image--center mx-auto" /></p>
<p><strong>1. Nodes</strong></p>
<p>A Kubernetes cluster consists of nodes, the physical or virtual machines that provide compute capacity for your workloads.</p>
<p><strong>2. Pods</strong></p>
<p>Pods are the smallest deployable units in Kubernetes. Each pod wraps one or more containers (e.g., Docker containers) that share:</p>
<ul>
<li><p>Networking (IP address and ports).</p>
</li>
<li><p>Storage (volumes).</p>
</li>
</ul>
<p><strong>3. Services</strong></p>
<p>Services define how to expose a set of pods to the network. They provide stable endpoints for dynamic pod environments, enabling reliable communication within and outside the cluster.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735014106339/1c8a9b69-d4b6-4754-886a-c28d9639a019.png" alt class="image--center mx-auto" /></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735014583623/70aaa323-5c3e-414b-b138-91f6ce67bf7e.png" alt class="image--center mx-auto" /></p>
<p>Common service types include:</p>
<ul>
<li><p><strong>ClusterIP</strong>: Internal communication within the cluster.</p>
</li>
<li><p><strong>NodePort</strong>: Exposes the service on each node’s IP at a static port.</p>
</li>
<li><p><strong>LoadBalancer</strong>: Integrates with cloud providers to provide external load balancing.</p>
</li>
</ul>
<p><strong>4. Deployments</strong></p>
<p>Deployments manage the desired state of applications. They ensure a specified number of pods run at any given time and enable rolling updates or rollbacks. Under the hood, a Deployment manages a ReplicaSet, which in turn maintains the pods.</p>
<p><strong>5. ConfigMaps and Secrets</strong></p>
<ul>
<li><p><strong>ConfigMaps</strong>: Store non-sensitive configuration data like environment variables.</p>
</li>
<li><p><strong>Secrets</strong>: Securely manage sensitive data such as passwords, tokens, or certificates.</p>
</li>
</ul>
<p><strong>6. Ingress</strong></p>
<p>Ingress manages external access to services within the cluster, typically HTTP or HTTPS. It provides features like URL routing and SSL termination.</p>
<p><strong>7. Namespaces</strong></p>
<p>Namespaces are virtual clusters within a physical cluster, allowing resource isolation for different teams or environments (e.g., dev, test, prod). Think of a namespace as a logical grouping of resources. For instance, the namespace “frontend” may group all the pods running various frontend microservices.</p>
<h3 id="heading-how-kubernetes-works"><strong>How Kubernetes Works</strong></h3>
<p>At its core, Kubernetes follows a declarative model:</p>
<p><strong>Define the desired state</strong>: Use YAML or JSON manifests to specify how applications should run.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735014826544/d7f49269-09ce-4adc-ae47-752853a2d33c.png" alt class="image--center mx-auto" /></p>
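<p>As an illustration, a minimal Deployment manifest might look like this (the name, labels, and image below are placeholders, not from any specific cluster):</p>

```yaml
# Hypothetical example: a Deployment that keeps three nginx pods running.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
spec:
  replicas: 3                 # desired state: three identical pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27   # container image each pod runs
          ports:
            - containerPort: 80
```

<p>Applying this with <code>kubectl apply -f deployment.yaml</code> hands the desired state to the API server; from then on, Kubernetes continuously works to keep three such pods running.</p>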
<p><strong>Kubernetes reconciles the state</strong>: The control plane ensures the actual state matches the desired state by scheduling pods, replacing failed ones, and scaling workloads.</p>
<p><strong>Key Processes:</strong></p>
<ul>
<li><p><strong>API Server</strong>: Handles requests from users and external systems.</p>
</li>
<li><p><strong>Scheduler</strong>: Assigns workloads to nodes based on resources.</p>
</li>
<li><p><strong>Controller Manager</strong>: Ensures cluster components are functioning correctly.</p>
</li>
<li><p><strong>Kubelet</strong>: Runs on worker nodes to manage pods.</p>
</li>
<li><p><strong>Kube-Proxy</strong>: Manages network rules and communication.</p>
</li>
</ul>
<p><strong>Common Use Cases</strong></p>
<ol>
<li><p><strong>Microservices</strong>: Kubernetes simplifies deploying, scaling, and managing interconnected services.</p>
</li>
<li><p><strong>CI/CD Pipelines</strong>: Automate application builds, tests, and deployments.</p>
</li>
<li><p><strong>Hybrid and Multi-Cloud</strong>: Build resilient applications spanning multiple environments.</p>
</li>
<li><p><strong>Batch Processing</strong>: Manage distributed, high-performance workloads.</p>
</li>
</ol>
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>Kubernetes is a game-changer for managing containerized applications at scale. While it has a steep learning curve, mastering its fundamentals sets the stage for building scalable, resilient, and efficient cloud-native applications. Whether you’re a developer or a DevOps engineer, understanding Kubernetes basics is an essential step in modern cloud computing.</p>
]]></content:encoded></item><item><title><![CDATA[Deploy static resume on an Apache Server]]></title><description><![CDATA[Apache Web server is a free open server that serves web requests over HTTP. It is one of the widely used web servers in the internet and is known for being secure and reliable.
In this blog, we will deploy this server in Ubuntu and deploy a static si...]]></description><link>https://blog.ceruleancloud.ca/deploy-static-resume-on-an-apache-server</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/deploy-static-resume-on-an-apache-server</guid><category><![CDATA[cloud-basics]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Tue, 17 Dec 2024 12:57:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742530858172/7822a585-a9c8-4ad9-a6b9-87cff059983c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a target="_blank" href="https://httpd.apache.org/">Apache Web server</a> is a free, open-source web server that serves requests over HTTP. It is one of the most widely used web servers on the internet and is known for being secure and reliable.</p>
<p>In this blog, we will install Apache on Ubuntu and deploy a static site containing a sample resume.</p>
<p><strong><em>Feel free to update its contents to make it yours.</em></strong></p>
<h3 id="heading-prerequisites">Prerequisites</h3>
<ol>
<li><p>An Ubuntu machine (local or cloud instance). Use this <a target="_blank" href="https://cerulean-cloud-blogs.hashnode.dev/windows-subsystem-for-linux">link</a> to setup WSL if you wish to do so.</p>
</li>
<li><p>Basic knowledge of working on Linux terminal. Use this <a target="_blank" href="https://cerulean-cloud-blogs.hashnode.dev/linux-command-line-interface-cheatsheet">link</a> to familiarize yourself with basic commands.</p>
</li>
</ol>
<h3 id="heading-install-apache-web-server">Install Apache Web Server</h3>
<p>Apache is one of the most popular web servers and is easy to set up on Ubuntu.</p>
<ol>
<li><p><strong>Update package repositories:</strong><br /> Open your terminal and run the following command to ensure all packages are up-to-date:</p>
<pre><code class="lang-plaintext"> sudo apt update
 sudo apt upgrade -y
</code></pre>
</li>
<li><p><strong>Install Apache:</strong><br /> Install Apache using the <code>apt</code> package manager:</p>
<pre><code class="lang-plaintext"> sudo apt install apache2 -y
</code></pre>
</li>
<li><p><strong>Start and enable Apache service:</strong><br /> Ensure the Apache service is running and set to start on boot:</p>
<pre><code class="lang-plaintext"> sudo systemctl start apache2
 sudo systemctl enable apache2
</code></pre>
</li>
<li><p><strong>Verify installation:</strong><br /> Open a browser and visit your server’s IP address (or <a target="_blank" href="http://localhost"><code>http://localhost</code></a> if you're running it locally). You should see the default Apache welcome page.</p>
</li>
</ol>
<h3 id="heading-create-your-static-html-resume">Create Your Static HTML Resume</h3>
<p>Next, we'll create a basic HTML file for your resume.</p>
<ol>
<li><p><strong>Navigate to the Apache web root directory:</strong><br /> By default, Apache serves files from <code>/var/www/html</code>. Navigate there:</p>
<pre><code class="lang-plaintext"> cd /var/www/html
</code></pre>
</li>
<li><p><strong>Backup the default index file:</strong></p>
<pre><code class="lang-plaintext"> sudo mv index.html index.html.bak
</code></pre>
</li>
<li><p><strong>Create a new</strong> <code>index.html</code> file:<br /> Use a text editor like <code>nano</code> to create your resume file:</p>
<pre><code class="lang-plaintext"> sudo nano index.html
</code></pre>
</li>
<li><p><strong>Add basic HTML content for your resume:</strong><br /> Paste the following example content:</p>
<pre><code class="lang-xml"> <span class="hljs-meta">&lt;!DOCTYPE <span class="hljs-meta-keyword">html</span>&gt;</span>
 <span class="hljs-tag">&lt;<span class="hljs-name">html</span> <span class="hljs-attr">lang</span>=<span class="hljs-string">"en"</span>&gt;</span>
 <span class="hljs-tag">&lt;<span class="hljs-name">head</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">charset</span>=<span class="hljs-string">"UTF-8"</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">meta</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"viewport"</span> <span class="hljs-attr">content</span>=<span class="hljs-string">"width=device-width, initial-scale=1.0"</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">title</span>&gt;</span>My Resume<span class="hljs-tag">&lt;/<span class="hljs-name">title</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">style</span>&gt;</span><span class="css">
         <span class="hljs-selector-tag">body</span> {
             <span class="hljs-attribute">font-family</span>: Arial, sans-serif;
             <span class="hljs-attribute">margin</span>: <span class="hljs-number">20px</span>;
             <span class="hljs-attribute">line-height</span>: <span class="hljs-number">1.6</span>;
         }
         <span class="hljs-selector-tag">h1</span> {
             <span class="hljs-attribute">text-align</span>: center;
         }
         <span class="hljs-selector-class">.section</span> {
             <span class="hljs-attribute">margin-bottom</span>: <span class="hljs-number">20px</span>;
         }
     </span><span class="hljs-tag">&lt;/<span class="hljs-name">style</span>&gt;</span>
 <span class="hljs-tag">&lt;/<span class="hljs-name">head</span>&gt;</span>
 <span class="hljs-tag">&lt;<span class="hljs-name">body</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">h1</span>&gt;</span>John Doe<span class="hljs-tag">&lt;/<span class="hljs-name">h1</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Email: john.doe@example.com | Phone: (123) 456-7890<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">hr</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"section"</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Professional Summary<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Motivated and detail-oriented professional with experience in web development, system administration, and cloud technologies. Passionate about delivering impactful solutions and learning new technologies.<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
     <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"section"</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Experience<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">h3</span>&gt;</span>Software Engineer, ABC Corp<span class="hljs-tag">&lt;/<span class="hljs-name">h3</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>Jan 2020 - Present<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">ul</span>&gt;</span>
             <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Developed and maintained scalable web applications using modern frameworks.<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
             <span class="hljs-tag">&lt;<span class="hljs-name">li</span>&gt;</span>Implemented CI/CD pipelines to automate deployments.<span class="hljs-tag">&lt;/<span class="hljs-name">li</span>&gt;</span>
         <span class="hljs-tag">&lt;/<span class="hljs-name">ul</span>&gt;</span>
     <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
     <span class="hljs-tag">&lt;<span class="hljs-name">div</span> <span class="hljs-attr">class</span>=<span class="hljs-string">"section"</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">h2</span>&gt;</span>Education<span class="hljs-tag">&lt;/<span class="hljs-name">h2</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">h3</span>&gt;</span>Bachelor of Science in Computer Science<span class="hljs-tag">&lt;/<span class="hljs-name">h3</span>&gt;</span>
         <span class="hljs-tag">&lt;<span class="hljs-name">p</span>&gt;</span>XYZ University, 2019<span class="hljs-tag">&lt;/<span class="hljs-name">p</span>&gt;</span>
     <span class="hljs-tag">&lt;/<span class="hljs-name">div</span>&gt;</span>
 <span class="hljs-tag">&lt;/<span class="hljs-name">body</span>&gt;</span>
 <span class="hljs-tag">&lt;/<span class="hljs-name">html</span>&gt;</span>
</code></pre>
</li>
<li><p><strong>Save and exit:</strong><br /> Press <code>Ctrl+O</code> to save and <code>Ctrl+X</code> to exit the editor.</p>
</li>
</ol>
<p><strong><em>PS: Alternatively, if you are using WSL, you can create the index.html file in Notepad and simply copy it into your Linux directory using Windows File Explorer.</em></strong></p>
<h3 id="heading-test-your-website">Test Your Website</h3>
<ol>
<li><p><strong>Set correct permissions:</strong><br /> Ensure the file has the right permissions:</p>
<pre><code class="lang-plaintext"> sudo chmod 644 /var/www/html/index.html
</code></pre>
</li>
<li><p><strong>Access your website:</strong><br /> Open a browser and navigate to your server’s IP or domain name (e.g. <code>http://127.0.0.1</code>). You should see your resume displayed.</p>
</li>
</ol>
<p>In just a few steps, you’ve set up an Apache web server and hosted your static HTML resume. This is a great starting point for understanding how web apps are hosted in real life.</p>
<p>Cheers!</p>
]]></content:encoded></item><item><title><![CDATA[Firewalls and Firewalld in Ubuntu]]></title><description><![CDATA[In today's interconnected world, protecting your systems from unauthorized access is non-negotiable. This is where firewalls come in, serving as your first line of defense against potential cyber threats. If you're using Ubuntu and want to efficientl...]]></description><link>https://blog.ceruleancloud.ca/firewalls-and-firewalld-in-ubuntu</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/firewalls-and-firewalld-in-ubuntu</guid><category><![CDATA[firewall]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sun, 15 Dec 2024 00:30:29 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742530958463/3f0ce81a-65bb-4b27-94a4-a008d0781e60.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In today's interconnected world, protecting your systems from unauthorized access is non-negotiable. This is where <strong>firewalls</strong> come in, serving as your first line of defense against potential cyber threats. If you're using Ubuntu and want to efficiently manage your firewall, <strong>Firewalld</strong> is a tool you should know about. In this post, we'll explore what firewalls are, why they matter, and how to use Firewalld to secure your Ubuntu system.</p>
<h3 id="heading-what-is-a-firewall"><strong>What is a Firewall?</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734221864441/e2496a07-02d8-41b4-9830-47bea4f02d6d.png" alt class="image--center mx-auto" /></p>
<p>A <strong>firewall</strong> is like a security guard for your network. It monitors traffic entering and leaving your system and decides whether to allow or block it based on predefined rules. Firewalls are crucial for:</p>
<ul>
<li><p><strong>Blocking Unwanted Traffic:</strong> Preventing hackers and malicious programs from gaining access.</p>
</li>
<li><p><strong>Allowing Trusted Services:</strong> Ensuring that essential services like SSH or web servers are accessible.</p>
</li>
<li><p><strong>Monitoring and Controlling Traffic:</strong> Keeping tabs on what’s happening in your network.</p>
</li>
</ul>
<p>Think of it as a filter that only lets the right people in while keeping the wrong ones out.</p>
<p>Now, let’s experiment with <strong>Firewalld</strong>, a powerful and flexible firewall management tool for Linux.</p>
<p>Firewalld is a dynamic firewall. It allows you to make real-time changes without restarting the entire service. Here are a few reasons to use Firewalld:</p>
<ul>
<li><p><strong>Zone-based Configuration:</strong> You can define trust levels for network connections using zones.</p>
</li>
<li><p><strong>Real-time Changes:</strong> Add or remove rules without interrupting active connections.</p>
</li>
<li><p><strong>Integration with iptables:</strong> It simplifies the complexity of managing iptables directly.</p>
</li>
</ul>
<h3 id="heading-install-firewalld"><strong>Install Firewalld</strong></h3>
<p>Firewalld isn’t installed by default on Ubuntu, but you can get it up and running in just a few steps:</p>
<pre><code class="lang-plaintext">sudo apt update  
sudo apt install firewalld  
sudo systemctl start firewalld  
sudo systemctl enable firewalld
</code></pre>
<p>These commands update your package lists, install Firewalld, and ensure it starts automatically whenever your system boots.</p>
<h3 id="heading-checking-firewalld-status"><strong>Checking Firewalld Status</strong></h3>
<p>Once installed, verify that Firewalld is active:</p>
<pre><code class="lang-plaintext">sudo systemctl status firewalld
</code></pre>
<p>If it’s running, you’ll see “active (running).”</p>
<h3 id="heading-firewalld-basics"><strong>Firewalld Basics</strong></h3>
<p>Now that you have Firewalld installed, we will quickly run through some basics about configuring it.</p>
<p><strong>Zones in Firewalld</strong></p>
<p>Zones are the core of Firewalld’s functionality. They define trust levels for your network connections. Common zones include:</p>
<ul>
<li><p><strong>Public:</strong> For untrusted networks, like public Wi-Fi.</p>
</li>
<li><p><strong>Home:</strong> For trusted networks, like your private Wi-Fi.</p>
</li>
<li><p><strong>Work:</strong> For office networks with medium trust levels.</p>
</li>
</ul>
<p>You can view active zones with:</p>
<pre><code class="lang-plaintext">sudo firewall-cmd --get-active-zones
</code></pre>
<p><strong>Allowing a Service</strong></p>
<p>Let’s say you’re hosting a website and need to allow HTTP traffic. Run:</p>
<pre><code class="lang-plaintext">sudo firewall-cmd --add-service=http --permanent  
sudo firewall-cmd --reload
</code></pre>
<p><strong>Blocking a Service</strong></p>
<p>To block a service like SSH, use:</p>
<pre><code class="lang-plaintext">sudo firewall-cmd --remove-service=ssh --permanent  
sudo firewall-cmd --reload
</code></pre>
<h3 id="heading-practical-example-securing-a-web-server"><strong>Practical Example: Securing a Web Server</strong></h3>
<p>Imagine you’re hosting a web application on an Ubuntu server.</p>
<p><em>PS: Instead of imagining, you can run a simple site in an Apache server and try securing it.</em></p>
<p>Here’s how you can secure it with Firewalld:</p>
<ol>
<li><strong>Allow HTTP Traffic:</strong></li>
</ol>
<pre><code class="lang-plaintext">sudo firewall-cmd --add-service=http --permanent  
sudo firewall-cmd --reload
</code></pre>
<ol start="2">
<li><strong>Block Other Traffic:</strong></li>
</ol>
<p>Remove unnecessary services (e.g., SSH). Be careful: removing SSH on a remote server you are connected to will lock you out.</p>
<pre><code class="lang-plaintext">sudo firewall-cmd --remove-service=ssh --permanent  
sudo firewall-cmd --reload
</code></pre>
<ol start="3">
<li><strong>Verify Rules:</strong></li>
</ol>
<p>Check active rules to ensure only HTTP is allowed:</p>
<pre><code class="lang-plaintext">sudo firewall-cmd --list-all
</code></pre>
<p>This will display the active zone and its rules.</p>
<p>Firewalls are essential for securing your systems. Take a moment today to implement these rules yourself and experiment with them.</p>
]]></content:encoded></item><item><title><![CDATA[Networking Crash Course for the cloud]]></title><description><![CDATA[Networking is the backbone of cloud computing. Whether you're deploying applications, managing virtual networks, or securing data, understanding core networking concepts is essential. In this crash course, we'll explore key networking topics as follo...]]></description><link>https://blog.ceruleancloud.ca/networking-crash-course-for-the-cloud</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/networking-crash-course-for-the-cloud</guid><category><![CDATA[networking]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Thu, 12 Dec 2024 03:16:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742531083123/1c29ca9a-d2d9-414b-8c29-a66f65139bd5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Networking is the backbone of cloud computing. Whether you're deploying applications, managing virtual networks, or securing data, understanding core networking concepts is essential. In this crash course, we'll explore key networking topics as follows:</p>
<ul>
<li><p><strong>OSI Model</strong></p>
</li>
<li><p><strong>TCP/IP Model</strong></p>
</li>
<li><p><strong>What is TCP?</strong></p>
</li>
<li><p><strong>What is UDP?</strong></p>
</li>
<li><p><strong>TCP vs UDP</strong></p>
</li>
<li><p><strong>DNS</strong></p>
</li>
<li><p><strong>SSH</strong></p>
</li>
<li><p><strong>IP Address and Subnetting</strong></p>
</li>
<li><p><strong>IP Classes</strong></p>
</li>
<li><p><strong>CIDR</strong></p>
</li>
</ul>
<h3 id="heading-osi-model"><strong>OSI Model</strong></h3>
<p>The <strong>OSI (Open Systems Interconnection)</strong> model provides a conceptual framework for understanding how data flows through a network. It has <strong>7 layers</strong>, each with specific responsibilities:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733323948043/c4a5774a-555e-4ac5-946d-5a7bc4287223.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Physical Layer</strong><br /> Manages the transmission of raw data bits over physical media (e.g., cables, Wi-Fi).<br /> <em>Example</em>: Ethernet cables.</p>
</li>
<li><p><strong>Data Link Layer</strong><br /> Ensures error-free data transfer between adjacent nodes. Includes MAC addresses.<br /> <em>Example</em>: Ethernet, Wi-Fi (802.11).</p>
</li>
<li><p><strong>Network Layer</strong><br /> Handles routing and addressing using IP addresses.<br /> <em>Example</em>: IPv4, IPv6.</p>
</li>
<li><p><strong>Transport Layer</strong><br /> Ensures reliable delivery (TCP) or faster, connectionless transfer (UDP).<br /> <em>Example</em>: TCP, UDP.</p>
</li>
<li><p><strong>Session Layer</strong><br /> Manages sessions and controls connections between applications.<br /> <em>Example</em>: Remote Desktop Protocol (RDP).</p>
</li>
<li><p><strong>Presentation Layer</strong><br /> Translates data formats for applications. Handles encryption and compression.<br /> <em>Example</em>: SSL/TLS encryption.</p>
</li>
<li><p><strong>Application Layer</strong><br /> Interfaces directly with users.<br /> <em>Example</em>: HTTP, FTP, DNS.</p>
</li>
</ol>
<h3 id="heading-tcpip-model"><strong>TCP/IP Model</strong></h3>
<p>The <strong>TCP/IP model</strong> is essentially a condensed version of the OSI model, widely used in modern networking.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733324443958/9681d071-5f10-47e3-a62c-2e17637bbd99.png" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>Link Layer</strong><br /> Combines the OSI Physical and Data Link layers.<br /> <em>Example</em>: Ethernet.</p>
</li>
<li><p><strong>Internet Layer</strong><br /> Maps to the OSI Network layer. Handles IP addressing and routing.<br /> <em>Example</em>: IPv4, IPv6.</p>
</li>
<li><p><strong>Transport Layer</strong><br /> Supports reliable (TCP) or best-effort (UDP) delivery of data.<br /> <em>Example</em>: TCP, UDP.</p>
</li>
<li><p><strong>Application Layer</strong><br /> Combines OSI Application, Presentation, and Session layers.<br /> <em>Example</em>: HTTP, FTP.</p>
</li>
</ol>
<h3 id="heading-tcp-vs-udp"><strong>TCP vs. UDP</strong></h3>
<p><strong>What is TCP?</strong></p>
<p>Transmission Control Protocol (TCP) is a communications standard that allows devices and applications to exchange data over a network. It is a fundamental protocol of the Internet protocol suite. TCP is responsible for ensuring that data is delivered <strong>reliably</strong> and in the correct order.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733357992070/2670fe74-b57d-492e-9201-8fe39f22489e.png" alt class="image--center mx-auto" /></p>
<p><strong>TCP (Transmission Control Protocol)</strong></p>
<ul>
<li><p>Ensures reliable data delivery through acknowledgment and retransmission.</p>
</li>
<li><p>Establishes a connection using a three-way handshake.</p>
</li>
<li><p>Suitable for applications needing accuracy (e.g., web browsing, file transfers).</p>
</li>
</ul>
<p><strong>What is UDP?</strong></p>
<p>UDP stands for User Datagram Protocol, a communication protocol used to send data between computers on a network. UDP is often used for time-sensitive applications that require speed over reliability.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733358174949/645add88-b525-42f8-a803-e7c4ec4c88da.png" alt class="image--center mx-auto" /></p>
<p><strong>UDP</strong></p>
<ul>
<li><p>Focuses on speed and low latency.</p>
</li>
<li><p>Does not guarantee delivery, order, or error correction.</p>
</li>
<li><p>Ideal for real-time applications (e.g., video streaming, gaming).</p>
</li>
</ul>
<p><strong>Comparison Table</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Feature</strong></td><td><strong>TCP</strong></td><td><strong>UDP</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Reliability</td><td>Reliable</td><td>Unreliable</td></tr>
<tr>
<td>Speed</td><td>Slower</td><td>Faster</td></tr>
<tr>
<td>Use Cases</td><td>Web, file transfer</td><td>Streaming, gaming</td></tr>
</tbody>
</table>
</div><h3 id="heading-domain-name-system"><strong>Domain Name System</strong></h3>
<p><strong>Domain Name System (DNS)</strong> is the "phonebook" of the internet, translating human-friendly domain names like <a target="_blank" href="http://example.com"><code>example.com</code></a> into IP addresses such as <code>192.168.4.4</code>. This is required because nodes on a network can only reach each other by IP address, not by name.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733957808307/3e6a7637-2060-46dd-ba74-2c403efb11a5.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p><strong>Key Components</strong>:</p>
<ul>
<li><p><strong>DNS Servers</strong>: Store mappings of domain names to IPs.</p>
</li>
<li><p><strong>DNS Records</strong>: Types include:</p>
<ul>
<li><p><strong>A Record</strong>: Maps domain to IPv4.</p>
</li>
<li><p><strong>AAAA Record</strong>: Maps domain to IPv6.</p>
</li>
<li><p><strong>CNAME</strong>: Points to another domain.</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
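<p>You can trigger a DNS lookup yourself from Python's standard library. As a minimal sketch (resolving <code>localhost</code> so it works even without internet access):</p>
<pre><code class="lang-python">import socket

# Ask the system resolver to translate a name into an IPv4 address.
ip = socket.gethostbyname("localhost")
print(ip)  # typically 127.0.0.1
</code></pre>
<p>Swapping in a real domain name would send an actual query to your configured DNS servers.</p>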
<h3 id="heading-ssh">SSH</h3>
<p>Secure Shell (SSH) is a network protocol that allows users to securely access and manage remote computers and systems over an unsecured network by encrypting the traffic between client and server. It is commonly used in Linux environments.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733958381846/51aca3d0-1c00-40aa-84ae-8ff2fb96b1d1.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-ip-addressing-and-subnetting"><strong>IP Addressing and Subnetting</strong></h3>
<h3 id="heading-ip-addressing"><strong>IP Addressing</strong></h3>
<p>An <strong>IP address</strong> is a unique identifier for devices on a network. It comes in two versions:</p>
<ul>
<li><p><strong>IPv4</strong>: 32-bit (e.g., <code>192.168.1.1</code>).</p>
</li>
<li><p><strong>IPv6</strong>: 128-bit (e.g., <code>2001:0db8::1</code>).</p>
</li>
</ul>
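<p>The size difference between the two versions is easy to see with Python's standard-library <code>ipaddress</code> module. A quick sketch:</p>
<pre><code class="lang-python">import ipaddress

v4 = ipaddress.ip_address("192.168.1.1")
v6 = ipaddress.ip_address("2001:0db8::1")
print(v4.version, v4.max_prefixlen)  # 4 32  -- a 32-bit address
print(v6.version, v6.max_prefixlen)  # 6 128 -- a 128-bit address
print(v6.exploded)  # the :: shorthand expanded to all eight groups
</code></pre>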
<h3 id="heading-subnetting"><strong>Subnetting</strong></h3>
<p><strong>Subnetting</strong> divides a network into smaller subnetworks, improving efficiency and security.</p>
<ul>
<li><p><strong>Subnet Mask</strong>: Determines the network and host portions of an IP.<br />  <em>Example</em>:</p>
<ul>
<li><p>IP: <code>192.168.1.1</code></p>
</li>
<li><p>Subnet Mask: <code>255.255.255.0</code></p>
</li>
<li><p>Network: <code>192.168.1.0</code></p>
</li>
<li><p>Hosts: <code>192.168.1.1</code> to <code>192.168.1.254</code></p>
</li>
</ul>
</li>
</ul>
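<p>The same arithmetic can be checked with the standard-library <code>ipaddress</code> module; this sketch recomputes the example above:</p>
<pre><code class="lang-python">import ipaddress

# A /24 network expressed with its dotted-decimal subnet mask
net = ipaddress.ip_network("192.168.1.0/255.255.255.0")
print(net.network_address)  # 192.168.1.0
print(net.netmask)          # 255.255.255.0
hosts = list(net.hosts())   # usable hosts; network and broadcast are excluded
print(hosts[0], hosts[-1])  # 192.168.1.1 192.168.1.254
print(len(hosts))           # 254
</code></pre>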
<h3 id="heading-ip-classes"><strong>IP Classes</strong></h3>
<p>IP addresses are divided into <strong>classes</strong> to categorize networks based on their size and usage. IP classes were part of the original design of IPv4 and are used to define ranges of IP addresses. Below are the key classes:</p>
<p><strong>Class A</strong></p>
<ul>
<li><p><strong>Range:</strong> <code>1.0.0.0</code> to <code>126.255.255.255</code></p>
</li>
<li><p><strong>Default Subnet Mask:</strong> <code>255.0.0.0</code> (or <code>/8</code>)</p>
</li>
<li><p><strong>Purpose:</strong> Very large networks, typically used by organizations with a huge number of devices.</p>
</li>
<li><p><strong>Addressing:</strong></p>
<ul>
<li><p>The <strong>first octet</strong> represents the network.</p>
</li>
<li><p>The remaining three octets represent the host.</p>
</li>
</ul>
</li>
</ul>
<p>Example: <code>10.0.0.1</code><br />Network: <code>10.0.0.0</code><br />Hosts: Over 16 million addresses.</p>
<p><strong>Class B</strong></p>
<ul>
<li><p><strong>Range:</strong> <code>128.0.0.0</code> to <code>191.255.255.255</code></p>
</li>
<li><p><strong>Default Subnet Mask:</strong> <code>255.255.0.0</code> (or <code>/16</code>)</p>
</li>
<li><p><strong>Purpose:</strong> Medium-sized networks, such as universities or large businesses.</p>
</li>
<li><p><strong>Addressing:</strong></p>
<ul>
<li><p>The <strong>first two octets</strong> represent the network.</p>
</li>
<li><p>The last two octets represent the host.</p>
</li>
</ul>
</li>
</ul>
<p>Example: <code>172.16.0.1</code><br />Network: <code>172.16.0.0</code><br />Hosts: About 65,000 addresses.</p>
<p><strong>Class C</strong></p>
<ul>
<li><p><strong>Range:</strong> <code>192.0.0.0</code> to <code>223.255.255.255</code></p>
</li>
<li><p><strong>Default Subnet Mask:</strong> <code>255.255.255.0</code> (or <code>/24</code>)</p>
</li>
<li><p><strong>Purpose:</strong> Small networks, such as small businesses.</p>
</li>
<li><p><strong>Addressing:</strong></p>
<ul>
<li><p>The <strong>first three octets</strong> represent the network.</p>
</li>
<li><p>The last octet represents the host.</p>
</li>
</ul>
</li>
</ul>
<p>Example: <code>192.168.1.1</code><br />Network: <code>192.168.1.0</code><br />Hosts: Up to 254 addresses.</p>
<p><strong>Class D</strong></p>
<ul>
<li><p><strong>Range:</strong> <code>224.0.0.0</code> to <code>239.255.255.255</code></p>
</li>
<li><p><strong>Purpose:</strong> Reserved for <strong>multicasting</strong> (sending data to multiple hosts simultaneously).</p>
</li>
<li><p><strong>Addressing:</strong> Does not use subnetting.</p>
</li>
</ul>
<p><strong>Class E</strong></p>
<ul>
<li><p><strong>Range:</strong> <code>240.0.0.0</code> to <code>255.255.255.255</code></p>
</li>
<li><p><strong>Purpose:</strong> Reserved for <strong>experimental</strong> purposes. Not used for general networking.</p>
</li>
</ul>
<p><strong>Special Ranges</strong></p>
<ol>
<li><p><strong>Private IP Addresses:</strong></p>
<ul>
<li><p>Reserved for internal use within a network.</p>
</li>
<li><p>Class A: <code>10.0.0.0</code> to <code>10.255.255.255</code></p>
</li>
<li><p>Class B: <code>172.16.0.0</code> to <code>172.31.255.255</code></p>
</li>
<li><p>Class C: <code>192.168.0.0</code> to <code>192.168.255.255</code></p>
</li>
</ul>
</li>
<li><p><strong>Loopback Address:</strong> <code>127.0.0.0</code> to <code>127.255.255.255</code> (used for testing and diagnostics).</p>
</li>
<li><p><strong>APIPA:</strong> <code>169.254.0.0</code> to <code>169.254.255.255</code> (used for automatic addressing when DHCP fails).</p>
</li>
</ol>
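<p>Python's <code>ipaddress</code> module knows about these special ranges, which makes for a quick sanity check:</p>
<pre><code class="lang-python">import ipaddress

print(ipaddress.ip_address("10.1.2.3").is_private)          # True  (Class A private)
print(ipaddress.ip_address("172.20.0.5").is_private)        # True  (Class B private)
print(ipaddress.ip_address("192.168.0.7").is_private)       # True  (Class C private)
print(ipaddress.ip_address("127.0.0.1").is_loopback)        # True  (loopback)
print(ipaddress.ip_address("169.254.10.20").is_link_local)  # True  (APIPA)
print(ipaddress.ip_address("8.8.8.8").is_private)           # False (public address)
</code></pre>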
<p><strong>Classes are less relevant today for the following reasons:</strong></p>
<ul>
<li><p><strong>Classless Inter-Domain Routing (CIDR):</strong> Modern IP address allocation uses CIDR, which allows flexible subnetting regardless of class.</p>
</li>
<li><p><strong>IPv6:</strong> The introduction of IPv6 reduces reliance on IPv4 classes.</p>
</li>
</ul>
<h3 id="heading-cidr-efficient-ip-allocation"><strong>CIDR: Efficient IP Allocation</strong></h3>
<p><strong>CIDR (Classless Inter-Domain Routing)</strong> simplifies IP allocation using a <strong>prefix notation</strong> (e.g., <code>/24</code>).</p>
<ul>
<li><p><strong>Example</strong>: <code>192.168.1.0/24</code></p>
<ul>
<li><p><code>/24</code> means the first 24 bits define the network.</p>
</li>
<li><p>Host range: <code>192.168.1.1</code> to <code>192.168.1.254</code>.</p>
</li>
</ul>
</li>
</ul>
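<p>CIDR's flexibility is that any prefix length works, not just the classful /8, /16, and /24. As a sketch with the <code>ipaddress</code> module, a /24 can be carved into four /26 subnets:</p>
<pre><code class="lang-python">import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
subnets = list(net.subnets(new_prefix=26))
for sub in subnets:
    print(sub, "-", sub.num_addresses, "addresses")
# 192.168.1.0/26, 192.168.1.64/26, 192.168.1.128/26, 192.168.1.192/26
</code></pre>
<p>This is exactly the kind of carving you do when splitting a VPC's address range into smaller subnets.</p>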
<p><strong>Why It Matters in the Cloud</strong><br />CIDR notation appears throughout cloud networking: it defines the address ranges of <strong>VPCs</strong> in AWS and <strong>VNets</strong> in Azure, and it is used when configuring routes for <strong>virtual private networks (VPNs)</strong>.</p>
<p>This concludes our networking crash course for the cloud.</p>
]]></content:encoded></item><item><title><![CDATA[Build your first Docker Container]]></title><description><![CDATA[In this blog, let’s dive into building your first Docker image. We aim to create an Ubuntu-based image with Python 3 installed in it.
Why Build Your Docker Image?
Docker images are the foundation of containers. While public images on Docker Hub are c...]]></description><link>https://blog.ceruleancloud.ca/build-your-first-docker-container</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/build-your-first-docker-container</guid><category><![CDATA[containers]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Fri, 29 Nov 2024 15:01:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742531191807/860d500f-d45e-4d4f-ae19-912e87b14143.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog, let’s dive into building your first Docker image. We aim to create an Ubuntu-based image with Python 3 installed in it.</p>
<h2 id="heading-why-build-your-docker-image">Why Build Your Docker Image?</h2>
<p>Docker images are the foundation of containers. While public images on Docker Hub are convenient, building your own image gives you control over the environment. This ensures consistency across your deployments and gives you a deeper understanding of how Docker works.</p>
<p>Ready? Let’s get started!</p>
<p><strong>Set Up Docker</strong></p>
<p>First, make sure Docker is installed and running on your machine.</p>
<p>Follow this blog to install Docker on your Linux/WSL.</p>
<p>Verify installation by running:</p>
<pre><code class="lang-plaintext">docker --version
</code></pre>
<p><strong>Create a</strong> <code>Dockerfile</code></p>
<p>The <code>Dockerfile</code> is a simple text file with instructions to build your Docker image. Create a project directory to keep things tidy:</p>
<pre><code class="lang-plaintext">mkdir docker-python &amp;&amp; cd docker-python
</code></pre>
<p>Now, create a file named <code>Dockerfile</code>:</p>
<p>Note: the file is named <code>Dockerfile</code>, with <strong>no extension</strong>.</p>
<pre><code class="lang-plaintext">touch Dockerfile
</code></pre>
<p><strong>Write the Dockerfile</strong></p>
<p>Here’s what your <code>Dockerfile</code> should look like:</p>
<pre><code class="lang-plaintext"># Use Ubuntu as the base image  
FROM ubuntu:latest  

# Update and install Python 3  
RUN apt-get update &amp;&amp; apt-get install -y python3
</code></pre>
<p><strong>Let’s examine the above file</strong></p>
<ol>
<li><p><code>FROM ubuntu:latest</code>: Starts with the latest Ubuntu base image.</p>
</li>
<li><p><code>RUN</code>: Updates the package list and installs Python 3.</p>
</li>
</ol>
<p><strong>Build the Image</strong></p>
<p>Time to turn the <code>Dockerfile</code> into a Docker image. Run:</p>
<pre><code class="lang-plaintext">docker build -t ubuntu-python3 .
</code></pre>
<p><strong>Command Explanation:</strong></p>
<ul>
<li><p>The <code>-t ubuntu-python3</code> flag tags the image with a name. In our case it’s <code>ubuntu-python3</code></p>
</li>
<li><p>The <code>.</code> specifies the build context (current directory).</p>
</li>
</ul>
<p>If everything goes well, Docker will download the Ubuntu base image and install Python 3.</p>
<p><strong>Verify Your Image</strong></p>
<p>Check if your image is built:</p>
<pre><code class="lang-plaintext">docker images
</code></pre>
<p>You should see <code>ubuntu-python3</code> in the list.</p>
<p><strong>Run a Container</strong></p>
<p>Let’s test our image by running a container. The container’s default shell would exit immediately without a terminal attached, so we run it detached with an interactive terminal (<code>-dit</code>) and name it so we can refer to it later:</p>
<pre><code class="lang-plaintext">docker run -dit --name ubuntu-python3 ubuntu-python3
</code></pre>
<p>If everything goes well, your container should be running now.</p>
<p>Run the following command to confirm.</p>
<pre><code class="lang-plaintext">docker ps -a
</code></pre>
<p>Run the following command to open a shell session inside the container you just created:</p>
<pre><code class="lang-plaintext">docker exec -it ubuntu-python3 /bin/bash
</code></pre>
<p>You should now see a “new terminal”. This is the terminal of your container.</p>
<p>Run <code>python3 -V</code> in the terminal, and you will see the Python version. (The Ubuntu package provides the <code>python3</code> command; a plain <code>python</code> command is not installed by default.)</p>
<p>If everything went well, you just built your first Docker image, ran it as a container and established a session with the container to run a command.</p>
]]></content:encoded></item><item><title><![CDATA[Docker - Hands-on]]></title><description><![CDATA[Before going ahead with installing and playing around with Docker, let’s have a quick recap of the significant differences between Virtual Machines and Containers.
Virtual Machines
Virtual Machines emulate an entire computer, including its own operat...]]></description><link>https://blog.ceruleancloud.ca/docker-hands-on</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/docker-hands-on</guid><category><![CDATA[containers]]></category><category><![CDATA[cloud-basics]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sun, 24 Nov 2024 18:00:16 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742531282324/448b5b8b-3fb7-4617-949b-2e2e0fe8bda2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before going ahead with installing and playing around with Docker, let’s have a quick recap of the significant differences between Virtual Machines and Containers.</p>
<p><strong>Virtual Machines</strong></p>
<p>Virtual Machines emulate an entire computer, including its own operating system, on top of a physical server. They run on a hypervisor, which allows multiple VMs to operate on the same physical machine.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732470295471/1eb036e6-a43d-4580-80d3-34f476ea6a33.png" alt class="image--center mx-auto" /></p>
<p><strong>Containers</strong></p>
<p>Containers are lightweight, standalone units of software that include everything an application needs to run: code, libraries, dependencies, and configuration files. They share the host operating system's kernel, making them highly efficient and fast to start.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732470400938/ca3e2d0c-3443-4343-b020-a0e9b5f74064.png" alt class="image--center mx-auto" /></p>
<p><strong>Docker</strong></p>
<p>Docker is one of the most popular container runtimes, making it simple to create, deploy, and run applications in containers.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1732470573042/dcdee26c-7a50-4c22-8d8f-acd62cc0bcec.png" alt class="image--center mx-auto" /></p>
<p>Here’s a step-by-step guide to install <strong>Docker on Ubuntu</strong> and test it with a container.</p>
<p>Ensure your system is up-to-date before installing Docker.</p>
<pre><code class="lang-plaintext">sudo apt update
sudo apt upgrade -y
</code></pre>
<p>Docker requires certain prerequisites. Install them using the following commands:</p>
<pre><code class="lang-plaintext">sudo apt install -y apt-transport-https ca-certificates curl software-properties-common
</code></pre>
<p>Add the Docker GPG key to verify its packages:</p>
<pre><code class="lang-plaintext">curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
</code></pre>
<p>Add the Docker repository to your system:</p>
<pre><code class="lang-plaintext">echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list &gt; /dev/null
</code></pre>
<p>Update the APT package index and install Docker:</p>
<pre><code class="lang-plaintext">sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io
</code></pre>
<p>Ensure Docker is installed correctly:</p>
<pre><code class="lang-plaintext">sudo docker --version
</code></pre>
<p>Run the <code>hello-world</code> container to test your Docker installation:</p>
<pre><code class="lang-plaintext">sudo docker run hello-world
</code></pre>
<p>To run Docker without <code>sudo</code>, add your user to the Docker group:</p>
<pre><code class="lang-plaintext">sudo usermod -aG docker $USER
</code></pre>
<p>Then, log out and back in for the changes to take effect.</p>
<p>Understanding the distinction between containers and virtual machines can help you choose the right technology. While containers are ideal for lightweight, cloud-native applications, virtual machines provide robust isolation and flexibility for diverse workloads.</p>
]]></content:encoded></item><item><title><![CDATA[What is a Virtual Machine Monitor?]]></title><description><![CDATA[Imagine you’re running a small business with limited hardware resources. You’d like to maximize these resources but can’t afford additional servers. Here’s where virtualization, specifically hypervisors, comes in.
Hypervisors allow you to run multipl...]]></description><link>https://blog.ceruleancloud.ca/what-is-a-virtual-machine-monitor</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/what-is-a-virtual-machine-monitor</guid><category><![CDATA[cloud-basics]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Wed, 13 Nov 2024 12:58:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742531449134/48a1f3fc-f366-4dbd-a812-4e1e6a0c481b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Imagine you’re running a small business with limited hardware resources. You’d like to maximize these resources but can’t afford additional servers. Here’s where virtualization, specifically hypervisors, comes in.</p>
<p>Hypervisors allow you to run multiple “virtual” machines on a single physical machine, each with its own operating system and applications. By doing this, you make the most out of your hardware and reduce costs significantly. So, let’s dive into what hypervisors are, how they work, and highlight some popular options.</p>
<p><strong>What is a Hypervisor?</strong></p>
<p>A hypervisor, also known as a <strong>virtual machine monitor</strong> (VMM), is software that sits between the hardware and the virtual machines (VMs). It creates, runs, and manages these VMs by sharing the physical resources like CPU, memory, and storage with them. Think of a hypervisor as a “conductor” of resources, ensuring that each VM gets what it needs while sharing the same hardware infrastructure.</p>
<p>There are two main types of hypervisors:</p>
<ol>
<li><p><strong>Type 1 Hypervisors (Bare-Metal Hypervisors)</strong>: These run directly on the host hardware, without a host operating system. They’re efficient and offer high performance since they interact directly with the physical hardware. Examples include <a target="_blank" href="https://www.vmware.com/products/cloud-infrastructure/esxi-and-esx">VMware ESXi</a> and <a target="_blank" href="https://learn.microsoft.com/en-us/windows-server/virtualization/hyper-v/hyper-v-overview">Microsoft Hyper-V</a>.</p>
</li>
<li><p><strong>Type 2 Hypervisors (Hosted Hypervisors)</strong>: These run on top of a host operating system like Windows or Linux. They are easier to set up but slightly less efficient because they depend on the host OS. Examples include <a target="_blank" href="https://www.vmware.com/products/desktop-hypervisor/workstation-and-fusion">VMware Workstation</a> and <a target="_blank" href="https://www.virtualbox.org/">Oracle VirtualBox</a>.</p>
</li>
</ol>
<p>Type 1 hypervisors are often preferred for large-scale enterprise environments, while Type 2 hypervisors are popular for smaller, testing-oriented setups.</p>
<p>Let’s discuss some of the aforementioned hypervisors in detail:</p>
<p><strong>VMware ESXi</strong></p>
<p>VMware ESXi is a leading <strong>Type 1 hypervisor</strong> known for its reliability and performance. Part of VMware’s larger vSphere suite, ESXi allows businesses to run multiple VMs on a single host, manage resources efficiently, and create highly available virtual environments.</p>
<p><strong>Key Features</strong>:</p>
<ul>
<li><p><strong>Bare-Metal Architecture</strong>: Directly installed on server hardware for top performance.</p>
</li>
<li><p><strong>Efficient Resource Management</strong>: Allows fine-tuning of CPU, memory, and storage allocation.</p>
</li>
<li><p><strong>vMotion</strong>: A well-known feature in VMware’s suite, enabling live migration of VMs between hosts with no downtime.</p>
</li>
<li><p><strong>Snapshot Support</strong>: Lets you take snapshots of your VMs, which is useful for backups and rollbacks during updates or testing.</p>
</li>
</ul>
<p>VMware ESXi is a powerful choice for enterprises looking to virtualize their servers, but it can be more costly compared to other hypervisors.</p>
<p><strong>Microsoft Hyper-V</strong></p>
<p>Microsoft’s Hyper-V is another popular <strong>Type 1 hypervisor</strong>, integrated into Windows Server and available on some Windows desktop editions. Hyper-V is highly compatible with Microsoft products, making it a go-to choice for Windows-centric environments.</p>
<p><strong>Key Features</strong>:</p>
<ul>
<li><p><strong>Native Windows Integration</strong>: Works seamlessly with Windows-based systems and other Microsoft products like System Center.</p>
</li>
<li><p><strong>Dynamic Memory</strong>: Allocates memory dynamically to VMs based on their demand, helping to optimize usage.</p>
</li>
<li><p><strong>Replica Support</strong>: Offers built-in disaster recovery by replicating VMs to other Hyper-V hosts.</p>
</li>
</ul>
<p>Hyper-V is often chosen by businesses already invested in the Microsoft ecosystem, thanks to its integration with Windows and cost-effectiveness.</p>
<p><strong>Oracle VM VirtualBox</strong></p>
<p>For those seeking a free and open-source option, <strong>Oracle VirtualBox</strong> is a <strong>Type 2 hypervisor</strong> that runs on various platforms, including Windows, macOS, and Linux. Though not ideal for enterprise-scale deployments, it’s popular among developers and for testing purposes.</p>
<p><strong>Key Features</strong>:</p>
<ul>
<li><p><strong>Cross-Platform Support</strong>: Runs on multiple host operating systems, making it flexible for development needs.</p>
</li>
<li><p><strong>Snapshot Capability</strong>: Allows you to save VM states, making it easy to revert if needed.</p>
</li>
<li><p><strong>Extensive Hardware Support</strong>: Supports a wide range of guest OS and hardware configurations, though it may lack the performance of Type 1 hypervisors.</p>
</li>
</ul>
<p>While VirtualBox may not be as performant or secure as Type 1 hypervisors, its versatility and cost make it ideal for smaller projects and educational use.</p>
<p><strong>KVM (Kernel-Based Virtual Machine)</strong></p>
<p>KVM is a Linux-based open-source hypervisor, often used in data centers and for Linux virtual environments. It turns the Linux kernel into a Type 1 hypervisor, offering a high-performance solution.</p>
<p><strong>Key Features</strong>:</p>
<ul>
<li><p><strong>Linux Integration</strong>: Built into the Linux kernel, so it’s a natural fit for Linux-based systems.</p>
</li>
<li><p><strong>Open-Source</strong>: Free to use and supported by a strong community.</p>
</li>
<li><p><strong>Scalability and Performance</strong>: Suitable for data centers and cloud providers, powering environments like Google Cloud.</p>
</li>
</ul>
<p>KVM has gained popularity in cloud and enterprise environments because of its scalability and open-source flexibility.</p>
<p>To close, choosing a hypervisor depends on your specific needs. For high-performance and enterprise-grade solutions, <strong>VMware ESXi</strong> and <strong>Hyper-V</strong> are excellent choices. If you’re looking for something budget-friendly and versatile for testing, <strong>VirtualBox</strong> is a solid option, while <strong>KVM</strong> shines in Linux environments and for cloud solutions.</p>
<p>Hypervisors have transformed IT infrastructure, enabling companies to do more with less by running multiple virtual environments on a single piece of hardware. Whether you’re running a test lab at home or managing an enterprise environment, understanding and using hypervisors effectively is a vital skill in today’s tech landscape.</p>
]]></content:encoded></item><item><title><![CDATA[Linux Command Line Interface Cheatsheet]]></title><description><![CDATA[Linux is an open-source operating system known for its versatility, stability, and efficiency. It is widely used in various fields, from personal computing to servers, development, and embedded systems. Linux is built on a modular design, allowing us...]]></description><link>https://blog.ceruleancloud.ca/linux-command-line-interface-cheatsheet</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/linux-command-line-interface-cheatsheet</guid><category><![CDATA[cloud-basics]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Sun, 10 Nov 2024 18:23:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742531494958/3d7d5f09-6eeb-4010-a5ac-fb29bc8f8fd7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Linux</strong> is an open-source operating system known for its versatility, stability, and efficiency. It is widely used in various fields, from personal computing to servers, development, and embedded systems. Linux is built on a modular design, allowing users to modify and customize it according to their needs. One of the key reasons for its popularity is the powerful command-line interface (CLI), which provides deep control over the system through various commands.</p>
<p>The <strong>CLI</strong> is a text-based interface that allows users to interact with the operating system by typing commands. Unlike graphical user interfaces (GUIs), the CLI may initially seem intimidating, but it offers unmatched flexibility, precision, and speed once mastered. For system administrators, developers, and power users, the CLI is essential for managing files, running scripts, troubleshooting, and automating tasks.</p>
<p>For those new to Linux, becoming familiar with basic commands is the first step to mastering the CLI. The table below introduces essential commands in a logical sequence, covering navigation, file management, system monitoring, networking, and basic administrative tasks. These commands will provide a solid foundation for beginners and help them perform routine tasks and manage their system effectively.</p>
<p><strong>Linux Basic Commands</strong></p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong><em>Command</em></strong></td><td><strong><em>Description</em></strong></td><td><strong><em>Example Usage</em></strong></td></tr>
</thead>
<tbody>
<tr>
<td><code>pwd</code></td><td>Prints the current working directory.</td><td><code>pwd</code></td></tr>
<tr>
<td><code>ls</code></td><td>Lists files and directories in the current directory.</td><td><code>ls -l</code></td></tr>
<tr>
<td><code>cd</code></td><td>Changes the current directory.</td><td><code>cd /home</code></td></tr>
<tr>
<td><code>mkdir</code></td><td>Creates a new directory.</td><td><code>mkdir my_folder</code></td></tr>
<tr>
<td><code>rmdir</code></td><td>Removes an empty directory.</td><td><code>rmdir my_folder</code></td></tr>
<tr>
<td><code>touch</code></td><td>Creates a new empty file.</td><td><code>touch myfile.txt</code></td></tr>
<tr>
<td><code>cp</code></td><td>Copies files or directories.</td><td><code>cp file1.txt file2.txt</code></td></tr>
<tr>
<td><code>mv</code></td><td>Moves or renames files or directories.</td><td><code>mv file1.txt /backup/</code></td></tr>
<tr>
<td><code>rm</code></td><td>Deletes files or directories.</td><td><code>rm file1.txt</code></td></tr>
<tr>
<td><code>cat</code></td><td>Displays the contents of a file.</td><td><code>cat file1.txt</code></td></tr>
<tr>
<td><code>head</code></td><td>Displays the first lines of a file.</td><td><code>head -n 5 file1.txt</code></td></tr>
<tr>
<td><code>tail</code></td><td>Displays the last lines of a file.</td><td><code>tail -n 5 file1.txt</code></td></tr>
<tr>
<td><code>chmod</code></td><td>Changes file or directory permissions.</td><td><code>chmod 755 myfile.txt</code></td></tr>
<tr>
<td><code>chown</code></td><td>Changes file or directory ownership.</td><td><code>chown user:group myfile.txt</code></td></tr>
<tr>
<td><code>find</code></td><td>Searches for files and directories.</td><td><code>find / -name "file1.txt"</code></td></tr>
<tr>
<td><code>grep</code></td><td>Searches for a text pattern within files.</td><td><code>grep "text" file1.txt</code></td></tr>
<tr>
<td><code>ps</code></td><td>Lists current running processes.</td><td><code>ps aux</code></td></tr>
<tr>
<td><code>top</code></td><td>Displays real-time system processes.</td><td><code>top</code></td></tr>
<tr>
<td><code>kill</code></td><td>Terminates a process.</td><td><code>kill 1234</code></td></tr>
<tr>
<td><code>df</code></td><td>Shows disk space usage.</td><td><code>df -h</code></td></tr>
<tr>
<td><code>du</code></td><td>Shows disk usage of files and directories.</td><td><code>du -sh *</code></td></tr>
<tr>
<td><code>free</code></td><td>Displays memory usage.</td><td><code>free -h</code></td></tr>
<tr>
<td><code>uname</code></td><td>Shows system information.</td><td><code>uname -a</code></td></tr>
<tr>
<td><code>ifconfig</code></td><td>Displays network interface configuration.</td><td><code>ifconfig</code></td></tr>
<tr>
<td><code>ping</code></td><td>Tests network connectivity.</td><td><code>ping</code> <a target="_blank" href="http://google.com"><code>google.com</code></a></td></tr>
<tr>
<td><code>wget</code></td><td>Downloads files from the internet.</td><td><code>wget</code> <a target="_blank" href="http://example.com"><code>http://example.com</code></a></td></tr>
<tr>
<td><code>tar</code></td><td>Archives files and directories.</td><td><code>tar -cvf archive.tar my_folder/</code></td></tr>
<tr>
<td><code>nano</code></td><td>Opens a basic text editor.</td><td><code>nano myfile.txt</code></td></tr>
<tr>
<td><code>apt</code></td><td>Installs or updates packages (Debian-based systems).</td><td><code>apt update &amp;&amp; apt install vim</code></td></tr>
<tr>
<td><code>yum</code></td><td>Installs or updates packages (Red Hat-based systems).</td><td><code>yum install nano</code></td></tr>
<tr>
<td><code>reboot</code></td><td>Reboots the system.</td><td><code>reboot</code></td></tr>
<tr>
<td><code>shutdown</code></td><td>Shuts down the system.</td><td><code>shutdown now</code></td></tr>
</tbody>
</table>
</div><p>When you are done testing all the above commands, you should have gathered beginner-level Linux system administration skills.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Linux Filesystem]]></title><description><![CDATA[Before we begin, have a good look at the picture below.

Alright…
The root directory is at the top of the Linux Filesystem hierarchy, represented by a forward slash (/). This is the starting point for all paths in the Linux file system.
The next leve...]]></description><link>https://blog.ceruleancloud.ca/understanding-linux-filesystem</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/understanding-linux-filesystem</guid><category><![CDATA[cloud-basics]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Thu, 07 Nov 2024 21:45:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742749210593/47fee62b-1d78-4254-a652-37458fb05bf2.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before we begin, have a good look at the picture below.</p>
<p><img src="https://made2ubyvi.wordpress.com/wp-content/uploads/2023/05/filesystem.png?w=950" alt /></p>
<p>Alright…</p>
<p>The root directory is at the top of the Linux Filesystem hierarchy, represented by a <strong>forward slash (/)</strong>. This is the starting point for all paths in the Linux file system.</p>
<p>The next level down is where the system and configuration files are stored, usually in the <strong>/etc</strong> directory. This includes important files like configuration files for system services and startup scripts.</p>
<p>The <strong>/bin</strong> and <strong>/sbin</strong> directories contain the essential binary files for basic system operations. <strong>/bin</strong> contains executables that are required for the system to run, while <strong>/sbin</strong> contains executables that are used by the system administrator for system maintenance tasks.</p>
<p>When you install services on a Linux server, for instance the <a target="_blank" href="https://httpd.apache.org/">Apache Web Server</a>, the config files will most likely be located under <strong>/etc</strong>, since that is where Linux places configuration files by default.</p>
<p>The <strong>/usr</strong> directory contains user-level binaries, libraries, and documentation for installed software. This includes applications installed through package managers like apt, as well as other utilities and libraries.</p>
<p>The <strong>/var</strong> directory contains variable data files, such as log files, spool files, and temporary files. You will find the log files of the web server we discussed above under <strong>/var</strong> (typically in <strong>/var/log</strong>).</p>
<p>The <strong>/home</strong> directory is where user home directories are stored. Each user has a subdirectory here, with their files and settings. When you log in, you typically start in your home directory. As a rule of thumb, I begin my new projects from the /home directory for easy navigation and understanding.</p>
<p>Finally, the <strong>/tmp</strong> directory is used for temporary files, which are typically cleared automatically when the system reboots.</p>
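<p>If you want to see this hierarchy on your own machine, a few quick commands are enough to poke around (this is just a sketch, assuming a typical Linux install where these standard directories exist):</p>
<pre><code class="lang-bash"># Show the top-level directories discussed above
ls -ld /etc /bin /sbin /usr /var /home /tmp

# Peek at a few of the configuration files that live under /etc
ls /etc | head -n 5

# Print the path of your own home directory
echo "$HOME"
</code></pre>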
<p>I hope this gives you a good overview of the Linux file system hierarchy!</p>
<p>Remember there’s much more to learn as you dive deeper into Linux, but this should give you a solid foundation.</p>
]]></content:encoded></item><item><title><![CDATA[Windows Subsystem for Linux]]></title><description><![CDATA[What is WSL?
Windows Subsystem for Linux is a Windows feature that allows you to run a Linux environment without using a separate virtual machine. WSL leverages Hyper-V - a Microsoft hypervisor to spin up the environment without needing a separate Vi...]]></description><link>https://blog.ceruleancloud.ca/windows-subsystem-for-linux</link><guid isPermaLink="true">https://blog.ceruleancloud.ca/windows-subsystem-for-linux</guid><category><![CDATA[cloud-basics]]></category><dc:creator><![CDATA[Cerulean Cloud]]></dc:creator><pubDate>Wed, 06 Nov 2024 16:03:31 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742749413397/d262d9e2-3ae4-4467-8e5f-85249348902c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>What is WSL?</strong></p>
<p>Windows Subsystem for Linux is a Windows feature that allows you to run a Linux environment without setting up a separate virtual machine yourself. Under the hood, WSL 2 runs a lightweight utility VM on Hyper-V, Microsoft's virtualization platform, so you get a real Linux kernel without managing a full VM.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730826801388/220225b5-cb92-44c4-8729-9ac4f0c4b865.png" alt class="image--center mx-auto" /></p>
<p><strong>What is the need for WSL?</strong></p>
<p>Windows has an amazing UI but a weaker development environment. Linux, on the other hand, has excellent development tooling. Especially in Cloud and DevOps, where “automating everything” is part of the role, the ability to write scripts without switching between two machines can enhance productivity.</p>
<p>Installing various Windows Command Prompt modules and tools has always been a pain for me. Linux is straightforward. That is also one major reason why I prefer working on Linux terminals.</p>
<p><strong>Installing Windows Subsystem for Linux</strong></p>
<ul>
<li><p><strong>Open PowerShell as Administrator</strong>: Search for PowerShell in the Start Menu, right-click it, and select <strong>Run as Administrator</strong>.</p>
</li>
<li><p><strong>Run the WSL Installation Command</strong>: Enter the following command in PowerShell to enable WSL and install the default Linux distribution (usually Ubuntu):</p>
</li>
<li><pre><code class="lang-plaintext">      wsl --install
</code></pre>
</li>
<li><p><strong>Restart Your Computer</strong>: Once the installation finishes, restart Windows so the WSL components can complete their setup.</p>
</li>
<li><p><strong>Launch Your Linux Distro</strong>: After the restart, open your Start Menu and search for your newly installed Linux distribution (e.g., <strong>Ubuntu</strong>). Click to launch it.</p>
</li>
<li><p><strong>Complete Initial Setup</strong>: When you open the Linux distribution for the first time, it may take a few moments to complete its setup. You'll be prompted to create a username and password.</p>
</li>
<li><p><strong>Verify the Installation</strong>: Once logged in, check that everything is working by running a simple command, like:</p>
</li>
<li><pre><code class="lang-plaintext">    uname -a
</code></pre>
</li>
<li><p><strong>Check if Your System Supports WSL 2</strong>: WSL 2 requires Windows 10 Version 1903 or higher with Build 18362 or higher.</p>
</li>
<li><p><strong>Install the Virtual Machine Platform</strong>: Open PowerShell as Administrator again and run:</p>
<pre><code class="lang-plaintext">  dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart
</code></pre>
</li>
<li><p><strong>Set WSL 2 as the Default Version</strong>:</p>
<pre><code class="lang-plaintext">  wsl --set-default-version 2
</code></pre>
</li>
</ul>
<p>If you want to try other distributions, you can do so by opening the Microsoft Store, searching for <strong>Linux</strong> or <strong>WSL</strong>, and choosing additional distributions to install.</p>
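<p>If you prefer staying in the terminal, recent WSL versions also let you browse and install distributions directly from PowerShell. A quick sketch (the distribution name below is just an example; run the list command first to see what is actually available on your system):</p>
<pre><code class="lang-plaintext"># List distributions available for installation
wsl --list --online

# Install a specific one by name, e.g. Debian
wsl --install -d Debian

# Confirm installed distros and which WSL version (1 or 2) each uses
wsl -l -v
</code></pre>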
]]></content:encoded></item></channel></rss>