Designing and Enabling Runners

Posted May 1, 2025

By Kitty Chiu

TL;DR: Design considerations for implementing a runner service.

Whether the organisation's strategy is vendor-hosted, self-hosted, cloud-first, security-first, sovereignty-first, or a combination of these, the end goal is the same: a reliable compute service for running CI/CD workloads.

What does a good runner service look like?

To sustain GitHub Actions Runners as a service, the key quality attributes are security, scalability, reliability, and maintainability:

Security: Protection of runners and hosting environments from unauthorized access, with proper isolation between workflows and secure credential management.
Scalability: Efficient handling of varying workloads through dynamic resource provisioning while balancing performance and cost considerations.
Reliability: Consistent and predictable workflow execution with stable environments and effective recovery mechanisms to ensure dependable service.
Maintainability: Streamlined management of runner environments through standardization, automation, and clear operational processes.

Enabling practices

A runner service is composed of two key components:

Runner - Practices for preparing, configuring, and managing the runner image and configuration
Runner Hosting - Practices for managing the infrastructure that hosts the runners

Both need each quality attribute to be deliberately designed and built in:

Security

Runner

Use Trusted Base Images: Start with official or well-maintained images to minimize vulnerabilities.
Regularly Update Images: Keep runner images updated with the latest security patches.
Avoid Storing Secrets: Do not store sensitive credentials in the runner image.
Use Short-lived Credentials: Implement identity federation for temporary credentials instead of long-lived access keys.
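
The short-lived credential practice above can be sketched as a simple expiry check. This is a minimal illustration, not a real identity-federation client: `TemporaryCredential`, `needs_refresh`, and the five-minute safety margin are hypothetical names and values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TemporaryCredential:
    """Hypothetical short-lived credential issued to a single job."""
    token: str
    expires_at: datetime


def needs_refresh(
    credential: TemporaryCredential,
    now: datetime,
    safety_margin: timedelta = timedelta(minutes=5),  # assumed value
) -> bool:
    """Return True when the credential is expired or about to expire."""
    return credential.expires_at - now <= safety_margin
```

A job would call `needs_refresh` before each use and request a new credential from the identity provider when it returns `True`, so nothing long-lived ever needs to be baked into the runner image.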

Runner Hosting

Limit Access: Restrict access to self-hosted runners to specific repositories.
Harden Hosts: Ensure the runner host is hardened against security threats.
Implement Runner Groups: Isolate runners based on security requirements and repository access needs.
Disable Fork Workflows: Prevent potentially malicious code in forks from running on self-hosted runners.
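
As a rough sketch of how runner groups and the fork restriction combine, the access decision can be modelled as a small allow-list check. The `RunnerGroup` type and `may_run_on_group` function are hypothetical and only model the policy; in practice the platform's own runner group settings enforce it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RunnerGroup:
    """Hypothetical model of a runner group and the repositories allowed to use it."""
    name: str
    allowed_repositories: frozenset[str] = field(default_factory=frozenset)


def may_run_on_group(group: RunnerGroup, repository: str, from_fork: bool) -> bool:
    """Return True only for non-fork workflows from an allow-listed repository."""
    if from_fork:
        # Workflows triggered from forks never reach self-hosted runners.
        return False
    return repository in group.allowed_repositories
```

Denying by default (an empty allow-list admits nothing) keeps a newly created group from being exposed before anyone has decided who should use it.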

Scalability

Runner

Use Lightweight Images: Start from a minimal base image so runner images stay small and quick to pull.
Optimize Image Size: Remove dependencies that workflows do not need to cut pull and startup time.
Use Caching: Cache dependencies and build outputs between jobs to speed up builds.
Optimize Build Process: Use build caching and multi-stage builds to shorten image build time.
Optimize Base Images: Create optimized machine images with pre-installed dependencies for faster startup.

Runner Hosting

Autoscaling: Configure autoscaling to dynamically adjust runner availability to match demand.
Use Warm Pools: Maintain a pool of pre-registered runners to reduce job wait times.
Consider Cost-efficient Compute: Use cost-efficient compute options for non-critical workloads.
Size Appropriately: Match runner types and capabilities to specific workflow requirements.
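
One way to picture how autoscaling and a warm pool interact is a target-count calculation like the one below. It is a simplified sketch under assumed inputs: `desired_runner_count` and its parameters are hypothetical, and a real autoscaler would also need cooldowns and scale-down handling.

```python
def desired_runner_count(
    queued_jobs: int,
    busy_runners: int,
    warm_pool_size: int,
    max_runners: int,
) -> int:
    """Return how many runners should exist right now."""
    if min(queued_jobs, busy_runners, warm_pool_size, max_runners) < 0:
        raise ValueError("all inputs must be non-negative")
    # One runner per running job, one per queued job, plus idle warm capacity.
    wanted = busy_runners + queued_jobs + warm_pool_size
    # Never exceed the cost or quota ceiling.
    return min(wanted, max_runners)
```

For example, with 5 busy runners, 3 queued jobs, a warm pool of 2, and a ceiling of 20, the target is 10 runners; with 50 queued jobs the ceiling caps it at 20.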

Reliability

Runner

Use Ephemeral Runners: Consider ephemeral runners that automatically clean up after execution.
Version Control: Maintain version control of runner images for consistency and integrity.
Forward Runner Logs: Ensure runner logs are captured and forwarded to monitoring systems.

Runner Hosting

Monitor Runner Health: Set up monitoring tools to track runner performance.
Monitor Usage: Track image usage to identify inefficiencies and optimize storage.
Centralized Image Repository: Store images in a secure, centralized registry for easy access.
Implement High Availability: Design runner hosting infrastructure for redundancy across failure domains.
Establish Metrics: Record metrics about runner performance and job execution for observability.
Create Alerts: Set up notifications for abnormal conditions like pool exhaustion or consistent failures.
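
The two alert conditions named above, pool exhaustion and consistent failures, can be sketched as a small rule evaluation. `PoolSnapshot`, `evaluate_alerts`, the alert names, and the default streak of three failures are all assumptions made for illustration; a real setup would express these rules in its monitoring system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSnapshot:
    """Hypothetical point-in-time view of a runner pool."""
    idle_runners: int
    queued_jobs: int
    recent_job_results: tuple[bool, ...]  # True = success, most recent last


def evaluate_alerts(snapshot: PoolSnapshot, failure_streak: int = 3) -> list[str]:
    """Return the names of the alert conditions that currently hold."""
    if failure_streak < 1:
        raise ValueError("failure_streak must be at least 1")
    alerts: list[str] = []
    # Pool exhaustion: jobs are waiting and no runner is free to take them.
    if snapshot.idle_runners == 0 and snapshot.queued_jobs > 0:
        alerts.append("pool_exhausted")
    # Consistent failures: the last `failure_streak` jobs all failed.
    tail = snapshot.recent_job_results[-failure_streak:]
    if len(tail) == failure_streak and not any(tail):
        alerts.append("consistent_failures")
    return alerts
```

Requiring a full streak before alerting avoids paging on a single flaky job while still catching a broken image or host quickly.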

Maintainability

Runner

Update Tools: Regularly update the tools installed on runners so CI/CD jobs run on secure, supported versions.
Limit Image Variants: Avoid excessive variations; standardize on a small set of images that cover the different workloads.
Document Runner Setups: Maintain documentation about runner configurations and customizations.

Runner Hosting

Standardize Runner Setup: Use automation tools to ensure consistent runner configurations.
Use Labels: Organize runners using labels for better management.
Automate Cleanup: Automatically remove idle instances to free up resources.
Purge Images: Regularly remove outdated or unused images to free up storage.
Use Infrastructure as Code: Automate the provisioning and configuration of runner infrastructure.
Implement CI/CD for Runners: Apply CI/CD practices to the runner infrastructure itself.
Design for Self-service: Create mechanisms for teams to easily request or deploy runners they need.
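
The image purge practice above could be automated with a retention rule along these lines. This is a sketch under assumed policy values (keep the 3 newest images, purge unused ones older than 30 days); `RunnerImage` and `images_to_purge` are hypothetical names, and the actual deletion would go through whatever registry is in use.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RunnerImage:
    """Hypothetical record describing one stored runner image."""
    tag: str
    built_at: datetime
    in_use: bool


def images_to_purge(
    images: list[RunnerImage],
    now: datetime,
    keep_latest: int = 3,                      # assumed policy value
    max_age: timedelta = timedelta(days=30),   # assumed policy value
) -> list[str]:
    """Return tags that are safe to delete, newest first."""
    if keep_latest < 0:
        raise ValueError("keep_latest must be non-negative")
    newest_first = sorted(images, key=lambda image: image.built_at, reverse=True)
    # Always keep the most recent builds so a rollback target exists.
    protected = {image.tag for image in newest_first[:keep_latest]}
    return [
        image.tag
        for image in newest_first
        if image.tag not in protected
        and not image.in_use
        and now - image.built_at > max_age
    ]
```

Returning the tags instead of deleting them directly keeps the rule easy to test and lets the caller log or review the list before anything is removed.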

This post is licensed under CC BY 4.0 by the author.
