Designing and Enabling Runners

TL;DR: Design considerations for implementing a runner service.

Whether the organisation's strategy is vendor-hosted, self-hosted, cloud-first, security-first, sovereignty-first, or a combination of these, the end goal is the same: a reliable compute service for running CI/CD workloads.

What does a good runner service look like?

To sustain GitHub Actions Runners as a service, the key quality attributes are security, scalability, reliability, and maintainability:

  • Security: Protection of runners and hosting environments from unauthorized access, with proper isolation between workflows and secure credential management.

  • Scalability: Efficient handling of varying workloads through dynamic resource provisioning while balancing performance and cost considerations.

  • Reliability: Consistent and predictable workflow execution with stable environments and effective recovery mechanisms to ensure dependable service.

  • Maintainability: Streamlined management of runner environments through standardization, automation, and clear operational processes.

Enabling practices

A runner service is composed of two key components:

  1. Runner - Practices for preparing, configuring, and managing the runner image and configuration
  2. Runner Hosting - Practices for managing the infrastructure that hosts the runners

Both need these quality attributes to be deliberately designed and built in:

Security

Runner

  • Use Trusted Base Images: Start with official or well-maintained images to minimize vulnerabilities.
  • Regularly Update Images: Keep runner images updated with the latest security patches.
  • Avoid Storing Secrets: Do not store sensitive credentials in the runner image.
  • Use Short-lived Credentials: Implement identity federation for temporary credentials instead of long-lived access keys.

Runner Hosting

  • Limit Access: Restrict access to self-hosted runners to specific repositories.
  • Harden Hosts: Ensure the runner host is hardened against security threats.
  • Implement Runner Groups: Isolate runners based on security requirements and repository access needs.
  • Disable Fork Workflows: Prevent potentially malicious code from forks running on self-hosted runners.
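
The access rules above (repository restrictions, runner groups, and blocking fork-triggered workflows) can be pictured as a single admission check. The following is a minimal sketch, assuming a local policy model: `RunnerGroup`, `JobRequest`, `may_run_on`, and the repository names are hypothetical and are not part of the GitHub API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RunnerGroup:
    """Hypothetical local model of a runner group's access policy."""
    name: str
    allowed_repositories: frozenset[str] = field(default_factory=frozenset)
    allow_public_repositories: bool = False


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical description of a job asking for a self-hosted runner."""
    repository: str   # e.g. "example-org/payments-api" (made-up name)
    is_public: bool   # visibility of the repository
    from_fork: bool   # True when the job was triggered from a fork


def may_run_on(group: RunnerGroup, job: JobRequest) -> bool:
    """Return True only if every isolation rule allows the job."""
    if job.from_fork:
        # Fork-originated code is untrusted; keep it off self-hosted runners.
        return False
    if job.is_public and not group.allow_public_repositories:
        return False
    # Finally, the repository must be explicitly granted access to the group.
    return job.repository in group.allowed_repositories
```

In practice these rules are normally enforced through the platform's own runner group and workflow approval settings; the sketch only makes the intended decision logic explicit.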

Scalability

Runner

  • Use Lightweight Images: Reduce unnecessary dependencies to keep images small and efficient.
  • Optimize Image Size: Keep images small so they pull and start faster.
  • Use Caching: Implement caching mechanisms to speed up builds.
  • Optimize Build Process: Use caching and multi-stage builds to speed up image builds.
  • Optimize Base Images: Create optimized machine images with pre-installed dependencies for faster startup.

Runner Hosting

  • Autoscaling: Configure autoscaling to adjust runner capacity dynamically as demand changes.
  • Use Warm Pools: Maintain a pool of pre-registered runners to reduce job wait times.
  • Consider Cost-efficient Compute: Use cost-efficient compute options for non-critical workloads.
  • Size Appropriately: Match runner types and capabilities to specific workflow requirements.
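
As a rough illustration of how autoscaling and a warm pool interact, the sketch below computes a target runner count from the current queue. The function name, its parameters, and the sizing rule (busy plus queued plus warm pool, capped at a maximum) are assumptions chosen for illustration, not the behaviour of any particular autoscaler.

```python
def desired_runner_count(
    queued_jobs: int,
    busy_runners: int,
    warm_pool_size: int,
    max_runners: int,
) -> int:
    """Target number of runners for the pool.

    Assumed rule: one runner per running or queued job, plus a warm pool
    of idle runners, never exceeding the configured maximum.
    """
    values = (queued_jobs, busy_runners, warm_pool_size, max_runners)
    if min(values) < 0:
        raise ValueError("all inputs must be non-negative")
    wanted = busy_runners + queued_jobs + warm_pool_size
    return min(max_runners, wanted)
```

The cap is what balances performance against cost: the warm pool reduces job wait times when demand is low, while `max_runners` bounds spend during bursts.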

Reliability

Runner

  • Use Ephemeral Runners: Consider ephemeral runners that automatically clean up after execution.
  • Version Control: Maintain version control of runner images for consistency and integrity.
  • Forward Runner Logs: Ensure runner logs are captured and forwarded to monitoring systems.
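
To make the ephemeral and versioning practices concrete, the sketch below builds the argument list for registering a runner with the `config.sh` script that ships with the self-hosted runner package, using its `--ephemeral` and `--unattended` options. The helper name, the repository URL, and the idea of encoding the image version in a label are assumptions for illustration; the command is only constructed here, not executed.

```python
def ephemeral_config_args(
    repo_url: str,
    registration_token: str,
    runner_name: str,
    labels: list[str],
) -> list[str]:
    """Build the argv for registering a single-job (ephemeral) runner."""
    if not labels:
        raise ValueError("at least one label is required")
    return [
        "./config.sh",
        "--url", repo_url,
        "--token", registration_token,  # short-lived registration token
        "--name", runner_name,
        "--labels", ",".join(labels),   # e.g. include the image version
        "--ephemeral",                  # runner is removed after one job
        "--unattended",                 # no interactive prompts
    ]


# Hypothetical values, for illustration only.
args = ephemeral_config_args(
    "https://github.com/example-org/example-repo",
    "<registration-token>",
    "runner-001",
    ["build", "image-v42"],
)
```

Passing the list to a process launcher (rather than building a shell string) avoids quoting issues with the token and labels.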

Runner Hosting

  • Monitor Runner Health: Set up monitoring tools to track runner performance.
  • Monitor Usage: Track image usage to identify inefficiencies and optimize storage.
  • Centralized Image Repository: Store images in a secure, centralized registry for easy access.
  • Implement High Availability: Design runner hosting infrastructure for redundancy across failure domains.
  • Establish Metrics: Record metrics about runner performance and job execution for observability.
  • Create Alerts: Set up notifications for abnormal conditions like pool exhaustion or consistent failures.
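
The alert conditions mentioned above, pool exhaustion and consistent failures, can be expressed as simple rules over recorded metrics. This is a minimal sketch under stated assumptions: `PoolSnapshot`, `evaluate_alerts`, the alert names, and the default streak length of three are hypothetical and would need tuning for a real service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSnapshot:
    """Hypothetical point-in-time metrics for one runner pool."""
    total_runners: int
    idle_runners: int
    queued_jobs: int
    recent_job_results: tuple[bool, ...]  # True = success, newest last


def evaluate_alerts(snapshot: PoolSnapshot, failure_streak: int = 3) -> list[str]:
    """Return the names of alert conditions that currently hold."""
    if failure_streak < 1:
        raise ValueError("failure_streak must be at least 1")
    alerts = []
    # Pool exhaustion: jobs are waiting and no runner is free to take them.
    if snapshot.idle_runners == 0 and snapshot.queued_jobs > 0:
        alerts.append("pool_exhausted")
    # Consistent failures: the last N recorded jobs all failed.
    recent = snapshot.recent_job_results[-failure_streak:]
    if len(recent) == failure_streak and not any(recent):
        alerts.append("consistent_failures")
    return alerts
```

A real setup would usually evaluate such rules in the monitoring system itself; the sketch only shows which metrics each alert depends on.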

Maintainability

Runner

  • Update Tools: Regularly update the tools installed on runners so CI/CD jobs run on secure, supported versions.
  • Limit Image Variants: Avoid excessive variation; standardize on a small set of images that covers the different workloads.
  • Document Runner Setups: Maintain documentation about runner configurations and customizations.

Runner Hosting

  • Standardize Runner Setup: Use automation tools to ensure consistent runner configurations.
  • Use Labels: Organize runners using labels for better management.
  • Automate Cleanup: Automatically remove idle instances to free up resources.
  • Purge Images: Regularly remove outdated or unused images to free up storage.
  • Use Infrastructure as Code: Automate the provisioning and configuration of runner infrastructure.
  • Implement CI/CD for Runners: Apply CI/CD practices to the runner infrastructure itself.
  • Design for Self-service: Create mechanisms for teams to easily request or deploy runners they need.
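
The cleanup and purge practices above lend themselves to small, testable selection rules. The sketch below is one possible shape, assuming the hosting layer can report when each runner was last active and when each image was built; the function names, parameters, and retention rules are hypothetical.

```python
from datetime import datetime, timedelta


def idle_runners_to_remove(
    last_active: dict[str, datetime],
    now: datetime,
    max_idle: timedelta,
    keep_warm: int = 0,
) -> list[str]:
    """Runners idle longer than max_idle, longest-idle first.

    At least keep_warm runners are always left in the pool.
    """
    expired = sorted(
        (name for name, seen in last_active.items() if now - seen > max_idle),
        key=lambda name: last_active[name],
    )
    removable = max(0, len(last_active) - keep_warm)
    return expired[:removable]


def images_to_purge(
    built_at: dict[str, datetime],
    keep_latest: int,
    in_use: set[str],
) -> list[str]:
    """Image tags outside the newest keep_latest that no runner still uses."""
    if keep_latest < 0:
        raise ValueError("keep_latest must be non-negative")
    newest_first = sorted(built_at, key=built_at.get, reverse=True)
    return [tag for tag in newest_first[keep_latest:] if tag not in in_use]
```

Keeping the selection logic separate from the code that actually deletes instances or images makes it easy to run in a dry-run mode and to test as part of CI/CD for the runner infrastructure itself.
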
This post is licensed under CC BY 4.0 by the author.