Designing and Enabling Runners
TL;DR: Design considerations for implementing a runner service
Whether the organisation's strategy is vendor-hosted, self-hosted, cloud-first, security-first, sovereignty-first, or a combination of these, the end goal is a reliable compute service for running CI/CD activities.
What does a good runner service look like?
To sustain GitHub Actions Runners as a service, the key quality attributes are security, scalability, reliability, and maintainability:
Security: Protection of runners and hosting environments from unauthorized access, with proper isolation between workflows and secure credential management.
Scalability: Efficient handling of varying workloads through dynamic resource provisioning while balancing performance and cost considerations.
Reliability: Consistent and predictable workflow execution with stable environments and effective recovery mechanisms to ensure dependable service.
Maintainability: Streamlined management of runner environments through standardization, automation, and clear operational processes.
Enabling practices
A runner service is composed of two key components:
- Runner: Practices for preparing, configuring, and managing the runner image and its configuration
- Runner Hosting: Practices for managing the infrastructure that hosts the runners
Both require quality to be deliberately designed and built in:
Security
Runner
- Use Trusted Base Images: Start with official or well-maintained images to minimize vulnerabilities.
- Regularly Update Images: Keep runner images updated with the latest security patches.
- Avoid Storing Secrets: Do not store sensitive credentials in the runner image.
- Use Short-lived Credentials: Implement identity federation for temporary credentials instead of long-lived access keys.
Runner Hosting
- Limit Access: Restrict access to self-hosted runners to specific repositories.
- Harden Hosts: Ensure the runner host is hardened against security threats.
- Implement Runner Groups: Isolate runners based on security requirements and repository access needs.
- Disable Fork Workflows: Prevent workflows triggered from forks, which may carry malicious code, from running on self-hosted runners.
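The fork and repository restrictions above can be expressed as a small admission check. The following is a minimal sketch, not a definitive implementation: it assumes the job is described by a GitHub `pull_request` webhook payload (using its `head.repo.fork` and `full_name` fields), and the function names and the `trusted_repos` allow-list are hypothetical.

```python
from typing import Any, Collection, Mapping


def is_fork_pull_request(event: Mapping[str, Any]) -> bool:
    """Return True when the event is a pull request whose head lives in a fork."""
    pr = event.get("pull_request")
    if not pr:
        return False  # not a pull request event (for example, a push)
    head_repo = (pr.get("head") or {}).get("repo")
    if head_repo is None:
        # Head repository missing or deleted: treat it as untrusted.
        return True
    base_repo = (pr.get("base") or {}).get("repo") or {}
    return bool(head_repo.get("fork")) or (
        head_repo.get("full_name") != base_repo.get("full_name")
    )


def allow_on_self_hosted(
    event: Mapping[str, Any], trusted_repos: Collection[str]
) -> bool:
    """Hypothetical policy: only allow-listed repositories, never fork PRs."""
    repo = (event.get("repository") or {}).get("full_name")
    return repo in trusted_repos and not is_fork_pull_request(event)
```

A check like this would sit in whatever component dispatches jobs to runners; it complements, rather than replaces, the repository and organisation settings that control fork pull request workflows.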
Scalability
Runner
- Use Lightweight Images: Remove unnecessary dependencies to keep images small, which shortens image pull and startup times.
- Use Caching: Implement caching mechanisms to speed up builds.
- Optimize Build Process: Use layer caching and multi-stage builds to speed up image builds.
- Optimize Base Images: Create optimized machine images with pre-installed dependencies for faster startup.
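One common way to implement the caching practice above is to derive the cache key from the content of a dependency lockfile, so the cache is reused until the dependencies change. The sketch below illustrates that idea only; the `cache_key` function, its parameters, and the key layout are assumptions, not part of any specific caching tool.

```python
import hashlib


def cache_key(prefix: str, os_name: str, lockfile_bytes: bytes) -> str:
    """Build a cache key that changes only when the lockfile content changes."""
    digest = hashlib.sha256(lockfile_bytes).hexdigest()
    # A shortened digest keeps the key readable; 16 hex characters is an
    # arbitrary choice for this sketch.
    return f"{prefix}-{os_name}-{digest[:16]}"
```

Including the operating system in the key avoids restoring artifacts built for a different runner type.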
Runner Hosting
- Autoscaling: Configure autoscaling to dynamically adjust runner availability to match demand.
- Use Warm Pools: Maintain a pool of pre-registered runners to reduce job wait times.
- Consider Cost-efficient Compute: Use cost-efficient compute options for non-critical workloads.
- Size Appropriately: Match runner types and capabilities to specific workflow requirements.
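The interaction between autoscaling and a warm pool can be sketched as a target-size calculation. This is an illustrative model under stated assumptions: `PoolState`, `desired_runner_count`, and `scaling_delta` are hypothetical names, and the rule used here (one runner per running or queued job, plus a fixed number of spare warm runners, clamped to configured bounds) is one simple policy among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolState:
    queued_jobs: int   # jobs waiting for a runner
    busy_runners: int  # runners currently executing a job
    idle_runners: int  # registered runners waiting for work


def desired_runner_count(
    state: PoolState, warm_pool_size: int, min_runners: int, max_runners: int
) -> int:
    """Target pool size: cover running and queued jobs, keep spare warm runners."""
    needed = state.busy_runners + state.queued_jobs + warm_pool_size
    # Clamp to the configured bounds to cap cost and keep a minimum capacity.
    return max(min_runners, min(max_runners, needed))


def scaling_delta(
    state: PoolState, warm_pool_size: int, min_runners: int, max_runners: int
) -> int:
    """Positive result: runners to add. Negative result: runners to remove."""
    current = state.busy_runners + state.idle_runners
    target = desired_runner_count(state, warm_pool_size, min_runners, max_runners)
    return target - current
```

For example, with 3 busy runners, 4 queued jobs, no idle runners, and a warm pool of 2, the target is 9 runners, so 6 would be added unless the maximum is lower.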
Reliability
Runner
- Use Ephemeral Runners: Prefer ephemeral runners that run a single job and are then discarded, so every job starts from a clean environment.
- Version Control: Maintain version control of runner images for consistency and integrity.
- Forward Runner Logs: Ensure runner logs are captured and forwarded to monitoring systems.
Runner Hosting
- Monitor Runner Health: Set up monitoring tools to track runner performance.
- Monitor Usage: Track image usage to identify inefficiencies and optimize storage.
- Centralized Image Repository: Store images in a secure, centralized registry for easy access.
- Implement High Availability: Design runner hosting infrastructure for redundancy across failure domains.
- Establish Metrics: Record metrics about runner performance and job execution for observability.
- Create Alerts: Set up notifications for abnormal conditions like pool exhaustion or consistent failures.
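The two alert conditions named above, pool exhaustion and consistent failures, can be sketched as a simple evaluation over collected metrics. This is a minimal illustration: `PoolSnapshot`, `evaluate_alerts`, the alert names, and the default threshold of three consecutive failures are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PoolSnapshot:
    idle_runners: int
    queued_jobs: int
    # Outcomes of recent jobs, oldest first; True means the job succeeded.
    recent_job_results: Sequence[bool]


def evaluate_alerts(
    snapshot: PoolSnapshot, failure_streak_threshold: int = 3
) -> List[str]:
    """Return the names of alert conditions that currently hold."""
    alerts: List[str] = []
    # Pool exhaustion: jobs are waiting and no runner is free to take them.
    if snapshot.idle_runners == 0 and snapshot.queued_jobs > 0:
        alerts.append("pool_exhausted")
    # Consistent failures: count failed jobs back from the most recent one.
    streak = 0
    for succeeded in reversed(snapshot.recent_job_results):
        if succeeded:
            break
        streak += 1
    if streak >= failure_streak_threshold:
        alerts.append("consistent_failures")
    return alerts
```

In practice these conditions would usually be expressed as rules in the monitoring system that receives the metrics, with the threshold tuned to the normal failure rate of the workloads.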
Maintainability
Runner
- Update Tools: Regularly update the tools installed on runners to keep CI/CD execution secure.
- Limit Image Variants: Avoid excessive variations; standardize images for different workloads.
- Document Runner Setups: Maintain documentation about runner configurations and customizations.
Runner Hosting
- Standardize Runner Setup: Use automation tools to ensure consistent runner configurations.
- Use Labels: Organize runners using labels for better management.
- Automate Cleanup: Automatically remove idle instances to free up resources.
- Purge Images: Regularly remove outdated or unused images to free up storage.
- Use Infrastructure as Code: Automate the provisioning and configuration of runner infrastructure.
- Implement CI/CD for Runners: Apply CI/CD practices to the runner infrastructure itself.
- Design for Self-service: Create mechanisms for teams to easily request or deploy runners they need.
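The image purge practice above can be sketched as a retention policy. The example is illustrative only: `RunnerImage`, `images_to_purge`, and the defaults (keep the three newest images, purge others older than 30 days, never purge an image still in use) are hypothetical choices, not values taken from any registry or tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence


@dataclass(frozen=True)
class RunnerImage:
    tag: str
    built_at: datetime
    in_use: bool  # True if any runner is currently based on this image


def images_to_purge(
    images: Sequence[RunnerImage],
    now: datetime,
    max_age: timedelta = timedelta(days=30),
    keep_latest: int = 3,
) -> List[str]:
    """Return tags of images that are old, unused, and not among the newest."""
    newest_first = sorted(images, key=lambda img: img.built_at, reverse=True)
    # Always keep the most recent builds so a rollback target remains.
    protected = {img.tag for img in newest_first[:keep_latest]}
    return [
        img.tag
        for img in newest_first
        if img.tag not in protected
        and not img.in_use
        and now - img.built_at > max_age
    ]
```

Running such a selection on a schedule, and deleting only what it returns, keeps storage bounded while preserving recent images and anything still referenced by running runners.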