Agent
Some configurations benefit from running YAML Manifests continuously. For example, dotfiles might be left unmanaged (allowing local modifications), while Docker should always be up to date.
The CCM Agent runs manifests continuously, loading them from local files, Object Storage, or HTTP(S) URLs, with Key-Value data overlaid.
The Agent also supports Registration, a service discovery system where resources that reach a stable state publish their details to NATS. Other nodes can query the registry in templates to build configurations dynamically.
Run modes
The agent supports two modes of operation that, used together, keep it both efficient and quick to react:
- Full manifest apply: Manages the complete state of every resource
- Health check mode: Runs only Monitoring checks, which can trigger a full manifest apply as remediation
By enabling both modes, you can run health checks very frequently (even at 10- or 20-second intervals) while keeping full Configuration Management runs less frequent (every few hours).
Enabling both modes is optional but recommended. Adding health checks to key resources is also recommended.
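The following small Python simulation shows how the two modes interleave. It is a sketch, not the agent's scheduler. The names Schedule, plan, and the status strings are hypothetical. It also assumes that when a full apply and a health check fall due at the same moment, the health check is redundant and is skipped.

```python
from dataclasses import dataclass
from math import gcd
from typing import Callable, List, Tuple

# Hypothetical status names; the source only says that "critical" results
# (not "warning") trigger remediation.
OK, WARNING, CRITICAL = "ok", "warning", "critical"


@dataclass
class Schedule:
    apply_interval: int   # seconds between full manifest applies (e.g. a few hours)
    health_interval: int  # seconds between health check runs (e.g. 10-20 seconds)


def plan(schedule: Schedule, horizon: int,
         health_status: Callable[[int], str]) -> List[Tuple[int, str]]:
    """Simulate `horizon` seconds and return the (time, action) events."""
    events: List[Tuple[int, str]] = []
    # Step by the gcd so every due time of either interval is visited.
    step = gcd(schedule.apply_interval, schedule.health_interval)
    for t in range(0, horizon + 1, step):
        if t % schedule.apply_interval == 0:
            # Assumption: a due full apply makes a simultaneous health check redundant.
            events.append((t, "apply"))
        elif t % schedule.health_interval == 0:
            events.append((t, "healthcheck"))
            if health_status(t) == CRITICAL:
                # Remediation: a critical result triggers an immediate full apply.
                events.append((t, "remediation-apply"))
    return events
```

As an example, consider plan(Schedule(apply_interval=3600, health_interval=20), 3600, lambda t: CRITICAL if t == 40 else OK). It produces full applies at 0 and 3600 seconds and a health check every 20 seconds in between. It also adds one extra remediation apply at 40 seconds. Most runs are the cheap checks, and the expensive full apply only runs early when something is actually critical.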
Supported manifest and data sources
Manifest sources
Manifests can be loaded from:
- Local file: /path/to/manifest.yaml
- Object Storage: obj://bucket/key.tar.gz
- HTTP(S): https://example.com/manifest.tar.gz (supports Basic Auth via URL credentials)
Remote sources (Object Storage and HTTP) must be .tar.gz archives containing a manifest.yaml file, templates and file sources.
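A minimal Python sketch of telling the three source forms apart with the standard library's urlsplit follows. The function name is hypothetical. Enforcing the .tar.gz requirement by checking the file name is an assumption of this sketch; the agent may validate the archive differently.

```python
from urllib.parse import urlsplit


def classify_manifest_source(source: str) -> dict:
    """Classify a manifest source as a local file, Object Storage, or HTTP(S)."""
    parts = urlsplit(source)
    if parts.scheme == "obj":
        kind = "object"
        # obj://bucket/key.tar.gz -> the bucket is the URL host, the key is the path
        info = {"bucket": parts.netloc, "key": parts.path.lstrip("/")}
    elif parts.scheme in ("http", "https"):
        kind = "http"
        # Basic Auth credentials travel in the URL: https://user:pass@host/...
        info = {"url": source, "username": parts.username, "password": parts.password}
    elif parts.scheme == "":
        # A plain path such as /path/to/manifest.yaml is a local file.
        return {"kind": "file", "path": source}
    else:
        raise ValueError(f"unsupported manifest source: {source}")
    # Remote sources must be .tar.gz archives. This sketch only checks the
    # name (an assumption, not necessarily how ccm validates the archive).
    if not parts.path.endswith(".tar.gz"):
        raise ValueError(f"remote manifest must be a .tar.gz archive: {source}")
    return {"kind": kind, **info}
```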
External data sources
For Hiera data resolution, the agent supports:
- Local file: file:///path/to/data.yaml
- Key-Value store: kv://bucket/key
- HTTP(S): https://example.com/data.yaml
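Unlike manifest sources, data sources are always given as URLs. The Python sketch below splits each form into its parts. The function name and return shapes are hypothetical.

```python
from urllib.parse import urlsplit


def parse_data_source(url: str) -> tuple:
    """Split an external data source URL into its kind and location."""
    parts = urlsplit(url)
    if parts.scheme == "file":
        # file:///path/to/data.yaml -> empty host, absolute path
        return ("file", parts.path)
    if parts.scheme == "kv":
        # kv://bucket/key -> the bucket is the URL host, the key is the path
        return ("kv", parts.netloc, parts.path.lstrip("/"))
    if parts.scheme in ("http", "https"):
        return ("http", url)
    raise ValueError(f"unsupported data source: {url}")
```

The three slashes in file:/// matter here. With only two slashes, as in file://path/to/data.yaml, URL parsing treats "path" as a host name rather than as part of the file path.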
Logical flow
The agent continuously runs and manages manifests as follows:
- At startup, the agent fetches data and gathers facts
- Starts a worker for each manifest source
  - Each worker starts watchers to download and manage the manifest (polling every 30 seconds for remote sources)
- Triggers workers at the configured interval for a full apply
  - Each run updates facts (minimum 2-minute interval) and data
  - Applies each manifest serially
- Triggers workers at the configured health check interval
  - Health check runs do not update facts or data
  - Runs health checks for each manifest serially
  - If any health checks are critical (not warning), the agent triggers a full apply for that worker
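Here is a Python model of one worker that follows the steps above. It is a sketch under stated assumptions, not ccm's implementation. The Worker class, its method names, and the injected callables are all hypothetical. It shows facts being cached for at least 2 minutes, data being refreshed on every full run, and health check runs that only remediate when a result is critical.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Optional

FACTS_MIN_INTERVAL = 120.0  # seconds: facts are refreshed at most every 2 minutes


class Worker:
    """Hypothetical model of one agent worker; not ccm's actual API."""

    def __init__(self, manifests: List[str],
                 gather_facts: Callable[[], Dict[str, Any]],
                 resolve_data: Callable[[], Dict[str, Any]],
                 apply: Callable[[str, Dict[str, Any], Dict[str, Any]], None],
                 health_checks: Callable[[str], Iterable[str]],
                 clock: Callable[[], float] = time.monotonic):
        self.manifests = manifests
        self.gather_facts = gather_facts
        self.resolve_data = resolve_data
        self.apply = apply
        self.health_checks = health_checks
        self.clock = clock
        self._facts: Optional[Dict[str, Any]] = None
        self._facts_at = 0.0

    def _current_facts(self) -> Dict[str, Any]:
        now = self.clock()
        # Re-gather only if facts were never gathered or are at least 2 minutes old.
        if self._facts is None or now - self._facts_at >= FACTS_MIN_INTERVAL:
            self._facts = self.gather_facts()
            self._facts_at = now
        return self._facts

    def full_apply(self) -> None:
        facts = self._current_facts()
        data = self.resolve_data()       # data is refreshed on every full run
        for manifest in self.manifests:  # manifests are applied serially
            self.apply(manifest, facts, data)

    def health_check_run(self) -> bool:
        """Run checks without refreshing facts or data; remediate on critical."""
        critical = False
        for manifest in self.manifests:  # checks run serially, manifest by manifest
            if any(status == "critical" for status in self.health_checks(manifest)):
                critical = True
        if critical:
            self.full_apply()  # warnings alone never trigger remediation
        return critical
```

Injecting the clock and the callables keeps the model testable without a real system. It also makes the timing rules (the facts cache and the remediation trigger) explicit in one place.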
In the background, object stores and HTTP sources are watched for changes. Updates trigger immediate apply runs with exponential backoff retry on failures.
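Below is a minimal Python sketch of retrying an apply with exponential backoff. The documentation above does not give the agent's real delays or retry limits. The function name and the defaults (5 attempts, 1 second base delay, 60 second cap) are made up for illustration, and jitter is left out so the delays are predictable.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def apply_with_backoff(run: Callable[[], T], max_attempts: int = 5,
                       base: float = 1.0, cap: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `run`, retrying failures with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return run()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delays of base, 2*base, 4*base, ... capped at `cap` seconds.
            sleep(min(cap, base * 2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

If a run fails twice and then succeeds, it sleeps for 1 and then 2 seconds before the successful third attempt. Doubling the delay keeps a broken object store or web server from being hammered while it recovers.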
Prometheus metrics
When monitor_port is configured, the agent exposes Prometheus metrics on /metrics. These metrics can be used to monitor agent health, track resource states and events, and observe health check statuses.
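A Prometheus server is the usual consumer of this endpoint. For a quick look from a script, the Python sketch below fetches /metrics and parses the Prometheus text exposition format, keeping only the choria_ccm_ series. It assumes the agent is reachable on localhost at the configured monitor_port. The function names are hypothetical, and the parser skips label-value unescaping and other edge cases.

```python
import re
from typing import Dict, List, Tuple
from urllib.request import urlopen

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str, prefix: str = "choria_ccm_") -> List[Tuple[str, Dict[str, str], float]]:
    """Return (name, labels, value) for each sample whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blank lines, HELP and TYPE comments
            continue
        m = SAMPLE_RE.match(line)
        if not m or not m.group(1).startswith(prefix):
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))  # values are not unescaped
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(port: int, host: str = "localhost") -> List[Tuple[str, Dict[str, str], float]]:
    """Fetch and parse the agent's /metrics endpoint (host and port are assumptions)."""
    with urlopen(f"http://{host}:{port}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```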
Agent metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| choria_ccm_agent_apply_duration_seconds | Summary | manifest | Time taken to apply manifests |
| choria_ccm_agent_healthcheck_duration_seconds | Summary | manifests | Time taken for health check runs |
| choria_ccm_agent_healthcheck_remediations_count | Counter | manifest | Health checks that triggered remediation |
| choria_ccm_agent_data_resolve_duration_seconds | Summary | - | Time taken to resolve external data |
| choria_ccm_agent_data_resolve_error_count | Counter | url | Data resolution failures |
| choria_ccm_agent_facts_resolve_duration_seconds | Summary | - | Time taken to resolve facts |
| choria_ccm_agent_facts_resolve_error_count | Counter | - | Facts resolution failures |
| choria_ccm_agent_manifest_fetch_count | Counter | manifest | Remote manifest fetches |
| choria_ccm_agent_manifest_fetch_error_count | Counter | manifest | Remote manifest fetch failures |
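Prometheus exposes each Summary as a name_sum series and a name_count series, plus any configured quantiles. Dividing the two gives the mean duration. The Python sketch below does this per label value, for example the average apply time per manifest. It takes already-parsed (name, labels, value) samples, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Sample = Tuple[str, Dict[str, str], float]


def summary_means(samples: Iterable[Sample], metric: str, label: str) -> Dict[Optional[str], float]:
    """Mean of a Summary metric (sum / count), grouped by one label's value."""
    sums: Dict[Optional[str], float] = defaultdict(float)
    counts: Dict[Optional[str], float] = defaultdict(float)
    for name, labels, value in samples:
        key = labels.get(label)
        if name == metric + "_sum":
            sums[key] += value
        elif name == metric + "_count":
            counts[key] += value
        # Quantile series (the bare metric name) are ignored.
    return {key: sums[key] / counts[key] for key in counts if counts[key] > 0}
```

These counters grow for the life of the process, so this is a mean since the agent started. For recent behaviour, the usual Prometheus approach is to divide rate() of the _sum series by rate() of the _count series over a time window.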
Resource metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| choria_ccm_manifest_apply_duration_seconds | Summary | source | Time taken to apply an entire manifest |
| choria_ccm_resource_apply_duration_seconds | Summary | type, provider, name | Time taken to apply a resource |
| choria_ccm_resource_state_total_count | Counter | type, name | Total resources processed |
| choria_ccm_resource_state_stable_count | Counter | type, name | Resources in stable state |
| choria_ccm_resource_state_changed_count | Counter | type, name | Resources that changed |
| choria_ccm_resource_state_refreshed_count | Counter | type, name | Resources that were refreshed |
| choria_ccm_resource_state_failed_count | Counter | type, name | Resources that failed |
| choria_ccm_resource_state_error_count | Counter | type, name | Resources with errors |
| choria_ccm_resource_state_skipped_count | Counter | type, name | Resources that were skipped |
| choria_ccm_resource_state_noop_count | Counter | type, name | Resources in noop mode |
Health check metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| choria_ccm_healthcheck_duration_seconds | Summary | type, name, check | Time taken for health checks |
| choria_ccm_healthcheck_status_count | Counter | type, name, status, check | Health check results by status |
Facts metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| choria_ccm_facts_gather_duration_seconds | Summary | - | Time taken to gather system facts |
Configuration
The agent is included in the ccm binary. To use it, create a configuration file and enable the systemd service.
The configuration file is located at /etc/choria/ccm/config.yaml:
After configuring, start the service: