InfraPolicies - Policy as Code
Introduction
Infrastructure Policies (InfraPolicies) are an implementation of Policy as Code. The idea is to keep fine-grained control over changes across an organization's infrastructure by defining validation rules that those changes must satisfy.
The validation process is based on the Open Policy Agent engine, which allows us to write a set of rules using the Rego language to express validation behavior.
Infrastructure changes are represented by the changes in a Terraform plan, which must respect the defined rules. For example, you can ensure that tags are applied to all resources, or that no instance larger than t2.large is used.
Concepts
An InfraPolicy is represented by a Body that contains the set of rules and a Severity that defines the enforcement level.
Body
The rules that deny or allow changes are written in the Rego language. Currently only allow and deny rules are checked; see the examples below.
The Input body for the OPA engine during validation will include the following fields:
{
    "environment_canonical": "env-example-canonical",
    "project_canonical": "proj-example-canonical",
    "tfplan": {
        ...
    }
}
This allows rules in the Body code to include checks on environment_canonical and project_canonical:
package example

default allow = false

allow {
    input.project_canonical == "proj-example-canonical"
    input.environment_canonical == "env-example-canonical"
}
The tfplan field holds the Terraform plan in JSON format. Rules must use dot notation to reference specific fields in it:
package example

default allow = false

# True when at least one resource change has a max_size below 5.
allow {
    input.tfplan.resource_changes[_].change.after["max_size"] < 5
}
The allow rule must return a boolean value directly:
package example

default allow = false

allow {
    input.project_canonical == "proj-example-canonical"
}
The deny rule must return an array of strings containing the reasons for the failure:
package example

deny[reason] {
    not input.project_canonical == "proj-example-canonical"
    reason := sprintf("The project canonical %q is not expected", [input.project_canonical])
}
allow and deny are opposite concepts: an allow rule fails validation when it evaluates to false, whereas a deny rule fails validation when its body evaluates to true.
The OPA documentation contains useful information on how to write policies that include deny rules, and defines best practices to follow in order to build reliable policy systems.
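As a minimal sketch of how the two styles relate, the same environment constraint can be written once as an allow rule and once as a deny rule. This assumes both rules may live in the same Body, which the text above suggests but does not state explicitly. The canonical value is the example one used earlier, and the syntax follows the pre-1.0 Rego style used on this page (OPA 1.0 and later expect the if and contains keywords in rule heads).

```rego
package example

# allow style: validation fails while this rule stays false.
default allow = false

allow {
    input.environment_canonical == "env-example-canonical"
}

# deny style: validation fails as soon as this set contains a reason.
deny[reason] {
    input.environment_canonical != "env-example-canonical"
    reason := sprintf("The environment canonical %q is not expected", [input.environment_canonical])
}
```

With environment_canonical set to env-example-canonical, allow is true and deny is empty; with any other value, allow falls back to false and deny contains one reason.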
Severity
The client checks the Severity level to decide how to handle changes when an InfraPolicy is not respected:
- Critical: the changes must be blocked
- Warning: the changes are blocked but they can be overridden with a manual operation
- Advisory: the changes can be automatically applied but a notification must be sent
Status
The status can be enabled or disabled. When it is disabled, the InfraPolicy is excluded from the validation process.
Testing
The OPA engine has a dedicated command for testing, opa test, and the best practices suggested in the OPA testing documentation should be followed. Besides the opa test command, the OPA ecosystem offers a Playground that can be used for quick and simple checks.
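A minimal sketch of what such a test can look like, assuming the policy and its tests sit in the same file and package (tests are often kept in a separate file instead). The policy is the allow example from the Body section; the test names and input values are illustrative. Rules whose names start with test_ are picked up by opa test, and the with keyword replaces the input document for a single evaluation.

```rego
package example

# Policy under test: the allow example from the Body section.
default allow = false

allow {
    input.project_canonical == "proj-example-canonical"
}

# Passes when the expected project is allowed.
test_allow_expected_project {
    allow with input as {"project_canonical": "proj-example-canonical"}
}

# Passes when any other project is rejected.
test_reject_other_project {
    not allow with input as {"project_canonical": "another-project"}
}
```

Saved under a hypothetical file name such as example.rego, the tests can be run with opa test example.rego -v. The sketch uses the same pre-1.0 Rego syntax as the rest of this page; OPA 1.0 and later expect the if keyword in rule heads unless v0 compatibility is enabled.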
Validation
To validate your Terraform plan against the defined policies, you can use one of the following methods:
Locally
A Terraform plan can be validated locally using the Cycloid CLI with the validate subcommand:
$ terraform plan -out=./plan && terraform show -json ./plan > plan.json
$ cy infrapolicy validate --plan-path ./plan.json
ADVISORIES  CRITICALS  WARNINGS
1           0          0
In Cycloid pipeline
In a pipeline context, a Concourse Resource is available and can easily be plugged in right after a terraform plan step and just before a notification mechanism.
Example of output
Validation with advisories: the job is green and displays the advisories as metadata.
Validation with criticals and/or warnings: the job fails and displays the result as metadata.
Setup
- Create a new policy from the Security / InfraPolicies page by clicking on the Add InfraPolicy button.
- Fill in the mandatory fields and click on Save to add the resource. Enable the policy to include it in the next validation run.
- Add the pipeline and a validation resource as described before.
- Try to submit some unexpected infrastructure changes that go against your policies (e.g. try to double the size of the instance) to see InfraPolicy in action.
Code examples
Examples of Rego code for some common use cases. Other examples can be found in the fugue/regula or open-policy-agent/conftest repositories.
Tags required
Allow only resources with defined tags.
package example

deny[reason] {
    resource := input.tfplan.planned_values.root_module.resources[_]
    not resource.values.tags
    reason := sprintf("tags required for the resource %q", [resource.address])
}
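The rule above only checks that a tags attribute is present. As a hedged variation, the sketch below requires specific tag keys on each resource; the required_tags set and its env and owner keys are hypothetical and should be replaced with your own convention. Like the rule above, it only inspects resources declared in the root module.

```rego
package example

# Hypothetical set of tag keys every resource must carry.
required_tags = {"env", "owner"}

deny[reason] {
    resource := input.tfplan.planned_values.root_module.resources[_]
    # Bind tag to each required key in turn.
    required_tags[tag]
    # True when tags is missing, null, or does not contain the key.
    not resource.values.tags[tag]
    reason := sprintf("tag %q required for the resource %q", [tag, resource.address])
}
```

Each missing key produces its own reason, so a resource lacking both tags is reported twice.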
Instance type
Allow only specific types of instances.
package example

allowed_instance_types = {
    "t2.medium",
    "t2.large",
    "t2.xlarge"
}

deny[reason] {
    itype := input.tfplan.resource_changes[_].change.after["instance_type"]
    not allowed_instance_types[itype]
    reason := sprintf("instance_type %q is not accepted, use one of the allowed: %v", [itype, allowed_instance_types])
}
Instance quantity
Define the maximum number of running instances that the Auto Scaling Group can spawn.
package example

default allow = false

allow {
    input.tfplan.resource_changes[_].change.after["max_size"] < 5
}
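Because the allow rule above iterates with [_], it is satisfied as soon as one resource change has a max_size below 5. A deny-based variant, sketched below, reports every offending Auto Scaling Group instead. It assumes the groups are declared as aws_autoscaling_group resources; the max_allowed_size name is illustrative.

```rego
package example

# Illustrative limit: max_size must stay below this value.
max_allowed_size = 5

deny[reason] {
    rc := input.tfplan.resource_changes[_]
    rc.type == "aws_autoscaling_group"
    # Undefined (so no denial) when the resource is being deleted and after is null.
    rc.change.after.max_size >= max_allowed_size
    reason := sprintf("%v: max_size %v must be lower than %v", [rc.address, rc.change.after.max_size, max_allowed_size])
}
```

Resources of other types are ignored, so unrelated changes in the same plan do not affect the result.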
Only a specific region
Allow only a specific cloud provider region.
package example

default allow = false

allow {
    provider := input.tfplan.configuration.provider_config.aws
    provider.expressions.region.constant_value == "eu-west-1"
}
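To accept several regions rather than a single one, the check can be expressed against a set. This is a sketch under assumptions: the allowed_regions values are examples, and the rule only sees regions written as literal values in the provider block (a region passed through a variable does not appear as constant_value in the plan).

```rego
package example

# Example values; replace with the regions you accept.
allowed_regions = {"eu-west-1", "eu-west-3"}

deny[reason] {
    # Iterate over every provider configuration, including aliased ones.
    provider := input.tfplan.configuration.provider_config[key]
    provider.name == "aws"
    region := provider.expressions.region.constant_value
    not allowed_regions[region]
    reason := sprintf("provider %q uses region %q, use one of the allowed: %v", [key, region, allowed_regions])
}
```

Written as a deny rule, it names each provider configuration that points at a region outside the set.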
Security group required
Make the security group mandatory.
package example

deny[reason] {
    r := input.tfplan.resource_changes[_]
    # after_unknown flags values that are only known after apply.
    r.change.after_unknown.vpc_security_group_ids == true
    reason := "A security group is required"
}
Specific AMI
Allow only a specific set of AWS AMIs.
package example

import input.tfplan as tfplan

allowed_amis = {
    "ami-abc",
    "ami-xyz"
}

deny[reason] {
    ami := tfplan.resource_changes[_].change.after.image_id
    not allowed_amis[ami]
    reason := sprintf("AWS AMI %q is not accepted, use one of the allowed: %v", [ami, allowed_amis])
}
RDS Backup set
Allow only RDS databases with a backup set in the production environment.
package example

is_production {
    input.environment_canonical == "prod"
}

deny[reason] {
    is_production
    resource := input.tfplan.planned_values.root_module.resources[_]
    resource.type == "aws_db_instance"
    not resource.values.backup_retention_period > 0
    reason := sprintf("Backup is required on production for %q", [resource.address])
}