In this post, we describe how the Zip security team leveraged the Python CDK for Terraform (CDKTF) to enforce security guardrails for our AWS infrastructure. We provide example configurations and code to help other security teams build their own secure AWS infrastructure-as-code.
Like many early-stage startups, Zip initially managed its AWS infrastructure primarily through click-ops, that is, by making changes through the web console or CLI. This was not ideal from a security perspective, but it let developers build products quickly and easily.
As the company grew, the number of infrastructure engineers with AWS administrative rights increased. For most, these permissions were excessive. Click-ops also made it difficult to require changes to be reviewed by a peer, and there was limited visibility into the impact of changes.
As the security team, we wanted to up-level our infrastructure by limiting the number of AWS admins, enforcing reviews for all changes, improving auditability, and providing guardrails, while still enabling developers to build infrastructure confidently.
We evaluated multiple solutions and decided to migrate our infrastructure to infrastructure-as-code (IaC) using Terraform CDK (Cloud Development Kit for Terraform, or CDKTF) with Python. Terraform CDK is a layer on top of Terraform that lets you define infrastructure in TypeScript, Python, Java, C#, or Go. Using a general-purpose programming language gives us more flexibility in how we define and deploy our infrastructure. While there are many resources on security best practices and tooling for Terraform with HCL, there are few for Terraform CDK.
In this blog post, we’ll show how we leveraged Terraform CDK to provide a set of powerful security tools and guardrails, leading to a 95% reduction in AWS admins and 100% removal of click-ops for critical production resources. Our goal will be to demonstrate how we implemented the following:
Stay tuned for our next blog post for more details on how we conducted our evaluation, set up the CI/CD, imported the state, and codified our resources.
Our Terraform folder is structured to separate all of our environments into their own stacks. We also created a set of secure templates for resources, which are shared across our infrastructure. As we show in the next section, these templates are used by developers to instantiate resources with our secure defaults.
/secure_templates
├── s3.py
├── iam.py
├── rds.py
└── ...
/production
├── resources
│ ├── rds.py
│ ├── iam.py
│ └── s3.py
└── main.py
/dev
├── resources
│ ├── iam.py
│ └── s3.py
└── main.py
...
Each stack has a dedicated IAM role, which GitHub Actions assumes via OIDC. Because of the sensitivity of the permissions these roles carry, we run these workflows on their own runners, isolated from our existing infrastructure.
Instead of allowing developers to use the base Terraform CDK libraries, the security team built custom Python classes to implement Terraform resources. This allowed us to define secure configurations and, with the flexibility of Python constructs, prevent developers from creating dangerously configured resources.
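As a rough illustration of what the trust relationship for such a role can look like, the sketch below builds a trust policy document for GitHub's OIDC provider in plain Python. The account ID, repository name, branch, and function name are placeholders, not Zip's actual configuration. The provider hostname, the `sts:AssumeRoleWithWebIdentity` action, and the `aud`/`sub` claim keys are the standard ones for GitHub Actions OIDC.

```python
import json

GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, ref: str = "refs/heads/main") -> dict:
    """Build a trust policy that lets one GitHub repo/ref assume a role via OIDC."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Token must be issued for AWS STS ...
                        f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com",
                        # ... and come from this exact repository and ref.
                        f"{GITHUB_OIDC_HOST}:sub": f"repo:{repo}:ref:{ref}",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and repository, for illustration only.
    policy = github_oidc_trust_policy("123456789012", "example-org/infrastructure")
    print(json.dumps(policy, indent=2))
```

Pinning the `sub` claim to a single repository and ref is what keeps one stack's role from being assumed by workflows in other repositories or branches.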
For example, in our secure template for RDS, we do not allow databases to be publicly accessible unless they are in an allowlist.
# /secure_templates/rds.py
from cdktf import TerraformStack
from cdktf_cdktf_provider_aws.db_instance import DbInstance

# Only databases named here may be created with publicly_accessible=True.
ALLOWED_PUBLICLY_ACCESSIBLE_DB_NAMES = ["public_db_1", "public_db_2"]


# SecurityException is not defined in the original snippet; a plain
# Exception subclass is assumed here so the example is complete.
class SecurityException(Exception):
    """Raised when a requested configuration violates a guardrail."""


class DatabaseInstance(DbInstance):
    """AWS DB instance."""

    def __init__(
        self,
        stack: TerraformStack,
        db_name: str,
        tags: dict[str, str],
        multi_az: bool = False,
        storage_encrypted: bool = True,
        publicly_accessible: bool = False,
        **kwargs,
    ):
        """
        Constructs a new DB instance.

        :param stack: CDKTF stack.
        :param db_name: Name of the DB instance.
        :param tags: Tags to apply to the DB instance.
        :param multi_az: Whether to create a multi-AZ DB instance.
        :param storage_encrypted: Whether to encrypt the storage.
        :param publicly_accessible: Whether the DB instance is publicly accessible. Default false.
        """
        # Guardrail: reject public databases unless explicitly allowlisted.
        if (
            publicly_accessible
            and db_name not in ALLOWED_PUBLICLY_ACCESSIBLE_DB_NAMES
        ):
            raise SecurityException(
                "This database cannot be public. Please reach out to security@ for more details."
            )
        # Pass the secure defaults (and any extra options) to the base resource.
        super().__init__(
            stack,
            id_=db_name,
            multi_az=multi_az,
            storage_encrypted=storage_encrypted,
            publicly_accessible=publicly_accessible,
            tags=tags,
            **kwargs,
        )
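To check the guardrail's behavior without synthesizing a stack, the same pattern can be exercised against a stand-in base class. The sketch below is only an illustration: `FakeDbInstance` and `is_rejected` are hypothetical names that replace the real provider class so the snippet runs without CDKTF installed, and `SecurityException` is assumed to be a plain `Exception` subclass.

```python
class SecurityException(Exception):
    """Raised when a requested configuration violates a guardrail."""


class FakeDbInstance:
    """Hypothetical stand-in for the provider's DbInstance (no CDKTF needed)."""

    def __init__(self, scope, id_, **config):
        self.scope = scope
        self.id_ = id_
        self.config = config


ALLOWED_PUBLICLY_ACCESSIBLE_DB_NAMES = ["public_db_1", "public_db_2"]


class DatabaseInstance(FakeDbInstance):
    """Same guardrail as the secure template, on top of the stand-in base."""

    def __init__(self, stack, db_name, tags, publicly_accessible=False, **kwargs):
        if publicly_accessible and db_name not in ALLOWED_PUBLICLY_ACCESSIBLE_DB_NAMES:
            raise SecurityException("This database cannot be public.")
        super().__init__(
            stack,
            id_=db_name,
            publicly_accessible=publicly_accessible,
            tags=tags,
            **kwargs,
        )


def is_rejected(db_name: str, publicly_accessible: bool) -> bool:
    """Return True if the template refuses to create this database."""
    try:
        DatabaseInstance(None, db_name, tags={}, publicly_accessible=publicly_accessible)
    except SecurityException:
        return True
    return False


if __name__ == "__main__":
    print(is_rejected("zip-db", True))       # public, not allowlisted
    print(is_rejected("public_db_1", True))  # public, allowlisted
    print(is_rejected("zip-db", False))      # private
```

Because the check runs in the constructor, a disallowed configuration fails when the stack is synthesized, before any plan or apply reaches AWS.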
Developers can use this template class to instantiate their databases in a resource file like this:
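# /production/resources/databases.py
from cdktf import TerraformStack
from cdktf_cdktf_provider_aws.provider import AwsProvider
from secure_templates.rds import DatabaseInstance


def generate_databases(stack: TerraformStack, tags: dict[str, str], other_providers: dict[str, AwsProvider]):
    # Secure defaults (encryption on, not public) come from the template.
    DatabaseInstance(
        stack,
        db_name="zip-db",
        tags=tags,
    )
    ...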
In our main.py, where we create our stack, we can now import the function that generates the databases. This lets developers use a resource securely without needing to know the secure-by-default configurations we have already defined.
# /production/main.py
from cdktf import App, S3Backend, TerraformStack
from cdktf_cdktf_provider_aws.provider import AwsProvider, AwsProviderDefaultTags
from constructs import Construct
from resources.databases import generate_databases
#from ...resources... import ...functions...

# ENVIRONMENT, TEAM, and self.other_providers are assumed to be defined
# in parts of this file that are not shown here.


class ProdStack(TerraformStack):
    """Stack for Prod AWS Account."""

    def __init__(self, scope: Construct, id: str):
        super().__init__(scope, id)
        # Tags applied to every resource created in this stack.
        self.tags = {
            "env": f"{ENVIRONMENT}",
            "team": f"{TEAM}",
            "terraform-managed": "true",
            "zip:cost-allocation": "production",
        }
        self.load_resources()

    def load_resources(self):
        generate_databases(self, self.tags, self.other_providers)
        generate_s3(...)
        generate_iam(...)
To protect our secure templates, we include an entry in CODEOWNERS to set the security team as a required reviewer for any pull requests with changes to the secure_templates folder.
# .github/CODEOWNERS
secure_templates/* @ziphq/security
production/main.py @ziphq/security
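CODEOWNERS protects the templates themselves. One complementary option, not something described in this post, is a CI check that flags resource files importing the base provider classes directly instead of going through the secure templates. The sketch below uses Python's `ast` module. The function name and the allowlist containing the `provider` module are assumptions made for illustration.

```python
import ast

BASE_PROVIDER_PACKAGE = "cdktf_cdktf_provider_aws"
# Assumption: the provider module itself (AwsProvider) is fine to import directly.
DEFAULT_ALLOWED = frozenset({"cdktf_cdktf_provider_aws.provider"})


def direct_provider_imports(source: str, allowed: frozenset = DEFAULT_ALLOWED) -> list[str]:
    """Return base-provider modules imported by `source`, minus the allowlist."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            in_package = name == BASE_PROVIDER_PACKAGE or name.startswith(
                BASE_PROVIDER_PACKAGE + "."
            )
            if in_package and name not in allowed:
                found.append(name)
    return found


if __name__ == "__main__":
    snippet = "from cdktf_cdktf_provider_aws.db_instance import DbInstance\n"
    print(direct_provider_imports(snippet))
```

A CI job could run this over every file outside secure_templates and fail the build when the returned list is non-empty.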
As we migrated our infrastructure into IaC, we wanted to restrict our engineering team from making changes to AWS via the AWS console and CLI.
For all of our resources defined in Terraform, we added a terraform-managed tag:
# production/main.py
from resources.databases import generate_databases


class ProdStack(TerraformStack):
    """Stack for AWS Terraform Account."""

    def __init__(self, scope: Construct, id: str):
        super().__init__(scope, id)
        self.tags = {
            "terraform-managed": "true",
            # ... other tags
        }
        self.load_resources()
        ...

    def load_resources(self):
        # The shared tags are passed to every resource generator.
        generate_databases(self, self.tags, self.other_providers)
# production/resources/databases.py
def generate_databases(stack, tags, other_providers):
    DatabaseInstance(
        stack,
        db_name="zip-db",
        tags=tags,
    )
Using an SCP, we denied non-read access to these terraform-managed resources to all principals, with the exception of a few emergency on-call engineers and the Terraform runner. With this SCP in place, we now guarantee that all changes to resources must undergo our Terraform change process, which includes mandatory code review and CI checks.
{
  "Effect": "Deny",
  "NotAction": [
    "tags:List*",
    "iam:Get*",
    "iam:List*",
    "ec2:Describe*",
    ... // all other read only permissions
  ],
  "Resource": "*",
  "Condition": {
    "StringEquals": {
      "aws:ResourceTag/terraform-managed": "true"
    },
    "StringNotLike": {
      "aws:PrincipalARN": [
        "arn:aws:sts::*:assumed-role/*/oncall@ziphq.com*",
        "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_admin_*",
        "arn:aws:iam::*:role/terraform-runner"
      ]
    }
  }
}
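To make the logic of this statement easier to follow, here is a small Python approximation of when the Deny applies. It is a simplification for illustration only, not how AWS evaluates policies: it uses shell-style wildcard matching from `fnmatch`, compares case-sensitively, covers only the action patterns listed in the excerpt above, and ignores every other policy in the account. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# Excerpt of the NotAction list from the statement above.
READ_ONLY_ACTIONS = ["tags:List*", "iam:Get*", "iam:List*", "ec2:Describe*"]
EXEMPT_PRINCIPALS = [
    "arn:aws:sts::*:assumed-role/*/oncall@ziphq.com*",
    "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_admin_*",
    "arn:aws:iam::*:role/terraform-runner",
]


def matches_any(value: str, patterns: list[str]) -> bool:
    """True if `value` matches at least one wildcard pattern."""
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def scp_denies(action: str, principal_arn: str, resource_tags: dict[str, str]) -> bool:
    """Approximate whether the Deny statement applies to a request."""
    if matches_any(action, READ_ONLY_ACTIONS):
        return False  # NotAction: listed actions fall outside the statement
    if resource_tags.get("terraform-managed") != "true":
        return False  # StringEquals on aws:ResourceTag/terraform-managed
    # StringNotLike: deny only if the principal matches none of the exemptions.
    return not matches_any(principal_arn, EXEMPT_PRINCIPALS)


if __name__ == "__main__":
    tagged = {"terraform-managed": "true"}
    developer = "arn:aws:iam::123456789012:role/developer"  # placeholder ARN
    runner = "arn:aws:iam::123456789012:role/terraform-runner"
    print(scp_denies("rds:ModifyDBInstance", developer, tagged))
    print(scp_denies("rds:ModifyDBInstance", runner, tagged))
    print(scp_denies("ec2:DescribeInstances", developer, tagged))
```

The two condition operators are combined with AND, so the Deny takes effect only when the resource carries the tag and the caller is not one of the exempt principals.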
Throughout these steps of deploying resources, securing them, and preventing misconfigured infrastructure from being created, we wanted to ensure we had visibility at all layers. To achieve this, we identified the following as good signals for our telemetry:
terraform-managed resource tag if the SCP was triggered
Using our templates, we ensured all of our critical production resources had correct configurations. During the migration, the team identified and fixed a few minor misconfigurations in resources as we built our secure templating pathway. This included updating security groups to reflect the correct inbound and outbound rules, while also making exceptions for specific use cases. We also staged any resources that were no longer in use for removal to reduce excess attack surface.
Through our SCP to enforce Terraform use, we achieved 100% code review for IAM, S3, RDS, and security group changes, and reduced the number of AWS admins by 95%.
Special thanks to the team at Zip who helped with these achievements:
Stay tuned for our next blog post for more details on our evaluation, how we set up the CI/CD, how we imported the state, and codified our infrastructure.