IAM Cleanup: How I Built an Automated IAM Governance Tool on AWS

In a previous article, I argued that unused IAM credentials are not harmless — they are an invitation. Every dormant role or policy is a persistent attack surface, decaying institutional knowledge, and a permissions drift waiting to happen.

This article is the follow-up: instead of just describing the problem, I built something to fix it.


The Problem, Restated Briefly

Over time, AWS accounts accumulate IAM roles and managed policies that nobody uses anymore: migration service accounts, abandoned CI/CD pipelines, forgotten cross-account roles, temporary admin escalations that were never cleaned up. These credentials:

  • Remain valid indefinitely unless explicitly deleted
  • Retain permissions that may have been appropriate months ago but are now excessive
  • Carry no warning signal — silence is indistinguishable from legitimate inactivity
  • Eventually belong to nobody, making manual cleanup politically difficult

Manual audits happen too rarely and rely on tribal knowledge that has long since left the organization.


The Solution: IAM Cleanup

IAM Cleanup is a fully serverless, Terraform-deployed tool that automates the full lifecycle of unused IAM credential governance on AWS.

Architecture

EventBridge (daily)   →  Lambda Scanner  →  tags unused resources
EventBridge (weekly)  →  Lambda Cleaner  →  sends SNS notification + deletes

Two Lambda functions handle distinct responsibilities:

  • Scanner — runs daily, queries AWS IAM for roles and customer-managed policies that have not been used within a configurable inactivity window, and applies cleanup tags.
  • Cleaner — runs weekly, identifies resources that have been tagged for longer than a configurable retention period, sends an email warning via SNS, and deletes them (or simulates deletion in dry-run mode).
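The Scanner's inactivity check can be sketched in Python. This is a minimal sketch, not the project's code. It assumes the Scanner relies on the RoleLastUsed field returned by IAM's GetRole (ListRoles does not include that field), and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Mapping


def is_inactive(role: Mapping[str, Any], inactivity_days: int, now: datetime) -> bool:
    """Return True if the role has not been used within the inactivity window.

    `role` is assumed to be the "Role" dict returned by IAM GetRole, which
    carries RoleLastUsed.LastUsedDate once AWS has recorded any use.
    """
    cutoff = now - timedelta(days=inactivity_days)
    last_used = role.get("RoleLastUsed", {}).get("LastUsedDate")
    if last_used is None:
        # No recorded use: fall back to the creation date so that a freshly
        # created role still gets a full window before it can be flagged.
        return role["CreateDate"] < cutoff
    return last_used < cutoff


# Usage with boto3 (not executed here):
#   role = boto3.client("iam").get_role(RoleName="legacy-migration")["Role"]
#   is_inactive(role, 90, datetime.now(timezone.utc))
```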

All configuration is stored in SSM Parameter Store, so no secrets or tunables are hardcoded in Lambda code.
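A Lambda can read those values at the start of each invocation. The sketch below assumes the parameters live under a hypothetical /iam-cleanup path, with one parameter per variable. It uses the SSM GetParametersByPath call and follows its NextToken pagination. The dataclass and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CleanupConfig:
    inactivity_days: int
    retention_days: int
    notification_days_before: int
    dry_run: bool


def load_config(ssm: Any, path: str = "/iam-cleanup") -> CleanupConfig:
    """Read all parameters under `path` and map them onto CleanupConfig.

    `ssm` is expected to be a boto3 SSM client; the parameter path and
    names are hypothetical and must match whatever Terraform writes.
    """
    values: dict[str, str] = {}
    kwargs: dict[str, Any] = {"Path": path}
    while True:
        page = ssm.get_parameters_by_path(**kwargs)
        for param in page["Parameters"]:
            # "/iam-cleanup/retention_days" -> "retention_days"
            values[param["Name"].rsplit("/", 1)[-1]] = param["Value"]
        if not page.get("NextToken"):
            break
        kwargs["NextToken"] = page["NextToken"]
    return CleanupConfig(
        inactivity_days=int(values["inactivity_days"]),
        retention_days=int(values["retention_days"]),
        notification_days_before=int(values["notification_days_before"]),
        dry_run=values["dry_run"].strip().lower() == "true",
    )
```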


The Full Cleanup Lifecycle

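Here is what happens to an unused role from detection to deletion, using a 90-day inactivity window, a 30-day retention period, and a 7-day warning lead time as example values: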

Day 0     : role / policy unused for 90 days
            → tags applied: iam-cleanup:status=unused
                            iam-cleanup:first-detected=<date>

Day 0-30  : tags are present, resource is preserved
            → if the resource is used again, the tags are automatically removed

Day 23    : 7 days before scheduled deletion
            → warning email sent via SNS

Day 30    : tags have been present for 30 days
            → confirmation email sent + resource deleted (unless DRY_RUN=true)

This grace period is a deliberate design choice: hard-deleting resources immediately upon detection would be dangerous. The 30-day window gives teams time to react, reattach still-needed roles to a schedule, or simply exercise the resource to signal it is still in use, all without manual intervention.
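The date arithmetic behind this timeline is small enough to show in full. This sketch assumes the Cleaner compares today's date against first-detected + retention_days, and that the warning window opens notification_days_before days earlier. The Action enum and decide() are hypothetical names.

```python
from datetime import date, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"      # still inside the grace period, nothing to do
    WARN = "warn"      # inside the warning window: send the SNS warning
    DELETE = "delete"  # retention elapsed: notify and delete (or simulate)


def decide(first_detected: date, today: date,
           retention_days: int, notification_days_before: int) -> Action:
    """Map a tagged resource's age onto the cleanup lifecycle."""
    deletion_date = first_detected + timedelta(days=retention_days)
    warning_date = deletion_date - timedelta(days=notification_days_before)
    if today >= deletion_date:
        return Action.DELETE
    if today >= warning_date:
        return Action.WARN
    return Action.KEEP
```

In this sketch the warning is a window rather than a single day, because the Cleaner runs weekly. A 7-day lead time guarantees that exactly one weekly run falls inside the window. A shorter lead time, such as 2 days, could be skipped entirely unless the schedule runs more often.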


What Gets Tagged (and Deleted)

The Scanner detects six categories of IAM roles:

Role type       Detection heuristic
service         Prefix AWSServiceRoleFor*, or a service principal in the trust policy
admin           AdministratorAccess policy attached, or a *:* inline policy
organization    Trust policy references o-* or OrganizationAccountAccessRole
cross_account   Trust policy references an external account ID
federated       Trust policy references a SAML provider, OIDC provider, or Web Identity
custom          Everything else
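The heuristics in the table can be approximated as an ordered set of checks. This is a sketch, not the Scanner's actual logic. It assumes the checks run in table order with the first match winning. It also assumes the caller has already fetched the decoded trust policy, the attached policy ARNs, and the inline policy documents. All helper names are hypothetical.

```python
import json
import re
from typing import Any, Mapping

ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"
ORG_ID = re.compile(r"\bo-[a-z0-9]{10,32}\b")  # AWS Organizations ID format
ACCOUNT_ID = re.compile(r"\b(\d{12})\b")


def _as_list(value: Any) -> list:
    return value if isinstance(value, list) else [value]


def _principals(trust: Mapping[str, Any]) -> list[tuple[str, str]]:
    """Flatten every Principal block of a trust policy into (type, value) pairs."""
    pairs = []
    for stmt in _as_list(trust.get("Statement", [])):
        principal = stmt.get("Principal", {})
        if principal == "*":
            pairs.append(("AWS", "*"))
            continue
        for ptype, values in principal.items():
            pairs.extend((ptype, v) for v in _as_list(values))
    return pairs


def _grants_everything(doc: Mapping[str, Any]) -> bool:
    """True if any Allow statement grants Action "*" on Resource "*"."""
    return any(
        stmt.get("Effect") == "Allow"
        and "*" in _as_list(stmt.get("Action", []))
        and "*" in _as_list(stmt.get("Resource", []))
        for stmt in _as_list(doc.get("Statement", []))
    )


def classify_role(role_name: str, trust: Mapping[str, Any],
                  attached_arns: list[str], inline_docs: list[Mapping[str, Any]],
                  own_account_id: str) -> str:
    principals = _principals(trust)
    if role_name.startswith("AWSServiceRoleFor") or any(t == "Service" for t, _ in principals):
        return "service"
    if ADMIN_POLICY_ARN in attached_arns or any(_grants_everything(d) for d in inline_docs):
        return "admin"
    trust_text = json.dumps(trust)
    if ORG_ID.search(trust_text) or "OrganizationAccountAccessRole" in trust_text:
        return "organization"
    # Any AWS principal from an account other than our own makes it cross-account.
    accounts = {m for t, v in principals if t == "AWS" for m in ACCOUNT_ID.findall(v)}
    if accounts - {own_account_id}:
        return "cross_account"
    if any(t == "Federated" for t, _ in principals):
        return "federated"
    return "custom"
```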

The Scanner also supports permission-level analysis: for each role, it inspects which individual IAM actions within attached policies have never been used (via IAM Access Advisor data), and can flag or prune those unused actions independently of the role itself.
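Action-level data comes from the IAM Access Advisor job APIs: GenerateServiceLastAccessedDetails with Granularity=ACTION_LEVEL, followed by GetServiceLastAccessedDetails. Below is a minimal sketch of that flow. It assumes a boto3 IAM client is passed in and that a tracked action with no LastAccessedTime counts as never used. It also assumes ActionName is returned without its service prefix. IAM only reports action-level activity for the services and actions it tracks, so actions absent from the report cannot be judged this way.

```python
import time
from typing import Any


def fetch_action_report(iam: Any, role_arn: str, poll_seconds: float = 2.0) -> dict:
    """Run an ACTION_LEVEL Access Advisor job for one role and wait for it."""
    job_id = iam.generate_service_last_accessed_details(
        Arn=role_arn, Granularity="ACTION_LEVEL")["JobId"]
    while True:
        report = iam.get_service_last_accessed_details(JobId=job_id)
        if report["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if report["JobStatus"] != "COMPLETED":
        raise RuntimeError(f"Access Advisor job {job_id} ended as {report['JobStatus']}")
    # Pagination (IsTruncated / Marker) is omitted to keep the sketch short.
    return report


def never_used_actions(report: dict) -> set[str]:
    """Collect tracked actions with no LastAccessedTime, as "service:Action"."""
    unused = set()
    for service in report.get("ServicesLastAccessed", []):
        namespace = service["ServiceNamespace"]
        for action in service.get("TrackedActionsLastAccessed", []):
            if action.get("LastAccessedTime") is None:
                unused.add(f"{namespace}:{action['ActionName']}")
    return unused
```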


Tags Applied to Resources

Every flagged resource receives exactly two tags:

Tag                          Example value   Purpose
iam-cleanup:status           unused          Marks the resource as a deletion candidate
iam-cleanup:first-detected   2026-03-10      Anchor date for the retention window countdown

The Cleaner derives the scheduled deletion date at runtime by adding retention_days to first-detected. Keeping the tag surface minimal means fewer tag keys to manage. Because the deletion date is never stored, a change to retention_days takes effect on the next run without re-tagging anything.
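Writing and reading those two tags might look like the following. The helper names are hypothetical. The tag list uses the Key/Value shape that IAM's TagRole and TagPolicy calls accept, and the deletion date is derived rather than stored.

```python
from datetime import date, timedelta
from typing import Optional

STATUS_KEY = "iam-cleanup:status"
DETECTED_KEY = "iam-cleanup:first-detected"


def cleanup_tags(detected_on: date) -> list[dict[str, str]]:
    """Tag list in the Key/Value shape IAM TagRole / TagPolicy expect."""
    return [
        {"Key": STATUS_KEY, "Value": "unused"},
        {"Key": DETECTED_KEY, "Value": detected_on.isoformat()},
    ]


def scheduled_deletion(tags: list[dict[str, str]], retention_days: int) -> Optional[date]:
    """Derive the deletion date from first-detected; None if not flagged."""
    values = {t["Key"]: t["Value"] for t in tags}
    if values.get(STATUS_KEY) != "unused" or DETECTED_KEY not in values:
        return None
    return date.fromisoformat(values[DETECTED_KEY]) + timedelta(days=retention_days)
```

A Scanner would apply the tags with iam.tag_role(RoleName=name, Tags=cleanup_tags(today)). When a flagged role is used again, it would clear them with iam.untag_role(RoleName=name, TagKeys=[STATUS_KEY, DETECTED_KEY]).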


Configuration

All tunable parameters are defined in terraform/variables.tf and injected into Lambda via SSM at deploy time:

Variable                    Default   Description
inactivity_days             20        Days of inactivity before tagging
retention_days              10        Days after tagging before deletion
notification_days_before    2         Warning email lead time (days) before deletion
dry_run                     false     Analyze and notify without actually deleting
notification_email          (none)    SNS subscription address for alerts
role_types                  all       Which role categories to include in the scan
scan_policies               true      Also scan customer-managed IAM policies
scan_permissions            true      Analyze unused individual actions within policies (ACTION_LEVEL)
delete_unused_permissions   true      Prune unused actions from policies (creates a new policy version)
excluded_roles              []        List of role names to skip entirely
excluded_policy_arns        []        List of policy ARNs to skip entirely
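For delete_unused_permissions, the core step is rewriting a policy document without the unused actions before publishing it as a new version. This is a sketch under stated assumptions. Only exact action names are removed, so wildcards such as s3:* are kept. Deny and NotAction statements are left untouched. Comparison is case-insensitive because IAM action names are. The function name is hypothetical.

```python
import copy
from typing import Any


def prune_actions(document: dict[str, Any], unused: set[str]) -> dict[str, Any]:
    """Return a copy of `document` without the listed unused actions."""
    unused_lower = {a.lower() for a in unused}
    pruned = copy.deepcopy(document)  # never mutate the caller's document
    statements = pruned["Statement"]
    if isinstance(statements, dict):
        statements = [statements]
    kept = []
    for stmt in statements:
        actions = stmt.get("Action")
        if stmt.get("Effect") != "Allow" or actions is None:
            kept.append(stmt)  # Deny and NotAction statements are kept as-is
            continue
        actions = [actions] if isinstance(actions, str) else actions
        remaining = [a for a in actions if a.lower() not in unused_lower]
        if remaining:  # drop the statement if every action was pruned
            stmt["Action"] = remaining
            kept.append(stmt)
    pruned["Statement"] = kept
    return pruned
```

The result would then be published with iam.create_policy_version(PolicyArn=arn, PolicyDocument=json.dumps(pruned), SetAsDefault=True). A managed policy holds at most five versions, so once that limit is reached the oldest non-default version has to be removed first with delete_policy_version. A document whose Statement list ends up empty should be skipped rather than published.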

Note that dry_run defaults to false, so the tool deletes by default. To observe its behaviour before anything is removed, set dry_run = true before deploying.
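Deleting a role through the API is more involved than in the Console. IAM rejects DeleteRole while the role still has inline policies, attached managed policies, or instance profile memberships. The sketch below uses a hypothetical function name, assumes a boto3 IAM client, and omits pagination. It builds the full plan first and executes it only when dry_run is false, so a dry run reports exactly what a real run would do.

```python
import logging
from typing import Any

log = logging.getLogger("iam-cleanup")


def delete_role(iam: Any, role_name: str, dry_run: bool) -> list[str]:
    """Delete a role and its attachments; in dry-run mode only log the plan.

    Returns the planned API calls, e.g. for inclusion in the SNS summary.
    Pagination (Marker / IsTruncated) is omitted for brevity.
    """
    steps: list[tuple[str, dict[str, str]]] = []
    for profile in iam.list_instance_profiles_for_role(RoleName=role_name)["InstanceProfiles"]:
        steps.append(("remove_role_from_instance_profile",
                      {"InstanceProfileName": profile["InstanceProfileName"], "RoleName": role_name}))
    for policy in iam.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]:
        steps.append(("detach_role_policy", {"RoleName": role_name, "PolicyArn": policy["PolicyArn"]}))
    for name in iam.list_role_policies(RoleName=role_name)["PolicyNames"]:
        steps.append(("delete_role_policy", {"RoleName": role_name, "PolicyName": name}))
    steps.append(("delete_role", {"RoleName": role_name}))

    for method, kwargs in steps:
        if dry_run:
            log.info("[DRY RUN] would call %s(%s)", method, kwargs)
        else:
            getattr(iam, method)(**kwargs)
    return [method for method, _ in steps]
```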

Built-in Exclusions — AWS Service-Linked Roles

The Scanner automatically excludes AWS-managed roles that should never be touched, regardless of your configuration:

Prefix / pattern                  Description
AWSServiceRoleFor*                Service-linked roles managed by AWS
Path /aws-service-role/*          Any role under the service-linked path
OrganizationAccountAccessRole     Default AWS Organizations role
AWSControlTower*                  Roles managed by AWS Control Tower
aws-reserved* / AWSReserved*      AWS reserved roles
AWSReservedSSO_*                  AWS IAM Identity Center (SSO) roles

These roles are created and managed by AWS itself. Deleting them could break the services that depend on them, and many service-linked roles are recreated automatically the next time the service needs them. The tool also excludes its own Lambda execution role automatically, so it can never flag itself for deletion.
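The built-in list maps naturally onto shell-style patterns. Below is a minimal sketch using Python's fnmatchcase, which matches case-sensitively so that aws-reserved* and AWSReserved* remain distinct entries. It also folds in the excluded_roles variable and the tool's own execution role. The article does not say how the Lambda learns its own role name, so own_role_name is assumed to be passed in. The function and constant names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable

BUILTIN_NAME_PATTERNS = (
    "AWSServiceRoleFor*",
    "OrganizationAccountAccessRole",
    "AWSControlTower*",
    "aws-reserved*",
    "AWSReserved*",  # also covers AWSReservedSSO_*
)


def is_excluded(role_name: str, role_path: str, own_role_name: str,
                excluded_roles: Iterable[str] = ()) -> bool:
    """True if the Scanner must never tag or delete this role."""
    if role_path.startswith("/aws-service-role/"):
        return True
    if role_name == own_role_name or role_name in set(excluded_roles):
        return True
    return any(fnmatchcase(role_name, p) for p in BUILTIN_NAME_PATTERNS)
```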

Manual Exclusions

Beyond the built-in list, some of your own roles should never be touched: your CI/CD pipeline, break-glass admin roles, or anything flagged as business-critical. Add them to the exclusion lists and the Scanner will skip them entirely:

# terraform/terraform.tfvars
excluded_roles = [
  "my-cicd-deployment-role",
  "break-glass-admin",
]

excluded_policy_arns = [
  "arn:aws:iam::123456789012:policy/SharedServicePolicy",
]

Deploying It

Prerequisites: Terraform >= 1.5, AWS CLI configured (aws configure), Python 3.11+, and optionally an S3 bucket for the Terraform backend.

cd terraform

# 1. Initialize Terraform
terraform init

# 2. Review the changes
terraform plan

# 3. Deploy
terraform apply

If you deployed with dry_run = true, review the notification emails first. Then set dry_run = false in terraform/terraform.tfvars and re-run terraform apply to allow actual deletion.

To tear down the infrastructure:

terraform destroy

Terraform provisions the two Lambda functions, their IAM execution roles (with least-privilege policies), EventBridge schedules, the SNS topic, and the SSM parameters — nothing manual to configure once variables are set.


Design Decisions Worth Highlighting

Why Terraform and not CDK or SAM? Terraform is provider-agnostic and widely adopted in enterprise environments. The goal of this project is to be deployable into real accounts with minimal friction.

Why SSM for configuration? Hardcoding inactivity thresholds or account-specific values into Lambda code is a maintenance trap. SSM Parameters are versioned, auditable, and can be updated without redeploying the function.

Why a two-Lambda architecture? Separating scanning from deletion respects the principle of least privilege at the compute level. The Scanner only needs read and tag access; the Cleaner needs delete access. They run on different schedules and can be disabled independently.

Why tags instead of a separate database? Tags live on the resource itself. They are visible in the Console, queryable via AWS Config and Resource Groups, and they survive Lambda failures or redeployments without any state synchronization problem.


What This Does Not Cover

A few honest limitations:

  • IAM users are not in scope (access key rotation is a different problem with different tooling).
  • AWS-managed policies are excluded from deletion — the tool only touches customer-managed policies.
  • Resource-based policies (S3 bucket policies, SQS queue policies, Lambda function policies, etc.) are not scanned for now. The tool focuses on identity-based IAM roles and customer-managed policies only.
  • Roles used by running EC2 instances or ECS tasks will still be flagged if IAM Access Advisor records no recent calls. Always verify before disabling dry_run in production.
  • Cross-account consumers may not appear in Access Advisor data from the role's home account. Cross-account roles should be reviewed with extra care.

Closing Thoughts

Credential hygiene is not glamorous work, but it is foundational. Most real-world IAM compromises do not exploit zero-days — they exploit credentials that should have been revoked months ago but weren't, because no process existed to catch them.

IAM Cleanup is that process, made automatic.

The full source code, Terraform modules, and Lambda handlers are available on GitHub. Contributions and feedback are welcome.

