CI/CD pipelines, infrastructure as code, monitoring, and operational practices.
Scope
CI/CD
- Build automation
- Test automation
- Deployment pipelines
- Release management
- Feature flags
Infrastructure as Code
- AWS CDK / CloudFormation
- Terraform
- Environment management
- Configuration management
Observability
- Logging
- Metrics
- Tracing
- Alerting
- Dashboards
Operations
- Incident management
- On-call procedures
- Change management
- Capacity planning
Research Topics
Architecture Considerations
CI/CD Pipeline
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│  Code   │───►│  Build  │───►│  Test   │───►│ Deploy  │
│  Push   │    │ + Lint  │    │  Suite  │    │  Stage  │
└─────────┘    └─────────┘    └─────────┘    └────┬────┘
                                                  │
                                         ┌────────▼─────────┐
                                         │ Integration Test │
                                         └────────┬─────────┘
                                                  │
                                         ┌────────▼─────────┐
                                         │     Approval     │
                                         └────────┬─────────┘
                                                  │
                                         ┌────────▼─────────┐
                                         │   Deploy Prod    │
                                         └──────────────────┘
GitHub Actions Workflow
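name: Deploy

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run lint
      - run: npm run test
      - run: npm run build

  # Every job needs `runs-on` and its own checkout: each job starts on a fresh runner.
  # AWS credentials must also be made available to `cdk deploy` (not shown here).
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    environment: staging
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx cdk deploy --require-approval never

  integration-test:
    needs: deploy-staging
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npm run test:integration

  # The approval step can be enforced by configuring required reviewers
  # on the `production` environment.
  deploy-prod:
    needs: integration-test
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx cdk deploy --require-approval never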
Infrastructure as Code (CDK)
// Example CDK stack (imports assume aws-cdk-lib v2)
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as lambda from 'aws-cdk-lib/aws-lambda';

export class BookingStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // DynamoDB table with a composite primary key and on-demand billing
    const bookingsTable = new dynamodb.Table(this, 'Bookings', {
      partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      pointInTimeRecovery: true,
    });

    // Lambda function; code is loaded from the local `lambda` directory
    const bookingHandler = new lambda.Function(this, 'BookingHandler', {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('lambda'),
      environment: {
        TABLE_NAME: bookingsTable.tableName,
      },
    });

    // Grant the handler read/write access to the table
    bookingsTable.grantReadWriteData(bookingHandler);
  }
}
Observability
Logging Strategy
Log Levels:
├── ERROR: System errors, failures
├── WARN: Degraded performance, retries
├── INFO: Business events, transactions
└── DEBUG: Detailed troubleshooting (dev only)
Structured Logging:
{
  "timestamp": "2024-06-15T10:30:00Z",
  "level": "INFO",
  "service": "booking-api",
  "traceId": "abc123",
  "event": "booking.created",
  "bookingRef": "XYZ789",
  "duration": 245
}
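A minimal sketch of a logger that emits entries in the structured shape above and filters by level. The names `createLogger`, `LogEntry`, and `LEVEL_ORDER` are hypothetical, and the injectable `sink` and `now` parameters are assumptions made so the sketch is easy to test; a real service would more likely use an existing logging library.

```typescript
// Hypothetical helper names; not part of any existing library.
type LogLevel = 'ERROR' | 'WARN' | 'INFO' | 'DEBUG';

// Lower number = more severe; used to drop levels below the configured minimum.
const LEVEL_ORDER: Record<LogLevel, number> = { ERROR: 0, WARN: 1, INFO: 2, DEBUG: 3 };

interface LogEntry {
  timestamp: string;
  level: LogLevel;
  service: string;
  traceId: string;
  event: string;
  [field: string]: unknown;
}

function createLogger(
  service: string,
  minLevel: LogLevel,
  sink: (line: string) => void = (line) => console.log(line),
  now: () => Date = () => new Date(),
) {
  return function log(
    level: LogLevel,
    event: string,
    traceId: string,
    fields: Record<string, unknown> = {},
  ): LogEntry | undefined {
    // Skip entries that are less severe than the minimum (e.g. DEBUG outside dev).
    if (LEVEL_ORDER[level] > LEVEL_ORDER[minLevel]) return undefined;
    // Core fields are written last so caller-supplied fields cannot overwrite them.
    const entry: LogEntry = {
      ...fields,
      timestamp: now().toISOString(),
      level,
      service,
      traceId,
      event,
    };
    sink(JSON.stringify(entry)); // one JSON object per line
    return entry;
  };
}
```

Usage might look like `createLogger('booking-api', 'INFO')('INFO', 'booking.created', 'abc123', { bookingRef: 'XYZ789', duration: 245 })`.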
Metrics
Key Metrics:
├── Business
│   ├── bookings_created_total
│   ├── revenue_total
│   └── conversion_rate
├── Technical
│   ├── request_duration_seconds
│   ├── error_rate
│   └── concurrent_users
└── Infrastructure
    ├── lambda_invocations
    ├── dynamodb_consumed_capacity
    └── api_gateway_latency
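One way a Lambda could publish a metric such as `bookings_created_total` is to log it in CloudWatch Embedded Metric Format. The sketch below builds a single EMF log line; the function name `emfLine`, the `service` dimension, and the namespace used in the usage note are assumptions for illustration, not values from this document.

```typescript
// Subset of CloudWatch metric units, enough for the metrics listed above.
type MetricUnit = 'Count' | 'Milliseconds' | 'Seconds' | 'Percent';

// Hypothetical helper: returns one EMF-formatted JSON line for a single metric.
function emfLine(
  namespace: string,
  service: string,
  name: string,
  value: number,
  unit: MetricUnit,
  timestampMs: number,
): string {
  const payload: Record<string, unknown> = {
    _aws: {
      Timestamp: timestampMs, // milliseconds since epoch
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [['service']], // dimension values are read from top-level keys
          Metrics: [{ Name: name, Unit: unit }],
        },
      ],
    },
    service,
    [name]: value, // the metric value lives at the top level under its own name
  };
  return JSON.stringify(payload);
}
```

For example, `console.log(emfLine('Booking', 'booking-api', 'bookings_created_total', 1, 'Count', Date.now()))` would write one data point, assuming the function's logs are delivered to CloudWatch Logs.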
Distributed Tracing
X-Ray Trace:
Request → API Gateway → Lambda → DynamoDB
                          │
                          └── Lambda → Payment Gateway
                                │
                                └── Lambda → Notification
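To illustrate how spans in a trace relate to each other, here is a small in-memory sketch in which every span shares one trace ID and points at its parent. `createTracer`, `startSpan`, and `pathOf` are hypothetical names, and the counter-based span IDs are a simplification; this is not the X-Ray SDK and does not use X-Ray's ID format.

```typescript
interface Span {
  traceId: string;
  spanId: string;
  parentId: string | undefined; // undefined for the root span
  name: string;
}

// Hypothetical in-memory tracer, for illustration only.
function createTracer(traceId: string) {
  let counter = 0;
  const spans: Span[] = [];

  // Creates a span that shares the trace ID and links to its parent, if any.
  function startSpan(name: string, parent?: Span): Span {
    counter += 1;
    const span: Span = { traceId, spanId: `span-${counter}`, parentId: parent?.spanId, name };
    spans.push(span);
    return span;
  }

  // Walks parent links upward to render the call path from the root.
  function pathOf(span: Span): string {
    const names: string[] = [];
    let current: Span | undefined = span;
    while (current) {
      names.unshift(current.name);
      const parentId: string | undefined = current.parentId;
      current = parentId === undefined ? undefined : spans.find((s) => s.spanId === parentId);
    }
    return names.join(' → ');
  }

  return { startSpan, pathOf, spans };
}
```

With a gateway span as the root, a Lambda span as its child, and a DynamoDB span as the Lambda's child, `pathOf` on the last span yields the first line of the diagram above without the leading "Request".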
Deployment Strategies
Blue/Green
Production (Blue)  ←── Traffic
                          │
Staging (Green)           │ Switch
                          │
                          ▼
Production (Green) ←── Traffic (after verification)
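The switch above can be sketched as a function that moves traffic to the idle environment only when that environment passes verification. `Router`, `switchTraffic`, and the `verify` callback are hypothetical; in practice the switch would be a load balancer, DNS, or alias update.

```typescript
type Color = 'blue' | 'green';

interface Router {
  live: Color; // environment currently receiving production traffic
}

// Hypothetical helper: returns the new router state, or the old one if verification fails.
function switchTraffic(router: Router, verify: (candidate: Color) => boolean): Router {
  const candidate: Color = router.live === 'blue' ? 'green' : 'blue';
  // Traffic only moves after the idle environment has been verified.
  return verify(candidate) ? { live: candidate } : router;
}
```

Because the previous environment is left untouched, rolling back amounts to calling `switchTraffic` again.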
Canary
Version 1 ←── 95% Traffic
Version 2 ←── 5% Traffic → Monitor → Increase gradually
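A rough sketch of the two parts of a canary rollout: assigning a stable share of users to the new version, and deciding the next traffic share after monitoring. The hash is an FNV-1a style function over UTF-16 code units, and the step schedule `[5, 25, 50, 100]` and all function names are assumptions, not values from this document.

```typescript
// Deterministic bucket in 0..99 so a given user always sees the same version.
function bucketOf(userId: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    hash ^= userId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash % 100;
}

// Routes a user to version 2 when their bucket falls inside the canary share.
function routeVersion(userId: string, canaryPercent: number): 'v1' | 'v2' {
  return bucketOf(userId) < canaryPercent ? 'v2' : 'v1';
}

// After a monitoring window: roll back to 0% on a bad error rate, otherwise step up.
function nextCanaryPercent(
  current: number,
  canaryErrorRate: number,
  maxErrorRate: number,
  steps: number[] = [5, 25, 50, 100], // hypothetical schedule
): number {
  if (canaryErrorRate > maxErrorRate) return 0;
  const next = steps.find((s) => s > current);
  return next ?? 100;
}
```

Hashing the user ID rather than picking randomly per request keeps each user's experience consistent while the share grows.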
Feature Flags
// LaunchDarkly / AWS AppConfig
const showNewCheckout = await featureFlags.variation(
  'new-checkout-flow',
  { userId: user.id, tier: user.loyaltyTier },
  false // default
);

if (showNewCheckout) {
  // New flow
} else {
  // Old flow
}
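For local development or tests, the `featureFlags` object used above could be backed by something like the in-memory client below. This is a sketch under assumptions: `InMemoryFeatureFlags`, `setRule`, and `evaluate` are hypothetical names, and only the `variation(key, context, default)` call shape is taken from the snippet above; it is not the LaunchDarkly or AppConfig API.

```typescript
interface FlagContext {
  userId: string;
  tier?: string;
}

type FlagRule = (context: FlagContext) => boolean;

// Hypothetical stand-in for a real flag provider.
class InMemoryFeatureFlags {
  private rules = new Map<string, FlagRule>();

  setRule(key: string, rule: FlagRule): void {
    this.rules.set(key, rule);
  }

  // Synchronous evaluation: unknown flags and failing rules fall back to the default.
  evaluate(key: string, context: FlagContext, defaultValue: boolean): boolean {
    const rule = this.rules.get(key);
    if (!rule) return defaultValue;
    try {
      return rule(context);
    } catch {
      return defaultValue;
    }
  }

  // Async wrapper matching the awaited call shape used in application code.
  async variation(key: string, context: FlagContext, defaultValue: boolean): Promise<boolean> {
    return this.evaluate(key, context, defaultValue);
  }
}
```

Falling back to the default on any failure means a flag outage degrades to the old flow rather than breaking checkout.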
Incident Management
Severity Levels
| Level | Description | Response Time | Examples |
|---|---|---|---|
| P1 | Critical | 15 min | Booking down, payment failures |
| P2 | High | 1 hour | Degraded performance |
| P3 | Medium | 4 hours | Non-critical feature issue |
| P4 | Low | 24 hours | Minor bug |
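The response times in the table can be encoded so that tooling can compute an acknowledgement deadline per incident. This is a sketch; `RESPONSE_MINUTES`, `responseDeadline`, and `isResponseBreached` are hypothetical names.

```typescript
type Severity = 'P1' | 'P2' | 'P3' | 'P4';

// Response times from the severity table, in minutes.
const RESPONSE_MINUTES: Record<Severity, number> = { P1: 15, P2: 60, P3: 240, P4: 1440 };

// Latest time by which someone should have responded.
function responseDeadline(severity: Severity, detectedAt: Date): Date {
  return new Date(detectedAt.getTime() + RESPONSE_MINUTES[severity] * 60 * 1000);
}

// True when the incident was acknowledged late, or is still unacknowledged past the deadline.
function isResponseBreached(
  severity: Severity,
  detectedAt: Date,
  acknowledgedAt: Date | undefined,
  now: Date,
): boolean {
  const deadline = responseDeadline(severity, detectedAt).getTime();
  const reference = acknowledgedAt === undefined ? now.getTime() : acknowledgedAt.getTime();
  return reference > deadline;
}
```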
Incident Process
Detection → Alert → Acknowledge → Investigate → Mitigate → Resolve → Postmortem
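If incidents are tracked in code, the process above can be modelled as a strictly ordered sequence of stages. The sketch below assumes no stage may be skipped, which is one possible reading of the flow; `nextStage` and `advance` are hypothetical names.

```typescript
const INCIDENT_STAGES = [
  'Detection',
  'Alert',
  'Acknowledge',
  'Investigate',
  'Mitigate',
  'Resolve',
  'Postmortem',
] as const;

type IncidentStage = (typeof INCIDENT_STAGES)[number];

// Returns the stage that follows `current`, or undefined after Postmortem.
function nextStage(current: IncidentStage): IncidentStage | undefined {
  const index = INCIDENT_STAGES.indexOf(current);
  return INCIDENT_STAGES[index + 1];
}

// Moves to `target` only if it is the immediate successor of `current`.
function advance(current: IncidentStage, target: IncidentStage): IncidentStage {
  if (nextStage(current) !== target) {
    throw new Error(`Cannot move from ${current} to ${target}`);
  }
  return target;
}
```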
Runbooks
Runbook: High Error Rate
1. Check error logs in CloudWatch
2. Identify affected service
3. Check recent deployments
4. Rollback if deployment-related
5. Scale up if capacity-related
6. Engage on-call engineer
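Steps 3 to 6 of the runbook amount to a small decision procedure, sketched below. This is one reading of the runbook, in which the on-call engineer is engaged when neither a recent deployment nor capacity explains the errors; the signal names and the thresholds (a 30 minute deployment window, 80% utilization) are assumptions, not values from this document.

```typescript
interface ErrorRateSignals {
  deployedMinutesAgo: number | undefined; // undefined = no recent deployment known
  utilization: number; // fraction of capacity in use, 0..1
}

type RunbookAction = 'rollback' | 'scale-up' | 'engage-on-call';

// Hypothetical triage helper mirroring runbook steps 3 to 6.
function triageHighErrorRate(
  signals: ErrorRateSignals,
  deployWindowMinutes = 30, // assumed threshold
  utilizationLimit = 0.8, // assumed threshold
): RunbookAction {
  // Steps 3 and 4: a recent deployment is the first suspect.
  if (signals.deployedMinutesAgo !== undefined && signals.deployedMinutesAgo <= deployWindowMinutes) {
    return 'rollback';
  }
  // Step 5: otherwise check whether the service is running out of capacity.
  if (signals.utilization >= utilizationLimit) {
    return 'scale-up';
  }
  // Step 6: no automatic remedy applies, so hand over to a person.
  return 'engage-on-call';
}
```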
Environments
| Environment | Purpose | Data |
|---|---|---|
| Development | Local dev | Mock |
| Integration | Service testing | Synthetic |
| Staging | Pre-prod validation | Anonymized prod |
| Production | Live system | Real |
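The table can also be expressed as typed configuration, so that a seeding or import script can refuse to load the wrong kind of data into an environment. This is a sketch; the identifiers `ENVIRONMENTS` and `assertDataAllowed` and the lowercase key names are hypothetical.

```typescript
type EnvironmentName = 'development' | 'integration' | 'staging' | 'production';
type DataKind = 'mock' | 'synthetic' | 'anonymized-prod' | 'real';

interface EnvironmentSpec {
  purpose: string;
  data: DataKind;
}

// Mirrors the environments table row by row.
const ENVIRONMENTS: Record<EnvironmentName, EnvironmentSpec> = {
  development: { purpose: 'Local dev', data: 'mock' },
  integration: { purpose: 'Service testing', data: 'synthetic' },
  staging: { purpose: 'Pre-prod validation', data: 'anonymized-prod' },
  production: { purpose: 'Live system', data: 'real' },
};

// Throws when a data set does not match what the environment is meant to hold.
function assertDataAllowed(env: EnvironmentName, data: DataKind): void {
  const expected = ENVIRONMENTS[env].data;
  if (data !== expected) {
    throw new Error(`${env} expects ${expected} data, got ${data}`);
  }
}
```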
CI/CD
| Tool | Use |
|---|---|
| GitHub Actions | Primary CI/CD |
| AWS CodePipeline | AWS deployments |
| ArgoCD | Kubernetes (if used) |
Monitoring
| Tool | Use |
|---|---|
| CloudWatch | AWS native |
| DataDog | APM, logs, metrics |
| PagerDuty | Alerting, on-call |
IaC
| Tool | Use |
|---|---|
| AWS CDK | AWS infrastructure |
| Terraform | Multi-cloud option |