Understanding AWS Lambda Concurrency: Cold Starts, Warm Starts, and Container Reuse
When discussing AWS Lambda, a common point of confusion revolves around how concurrency is managed and what exactly happens when your function scales. Let's clarify the mechanics of Lambda concurrency, focusing on container reuse, new container creation, and the concept of "cold starts" versus "warm starts."
Lambda Doesn't "Create New Lambdas" for Concurrency
It's a common misconception that each concurrent execution of a Lambda function results in the creation of a "new Lambda." This isn't accurate. Instead, Lambda manages containers – isolated runtime environments where your function code executes.
What Actually Happens:
1. Container Reuse (Warm Starts)
AWS Lambda prioritizes reusing existing containers whenever possible. If a container has just finished processing a request, it remains "warm" for a period (AWS doesn't publish a guaranteed duration, but idle containers are commonly observed to survive for several minutes, sometimes up to around 15). When a new request arrives for the same function and a warm container is available, Lambda can immediately route the request to that container.
- This is known as a "warm start."
- Warm starts are significantly faster because they bypass the overhead of setting up a new execution environment.
- There's no initialization overhead for your function code in a warm start, as the environment and code are already loaded.
2. New Container Creation (Cold Starts)
Lambda only creates a new container when:
- No warm containers are available to handle an incoming request.
- The number of concurrent requests exceeds the number of currently warm containers.
- An existing container sat idle for too long and was terminated by the Lambda service.
When a new container is created, this leads to a "cold start."
3. The Process Illustrated:
Consider a scenario with sequential and concurrent requests:
- Request 1 comes in: A new container is created (cold start). Your function code is downloaded, the runtime environment is initialized, and your code is loaded.
- Request 1 finishes: The container stays warm for a period (e.g., ~15 minutes), waiting for another request.
- Request 2 comes in (shortly after Request 1 finishes): If the container from Request 1 is still warm, it reuses that existing container (warm start).
BUT, if Request 2 comes in while Request 1 is still running:
- A new container is created for Request 2 (cold start). This is because the first container is busy, and Lambda needs to handle the new concurrent request.
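The reuse-or-create decision above can be sketched as a toy model. This is an illustration of the routing rule described in this section, not AWS's actual scheduler:

```javascript
// Toy model of Lambda's reuse-or-create decision (not the real scheduler):
// an incoming request takes an idle warm container if one exists,
// otherwise a new container is created (a cold start).
class ContainerPool {
  constructor() {
    this.idle = [];   // warm containers waiting for work
    this.created = 0; // total containers ever created
  }
  acquire() {
    if (this.idle.length > 0) {
      return { container: this.idle.pop(), coldStart: false }; // warm start
    }
    this.created += 1;
    return { container: { id: this.created }, coldStart: true }; // cold start
  }
  release(container) {
    this.idle.push(container); // finished container stays warm for reuse
  }
}

const pool = new ContainerPool();

// Sequential requests: the second reuses the first's container.
const r1 = pool.acquire();  // cold start
pool.release(r1.container);
const r2 = pool.acquire();  // warm start, same container as r1

// Overlapping request: r2's container is still busy, so a new one is created.
const r3 = pool.acquire();  // cold start
```

Running this through the scenario above, r1 and r3 are cold starts, r2 is a warm start, and only two containers ever exist.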
Key Points:
- Same Lambda function = Same deployment/code: Regardless of how many containers are running, they all execute the same version of your deployed Lambda function code.
- Concurrent executions = Multiple containers running that same code: The number of concurrent executions directly correlates to the number of active containers processing requests at that moment.
- Container ≠ New Lambda function: A container is a temporary runtime environment, not a new deployment or instance of your Lambda function definition.
- Containers are temporary: They are managed by AWS and are spun up and down as needed to meet demand.
Example:
If you have 1 Lambda function deployed and receive 50 concurrent requests, the result will be up to 50 containers running the same function code simultaneously. After the traffic subsides, most of these containers will be terminated, with perhaps 1-2 kept warm for future requests.
So, concurrent execution creates new containers/instances, not new Lambda functions.
Total Concurrent Executions == Total Lambda Containers?
Yes, exactly! The total number of concurrent executions at any given moment is equal to the total number of active Lambda containers. Each concurrent execution runs in its own container:
- 1 concurrent execution = 1 active container
- 50 concurrent executions = 50 active containers
- 0 concurrent executions = 0 active containers (all containers either terminated or idle)
Visual Example:
Lambda Function "ProcessOrder"
├── Container 1 (handling request A) ← 1 concurrent execution
├── Container 2 (handling request B) ← 1 concurrent execution
├── Container 3 (handling request C) ← 1 concurrent execution
└── Container 4 (warm, waiting) ← 0 concurrent executions
Total: 3 concurrent executions = 3 active containers
Important Notes:
- Active containers: These are containers currently processing requests.
- Warm containers: These are containers that are ready but not currently processing requests (contributing 0 to the concurrent execution count).
- When AWS reports "10 concurrent executions," that means 10 containers are actively running your code.
- AWS automatically manages the container lifecycle (creation/termination) to scale your function.
Your understanding is correct: the concurrent execution count directly equals the number of containers actively running at that moment.
Does Every New Container Incur Startup Overhead?
Yes, but how long that overhead lasts depends heavily on your code. Container creation involves two distinct phases:
Container Creation Process:
1. Container Startup (Always happens for a new container)
When a new container is provisioned, the following steps always occur:
- Your deployment package (code) is downloaded to the container.
- The runtime environment (e.g., Node.js, Python, Java) is initialized.
- Your function code is loaded into memory.
- The container becomes ready to receive requests.
This phase typically takes between roughly 100 ms and 1 second, depending on the size of your deployment package and the complexity of the runtime environment.
2. Your Function Initialization (Depends on your code)
After the container startup, your function's initialization code runs. This includes:
- Import statements or module requirements.
- Initialization of global variables.
- Establishing database connections or other external service clients (if defined outside the handler).
- Any other code that runs before your main handler function is invoked.
The time taken for this phase varies greatly based on the complexity and dependencies of your function's initialization code (from 0ms to 10 seconds or more).
Cold Start = Container Startup + Function Initialization
The total time for a cold start is the sum of the container startup time and your function's initialization time.
Examples:
Simple Function (Fast startup):
```javascript
exports.handler = async (event) => {
  return { message: "Hello" }; // Very fast, minimal initialization
};
```
Complex Function (Slower startup):
```javascript
const AWS = require('aws-sdk'); // Loading the SDK module takes time
const db = new AWS.DynamoDB();  // Client construction adds setup work
                                // (no network connection is opened yet)
exports.handler = async (event) => {
  // Handler execution
};
```
Key Points:
- Container startup: Always happens when a new container is created.
- Function initialization: Depends entirely on your code's complexity and external dependencies.
- Total cold start time: The sum of both phases.
- Warm start: Once a container exists and is reused, subsequent requests skip both the container startup and your function's initialization steps, leading to much faster execution.
So, while every new container creation involves startup overhead, the duration of that overhead (the cold start) is significantly influenced by your function's initialization code.