Configure instance concurrency to improve resource utilization - Function Compute

Instance concurrency specifies the maximum number of concurrent requests that each function instance can process at a time. You can use the instance concurrency feature of Function Compute to manage resource usage in an efficient manner during traffic peaks and mitigate the impact of cold starts. This helps improve performance and reduce costs.

Background information

Resource usage of Function Compute is billed based on the execution duration and specifications of function instances. Therefore, the longer the execution duration of a function instance, the higher the resource usage fees.

For example, suppose that three requests arrive at the same time and each request takes 10 seconds to process. The total execution duration depends on the concurrency setting:

If instance concurrency is set to 1, each instance can process only one request at a time. Function Compute creates three instances to process the three requests. The total execution duration is 30 seconds.
If instance concurrency is set to 10 and this concurrency does not exceed the queries per second (QPS) limit for an instance, each instance can process 10 requests at a time. Function Compute creates only one instance to process the three requests. The total execution duration is 10 seconds.
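The arithmetic in the two cases above can be sketched in a few lines of Node.js. This is a rough model only: the function name estimate and its parameters are illustrative, and it assumes that all requests arrive at the same moment and take the same time. Real scheduling and billing granularity are not modeled.

```javascript
// Rough model: all requests arrive together and take the same time.
// Returns how many instances are needed and the summed execution duration.
function estimate(requests, secondsPerRequest, instanceConcurrency) {
  // Each instance accepts up to instanceConcurrency requests at once.
  const instances = Math.ceil(requests / instanceConcurrency);
  // Requests on one instance overlap, so each instance is busy for the
  // duration of a single request.
  const totalSeconds = instances * secondsPerRequest;
  return { instances, totalSeconds };
}

console.log(estimate(3, 10, 1));  // { instances: 3, totalSeconds: 30 }
console.log(estimate(3, 10, 10)); // { instances: 1, totalSeconds: 10 }
```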

Note

If the concurrency of a function instance is set to 1, each instance can process only one request at a time. If you set instance concurrency to a value greater than 1, Function Compute creates a new instance only when the number of requests that are concurrently processed by existing instances exceeds the specified value.
Instance concurrency is usually configured together with the instance specifications to optimize function performance and reduce costs. You can use the function performance profiling feature to obtain the optimal instance specifications and concurrency.

The following figure shows the difference in request execution when instance concurrency is set to different values.

Benefits

Reduces the execution duration and costs.
For example, for functions that involve a large number of I/O operations, you can use an instance to concurrently process multiple requests. This reduces the number of instances that are used to process requests and the total execution duration of requests.
Shares state among requests.
Multiple requests can share the connection pool of a database in one instance to minimize the connections between requests and the database.
Reduces the frequency of cold starts.
One instance can process multiple requests, which reduces the number of new instances and the frequency of cold starts.
Reduces the number of IP addresses used in a virtual private cloud (VPC).
For a fixed number of requests to be processed, the number of required instances is reduced if each instance can process multiple requests. This reduces the number of IP addresses used in the VPC.
Important
Make sure that the vSwitch associated with your VPC has at least two available IP addresses. Otherwise, services may be unavailable, which causes request failures.
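The state-sharing benefit listed above can be illustrated with a minimal Node.js sketch. Everything here is hypothetical: createPool is a stand-in for a real database client, and poolCreations exists only to make the sharing visible. The point is that a module-level object is created once per instance and reused by every request that the instance processes.

```javascript
// Hypothetical stand-in for a real database connection pool (not a real API).
function createPool() {
  return { query: async (sql) => `result of ${sql}` };
}

let pool = null;       // module scope: lives as long as the instance
let poolCreations = 0; // counts how often the pool is built

// Creates the pool on first use and returns the same object afterwards.
function getPool() {
  if (pool === null) {
    pool = createPool();
    poolCreations += 1;
  }
  return pool;
}

// In a Function Compute module, this would be assigned to exports.handler.
const handler = (event, context, callback) => {
  getPool()
    .query('SELECT 1')
    .then(
      (rows) => callback(null, rows),
      (err) => callback(err)
    );
};
```

With instance concurrency set to 1, each concurrent request would run on its own instance and build its own pool. With a higher value, concurrent requests on one instance share a single pool.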

Scenarios

If it takes a long time for a function to obtain responses from downstream services, we recommend that you use a single instance to concurrently process multiple requests. In most cases, waiting for responses does not consume resources. Concurrent processing of multiple requests by one instance saves costs and improves the responsiveness and throughput of applications.

Limits

Item	Limit
Supported runtime environments	Custom runtimes, custom images
Valid value of instance concurrency	1 to 200
Function execution logs provided in the X-Fc-Log-Result response header	Not supported if instance concurrency is set to a value greater than 1

Procedure

You can specify the instance concurrency of a function when you create the function.

After you create the function, go to the function details page. On the Function Details tab, click the Configurations tab and then click Runtime in the left-side pane. In the Runtime section, click Modify. In the Runtime panel, you can modify the Instance Concurrency parameter.

Provisioned instances can also concurrently process multiple requests. For more information, see Configure provisioned instances.

Impacts

This section describes the differences between the scenario where an instance can process only one request at a time (Instance Concurrency = 1) and the scenario where an instance can process multiple requests at a time (Instance Concurrency > 1).

Billing

The execution duration and cost of a function instance vary based on the instance concurrency. For more information, see Billing overview.

Instance Concurrency = 1
An instance can process only one request at a time. The billing duration starts when the first request starts to be processed and ends when the last request is processed.
Instance Concurrency > 1
An instance can process multiple requests at a time. The actual running duration of the instance is used to measure the function execution duration. The billing duration starts when the first request starts to be processed and ends when the last request is processed.

Concurrency throttling

By default, Function Compute supports up to 100 on-demand instances in a region. The maximum number of requests that can be concurrently processed in a region is calculated by using the following formula: 100 × Instance concurrency. For example, if you set instance concurrency to 10, up to 1,000 requests can be processed concurrently in a region. If the number of concurrent requests exceeds this maximum, Function Compute returns the throttling error ResourceExhausted.
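As a worked version of the formula, the following Node.js sketch computes the regional ceiling and checks whether a given load would exceed it. The function names are illustrative and not part of any Function Compute SDK. The default of 100 instances and the valid range of 1 to 200 come from this topic.

```javascript
// Illustrative helpers; these names are not part of any Function Compute SDK.
const DEFAULT_INSTANCE_LIMIT = 100; // default on-demand instances per region

// Maximum concurrent requests in a region = instance limit x instance concurrency.
function maxConcurrentRequests(instanceConcurrency, instanceLimit = DEFAULT_INSTANCE_LIMIT) {
  if (!Number.isInteger(instanceConcurrency) || instanceConcurrency < 1 || instanceConcurrency > 200) {
    throw new RangeError('instance concurrency must be an integer from 1 to 200');
  }
  return instanceLimit * instanceConcurrency;
}

// True if this many in-flight requests would exceed the regional ceiling.
function wouldBeThrottled(inFlight, instanceConcurrency, instanceLimit = DEFAULT_INSTANCE_LIMIT) {
  return inFlight > maxConcurrentRequests(instanceConcurrency, instanceLimit);
}

console.log(maxConcurrentRequests(10));  // 1000
console.log(wouldBeThrottled(1001, 10)); // true
```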

Note

To increase the upper limit of on-demand instances in a region, contact us.

Logging

If instance concurrency is set to 1, Function Compute returns function logs in the X-Fc-Log-Result response header if you specify X-Fc-Log-Type: Tail in the HTTP request header when you invoke a function. If instance concurrency is set to a value greater than 1, the response header does not include function logs because the logs of a specific request cannot be obtained among concurrent requests.

In a Node.js runtime, console.info() prints logs that include the ID of the current request. If instance concurrency is set to a value greater than 1, console.info() cannot display request IDs as expected. In the following sample log, both logger end lines are tagged with req2, although one of them belongs to req1:

2019-11-06T14:23:37.587Z req1 [info] logger begin
2019-11-06T14:23:37.587Z req1 [info] ctxlogger begin
2019-11-06T14:23:37.587Z req2 [info] logger begin
2019-11-06T14:23:37.587Z req2 [info] ctxlogger begin
2019-11-06T14:23:40.587Z req1 [info] ctxlogger end
2019-11-06T14:23:40.587Z req2 [info] ctxlogger end
2019-11-06T14:23:37.587Z req2 [info] logger end
2019-11-06T14:23:37.587Z req2 [info] logger end

In this case, we recommend that you use context.logger.info() to print logs. This method allows request IDs to be displayed as expected. The following sample code shows an example:

exports.handler = (event, context, callback) => {
    console.info('logger begin');
    context.logger.info('ctxlogger begin');

    setTimeout(function() {
        context.logger.info('ctxlogger end');
        console.info('logger end');
        callback(null, 'hello world');
    }, 3000);
};

Error handling

When an instance concurrently processes multiple requests, an unexpected process exit caused by one failed request affects all other requests that the instance is processing. Therefore, you must write logic in the function code to catch request-level exceptions so that a failure does not affect other requests. The following example shows sample code in Node.js:

exports.handler = (event, context, callback) => {
    try {
        JSON.parse(event);
    } catch (ex) {
        // Fail only this request, and return so that callback is not invoked twice.
        callback(ex);
        return;
    }

    callback(null, 'hello world');
};
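The same idea can be factored into a reusable wrapper. The following sketch is one possible approach, not an official pattern: safeHandler is a hypothetical helper that turns a synchronous exception into a failed response for the current request only, and ensures that the callback fires at most once. It does not catch errors thrown later inside timers or other asynchronous callbacks. Those still need their own try/catch.

```javascript
// Hypothetical helper: isolates synchronous failures to the current request.
function safeHandler(handler) {
  return (event, context, callback) => {
    let done = false;
    // Guard so that the runtime callback is invoked at most once per request.
    const once = (err, result) => {
      if (done) return;
      done = true;
      callback(err, result);
    };
    try {
      handler(event, context, once);
    } catch (ex) {
      once(ex); // report the error for this request only
    }
  };
}

// In a Function Compute module, this would be assigned to exports.handler.
const handler = safeHandler((event, context, callback) => {
  const payload = JSON.parse(event); // throws SyntaxError on malformed input
  callback(null, `hello ${payload.name}`);
});
```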

Shared variables

When an instance concurrently processes multiple requests, errors may occur if multiple requests attempt to modify the same variable at the same time. You must use mutual exclusion in your code to prevent modifications that are not thread-safe. The following example shows sample code in Java:

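import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.StreamRequestHandler;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class App implements StreamRequestHandler {
    // Shared by all requests that this instance processes.
    private static int counter = 0;

    @Override
    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        // Lock on the class because the counter is static. Locking on "this"
        // would not protect it if more than one App object exists.
        synchronized (App.class) {
            counter = counter + 1;
        }
        outputStream.write("hello world".getBytes(StandardCharsets.UTF_8));
    }
}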
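Node.js runs handler code on a single thread, so it has no direct equivalent of synchronized. A similar hazard still exists when a read-modify-write sequence spans an await, because another request can run in between. The following sketch shows one way to serialize such sections with a promise chain. The names withLock, sleep, and safeIncrement are illustrative, and the 10 ms delay simulates a call to a downstream service.

```javascript
// Shared state: one counter per instance (module scope).
let counter = 0;
// Tail of the queue of pending critical sections.
let tail = Promise.resolve();

// Runs task only after all previously queued tasks have finished.
function withLock(task) {
  const result = tail.then(() => task());
  // Keep the queue usable even if task rejects.
  tail = result.catch(() => {});
  return result;
}

// Simulated slow I/O, for example a call to a downstream service.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Unsafe: another request can run between the read and the write.
async function unsafeIncrement() {
  const current = counter;
  await sleep(10);
  counter = current + 1;
}

// Safe: the whole read-modify-write sequence is serialized.
function safeIncrement() {
  return withLock(async () => {
    const current = counter;
    await sleep(10);
    counter = current + 1;
  });
}
```

If three requests call unsafeIncrement() concurrently, each reads the same starting value, so updates are lost. With safeIncrement(), each request sees the result of the previous one. The trade-off is that the locked section no longer overlaps across requests, so keep it as short as possible.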

Monitoring metrics

After you set instance concurrency to a value greater than 1, the instance monitoring chart shows fewer used instances for the same number of requests.

Figure: Instance data monitoring chart