AWS CDK (v2): Building an AWS Batch environment for FARGATE_SPOT
We will use AWS CDK v2 to build an AWS Batch environment for FARGATE_SPOT. To create an AWS Batch environment, we first create a
- VPC
- ComputeEnvironment
- JobQueue
once for the environment, and then a
- JobDefinition
for each task you want to execute. I was not very familiar with this configuration and stumbled a lot, so it took me about a day the first time. Next time I can probably do it in an hour or so.
Preparation
The following preparations are assumed to have been made
Version
aws-cdk:2.20.0
Key Points
What you write depends on the type of ComputeEnvironment
The types of ComputeEnvironment are described here
Currently there are four types: EC2 | FARGATE | FARGATE_SPOT | SPOT. Which properties can be specified, and which must be specified, seems to change depending on the type of ComputeEnvironment.
In this example, FARGATE_SPOT is used.
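As a rough illustration of how the properties differ by type, the sketch below builds a `computeResources` object for each of the four types. The helper `buildComputeResources` and its input types are hypothetical (they are not part of aws-cdk-lib); the property names follow the CloudFormation ComputeResources schema, and it assumes, as far as I can tell from the docs, that the Fargate types take only `maxvCpus`, `subnets`, and `securityGroupIds`, while the EC2 types also need settings such as `minvCpus`, `instanceTypes`, and `instanceRole`.

```typescript
// Hypothetical helper for illustration only; not an aws-cdk-lib API.
type ComputeType = "EC2" | "SPOT" | "FARGATE" | "FARGATE_SPOT";

// Settings every type needs.
interface NetworkSettings {
  maxvCpus: number;
  subnets: string[];
  securityGroupIds: string[];
}

// Settings that (by assumption) only apply to EC2 and SPOT.
interface Ec2Settings {
  minvCpus: number;
  instanceTypes: string[];
  instanceRole: string; // instance profile name or ARN
}

type ComputeResources = NetworkSettings & { type: ComputeType } & Partial<Ec2Settings>;

function isFargate(type: ComputeType): boolean {
  return type === "FARGATE" || type === "FARGATE_SPOT";
}

function buildComputeResources(
  type: ComputeType,
  network: NetworkSettings,
  ec2?: Ec2Settings
): ComputeResources {
  if (isFargate(type)) {
    // Fargate types must not carry EC2-only settings.
    if (ec2 !== undefined) {
      throw new Error(`${type} does not accept EC2-only settings`);
    }
    return { type, ...network };
  }
  // EC2 and SPOT need the instance-related settings.
  if (ec2 === undefined) {
    throw new Error(`${type} requires EC2 settings`);
  }
  return { type, ...network, ...ec2 };
}
```

The returned object has the same shape as the `computeResources` property passed to `CfnComputeEnvironment` in the Code section.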
ecsTaskExecutionRole
In my case, it had been created some time ago, but if you don't have one, you need to create it. You can find instructions at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html.
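If the role does not exist yet, the linked guide boils down to a role that ECS tasks can assume, with the AWS managed policy AmazonECSTaskExecutionRolePolicy attached. Here is a sketch of those two pieces as plain objects, plus the ARN format the stack code expects. The constant and function names are made up for illustration, and nothing here calls AWS.

```typescript
// Trust policy that lets ECS tasks (which Fargate Batch jobs run as) assume the role.
const ECS_TASKS_TRUST_POLICY = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Principal: { Service: "ecs-tasks.amazonaws.com" },
      Action: "sts:AssumeRole",
    },
  ],
};

// AWS managed policy to attach to the role.
const ECS_TASK_EXECUTION_MANAGED_POLICY_ARN =
  "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy";

// Hypothetical helper: builds the role ARN from a 12-digit account ID.
function executionRoleArn(
  accountId: string,
  roleName: string = "ecsTaskExecutionRole"
): string {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error(`not a 12-digit AWS account ID: ${accountId}`);
  }
  return `arn:aws:iam::${accountId}:role/${roleName}`;
}
```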
VPC
You can either create a new one or use an existing one. The code in this article shows both approaches.
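One thing to watch with the existing-VPC route: `Vpc.fromLookup()` queries your account at synth time, so the stack needs a concrete account and region. The sketch below shows one way to build that `env` object from the `CDK_DEFAULT_ACCOUNT` and `CDK_DEFAULT_REGION` variables that the CDK CLI sets. `resolveEnv` is a hypothetical helper, and the environment is passed in as a plain record so the function can be tried without the CDK installed.

```typescript
interface StackEnv {
  account: string;
  region: string;
}

// Hypothetical helper: reads the account and region from an environment-like record.
function resolveEnv(vars: Record<string, string | undefined>): StackEnv {
  const account = vars["CDK_DEFAULT_ACCOUNT"];
  const region = vars["CDK_DEFAULT_REGION"];
  if (!account || !region) {
    throw new Error(
      "CDK_DEFAULT_ACCOUNT and CDK_DEFAULT_REGION must be set to look up an existing VPC"
    );
  }
  return { account, region };
}

// In bin/awsbatch.ts this would be used roughly as:
//   new SampleAWSBatchStack(app, "SampleAWSBatchStack", { env: resolveEnv(process.env) });
```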
executionRoleArn and jobRoleArn
- executionRoleArn is the minimum role required to start the Batch job (e.g. to pull the image).
- jobRoleArn is used when the container itself needs a role for what it does while running.
assignPublicIp
In this setup the jobs run in public subnets, so if you don’t set this to ENABLED, you will get an error because the container image can’t be pulled.
The error message differs depending on where the image is hosted. If the image is on docker.io, it looks like this:
CannotPullContainerError: inspect image has been retried 5 time(s):
failed to resolve ref "docker.io/library/busybox:latest": failed to do request:
Head https://registry-1.docker.io/v2/library/busybox/manifests/latest: dial tcp 54.85.133.123:443: i/o t...
If the image is in ECR, it looks like this:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed:
unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError:
send request failed caused by: Post https://api.ecr....
It took me quite a while to solve the problem because I couldn’t work out the cause from these messages.
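Since the two messages look unrelated but, in my case, had the same cause, a small check like the following can help when scanning the status reasons of failed jobs. It only matches the two error prefixes quoted above. `looksLikeImagePullNetworkIssue` is a name I made up, and a match is only a hint to check `assignPublicIp` (or whatever outbound route the subnet has), not proof.

```typescript
// Error prefixes taken from the two messages quoted in this section.
const PULL_FAILURE_PREFIXES = [
  "CannotPullContainerError",
  "ResourceInitializationError",
];

// Hypothetical helper: true if the status reason starts with one of the prefixes.
function looksLikeImagePullNetworkIssue(statusReason: string): boolean {
  const trimmed = statusReason.trim();
  return PULL_FAILURE_PREFIXES.some((prefix) => trimmed.indexOf(prefix) === 0);
}
```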
platformCapabilities
If you do not specify this, the JobDefinition is treated as an EC2 one, and you will be stuck with a job that cannot be executed on a ComputeEnvironment of type FARGATE_SPOT (i.e., cannot be placed in its JobQueue).
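The rule can be written down as a tiny check. This is my reading of the behaviour, assuming a JobDefinition without `platformCapabilities` is treated as `EC2`. `canRunOn` and `requiredCapability` are hypothetical helpers, not CDK or Batch APIs.

```typescript
type PlatformCapability = "EC2" | "FARGATE";
type ComputeEnvironmentType = "EC2" | "SPOT" | "FARGATE" | "FARGATE_SPOT";

// Which capability a JobDefinition needs for a given ComputeEnvironment type.
function requiredCapability(type: ComputeEnvironmentType): PlatformCapability {
  return type === "FARGATE" || type === "FARGATE_SPOT" ? "FARGATE" : "EC2";
}

// True if a JobDefinition with these platformCapabilities can run on the given type.
function canRunOn(
  platformCapabilities: PlatformCapability[] | undefined,
  computeType: ComputeEnvironmentType
): boolean {
  // Assumption: an unspecified (or empty) list behaves like ["EC2"].
  const effective: PlatformCapability[] =
    platformCapabilities && platformCapabilities.length > 0
      ? platformCapabilities
      : ["EC2"];
  const needed = requiredCapability(computeType);
  return effective.some((capability) => capability === needed);
}
```

For the setup in this article, `canRunOn(undefined, "FARGATE_SPOT")` is false and `canRunOn(["FARGATE"], "FARGATE_SPOT")` is true.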
Code
// lib/awsbatch-stack.ts
import { aws_batch, Stack, StackProps } from "aws-cdk-lib";
import { IVpc, SecurityGroup, SubnetType, Vpc } from "aws-cdk-lib/aws-ec2";
import { Construct } from "constructs";
// Stack name to be created this time
const STACK_BASE_NAME = "SampleAWSBatch";
// Specify if using an existing VPC
const VPC_ID = "vpc-12345678";
// If there is no `ecsTaskExecutionRole`, it must be created.
const DEFAULT_EXEC_ROLE_ARN =
  "arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/ecsTaskExecutionRole";
export class SampleAWSBatchStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // https://docs.aws.amazon.com/cdk/api/v1/docs/aws-batch-readme.html
    ///////////////////////////////////////////////////////////////////
    // Prepare VPC
    ///////////////////////////////////////////////////////////////////
    let vpc: IVpc;
    if (!VPC_ID) {
      // When creating a new VPC
      vpc = new Vpc(this, `${STACK_BASE_NAME}VPC`, {
        cidr: "10.9.0.0/16", // 172.16.0.0/16 or whatever.
        subnetConfiguration: [
          {
            name: `${STACK_BASE_NAME}Subnet`,
            subnetType: SubnetType.PUBLIC,
            cidrMask: 18,
          },
        ],
      });
    } else {
      // When using an existing VPC.
      // To use `Vpc.fromLookup()`, it seems that you need to specify the region
      // and account in `env` in `bin/awsbatch.ts`, or in the environment
      // variables when running cdk.
      vpc = Vpc.fromLookup(this, "VPC", {
        vpcId: VPC_ID,
      });
    }
    ///////////////////////////////////////////////////////////////////
    // Security Group in the VPC
    ///////////////////////////////////////////////////////////////////
    const securityGroup = new SecurityGroup(this, `${STACK_BASE_NAME}SG`, {
      vpc: vpc,
    });
    ///////////////////////////////////////////////////////////////////
    // ComputeEnvironment type=FARGATE_SPOT
    ///////////////////////////////////////////////////////////////////
    const fargateSpotEnvironment = new aws_batch.CfnComputeEnvironment(
      this,
      `${STACK_BASE_NAME}ComputeEnvironment`,
      {
        type: "MANAGED",
        computeEnvironmentName: STACK_BASE_NAME,
        computeResources: {
          type: "FARGATE_SPOT",
          maxvCpus: 64,
          subnets: vpc.publicSubnets.map((x) => x.subnetId), // List of SubnetId
          securityGroupIds: [securityGroup.securityGroupId],
        },
      }
    );
    ///////////////////////////////////////////////////////////////////
    // Create JobQueue
    ///////////////////////////////////////////////////////////////////
    const jobQueue = new aws_batch.CfnJobQueue(
      this,
      `${STACK_BASE_NAME}JobQueue`,
      {
        jobQueueName: STACK_BASE_NAME,
        computeEnvironmentOrder: [
          {
            computeEnvironment:
              fargateSpotEnvironment.attrComputeEnvironmentArn,
            order: 1,
          },
        ],
        priority: 1,
      }
    );
    ///////////////////////////////////////////////////////////////////
    // Create JobDefinitions
    ///////////////////////////////////////////////////////////////////
    const jobs: { [key: string]: string } = {}; // imageUri -> JobDefinition ARN
    for (const setting of CONTAINER_JOB_SETTINGS) {
      const jobDef = new aws_batch.CfnJobDefinition(
        this,
        `${setting.jobName}JobDef`,
        {
          type: "container",
          jobDefinitionName: setting.jobName,
          platformCapabilities: ["FARGATE"], // Note: If FARGATE is not specified, it will not run in a FARGATE environment.
          containerProperties: {
            image: setting.imageUri,
            executionRoleArn: DEFAULT_EXEC_ROLE_ARN,
            jobRoleArn: setting.jobRoleArn,
            resourceRequirements: [
              { type: "MEMORY", value: String(setting.memory) },
              { type: "VCPU", value: String(setting.vcpu) },
            ],
            networkConfiguration: {
              assignPublicIp: "ENABLED", // Note: Without it, you cannot access ECR.
            },
          },
          retryStrategy: {
            attempts: 1,
          },
        }
      );
      jobs[setting.imageUri] = jobDef.ref;
    }
  }
}
export type ContainerJobSetting = {
  imageUri: string;
  jobName: string;
  jobRoleArn?: string;
  memory: number; // in MiB
  vcpu: number;
};
/**
 * JobDefinition information
 * Note that only certain combinations of Memory and CPU can be specified; see
 * https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-resourcerequirement.html
 */
const CONTAINER_JOB_SETTINGS: ContainerJobSetting[] = [
  {
    imageUri: "busybox",
    jobName: "HelloWorld",
    memory: 512,
    vcpu: 0.25,
  },
];
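Because only certain Memory and vCPU pairs are accepted, it can be handy to check the settings before deploying. The table in the sketch below reflects the Fargate combinations as I understand them from the linked page (0.25 to 4 vCPU). Treat it as an assumption and check the page, since AWS has extended the list over time. `isValidFargateSize` and `range` are hypothetical helpers.

```typescript
// Inclusive range of numbers from `from` to `to` in increments of `step`.
function range(from: number, to: number, step: number): number[] {
  const out: number[] = [];
  for (let v = from; v <= to; v += step) {
    out.push(v);
  }
  return out;
}

// vCPU -> allowed memory values in MiB (assumed table; verify against the AWS docs).
const FARGATE_MEMORY_BY_VCPU: Record<string, number[]> = {
  "0.25": [512, 1024, 2048],
  "0.5": [1024, 2048, 3072, 4096],
  "1": range(2048, 8192, 1024),
  "2": range(4096, 16384, 1024),
  "4": range(8192, 30720, 1024),
};

// True if the (vcpu, memory) pair appears in the assumed table.
function isValidFargateSize(vcpu: number, memory: number): boolean {
  const allowed = FARGATE_MEMORY_BY_VCPU[String(vcpu)];
  return allowed !== undefined && allowed.indexOf(memory) !== -1;
}
```

With this table, the HelloWorld setting (0.25 vCPU, 512 MiB) passes, while something like 0.25 vCPU with 4096 MiB does not.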
Afterword
It’s not a big deal once you figure it out, but it’s quite a challenge to get there.