AWS CDK (v2): Build an AWS Batch environment for FARGATE_SPOT
We will use AWS CDK v2 to build an AWS Batch environment for FARGATE_SPOT. To create an AWS Batch environment, we first create a
- VPC
- ComputeEnvironment
- JobQueue
once, and then a
- JobDefinition
for each task you want to execute. I was not very familiar with this configuration and stumbled a lot, so it took me about a day the first time I built it. Next time I can probably do it in an hour or so.
Preparation
The following preparations are assumed to have been made
Version
aws-cdk: 2.20.0
Key Points
How to write depends on the type of ComputeEnvironment
The types of ComputeEnvironment are described in the AWS documentation.
Currently there are four types: EC2 | FARGATE | FARGATE_SPOT | SPOT. Which properties can be specified, and which must be specified, seems to change depending on the type of ComputeEnvironment.
In this example, FARGATE_SPOT is used.
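To make the "depends on the type" point concrete, here is a minimal sketch in plain TypeScript (no CDK needed). The interface and `checkComputeResources` are hypothetical names of mine, and the rules encoded are my reading of the AWS Batch documentation rather than an exhaustive list: Fargate types need `securityGroupIds` and must not set the EC2-oriented properties, while EC2/SPOT types are expected to set `instanceTypes` and `instanceRole`.

```typescript
// The four compute resource types named above.
type ComputeResourceType = "EC2" | "FARGATE" | "FARGATE_SPOT" | "SPOT";

// Hypothetical, simplified subset of the computeResources properties.
interface ComputeResourcesSketch {
  type: ComputeResourceType;
  maxvCpus: number;
  subnets: string[];
  securityGroupIds?: string[];
  // Properties that only make sense for EC2 / SPOT:
  minvCpus?: number;
  desiredvCpus?: number;
  instanceTypes?: string[];
  instanceRole?: string;
}

function isFargate(type: ComputeResourceType): boolean {
  return type === "FARGATE" || type === "FARGATE_SPOT";
}

// Returns a list of problems; an empty list means the settings look consistent.
function checkComputeResources(r: ComputeResourcesSketch): string[] {
  const problems: string[] = [];
  if (isFargate(r.type)) {
    if (!r.securityGroupIds || r.securityGroupIds.length === 0) {
      problems.push("securityGroupIds is required for Fargate types");
    }
    const ec2Only: Array<keyof ComputeResourcesSketch> = [
      "minvCpus",
      "desiredvCpus",
      "instanceTypes",
      "instanceRole",
    ];
    for (const key of ec2Only) {
      if (r[key] !== undefined) {
        problems.push(key + " must not be set for Fargate types");
      }
    }
  } else {
    // Assumption: these two are expected for EC2 / SPOT environments.
    const expected: Array<keyof ComputeResourcesSketch> = [
      "instanceTypes",
      "instanceRole",
    ];
    for (const key of expected) {
      if (r[key] === undefined) {
        problems.push(key + " is expected for EC2/SPOT types");
      }
    }
  }
  return problems;
}
```

Running the check on the settings used in this article (type FARGATE_SPOT, maxvCpus, subnets, securityGroupIds) should return an empty list.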
ecsTaskExecutionRole
In my case, it had been created some time ago, but if you don't have one, you need to create it. You can find instructions at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_execution_IAM_role.html.
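For reference, here is a rough sketch of what that role consists of, written as plain TypeScript data rather than CDK code. The trust policy principal and the managed policy ARN are the ones described in the linked guide. The helper `executionRoleArn` is a hypothetical name of mine that simply builds an ARN in the same format as the one used in the Code section.

```typescript
// Trust policy that lets ECS tasks (which is what a Fargate Batch job runs as)
// assume the role.
const ecsTaskTrustPolicy = {
  Version: "2012-10-17",
  Statement: [
    {
      Effect: "Allow",
      Principal: { Service: "ecs-tasks.amazonaws.com" },
      Action: "sts:AssumeRole",
    },
  ],
};

// AWS-managed policy that the guide attaches to ecsTaskExecutionRole.
const ECS_TASK_EXECUTION_POLICY_ARN =
  "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy";

// Hypothetical helper: builds the role ARN for a given account ID.
function executionRoleArn(
  accountId: string,
  roleName: string = "ecsTaskExecutionRole"
): string {
  return "arn:aws:iam::" + accountId + ":role/" + roleName;
}
```

If you would rather create the role from CDK, `aws_iam.Role` with a `ServicePrincipal("ecs-tasks.amazonaws.com")` and the managed policy above should express the same thing, though I have not done that in this article.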
VPC
You can either create a new one or use an existing one. The code in this article shows both approaches.
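One thing to watch with an existing VPC: `Vpc.fromLookup()` needs a concrete account and region on the stack. Below is a small sketch of how that could be resolved. `resolveEnv` and `StackEnv` are hypothetical names of mine. `CDK_DEFAULT_ACCOUNT` and `CDK_DEFAULT_REGION` are the variables the CDK CLI sets when it runs your app.

```typescript
// Shape expected by the `env` property of StackProps.
interface StackEnv {
  account: string;
  region: string;
}

// Hypothetical helper: picks the account and region from environment variables
// and fails early if either one is missing.
function resolveEnv(vars: { [name: string]: string | undefined }): StackEnv {
  const account = vars["CDK_DEFAULT_ACCOUNT"];
  const region = vars["CDK_DEFAULT_REGION"];
  if (!account || !region) {
    throw new Error("Vpc.fromLookup() needs an explicit account and region");
  }
  return { account: account, region: region };
}
```

In `bin/awsbatch.ts`, the result could then be passed along the lines of `new SampleAWSBatchStack(app, "SampleAWSBatchStack", { env: resolveEnv(process.env) })`.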
executionRoleArn and jobRoleArn
- executionRoleArn is the minimum role required to start a Batch job (e.g. to pull an image). jobRoleArn is the role the container itself uses when it needs further permissions while running.
assignPublicIp
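If you don’t set this to ENABLED, you will get an error because the container image can’t be pulled. (The jobs in this example run in a public subnet, so the task needs a public IP to reach the registry.)
The error message depends on where the image is hosted. If the image is on docker.io, it looks like this: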
CannotPullContainerError: inspect image has been retried 5 time(s):
failed to resolve ref "docker.io/library/busybox:latest": failed to do request:
Head https://registry-1.docker.io/v2/library/busybox/manifests/latest: dial tcp 54.85.133.123:443: i/o t...
If the image is in ECR, it looks like this:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed:
unable to retrieve ecr registry auth: service call has been retried 3 time(s): RequestError:
send request failed caused by: Post https://api.ecr....
It took me quite a while to solve the problem because I couldn’t work out the cause from these messages.
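What both errors have in common is that the task has no route to the registry. A minimal sketch of the rule of thumb I now use is below. `SubnetRoute` and `chooseAssignPublicIp` are hypothetical names for illustration, not part of CDK or AWS Batch.

```typescript
type AssignPublicIp = "ENABLED" | "DISABLED";

// Simplified description of how a subnet reaches the internet.
type SubnetRoute = "INTERNET_GATEWAY" | "NAT_GATEWAY" | "NONE";

function chooseAssignPublicIp(route: SubnetRoute): AssignPublicIp {
  // Public subnet (default route to an internet gateway): the task itself needs
  // a public IP, otherwise it cannot reach docker.io or ECR.
  // Private subnet behind a NAT gateway: outbound traffic already works.
  // No internet route at all: a public IP does not help; the image would have
  // to come through VPC endpoints instead.
  return route === "INTERNET_GATEWAY" ? "ENABLED" : "DISABLED";
}
```

The stack in this article only uses public subnets, which is why it ends up with `assignPublicIp: "ENABLED"`.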
platformCapabilities
If you do not specify this in the JobDefinition, it defaults to EC2, and the job cannot run on a ComputeEnvironment of type FARGATE_SPOT (i.e., it cannot be submitted to a JobQueue backed by that environment).
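The matching rule can be sketched as a small check. The function names here are hypothetical, and the only behavior assumed is what is described above: an omitted `platformCapabilities` counts as EC2, and Fargate environments need FARGATE.

```typescript
type EnvType = "EC2" | "FARGATE" | "FARGATE_SPOT" | "SPOT";
type PlatformCapability = "EC2" | "FARGATE";

// Which capability a JobDefinition needs for a given ComputeEnvironment type.
function requiredCapability(envType: EnvType): PlatformCapability {
  return envType === "FARGATE" || envType === "FARGATE_SPOT" ? "FARGATE" : "EC2";
}

// True if a JobDefinition with these platformCapabilities can run on envType.
function canRunOn(
  platformCapabilities: PlatformCapability[] | undefined,
  envType: EnvType
): boolean {
  // When platformCapabilities is omitted, it is treated as ["EC2"].
  const caps: PlatformCapability[] =
    platformCapabilities === undefined ? ["EC2"] : platformCapabilities;
  return caps.indexOf(requiredCapability(envType)) !== -1;
}
```

So `canRunOn(undefined, "FARGATE_SPOT")` is false, which is exactly the situation I was stuck in.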
Code
// lib/awsbatch-stack.ts
import { aws_batch, Stack, StackProps } from "aws-cdk-lib";
import { IVpc, SecurityGroup, SubnetType, Vpc } from "aws-cdk-lib/aws-ec2";
import { Construct } from "constructs";

// Base name for the resources created in this stack
const STACK_BASE_NAME = "SampleAWSBatch";
// Specify if using an existing VPC (leave empty to create a new one)
const VPC_ID = "vpc-12345678";
// If there is no `ecsTaskExecutionRole`, it must be created.
const DEFAULT_EXEC_ROLE_ARN =
  "arn:aws:iam::<<AWS_ACCOUNT_ID>>:role/ecsTaskExecutionRole";

export class SampleAWSBatchStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);
    // https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_batch-readme.html

    ///////////////////////////////////////////////////////////////////
    // Prepare VPC
    ///////////////////////////////////////////////////////////////////
    let vpc: IVpc;
    if (!VPC_ID) {
      // When creating a new VPC
      vpc = new Vpc(this, `${STACK_BASE_NAME}VPC`, {
        cidr: "10.9.0.0/16", // 172.16.0.0/16 or whatever.
        subnetConfiguration: [
          {
            name: `${STACK_BASE_NAME}Subnet`,
            subnetType: SubnetType.PUBLIC,
            cidrMask: 18,
          },
        ],
      });
    } else {
      // When using an existing VPC.
      // To use `Vpc.fromLookup()`, it seems that you need to specify the region
      // and account in the `env` in `bin/awsbatch.ts`, or in environment
      // variables when running the cdk.
      vpc = Vpc.fromLookup(this, "VPC", {
        vpcId: VPC_ID,
      });
    }

    ///////////////////////////////////////////////////////////////////
    // Security Group in the VPC
    ///////////////////////////////////////////////////////////////////
    const securityGroup = new SecurityGroup(this, `${STACK_BASE_NAME}SG`, {
      vpc,
    });

    ///////////////////////////////////////////////////////////////////
    // ComputeEnvironment type=FARGATE_SPOT
    ///////////////////////////////////////////////////////////////////
    const fargateSpotEnvironment = new aws_batch.CfnComputeEnvironment(
      this,
      `${STACK_BASE_NAME}ComputeEnvironment`,
      {
        type: "MANAGED",
        computeEnvironmentName: STACK_BASE_NAME,
        computeResources: {
          type: "FARGATE_SPOT",
          maxvCpus: 64,
          subnets: vpc.publicSubnets.map((x) => x.subnetId), // List of SubnetId
          securityGroupIds: [securityGroup.securityGroupId],
        },
      }
    );

    ///////////////////////////////////////////////////////////////////
    // Create JobQueue
    ///////////////////////////////////////////////////////////////////
    const jobQueue = new aws_batch.CfnJobQueue(
      this,
      `${STACK_BASE_NAME}JobQueue`,
      {
        jobQueueName: STACK_BASE_NAME,
        computeEnvironmentOrder: [
          {
            computeEnvironment:
              fargateSpotEnvironment.attrComputeEnvironmentArn,
            order: 1,
          },
        ],
        priority: 1,
      }
    );

    ///////////////////////////////////////////////////////////////////
    // Create JobDefinitions
    ///////////////////////////////////////////////////////////////////
    const jobs: { [key: string]: string } = {}; // imageUri -> JobDefArn
    for (const setting of CONTAINER_JOB_SETTINGS) {
      const jobDef = new aws_batch.CfnJobDefinition(
        this,
        `${setting.jobName}JobDef`,
        {
          type: "container",
          jobDefinitionName: setting.jobName,
          platformCapabilities: ["FARGATE"], // Note: If FARGATE is not specified, it will not run in a FARGATE environment.
          containerProperties: {
            image: setting.imageUri,
            executionRoleArn: DEFAULT_EXEC_ROLE_ARN,
            jobRoleArn: setting.jobRoleArn,
            resourceRequirements: [
              { type: "MEMORY", value: String(setting.memory) },
              { type: "VCPU", value: String(setting.vcpu) },
            ],
            networkConfiguration: {
              assignPublicIp: "ENABLED", // Note: Without it, the image cannot be pulled from a public subnet.
            },
          },
          retryStrategy: {
            attempts: 1,
          },
        }
      );
      jobs[setting.imageUri] = jobDef.ref;
    }
  }
}

export type ContainerJobSetting = {
  imageUri: string;
  jobName: string;
  jobRoleArn?: string;
  memory: number; // in MiB
  vcpu: number;
};

/**
 * JobDefinition information
 * Note that only certain combinations of memory and vCPU can be specified. See:
 * https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-batch-jobdefinition-resourcerequirement.html
 */
const CONTAINER_JOB_SETTINGS: ContainerJobSetting[] = [
  {
    imageUri: "busybox",
    jobName: "HelloWorld",
    memory: 512,
    vcpu: 0.25,
  },
];
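Since the memory/vCPU combinations are restricted, it can help to check settings before deploying. The following is only a sketch: the table covers the 0.25 to 4 vCPU combinations as I understand them from the page linked in the comment above, so treat the numbers as an assumption and check that page for the current list. `isValidFargateSize` and `range` are hypothetical helper names.

```typescript
// Inclusive range helper: range(2048, 4096, 1024) -> [2048, 3072, 4096]
function range(start: number, end: number, step: number): number[] {
  const values: number[] = [];
  for (let v = start; v <= end; v += step) {
    values.push(v);
  }
  return values;
}

// Memory values (MiB) allowed per vCPU value on Fargate.
// Assumption: taken from the linked documentation; it may not be complete.
const FARGATE_MEMORY_BY_VCPU: { [vcpu: string]: number[] } = {
  "0.25": [512, 1024, 2048],
  "0.5": range(1024, 4096, 1024),
  "1": range(2048, 8192, 1024),
  "2": range(4096, 16384, 1024),
  "4": range(8192, 30720, 1024),
};

// True if the vcpu/memory pair appears in the table above.
function isValidFargateSize(vcpu: number, memory: number): boolean {
  const allowed = FARGATE_MEMORY_BY_VCPU[String(vcpu)];
  return allowed !== undefined && allowed.indexOf(memory) !== -1;
}
```

For example, the HelloWorld setting above (0.25 vCPU, 512 MiB) passes, while 0.25 vCPU with 4096 MiB does not. One option would be to run this check over `CONTAINER_JOB_SETTINGS` before creating the JobDefinitions.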
Afterword
It’s not a big deal once you figure it out, but it’s quite a challenge to get there.