hayata-yamamoto

Accelerating R&D with Instant Data Labeling Infrastructure - Label Studio on AWS AppRunner Terraform Module

Introduction: Nothing Starts Without Data

"We want to build an excellent machine learning model."

Everyone involved in R&D projects thinks this way. But what's the reality? Weeks after project kickoff, you're still stuck on infrastructure setup, and machine learning engineers are wrestling with the AWS console. Does this sound familiar?

At Tied, inc., we have supported numerous R&D projects and keep running into the same challenge: setting up the data labeling environment. It looks like a minor step, but it is often a critical bottleneck.

This article introduces the Terraform module "terraform-aws-label-studio-on-apprunner" we developed to solve this challenge, along with its background and value.

Why Data Labeling Infrastructure Matters

The Success Formula for Machine Learning

What factors determine the success of machine learning projects? The latest algorithms? High-performance GPUs? While these are important, the most fundamental factor is "high-quality data."

To produce high-quality data, an efficient labeling environment is essential. Label Studio is widely recognized as a powerful open-source tool for this purpose. It supports many data types, including images, text, and audio, and enables team collaboration.

The Reality of Time to Market

R&D projects are a race against time. Delivering value to market faster than competitors determines business success. However, many projects experience the following timeline:

  • Week 1-2: Infrastructure design and setup
  • Week 3: Label Studio finally starts running
  • Week 4: Data labeling finally begins

In other words, it takes a full month before actual value creation begins.

Three Challenges R&D Teams Face

1. Skill Set Mismatch

The primary role of machine learning engineers is to analyze data, build models, and create business value. In reality, however, they spend time configuring VPCs and setting up security groups.

This isn't just wasted time. The opportunity cost of having expensive specialists work on tasks outside their expertise is immeasurable.

2. Barriers at Project Launch

"We can't start anything without data first."

This is the fate of R&D projects. But what if it takes weeks to set up the environment for creating that data? The initial project momentum is lost, and team motivation declines.

3. Scaling Complexity

When a small-scale project succeeds, scaling up becomes the next challenge. You want to increase annotators from 10 to 100. You want to run multiple projects in parallel. With traditional methods, however, replicating and managing environments is complex, and scaling efforts often stall.

Realistic Comparison of Label Studio Deployment Methods

What options exist for deploying Label Studio? Based on our experience, let's compare each method in detail.

Deployment Method Comparison

| Deployment Method | Setup Time | Operational Load | Scalability | Cost Efficiency | Security | Use Case |
|---|---|---|---|---|---|---|
| Local Docker | 10 min | Low | | | | Personal testing |
| Manual EC2 | 2-3 days | High | | | | Small teams |
| ECS/Fargate | 1-2 days | Medium | | | | Medium scale |
| Kubernetes | 3-5 days | High | | | | Large scale |
| Our Module (AppRunner) | 30 min | Low | | | | R&D projects |

Detailed Analysis of Each Method

Local Docker Compose is the easiest way to start. Following the official documentation, you can launch it in 10 minutes. However, it can't easily be shared with a team, and data persistence is not guaranteed. Consider it only for initial validation by one person.

Manual EC2 setup is the first choice for many companies. While offering complete control, it has these challenges:

  • Extensive work, including Nginx configuration, SSL certificate management, and database setup
  • Setup procedures that depend on one person's knowledge
  • Regular OS updates and security patches

ECS/Fargate is a good option for teams familiar with container technology. However, initial setup complexity cannot be ignored:

  • Task definition creation
  • Load balancer configuration
  • Service discovery setup
  • Network mode selection

Kubernetes is the most flexible and scalable option, but also has the highest learning cost and operational overhead. Unless you're a large organization with a dedicated SRE team, it's likely over-engineering.

Our Solution: AWS App Runner + Terraform

To solve these challenges, we chose a different approach. By combining AWS App Runner with Terraform, we achieved both "simplicity" and "scalability."

Why App Runner?

AWS App Runner is a fully managed service for running containerized web applications. These features make it a good fit for R&D projects:

  1. Auto-scaling: Automatically scales up and down based on traffic
  2. Minimal operational overhead: AWS manages the underlying infrastructure, including OS updates and security patches
  3. Pay-per-use: Pay only for what you use, minimizing idle costs
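To make the auto-scaling point concrete, here is a minimal sketch of how App Runner's scaling limits can be expressed in Terraform with the AWS provider's `aws_apprunner_auto_scaling_configuration_version` resource. It is not taken from the module's source; the resource label, the configuration name, and all numbers are placeholders chosen for illustration.

```hcl
# Illustrative only: not taken from the module's source.
# The name and the numbers below are placeholders for this sketch.
resource "aws_apprunner_auto_scaling_configuration_version" "label_studio" {
  auto_scaling_configuration_name = "label-studio-example"

  # Concurrent requests one instance handles before App Runner adds another.
  max_concurrency = 50

  # Lower and upper bounds on the number of running instances.
  min_size = 1
  max_size = 5
}
```

A small `max_size` keeps costs predictable for a labeling tool with a known number of annotators; raise it only when the team actually grows.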

The Value of Terraform Modules

Following Infrastructure as Code principles, we provide the setup as a reusable module:

module "label_studio" {
  source = "tied-inc/label-studio-on-apprunner/aws"

  # Existing network resources, looked up through data sources
  vpc_id              = data.aws_vpc.main.id
  private_subnet_ids  = data.aws_subnets.private.ids
  database_subnet_ids = data.aws_subnets.database.ids
}

With just a few lines of code, you can stand up a production-grade Label Studio environment.
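The module call above references `data.aws_vpc.main`, `data.aws_subnets.private`, and `data.aws_subnets.database` without showing how they are defined. One possible way to define them is sketched below, assuming your VPC and subnets are tagged. The `Name` and `Tier` tags and their values are hypothetical; substitute whatever tagging scheme your account actually uses.

```hcl
# Look up an existing VPC by its Name tag (the tag value is a placeholder).
data "aws_vpc" "main" {
  tags = {
    Name = "main"
  }
}

# Private subnets in that VPC, selected by a hypothetical "Tier" tag.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  tags = {
    Tier = "private"
  }
}

# Database subnets in the same VPC, selected the same way.
data "aws_subnets" "database" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  tags = {
    Tier = "database"
  }
}
```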

Business Impact: Effects in Numbers

Quantitative Results

Measured results from actual projects:

  • Infrastructure setup time: 3 days → 30 minutes (98% reduction)
  • Monthly operational hours: 40 hours → 2 hours (95% reduction)
  • Time to project start: 2 weeks → Same day
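The percentages above can be rechecked with a few Terraform locals. This is only a sketch of the arithmetic: it assumes "3 days" means three 8-hour working days, which is what makes the first figure come out near 98% (counting 24-hour days would give roughly 99%). The local and output names are made up for this example.

```hcl
locals {
  # Assumption: "3 days" means three 8-hour working days.
  manual_setup_minutes = 3 * 8 * 60 # 1440 minutes
  module_setup_minutes = 30

  manual_ops_hours = 40
  module_ops_hours = 2

  # Reduction = 100 * (1 - after / before)
  setup_reduction_pct = 100 * (1 - local.module_setup_minutes / local.manual_setup_minutes)
  ops_reduction_pct   = 100 * (1 - local.module_ops_hours / local.manual_ops_hours)
}

output "setup_reduction" {
  value = format("%.1f%%", local.setup_reduction_pct) # "97.9%"
}

output "ops_reduction" {
  value = format("%.1f%%", local.ops_reduction_pct) # "95.0%"
}
```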

Technical Advantages

Secure Design

This module is designed with enterprise-level security in mind:

  • Private communication within VPC
  • Secure data management with Amazon Aurora Serverless v2
  • Secret management via AWS Systems Manager
  • Principle of least privilege through IAM roles
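As an illustration of the last two points, the sketch below stores one secret in Systems Manager Parameter Store and grants an App Runner instance role read access to that single parameter only. It shows the general pattern, not the module's actual internals; the variable, the parameter path, and the role and policy names are all hypothetical.

```hcl
# Hypothetical input: the secret is passed in, never hard-coded.
variable "label_studio_db_password" {
  type      = string
  sensitive = true
}

# Store the secret encrypted in Parameter Store (the path is a placeholder).
resource "aws_ssm_parameter" "db_password" {
  name  = "/label-studio/example/db-password"
  type  = "SecureString"
  value = var.label_studio_db_password
}

# Trust policy: only App Runner service instances may assume the role.
data "aws_iam_policy_document" "assume_by_apprunner" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["tasks.apprunner.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "instance" {
  name               = "label-studio-example-instance"
  assume_role_policy = data.aws_iam_policy_document.assume_by_apprunner.json
}

# Least privilege: read access to this one parameter and nothing else.
# A customer-managed KMS key would additionally need kms:Decrypt.
data "aws_iam_policy_document" "read_secret" {
  statement {
    actions   = ["ssm:GetParameters"]
    resources = [aws_ssm_parameter.db_password.arn]
  }
}

resource "aws_iam_role_policy" "read_secret" {
  name   = "read-db-password"
  role   = aws_iam_role.instance.id
  policy = data.aws_iam_policy_document.read_secret.json
}
```

Scoping `resources` to a single parameter ARN, rather than `*`, is what keeps the role from reading unrelated secrets in the same account.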

Compatibility with Existing Environments

Many companies already have environments built on AWS. This module is designed to seamlessly integrate with such existing environments:

module "label_studio" {
  source = "tied-inc/label-studio-on-apprunner/aws"

  # Specify existing VPC and subnets
  vpc_id             = var.existing_vpc_id
  private_subnet_ids = var.existing_private_subnets

  # Customize specs as needed
  cpu    = "1 vCPU"
  memory = "2 GB"

  # Use existing domain
  custom_domain = "label.example.com"
}
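The example above reads `var.existing_vpc_id` and `var.existing_private_subnets` from the calling configuration. A minimal sketch of how those two variables might be declared follows. The types, descriptions, and validation rules are assumptions made for illustration, not requirements documented by the module.

```hcl
variable "existing_vpc_id" {
  type        = string
  description = "ID of the existing VPC to deploy Label Studio into"

  # Catch obvious mistakes, such as passing a VPC name instead of its ID.
  validation {
    condition     = can(regex("^vpc-", var.existing_vpc_id))
    error_message = "The existing_vpc_id value must look like a VPC ID (vpc-...)."
  }
}

variable "existing_private_subnets" {
  type        = list(string)
  description = "IDs of existing private subnets in that VPC"

  validation {
    condition     = length(var.existing_private_subnets) > 0
    error_message = "Provide at least one private subnet ID."
  }
}
```

Values can then be supplied through a `terraform.tfvars` file or `-var` flags, which keeps environment-specific IDs out of the module call itself.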

Teams That Should Consider Adoption

Ideal for These Teams

  • Running multiple R&D projects in parallel
  • Machine learning engineer-centric team composition
  • Emphasizing agile project management
  • Aiming for rapid transition from POC to production
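For teams running several projects in parallel, one way to stamp out an environment per project is Terraform's module-level `for_each`. The sketch below reuses only the three inputs shown earlier in this article. The project names and variable names are placeholders, and whether several instances of the module can coexist in one VPC without resource name collisions is an assumption to verify against the module's README.

```hcl
# Hypothetical inputs for the shared network.
variable "vpc_id" {
  type = string
}

variable "private_subnet_ids" {
  type = list(string)
}

variable "database_subnet_ids" {
  type = list(string)
}

# One Label Studio environment per project (project names are placeholders).
module "label_studio" {
  for_each = toset(["project-a", "project-b"])

  source = "tied-inc/label-studio-on-apprunner/aws"

  vpc_id              = var.vpc_id
  private_subnet_ids  = var.private_subnet_ids
  database_subnet_ids = var.database_subnet_ids
}
```

Each instance is then addressed as `module.label_studio["project-a"]`, so adding or retiring a project is a one-line change to the set.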

ROI Perspective

Initial investment is nearly zero. App Runner's pay-per-use model lets you start small and scale as needed. Meanwhile, the costs you can reduce include:

  • Engineer hour reduction: about 38 hours/month (from 40 to 2) × $100/hour (an illustrative rate) ≈ $3,800/month savings
  • Avoiding opportunity loss through faster time to market
  • Improved engineer motivation through reduced operational burden

Conclusion: Toward Data-Driven R&D

The Terraform module introduced in this article is not just an infrastructure automation tool. It's an "enabler" that lets R&D teams focus on their core work of creating value.

From "infrastructure building" to "value creation."
From "person-dependent operations" to "democratized data creation."
From "slow R&D cycles" to "rapid iterations."

This paradigm shift is the essence of data-driven R&D.

Next Steps

  1. Access our GitHub repository
  2. Follow the README to set up your environment in 30 minutes
  3. Experience the value in your actual projects

At Tied, inc., we hope that through this module, more R&D teams can create value rapidly. We look forward to your feedback and contributions.

Let's end the time spent struggling with data labeling environment setup. Use that time for creating real innovation.


Tied, inc. is a professional team that supports R&D project success through technology. Beyond this module, we provide MLOps implementation support and data strategy consulting.
