By the Toki Space Team
Creating a production-grade online code compiler is one of the most complex and demanding projects in distributed systems. It calls for expertise in container orchestration, job queuing, real-time data flow, security sandboxing, and precise resource control. Unlike basic code execution tools, a full-fledged compiler platform must support multiple programming languages, manage parallel executions, stream outputs live, and recover gracefully from errors.
In this tutorial, you’ll learn how to build a fully functional online compiler using Docker for containerization, RabbitMQ for managing execution jobs, Redis for real-time communication, and React for the user interface. We’ll walk through the architecture, implementation, and deployment process—drawing from real-world experience building the code execution system behind Toki Space.
Please forgive me if there are errors in the code samples; this is mainly meant to give you an idea of how it works under the hood. Let's dive in.
Table of Contents
- Architecture Overview
- Docker Container System
- Message Queue Integration
- Real-time Streaming
- Backend Implementation
- Frontend Development
- Security & Isolation
- Performance Optimization
- Production Deployment
- Key Learnings
Architecture Overview
System Components
Our online code compiler consists of five main components working together:
┌─────────────────┐     WebSocket      ┌─────────────────┐
│      React      │ ◄────────────────► │     Node.js     │
│     Frontend    │     HTTP/REST      │     Backend     │
└─────────────────┘ ◄────────────────► └─────────────────┘
                                           ▲         │
                                 Streaming │         │ Jobs
                                           │         ▼
                    ┌─────────────────┐    │    ┌─────────────────┐
                    │      Redis      │────┘    │     RabbitMQ    │
                    │   (Streaming)   │         │   (Job Queue)   │
                    └─────────────────┘         └─────────────────┘
                             ▲                           │
                             │ Results                   │ Jobs
                             │                           ▼
                             │                  ┌─────────────────┐
                             └──────────────────│      Runner     │
                                                │     Service     │
                                                │       (Go)      │
                                                └─────────────────┘
                                                         │
                                                         │ Docker API
                                                         ▼
                                                ┌─────────────────┐
                                                │      Docker     │
                                                │   Containers    │
                                                │   (Multi-lang)  │
                                                └─────────────────┘
Core Technologies
- Frontend: React with TypeScript for the user interface
- Backend: Node.js with Express for API and WebSocket handling
- Runner Service: Go service for container management and code execution
- Message Queue: RabbitMQ for reliable job distribution
- Streaming: Redis for real-time output streaming
- Containers: Docker for secure code execution isolation
Design Principles
- Language Agnostic: Support for Python, Node.js, Go, Rust, Java, and more
- Secure Isolation: Each execution runs in a separate Docker container
- Real-time Feedback: Stream output as code executes
- Scalable Architecture: Horizontal scaling through message queues
- Fault Tolerance: Graceful handling of failures and timeouts
Docker Container System
The Challenge of Multi-Language Execution
Running user code safely requires solving several complex problems:
- Security Isolation: Prevent malicious code from accessing the host system
- Resource Limits: Control CPU, memory, and execution time
- Environment Setup: Provide language-specific tools and dependencies
- Cleanup: Remove containers and workspaces after execution
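The most direct way to tick these boxes is to run every submission in a fresh, locked-down, throwaway container. A rough sketch of that naive approach (ours, not from the Toki codebase; the flags are standard docker run options) also shows its weakness: every run pays the full container startup cost.
package main

import (
    "context"
    "fmt"
    "os/exec"
    "time"
)

// runOnce executes a command in a throwaway, resource-limited container.
// Every call pays the full `docker run` startup cost, which is exactly why
// the article moves to persistent per-language containers below.
func runOnce(ctx context.Context, image, hostWorkspace, command string) (string, error) {
    ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    args := []string{
        "run", "--rm", // delete the container when it exits
        "--network", "none", // no network access
        "--memory", "256m", // memory cap
        "--cpus", "0.5", // CPU cap
        "--pids-limit", "100", // fork-bomb protection
        "--read-only", // read-only root filesystem
        "-v", hostWorkspace + ":/workspace", // mount the user's code
        "-w", "/workspace",
        image,
        "sh", "-c", command,
    }
    out, err := exec.CommandContext(ctx, "docker", args...).CombinedOutput()
    return string(out), err
}

func main() {
    out, err := runOnce(context.Background(), "python:3.11-slim", "/tmp/job-123", "python main.py")
    fmt.Println(out, err)
}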
Container-Per-Language Architecture
Instead of spinning up new containers for each execution (which is slow), we use persistent containers per language:
type LanguageVM struct {
Language string
ContainerName string
IsRunning bool
WorkspaceDir string
mutex sync.Mutex
}
type Manager struct {
config config.FirecrackerConfig
logger *logrus.Logger
vms map[string]*LanguageVM
vmsMutex sync.RWMutex
}
Benefits of Persistent Containers:
- Fast Execution: No container startup overhead
- Warm Environments: Dependencies already installed
- Resource Efficiency: Reuse container resources
- Consistent State: Predictable execution environment
Container Initialization
Each language gets its own persistent container with pre-installed tools:
func (m *Manager) initializePersistentContainers() error {
m.logger.Info("Initializing persistent containers for all languages...")
for language := range m.config.Environments {
m.logger.Infof("Starting persistent container for language: %s", language)
vm := &LanguageVM{
Language: language,
ContainerName: fmt.Sprintf("runner-vm-%s", language),
WorkspaceDir: filepath.Join(m.config.WorkspaceDir, language),
IsRunning: false,
}
// Create language-specific workspace directory
if err := os.MkdirAll(vm.WorkspaceDir, 0755); err != nil {
return fmt.Errorf("failed to create workspace directory for %s: %w", language, err)
}
// Start the container
if err := m.startPersistentContainer(vm); err != nil {
m.logger.Errorf("Failed to start container for %s: %v", language, err)
continue
}
m.vms[language] = vm
}
return nil
}
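The startPersistentContainer helper referenced above isn't shown in the article. A minimal sketch that shells out to the Docker CLI (one option; the real service could equally use the Docker SDK) and assumes config.Environments maps each language to an image name, as the security section later implies:
// startPersistentContainer launches a long-lived container for one language and
// keeps it alive with a no-op command so later jobs can `docker exec` into it.
func (m *Manager) startPersistentContainer(vm *LanguageVM) error {
    image, ok := m.config.Environments[vm.Language]
    if !ok {
        return fmt.Errorf("no image configured for language %s", vm.Language)
    }

    // Remove any stale container left over from a previous run of the service.
    exec.Command("docker", "rm", "-f", vm.ContainerName).Run()

    args := []string{
        "run", "-d",
        "--name", vm.ContainerName,
        "--memory", "512m",
        "--cpus", "1",
        "--pids-limit", "128",
        // Host workspace mounted where executeInPersistentContainer expects it.
        "-v", vm.WorkspaceDir + ":/tmp/workspaces/" + vm.Language,
        image,
        "tail", "-f", "/dev/null", // keep the container running indefinitely
    }
    if out, err := exec.Command("docker", args...).CombinedOutput(); err != nil {
        return fmt.Errorf("docker run failed for %s: %v (%s)", vm.Language, err, string(out))
    }

    vm.IsRunning = true
    m.logger.Infof("Persistent container %s started", vm.ContainerName)
    return nil
}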
Language-Specific Execution
Each language requires different setup and execution commands:
func (m *Manager) executeInPersistentContainer(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) {
var execCmd []string
workspacePath := fmt.Sprintf("/tmp/workspaces/%s/%s", vm.Language, req.JobID)
switch vm.Language {
case "python":
entryPoint := req.EntryPoint
if entryPoint == "" {
entryPoint = "main.py"
}
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && if [ -f requirements.txt ]; then pip install -r requirements.txt; fi && timeout 30 python %s", workspacePath, entryPoint)}
case "nodejs":
entryPoint := req.EntryPoint
if entryPoint == "" {
entryPoint = "index.js"
}
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && if [ -f package.json ]; then npm install; fi && timeout 30 node %s", workspacePath, entryPoint)}
case "go":
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && go mod tidy 2>/dev/null || true && timeout 30 go run .", workspacePath)}
case "rust":
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && timeout 30 cargo run", workspacePath)}
case "java":
entryPoint := req.EntryPoint
if entryPoint == "" {
entryPoint = "Main.java"
}
className := strings.TrimSuffix(entryPoint, ".java")
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && javac %s && timeout 30 java %s", workspacePath, entryPoint, className)}
default:
return nil, fmt.Errorf("unsupported language: %s", vm.Language)
}
// Execute with timeout
execCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...)
output, err := cmd.CombinedOutput()
result := &ExecutionResult{
JobID: req.JobID,
WorkspaceID: req.WorkspaceID,
Output: string(output),
ExitCode: 0,
}
if err != nil {
result.Error = err.Error()
result.ExitCode = 1
}
return result, nil
}
Key Implementation Details:
- Workspace Isolation: Each job gets its own directory within the container
- Dependency Management: Automatic installation of requirements.txt, package.json, etc.
- Timeout Protection: 30-second execution limit prevents infinite loops
- Error Handling: Capture both stdout and stderr for complete output
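The ExecutionRequest and ExecutionResult types used throughout the runner are never defined in the snippets. Here is a sketch consistent with how they're used above and with the job message format in the next section (field names and JSON tags are our assumptions):
// ExecutionRequest mirrors the job message published by the backend.
type ExecutionRequest struct {
    JobID       string `json:"job_id"`
    WorkspaceID string `json:"workspace_id"`
    Language    string `json:"language"`
    URL         string `json:"url"`         // data URL carrying the source code
    EntryPoint  string `json:"entry_point"` // e.g. main.py, index.js, Main.java
}

// ExecutionResult is what the runner publishes back on the results exchange.
type ExecutionResult struct {
    JobID       string `json:"job_id"`
    WorkspaceID string `json:"workspace_id"`
    Output      string `json:"output"`
    Error       string `json:"error,omitempty"`
    ExitCode    int    `json:"exit_code"`
}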
Message Queue Integration
Why RabbitMQ for Code Execution?
Code execution jobs have specific requirements that make RabbitMQ ideal:
- Reliability: Jobs must not be lost if a worker crashes
- Durability: Job queue survives server restarts
- Fair Distribution: Distribute jobs evenly across worker instances
- Dead Letter Queues: Handle failed jobs gracefully
- Priority Queues: Support urgent job execution
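One note on the dead-letter point: the topology below declares a code-execution.dead-letter exchange, but none of the later snippets actually attach it to the job queue. A hedged sketch of declaring the queue with dead-letter arguments, using the same amqp client as the Go consumer further down:
// declareJobQueueWithDLQ declares the shared job queue so that rejected or
// expired messages are routed to the dead-letter exchange instead of vanishing.
func declareJobQueueWithDLQ(ch *amqp.Channel) (amqp.Queue, error) {
    args := amqp.Table{
        "x-dead-letter-exchange":    "code-execution.dead-letter",
        "x-dead-letter-routing-key": "jobs.dlq",
        "x-message-ttl":             int32(5 * 60 * 1000), // drop jobs nobody picked up within 5 minutes
    }
    return ch.QueueDeclare(
        "jobs.all-languages", // name
        true,                 // durable
        false,                // auto-delete
        false,                // exclusive
        false,                // no-wait
        args,
    )
}
If you adopt this, the consumer's own QueueDeclare must pass identical arguments, because RabbitMQ refuses to redeclare an existing queue with different properties.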
RabbitMQ Topology
Our message queue setup uses a topic exchange for flexible routing:
# Exchange Configuration
exchanges:
jobs:
name: "code-execution.jobs"
type: "topic"
durable: true
auto_delete: false
results:
name: "code-execution.results"
type: "topic"
durable: true
auto_delete: false
dead_letter:
name: "code-execution.dead-letter"
type: "direct"
durable: true
auto_delete: false
# Queue Configuration
queues:
job_prefix: "jobs"
result_prefix: "results"
dead_letter_suffix: "dlq"
Routing Keys Pattern:
- jobs.python - Python execution jobs
- jobs.nodejs - Node.js execution jobs
- jobs.go - Go execution jobs
- results.python - Python execution results
- results.nodejs - Node.js execution results
Job Message Format
Standardized job messages ensure compatibility across services:
interface ExecutionJob {
job_id: string; // Unique identifier
workspace_id: string; // User workspace
language: string; // Target language
source_code: string; // Code to execute
entry_point?: string; // Main file (optional)
dependencies?: string[]; // Package dependencies
timeout?: number; // Execution timeout
memory_limit?: number; // Memory limit in MB
}
Producer Implementation (Node.js Backend)
const amqp = require('amqplib');
class JobProducer {
constructor(rabbitmqUrl) {
this.rabbitmqUrl = rabbitmqUrl;
this.connection = null;
this.channel = null;
}
async connect() {
this.connection = await amqp.connect(this.rabbitmqUrl);
this.channel = await this.connection.createChannel();
// Declare exchanges
await this.channel.assertExchange('code-execution.jobs', 'topic', {
durable: true
});
await this.channel.assertExchange('code-execution.results', 'topic', {
durable: true
});
}
async submitJob(job) {
const routingKey = `jobs.${job.language}`;
const jobMessage = {
job_id: job.job_id,
workspace_id: job.workspace_id,
language: job.language,
url: this.createDataURL(job.source_code, job.entry_point),
entry_point: job.entry_point
};
await this.channel.publish(
'code-execution.jobs',
routingKey,
Buffer.from(JSON.stringify(jobMessage)),
{
persistent: true,
messageId: job.job_id,
timestamp: Date.now()
}
);
console.log(`Job ${job.job_id} submitted for ${job.language}`);
}
createDataURL(sourceCode, filename = 'main.py') {
// Create an inline data URL for the source code
// NOTE: in production, URL-encode the source (encodeURIComponent) or upload it
// to object storage and pass a real URL, so special characters survive transit
return `data:text/plain;filename=${filename},${sourceCode}`;
}
async close() {
if (this.channel) await this.channel.close();
if (this.connection) await this.connection.close();
}
}
Consumer Implementation (Go Runner Service)
func (m *Manager) consumeRabbitMQMessages(config RabbitMQConfig) error {
conn, err := amqp.Dial(config.URL)
if err != nil {
return fmt.Errorf("failed to connect to RabbitMQ: %w", err)
}
defer conn.Close()
ch, err := conn.Channel()
if err != nil {
return fmt.Errorf("failed to open channel: %w", err)
}
defer ch.Close()
// Declare unified queue for all languages
jobQueue, err := ch.QueueDeclare(
"jobs.all-languages", // name
true, // durable
false, // delete when unused
false, // exclusive
false, // no-wait
nil, // arguments
)
if err != nil {
return fmt.Errorf("failed to declare job queue: %w", err)
}
// Bind queue to exchange for each language
languages := []string{"python", "nodejs", "go", "rust", "java"}
for _, lang := range languages {
err = ch.QueueBind(
jobQueue.Name,
fmt.Sprintf("jobs.%s", lang),
"code-execution.jobs",
false,
nil,
)
if err != nil {
return fmt.Errorf("failed to bind queue for %s: %w", lang, err)
}
}
// Set QoS for fair distribution
err = ch.Qos(1, 0, false)
if err != nil {
return fmt.Errorf("failed to set QoS: %w", err)
}
// Start consuming
msgs, err := ch.Consume(
jobQueue.Name,
"", // consumer tag
false, // auto-ack
false, // exclusive
false, // no-local
false, // no-wait
nil, // args
)
if err != nil {
return fmt.Errorf("failed to register consumer: %w", err)
}
for msg := range msgs {
go m.processJob(ch, msg) // pass the channel so the worker can publish results
}
return nil
}
func (m *Manager) processJob(ch *amqp.Channel, msg amqp.Delivery) {
var execReq ExecutionRequest
if err := json.Unmarshal(msg.Body, &execReq); err != nil {
m.logger.WithError(err).Error("Failed to parse job message")
msg.Nack(false, false) // Don't requeue invalid messages
return
}
// Execute the job
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
result, err := m.ExecuteCode(ctx, execReq)
cancel()
if err != nil {
m.logger.WithError(err).Error("Job execution failed")
result = &ExecutionResult{
JobID: execReq.JobID,
Output: "",
Error: err.Error(),
ExitCode: 1,
}
}
// Publish result
resultBytes, _ := json.Marshal(result)
err = ch.Publish(
"code-execution.results",
fmt.Sprintf("results.%s", execReq.Language),
false, // mandatory
false, // immediate
amqp.Publishing{
ContentType: "application/json",
Body: resultBytes,
})
if err != nil {
m.logger.WithError(err).Error("Failed to publish result")
msg.Nack(false, true) // Requeue on publish failure
} else {
msg.Ack(false) // Acknowledge successful processing
}
}
Real-time Streaming
The Challenge of Live Output
Unlike traditional job systems that return results after completion, code execution benefits from real-time output streaming:
- User Experience: Immediate feedback as code runs
- Long-Running Jobs: Show progress for lengthy operations
- Debugging: See output line-by-line for easier debugging
- Interactive Input: Support for programs requiring user input
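The execution path shown in this article never wires up stdin, so treat the last point as an extension: the runner would keep the process's stdin pipe open and forward lines it receives (for example over a per-job Redis channel) into it. A hypothetical sketch using only the standard library (os/exec, io, fmt):
// forwardInput copies user-supplied lines into the running process's stdin.
// Where the lines come from (WebSocket -> backend -> Redis -> runner) is up to
// the surrounding system; here they simply arrive on a channel.
// Must be called before cmd.Start().
func forwardInput(cmd *exec.Cmd, input <-chan string) error {
    stdin, err := cmd.StdinPipe()
    if err != nil {
        return fmt.Errorf("failed to open stdin pipe: %w", err)
    }
    go func() {
        defer stdin.Close() // closing stdin signals EOF to the program
        for line := range input {
            io.WriteString(stdin, line+"\n")
        }
    }()
    return nil
}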
Redis Pub/Sub for Streaming
Redis provides excellent pub/sub capabilities for real-time streaming:
// Redis streaming setup
const redis = require('redis');
class StreamingService {
constructor(redisUrl) {
this.publisher = redis.createClient({ url: redisUrl });
this.subscriber = redis.createClient({ url: redisUrl });
}
async connect() {
await this.publisher.connect();
await this.subscriber.connect();
}
// Publish output line from runner service
async publishOutput(jobId, line, stream = 'stdout') {
const message = {
job_id: jobId,
line: line,
stream: stream,
timestamp: new Date().toISOString()
};
await this.publisher.publish(
`execution:${jobId}`,
JSON.stringify(message)
);
}
// Subscribe to job output
async subscribeToJob(jobId, callback) {
await this.subscriber.subscribe(`execution:${jobId}`, (message) => {
const data = JSON.parse(message);
callback(data);
});
}
async unsubscribeFromJob(jobId) {
await this.subscriber.unsubscribe(`execution:${jobId}`);
}
}
WebSocket Integration
Connect Redis streams to frontend via WebSocket:
// WebSocket server integration
const WebSocket = require('ws');
class WebSocketManager {
constructor(server, streamingService) {
this.wss = new WebSocket.Server({ server });
this.streamingService = streamingService;
this.connections = new Map(); // jobId -> Set of WebSocket connections
this.setupWebSocketHandlers();
}
setupWebSocketHandlers() {
this.wss.on('connection', (ws) => {
ws.on('message', async (data) => {
const message = JSON.parse(data);
switch (message.type) {
case 'subscribe':
await this.subscribeToJobOutput(ws, message.job_id);
break;
case 'unsubscribe':
await this.unsubscribeFromJobOutput(ws, message.job_id);
break;
}
});
ws.on('close', () => {
this.cleanupConnection(ws);
});
});
}
async subscribeToJobOutput(ws, jobId) {
// Add connection to job subscription
if (!this.connections.has(jobId)) {
this.connections.set(jobId, new Set());
// Subscribe to Redis stream for this job
await this.streamingService.subscribeToJob(jobId, (data) => {
// Broadcast to all subscribed WebSocket connections
const connections = this.connections.get(jobId);
if (connections) {
connections.forEach(conn => {
if (conn.readyState === WebSocket.OPEN) {
conn.send(JSON.stringify({
type: 'output',
data: data
}));
}
});
}
});
}
this.connections.get(jobId).add(ws);
}
async unsubscribeFromJobOutput(ws, jobId) {
const connections = this.connections.get(jobId);
if (connections) {
connections.delete(ws);
// If no more connections, unsubscribe from Redis
if (connections.size === 0) {
await this.streamingService.unsubscribeFromJob(jobId);
this.connections.delete(jobId);
}
}
}
cleanupConnection(ws) {
// Remove connection from all job subscriptions
for (const [jobId, connections] of this.connections.entries()) {
connections.delete(ws);
if (connections.size === 0) {
this.streamingService.unsubscribeFromJob(jobId);
this.connections.delete(jobId);
}
}
}
}
Enhanced Runner with Streaming
Modify the runner service to stream output line-by-line:
func (m *Manager) executeWithStreaming(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) {
// ... command setup ...
cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...)
// Create pipes for real-time output capture
stdout, err := cmd.StdoutPipe()
if err != nil {
return nil, fmt.Errorf("failed to create stdout pipe: %w", err)
}
stderr, err := cmd.StderrPipe()
if err != nil {
return nil, fmt.Errorf("failed to create stderr pipe: %w", err)
}
// Start the command
if err := cmd.Start(); err != nil {
return nil, fmt.Errorf("failed to start command: %w", err)
}
// Stream output in real-time
var outputBuffer strings.Builder
var wg sync.WaitGroup
wg.Add(2)
// Stream stdout
go func() {
defer wg.Done()
scanner := bufio.NewScanner(stdout)
for scanner.Scan() {
line := scanner.Text()
outputBuffer.WriteString(line + "\n")
// Publish to Redis for real-time streaming
m.publishStreamingOutput(req.JobID, line, "stdout")
}
}()
// Stream stderr
go func() {
defer wg.Done()
scanner := bufio.NewScanner(stderr)
for scanner.Scan() {
line := scanner.Text()
outputBuffer.WriteString(line + "\n")
// Publish to Redis for real-time streaming
m.publishStreamingOutput(req.JobID, line, "stderr")
}
}()
// Wait for command completion
err = cmd.Wait()
wg.Wait() // Wait for all output to be processed
result := &ExecutionResult{
JobID: req.JobID,
WorkspaceID: req.WorkspaceID,
Output: outputBuffer.String(),
ExitCode: 0,
}
if err != nil {
result.Error = err.Error()
result.ExitCode = 1
}
return result, nil
}
func (m *Manager) publishStreamingOutput(jobID, line, stream string) {
// This would integrate with your Redis client
message := StreamingOutput{
JobID: jobID,
Line: line,
Stream: stream,
Timestamp: time.Now(),
}
// Publish to Redis (implementation depends on your Redis client)
// m.redisClient.Publish(fmt.Sprintf("execution:%s", jobID), message)
}
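The StreamingOutput type isn't defined above and the Redis publish is left as a comment. A filled-in version of the stub, assuming the go-redis v9 client (github.com/redis/go-redis/v9) and an m.redisClient field on the Manager (both our assumptions; the article doesn't name its Redis library), keeping the execution:<jobID> channel format the backend subscribes to:
// StreamingOutput matches the JSON shape the Node.js StreamingService publishes.
type StreamingOutput struct {
    JobID     string    `json:"job_id"`
    Line      string    `json:"line"`
    Stream    string    `json:"stream"`
    Timestamp time.Time `json:"timestamp"`
}

func (m *Manager) publishStreamingOutput(jobID, line, stream string) {
    msg := StreamingOutput{JobID: jobID, Line: line, Stream: stream, Timestamp: time.Now()}
    payload, err := json.Marshal(msg)
    if err != nil {
        m.logger.WithError(err).Error("Failed to marshal streaming output")
        return
    }
    // m.redisClient is assumed to be a *redis.Client created at service startup.
    channel := fmt.Sprintf("execution:%s", jobID)
    if err := m.redisClient.Publish(context.Background(), channel, payload).Err(); err != nil {
        m.logger.WithError(err).Error("Failed to publish streaming output")
    }
}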
This streaming approach provides real-time feedback to users, making the code execution feel immediate and interactive rather than a black-box operation.
Backend Implementation
Node.js API Server
The backend serves as the orchestration layer between the frontend and the execution infrastructure:
const express = require('express');
const WebSocket = require('ws');
const { v4: uuidv4 } = require('uuid');
const cors = require('cors');
class CodeExecutionAPI {
constructor() {
this.app = express();
this.server = null;
this.jobProducer = null;
this.streamingService = null;
this.wsManager = null;
this.activeJobs = new Map(); // Track running jobs
this.setupMiddleware();
this.setupRoutes();
}
setupMiddleware() {
this.app.use(cors());
this.app.use(express.json({ limit: '10mb' }));
this.app.use(express.urlencoded({ extended: true }));
// Request logging
this.app.use((req, res, next) => {
console.log(`${req.method} ${req.path} - ${new Date().toISOString()}`);
next();
});
}
setupRoutes() {
// Health check
this.app.get('/health', (req, res) => {
res.json({ status: 'ok', timestamp: new Date().toISOString() });
});
// Submit code execution job
this.app.post('/api/execute', async (req, res) => {
try {
const result = await this.handleCodeExecution(req.body);
res.json(result);
} catch (error) {
console.error('Execution error:', error);
res.status(500).json({
error: 'Execution failed',
message: error.message
});
}
});
// Get job status
this.app.get('/api/jobs/:jobId', (req, res) => {
const jobId = req.params.jobId;
const job = this.activeJobs.get(jobId);
if (!job) {
return res.status(404).json({ error: 'Job not found' });
}
res.json(job);
});
// List active jobs
this.app.get('/api/jobs', (req, res) => {
const jobs = Array.from(this.activeJobs.values());
res.json({ jobs, count: jobs.length });
});
// Cancel job
this.app.delete('/api/jobs/:jobId', async (req, res) => {
const jobId = req.params.jobId;
await this.cancelJob(jobId);
res.json({ message: 'Job cancelled' });
});
}
async handleCodeExecution(requestBody) {
const {
language,
source_code,
entry_point,
workspace_id = 'default',
timeout = 30
} = requestBody;
// Validate request
if (!language || !source_code) {
throw new Error('Language and source_code are required');
}
const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java'];
if (!supportedLanguages.includes(language)) {
throw new Error(`Unsupported language: ${language}`);
}
// Generate unique job ID
const jobId = uuidv4();
// Create job record
const job = {
job_id: jobId,
workspace_id,
language,
source_code,
entry_point,
timeout,
status: 'queued',
created_at: new Date().toISOString(),
updated_at: new Date().toISOString()
};
this.activeJobs.set(jobId, job);
// Submit to RabbitMQ
await this.jobProducer.submitJob(job);
// Update status
job.status = 'submitted';
job.updated_at = new Date().toISOString();
return {
job_id: jobId,
status: 'submitted',
message: 'Job submitted for execution',
stream_url: `/stream/${jobId}`
};
}
async cancelJob(jobId) {
const job = this.activeJobs.get(jobId);
if (job && job.status === 'running') {
// Send cancellation signal (implementation depends on your setup)
job.status = 'cancelled';
job.updated_at = new Date().toISOString();
}
}
async start(port = 3001) {
// Initialize services
await this.initializeServices();
// Start HTTP server
this.server = this.app.listen(port, () => {
console.log(`Code execution API running on port ${port}`);
});
// Setup WebSocket manager
this.wsManager = new WebSocketManager(this.server, this.streamingService);
// Setup result consumer
this.setupResultConsumer();
}
async initializeServices() {
// Initialize RabbitMQ producer
this.jobProducer = new JobProducer(process.env.RABBITMQ_URL);
await this.jobProducer.connect();
// Initialize Redis streaming
this.streamingService = new StreamingService(process.env.REDIS_URL);
await this.streamingService.connect();
console.log('All services initialized successfully');
}
setupResultConsumer() {
// Consumer for job results from RabbitMQ
const amqp = require('amqplib');
amqp.connect(process.env.RABBITMQ_URL)
.then(conn => conn.createChannel())
.then(ch => {
// Declare results queue
return ch.assertQueue('results.all', { durable: true })
.then(() => {
// Bind to results exchange
return ch.bindQueue('results.all', 'code-execution.results', 'results.*');
})
.then(() => {
// Consume results
return ch.consume('results.all', (msg) => {
if (msg) {
this.handleJobResult(JSON.parse(msg.content.toString()));
ch.ack(msg);
}
});
});
})
.catch(console.error);
}
handleJobResult(result) {
const job = this.activeJobs.get(result.job_id);
if (job) {
job.status = result.exit_code === 0 ? 'completed' : 'failed';
job.result = result;
job.updated_at = new Date().toISOString();
// Optionally clean up completed jobs after some time
setTimeout(() => {
this.activeJobs.delete(result.job_id);
}, 300000); // 5 minutes
}
}
async stop() {
if (this.server) {
this.server.close();
}
if (this.jobProducer) {
await this.jobProducer.close();
}
if (this.streamingService) {
await this.streamingService.close();
}
}
}
// Environment configuration
const config = {
port: process.env.PORT || 3001,
rabbitmq_url: process.env.RABBITMQ_URL || 'amqp://admin:admin123@localhost:5672',
redis_url: process.env.REDIS_URL || 'redis://localhost:6379'
};
// Start the server
const api = new CodeExecutionAPI();
api.start(config.port);
// Graceful shutdown
process.on('SIGINT', async () => {
console.log('Shutting down gracefully...');
await api.stop();
process.exit(0);
});
Express Middleware for Code Validation
Add security and validation middleware:
const rateLimit = require('express-rate-limit');
const validator = require('validator');
// Rate limiting for code execution
const executionLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 10, // Maximum 10 executions per minute per IP
message: 'Too many execution requests, please try again later',
standardHeaders: true,
legacyHeaders: false
});
// Code validation middleware
const validateCodeRequest = (req, res, next) => {
const { language, source_code, entry_point } = req.body;
// Check required fields
if (!language || !source_code) {
return res.status(400).json({
error: 'Missing required fields',
required: ['language', 'source_code']
});
}
// Validate language
const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java'];
if (!supportedLanguages.includes(language)) {
return res.status(400).json({
error: 'Unsupported language',
supported: supportedLanguages
});
}
// Check code length (prevent abuse)
if (source_code.length > 100000) { // 100KB limit
return res.status(400).json({
error: 'Source code too large',
max_size: '100KB'
});
}
// Validate entry point if provided
if (entry_point && !validator.isAlphanumeric(entry_point.replace(/[._-]/g, ''))) {
return res.status(400).json({
error: 'Invalid entry point format'
});
}
// Basic security checks (can be expanded)
const dangerousPatterns = [
/rm\s+-rf/,
/sudo/,
/passwd/,
/\/etc\/passwd/,
/mkfs/,
/format/,
/del\s+\/[a-z]/i
];
for (const pattern of dangerousPatterns) {
if (pattern.test(source_code)) {
return res.status(400).json({
error: 'Code contains potentially dangerous operations'
});
}
}
next();
};
// Apply middleware to execution endpoint
app.post('/api/execute',
executionLimiter,
validateCodeRequest,
async (req, res) => {
// ... execution logic
}
);
Frontend Development
React Code Execution Component
Based on our CodeRunnerDemo.tsx component, here's a production-ready implementation:
import React, { useState, useEffect, useCallback, useRef } from 'react';
import { Button } from '@/components/ui/button';
import { Card, CardContent } from '@/components/ui/card';
import { Badge } from '@/components/ui/badge';
import { Tabs, TabsList, TabsTrigger, TabsContent } from '@/components/ui/tabs';
import { Play, Square, Copy, Download, Settings } from 'lucide-react';
import CodeMirror from '@uiw/react-codemirror';
import { githubDark } from '@uiw/codemirror-theme-github';
import { javascript } from '@codemirror/lang-javascript';
import { python } from '@codemirror/lang-python';
import { rust } from '@codemirror/lang-rust';
import { go } from '@codemirror/lang-go';
import { java } from '@codemirror/lang-java';
interface ExecutionResult {
job_id: string;
status: string;
output?: string;
error?: string;
exit_code?: number;
duration?: number;
}
interface OutputLine {
line: string;
stream: 'stdout' | 'stderr';
timestamp: string;
}
const LANGUAGES = {
python: { ext: python(), icon: '🐍', name: 'Python', starter: 'print("Hello, World!")' },
nodejs: { ext: javascript(), icon: '⚡', name: 'Node.js', starter: 'console.log("Hello, World!");' }, // keyed "nodejs" to match the backend's supported language ids
go: { ext: go(), icon: '🔵', name: 'Go', starter: 'package main\n\nimport "fmt"\n\nfunc main() {\n fmt.Println("Hello, World!")\n}' },
rust: { ext: rust(), icon: '🦀', name: 'Rust', starter: 'fn main() {\n println!("Hello, World!");\n}' },
java: { ext: java(), icon: '☕', name: 'Java', starter: 'public class Main {\n public static void main(String[] args) {\n System.out.println("Hello, World!");\n }\n}' }
};
export function CodeExecutor() {
const [selectedLanguage, setSelectedLanguage] = useState('python');
const [code, setCode] = useState(LANGUAGES.python.starter);
const [isExecuting, setIsExecuting] = useState(false);
const [output, setOutput] = useState<OutputLine[]>([]);
const [executionResult, setExecutionResult] = useState<ExecutionResult | null>(null);
const [currentJobId, setCurrentJobId] = useState<string | null>(null);
const wsRef = useRef<WebSocket | null>(null);
const outputRef = useRef<HTMLDivElement>(null);
// WebSocket connection for real-time output
const connectWebSocket = useCallback((jobId: string) => {
const wsUrl = `ws://localhost:3001/stream/${jobId}`;
wsRef.current = new WebSocket(wsUrl);
wsRef.current.onopen = () => {
console.log('WebSocket connected for job:', jobId);
wsRef.current?.send(JSON.stringify({
type: 'subscribe',
job_id: jobId
}));
};
wsRef.current.onmessage = (event) => {
const message = JSON.parse(event.data);
if (message.type === 'output') {
const outputLine: OutputLine = {
line: message.data.line,
stream: message.data.stream,
timestamp: message.data.timestamp
};
setOutput(prev => [...prev, outputLine]);
// Auto-scroll to bottom
setTimeout(() => {
if (outputRef.current) {
outputRef.current.scrollTop = outputRef.current.scrollHeight;
}
}, 10);
}
};
wsRef.current.onclose = () => {
console.log('WebSocket disconnected');
};
wsRef.current.onerror = (error) => {
console.error('WebSocket error:', error);
};
}, []);
// Execute code
const executeCode = useCallback(async () => {
if (isExecuting) return;
setIsExecuting(true);
setOutput([]);
setExecutionResult(null);
try {
const response = await fetch('http://localhost:3001/api/execute', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
language: selectedLanguage,
source_code: code,
workspace_id: 'web-editor'
})
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const result = await response.json();
setCurrentJobId(result.job_id);
// Connect WebSocket for streaming output
connectWebSocket(result.job_id);
// Poll for final result
pollJobResult(result.job_id);
} catch (error) {
console.error('Execution failed:', error);
setOutput([{
line: `Error: ${error.message}`,
stream: 'stderr',
timestamp: new Date().toISOString()
}]);
setIsExecuting(false);
}
}, [code, selectedLanguage, isExecuting, connectWebSocket]);
// Poll for job completion
const pollJobResult = useCallback(async (jobId: string) => {
const pollInterval = 1000; // 1 second
const maxPolls = 60; // 60 seconds timeout
let polls = 0;
const poll = async () => {
try {
const response = await fetch(`http://localhost:3001/api/jobs/${jobId}`);
const job = await response.json();
if (job.status === 'completed' || job.status === 'failed') {
setExecutionResult(job.result);
setIsExecuting(false);
// Close WebSocket
if (wsRef.current) {
wsRef.current.close();
}
return;
}
polls++;
if (polls < maxPolls) {
setTimeout(poll, pollInterval);
} else {
// Timeout
setIsExecuting(false);
setOutput(prev => [...prev, {
line: 'Execution timeout - job may still be running',
stream: 'stderr',
timestamp: new Date().toISOString()
}]);
}
} catch (error) {
console.error('Polling error:', error);
setIsExecuting(false);
}
};
setTimeout(poll, pollInterval);
}, []);
// Stop execution
const stopExecution = useCallback(async () => {
if (currentJobId) {
try {
await fetch(`http://localhost:3001/api/jobs/${currentJobId}`, {
method: 'DELETE'
});
} catch (error) {
console.error('Failed to cancel job:', error);
}
}
if (wsRef.current) {
wsRef.current.close();
}
setIsExecuting(false);
setCurrentJobId(null);
}, [currentJobId]);
// Copy code to clipboard
const copyCode = useCallback(() => {
navigator.clipboard.writeText(code);
}, [code]);
// Download output
const downloadOutput = useCallback(() => {
const outputText = output.map(line =>
`[${line.timestamp}] ${line.stream}: ${line.line}`
).join('\n');
const blob = new Blob([outputText], { type: 'text/plain' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `execution-output-${Date.now()}.txt`;
a.click();
URL.revokeObjectURL(url);
}, [output]);
// Language change handler
useEffect(() => {
setCode(LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].starter);
}, [selectedLanguage]);
// Cleanup WebSocket on unmount
useEffect(() => {
return () => {
if (wsRef.current) {
wsRef.current.close();
}
};
}, []);
return (
<div className="max-w-6xl mx-auto p-4 space-y-4">
{/* Header */}
<div className="flex items-center justify-between">
<h1 className="text-2xl font-bold">Online Code Compiler</h1>
<Badge variant="outline" className="text-sm">
Real-time execution with Docker containers
</Badge>
</div>
{/* Language Selection */}
<Card>
<CardContent className="p-4">
<Tabs value={selectedLanguage} onValueChange={setSelectedLanguage}>
<TabsList className="grid w-full grid-cols-5">
{Object.entries(LANGUAGES).map(([lang, config]) => (
<TabsTrigger key={lang} value={lang} className="flex items-center gap-2">
<span>{config.icon}</span>
<span className="hidden sm:inline">{config.name}</span>
</TabsTrigger>
))}
</TabsList>
</Tabs>
</CardContent>
</Card>
{/* Code Editor */}
<Card>
<CardContent className="p-0">
<div className="border-b p-4 flex items-center justify-between">
<h3 className="font-semibold">Code Editor</h3>
<div className="flex items-center gap-2">
<Button
variant="outline"
size="sm"
onClick={copyCode}
>
<Copy className="h-4 w-4 mr-2" />
Copy
</Button>
<Button
onClick={isExecuting ? stopExecution : executeCode}
disabled={!code.trim()}
variant={isExecuting ? "destructive" : "default"}
>
{isExecuting ? (
<>
<Square className="h-4 w-4 mr-2" />
Stop
</>
) : (
<>
<Play className="h-4 w-4 mr-2" />
Run {LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].name}
</>
)}
</Button>
</div>
</div>
<CodeMirror
value={code}
onChange={(value) => setCode(value)}
theme={githubDark}
extensions={[LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].ext]}
className="text-sm"
basicSetup={{
lineNumbers: true,
foldGutter: true,
dropCursor: false,
allowMultipleSelections: false
}}
/>
</CardContent>
</Card>
{/* Output Panel */}
<Card>
<CardContent className="p-0">
<div className="border-b p-4 flex items-center justify-between">
<div className="flex items-center gap-2">
<h3 className="font-semibold">Output</h3>
{isExecuting && (
<Badge variant="secondary" className="animate-pulse">
Executing...
</Badge>
)}
{executionResult && (
<Badge
variant={executionResult.exit_code === 0 ? "default" : "destructive"}
>
Exit code: {executionResult.exit_code}
</Badge>
)}
</div>
<div className="flex items-center gap-2">
{output.length > 0 && (
<Button
variant="outline"
size="sm"
onClick={downloadOutput}
>
<Download className="h-4 w-4 mr-2" />
Download
</Button>
)}
<Button
variant="outline"
size="sm"
onClick={() => setOutput([])}
>
Clear
</Button>
</div>
</div>
<div
ref={outputRef}
className="h-96 overflow-y-auto p-4 bg-gray-950 text-gray-100 font-mono text-sm"
>
{output.length === 0 ? (
<div className="text-gray-500 italic">
Click "Run" to execute your code. Output will appear here in real-time.
</div>
) : (
output.map((line, index) => (
<div
key={index}
className={`whitespace-pre-wrap ${
line.stream === 'stderr' ? 'text-red-400' : 'text-gray-100'
}`}
>
{line.line}
</div>
))
)}
</div>
</CardContent>
</Card>
{/* Execution Statistics */}
{executionResult && (
<Card>
<CardContent className="p-4">
<h3 className="font-semibold mb-2">Execution Statistics</h3>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm">
<div>
<span className="text-gray-500">Duration:</span>
<div className="font-mono">{executionResult.duration || 0}ms</div>
</div>
<div>
<span className="text-gray-500">Exit Code:</span>
<div className="font-mono">{executionResult.exit_code}</div>
</div>
<div>
<span className="text-gray-500">Language:</span>
<div className="font-mono">{selectedLanguage}</div>
</div>
<div>
<span className="text-gray-500">Job ID:</span>
<div className="font-mono text-xs">{executionResult.job_id}</div>
</div>
</div>
</CardContent>
</Card>
)}
</div>
);
}
Security & Isolation
The Security Challenge
Running arbitrary user code presents significant security risks:
- Host System Access: Malicious code could access the host filesystem
- Network Attacks: Code could scan internal networks or launch attacks
- Resource Exhaustion: Infinite loops or memory bombs could crash servers
- Data Exfiltration: Code could attempt to steal sensitive information
- Privilege Escalation: Attempts to gain root or admin access
Docker Security Model
Docker provides multiple layers of security isolation:
# Docker container security configuration
version: '3.8'
services:
runner-python:
image: python:3.11-slim
security_opt:
- no-new-privileges:true
- seccomp:unconfined # May need custom seccomp profile
cap_drop:
- ALL
cap_add:
- SETGID
- SETUID
read_only: true
tmpfs:
- /tmp:exec,size=100m
- /var/tmp:exec,size=100m
ulimits:
nproc: 64 # Limit number of processes
nofile: 1024 # Limit open files
memlock: 67108864 # Limit locked memory
mem_limit: 256m # Memory limit (service-level key, not a ulimit)
cpus: '0.5' # CPU limit
pids_limit: 100 # Process limit
networks:
- isolated_network
networks:
isolated_network:
driver: bridge
internal: true # No external network access
Advanced Container Security
Implement additional security measures in the runner service:
func (m *Manager) createSecureContainer(language, jobID string) error {
containerConfig := &container.Config{
Image: m.config.Environments[language],
Cmd: []string{"tail", "-f", "/dev/null"},
Env: []string{
"HOME=/tmp",
"USER=runner",
"SHELL=/bin/bash",
},
WorkingDir: "/tmp/workspace",
User: "1000:1000", // Non-root user
// Resource limits and security options live on the HostConfig below;
// container.Config only carries image, command, env, user, and network flags
// Network isolation
NetworkDisabled: true,
}
hostConfig := &container.HostConfig{
// Resource constraints
Resources: container.Resources{
Memory: 256 * 1024 * 1024,
MemorySwap: 256 * 1024 * 1024,
CPUShares: 512,
PidsLimit: 50,
Ulimits: []*units.Ulimit{
{Name: "nproc", Soft: 32, Hard: 32},
{Name: "nofile", Soft: 256, Hard: 256},
},
},
// Security
ReadonlyRootfs: true,
CapDrop: []string{"ALL"},
CapAdd: []string{"SETGID", "SETUID"},
SecurityOpt: []string{"no-new-privileges:true"},
// Temporary filesystems
Tmpfs: map[string]string{
"/tmp": "exec,size=100m",
"/var/tmp": "exec,size=100m",
},
// No privileged access
Privileged: false,
// Network isolation
NetworkMode: "none",
}
// Create container
resp, err := m.dockerClient.ContainerCreate(
context.Background(),
containerConfig,
hostConfig,
nil, // networking config
nil, // platform
fmt.Sprintf("runner-%s-%s", language, jobID),
)
if err != nil {
return fmt.Errorf("failed to create container: %w", err)
}
// Start container
if err := m.dockerClient.ContainerStart(
context.Background(),
resp.ID,
types.ContainerStartOptions{},
); err != nil {
return fmt.Errorf("failed to start container: %w", err)
}
return nil
}
Code Sanitization
Implement static analysis for dangerous patterns:
class CodeSanitizer {
constructor() {
this.dangerousPatterns = {
filesystem: [
/\bopen\s*\(\s*['"][\/\\]/, // File system access
/\bfile\s*\(\s*['"][\/\\]/, // File operations
/\bos\.system/, // OS commands
/\bsubprocess/, // Process execution
/\beval\s*\(/, // Code evaluation
/\bexec\s*\(/, // Code execution
],
network: [
/\bsocket\s*\(/, // Network sockets
/\burllib/, // URL operations
/\brequests\./, // HTTP requests
/\bhttplib/, // HTTP library
/\bfetch\s*\(/, // Fetch API
],
system: [
/\bos\.getenv/, // Environment variables
/\bprocess\.env/, // Node.js environment
/\b__import__/, // Dynamic imports
/\brequire\s*\(\s*['"]child_process['"]/, // Child process
]
};
}
analyze(code, language) {
const issues = [];
for (const [category, patterns] of Object.entries(this.dangerousPatterns)) {
for (const pattern of patterns) {
const matches = code.match(pattern);
if (matches) {
issues.push({
category,
pattern: pattern.toString(),
match: matches[0],
severity: this.getSeverity(category)
});
}
}
}
return {
safe: issues.length === 0,
issues,
score: this.calculateSafetyScore(issues)
};
}
getSeverity(category) {
const severityMap = {
filesystem: 'high',
network: 'medium',
system: 'high'
};
return severityMap[category] || 'low';
}
calculateSafetyScore(issues) {
const weights = { high: 10, medium: 5, low: 1 };
const totalWeight = issues.reduce((sum, issue) =>
sum + weights[issue.severity], 0);
return Math.max(0, 100 - totalWeight);
}
}
// Usage in API
app.post('/api/execute', validateCodeRequest, async (req, res) => {
const sanitizer = new CodeSanitizer();
const analysis = sanitizer.analyze(req.body.source_code, req.body.language);
if (!analysis.safe && analysis.score < 50) {
return res.status(400).json({
error: 'Code contains potentially dangerous operations',
issues: analysis.issues,
safety_score: analysis.score
});
}
// Proceed with execution...
});
Network Isolation
Implement network restrictions at multiple levels:
# Docker network setup with restrictions
docker network create --driver bridge \
--subnet=172.20.0.0/16 \
--opt com.docker.network.bridge.enable_icc=false \
--opt com.docker.network.bridge.enable_ip_masquerade=false \
isolated-execution
# Firewall rules for container network
iptables -I DOCKER-USER -s 172.20.0.0/16 -j DROP
iptables -I DOCKER-USER -s 172.20.0.0/16 -d 172.20.0.0/16 -j ACCEPT
Performance Optimization
Container Lifecycle Management
Optimize container startup and cleanup:
type ContainerPool struct {
pools map[string]*LanguagePool
mutex sync.RWMutex
logger *logrus.Logger
}
type LanguagePool struct {
language string
containers []*ContainerInstance
available chan *ContainerInstance
maxSize int
currentSize int
mutex sync.Mutex
}
type ContainerInstance struct {
ID string
Language string
CreatedAt time.Time
LastUsed time.Time
InUse bool
}
func NewContainerPool(maxSize int, logger *logrus.Logger) *ContainerPool {
return &ContainerPool{
pools: make(map[string]*LanguagePool),
logger: logger,
}
}
func (cp *ContainerPool) GetContainer(language string) (*ContainerInstance, error) {
cp.mutex.RLock()
pool, exists := cp.pools[language]
cp.mutex.RUnlock()
if !exists {
cp.mutex.Lock()
// Re-check under the write lock: another goroutine may have created the pool
// between releasing the read lock and acquiring the write lock
if pool, exists = cp.pools[language]; !exists {
pool = &LanguagePool{
language: language,
available: make(chan *ContainerInstance, 10),
maxSize: 10,
}
cp.pools[language] = pool
}
cp.mutex.Unlock()
}
// Try to get from available pool
select {
case container := <-pool.available:
container.InUse = true
container.LastUsed = time.Now()
return container, nil
default:
// Create new container if under limit
return cp.createNewContainer(pool)
}
}
func (cp *ContainerPool) ReturnContainer(container *ContainerInstance) {
cp.mutex.RLock()
pool := cp.pools[container.Language]
cp.mutex.RUnlock()
container.InUse = false
container.LastUsed = time.Now()
// Clean container workspace
cp.cleanContainerWorkspace(container)
// Return to pool
select {
case pool.available <- container:
// Successfully returned to pool
default:
// Pool is full, destroy container
cp.destroyContainer(container)
}
}
func (cp *ContainerPool) cleanContainerWorkspace(container *ContainerInstance) {
// Execute cleanup commands in container
cleanupCmd := []string{
"docker", "exec", container.ID,
"bash", "-c", "rm -rf /tmp/workspace/* 2>/dev/null || true"
}
exec.Command(cleanupCmd[0], cleanupCmd[1:]...).Run()
}
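Wiring the pool into the execution path is then a matter of borrowing and returning a container around each job. A usage sketch (m.pool and executeInContainer are assumed names, not from the article):
func (m *Manager) runWithPool(ctx context.Context, req ExecutionRequest) (*ExecutionResult, error) {
    // Borrow a warm container for the job's language (or create one on demand).
    inst, err := m.pool.GetContainer(req.Language)
    if err != nil {
        return nil, fmt.Errorf("no container available for %s: %w", req.Language, err)
    }
    // Always hand the container back so its workspace is wiped and it can be reused.
    defer m.pool.ReturnContainer(inst)

    return m.executeInContainer(ctx, inst, req)
}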
Memory Management
Implement intelligent memory management:
type MemoryManager struct {
totalMemory uint64
usedMemory uint64
containerMem map[string]uint64
mutex sync.RWMutex
logger *logrus.Logger
}
func (mm *MemoryManager) AllocateMemory(containerID string, requested uint64) error {
mm.mutex.Lock()
defer mm.mutex.Unlock()
// Check if allocation would exceed limits
if mm.usedMemory + requested > mm.totalMemory * 80 / 100 { // 80% threshold
return fmt.Errorf("insufficient memory: %d MB requested, %d MB available",
requested/1024/1024, (mm.totalMemory-mm.usedMemory)/1024/1024)
}
mm.usedMemory += requested
mm.containerMem[containerID] = requested
mm.logger.Infof("Allocated %d MB to container %s", requested/1024/1024, containerID)
return nil
}
func (mm *MemoryManager) ReleaseMemory(containerID string) {
mm.mutex.Lock()
defer mm.mutex.Unlock()
if allocated, exists := mm.containerMem[containerID]; exists {
mm.usedMemory -= allocated
delete(mm.containerMem, containerID)
mm.logger.Infof("Released %d MB from container %s", allocated/1024/1024, containerID)
}
}
func (mm *MemoryManager) GetMemoryStats() map[string]interface{} {
mm.mutex.RLock()
defer mm.mutex.RUnlock()
return map[string]interface{}{
"total_mb": mm.totalMemory / 1024 / 1024,
"used_mb": mm.usedMemory / 1024 / 1024,
"available_mb": (mm.totalMemory - mm.usedMemory) / 1024 / 1024,
"utilization": float64(mm.usedMemory) / float64(mm.totalMemory) * 100,
"active_containers": len(mm.containerMem),
}
}
Load Balancing
Implement intelligent load balancing:
type LoadBalancer struct {
workers []*WorkerNode
roundRobin int
mutex sync.Mutex
healthChecker *HealthChecker
logger *logrus.Logger
}
type WorkerNode struct {
ID string
Address string
CPU float64
Memory float64
ActiveJobs int
MaxJobs int
LastSeen time.Time
Healthy bool
}
func (lb *LoadBalancer) SelectWorker(job *ExecutionJob) (*WorkerNode, error) {
lb.mutex.Lock()
defer lb.mutex.Unlock()
healthyWorkers := lb.getHealthyWorkers()
if len(healthyWorkers) == 0 {
return nil, fmt.Errorf("no healthy workers available")
}
// Sort by load (CPU + Memory + Active Jobs)
sort.Slice(healthyWorkers, func(i, j int) bool {
loadI := lb.calculateLoad(healthyWorkers[i])
loadJ := lb.calculateLoad(healthyWorkers[j])
return loadI < loadJ
})
// Select least loaded worker
selected := healthyWorkers[0]
selected.ActiveJobs++
lb.logger.Infof("Selected worker %s (load: %.2f)", selected.ID, lb.calculateLoad(selected))
return selected, nil
}
func (lb *LoadBalancer) calculateLoad(worker *WorkerNode) float64 {
// Weighted load calculation
cpuWeight := 0.3
memoryWeight := 0.3
jobWeight := 0.4
cpuLoad := worker.CPU / 100.0
memoryLoad := worker.Memory / 100.0
jobLoad := float64(worker.ActiveJobs) / float64(worker.MaxJobs)
return cpuWeight*cpuLoad + memoryWeight*memoryLoad + jobWeight*jobLoad
}
Production Deployment
Docker Compose Production Setup
version: '3.8'
services:
# RabbitMQ cluster
rabbitmq:
image: rabbitmq:3.12-management
hostname: rabbitmq-main
environment:
RABBITMQ_ERLANG_COOKIE: ${RABBITMQ_COOKIE}
RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER}
RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASS}
RABBITMQ_DEFAULT_VHOST: /
volumes:
- rabbitmq_data:/var/lib/rabbitmq
- ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
networks:
- backend
deploy:
replicas: 1
resources:
limits:
memory: 1G
cpus: '0.5'
# Redis cluster
redis:
image: redis:7-alpine
command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru
volumes:
- redis_data:/data
networks:
- backend
deploy:
replicas: 1
resources:
limits:
memory: 512M
cpus: '0.25'
# API Backend
api-backend:
build:
context: ./backend
dockerfile: Dockerfile.production
environment:
NODE_ENV: production
RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/
REDIS_URL: redis://redis:6379
LOG_LEVEL: info
RATE_LIMIT_WINDOW: 60000
RATE_LIMIT_MAX: 10
depends_on:
- rabbitmq
- redis
networks:
- backend
- frontend
deploy:
replicas: 2
resources:
limits:
memory: 512M
cpus: '0.5'
update_config:
order: start-first
failure_action: rollback
# Runner Service
runner-service:
build:
context: ./runner
dockerfile: Dockerfile.production
environment:
RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/
REDIS_URL: redis://redis:6379
LOG_LEVEL: info
MAX_CONCURRENT_JOBS: 5
WORKSPACE_DIR: /tmp/workspaces
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- runner_workspaces:/tmp/workspaces
depends_on:
- rabbitmq
- redis
networks:
- backend
- execution
deploy:
replicas: 3
resources:
limits:
memory: 2G
cpus: '1.0'
placement:
constraints:
- node.role == worker
# Frontend
frontend:
build:
context: ./frontend
dockerfile: Dockerfile.production
environment:
REACT_APP_API_URL: http://api-backend:3001
REACT_APP_WS_URL: ws://api-backend:3001
depends_on:
- api-backend
networks:
- frontend
deploy:
replicas: 2
resources:
limits:
memory: 256M
cpus: '0.25'
# Load Balancer
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
- ./ssl:/etc/ssl/certs
depends_on:
- frontend
- api-backend
networks:
- frontend
deploy:
replicas: 1
resources:
limits:
memory: 128M
cpus: '0.1'
volumes:
rabbitmq_data:
redis_data:
runner_workspaces:
networks:
frontend:
driver: overlay
backend:
driver: overlay
execution:
driver: overlay
internal: true
Kubernetes Deployment
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: code-compiler
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: code-compiler
data:
RABBITMQ_URL: "amqp://admin:password@rabbitmq:5672/"
REDIS_URL: "redis://redis:6379"
LOG_LEVEL: "info"
MAX_CONCURRENT_JOBS: "5"
---
# runner-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: runner-service
namespace: code-compiler
spec:
replicas: 3
selector:
matchLabels:
app: runner-service
template:
metadata:
labels:
app: runner-service
spec:
containers:
- name: runner
image: your-registry/runner-service:latest
envFrom:
- configMapRef:
name: app-config
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
volumeMounts:
- name: docker-sock
mountPath: /var/run/docker.sock
- name: workspaces
mountPath: /tmp/workspaces
securityContext:
runAsNonRoot: true
runAsUser: 1000
volumes:
- name: docker-sock
hostPath:
path: /var/run/docker.sock
- name: workspaces
emptyDir:
sizeLimit: 10Gi
---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: runner-service-hpa
namespace: code-compiler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: runner-service
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Monitoring and Observability
# monitoring-stack.yaml
version: '3.8'
services:
# Prometheus
prometheus:
image: prom/prometheus:latest
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--storage.tsdb.retention.time=30d'
- '--web.enable-lifecycle'
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
networks:
- monitoring
# Grafana
grafana:
image: grafana/grafana:latest
environment:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
volumes:
- grafana_data:/var/lib/grafana
- ./grafana/dashboards:/etc/grafana/provisioning/dashboards
- ./grafana/datasources:/etc/grafana/provisioning/datasources
networks:
- monitoring
# ELK Stack for Logs
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
environment:
discovery.type: single-node
xpack.security.enabled: false
volumes:
- elasticsearch_data:/usr/share/elasticsearch/data
networks:
- logging
logstash:
image: docker.elastic.co/logstash/logstash:8.11.0
volumes:
- ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
depends_on:
- elasticsearch
networks:
- logging
kibana:
image: docker.elastic.co/kibana/kibana:8.11.0
environment:
ELASTICSEARCH_HOSTS: http://elasticsearch:9200
depends_on:
- elasticsearch
networks:
- logging
volumes:
prometheus_data:
grafana_data:
elasticsearch_data:
networks:
monitoring:
logging:
Key Learnings
1. Container Management is Complex
Key Challenges:
- Cold Start Problem: Container creation takes 2-5 seconds
- Resource Leaks: Containers not properly cleaned up
- State Management: Persistent vs ephemeral container strategies
- Network Isolation: Balancing security with functionality
Solutions Implemented:
- Container pooling with pre-warmed instances
- Automatic cleanup with garbage collection
- Persistent containers with workspace isolation
- Network-isolated execution environments
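The "automatic cleanup with garbage collection" point can be as small as a background goroutine that reaps containers nobody has used for a while. A sketch built on the ContainerPool types from the performance section (destroyContainer is the same helper assumed there):
// reapIdleContainers periodically destroys pooled containers that have been
// idle longer than maxIdle, freeing resources between bursts of traffic.
func (cp *ContainerPool) reapIdleContainers(maxIdle, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for range ticker.C {
        cp.mutex.RLock()
        pools := make([]*LanguagePool, 0, len(cp.pools))
        for _, p := range cp.pools {
            pools = append(pools, p)
        }
        cp.mutex.RUnlock()

        for _, pool := range pools {
            var keep []*ContainerInstance
        drain:
            for {
                select {
                case inst := <-pool.available:
                    if time.Since(inst.LastUsed) > maxIdle {
                        cp.destroyContainer(inst) // assumed helper, same as in ReturnContainer
                    } else {
                        keep = append(keep, inst)
                    }
                default:
                    break drain
                }
            }
            // Put the still-fresh instances back into the pool.
            for _, inst := range keep {
                pool.available <- inst
            }
        }
    }
}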
2. Real-time Streaming Requires Careful Architecture
Technical Insights:
- WebSocket Management: Connection pooling and cleanup crucial
- Message Ordering: Ensure output lines arrive in sequence
- Buffer Management: Handle high-frequency output efficiently
- Connection Recovery: Graceful handling of network issues
Best Practices:
- Use Redis pub/sub for scalable streaming
- Implement connection heartbeats
- Buffer and batch small messages
- Provide fallback to polling for unreliable connections
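Buffering and batching on the runner side can be as simple as flushing accumulated lines when a batch fills up or a short timer fires, so a program printing thousands of lines per second doesn't become thousands of Redis publishes. A sketch, not tied to the article's exact types:
// lineBatcher coalesces output lines and hands each batch to publish, either
// when maxLines accumulate or when maxDelay elapses, whichever comes first.
type lineBatcher struct {
    lines    chan string
    maxLines int
    maxDelay time.Duration
    publish  func(batch []string)
}

func (b *lineBatcher) run() {
    var batch []string
    timer := time.NewTimer(b.maxDelay)
    defer timer.Stop()

    flush := func() {
        if len(batch) > 0 {
            b.publish(batch)
            batch = nil
        }
        timer.Reset(b.maxDelay)
    }

    for {
        select {
        case line, ok := <-b.lines:
            if !ok {
                flush() // channel closed: flush what's left and stop
                return
            }
            batch = append(batch, line)
            if len(batch) >= b.maxLines {
                flush()
            }
        case <-timer.C:
            flush()
        }
    }
}
Scanner goroutines send lines into b.lines; publish would JSON-encode each batch onto the job's execution:<jobID> channel.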
3. Security Cannot Be an Afterthought
Critical Security Measures:
- Defense in Depth: Multiple security layers
- Principle of Least Privilege: Minimal container permissions
- Resource Limits: Prevent resource exhaustion attacks
- Code Analysis: Static analysis before execution
Security Architecture:
┌─────────────────┐
│   Code Input    │
├─────────────────┤
│ Static Analysis │ ← First line of defense
├─────────────────┤
│  Rate Limiting  │ ← Prevent abuse
├─────────────────┤
│ Docker Sandbox  │ ← Isolation layer
├─────────────────┤
│ Resource Limits │ ← Resource protection
├─────────────────┤
│ Network Filter  │ ← Network restrictions
└─────────────────┘
4. Performance Optimization is Multi-Faceted
Optimization Areas:
- Container Lifecycle: Pool management and reuse
- Resource Allocation: Dynamic scaling based on load
- Queue Management: Fair distribution and priority handling
- Caching: Language environment and dependency caching
Performance Metrics to Track:
- Container startup time
- Execution latency
- Queue depth
- Resource utilization
- Success/failure rates
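Most of these are cheap to expose from the runner with the standard Prometheus Go client; the metric names below are ours, not an established convention:
import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Container startup time, to catch cold-start regressions.
    containerStartupSeconds = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "runner_container_startup_seconds",
        Help:    "Time to start an execution container",
        Buckets: prometheus.DefBuckets,
    })

    // End-to-end execution latency, labelled per language.
    executionDurationSeconds = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name: "runner_execution_duration_seconds",
        Help: "Wall-clock time of user code execution",
    }, []string{"language"})

    // Success/failure counts, labelled by outcome.
    executionsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "runner_executions_total",
        Help: "Number of executions processed",
    }, []string{"language", "status"})

    // Job queue depth as observed by the consumer.
    jobQueueDepth = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "runner_job_queue_depth",
        Help: "Messages waiting in the job queue",
    })
)
Record with executionsTotal.WithLabelValues("python", "success").Inc() after each job and expose everything over HTTP via promhttp.Handler().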
5. Production Reliability Requires Operational Excellence
Observability Stack:
- Metrics: Prometheus + Grafana for system health
- Logging: ELK stack for centralized log analysis
- Tracing: Distributed tracing for request flows
- Alerting: PagerDuty integration for critical issues
Deployment Strategies:
- Blue-green deployments for zero downtime
- Canary releases for gradual rollouts
- Circuit breakers for fault tolerance
- Auto-scaling based on queue depth and CPU usage
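Auto-scaling on queue depth needs something that actually reads the depth. The RabbitMQ management API exposes it per queue; here is a sketch (host, port, and credentials are assumptions matching the earlier examples):
import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
)

// queueDepth asks the RabbitMQ management API (management plugin, port 15672)
// how many messages are waiting in a queue; an autoscaler can compare that
// against a per-replica target to decide how many runner instances to keep.
func queueDepth(host, user, pass, vhost, queue string) (int, error) {
    endpoint := fmt.Sprintf("http://%s:15672/api/queues/%s/%s",
        host, url.PathEscape(vhost), url.PathEscape(queue))
    req, err := http.NewRequest(http.MethodGet, endpoint, nil)
    if err != nil {
        return 0, err
    }
    req.SetBasicAuth(user, pass)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return 0, err
    }
    defer resp.Body.Close()

    var body struct {
        Messages int `json:"messages"` // ready + unacknowledged
    }
    if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
        return 0, err
    }
    return body.Messages, nil
}
On Kubernetes, KEDA's RabbitMQ scaler does essentially this for you, so you rarely need to hand-roll it.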
6. Language-Specific Considerations
Each programming language has unique requirements:
Python:
- Dependency management with pip
- Virtual environment isolation
- Import path security
- Package installation caching
Node.js:
- npm/yarn dependency resolution
- Module loading restrictions
- Event loop management
- Memory garbage collection
Go:
- Module system (go.mod)
- Build caching for faster compilation
- Static binary advantages
- Goroutine resource management
Rust:
- Cargo package management
- Compilation time optimization
- Memory safety guarantees
- Target architecture handling
Java:
- Classpath management
- JVM startup optimization
- Garbage collection tuning
- Security manager configuration
Conclusion
Building a production-ready online code compiler is a journey that touches every aspect of modern distributed systems engineering. From container orchestration to real-time streaming, from security isolation to performance optimization, each component requires careful consideration and robust implementation.
The key to success lies in:
- Robust Architecture: Design for failure and scale from day one
- Security First: Implement security at every layer
- Performance Focus: Optimize for user experience and resource efficiency
- Operational Excellence: Monitor, measure, and continuously improve
- Incremental Development: Start simple and add complexity gradually
The result should be a platform that feels immediate and reliable, allowing developers to focus on code rather than infrastructure. When users can execute code with the same confidence they have in their local development environment, you've achieved the goal of a truly powerful online code compiler.
Learning Resources
Essential Reading
Distributed Systems:
- "Designing Data-Intensive Applications" by Martin Kleppmann - Comprehensive guide to distributed system patterns
- "Building Microservices" by Sam Newman - Microservice architecture and communication patterns
- "Site Reliability Engineering" by Google - Production system reliability practices
Container Technologies:
- "Docker Deep Dive" by Nigel Poulton - Comprehensive Docker guide
- "Kubernetes in Action" by Marko Lukša - Kubernetes orchestration patterns
- "Container Security" by Liz Rice - Security best practices for containers
Real-time Systems:
- "High Performance Browser Networking" by Ilya Grigorik - WebSocket and real-time communication
- "Redis in Action" by Josiah Carlson - Redis patterns for real-time applications
Open Source Projects
Code Execution Platforms:
- Judge0 - Online code execution system
- HackerEarth API - Commercial code execution platform
- Glot.io - Simple code execution service
Container Management:
- Docker - Container runtime
- Podman - Alternative container runtime
- gVisor - Application kernel for containers
Message Queue Solutions:
- RabbitMQ - Feature-rich message broker
- Apache Kafka - High-throughput distributed streaming
- Redis - In-memory data structure store
Tools and Development Environment
Development Tools:
- Docker Desktop - Local container development
- Kubernetes KIND - Local Kubernetes development
- Minikube - Local Kubernetes cluster
Monitoring and Observability:
- Prometheus - Metrics collection and alerting
- Grafana - Metrics visualization and dashboards
- ELK Stack - Centralized logging and analysis
Testing Frameworks:
- Testcontainers - Integration testing with containers
- k6 - Load testing for APIs and WebSockets
- Artillery - Performance testing toolkit
With love from the Toki Space team
This tutorial represents our collective experience building Toki's code execution platform. The architecture and lessons shared here will help you build your own robust online code compiler. For questions or contributions, reach out to our engineering team at [email protected]