By the Toki Space Team
Creating a production-grade online code compiler is one of the most complex and demanding projects in distributed systems. It calls for expertise in container orchestration, job queuing, real-time data flow, security sandboxing, and precise resource control. Unlike basic code execution tools, a full-fledged compiler platform must support multiple programming languages, manage parallel executions, stream outputs live, and recover gracefully from errors.
In this tutorial, you’ll learn how to build a fully functional online compiler using Docker for containerization, RabbitMQ for managing execution jobs, Redis for real-time communication, and React for the user interface. We’ll walk through the architecture, implementation, and deployment process—drawing from real-world experience building the code execution system behind Toki Space.
Please forgive me if there are errors in the code samples; this is mainly meant to give you an idea of how it works under the hood. Let's dive in.
Table of Contents
- Architecture Overview
- Docker Container System
- Message Queue Integration
- Real-time Streaming
- Backend Implementation
- Frontend Development
- Security & Isolation
- Performance Optimization
- Production Deployment
- Key Learnings
Architecture Overview
System Components
Our online code compiler consists of five main components working together:
┌─────────────────┐     WebSocket      ┌─────────────────┐
│      React      │ ◄────────────────► │     Node.js     │
│     Frontend    │     HTTP/REST      │     Backend     │
└─────────────────┘ ◄────────────────► └─────────────────┘
                                           ▲         │
                                 Streaming │         │ Jobs
                                           │         ▼
                    ┌─────────────────┐    │    ┌─────────────────┐
                    │      Redis      │────┘    │     RabbitMQ    │
                    │   (Streaming)   │         │   (Job Queue)   │
                    └─────────────────┘         └─────────────────┘
                             ▲                           │
                             │ Results                   │ Jobs
                             │                           ▼
                             │                  ┌─────────────────┐
                             └──────────────────│      Runner     │
                                                │     Service     │
                                                │       (Go)      │
                                                └─────────────────┘
                                                         │
                                                         │ Docker API
                                                         ▼
                                                ┌─────────────────┐
                                                │      Docker     │
                                                │   Containers    │
                                                │   (Multi-lang)  │
                                                └─────────────────┘
Core Technologies
- Frontend: React with TypeScript for the user interface
- Backend: Node.js with Express for API and WebSocket handling
- Runner Service: Go service for container management and code execution
- Message Queue: RabbitMQ for reliable job distribution
- Streaming: Redis for real-time output streaming
- Containers: Docker for secure code execution isolation
Design Principles
- Language Agnostic: Support for Python, Node.js, Go, Rust, Java, and more
- Secure Isolation: Each execution runs in a separate Docker container
- Real-time Feedback: Stream output as code executes
- Scalable Architecture: Horizontal scaling through message queues
- Fault Tolerance: Graceful handling of failures and timeouts
Docker Container System
The Challenge of Multi-Language Execution
Running user code safely requires solving several complex problems:
- Security Isolation: Prevent malicious code from accessing the host system
- Resource Limits: Control CPU, memory, and execution time
- Environment Setup: Provide language-specific tools and dependencies
- Cleanup: Remove containers and workspaces after execution
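The most direct way to tick these boxes is to run every submission in a fresh, locked-down, throwaway container. A rough sketch of that naive approach (ours, not from the Toki codebase; the flags are standard docker run options) also shows its weakness: every run pays the full container startup cost.
package main

import (
    "context"
    "fmt"
    "os/exec"
    "time"
)

// runOnce executes a command in a throwaway, resource-limited container.
// Every call pays the full `docker run` startup cost, which is exactly why
// the article moves to persistent per-language containers below.
func runOnce(ctx context.Context, image, hostWorkspace, command string) (string, error) {
    ctx, cancel := context.WithTimeout(ctx, 30*time.Second)
    defer cancel()

    args := []string{
        "run", "--rm", // delete the container when it exits
        "--network", "none", // no network access
        "--memory", "256m", // memory cap
        "--cpus", "0.5", // CPU cap
        "--pids-limit", "100", // fork-bomb protection
        "--read-only", // read-only root filesystem
        "-v", hostWorkspace + ":/workspace", // mount the user's code
        "-w", "/workspace",
        image,
        "sh", "-c", command,
    }
    out, err := exec.CommandContext(ctx, "docker", args...).CombinedOutput()
    return string(out), err
}

func main() {
    out, err := runOnce(context.Background(), "python:3.11-slim", "/tmp/job-123", "python main.py")
    fmt.Println(out, err)
}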
Container-Per-Language Architecture
Instead of spinning up new containers for each execution (which is slow), we use persistent containers per language:
type LanguageVM struct {
Language string
ContainerName string
IsRunning bool
WorkspaceDir string
mutex sync.Mutex
}
type Manager struct {
config config.FirecrackerConfig
logger *logrus.Logger
vms map[string]*LanguageVM
vmsMutex sync.RWMutex
}
Benefits of Persistent Containers:
- Fast Execution: No container startup overhead
- Warm Environments: Dependencies already installed
- Resource Efficiency: Reuse container resources
- Consistent State: Predictable execution environment
Container Initialization
Each language gets its own persistent container with pre-installed tools:
func (m *Manager) initializePersistentContainers() error {
m.logger.Info("Initializing persistent containers for all languages...")
for language := range m.config.Environments {
m.logger.Infof("Starting persistent container for language: %s", language)
vm := &LanguageVM{
Language: language,
ContainerName: fmt.Sprintf("runner-vm-%s", language),
WorkspaceDir: filepath.Join(m.config.WorkspaceDir, language),
IsRunning: false,
}
// Create language-specific workspace directory
if err := os.MkdirAll(vm.WorkspaceDir, 0755); err != nil {
return fmt.Errorf("failed to create workspace directory for %s: %w", language, err)
}
// Start the container
if err := m.startPersistentContainer(vm); err != nil {
m.logger.Errorf("Failed to start container for %s: %v", language, err)
continue
}
m.vms[language] = vm
}
return nil
}
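The startPersistentContainer helper referenced above isn't shown in the article. A minimal sketch that shells out to the Docker CLI (one option; the real service could equally use the Docker SDK) and assumes config.Environments maps each language to an image name, as the security section later implies:
// startPersistentContainer launches a long-lived container for one language and
// keeps it alive with a no-op command so later jobs can `docker exec` into it.
func (m *Manager) startPersistentContainer(vm *LanguageVM) error {
    image, ok := m.config.Environments[vm.Language]
    if !ok {
        return fmt.Errorf("no image configured for language %s", vm.Language)
    }

    // Remove any stale container left over from a previous run of the service.
    exec.Command("docker", "rm", "-f", vm.ContainerName).Run()

    args := []string{
        "run", "-d",
        "--name", vm.ContainerName,
        "--memory", "512m",
        "--cpus", "1",
        "--pids-limit", "128",
        // Host workspace mounted where executeInPersistentContainer expects it.
        "-v", vm.WorkspaceDir + ":/tmp/workspaces/" + vm.Language,
        image,
        "tail", "-f", "/dev/null", // keep the container running indefinitely
    }
    if out, err := exec.Command("docker", args...).CombinedOutput(); err != nil {
        return fmt.Errorf("docker run failed for %s: %v (%s)", vm.Language, err, string(out))
    }

    vm.IsRunning = true
    m.logger.Infof("Persistent container %s started", vm.ContainerName)
    return nil
}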
Language-Specific Execution
Each language requires different setup and execution commands:
func (m *Manager) executeInPersistentContainer(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) {
var execCmd []string
workspacePath := fmt.Sprintf("/tmp/workspaces/%s/%s", vm.Language, req.JobID)
switch vm.Language {
case "python":
entryPoint := req.EntryPoint
if entryPoint == "" {
entryPoint = "main.py"
}
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && if [ -f requirements.txt ]; then pip install -r requirements.txt; fi && timeout 30 python %s", workspacePath, entryPoint)}
case "nodejs":
entryPoint := req.EntryPoint
if entryPoint == "" {
entryPoint = "index.js"
}
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && if [ -f package.json ]; then npm install; fi && timeout 30 node %s", workspacePath, entryPoint)}
case "go":
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && go mod tidy 2>/dev/null || true && timeout 30 go run .", workspacePath)}
case "rust":
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && timeout 30 cargo run", workspacePath)}
case "java":
entryPoint := req.EntryPoint
if entryPoint == "" {
entryPoint = "Main.java"
}
className := strings.TrimSuffix(entryPoint, ".java")
execCmd = []string{"docker", "exec", vm.ContainerName, "bash", "-c",
fmt.Sprintf("cd %s && javac %s && timeout 30 java %s", workspacePath, entryPoint, className)}
default:
return nil, fmt.Errorf("unsupported language: %s", vm.Language)
}
// Execute with timeout
execCtx, cancel := context.WithTimeout(ctx, 30*time.Second)
defer cancel()
cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...)
output, err := cmd.CombinedOutput()
result := &ExecutionResult{
JobID: req.JobID,
WorkspaceID: req.WorkspaceID,
Output: string(output),
ExitCode: 0,
}
if err != nil {
result.Error = err.Error()
result.ExitCode = 1
}
return result, nil
}
Key Implementation Details:
- Workspace Isolation: Each job gets its own directory within the container
- Dependency Management: Automatic installation of requirements.txt, package.json, etc.
- Timeout Protection: 30-second execution limit prevents infinite loops
- Error Handling: Capture both stdout and stderr for complete output
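The ExecutionRequest and ExecutionResult types used throughout the runner are never defined in the snippets. Here is a sketch consistent with how they're used above and with the job message format in the next section (field names and JSON tags are our assumptions):
// ExecutionRequest mirrors the job message published by the backend.
type ExecutionRequest struct {
    JobID       string `json:"job_id"`
    WorkspaceID string `json:"workspace_id"`
    Language    string `json:"language"`
    URL         string `json:"url"`         // data URL carrying the source code
    EntryPoint  string `json:"entry_point"` // e.g. main.py, index.js, Main.java
}

// ExecutionResult is what the runner publishes back on the results exchange.
type ExecutionResult struct {
    JobID       string `json:"job_id"`
    WorkspaceID string `json:"workspace_id"`
    Output      string `json:"output"`
    Error       string `json:"error,omitempty"`
    ExitCode    int    `json:"exit_code"`
}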
Message Queue Integration
Why RabbitMQ for Code Execution?
Code execution jobs have specific requirements that make RabbitMQ ideal:
- Reliability: Jobs must not be lost if a worker crashes
- Durability: Job queue survives server restarts
- Fair Distribution: Distribute jobs evenly across worker instances
- Dead Letter Queues: Handle failed jobs gracefully
- Priority Queues: Support urgent job execution
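One note on the dead-letter point: the topology below declares a code-execution.dead-letter exchange, but none of the later snippets actually attach it to the job queue. A hedged sketch of declaring the queue with dead-letter arguments, using the same amqp client as the Go consumer further down:
// declareJobQueueWithDLQ declares the shared job queue so that rejected or
// expired messages are routed to the dead-letter exchange instead of vanishing.
func declareJobQueueWithDLQ(ch *amqp.Channel) (amqp.Queue, error) {
    args := amqp.Table{
        "x-dead-letter-exchange":    "code-execution.dead-letter",
        "x-dead-letter-routing-key": "jobs.dlq",
        "x-message-ttl":             int32(5 * 60 * 1000), // drop jobs nobody picked up within 5 minutes
    }
    return ch.QueueDeclare(
        "jobs.all-languages", // name
        true,                 // durable
        false,                // auto-delete
        false,                // exclusive
        false,                // no-wait
        args,
    )
}
If you adopt this, the consumer's own QueueDeclare must pass identical arguments, because RabbitMQ refuses to redeclare an existing queue with different properties.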
RabbitMQ Topology
Our message queue setup uses a topic exchange for flexible routing:
# Exchange Configuration
exchanges:
jobs:
name: "code-execution.jobs"
type: "topic"
durable: true
auto_delete: false
results:
name: "code-execution.results"
type: "topic"
durable: true
auto_delete: false
dead_letter:
name: "code-execution.dead-letter"
type: "direct"
durable: true
auto_delete: false
# Queue Configuration
queues:
job_prefix: "jobs"
result_prefix: "results"
dead_letter_suffix: "dlq"
Routing Keys Pattern:
- jobs.python - Python execution jobs
- jobs.nodejs - Node.js execution jobs
- jobs.go - Go execution jobs
- results.python - Python execution results
- results.nodejs - Node.js execution results
Job Message Format
Standardized job messages ensure compatibility across services:
interface ExecutionJob {
job_id: string; // Unique identifier
workspace_id: string; // User workspace
language: string; // Target language
source_code: string; // Code to execute
entry_point?: string; // Main file (optional)
dependencies?: string[]; // Package dependencies
timeout?: number; // Execution timeout
memory_limit?: number; // Memory limit in MB
}
Producer Implementation (Node.js Backend)
const amqp = require('amqplib');
class JobProducer {
constructor(rabbitmqUrl) {
this.rabbitmqUrl = rabbitmqUrl;
this.connection = null;
this.channel = null;
}
async connect() {
this.connection = await amqp.connect(this.rabbitmqUrl);
this.channel = await this.connection.createChannel();
// Declare exchanges
await this.channel.assertExchange('code-execution.jobs', 'topic', {
durable: true
});
await this.channel.assertExchange('code-execution.results', 'topic', {
durable: true
});
}
async submitJob(job) {
const routingKey = `jobs.${job.language}`;
const jobMessage = {
job_id: job.job_id,
workspace_id: job.workspace_id,
language: job.language,
url: this.createDataURL(job.source_code, job.entry_point),
entry_point: job.entry_point
};
await this.channel.publish(
'code-execution.jobs',
routingKey,
Buffer.from(JSON.stringify(jobMessage)),
{
persistent: true,
messageId: job.job_id,
timestamp: Date.now()
}
);
console.log(`Job ${job.job_id} submitted for ${job.language}`);
}
createDataURL(sourceCode, filename = 'main.py') {
// Create an inline data URL for the source code
// NOTE: in production, URL-encode the source (encodeURIComponent) or upload it
// to object storage and pass a real URL, so special characters survive transit
return `data:text/plain;filename=${filename},${sourceCode}`;
}
async close() {
if (this.channel) await this.channel.close();
if (this.connection) await this.connection.close();
}
}
Consumer Implementation (Go Runner Service)
func (m *Manager) consumeRabbitMQMessages(config RabbitMQConfig) error {
conn, err := amqp.Dial(config.URL)
if err != nil {
return fmt.Errorf("failed to connect to RabbitMQ: %w", err)
}
defer conn.Close()
ch, err := conn.Channel()
if err != nil {
return fmt.Errorf("failed to open channel: %w", err)
}
defer ch.Close()
// Declare unified queue for all languages
jobQueue, err := ch.QueueDeclare(
"jobs.all-languages", // name
true, // durable
false, // delete when unused
false, // exclusive
false, // no-wait
nil, // arguments
)
if err != nil {
return fmt.Errorf("failed to declare job queue: %w", err)
}
// Bind queue to exchange for each language
languages := []string{"python", "nodejs", "go", "rust", "java"}
for _, lang := range languages {
err = ch.QueueBind(
jobQueue.Name,
fmt.Sprintf("jobs.%s", lang),
"code-execution.jobs",
false,
nil,
)
if err != nil {
return fmt.Errorf("failed to bind queue for %s: %w", lang, err)
}
}
// Set QoS for fair distribution
err = ch.Qos(1, 0, false)
if err != nil {
return fmt.Errorf("failed to set QoS: %w", err)
}
// Start consuming
msgs, err := ch.Consume(
jobQueue.Name,
"", // consumer tag
false, // auto-ack
false, // exclusive
false, // no-local
false, // no-wait
nil, // args
)
if err != nil {
return fmt.Errorf("failed to register consumer: %w", err)
}
for msg := range msgs {
go m.processJob(ch, msg) // pass the channel so the worker can publish results
}
return nil
}
func (m *Manager) processJob(ch *amqp.Channel, msg amqp.Delivery) {
var execReq ExecutionRequest
if err := json.Unmarshal(msg.Body, &execReq); err != nil {
m.logger.WithError(err).Error("Failed to parse job message")
msg.Nack(false, false) // Don't requeue invalid messages
return
}
// Execute the job
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
result, err := m.ExecuteCode(ctx, execReq)
cancel()
if err != nil {
m.logger.WithError(err).Error("Job execution failed")
result = &ExecutionResult{
JobID: execReq.JobID,
Output: "",
Error: err.Error(),
ExitCode: 1,
}
}
// Publish result
resultBytes, _ := json.Marshal(result)
err = ch.Publish(
"code-execution.results",
fmt.Sprintf("results.%s", execReq.Language),
false, // mandatory
false, // immediate
amqp.Publishing{
ContentType: "application/json",
Body: resultBytes,
})
if err != nil {
m.logger.WithError(err).Error("Failed to publish result")
msg.Nack(false, true) // Requeue on publish failure
} else {
msg.Ack(false) // Acknowledge successful processing
}
}
Real-time Streaming
The Challenge of Live Output
Unlike traditional job systems that return results after completion, code execution benefits from real-time output streaming:
- User Experience: Immediate feedback as code runs
- Long-Running Jobs: Show progress for lengthy operations
- Debugging: See output line-by-line for easier debugging
- Interactive Input: Support for programs requiring user input
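The execution path shown in this article never wires up stdin, so treat the last point as an extension: the runner would keep the process's stdin pipe open and forward lines it receives (for example over a per-job Redis channel) into it. A hypothetical sketch using only the standard library (os/exec, io, fmt):
// forwardInput copies user-supplied lines into the running process's stdin.
// Where the lines come from (WebSocket -> backend -> Redis -> runner) is up to
// the surrounding system; here they simply arrive on a channel.
// Must be called before cmd.Start().
func forwardInput(cmd *exec.Cmd, input <-chan string) error {
    stdin, err := cmd.StdinPipe()
    if err != nil {
        return fmt.Errorf("failed to open stdin pipe: %w", err)
    }
    go func() {
        defer stdin.Close() // closing stdin signals EOF to the program
        for line := range input {
            io.WriteString(stdin, line+"\n")
        }
    }()
    return nil
}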
Redis Pub/Sub for Streaming
Redis provides excellent pub/sub capabilities for real-time streaming:
// Redis streaming setup
const redis = require('redis');
class StreamingService {
constructor(redisUrl) {
this.publisher = redis.createClient({ url: redisUrl });
this.subscriber = redis.createClient({ url: redisUrl });
}
async connect() {
await this.publisher.connect();
await this.subscriber.connect();
}
// Publish output line from runner service
async publishOutput(jobId, line, stream = 'stdout') {
const message = {
job_id: jobId,
line: line,
stream: stream,
timestamp: new Date().toISOString()
};
await this.publisher.publish(
`execution:${jobId}`,
JSON.stringify(message)
);
}
// Subscribe to job output
async subscribeToJob(jobId, callback) {
await this.subscriber.subscribe(`execution:${jobId}`, (message) => {
const data = JSON.parse(message);
callback(data);
});
}
async unsubscribeFromJob(jobId) {
await this.subscriber.unsubscribe(`execution:${jobId}`);
}
}
WebSocket Integration
Connect Redis streams to frontend via WebSocket:
// WebSocket server integration
const WebSocket = require('ws');
class WebSocketManager {
constructor(server, streamingService) {
this.wss = new WebSocket.Server({ server });
this.streamingService = streamingService;
this.connections = new Map(); // jobId -> Set of WebSocket connections
this.setupWebSocketHandlers();
}
setupWebSocketHandlers() {
this.wss.on('connection', (ws) => {
ws.on('message', async (data) => {
const message = JSON.parse(data);
switch (message.type) {
case 'subscribe':
await this.subscribeToJobOutput(ws, message.job_id);
break;
case 'unsubscribe':
await this.unsubscribeFromJobOutput(ws, message.job_id);
break;
}
});
ws.on('close', () => {
this.cleanupConnection(ws);
});
});
}
async subscribeToJobOutput(ws, jobId) {
// Add connection to job subscription
if (!this.connections.has(jobId)) {
this.connections.set(jobId, new Set());
// Subscribe to Redis stream for this job
await this.streamingService.subscribeToJob(jobId, (data) => {
// Broadcast to all subscribed WebSocket connections
const connections = this.connections.get(jobId);
if (connections) {
connections.forEach(conn => {
if (conn.readyState === WebSocket.OPEN) {
conn.send(JSON.stringify({
type: 'output',
data: data
}));
}
});
}
});
}
this.connections.get(jobId).add(ws);
}
async unsubscribeFromJobOutput(ws, jobId) {
const connections = this.connections.get(jobId);
if (connections) {
connections.delete(ws);
// If no more connections, unsubscribe from Redis
if (connections.size === 0) {
await this.streamingService.unsubscribeFromJob(jobId);
this.connections.delete(jobId);
}
}
}
cleanupConnection(ws) {
// Remove connection from all job subscriptions
for (const [jobId, connections] of this.connections.entries()) {
connections.delete(ws);
if (connections.size === 0) {
this.streamingService.unsubscribeFromJob(jobId);
this.connections.delete(jobId);
}
}
}
}
Enhanced Runner with Streaming
Modify the runner service to stream output line-by-line:
func (m *Manager) executeWithStreaming(ctx context.Context, vm *LanguageVM, req ExecutionRequest) (*ExecutionResult, error) {
// ... command setup ...
cmd := exec.CommandContext(execCtx, execCmd[0], execCmd[1:]...)
// Create pipes for real-time output capture
stdout, err := cmd.StdoutPipe()
if err != nil {
return nil, fmt.Errorf("failed to create stdout pipe: %w", err)
}
stderr, err := cmd.StderrPipe()
if err != nil {
return nil, fmt.Errorf("failed to create stderr pipe: %w", err)
}
// Start the command
if err := cmd.Start(); err != nil {
return nil, fmt.Errorf("failed to start command: %w", err)
}
// Stream output in real-time
var outputBuffer strings.Builder
var wg sync.WaitGroup
wg.Add(2)
// Stream stdout
go func() {
defer wg.Done()
scanner := bufio.NewScanner(stdout)
for scanner.Scan() {
line := scanner.Text()
outputBuffer.WriteString(line + "\n")
// Publish to Redis for real-time streaming
m.publishStreamingOutput(req.JobID, line, "stdout")
}
}()
// Stream stderr
go func() {
defer wg.Done()
scanner := bufio.NewScanner(stderr)
for scanner.Scan() {
line := scanner.Text()
outputBuffer.WriteString(line + "\n")
// Publish to Redis for real-time streaming
m.publishStreamingOutput(req.JobID, line, "stderr")
}
}()
// Wait for command completion
err = cmd.Wait()
wg.Wait() // Wait for all output to be processed
result := &ExecutionResult{
JobID: req.JobID,
WorkspaceID: req.WorkspaceID,
Output: outputBuffer.String(),
ExitCode: 0,
}
if err != nil {
result.Error = err.Error()
result.ExitCode = 1
}
return result, nil
}
func (m *Manager) publishStreamingOutput(jobID, line, stream string) {
// This would integrate with your Redis client
message := StreamingOutput{
JobID: jobID,
Line: line,
Stream: stream,
Timestamp: time.Now(),
}
// Publish to Redis (implementation depends on your Redis client)
// m.redisClient.Publish(fmt.Sprintf("execution:%s", jobID), message)
}
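The StreamingOutput type isn't defined above and the Redis publish is left as a comment. A filled-in version of the stub, assuming the go-redis v9 client (github.com/redis/go-redis/v9) and an m.redisClient field on the Manager (both our assumptions; the article doesn't name its Redis library), keeping the execution:<jobID> channel format the backend subscribes to:
// StreamingOutput matches the JSON shape the Node.js StreamingService publishes.
type StreamingOutput struct {
    JobID     string    `json:"job_id"`
    Line      string    `json:"line"`
    Stream    string    `json:"stream"`
    Timestamp time.Time `json:"timestamp"`
}

func (m *Manager) publishStreamingOutput(jobID, line, stream string) {
    msg := StreamingOutput{JobID: jobID, Line: line, Stream: stream, Timestamp: time.Now()}
    payload, err := json.Marshal(msg)
    if err != nil {
        m.logger.WithError(err).Error("Failed to marshal streaming output")
        return
    }
    // m.redisClient is assumed to be a *redis.Client created at service startup.
    channel := fmt.Sprintf("execution:%s", jobID)
    if err := m.redisClient.Publish(context.Background(), channel, payload).Err(); err != nil {
        m.logger.WithError(err).Error("Failed to publish streaming output")
    }
}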
This streaming approach provides real-time feedback to users, making the code execution feel immediate and interactive rather than a black-box operation.
Backend Implementation
Node.js API Server
The backend serves as the orchestration layer between the frontend and the execution infrastructure:
const express = require('express');
const WebSocket = require('ws');
const { v4: uuidv4 } = require('uuid');
const cors = require('cors');
class CodeExecutionAPI {
constructor() {
this.app = express();
this.server = null;
this.jobProducer = null;
this.streamingService = null;
this.wsManager = null;
this.activeJobs = new Map(); // Track running jobs
this.setupMiddleware();
this.setupRoutes();
}
setupMiddleware() {
this.app.use(cors());
this.app.use(express.json({ limit: '10mb' }));
this.app.use(express.urlencoded({ extended: true }));
// Request logging
this.app.use((req, res, next) => {
console.log(`${req.method} ${req.path} - ${new Date().toISOString()}`);
next();
});
}
setupRoutes() {
// Health check
this.app.get('/health', (req, res) => {
res.json({ status: 'ok', timestamp: new Date().toISOString() });
});
// Submit code execution job
this.app.post('/api/execute', async (req, res) => {
try {
const result = await this.handleCodeExecution(req.body);
res.json(result);
} catch (error) {
console.error('Execution error:', error);
res.status(500).json({
error: 'Execution failed',
message: error.message
});
}
});
// Get job status
this.app.get('/api/jobs/:jobId', (req, res) => {
const jobId = req.params.jobId;
const job = this.activeJobs.get(jobId);
if (!job) {
return res.status(404).json({ error: 'Job not found' });
}
res.json(job);
});
// List active jobs
this.app.get('/api/jobs', (req, res) => {
const jobs = Array.from(this.activeJobs.values());
res.json({ jobs, count: jobs.length });
});
// Cancel job
this.app.delete('/api/jobs/:jobId', async (req, res) => {
const jobId = req.params.jobId;
await this.cancelJob(jobId);
res.json({ message: 'Job cancelled' });
});
}
async handleCodeExecution(requestBody) {
const {
language,
source_code,
entry_point,
workspace_id = 'default',
timeout = 30
} = requestBody;
// Validate request
if (!language || !source_code) {
throw new Error('Language and source_code are required');
}
const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java'];
if (!supportedLanguages.includes(language)) {
throw new Error(`Unsupported language: ${language}`);
}
// Generate unique job ID
const jobId = uuidv4();
// Create job record
const job = {
job_id: jobId,
workspace_id,
language,
source_code,
entry_point,
timeout,
status: 'queued',
created_at: new Date().toISOString(),
updated_at: new Date().toISOString()
};
this.activeJobs.set(jobId, job);
// Submit to RabbitMQ
await this.jobProducer.submitJob(job);
// Update status
job.status = 'submitted';
job.updated_at = new Date().toISOString();
return {
job_id: jobId,
status: 'submitted',
message: 'Job submitted for execution',
stream_url: `/stream/${jobId}`
};
}
async cancelJob(jobId) {
const job = this.activeJobs.get(jobId);
if (job && job.status === 'running') {
// Send cancellation signal (implementation depends on your setup)
job.status = 'cancelled';
job.updated_at = new Date().toISOString();
}
}
async start(port = 3001) {
// Initialize services
await this.initializeServices();
// Start HTTP server
this.server = this.app.listen(port, () => {
console.log(`Code execution API running on port ${port}`);
});
// Setup WebSocket manager
this.wsManager = new WebSocketManager(this.server, this.streamingService);
// Setup result consumer
this.setupResultConsumer();
}
async initializeServices() {
// Initialize RabbitMQ producer
this.jobProducer = new JobProducer(process.env.RABBITMQ_URL);
await this.jobProducer.connect();
// Initialize Redis streaming
this.streamingService = new StreamingService(process.env.REDIS_URL);
await this.streamingService.connect();
console.log('All services initialized successfully');
}
setupResultConsumer() {
// Consumer for job results from RabbitMQ
const amqp = require('amqplib');
amqp.connect(process.env.RABBITMQ_URL)
.then(conn => conn.createChannel())
.then(ch => {
// Declare results queue
return ch.assertQueue('results.all', { durable: true })
.then(() => {
// Bind to results exchange
return ch.bindQueue('results.all', 'code-execution.results', 'results.*');
})
.then(() => {
// Consume results
return ch.consume('results.all', (msg) => {
if (msg) {
this.handleJobResult(JSON.parse(msg.content.toString()));
ch.ack(msg);
}
});
});
})
.catch(console.error);
}
handleJobResult(result) {
const job = this.activeJobs.get(result.job_id);
if (job) {
job.status = result.exit_code === 0 ? 'completed' : 'failed';
job.result = result;
job.updated_at = new Date().toISOString();
// Optionally clean up completed jobs after some time
setTimeout(() => {
this.activeJobs.delete(result.job_id);
}, 300000); // 5 minutes
}
}
async stop() {
if (this.server) {
this.server.close();
}
if (this.jobProducer) {
await this.jobProducer.close();
}
if (this.streamingService) {
await this.streamingService.close();
}
}
}
// Environment configuration
const config = {
port: process.env.PORT || 3001,
rabbitmq_url: process.env.RABBITMQ_URL || 'amqp://admin:admin123@localhost:5672',
redis_url: process.env.REDIS_URL || 'redis://localhost:6379'
};
// Start the server
const api = new CodeExecutionAPI();
api.start(config.port);
// Graceful shutdown
process.on('SIGINT', async () => {
console.log('Shutting down gracefully...');
await api.stop();
process.exit(0);
});
Express Middleware for Code Validation
Add security and validation middleware:
const rateLimit = require('express-rate-limit');
const validator = require('validator');
// Rate limiting for code execution
const executionLimiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 10, // Maximum 10 executions per minute per IP
message: 'Too many execution requests, please try again later',
standardHeaders: true,
legacyHeaders: false
});
// Code validation middleware
const validateCodeRequest = (req, res, next) => {
const { language, source_code, entry_point } = req.body;
// Check required fields
if (!language || !source_code) {
return res.status(400).json({
error: 'Missing required fields',
required: ['language', 'source_code']
});
}
// Validate language
const supportedLanguages = ['python', 'nodejs', 'go', 'rust', 'java'];
if (!supportedLanguages.includes(language)) {
return res.status(400).json({
error: 'Unsupported language',
supported: supportedLanguages
});
}
// Check code length (prevent abuse)
if (source_code.length > 100000) { // 100KB limit
return res.status(400).json({
error: 'Source code too large',
max_size: '100KB'
});
}
// Validate entry point if provided
if (entry_point && !validator.isAlphanumeric(entry_point.replace(/[._-]/g, ''))) {
return res.status(400).json({
error: 'Invalid entry point format'
});
}
// Basic security checks (can be expanded)
const dangerousPatterns = [
/rm\s+-rf/,
/sudo/,
/passwd/,
/\/etc\/passwd/,
/mkfs/,
/format/,
/del\s+\/[a-z]/i
];
for (const pattern of dangerousPatterns) {
if (pattern.test(source_code)) {
return res.status(400).json({
error: 'Code contains potentially dangerous operations'
});
}
}
next();
};
// Apply middleware to execution endpoint
app.post('/api/execute',
executionLimiter,
validateCodeRequest,
async (req, res) => {
// ... execution logic
}
);
Frontend Development
React Code Execution Component
Based on our CodeRunnerDemo.tsx component, here's a production-ready implementation:
import React, { useState, useEffect, useCallback, useRef } from 'react';
import { Button } from '@/components/ui/button';
import { Card, CardContent } from '@/components/ui/card';
import { Badge } from '@/components/ui/badge';
import { Tabs, TabsList, TabsTrigger, TabsContent } from '@/components/ui/tabs';
import { Play, Square, Copy, Download, Settings } from 'lucide-react';
import CodeMirror from '@uiw/react-codemirror';
import { githubDark } from '@uiw/codemirror-theme-github';
import { javascript } from '@codemirror/lang-javascript';
import { python } from '@codemirror/lang-python';
import { rust } from '@codemirror/lang-rust';
import { go } from '@codemirror/lang-go';
import { java } from '@codemirror/lang-java';
interface ExecutionResult {
job_id: string;
status: string;
output?: string;
error?: string;
exit_code?: number;
duration?: number;
}
interface OutputLine {
line: string;
stream: 'stdout' | 'stderr';
timestamp: string;
}
const LANGUAGES = {
python: { ext: python(), icon: '🐍', name: 'Python', starter: 'print("Hello, World!")' },
nodejs: { ext: javascript(), icon: '⚡', name: 'Node.js', starter: 'console.log("Hello, World!");' }, // keyed "nodejs" to match the backend's supported language ids
go: { ext: go(), icon: '🔵', name: 'Go', starter: 'package main\n\nimport "fmt"\n\nfunc main() {\n fmt.Println("Hello, World!")\n}' },
rust: { ext: rust(), icon: '🦀', name: 'Rust', starter: 'fn main() {\n println!("Hello, World!");\n}' },
java: { ext: java(), icon: '☕', name: 'Java', starter: 'public class Main {\n public static void main(String[] args) {\n System.out.println("Hello, World!");\n }\n}' }
};
export function CodeExecutor() {
const [selectedLanguage, setSelectedLanguage] = useState('python');
const [code, setCode] = useState(LANGUAGES.python.starter);
const [isExecuting, setIsExecuting] = useState(false);
const [output, setOutput] = useState<OutputLine[]>([]);
const [executionResult, setExecutionResult] = useState<ExecutionResult | null>(null);
const [currentJobId, setCurrentJobId] = useState<string | null>(null);
const wsRef = useRef<WebSocket | null>(null);
const outputRef = useRef<HTMLDivElement>(null);
// WebSocket connection for real-time output
const connectWebSocket = useCallback((jobId: string) => {
const wsUrl = `ws://localhost:3001/stream/${jobId}`;
wsRef.current = new WebSocket(wsUrl);
wsRef.current.onopen = () => {
console.log('WebSocket connected for job:', jobId);
wsRef.current?.send(JSON.stringify({
type: 'subscribe',
job_id: jobId
}));
};
wsRef.current.onmessage = (event) => {
const message = JSON.parse(event.data);
if (message.type === 'output') {
const outputLine: OutputLine = {
line: message.data.line,
stream: message.data.stream,
timestamp: message.data.timestamp
};
setOutput(prev => [...prev, outputLine]);
// Auto-scroll to bottom
setTimeout(() => {
if (outputRef.current) {
outputRef.current.scrollTop = outputRef.current.scrollHeight;
}
}, 10);
}
};
wsRef.current.onclose = () => {
console.log('WebSocket disconnected');
};
wsRef.current.onerror = (error) => {
console.error('WebSocket error:', error);
};
}, []);
// Execute code
const executeCode = useCallback(async () => {
if (isExecuting) return;
setIsExecuting(true);
setOutput([]);
setExecutionResult(null);
try {
const response = await fetch('http://localhost:3001/api/execute', {
method: 'POST',
headers: {
'Content-Type': 'application/json'
},
body: JSON.stringify({
language: selectedLanguage,
source_code: code,
workspace_id: 'web-editor'
})
});
if (!response.ok) {
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const result = await response.json();
setCurrentJobId(result.job_id);
// Connect WebSocket for streaming output
connectWebSocket(result.job_id);
// Poll for final result
pollJobResult(result.job_id);
} catch (error) {
console.error('Execution failed:', error);
setOutput([{
line: `Error: ${error.message}`,
stream: 'stderr',
timestamp: new Date().toISOString()
}]);
setIsExecuting(false);
}
}, [code, selectedLanguage, isExecuting, connectWebSocket]);
// Poll for job completion
const pollJobResult = useCallback(async (jobId: string) => {
const pollInterval = 1000; // 1 second
const maxPolls = 60; // 60 seconds timeout
let polls = 0;
const poll = async () => {
try {
const response = await fetch(`http://localhost:3001/api/jobs/${jobId}`);
const job = await response.json();
if (job.status === 'completed' || job.status === 'failed') {
setExecutionResult(job.result);
setIsExecuting(false);
// Close WebSocket
if (wsRef.current) {
wsRef.current.close();
}
return;
}
polls++;
if (polls < maxPolls) {
setTimeout(poll, pollInterval);
} else {
// Timeout
setIsExecuting(false);
setOutput(prev => [...prev, {
line: 'Execution timeout - job may still be running',
stream: 'stderr',
timestamp: new Date().toISOString()
}]);
}
} catch (error) {
console.error('Polling error:', error);
setIsExecuting(false);
}
};
setTimeout(poll, pollInterval);
}, []);
// Stop execution
const stopExecution = useCallback(async () => {
if (currentJobId) {
try {
await fetch(`http://localhost:3001/api/jobs/${currentJobId}`, {
method: 'DELETE'
});
} catch (error) {
console.error('Failed to cancel job:', error);
}
}
if (wsRef.current) {
wsRef.current.close();
}
setIsExecuting(false);
setCurrentJobId(null);
}, [currentJobId]);
// Copy code to clipboard
const copyCode = useCallback(() => {
navigator.clipboard.writeText(code);
}, [code]);
// Download output
const downloadOutput = useCallback(() => {
const outputText = output.map(line =>
`[${line.timestamp}] ${line.stream}: ${line.line}`
).join('\n');
const blob = new Blob([outputText], { type: 'text/plain' });
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `execution-output-${Date.now()}.txt`;
a.click();
URL.revokeObjectURL(url);
}, [output]);
// Language change handler
useEffect(() => {
setCode(LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].starter);
}, [selectedLanguage]);
// Cleanup WebSocket on unmount
useEffect(() => {
return () => {
if (wsRef.current) {
wsRef.current.close();
}
};
}, []);
return (
<div className="max-w-6xl mx-auto p-4 space-y-4">
{/* Header */}
<div className="flex items-center justify-between">
<h1 className="text-2xl font-bold">Online Code Compiler</h1>
<Badge variant="outline" className="text-sm">
Real-time execution with Docker containers
</Badge>
</div>
{/* Language Selection */}
<Card>
<CardContent className="p-4">
<Tabs value={selectedLanguage} onValueChange={setSelectedLanguage}>
<TabsList className="grid w-full grid-cols-5">
{Object.entries(LANGUAGES).map(([lang, config]) => (
<TabsTrigger key={lang} value={lang} className="flex items-center gap-2">
<span>{config.icon}</span>
<span className="hidden sm:inline">{config.name}</span>
</TabsTrigger>
))}
</TabsList>
</Tabs>
</CardContent>
</Card>
{/* Code Editor */}
<Card>
<CardContent className="p-0">
<div className="border-b p-4 flex items-center justify-between">
<h3 className="font-semibold">Code Editor</h3>
<div className="flex items-center gap-2">
<Button
variant="outline"
size="sm"
onClick={copyCode}
>
<Copy className="h-4 w-4 mr-2" />
Copy
</Button>
<Button
onClick={isExecuting ? stopExecution : executeCode}
disabled={!code.trim()}
variant={isExecuting ? "destructive" : "default"}
>
{isExecuting ? (
<>
<Square className="h-4 w-4 mr-2" />
Stop
</>
) : (
<>
<Play className="h-4 w-4 mr-2" />
Run {LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].name}
</>
)}
</Button>
</div>
</div>
<CodeMirror
value={code}
onChange={(value) => setCode(value)}
theme={githubDark}
extensions={[LANGUAGES[selectedLanguage as keyof typeof LANGUAGES].ext]}
className="text-sm"
basicSetup={{
lineNumbers: true,
foldGutter: true,
dropCursor: false,
allowMultipleSelections: false
}}
/>
</CardContent>
</Card>
{/* Output Panel */}
<Card>
<CardContent className="p-0">
<div className="border-b p-4 flex items-center justify-between">
<div className="flex items-center gap-2">
<h3 className="font-semibold">Output</h3>
{isExecuting && (
<Badge variant="secondary" className="animate-pulse">
Executing...
</Badge>
)}
{executionResult && (
<Badge
variant={executionResult.exit_code === 0 ? "default" : "destructive"}
>
Exit code: {executionResult.exit_code}
</Badge>
)}
</div>
<div className="flex items-center gap-2">
{output.length > 0 && (
<Button
variant="outline"
size="sm"
onClick={downloadOutput}
>
<Download className="h-4 w-4 mr-2" />
Download
</Button>
)}
<Button
variant="outline"
size="sm"
onClick={() => setOutput([])}
>
Clear
</Button>
</div>
</div>
<div
ref={outputRef}
className="h-96 overflow-y-auto p-4 bg-gray-950 text-gray-100 font-mono text-sm"
>
{output.length === 0 ? (
<div className="text-gray-500 italic">
Click "Run" to execute your code. Output will appear here in real-time.
</div>
) : (
output.map((line, index) => (
<div
key={index}
className={`whitespace-pre-wrap ${
line.stream === 'stderr' ? 'text-red-400' : 'text-gray-100'
}`}
>
{line.line}
</div>
))
)}
</div>
</CardContent>
</Card>
{/* Execution Statistics */}
{executionResult && (
<Card>
<CardContent className="p-4">
<h3 className="font-semibold mb-2">Execution Statistics</h3>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4 text-sm">
<div>
<span className="text-gray-500">Duration:</span>
<div className="font-mono">{executionResult.duration || 0}ms</div>
</div>
<div>
<span className="text-gray-500">Exit Code:</span>
<div className="font-mono">{executionResult.exit_code}</div>
</div>
<div>
<span className="text-gray-500">Language:</span>
<div className="font-mono">{selectedLanguage}</div>
</div>
<div>
<span className="text-gray-500">Job ID:</span>
<div className="font-mono text-xs">{executionResult.job_id}</div>
</div>
</div>
</CardContent>
</Card>
)}
</div>
);
}
Security & Isolation
The Security Challenge
Running arbitrary user code presents significant security risks:
- Host System Access: Malicious code could access the host filesystem
- Network Attacks: Code could scan internal networks or launch attacks
- Resource Exhaustion: Infinite loops or memory bombs could crash servers
- Data Exfiltration: Code could attempt to steal sensitive information
- Privilege Escalation: Attempts to gain root or admin access
Docker Security Model
Docker provides multiple layers of security isolation:
# Docker container security configuration
version: '3.8'
services:
runner-python:
image: python:3.11-slim
security_opt:
- no-new-privileges:true
- seccomp:unconfined # May need custom seccomp profile
cap_drop:
- ALL
cap_add:
- SETGID
- SETUID
read_only: true
tmpfs:
- /tmp:exec,size=100m
- /var/tmp:exec,size=100m
ulimits:
nproc: 64 # Limit number of processes
nofile: 1024 # Limit open files
memlock: 67108864 # Limit locked memory
mem_limit: 256m # Memory limit (service-level key, not a ulimit)
cpus: '0.5' # CPU limit
pids_limit: 100 # Process limit
networks:
- isolated_network
networks:
isolated_network:
driver: bridge
internal: true # No external network access
Advanced Container Security
Implement additional security measures in the runner service:
func (m *Manager) createSecureContainer(language, jobID string) error {
containerConfig := &container.Config{
Image: m.config.Environments[language],
Cmd: []string{"tail", "-f", "/dev/null"},
Env: []string{
"HOME=/tmp",
"USER=runner",
"SHELL=/bin/bash",
},
WorkingDir: "/tmp/workspace",
User: "1000:1000", // Non-root user
// Resource limits and security options live on the HostConfig below;
// container.Config only carries image, command, env, user, and network flags
// Network isolation
NetworkDisabled: true,
}
hostConfig := &container.HostConfig{
// Resource constraints
Resources: container.Resources{
Memory: 256 * 1024 * 1024,
MemorySwap: 256 * 1024 * 1024,
CPUShares: 512,
PidsLimit: 50,
Ulimits: []*units.Ulimit{
{Name: "nproc", Soft: 32, Hard: 32},
{Name: "nofile", Soft: 256, Hard: 256},
},
},
// Security
ReadonlyRootfs: true,
CapDrop: []string{"ALL"},
CapAdd: []string{"SETGID", "SETUID"},
SecurityOpt: []string{"no-new-privileges:true"},
// Temporary filesystems
Tmpfs: map[string]string{
"/tmp": "exec,size=100m",
"/var/tmp": "exec,size=100m",
},
// No privileged access
Privileged: false,
// Network isolation
NetworkMode: "none",
}
// Create container
resp, err := m.dockerClient.ContainerCreate(
context.Background(),
containerConfig,
hostConfig,
nil, // networking config
nil, // platform
fmt.Sprintf("runner-%s-%s", language, jobID),
)
if err != nil {
return fmt.Errorf("failed to create container: %w", err)
}
// Start container
if err := m.dockerClient.ContainerStart(
context.Background(),
resp.ID,
types.ContainerStartOptions{},
); err != nil {
return fmt.Errorf("failed to start container: %w", err)
}
return nil
}
Code Sanitization
Implement static analysis for dangerous patterns:
class CodeSanitizer {
constructor() {
this.dangerousPatterns = {
filesystem: [
/\bopen\s*\(\s*['"][\/\\]/, // File system access
/\bfile\s*\(\s*['"][\/\\]/, // File operations
/\bos\.system/, // OS commands
/\bsubprocess/, // Process execution
/\beval\s*\(/, // Code evaluation
/\bexec\s*\(/, // Code execution
],
network: [
/\bsocket\s*\(/, // Network sockets
/\burllib/, // URL operations
/\brequests\./, // HTTP requests
/\bhttplib/, // HTTP library
/\bfetch\s*\(/, // Fetch API
],
system: [
/\bos\.getenv/, // Environment variables
/\bprocess\.env/, // Node.js environment
/\b__import__/, // Dynamic imports
/\brequire\s*\(\s*['"]child_process['"]/, // Child process
]
};
}
analyze(code, language) {
const issues = [];
for (const [category, patterns] of Object.entries(this.dangerousPatterns)) {
for (const pattern of patterns) {
const matches = code.match(pattern);
if (matches) {
issues.push({
category,
pattern: pattern.toString(),
match: matches[0],
severity: this.getSeverity(category)
});
}
}
}
return {
safe: issues.length === 0,
issues,
score: this.calculateSafetyScore(issues)
};
}
getSeverity(category) {
const severityMap = {
filesystem: 'high',
network: 'medium',
system: 'high'
};
return severityMap[category] || 'low';
}
calculateSafetyScore(issues) {
const weights = { high: 10, medium: 5, low: 1 };
const totalWeight = issues.reduce((sum, issue) =>
sum + weights[issue.severity], 0);
return Math.max(0, 100 - totalWeight);
}
}
// Usage in API
app.post('/api/execute', validateCodeRequest, async (req, res) => {
const sanitizer = new CodeSanitizer();
const analysis = sanitizer.analyze(req.body.source_code, req.body.language);
if (!analysis.safe && analysis.score < 50) {
return res.status(400).json({
error: 'Code contains potentially dangerous operations',
issues: analysis.issues,
safety_score: analysis.score
});
}
// Proceed with execution...
});
Network Isolation
Implement network restrictions at multiple levels:
# Docker network setup with restrictions
docker network create --driver bridge \
--subnet=172.20.0.0/16 \
--opt com.docker.network.bridge.enable_icc=false \
--opt com.docker.network.bridge.enable_ip_masquerade=false \
isolated-execution
# Firewall rules for container network
iptables -I DOCKER-USER -s 172.20.0.0/16 -j DROP
iptables -I DOCKER-USER -s 172.20.0.0/16 -d 172.20.0.0/16 -j ACCEPT
Performance Optimization
Container Lifecycle Management
Optimize container startup and cleanup:
type ContainerPool struct {
pools map[string]*LanguagePool
mutex sync.RWMutex
logger *logrus.Logger
}
type LanguagePool struct {
language string
containers []*ContainerInstance
available chan *ContainerInstance
maxSize int
currentSize int
mutex sync.Mutex
}
type ContainerInstance struct {
ID string
Language string
CreatedAt time.Time
LastUsed time.Time
InUse bool
}
func NewContainerPool(maxSize int, logger *logrus.Logger) *ContainerPool {
return &ContainerPool{
pools: make(map[string]*LanguagePool),
logger: logger,
}
}
func (cp *ContainerPool) GetContainer(language string) (*ContainerInstance, error) {
cp.mutex.RLock()
pool, exists := cp.pools[language]
cp.mutex.RUnlock()
if !exists {
cp.mutex.Lock()
// Re-check under the write lock: another goroutine may have created the pool
// between releasing the read lock and acquiring the write lock
if pool, exists = cp.pools[language]; !exists {
pool = &LanguagePool{
language: language,
available: make(chan *ContainerInstance, 10),
maxSize: 10,
}
cp.pools[language] = pool
}
cp.mutex.Unlock()
}
// Try to get from available pool
select {
case container := <-pool.available:
container.InUse = true
container.LastUsed = time.Now()
return container, nil
default:
// Create new container if under limit
return cp.createNewContainer(pool)
}
}
func (cp *ContainerPool) ReturnContainer(container *ContainerInstance) {
cp.mutex.RLock()
pool := cp.pools[container.Language]
cp.mutex.RUnlock()
container.InUse = false
container.LastUsed = time.Now()
// Clean container workspace
cp.cleanContainerWorkspace(container)
// Return to pool
select {
case pool.available <- container:
// Successfully returned to pool
default:
// Pool is full, destroy container
cp.destroyContainer(container)
}
}
func (cp *ContainerPool) cleanContainerWorkspace(container *ContainerInstance) {
// Execute cleanup commands in container
cleanupCmd := []string{
"docker", "exec", container.ID,
"bash", "-c", "rm -rf /tmp/workspace/* 2>/dev/null || true"
}
exec.Command(cleanupCmd[0], cleanupCmd[1:]...).Run()
}
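Wiring the pool into the execution path is then a matter of borrowing and returning a container around each job. A usage sketch (m.pool and executeInContainer are assumed names, not from the article):
func (m *Manager) runWithPool(ctx context.Context, req ExecutionRequest) (*ExecutionResult, error) {
    // Borrow a warm container for the job's language (or create one on demand).
    inst, err := m.pool.GetContainer(req.Language)
    if err != nil {
        return nil, fmt.Errorf("no container available for %s: %w", req.Language, err)
    }
    // Always hand the container back so its workspace is wiped and it can be reused.
    defer m.pool.ReturnContainer(inst)

    return m.executeInContainer(ctx, inst, req)
}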
Memory Management
Implement intelligent memory management:
type MemoryManager struct {
totalMemory uint64
usedMemory uint64
containerMem map[string]uint64
mutex sync.RWMutex
logger *logrus.Logger
}
func (mm *MemoryManager) AllocateMemory(containerID string, requested uint64) error {
mm.mutex.Lock()
defer mm.mutex.Unlock()
// Check if allocation would exceed limits
if mm.usedMemory + requested > mm.totalMemory * 80 / 100 { // 80% threshold
return fmt.Errorf("insufficient memory: %d MB requested, %d MB available",
requested/1024/1024, (mm.totalMemory-mm.usedMemory)/1024/1024)
}
mm.usedMemory += requested
mm.containerMem[containerID] = requested
mm.logger.Infof("Allocated %d MB to container %s", requested/1024/1024, containerID)
return nil
}
func (mm *MemoryManager) ReleaseMemory(containerID string) {
mm.mutex.Lock()
defer mm.mutex.Unlock()
if allocated, exists := mm.containerMem[containerID]; exists {
mm.usedMemory -= allocated
delete(mm.containerMem, containerID)
mm.logger.Infof("Released %d MB from container %s", allocated/1024/1024, containerID)
}
}
func (mm *MemoryManager) GetMemoryStats() map[string]interface{} {
mm.mutex.RLock()
defer mm.mutex.RUnlock()
return map[string]interface{}{
"total_mb": mm.totalMemory / 1024 / 1024,
"used_mb": mm.usedMemory / 1024 / 1024,
"available_mb": (mm.totalMemory - mm.usedMemory) / 1024 / 1024,
"utilization": float64(mm.usedMemory) / float64(mm.totalMemory) * 100,
"active_containers": len(mm.containerMem),
}
}
Load Balancing
Implement intelligent load balancing:
type LoadBalancer struct {
workers []*WorkerNode
roundRobin int
mutex sync.Mutex
healthChecker *HealthChecker
logger *logrus.Logger
}
type WorkerNode struct {
ID string
Address string
CPU float64
Memory float64
ActiveJobs int
MaxJobs int
LastSeen time.Time
Healthy bool
}
func (lb *LoadBalancer) SelectWorker(job *ExecutionJob) (*WorkerNode, error) {
lb.mutex.Lock()
defer lb.mutex.Unlock()
healthyWorkers := lb.getHealthyWorkers()
if len(healthyWorkers) == 0 {
return nil, fmt.Errorf("no healthy workers available")
}
// Sort by load (CPU + Memory + Active Jobs)
sort.Slice(healthyWorkers, func(i, j int) bool {
loadI := lb.calculateLoad(healthyWorkers[i])
loadJ := lb.calculateLoad(healthyWorkers[j])
return loadI < loadJ
})
// Select least loaded worker
selected := healthyWorkers[0]
selected.ActiveJobs++
lb.logger.Infof("Selected worker %s (load: %.2f)", selected.ID, lb.calculateLoad(selected))
return selected, nil
}
func (lb *LoadBalancer) calculateLoad(worker *WorkerNode) float64 {
// Weighted load calculation
cpuWeight := 0.3
memoryWeight := 0.3
jobWeight := 0.4
cpuLoad := worker.CPU / 100.0
memoryLoad := worker.Memory / 100.0
jobLoad := float64(worker.ActiveJobs) / float64(worker.MaxJobs)
return cpuWeight*cpuLoad + memoryWeight*memoryLoad + jobWeight*jobLoad
}
Production Deployment
Docker Compose Production Setup
version: '3.8'
services:
# RabbitMQ cluster
rabbitmq:
image: rabbitmq:3.12-management
hostname: rabbitmq-main
environment:
RABBITMQ_ERLANG_COOKIE: ${RABBITMQ_COOKIE}
RABBITMQ_DEFAULT_USER: ${RABBITMQ_USER}
RABBITMQ_DEFAULT_PASS: ${RABBITMQ_PASS}
RABBITMQ_DEFAULT_VHOST: /
volumes:
- rabbitmq_data:/var/lib/rabbitmq
- ./rabbitmq.conf:/etc/rabbitmq/rabbitmq.conf
networks:
- backend
deploy:
replicas: 1
resources:
limits:
memory: 1G
cpus: '0.5'
# Redis cluster
redis:
image: redis:7-alpine
command: redis-server --appendonly yes --maxmemory 512mb --maxmemory-policy allkeys-lru
volumes:
- redis_data:/data
networks:
- backend
deploy:
replicas: 1
resources:
limits:
memory: 512M
cpus: '0.25'
# API Backend
api-backend:
build:
context: ./backend
dockerfile: Dockerfile.production
environment:
NODE_ENV: production
RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/
REDIS_URL: redis://redis:6379
LOG_LEVEL: info
RATE_LIMIT_WINDOW: 60000
RATE_LIMIT_MAX: 10
depends_on:
- rabbitmq
- redis
networks:
- backend
- frontend
deploy:
replicas: 2
resources:
limits:
memory: 512M
cpus: '0.5'
update_config:
order: start-first
failure_action: rollback
# Runner Service
runner-service:
build:
context: ./runner
dockerfile: Dockerfile.production
environment:
RABBITMQ_URL: amqp://${RABBITMQ_USER}:${RABBITMQ_PASS}@rabbitmq:5672/
REDIS_URL: redis://redis:6379
LOG_LEVEL: info
MAX_CONCURRENT_JOBS: 5
WORKSPACE_DIR: /tmp/workspaces
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- runner_workspaces:/tmp/workspaces
depends_on:
- rabbitmq
- redis
networks:
- backend
- execution
deploy:
replicas: 3
resources:
limits:
memory: 2G
cpus: '1.0'
placement:
constraints:
- node.role == worker
# Frontend
frontend:
build:
context: ./frontend
dockerfile: Dockerfile.production
environment:
REACT_APP_API_URL: http://api-backend:3001
REACT_APP_WS_URL: ws://api-backend:3001
depends_on:
- api-backend
networks:
- frontend
deploy:
replicas: 2
resources:
limits:
memory: 256M
cpus: '0.25'
# Load Balancer
nginx:
image: nginx:alpine
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
- ./ssl:/etc/ssl/certs
depends_on:
- frontend
- api-backend
networks:
- frontend
deploy:
replicas: 1
resources:
limits:
memory: 128M
cpus: '0.1'
volumes:
rabbitmq_data:
redis_data:
runner_workspaces:
networks:
frontend:
driver: overlay
backend:
driver: overlay
execution:
driver: overlay
internal: true
Kubernetes Deployment
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: code-compiler
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: code-compiler
data:
RABBITMQ_URL: "amqp://admin:password@rabbitmq:5672/"
REDIS_URL: "redis://redis:6379"
LOG_LEVEL: "info"
MAX_CONCURRENT_JOBS: "5"
---
# runner-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: runner-service
namespace: code-compiler
spec:
replicas: 3
selector:
matchLabels:
app: runner-service
template:
metadata:
labels:
app: runner-service
spec:
containers:
- name: runner
image: your-registry/runner-service:latest
envFrom:
- configMapRef:
name: app-config
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
volumeMounts:
- name: docker-sock
mountPath: /var/run/docker.sock
- name: workspaces
mountPath: /tmp/workspaces
securityContext:
runAsNonRoot: true
runAsUser: 1000
volumes:
- name: docker-sock
hostPath:
path: /var/run/docker.sock
- name: workspaces
emptyDir:
sizeLimit: 10Gi
---
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: runner-service-hpa
namespace: code-compiler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: runner-service
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Monitoring and Observability
# monitoring-stack.yaml
version: '3.8'
services:
# Prometheus
prometheus:
image: prom/prometheus:latest
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--storage.tsdb.retention.time=30d'
- '--web.enable-lifecycle'
volumes:
- ./prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
networks:
- monitoring
# Grafana
grafana:
image: grafana/grafana:latest
environment:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
volumes:
- grafana_data:/var/lib/grafana
- ./grafana/dashboards:/etc/grafana/provisioning/dashboards
- ./grafana/datasources:/etc/grafana/provisioning/datasources
networks:
- monitoring
# ELK Stack for Logs
elasticsearch:
image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
environment:
discovery.type: single-node
xpack.security.enabled: false
volumes:
- elasticsearch_data:/usr/share/elasticsearch/data
networks:
- logging
logstash:
image: docker.elastic.co/logstash/logstash:8.11.0
volumes:
- ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
depends_on:
- elasticsearch
networks:
- logging
kibana:
image: docker.elastic.co/kibana/kibana:8.11.0
environment:
ELASTICSEARCH_HOSTS: http://elasticsearch:9200
depends_on:
- elasticsearch
networks:
- logging
volumes:
prometheus_data:
grafana_data:
elasticsearch_data:
networks:
monitoring:
logging:
Key Learnings
1. Container Management is Complex
Key Challenges:
- Cold Start Problem: Container creation takes 2-5 seconds
- Resource Leaks: Containers not properly cleaned up
- State Management: Persistent vs ephemeral container strategies
- Network Isolation: Balancing security with functionality
Solutions Implemented:
- Container pooling with pre-warmed instances
- Automatic cleanup with garbage collection
- Persistent containers with workspace isolation
- Network-isolated execution environments
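The "automatic cleanup with garbage collection" point can be as small as a background goroutine that reaps containers nobody has used for a while. A sketch built on the ContainerPool types from the performance section (destroyContainer is the same helper assumed there):
// reapIdleContainers periodically destroys pooled containers that have been
// idle longer than maxIdle, freeing resources between bursts of traffic.
func (cp *ContainerPool) reapIdleContainers(maxIdle, interval time.Duration) {
    ticker := time.NewTicker(interval)
    defer ticker.Stop()

    for range ticker.C {
        cp.mutex.RLock()
        pools := make([]*LanguagePool, 0, len(cp.pools))
        for _, p := range cp.pools {
            pools = append(pools, p)
        }
        cp.mutex.RUnlock()

        for _, pool := range pools {
            var keep []*ContainerInstance
        drain:
            for {
                select {
                case inst := <-pool.available:
                    if time.Since(inst.LastUsed) > maxIdle {
                        cp.destroyContainer(inst) // assumed helper, same as in ReturnContainer
                    } else {
                        keep = append(keep, inst)
                    }
                default:
                    break drain
                }
            }
            // Put the still-fresh instances back into the pool.
            for _, inst := range keep {
                pool.available <- inst
            }
        }
    }
}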
2. Real-time Streaming Requires Careful Architecture
Technical Insights:
- WebSocket Management: Connection pooling and cleanup crucial
- Message Ordering: Ensure output lines arrive in sequence
- Buffer Management: Handle high-frequency output efficiently
- Connection Recovery: Graceful handling of network issues
Best Practices:
- Use Redis pub/sub for scalable streaming
- Implement connection heartbeats
- Buffer and batch small messages
- Provide fallback to polling for unreliable connections
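Buffering and batching on the runner side can be as simple as flushing accumulated lines when a batch fills up or a short timer fires, so a program printing thousands of lines per second doesn't become thousands of Redis publishes. A sketch, not tied to the article's exact types:
// lineBatcher coalesces output lines and hands each batch to publish, either
// when maxLines accumulate or when maxDelay elapses, whichever comes first.
type lineBatcher struct {
    lines    chan string
    maxLines int
    maxDelay time.Duration
    publish  func(batch []string)
}

func (b *lineBatcher) run() {
    var batch []string
    timer := time.NewTimer(b.maxDelay)
    defer timer.Stop()

    flush := func() {
        if len(batch) > 0 {
            b.publish(batch)
            batch = nil
        }
        timer.Reset(b.maxDelay)
    }

    for {
        select {
        case line, ok := <-b.lines:
            if !ok {
                flush() // channel closed: flush what's left and stop
                return
            }
            batch = append(batch, line)
            if len(batch) >= b.maxLines {
                flush()
            }
        case <-timer.C:
            flush()
        }
    }
}
Scanner goroutines send lines into b.lines; publish would JSON-encode each batch onto the job's execution:<jobID> channel.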
3. Security Cannot Be an Afterthought
Critical Security Measures:
- Defense in Depth: Multiple security layers
- Principle of Least Privilege: Minimal container permissions
- Resource Limits: Prevent resource exhaustion attacks
- Code Analysis: Static analysis before execution
Security Architecture:
┌─────────────────┐
│   Code Input    │
├─────────────────┤
│ Static Analysis │ ← First line of defense
├─────────────────┤
│  Rate Limiting  │ ← Prevent abuse
├─────────────────┤
│ Docker Sandbox  │ ← Isolation layer
├─────────────────┤
│ Resource Limits │ ← Resource protection
├─────────────────┤
│ Network Filter  │ ← Network restrictions
└─────────────────┘
4. Performance Optimization is Multi-Faceted
Optimization Areas:
- Container Lifecycle: Pool management and reuse
- Resource Allocation: Dynamic scaling based on load
- Queue Management: Fair distribution and priority handling
- Caching: Language environment and dependency caching
Performance Metrics to Track:
- Container startup time
- Execution latency
- Queue depth
- Resource utilization
- Success/failure rates
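Most of these are cheap to expose from the runner with the standard Prometheus Go client; the metric names below are ours, not an established convention:
import (
    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var (
    // Container startup time, to catch cold-start regressions.
    containerStartupSeconds = promauto.NewHistogram(prometheus.HistogramOpts{
        Name:    "runner_container_startup_seconds",
        Help:    "Time to start an execution container",
        Buckets: prometheus.DefBuckets,
    })

    // End-to-end execution latency, labelled per language.
    executionDurationSeconds = promauto.NewHistogramVec(prometheus.HistogramOpts{
        Name: "runner_execution_duration_seconds",
        Help: "Wall-clock time of user code execution",
    }, []string{"language"})

    // Success/failure counts, labelled by outcome.
    executionsTotal = promauto.NewCounterVec(prometheus.CounterOpts{
        Name: "runner_executions_total",
        Help: "Number of executions processed",
    }, []string{"language", "status"})

    // Job queue depth as observed by the consumer.
    jobQueueDepth = promauto.NewGauge(prometheus.GaugeOpts{
        Name: "runner_job_queue_depth",
        Help: "Messages waiting in the job queue",
    })
)
Record with executionsTotal.WithLabelValues("python", "success").Inc() after each job and expose everything over HTTP via promhttp.Handler().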
5. Production Reliability Requires Operational Excellence
Observability Stack:
- Metrics: Prometheus + Grafana for system health
- Logging: ELK stack for centralized log analysis
- Tracing: Distributed tracing for request flows
- Alerting: PagerDuty integration for critical issues
Deployment Strategies:
- Blue-green deployments for zero downtime
- Canary releases for gradual rollouts
- Circuit breakers for fault tolerance
- Auto-scaling based on queue depth and CPU usage
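Auto-scaling on queue depth needs something that actually reads the depth. The RabbitMQ management API exposes it per queue; here is a sketch (host, port, and credentials are assumptions matching the earlier examples):
import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/url"
)

// queueDepth asks the RabbitMQ management API (management plugin, port 15672)
// how many messages are waiting in a queue; an autoscaler can compare that
// against a per-replica target to decide how many runner instances to keep.
func queueDepth(host, user, pass, vhost, queue string) (int, error) {
    endpoint := fmt.Sprintf("http://%s:15672/api/queues/%s/%s",
        host, url.PathEscape(vhost), url.PathEscape(queue))
    req, err := http.NewRequest(http.MethodGet, endpoint, nil)
    if err != nil {
        return 0, err
    }
    req.SetBasicAuth(user, pass)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return 0, err
    }
    defer resp.Body.Close()

    var body struct {
        Messages int `json:"messages"` // ready + unacknowledged
    }
    if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
        return 0, err
    }
    return body.Messages, nil
}
On Kubernetes, KEDA's RabbitMQ scaler does essentially this for you, so you rarely need to hand-roll it.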
6. Language-Specific Considerations
Each programming language has unique requirements:
Python:
- Dependency management with pip
- Virtual environment isolation
- Import path security
- Package installation caching
Node.js:
- npm/yarn dependency resolution
- Module loading restrictions
- Event loop management
- Memory garbage collection
Go:
- Module system (go.mod)
- Build caching for faster compilation
- Static binary advantages
- Goroutine resource management
Rust:
- Cargo package management
- Compilation time optimization
- Memory safety guarantees
- Target architecture handling
Java:
- Classpath management
- JVM startup optimization
- Garbage collection tuning
- Security manager configuration
Conclusion
Building a production-ready online code compiler is a journey that touches every aspect of modern distributed systems engineering. From container orchestration to real-time streaming, from security isolation to performance optimization, each component requires careful consideration and robust implementation.
The key to success lies in:
- Robust Architecture: Design for failure and scale from day one
- Security First: Implement security at every layer
- Performance Focus: Optimize for user experience and resource efficiency
- Operational Excellence: Monitor, measure, and continuously improve
- Incremental Development: Start simple and add complexity gradually
The result should be a platform that feels immediate and reliable, allowing developers to focus on code rather than infrastructure. When users can execute code with the same confidence they have in their local development environment, you've achieved the goal of a truly powerful online code compiler.
Learning Resources
Essential Reading
Distributed Systems:
- "Designing Data-Intensive Applications" by Martin Kleppmann - Comprehensive guide to distributed system patterns
- "Building Microservices" by Sam Newman - Microservice architecture and communication patterns
- "Site Reliability Engineering" by Google - Production system reliability practices
Container Technologies:
- "Docker Deep Dive" by Nigel Poulton - Comprehensive Docker guide
- "Kubernetes in Action" by Marko Lukša - Kubernetes orchestration patterns
- "Container Security" by Liz Rice - Security best practices for containers
Real-time Systems:
- "High Performance Browser Networking" by Ilya Grigorik - WebSocket and real-time communication
- "Redis in Action" by Josiah Carlson - Redis patterns for real-time applications
Open Source Projects
Code Execution Platforms:
- Judge0 - Online code execution system
- HackerEarth API - Commercial code execution platform
- Glot.io - Simple code execution service
Container Management:
- Docker - Container runtime
- Podman - Alternative container runtime
- gVisor - Application kernel for containers
Message Queue Solutions:
- RabbitMQ - Feature-rich message broker
- Apache Kafka - High-throughput distributed streaming
- Redis - In-memory data structure store
Tools and Development Environment
Development Tools:
- Docker Desktop - Local container development
- Kubernetes KIND - Local Kubernetes development
- Minikube - Local Kubernetes cluster
Monitoring and Observability:
- Prometheus - Metrics collection and alerting
- Grafana - Metrics visualization and dashboards
- ELK Stack - Centralized logging and analysis
Testing Frameworks:
- Testcontainers - Integration testing with containers
- k6 - Load testing for APIs and WebSockets
- Artillery - Performance testing toolkit
With love from the Toki Space team
This tutorial represents our collective experience building Toki's code execution platform. The architecture and lessons shared here will help you build your own robust online code compiler. For questions or contributions, reach out to our engineering team at [email protected]