Hasanul Mukit

What I Would Want to Know When Interviewing an AI Engineer

Hiring an AI Engineer?
Sure, flashy RAG flows and multi-agent demos look cool—but the real challenge is building a reliable, cost-effective system that works in production. Here’s what I would actually want to know during interviews.

End-to-End System Design

Question: Can you design a full pipeline covering data ingestion → preprocessing → model inference → serving?

  • What I’m looking for:
    • Data pipelines (ETL tools, streaming vs batch)
    • Model hosting (serverless vs containerized)
    • API layers (REST/gRPC, WebSockets)
    • Bottlenecks (I/O, network, compute) and mitigation (caching, sharding)
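
To make the serving piece concrete, here is a minimal sketch of an API layer with a naive in-process cache; `run_inference` is a hypothetical stand-in for the real model call, and ingestion/preprocessing are left out.

```python
# Minimal FastAPI serving layer; the dict cache illustrates one bottleneck mitigation.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
cache: dict[str, str] = {}

class Query(BaseModel):
    text: str

def run_inference(text: str) -> str:
    # placeholder: local model forward pass or a remote LLM API call
    return f"echo: {text}"

@app.post("/predict")
async def predict(q: Query) -> dict:
    if q.text in cache:                  # caching hides repeated compute
        return {"answer": cache[q.text], "cached": True}
    answer = run_inference(q.text)
    cache[q.text] = answer
    return {"answer": answer, "cached": False}
```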

Cost Estimation & Optimization

Question: How would you estimate hosting, inference, and storage costs? How can you reduce them?

  • Details:
    • Pricing models (per-token, per-hour GPU, storage IOPS)
    • Trade-offs: smaller models, mixed precision, spot instances
    • Auto-scaling strategies and cost alerts
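
A back-of-the-envelope cost model is usually enough to anchor this discussion. The numbers below are made-up placeholders, not vendor quotes:

```python
# Rough monthly cost estimate under per-token pricing; all prices are hypothetical.
MONTHLY_REQUESTS = 2_000_000
TOKENS_PER_REQUEST = 1_500                 # prompt + completion
PRICE_PER_1K_TOKENS = 0.002                # USD
STORAGE_GB, PRICE_PER_GB = 500, 0.10       # embeddings + logs

inference = MONTHLY_REQUESTS * TOKENS_PER_REQUEST / 1_000 * PRICE_PER_1K_TOKENS
storage = STORAGE_GB * PRICE_PER_GB
smaller_model = inference * 0.4            # e.g. routing easy queries to a cheaper model

print(f"Inference: ${inference:,.0f}/mo, storage: ${storage:,.0f}/mo")
print(f"With a smaller model on easy queries: ~${smaller_model:,.0f}/mo")
```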

Latency vs. Quality Trade-offs

Question: How would you reduce latency? What’s an acceptable latency vs. quality compromise?

  • Techniques:
    • Quantization, distillation, pruning
    • Caching frequent responses
    • Async pre-warming of models
    • SLAs: 100ms vs 500ms vs 1s thresholds
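
As one concrete lever, here is a minimal sketch of caching frequent responses with a TTL; `generate` stands in for the real (slow) model call:

```python
import time

_CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 300

def cached_answer(query: str, generate) -> str:
    """Serve a fresh cached answer if available, otherwise hit the model."""
    now = time.time()
    hit = _CACHE.get(query)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]                      # fast path: no model call
    answer = generate(query)               # slow path: full inference latency
    _CACHE[query] = (now, answer)
    return answer
```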

Self-Hosted vs. API LLMs

Question: Do you really need self-hosted LLMs? When is it justified?

  • Considerations:
    • Data privacy/regulatory requirements
    • Cost at scale vs. API convenience
    • Custom fine-tuning needs
    • Maintenance overhead (updates, scaling)
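
A quick break-even estimate often settles this debate. The sketch below uses hypothetical prices to find the monthly token volume where dedicated GPUs start beating per-token API pricing:

```python
# All figures are illustrative placeholders, not real quotes.
API_PRICE_PER_1K_TOKENS = 0.002            # USD
GPU_MONTHLY_COST = 2 * 720 * 1.10          # two on-demand GPUs, 24/7
OPS_OVERHEAD = 1_000                       # rough monthly cost of upkeep

breakeven_tokens = (GPU_MONTHLY_COST + OPS_OVERHEAD) / API_PRICE_PER_1K_TOKENS * 1_000
print(f"Self-hosting breaks even above ~{breakeven_tokens / 1e6:,.0f}M tokens/month")
```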

Fine-Tuning on User Behavior

Question: How would you collect user data, fine-tune models, and serve them?

  • Stack:
    • Data capture (logs, feedback widgets)
    • Frameworks (Hugging Face Trainer, LoRA, PEFT)
    • Serving (SageMaker, KServe (formerly KFServing), custom FastAPI endpoints)
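
For the fine-tuning step, a LoRA setup with Hugging Face and PEFT is a common baseline. This is a minimal sketch; the checkpoint, hyperparameters, and `train_dataset` (built from the captured logs/feedback) are assumptions:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"                     # any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)                    # only adapter weights train

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=4,
                           num_train_epochs=1, learning_rate=2e-4),
    train_dataset=train_dataset,                       # tokenized user interactions (assumed)
)
trainer.train()
```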

Dataset Construction & MLOps

Question: How would you design the training dataset, loss function, and MLOps pipeline?

  • Key points:
    • Labeling strategy (manual, weak supervision)
    • Loss choices (cross-entropy, contrastive loss)
    • CI/CD for models (GitHub Actions + DVC + Kubernetes)
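
A tiny PyTorch illustration of the two loss choices mentioned above; the shapes are arbitrary and the triplet-margin loss is used here as a stand-in for a contrastive-style objective:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)                      # batch of 8, 10 classes
labels = torch.randint(0, 10, (8,))
ce = F.cross_entropy(logits, labels)             # standard classification loss

anchor, positive, negative = (torch.randn(8, 64) for _ in range(3))
contrastive_style = F.triplet_margin_loss(anchor, positive, negative, margin=1.0)
```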

Database Selection

Question: Which database(s) would you choose for embeddings, metadata, and user data—and why?

  • Options:
    • Vector DB (e.g., Pinecone, Qdrant) for similarity search
    • SQL (PostgreSQL) for transactional data
    • NoSQL (MongoDB, Redis) for fast key-value or session stores
    • Hybrid architectures and consistency considerations
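
As a concrete example of the vector-DB side, here is a minimal Qdrant sketch (assuming a local instance); transactional user data would stay in PostgreSQL, with the vector payload holding only references back to it:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[PointStruct(id=1, vector=[0.1] * 384, payload={"doc_id": "a1", "user_id": 42})],
)
hits = client.search(collection_name="docs", query_vector=[0.1] * 384, limit=5)
```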

Metrics & Monitoring

Question: What metrics would you track, and how?

  • Examples:
    • Model performance: accuracy, perplexity, latency, throughput
    • Business metrics: conversion rate, user engagement
    • Tooling: Prometheus + Grafana, MLflow, Weights & Biases
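
On the tooling side, exposing custom metrics for Prometheus takes only a few lines; Grafana then scrapes and charts them. The metric names below are made up for illustration:

```python
from prometheus_client import Counter, Histogram, start_http_server
import random, time

REQUESTS = Counter("inference_requests_total", "Total inference requests")
LATENCY = Histogram("inference_latency_seconds", "End-to-end inference latency")

@LATENCY.time()
def handle_request() -> str:
    REQUESTS.inc()
    time.sleep(random.uniform(0.05, 0.2))        # stand-in for the real model call
    return "ok"

if __name__ == "__main__":
    start_http_server(8000)                      # serves /metrics for Prometheus
    while True:
        handle_request()
```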

System Debugging & Observability

Question: How would you monitor failures and debug them?

  • Tactics:
    • Centralized logging (Elastic Stack, Splunk)
    • Distributed tracing (OpenTelemetry)
    • Alerting on error rates, timeouts, resource exhaustion
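
A minimal OpenTelemetry sketch: put spans around the retrieval and generation steps so slow hops show up in traces. The exporter here just prints to the console; a real setup would ship spans to a collector:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("ai-service")

def answer(query: str) -> str:
    with tracer.start_as_current_span("retrieval"):
        docs = ["..."]                           # vector DB lookup would go here
    with tracer.start_as_current_span("generation") as span:
        span.set_attribute("docs.count", len(docs))
        return "generated answer"                # model call would go here
```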

Feedback Loops & Continuous Improvement

Question: How would you collect, track, and evaluate user feedback?

  • Approach:
    • Online A/B testing frameworks
    • User rating widgets and sentiment analysis
    • Automated retraining triggers based on drift detection
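
For the retraining trigger, a simple distribution comparison goes a long way. Here is a minimal sketch using a two-sample KS test on feedback scores; the significance threshold is an assumption:

```python
from scipy.stats import ks_2samp

def should_retrain(reference_scores, recent_scores, alpha: float = 0.01) -> bool:
    """Flag drift when the recent score distribution differs from the reference."""
    _, p_value = ks_2samp(reference_scores, recent_scores)
    return p_value < alpha

# e.g. should_retrain(last_month_ratings, this_week_ratings)
```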

Determinism & Reproducibility

Question: How would you make the system more deterministic?

  • Strategies:
    • Seed control in tokenizers and sampling
    • Version-pinning models and dependencies (Conda, Poetry)
    • Immutable artifacts (Docker images, model hashes)
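
A minimal seed-pinning helper (assuming a PyTorch stack); for generation, prefer greedy decoding (`do_sample=False` or temperature 0) so identical prompts produce identical outputs:

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True   # trade some speed for repeatability
    torch.backends.cudnn.benchmark = False

set_seed()
```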

Embedding Updates Without Downtime

Question: How would you swap embedding models and backfill vectors seamlessly?

  • Pattern:
    • Blue/green deployment of new embeddings
    • Incremental reindexing in vector DBs
    • Feature-flag gating for gradual rollout
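
The pattern, at sketch level: write new vectors into a fresh index, backfill in batches, then flip the pointer the serving path reads. `index_new.add`, `embed_v2`, and `fetch_all_documents` are hypothetical placeholders:

```python
ACTIVE_COLLECTION = "docs_v1"                    # what the serving path queries

def backfill(index_new, embed_v2, fetch_all_documents, batch_size: int = 256):
    docs = fetch_all_documents()
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        index_new.add(ids=[d["id"] for d in batch],
                      vectors=embed_v2([d["text"] for d in batch]))

def cut_over():
    global ACTIVE_COLLECTION
    ACTIVE_COLLECTION = "docs_v2"                # flip only after recall/latency checks pass
```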

Fallback & Resilience

Question: What fallback mechanisms would you implement?

  • Ideas:
    • Rule-based or keyword search backup
    • Cached answers for common queries
    • Circuit breakers to degrade gracefully under load
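
A minimal fallback chain with a crude circuit breaker; `llm`, `keyword_search`, and `cache` are hypothetical callables/stores:

```python
FAILURES, THRESHOLD = 0, 3

def answer(query: str, llm, keyword_search, cache) -> str:
    global FAILURES
    if FAILURES >= THRESHOLD:                    # breaker open: skip the flaky dependency
        return cache.get(query) or keyword_search(query)
    try:
        result = llm(query)
        FAILURES = 0                             # success closes the breaker again
        return result
    except Exception:
        FAILURES += 1
        return cache.get(query) or keyword_search(query)
```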

The “Bonus” Fundamental Questions

  • Without LLMs/Vector DBs: How would you solve the problem using classical IR, rules, or heuristics?
  • Deep Dive: Explain tokenization and embeddings from first principles.
  • Fine-Tuning Mechanics: What happens during training—optimizers, learning rates, layer freezing?
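
For the first bonus question, classical IR is often enough. A minimal TF-IDF retrieval sketch with scikit-learn, using toy documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["reset your password from the settings page",
        "billing runs on the first of each month",
        "export your data as CSV from the dashboard"]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)

def search(query: str, top_k: int = 2):
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return sorted(zip(docs, scores), key=lambda p: p[1], reverse=True)[:top_k]

print(search("how do I change my password?"))
```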

Why these matter:

Too many engineers build complex demos that never ship. I want candidates who understand the fundamentals, can design resilient systems, and can adapt when hype tools don’t fit.

Ready to build production-ready AI? Share your thoughts below!
