Hiring an AI Engineer?
Sure, flashy RAG flows and multi-agent demos look cool—but the real challenge is building a reliable, cost-effective system that works in production. Here’s what I would actually want to know during interviews.
End-to-End System Design
Question: Can you design the full pipeline: data ingestion → preprocessing → model inference → serving?
What I’m looking for:
- Data pipelines (ETL tools, streaming vs batch)
- Model hosting (serverless vs containerized)
- API layers (REST/gRPC, WebSockets)
- Bottlenecks (I/O, network, compute) and mitigation (caching, sharding)
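A minimal sketch of the API layer, assuming FastAPI with an in-process LRU cache; `run_model` is a hypothetical stand-in for the real inference backend:

```python
# Minimal REST serving layer: FastAPI + an in-process response cache.
# `run_model` is a hypothetical stand-in for real inference.
from functools import lru_cache

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InferRequest(BaseModel):
    text: str

def run_model(text: str) -> str:
    # Placeholder: call your hosted model here (container, serverless, etc.).
    return f"echo: {text}"

@lru_cache(maxsize=10_000)  # cache identical prompts to cut repeat compute
def cached_inference(text: str) -> str:
    return run_model(text)

@app.post("/infer")
def infer(req: InferRequest) -> dict:
    return {"answer": cached_inference(req.text)}
```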
Cost Estimation & Optimization
Question: How would you estimate hosting, inference, and storage costs? How can you reduce them?
Details:
- Pricing models (per-token, per-hour GPU, storage IOPS)
- Trade-offs: smaller models, mixed precision, spot instances
- Auto-scaling strategies and cost alerts
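To make the arithmetic concrete, a back-of-envelope estimate; every price and volume below is an illustrative assumption, not a real price sheet:

```python
# Back-of-envelope monthly cost estimate; all numbers are hypothetical.
REQ_PER_DAY = 100_000
TOKENS_PER_REQ = 1_500            # prompt + completion
PRICE_PER_1K_TOKENS = 0.002       # USD, hypothetical API pricing
GPU_HOURS_PER_DAY = 24
GPU_PRICE_PER_HOUR = 1.20         # USD, hypothetical on-demand GPU
STORAGE_GB = 500
STORAGE_PRICE_PER_GB = 0.023      # USD/GB-month, hypothetical object storage

api_cost = REQ_PER_DAY * TOKENS_PER_REQ / 1_000 * PRICE_PER_1K_TOKENS * 30
gpu_cost = GPU_HOURS_PER_DAY * GPU_PRICE_PER_HOUR * 30
storage_cost = STORAGE_GB * STORAGE_PRICE_PER_GB

print(f"API tokens:  ${api_cost:,.0f}/mo")      # ~$9,000
print(f"Dedicated GPU: ${gpu_cost:,.0f}/mo")    # ~$864 + ops overhead
print(f"Storage:     ${storage_cost:,.2f}/mo")
```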
Latency vs. Quality Trade-offs
Question: How would you reduce latency? What’s an acceptable latency vs. quality compromise?
Techniques:
- Quantization, distillation, pruning
- Caching frequent responses
- Async pre-warming of models
- SLAs: 100ms vs 500ms vs 1s thresholds
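As one concrete latency lever, a minimal dynamic-quantization sketch in PyTorch (toy model; layer sizes are illustrative):

```python
# Dynamic int8 quantization of Linear layers in PyTorch: a quick way to
# shrink a model and cut CPU inference latency, traded against some quality.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 10))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
with torch.no_grad():
    print(quantized(x).shape)  # same interface, smaller/faster Linear ops
```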
Self-Hosted vs. API LLMs
Question: Do you really need self-hosted LLMs? When is it justified?
Considerations:
- Data privacy/regulatory requirements
- Cost at scale vs. API convenience
- Custom fine-tuning needs
- Maintenance overhead (updates, scaling)
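A rough break-even sketch for the self-host vs. API decision; all numbers are hypothetical:

```python
# Hypothetical break-even: at what daily token volume does a dedicated GPU
# undercut per-token API pricing? All numbers are illustrative assumptions.
GPU_COST_PER_DAY = 24 * 1.20          # USD, hypothetical on-demand GPU
API_PRICE_PER_1K_TOKENS = 0.002       # USD, hypothetical

break_even_tokens = GPU_COST_PER_DAY / API_PRICE_PER_1K_TOKENS * 1_000
print(f"Self-hosting pays off above ~{break_even_tokens:,.0f} tokens/day")
# ~14.4M tokens/day -- and that ignores engineering and maintenance
# overhead, which usually pushes the real break-even much higher.
```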
Fine-Tuning on User Behavior
Question: How would you collect user data, fine-tune models, and serve them?
Stack:
- Data capture (logs, feedback widgets)
- Frameworks (Hugging Face Trainer, LoRA, PEFT)
- Serving (SageMaker, KServe (formerly KFServing), custom FastAPI endpoints)
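A minimal LoRA setup sketch with Hugging Face PEFT, assuming GPT-2 as a stand-in base model; the hyperparameters are illustrative, not recommendations:

```python
# Minimal LoRA fine-tuning setup with Hugging Face PEFT.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

lora_cfg = LoraConfig(
    r=8,                        # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights train
# From here: feed captured user interactions to transformers.Trainer,
# then merge or serve the adapter alongside the frozen base model.
```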
Dataset Construction & MLOps
Question: How would you design the training dataset, loss function, and MLOps pipeline?
Key points:
- Labeling strategy (manual, weak supervision)
- Loss choices (cross-entropy, contrastive loss)
- CI/CD for models (GitHub Actions + DVC + Kubernetes)
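To make the loss choices concrete, a short PyTorch sketch of both: token-level cross-entropy for generation and an in-batch contrastive (InfoNCE-style) loss for embedding models. Shapes and temperature are illustrative:

```python
import torch
import torch.nn.functional as F

# Cross-entropy over a vocabulary (batch of 4, vocab of 100).
logits = torch.randn(4, 100)
targets = torch.randint(0, 100, (4,))
ce = F.cross_entropy(logits, targets)

# In-batch contrastive loss: the i-th query should match the i-th document.
q = F.normalize(torch.randn(8, 64), dim=-1)
d = F.normalize(torch.randn(8, 64), dim=-1)
sim = q @ d.T / 0.07                        # cosine similarities / temperature
contrastive = F.cross_entropy(sim, torch.arange(8))

print(ce.item(), contrastive.item())
```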
Database Selection
Question: Which database(s) would you choose for embeddings, metadata, and user data—and why?
Options:
- Vector DB (e.g., Pinecone, Qdrant) for similarity search
- SQL (PostgreSQL) for transactional data
- NoSQL (MongoDB, Redis) for fast key-value or session stores
- Hybrid architectures and consistency considerations
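A minimal similarity-search sketch against an in-memory Qdrant instance; collection name, vector size, and payloads are illustrative:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-memory instance for illustration
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.9, 0.1, 0.0], payload={"title": "a"}),
        PointStruct(id=2, vector=[0.9, 0.1, 0.0, 0.1], payload={"title": "b"}),
    ],
)
hits = client.search(
    collection_name="docs", query_vector=[0.1, 0.8, 0.2, 0.0], limit=1
)
print(hits[0].payload)  # nearest neighbor by cosine similarity
```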
Metrics & Monitoring
Question: What metrics would you track, and how?
Examples:
- Model performance: accuracy, perplexity, latency, throughput
- Business metrics: conversion rate, user engagement
- Tooling: Prometheus + Grafana, MLflow, Weights & Biases
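A minimal instrumentation sketch with `prometheus_client`, assuming Prometheus scrapes the process on port 8000; metric names are illustrative:

```python
# A latency histogram and a request counter, exposed over HTTP for scraping.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("inference_requests_total", "Total inference requests")
LATENCY = Histogram("inference_latency_seconds", "Inference latency")

@LATENCY.time()  # observes wall-clock time per call
def handle_request():
    REQUESTS.inc()
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for model inference

if __name__ == "__main__":
    start_http_server(8000)  # scrape http://localhost:8000/metrics
    while True:
        handle_request()
```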
System Debugging & Observability
Question: How would you monitor failures and debug them?
Tactics:
- Centralized logging (Elastic Stack, Splunk)
- Distributed tracing (OpenTelemetry)
- Alerting on error rates, timeouts, resource exhaustion
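A minimal tracing sketch with the OpenTelemetry SDK, exporting spans to the console; in production you'd swap in an OTLP exporter pointed at your collector:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("inference-service")

with tracer.start_as_current_span("request") as span:
    span.set_attribute("user.id", "u-123")       # illustrative attribute
    with tracer.start_as_current_span("retrieval"):
        pass  # vector search happens here
    with tracer.start_as_current_span("generation"):
        pass  # model call happens here
```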
Feedback Loops & Continuous Improvement
Question: How would you collect, track, and evaluate user feedback?
Approach:
- Online A/B testing frameworks
- User rating widgets and sentiment analysis
- Automated retraining triggers based on drift detection
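One way to wire a retraining trigger, sketched with hypothetical thresholds and rating windows:

```python
# Hypothetical retraining trigger: compare recent feedback to a baseline
# window and flag drift when average ratings drop past a threshold.
from statistics import mean

def should_retrain(baseline: list[float], recent: list[float],
                   min_samples: int = 100, max_drop: float = 0.3) -> bool:
    if len(recent) < min_samples:
        return False  # not enough signal yet
    return mean(baseline) - mean(recent) > max_drop

baseline_ratings = [4.2] * 500   # historical 1-5 star feedback
recent_ratings = [3.7] * 150     # last week's feedback

if should_retrain(baseline_ratings, recent_ratings):
    print("Drift detected: kick off the retraining pipeline")
```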
Determinism & Reproducibility
Question: How would you make the system more deterministic?
Strategies:
- Seed control in tokenizers and sampling
- Version-pinning models and dependencies (Conda, Poetry)
- Immutable artifacts (Docker images, model hashes)
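A typical seeding preamble, sketched for a PyTorch stack; note that GPU kernels can stay nondeterministic unless you also enable torch's deterministic mode:

```python
import os
import random

import numpy as np
import torch

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
# Error out on nondeterministic ops; on CUDA this may also require
# setting CUBLAS_WORKSPACE_CONFIG=:4096:8 in the environment.
torch.use_deterministic_algorithms(True)

# With Hugging Face generation, prefer greedy decoding for determinism:
# model.generate(input_ids, do_sample=False)
```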
Embedding Updates Without Downtime
Question: How would you swap embedding models and backfill vectors seamlessly?
Pattern:
- Blue/green deployment of new embeddings
- Incremental reindexing in vector DBs
- Feature-flag gating for gradual rollout
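The dual-write/flag-gated-read pattern, sketched with hypothetical index and flag objects rather than a real client API:

```python
# Hypothetical dual-index swap: write to both old and new indexes while the
# backfill runs, read from whichever the feature flag selects per user.
class DualIndex:
    def __init__(self, old_index, new_index, flags):
        self.old, self.new, self.flags = old_index, new_index, flags

    def upsert(self, doc_id, old_vec, new_vec):
        # Dual-write keeps both indexes current during reindexing.
        self.old.upsert(doc_id, old_vec)
        self.new.upsert(doc_id, new_vec)

    def search(self, query_old, query_new, user_id, k=10):
        # Feature flag gates the cutover by cohort; rollback is instant.
        if self.flags.enabled("new-embeddings", user_id):
            return self.new.search(query_new, k)
        return self.old.search(query_old, k)
```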
Fallback & Resilience
Question: What fallback mechanisms would you implement?
Ideas:
- Rule-based or keyword search backup
- Cached answers for common queries
- Circuit breakers to degrade gracefully under load
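A toy circuit breaker illustrating the graceful-degradation idea; thresholds and the fallback are placeholders:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, cooldown_s=30.0):
        self.max_failures, self.cooldown_s = max_failures, cooldown_s
        self.failures, self.opened_at = 0, 0.0

    def call(self, primary, fallback):
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.cooldown_s:
                return fallback()                   # open: skip the primary
            self.failures = self.max_failures - 1   # half-open: one trial call
        try:
            result = primary()
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()  # e.g. cached answer or keyword search

# Usage: breaker.call(lambda: llm_answer(q), lambda: keyword_search(q))
```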
The “Bonus” Fundamental Questions
- Without LLMs/Vector DBs: How would you solve the problem using classical IR, rules, or heuristics? (See the sketch after this list.)
- Deep Dive: Explain tokenization and embeddings from first principles.
- Fine-Tuning Mechanics: What happens during training—optimizers, learning rates, layer freezing?
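For the first bonus question, a classical-IR baseline really can be this small; a TF-IDF retrieval sketch with scikit-learn over a toy corpus:

```python
# TF-IDF retrieval with scikit-learn: no LLM or vector DB required.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "How to reset your password",
    "Billing and invoice questions",
    "Troubleshooting login failures",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)

query_vec = vectorizer.transform(["help with password reset"])
scores = cosine_similarity(query_vec, doc_matrix)[0]
print(docs[scores.argmax()])  # -> "How to reset your password"
```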
Why these matter:
Too many engineers build complex demos that never ship. I want candidates who understand the fundamentals, can design resilient systems, and can adapt when hype tools don’t fit.
Ready to build production-ready AI? Share your thoughts below!