Train deep learning models that consume camera formats natively supported by target edge platforms, avoiding costly runtime conversions.
When deploying computer vision models to edge devices, there's often a mismatch between:
- Training data format: RGB images from standard datasets (ImageNet, COCO, etc.)
- Inference input format: Native camera/hardware formats (YUV, Bayer, BGR, RGBA)
Traditional approach: Convert camera output → RGB → Model inference
Problem: This conversion requires hardware (ISP, GPU, 2D accelerator) and adds latency.
Solution: Train the model to expect the native camera format directly.
```mermaid
graph LR
    subgraph training ["Training Time"]
        direction LR
        A1["RGB Dataset"] --> B1["CameraAdaptorTransform"] --> C1["Target Format<br>(e.g., YUYV, BGR)"] --> D1["Model with<br>CameraAdaptor"]
    end
    subgraph inference ["Inference Time"]
        direction LR
        A2["Camera/<br>Hardware"] --> D2["Model<br>(native format)"]
    end
    style training fill:#e3f2fd,stroke:#1976d2
    style inference fill:#e8f5e9,stroke:#4caf50
```
| Component | Purpose |
|---|---|
| CameraAdaptorTransform | Preprocessing: Convert RGB training data to target format |
| CameraAdaptor (PyTorch) | Model layer: Handle format-specific input processing |
| CameraAdaptor (TensorFlow) | Model layer: Handle format-specific input processing |
| CameraAdaptorConfig | Configuration and metadata for model export |

The CameraAdaptor layer does NOT perform color space conversion. Color conversion
is handled by CameraAdaptorTransform during training data loading:
- Training: CameraAdaptorTransform converts RGB images → target format (e.g., YUYV, BGR)
- Inference: Camera/ISP provides data directly in target format → no conversion needed

The CameraAdaptor layer only performs:
- Layout permutation (NHWC ↔ NCHW) when channels_last/channels_first is enabled
- Alpha channel dropping for RGBA/BGRA inputs
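
The two operations listed above are cheap tensor manipulations rather than color math. A minimal sketch in plain Python follows, using nested lists in place of tensors. The helper names drop_alpha and hwc_to_chw are hypothetical and are not part of the library, which operates on PyTorch or TensorFlow tensors.

```python
def drop_alpha(image_hwc):
    """Keep the first three channels of every pixel (RGBA -> RGB, BGRA -> BGR)."""
    return [[pixel[:3] for pixel in row] for row in image_hwc]


def hwc_to_chw(image_hwc):
    """Permute a height x width x channels image into channels x height x width."""
    channels = len(image_hwc[0][0])
    return [
        [[pixel[c] for pixel in row] for row in image_hwc]
        for c in range(channels)
    ]


# A 1x2 RGBA image: one red pixel and one green pixel, both fully opaque.
rgba = [[[255, 0, 0, 255], [0, 255, 0, 255]]]

rgb = drop_alpha(rgba)  # alpha removed, pixel values untouched
chw = hwc_to_chw(rgb)   # three planes, each 1x2
```

Neither helper changes pixel values, which is why the color conversion has to happen earlier, in the training data pipeline.
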
EdgeFirst CameraAdaptor is part of the EdgeFirst AI ecosystem:
- EdgeFirst HAL: Runtime library with optimized pre-processing pipelines for edge deployment. Use HAL for on-target inference and benchmarking of models trained with CameraAdaptor.
- EdgeFirst CameraAdaptor: Training library (this project) for creating models that accept native camera formats.
On-target benchmarks use edgefirst-hal to measure pre-processing pipelines under various CameraAdaptor configurations.
```bash
# Core library (numpy only)
pip install edgefirst-cameraadaptor

# With preprocessing support (OpenCV)
# Extras are quoted so that shells such as zsh do not expand the brackets
pip install "edgefirst-cameraadaptor[transform]"

# With PyTorch support
pip install "edgefirst-cameraadaptor[torch]"

# With TensorFlow support
pip install "edgefirst-cameraadaptor[tensorflow]"

# With PyTorch Lightning support
pip install "edgefirst-cameraadaptor[lightning]"

# Everything
pip install "edgefirst-cameraadaptor[all]"
```

Convert training images to your target camera format:
```python
from edgefirst.cameraadaptor import CameraAdaptorTransform

# Create transform for BGR format (RGB source by default)
transform = CameraAdaptorTransform("bgr")
bgr_frame = transform(rgb_frame)  # rgb_frame: RGB image from your data loader

# If using OpenCV's default BGR loading
transform = CameraAdaptorTransform("yuyv", source_format="bgr")
yuyv_frame = transform(bgr_frame)  # cv2.imread() returns BGR
```

Add the adaptor as the first layer of your model:
```python
from edgefirst.cameraadaptor.pytorch import CameraAdaptor
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self, adaptor="rgb"):
        super().__init__()
        self.adaptor = CameraAdaptor(adaptor)
        self.backbone = nn.Sequential(
            nn.Conv2d(CameraAdaptor.compute_output_channels(adaptor), 64, 3),
            # ... rest of your model
        )

    def forward(self, x):
        x = self.adaptor(x)
        return self.backbone(x)


# Model for RGBA input (4 channels -> 3 channels after adaptor)
model = MyModel(adaptor="rgba")
```

The TensorFlow equivalent:

```python
from edgefirst.cameraadaptor.tensorflow import CameraAdaptor
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 4))  # RGBA input
x = CameraAdaptor("rgba")(inputs)  # Drops alpha -> 3 channels
x = tf.keras.layers.Conv2D(64, 3, padding="same")(x)
# ... rest of your model
```

For models receiving data directly from camera pipelines in NHWC format:
```python
import torch
import tensorflow as tf

# PyTorch: accept channels-last input, convert to channels-first internally
from edgefirst.cameraadaptor.pytorch import CameraAdaptor

adaptor = CameraAdaptor("yuyv", channels_last=True)
x = torch.randn(1, 224, 224, 2)  # NHWC from camera
y = adaptor(x)  # Output: (1, 2, 224, 224) in NCHW

# TensorFlow: accept channels-first input if needed
from edgefirst.cameraadaptor.tensorflow import CameraAdaptor

layer = CameraAdaptor("yuyv", channels_first=True)
x = tf.random.normal((1, 2, 224, 224))  # NCHW
y = layer(x)  # Output: (1, 224, 224, 2) in NHWC
```

```yaml
# YOLOv8 model with RGBA input
backbone:
  - [-1, 1, CameraAdaptor, [rgba]]  # First layer
  - [-1, 1, Conv, [64, 3, 2]]
  # ... rest of backbone
```

Different image loading libraries return different formats:

| Library | Default Format | Transform Setup |
|---|---|---|
| PIL/Pillow | RGB | source_format="rgb" (default) |
| torchvision | RGB | source_format="rgb" (default) |
| OpenCV cv2.imread() | BGR | source_format="bgr" |
| OpenCV cv2.imread() with cv2.IMREAD_UNCHANGED | BGRA (when the file has an alpha channel) | source_format="bgra" |
| imageio | RGB | source_format="rgb" (default) |
| skimage | RGB | source_format="rgb" (default) |
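
Declaring the wrong source format silently swaps red and blue before any further conversion. The sketch below shows the effect on a luma value. It assumes BT.601 luma weights purely for illustration (the weights the library uses are not stated here), and luma_from_rgb is a hypothetical helper.

```python
def luma_from_rgb(pixel):
    """Approximate luma of an (R, G, B) pixel using BT.601 weights (assumed)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


red_as_rgb = (255, 0, 0)  # pure red as an RGB loader would return it
red_as_bgr = (0, 0, 255)  # the same pixel in BGR channel order

correct = luma_from_rgb(red_as_rgb)      # channel order matches the assumption
wrong = luma_from_rgb(red_as_bgr)        # BGR data misread as RGB
fixed = luma_from_rgb(red_as_bgr[::-1])  # reorder BGR -> RGB first
```

The mismatch produces no error, only a different value, so it tends to surface later as degraded accuracy rather than as a crash.
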
Important: OpenCV loads images as BGR by default. If you're using cv2.imread() without explicit conversion, set source_format="bgr":
```python
import cv2
from edgefirst.cameraadaptor import CameraAdaptorTransform

# CORRECT: Tell the transform your source is BGR
img = cv2.imread("image.jpg")
transform = CameraAdaptorTransform("yuyv", source_format="bgr")
yuyv = transform(img)
```

| Format | Input Channels | Output Channels | Description |
|---|---|---|---|
| RGB | 3 | 3 | Standard RGB |
| BGR | 3 | 3 | OpenCV native |
| RGBA | 4 | 3 | RGB + alpha (dropped) |
| BGRA | 4 | 3 | BGR + alpha (dropped) |
| YUYV | 2 | 2 | YUV 4:2:2, ch0=Y, ch1=UV |
- Roadmap: NV12, NV21 (semi-planar YUV 4:2:0)
- Roadmap: Bayer patterns (RGGB, BGGR, GRBG, GBRG)
See FORMATS.md for detailed format documentation.
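
To make the YUYV row of the table concrete, the sketch below packs one row of RGB pixels into the two-channel layout described there: channel 0 holds Y for every pixel, and channel 1 alternates U and V across each horizontal pixel pair. It assumes full-range BT.601 coefficients and averages chroma over each pair. Both choices are assumptions for illustration, and rgb_row_to_yuyv is a hypothetical helper, not the library's implementation.

```python
def _clamp(value):
    """Round to the nearest integer and clamp into the uint8 range."""
    return max(0, min(255, round(value)))


def rgb_to_yuv(r, g, b):
    """Full-range BT.601 RGB -> YUV (assumed coefficients), unrounded."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, u, v


def rgb_row_to_yuyv(row):
    """Pack a row of (R, G, B) pixels into [Y, U-or-V] pairs (4:2:2).

    The row width must be even because each pixel pair shares one U and one V.
    """
    if len(row) % 2:
        raise ValueError("YUYV needs an even number of pixels per row")
    packed = []
    for i in range(0, len(row), 2):
        y0, u0, v0 = rgb_to_yuv(*row[i])
        y1, u1, v1 = rgb_to_yuv(*row[i + 1])
        packed.append([_clamp(y0), _clamp((u0 + u1) / 2)])  # even pixel: Y0, U
        packed.append([_clamp(y1), _clamp((v0 + v1) / 2)])  # odd pixel: Y1, V
    return packed


# A black and a white pixel: chroma stays neutral, only luma differs.
yuyv = rgb_row_to_yuyv([(0, 0, 0), (255, 255, 255)])
```

Each output pixel carries two values instead of three, which is why the model's first layer sees 2 input channels for YUYV.
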
See PLATFORMS.md for i.MX platform-specific recommendations:
- i.MX 93: PXP outputs BGR - train models with BGR format
- i.MX 8M Plus: G2D outputs RGBA - use RGBA to auto-slice alpha
- i.MX 95: ISI/ISP pipeline considerations
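
The recommendations above can be captured in a small lookup when one training script targets several boards. The sketch is hypothetical: the dictionary, the platform keys, and recommended_adaptor are illustrative names, and i.MX 95 is left out because no single format is recommended for it above.

```python
# Recommended adaptor per platform, taken from the list above.
# Keys are illustrative labels, not identifiers defined by the library.
RECOMMENDED_ADAPTOR = {
    "imx93": "bgr",    # PXP outputs BGR
    "imx8mp": "rgba",  # G2D outputs RGBA; the adaptor drops alpha
}


def recommended_adaptor(platform, default="rgb"):
    """Return the suggested adaptor name, falling back to a default."""
    return RECOMMENDED_ADAPTOR.get(platform.lower(), default)


adaptor_name = recommended_adaptor("imx93")
```
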
Use CameraAdaptorConfig for model metadata:
```python
from edgefirst.cameraadaptor import CameraAdaptorConfig

config = CameraAdaptorConfig(
    adaptor="yuyv",
    input_dtype="uint8",  # For quantized models
    output_dtype="uint8",
)

# Embed in model metadata
metadata = config.to_metadata()
```

PyTorch Lightning callback:

```python
from pytorch_lightning import Trainer
from edgefirst.cameraadaptor.pytorch.lightning import create_callback

callback = create_callback("yuyv")
trainer = Trainer(callbacks=[callback])
```

Migrating imports from the Ultralytics integration:

```python
# Before
from ultralytics.edgefirst.camera.adaptor import CameraAdaptorTransform
from ultralytics.edgefirst.nn.modules import CameraAdaptor

# After
from edgefirst.cameraadaptor import CameraAdaptorTransform
from edgefirst.cameraadaptor.pytorch import CameraAdaptor
```

Migrating imports from DeepView ModelPack:

```python
# Before
from deepview.modelpack.datasets.color import ColorAdaptor
from deepview.modelpack.layers.conv2d import ColorAdaptor as TFColorAdaptor

# After
from edgefirst.cameraadaptor import CameraAdaptorTransform
from edgefirst.cameraadaptor.tensorflow import CameraAdaptor
```

Preprocessing transform for converting images to target formats.
```python
transform = CameraAdaptorTransform(
    adaptor="yuyv",       # Target format
    source_format="rgb",  # Source format (default: "rgb")
)
output = transform(image)  # or transform.convert(image)
```

Parameters:
- adaptor: Target color space (str or ColorSpace enum)
- source_format: Source color space from data loader (str or ColorSpace enum, default: "rgb")

Properties:
- adaptor: Target adaptor name (str)
- source_format: Source format name (str)
- channels: Output channel count
- input_channels: Source format channel count
- output_channels: Channels model backbone receives

```python
from edgefirst.cameraadaptor.pytorch import CameraAdaptor

adaptor = CameraAdaptor(
    adaptor="yuyv",       # Target format
    channels_last=False,  # True for NHWC input
)
output = adaptor(input_tensor)
```

Parameters:
- adaptor: Target color space (str or ColorSpace enum)
- channels_last: If True, input is NHWC, permuted to NCHW (default: False)

Static Methods:
- compute_input_channels(args): Get input channels from YAML args
- compute_output_channels(args): Get output channels from YAML args

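
The channel counts these static methods return follow the supported-formats table. A minimal sketch of that mapping is shown below. FORMAT_CHANNELS and the two functions are hypothetical stand-ins that accept a plain format name, whereas the library methods take YAML args.

```python
# (input_channels, output_channels) per format, copied from the formats table.
FORMAT_CHANNELS = {
    "rgb": (3, 3),
    "bgr": (3, 3),
    "rgba": (4, 3),  # alpha is dropped by the adaptor
    "bgra": (4, 3),  # alpha is dropped by the adaptor
    "yuyv": (2, 2),
}


def input_channels(fmt):
    """Channels the adaptor receives from the camera or data loader."""
    return FORMAT_CHANNELS[fmt.lower()][0]


def output_channels(fmt):
    """Channels the first backbone layer must accept."""
    return FORMAT_CHANNELS[fmt.lower()][1]


first_conv_in = output_channels("rgba")
```

Sizing the first convolution from the output count, not the input count, keeps the backbone correct for formats where the adaptor drops a channel.
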
```python
from edgefirst.cameraadaptor.tensorflow import CameraAdaptor

layer = CameraAdaptor(
    adaptor="yuyv",        # Target format (None for auto-detect)
    channels_first=False,  # True for NCHW input
)
output = layer(input_tensor)
```

Parameters:
- adaptor: Target color space (str, None for auto-detect)
- channels_first: If True, input is NCHW, permuted to NHWC (default: False)

```python
from edgefirst.cameraadaptor import CameraAdaptorConfig

config = CameraAdaptorConfig(
    adaptor="yuyv",
    input_dtype="float32",
    output_dtype="float32",
)
```

Properties:
- input_channels: Input channel count
- output_channels: Output channel count
- is_quantized: Whether config uses quantized dtypes

Methods:
- to_dict(): Convert to dictionary
- to_metadata(): Convert to model metadata format
- from_dict(data): Create from dictionary
- from_metadata(metadata): Create from model metadata

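
The dictionary round trip and the is_quantized property can be pictured with a small stand-in. ConfigSketch below is a hypothetical dataclass written for illustration. It is not CameraAdaptorConfig, its field names mirror the constructor arguments shown above, and treating uint8 and int8 as the quantized dtypes is an assumption.

```python
from dataclasses import asdict, dataclass

# Assumption: 8-bit integer dtypes count as quantized.
QUANTIZED_DTYPES = {"uint8", "int8"}


@dataclass
class ConfigSketch:
    adaptor: str
    input_dtype: str = "float32"
    output_dtype: str = "float32"

    @property
    def is_quantized(self):
        """True when either dtype is in the assumed quantized set."""
        return bool({self.input_dtype, self.output_dtype} & QUANTIZED_DTYPES)

    def to_dict(self):
        """Serialize the fields to a plain dictionary."""
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        """Rebuild a config from a dictionary produced by to_dict()."""
        return cls(**data)


original = ConfigSketch("yuyv", input_dtype="uint8", output_dtype="uint8")
restored = ConfigSketch.from_dict(original.to_dict())
```

A round trip like this is what lets an exported model carry its expected input format alongside the weights, so the runtime can check that the camera pipeline matches.
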
Apache 2.0