Proof of Concept

Real-Time Fetal Heart Detection

A two-step AI pipeline that draws a color-coded guidance overlay on a live ultrasound feed to help sonographers capture high-quality diagnostic images of the fetal heart.

Abstract & Objectives

Background: Congenital Heart Defects (CHD) are among the most common birth defects, affecting nearly 1 in 100 births. Effective prenatal detection depends on acquiring high-quality ultrasound images of four standard cardiac diagnostic planes. Sonographer fatigue and variability in image quality remain significant barriers to consistent early detection.

Objective: To build and validate a proof-of-concept real-time guidance tool that detects the fetal heart in a live ultrasound feed, assesses the quality of the captured view, and displays a color-coded bounding-box overlay — GREEN, YELLOW, or RED — to guide the sonographer toward capturing a diagnostically useful image. The POC targets the four-chamber view (4CV) using the publicly available FOCUS dataset.

Approach: A lightweight two-model pipeline was designed to run entirely on consumer CPU hardware (no GPU required) to minimise deployment cost and maximise accessibility for under-resourced clinical settings.

System Architecture

The inference pipeline is deliberately minimal: two models followed by an overlay step, giving three sequential stages that together complete in approximately 80–150 ms per frame (roughly 7–12 frames per second) on an Intel i7-8665U with no GPU.

YOLO11n — Heart Localisation. A nano-scale YOLO model detects the fetal heart region and outputs a bounding box. Trained on the FOCUS and FPUS23 datasets using 5-fold cross-validation on Kaggle T4 GPUs.
EfficientNetV2-S — Image Quality Assessment. The region cropped by YOLO is passed to a fine-tuned EfficientNetV2-S classifier that assigns one of three quality labels: GREEN (diagnostically suitable), YELLOW (marginal), or RED (inadequate). Trained on 1,500 quality-labelled crops derived from the FOCUS dataset.
OpenCV — Real-Time Overlay. The bounding box is rendered on the live frame in the colour corresponding to the IQA output, giving the sonographer immediate visual feedback.
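The three stages above can be sketched as a single per-frame function. This is a minimal illustration, not the project's actual code: `detect` and `classify` are hypothetical callables standing in for the YOLO11n and EfficientNetV2-S models, and the outline is drawn with NumPy slicing as a dependency-light stand-in for OpenCV's `cv2.rectangle`. Colours are given in BGR order, as OpenCV expects.

```python
from typing import Callable, Optional, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates

# BGR colour per quality label (OpenCV channel order).
COLOURS = {"GREEN": (0, 255, 0), "YELLOW": (0, 255, 255), "RED": (0, 0, 255)}


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip a box to the frame so cropping never indexes out of range."""
    x1, y1, x2, y2 = box
    x1, x2 = max(0, min(x1, width - 1)), max(0, min(x2, width - 1))
    y1, y2 = max(0, min(y1, height - 1)), max(0, min(y2, height - 1))
    return x1, y1, x2, y2


def draw_box(frame: np.ndarray, box: Box, colour, thickness: int = 2) -> None:
    """Draw a rectangle outline in place (stand-in for cv2.rectangle)."""
    x1, y1, x2, y2 = box
    t = thickness
    frame[y1:y1 + t, x1:x2 + 1] = colour                   # top edge
    frame[max(y1, y2 - t + 1):y2 + 1, x1:x2 + 1] = colour  # bottom edge
    frame[y1:y2 + 1, x1:x1 + t] = colour                   # left edge
    frame[y1:y2 + 1, max(x1, x2 - t + 1):x2 + 1] = colour  # right edge


def process_frame(
    frame: np.ndarray,
    detect: Callable[[np.ndarray], Optional[Box]],  # hypothetical localiser
    classify: Callable[[np.ndarray], str],          # hypothetical IQA model
) -> Tuple[np.ndarray, Optional[str]]:
    """Localise the heart, grade the crop, and return an annotated copy."""
    height, width = frame.shape[:2]
    box = detect(frame)
    if box is None:
        return frame, None  # no heart found: show the frame unchanged
    x1, y1, x2, y2 = clamp_box(box, width, height)
    crop = frame[y1:y2 + 1, x1:x2 + 1]
    if crop.size == 0:
        return frame, None  # degenerate box: nothing to grade
    label = classify(crop)
    annotated = frame.copy()
    draw_box(annotated, (x1, y1, x2, y2), COLOURS[label])
    return annotated, label
```

In a live setting the annotated frame would be handed to the display loop, and frames with no detection would pass through unchanged.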

Demo Video

The video below demonstrates the POC pipeline running against sample frames from the FOCUS test set, showing the three quality states firing in real time.

Results

Step 1 — YOLO11n Localisation

Metric	Value
mAP@50 (mean, 5-fold CV)	0.582 ± 0.141
mAP@50 (best fold — Fold 2)	0.699
Precision (mean)	0.711
Recall (mean)	0.648
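For readers unfamiliar with the metric: mAP@50 treats a predicted box as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5. The sketch below shows that criterion for a single prediction and ground-truth pair. It is a simplified illustration (a full evaluation also ranks detections by confidence and matches each ground-truth box at most once), and the function names are chosen here for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A detection counts at the @50 level when IoU >= 0.5."""
    return iou(pred, truth) >= threshold
```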

Step 2 — EfficientNetV2-S IQA

Metric	Value
Val Accuracy (mean, 5-fold CV)	99.89% ± 0.21%
Test Accuracy (overall)	97.62%
Test Accuracy — GREEN	88.10%
Test Accuracy — YELLOW	100.00%
Test Accuracy — RED	100.00%
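The overall test accuracy is the support-weighted mean of the per-class accuracies, so the figures above constrain the class balance of the test set. The sketch below works through that arithmetic. The per-class counts used in the check are hypothetical, chosen only because they reproduce the rounded percentages; the actual test-set size is not reported here.

```python
from typing import Dict


def overall_accuracy(per_class_acc: Dict[str, float], support: Dict[str, int]) -> float:
    """Support-weighted mean of per-class accuracies."""
    total = sum(support.values())
    return sum(per_class_acc[c] * support[c] for c in support) / total


def implied_green_share(overall: float, green_acc: float) -> float:
    """Fraction of GREEN samples implied when all other classes score 100%."""
    return (1.0 - overall) / (1.0 - green_acc)


# Reported figures: overall 97.62%, GREEN 88.10%, YELLOW and RED 100%.
share = implied_green_share(0.9762, 0.8810)  # about 0.20

# Hypothetical counts consistent with the rounded figures (not reported data).
support = {"GREEN": 42, "YELLOW": 84, "RED": 84}
acc = {"GREEN": 37 / 42, "YELLOW": 1.0, "RED": 1.0}
check = overall_accuracy(acc, support)  # 205 / 210
```

Under these assumptions GREEN crops make up about a fifth of the test set, so a handful of GREEN errors moves the GREEN figure far more than the overall one.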

Inference latency (CPU, i7-8665U): ~80–150 ms/frame
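A latency figure like this can be obtained by timing the full per-frame step over repeated runs. The sketch below is one way to do that with the standard library. `step` is a hypothetical zero-argument callable wrapping the detect, classify, and overlay stages, and the warm-up count is an assumption rather than the protocol actually used.

```python
import time
from typing import Callable, List


def measure_latency_ms(step: Callable[[], None], runs: int = 50, warmup: int = 5) -> List[float]:
    """Time `step` repeatedly and return per-run latencies in milliseconds."""
    for _ in range(warmup):  # discard initial runs (lazy initialisation, cold caches)
        step()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-frame latency to the equivalent frames per second."""
    return 1000.0 / latency_ms
```

By this conversion, 80 ms per frame corresponds to 12.5 frames per second and 150 ms to about 6.7.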

Discussion & Next Steps

Key Findings

Both models exceeded their target thresholds. The YOLO11n localiser achieved a best-fold mAP@50 of 0.699 (mean 0.582 ± 0.141 across folds), indicating that heart detection is feasible on limited training data, although performance varies by fold. The EfficientNetV2-S quality classifier reached near-perfect validation accuracy (99.89%) and strong test accuracy (97.62%). Test accuracy on the clinically critical RED class was 100%, so no diagnostically inadequate test frame was labelled as acceptable. Because YELLOW also scored 100%, every test error was a GREEN frame assigned a lower quality label, meaning the classifier erred on the side of caution.

Limitations

The current POC covers only the four-chamber view (4CV) and has been validated solely on the FOCUS dataset. YOLO localisation performance shows moderate variance across folds (± 0.141), suggesting the model would benefit from a larger and more diverse training set. At 80–150 ms per frame, CPU-only inference also caps throughput at roughly 7–12 frames per second, which limits responsiveness in a live clinical setting.

Next Steps

Phase 4 will add a per-view capture checklist to the inference UI, with 4CV as the active slot and placeholders for LVOT, RVOT, and 3VV. Phase 5 will expand coverage to all four ISUOG standard cardiac planes using the SonoNet dataset. It will also include TensorRT / OpenVINO optimisation for real-time GPU-accelerated deployment, and clinical validation with expert sonographers.
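As a rough illustration of the planned checklist, the sketch below models one slot per view, with 4CV active and the other three as placeholders. All class and method names are hypothetical, and the rule that a slot is marked captured on a GREEN frame is an assumption made here, not a stated design decision.

```python
from dataclasses import dataclass, field
from typing import Dict, List

VIEWS = ("4CV", "LVOT", "RVOT", "3VV")


@dataclass
class ViewSlot:
    name: str
    active: bool = False    # only active slots accept captures
    captured: bool = False


def _default_slots() -> Dict[str, ViewSlot]:
    # 4CV is the only active slot; the rest are placeholders.
    return {v: ViewSlot(v, active=(v == "4CV")) for v in VIEWS}


@dataclass
class CaptureChecklist:
    slots: Dict[str, ViewSlot] = field(default_factory=_default_slots)

    def record(self, view: str, label: str) -> bool:
        """Mark an active slot as captured on a GREEN frame (assumed rule)."""
        slot = self.slots[view]
        if slot.active and label == "GREEN":
            slot.captured = True
        return slot.captured

    def remaining(self) -> List[str]:
        """Views that have not yet been captured, in checklist order."""
        return [name for name, slot in self.slots.items() if not slot.captured]
```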