Experiment 2 Final Report: Iterative Improvements in Fetal Ultrasound Anomaly Detection
The objective of this experiment was to systematically improve upon the baseline Autoencoder (AE) model for anomaly detection in fetal ultrasound images by introducing and evaluating a series of advanced features and techniques.
We conducted a series of controlled experiments, each introducing a single change to the model or training process. This allowed us to isolate the impact of each change and make data-driven decisions about which improvements to incorporate into the final model. We used MLflow to track all experiments, parameters, and metrics.
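For reference, the tracking pattern looked roughly like the following sketch. The experiment name, run name, and hyperparameter values are illustrative placeholders rather than the actual values from these runs; the metric values shown are the ones reported for Exp 2.2 below.

```python
import mlflow

# Illustrative experiment/run names and hyperparameters (placeholders);
# the metric values are those reported for Exp 2.2 in the summary table.
mlflow.set_experiment("fetal-ultrasound-anomaly-detection")

with mlflow.start_run(run_name="exp_2_2_attention_unet"):
    mlflow.log_params({
        "architecture": "attention_unet",  # the single change for this run
        "learning_rate": 1e-4,             # placeholder value
        "batch_size": 32,                  # placeholder value
    })
    # ... train and evaluate the model ...
    mlflow.log_metric("roc_auc", 0.744)
    mlflow.log_metric("pr_auc", 0.958)
```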
| Experiment | Model / Change | ROC AUC | PR AUC |
|---|---|---|---|
| Baseline | Autoencoder | 0.705 | 0.953 |
| Exp 2.1 | U-Net | 0.700 | 0.952 |
| Exp 2.2 | Attention U-Net | 0.744 | 0.958 |
| Exp 2.3 | Advanced Augmentation | 0.722 | 0.954 |
| Exp 2.4 | Curriculum Learning | 0.730 | 0.956 |
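
All configurations were evaluated as reconstruction-based anomaly detectors: each image's reconstruction error serves as its anomaly score, and ROC AUC and PR AUC are computed against the anomaly labels. A minimal sketch of that evaluation, assuming a PyTorch model, a labelled test loader, and scikit-learn metrics (average precision is used here as the PR AUC estimate):

```python
import numpy as np
import torch
from sklearn.metrics import average_precision_score, roc_auc_score

@torch.no_grad()
def evaluate_anomaly_detector(model, loader, device="cuda"):
    """Score each image by mean reconstruction error and compute both AUCs."""
    model.eval()
    scores, labels = [], []
    for images, targets in loader:  # targets: 1 = anomalous, 0 = normal
        images = images.to(device)
        recon = model(images)
        # Per-image mean squared reconstruction error as the anomaly score.
        err = ((recon - images) ** 2).flatten(start_dim=1).mean(dim=1)
        scores.append(err.cpu().numpy())
        labels.append(targets.numpy())
    scores = np.concatenate(scores)
    labels = np.concatenate(labels)
    return {
        "roc_auc": roc_auc_score(labels, scores),
        "pr_auc": average_precision_score(labels, scores),  # AP as PR AUC estimate
    }
```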
Detailed Experiment Breakdowns
For each configuration (baseline Autoencoder, U-Net, Attention U-Net, Advanced Augmentation, and Curriculum Learning), three diagnostic figures were produced: the reconstruction-error distribution, the PR curve, and the ROC curve. [Figures omitted.]

Analysis
The results of our experiments show that the most significant improvement came from the introduction of the Attention U-Net in Experiment 2.2. This model achieved a ROC AUC of 0.744, up from the baseline AE's 0.705, with PR AUC rising from 0.953 to 0.958. This suggests that the attention mechanism was effective in helping the model focus on relevant features in the ultrasound images.
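For context, the core mechanism in an Attention U-Net is the attention gate, which uses the coarser decoder (gating) signal to reweight the encoder skip features before concatenation, suppressing irrelevant regions. Below is a minimal PyTorch sketch of a standard additive attention gate; channel sizes and layer details are illustrative and not necessarily those of the Exp 2.2 implementation.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Additive attention gate applied to a U-Net skip connection."""

    def __init__(self, gate_channels, skip_channels, inter_channels):
        super().__init__()
        self.w_gate = nn.Conv2d(gate_channels, inter_channels, kernel_size=1)
        self.w_skip = nn.Conv2d(skip_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, gate, skip):
        # gate: decoder features; skip: encoder features.
        # Both are assumed here to share the same spatial resolution.
        attn = torch.relu(self.w_gate(gate) + self.w_skip(skip))
        attn = torch.sigmoid(self.psi(attn))  # per-pixel weights in [0, 1]
        return skip * attn                    # suppress irrelevant regions
```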
The other experiments did not yield the expected improvements. The standard U-Net in Experiment 2.1 performed on par with the baseline AE (ROC AUC 0.700 vs. 0.705), and while the advanced data augmentations in Experiment 2.3 (0.722) and curriculum learning in Experiment 2.4 (0.730) both improved on the baseline, they underperformed the plain Attention U-Net (0.744).
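The exact curriculum used in Experiment 2.4 is not detailed here; a typical implementation ranks training samples by a difficulty proxy and grows the training pool over epochs. The following sketch assumes reconstruction error as the difficulty proxy and a linear pacing schedule, both of which are assumptions rather than the confirmed Exp 2.4 setup.

```python
import numpy as np

def curriculum_indices(difficulties, epoch, total_epochs, start_frac=0.25):
    """Indices of the training subset for one epoch of a linear curriculum.

    `difficulties` holds one difficulty score per training sample (assumed
    here to be the current reconstruction error). The subset starts with
    the easiest `start_frac` of the data and grows linearly until the full
    training set is used in the final epoch.
    """
    order = np.argsort(difficulties)  # easiest samples first
    frac = start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1)
    n_samples = max(1, int(min(1.0, frac) * len(order)))
    return order[:n_samples]
```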
Conclusion
Based on the results of this experiment series, the Attention U-Net from Experiment 2.2 is the best-performing model. It provides a clear improvement over the baseline (ROC AUC 0.744 vs. 0.705) and becomes our reference model for this task going forward.
For future work, we recommend further investigation into the following areas:
- Hyperparameter Tuning: A more thorough hyperparameter search for the Attention U-Net could lead to further performance gains.
- Alternative Architectures: Exploring other advanced architectures, such as Vision Transformers (ViTs), could also be a promising direction.
- Ensemble Methods: Combining the predictions of multiple models could lead to a more robust and accurate anomaly detection system; a simple score-averaging variant is sketched below.
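
As an illustration of the ensemble direction, one simple option is score-level averaging across reconstruction-based detectors. The sketch below z-normalizes each model's scores before averaging so that detectors with different error scales contribute equally; the function name and normalization choice are illustrative assumptions, not an implemented component of this experiment series.

```python
import numpy as np

def ensemble_anomaly_scores(per_model_scores):
    """Average per-image anomaly scores from several detectors.

    Each entry in `per_model_scores` is an array of anomaly scores
    (e.g. reconstruction errors) for the same set of images. Scores are
    z-normalized per model before averaging so that detectors with
    different error scales contribute equally.
    """
    normalized = []
    for scores in per_model_scores:
        scores = np.asarray(scores, dtype=float)
        normalized.append((scores - scores.mean()) / (scores.std() + 1e-8))
    return np.mean(normalized, axis=0)
```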