Experiment 4: Final Report

Hyperparameter Optimization and Cross-Dataset Evaluation

Executive Summary

This report details the process and outcomes of Experiment 4, which had two primary objectives: to optimize the hyperparameters of the Attention U-Net model for anomaly detection and to evaluate the performance of the optimized model on a held-out test set.

The hyperparameter tuning was successful in identifying an optimal set of parameters that significantly improved the model's performance on the validation set. However, the subsequent evaluation on a general fetal cancer dataset revealed that the model, in its current form, does not generalize well to out-of-distribution data. This is a critical finding that has led to a new, more targeted plan for moving forward.

This report covers the methodology and results of both the optimization and evaluation phases, and lays out a detailed plan for the next experiment, which will focus on training and evaluating the model on the correct, intended dataset (the FOCUS dataset) using a self-supervised approach.

Hyperparameter Optimization

Objective

The primary goal of this phase was to systematically find the optimal set of hyperparameters for the Attention U-Net model to maximize its performance in detecting anomalies. The baseline model from Experiment 2.2 was used as the starting point.

Methodology

A grid search was performed over a predefined set of hyperparameters. All experiments were tracked using MLflow under the experiment name Experiment_4_Hyperparameter_Tuning.

Hyperparameter	Search Space
Learning Rate	`[0.001, 0.0001, 0.00001]`
Batch Size	`[16, 32, 64]`
Number of Epochs	`[50, 100, 150]`
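
As a minimal sketch of how the grid above can be enumerated, the snippet below expands the search space into individual configurations. The dictionary keys and the `iter_grid` helper are illustrative assumptions, not names taken from the project code; only the values come from the table.

```python
from itertools import product

# Values copied from the search space table; the key names are assumed.
search_space = {
    "learning_rate": [0.001, 0.0001, 0.00001],
    "batch_size": [16, 32, 64],
    "num_epochs": [50, 100, 150],
}


def iter_grid(space):
    """Yield one dict per combination of hyperparameter values."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


grid = list(iter_grid(search_space))
print(len(grid))  # 3 * 3 * 3 = 27 configurations
```

Each of the 27 dictionaries would correspond to one training run tracked in MLflow.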

Results

The hyperparameter search was successful. The best performing model was identified by finding the run with the lowest validation loss.

Best Run ID: 2efb4fea95c24ae3b40ecaab11bf615f

Optimal Learning Rate: 0.001

Optimal Batch Size: 16

Optimal Number of Epochs: 100

Best Validation Loss: 0.00028956

The weights of the best performing model were saved to checkpoints/best_attention_unet.pth.
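
The selection rule described above (lowest validation loss wins) can be sketched as follows. The `select_best_run` helper, the `val_loss` key, and the example records are hypothetical and purely illustrative; they are not the project's code or its logged results.

```python
def select_best_run(runs, metric="val_loss"):
    """Return the run record with the lowest value of `metric`."""
    if not runs:
        raise ValueError("no runs to select from")
    return min(runs, key=lambda run: run[metric])


# Placeholder records for illustration only; NOT results from this experiment.
example_runs = [
    {"run_id": "run_a", "val_loss": 0.5},
    {"run_id": "run_b", "val_loss": 0.2},
    {"run_id": "run_c", "val_loss": 0.3},
]

best = select_best_run(example_runs)
print(best["run_id"])  # run_b
```

In practice the records would come from the MLflow experiment rather than a hand-written list.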

Cross-Dataset Evaluation

Objective

To assess its generalization performance, the optimized model was evaluated on a held-out test set from the `ultrasound_fetus_dataset`. This dataset targets general fetal cancer detection and differs from the FOCUS dataset on which the optimized model was trained. It was used to test whether the model can detect abnormalities beyond those explicitly covered in the training set.

Results

ROC AUC: 0.5230

Precision-Recall AUC: 0.8645

Confusion Matrix

	Predicted Normal	Predicted Abnormal
Actual Normal	30	20
Actual Abnormal	267	84
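
The confusion matrix can be turned into the usual summary rates with a few lines of arithmetic. The sketch below assumes "abnormal" is the positive class; the variable names are illustrative, and the counts are taken directly from the table.

```python
# Counts from the confusion matrix, with "abnormal" as the positive class.
tn, fp = 30, 20    # Actual Normal row
fn, tp = 267, 84   # Actual Abnormal row

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
recall = tp / (tp + fn)          # share of abnormal cases that were caught
specificity = tn / (tn + fp)     # share of normal cases correctly passed
precision = tp / (tp + fp)       # share of abnormal predictions that were right
prevalence = (tp + fn) / total   # share of abnormal cases in the test set

for name, value in [
    ("accuracy", accuracy),
    ("recall", recall),
    ("specificity", specificity),
    ("precision", precision),
    ("prevalence", prevalence),
]:
    print(f"{name}: {value:.4f}")
```

Recall comes out at roughly 0.24 and prevalence at roughly 0.88, which is useful context when reading the Precision-Recall AUC.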

Analysis

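The ROC AUC of 0.5230 is close to the 0.5 expected from random ranking, which indicates that the model has almost no ability to distinguish between normal and abnormal cases in this specific dataset. The Precision-Recall AUC of 0.8645 should not be read as strong performance: 351 of the 401 test cases (about 87.5%) are abnormal, so a classifier with no skill would be expected to score close to that prevalence. The high number of false negatives (267, against only 84 true positives, for a recall of about 23.9%) is a clear indication that the model is not generalizing to this new problem. This is not a failure of the model itself, but rather a confirmation that the model is highly specialized to the data it was trained on. This is an expected and valuable finding.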
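
To make the "near 0.5" reading concrete, ROC AUC can be interpreted as the probability that a randomly chosen abnormal case receives a higher anomaly score than a randomly chosen normal case. The sketch below computes it by direct pairwise comparison. The `pairwise_roc_auc` function and the example scores are hypothetical; the report does not state how its ROC AUC was computed, and this is not necessarily the method used.

```python
def pairwise_roc_auc(abnormal_scores, normal_scores):
    """Fraction of (abnormal, normal) pairs ranked correctly; ties count as half."""
    if not abnormal_scores or not normal_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in abnormal_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(abnormal_scores) * len(normal_scores))


# Made-up scores for illustration only; NOT outputs of the evaluated model.
print(pairwise_roc_auc([0.9, 0.8], [0.1, 0.2]))  # perfect separation: 1.0
print(pairwise_roc_auc([0.4, 0.6], [0.4, 0.6]))  # identical distributions: 0.5
```

A value of 0.5230 therefore means the model ranks an abnormal case above a normal one only slightly more often than a coin flip would.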

Conclusion and Path Forward

Experiment 4 met its objectives. We optimized the Attention U-Net model and, more importantly, gained a much clearer understanding of the project's datasets and the path forward.

The key takeaway is that our models currently have no ability to generalize abnormality detection. While they are tightly tuned to specific data and individual categories of abnormalities, additional data and more flexible forms of identification will be needed for generalized anomaly detection in any given fetal scan.

Our next steps will be to execute a new experiment, Experiment 5: Self-Supervised Anomaly Detection on the FOCUS Dataset. This experiment will leverage the segmentation masks in the FOCUS dataset to create a proxy for anomaly detection, allowing us to train and evaluate our anomaly detection model on the intended data.

This will be the first step towards our ultimate goal of a hybrid system that combines a general anomaly detection model with a specialized segmentation model.