Experiment 4: Final Report
Hyperparameter Optimization and Cross-Dataset Evaluation
This report details the process and outcomes of Experiment 4, which had two primary objectives: to optimize the hyperparameters of the Attention U-Net model for anomaly detection and to evaluate the performance of the optimized model on a held-out test set.
The hyperparameter tuning was successful in identifying an optimal set of parameters that significantly improved the model's performance on the validation set. However, the subsequent evaluation on a general fetal cancer dataset revealed that the model, in its current form, does not generalize well to out-of-distribution data. This is a critical finding that has led to a new, more targeted plan for moving forward.
This report will cover the methodology used for both the optimization and evaluation, the results of both phases, and a detailed plan for the next experiment, which will focus on training and evaluating the model on the correct, intended dataset (the FOCUS dataset) using a self-supervised approach.
Objective
The primary goal of this phase was to systematically find the optimal set of hyperparameters for the Attention U-Net model to maximize its performance in detecting anomalies. The baseline model from Experiment 2.2 was used as the starting point.
Methodology
A grid search was performed over a predefined set of hyperparameters. All experiments were tracked using MLflow under the experiment name `Experiment_4_Hyperparameter_Tuning`.
| Hyperparameter | Search Space |
|---|---|
| Learning Rate | [0.001, 0.0001, 0.00001] |
| Batch Size | [16, 32, 64] |
| Number of Epochs | [50, 100, 150] |
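For reference, the tuning loop followed this general pattern. The sketch below is illustrative rather than the exact script used: `train_attention_unet` is a hypothetical helper assumed to train one configuration and return its best validation loss.

```python
import itertools
import mlflow

mlflow.set_experiment("Experiment_4_Hyperparameter_Tuning")

learning_rates = [1e-3, 1e-4, 1e-5]
batch_sizes = [16, 32, 64]
epoch_counts = [50, 100, 150]

for lr, batch_size, epochs in itertools.product(learning_rates, batch_sizes, epoch_counts):
    with mlflow.start_run():
        # Log the configuration for this grid point.
        mlflow.log_params({"learning_rate": lr, "batch_size": batch_size, "epochs": epochs})
        # train_attention_unet is a hypothetical helper assumed to return the
        # best validation loss observed while training this configuration.
        val_loss = train_attention_unet(lr=lr, batch_size=batch_size, epochs=epochs)
        mlflow.log_metric("val_loss", val_loss)
```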
Results
The hyperparameter search was successful. The best-performing configuration was identified as the run with the lowest validation loss.
Best Run ID: 2efb4fea95c24ae3b40ecaab11bf615f
Optimal Learning Rate: 0.001
Optimal Batch Size: 16
Optimal Number of Epochs: 100
Best Validation Loss: 0.00028956
The weights of the best-performing model were saved to `checkpoints/best_attention_unet.pth`.
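Assuming the validation loss was logged under a metric name such as `val_loss`, the best run can be retrieved from the MLflow tracking store along these lines (a sketch, not the exact selection script used):

```python
import mlflow

# Locate the tuning experiment and sort its runs by validation loss, ascending.
experiment = mlflow.get_experiment_by_name("Experiment_4_Hyperparameter_Tuning")
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.val_loss ASC"],
    max_results=1,
)

# search_runs returns a pandas DataFrame; the first row is the best run.
best_run = runs.iloc[0]
print(best_run["run_id"], best_run["metrics.val_loss"])
```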
Objective
To assess the generalization performance of the optimized model, it was evaluated on a held-out test set from the `ultrasound_fetus_dataset`. This dataset is intended for general fetal cancer detection and is distinct from the FOCUS dataset on which the optimized model was trained. It was used to test the model's ability to detect abnormalities beyond those explicitly covered in the training set.
Results
ROC AUC: 0.5230
Precision-Recall AUC: 0.8645
Confusion Matrix
| | Predicted Normal | Predicted Abnormal |
|---|---|---|
| Actual Normal | 30 | 20 |
| Actual Abnormal | 267 | 84 |
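For reference, these metrics can be reproduced with scikit-learn along the following lines. The anomaly scores and decision threshold below are placeholders for illustration (in the actual evaluation they come from the model's outputs), and Precision-Recall AUC is approximated here by average precision.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, confusion_matrix

# Labels: 0 = normal, 1 = abnormal (50 normal, 351 abnormal, as in the test set).
y_true = np.concatenate([np.zeros(50), np.ones(351)]).astype(int)

# Placeholder per-image anomaly scores; in practice these would be derived from
# the Attention U-Net's outputs (e.g. a per-image reconstruction error).
rng = np.random.default_rng(0)
scores = rng.random(y_true.shape[0])

roc_auc = roc_auc_score(y_true, scores)
pr_auc = average_precision_score(y_true, scores)  # approximation of PR AUC

# Hypothetical threshold (median score) used only to produce a confusion matrix;
# rows are actual (normal, abnormal), columns are predicted (normal, abnormal).
y_pred = (scores >= np.median(scores)).astype(int)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

print(f"ROC AUC: {roc_auc:.4f}, PR AUC: {pr_auc:.4f}")
print(cm)
```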
Analysis
The ROC AUC of 0.5230 indicates that the model performs at roughly chance level when distinguishing between normal and abnormal cases in this dataset. The seemingly high Precision-Recall AUC largely reflects the class imbalance of the test set (351 abnormal versus 50 normal cases) rather than genuine discriminative ability. The high number of false negatives (267) is a clear indication that the model does not generalize to this new problem. This is not a failure of the model itself, but rather confirmation that the model is highly specialized to the data it was trained on. This is an expected and valuable finding.
Conclusion
Experiment 4 was a success. We optimized the Attention U-Net model and, more importantly, gained a much clearer understanding of the project's datasets and the path forward.
The key takeaway is that our models currently cannot generalize abnormality detection. They are tightly tuned to specific data and individual categories of abnormalities, so additional data and more flexible forms of identification will be needed for generalized anomaly detection in an arbitrary fetal scan.
Our next step is to execute a new experiment, Experiment 5: Self-Supervised Anomaly Detection on the FOCUS Dataset. This experiment will leverage the segmentation masks in the FOCUS dataset to create a proxy for anomaly detection, allowing us to train and evaluate our anomaly detection model on the intended data.
This will be the first step towards our ultimate goal of a hybrid system that combines a general anomaly detection model with a specialized segmentation model.
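As a rough illustration of what the mask-derived proxy could look like, the sketch below converts a segmentation mask into an image-level anomaly label, assuming that non-zero mask pixels mark the regions of interest; the exact labeling rule for Experiment 5 is still to be defined.

```python
import numpy as np

def image_level_label(mask: np.ndarray) -> int:
    """Derive a proxy anomaly label from a segmentation mask.

    Assumed labeling rule for illustration only: an image is treated as
    abnormal (1) if any pixel belongs to a non-background class (> 0),
    otherwise normal (0).
    """
    return int((mask > 0).any())

# Example: an empty mask yields 'normal', a mask with labelled pixels 'abnormal'.
normal_mask = np.zeros((128, 128), dtype=np.uint8)
abnormal_mask = normal_mask.copy()
abnormal_mask[40:60, 40:60] = 1
print(image_level_label(normal_mask), image_level_label(abnormal_mask))  # 0 1
```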