Technical depth / education

Master's thesis: breast cancer detection from histological images

A research prototype (not deployed) that lets a patient, family member, or X-ray tech upload a histology image and get an early read on whether a lump looks malignant or benign. Built as an MSc thesis because waiting on a doctor's appointment for a scan or a second opinion in Europe takes too long.

Employer / client: Dublin City University
Duration: MSc capstone, DCU
Project type: Master's thesis

GitHub: breast-cancer-detection · Final Report (PDF)

How it works

How this works

Bar chart comparing validation accuracy of CNN, VGG, ResNet, CNN-to-VGG hybrid, and VGG-to-CNN hybrid models. CNN scores 0.83, VGG 0.70, ResNet 0.69, CNN+VGG hybrid 0.80, VGG+CNN hybrid 0.77.
Validation accuracy across all five models. CNN was the strongest single architecture, the CNN-to-VGG hybrid the strongest stack.
Figure 7, Final Report

Bar chart comparing validation loss across the same five models. CNN 0.39, VGG 0.67, ResNet 0.62, CNN+VGG 0.51, VGG+CNN 0.51.
Validation loss across the same five models. The hybrids reduced loss versus standalone VGG and ResNet, but CNN held the lowest loss overall.
Figure 8, Final Report

Process flow

How I work the steps

01
before Dataset
Kaggle histopathology images, binary benign vs malignant labels.
Public dataset
02
control Preprocessing
Resize, normalise, augment (rotation, scaling, flipping), histogram equalisation.
BA / Researcher
03
control Model training
Custom CNN, transfer-learned VGG16 and ResNet50, two CNN/VGG hybrid stacks.
BA / Researcher
04
handoff Validation
Stratified k-fold cross-validation, statistical significance tests across architectures.
BA / Researcher
05
after Output
Benign vs malignant read with accuracy, sensitivity, specificity, and overfitting reporting.
Patient / clinician
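
The three headline metrics in step 05 all come from the same four confusion-matrix counts. Below is a minimal sketch in plain Python, assuming labels are encoded as 1 for malignant and 0 for benign. The `binary_metrics` function and `BinaryMetrics` class are illustrative names, not code from the thesis repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # share of malignant cases correctly flagged
    specificity: float  # share of benign cases correctly cleared


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity for binary labels.

    Assumption: `positive` marks the malignant class (1 by default).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # malignant, predicted malignant
            else:
                fn += 1  # malignant, missed
        else:
            if pred == positive:
                fp += 1  # benign, false alarm
            else:
                tn += 1  # benign, predicted benign

    total = tp + tn + fp + fn
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        # Guard against a split that happens to contain only one class.
        sensitivity=tp / (tp + fn) if (tp + fn) else float("nan"),
        specificity=tn / (tn + fp) if (tn + fp) else float("nan"),
    )


if __name__ == "__main__":
    # Toy labels for illustration only, not thesis data.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 0, 1, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, preds))
```

Reporting sensitivity and specificity next to accuracy matters in this setting because a missed malignant case and a false alarm carry very different costs, and accuracy alone hides which of the two a model is making.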

How I built it

Pulled a labelled histopathology image set from Kaggle and split it 80/20 into train and test. Preprocessing covered resizing, normalisation, and histogram equalisation, with rotation, scaling, and flipping augmentation to broaden the training distribution.
Built a custom CNN from scratch (convolutional, ReLU, pooling, and fully connected layers), tuning filter and kernel sizes for cell-level texture.
Fine-tuned pretrained VGG16 and ResNet50 with transfer learning, customising the heads for binary benign-vs-malignant classification.
Trained two hybrid stacks: CNN to VGG (CNN feature extraction feeding VGG depth) and VGG to CNN (the reverse order), to test whether mixing the architectures beat the best single model.
Used stratified k-fold cross-validation so every image served in both training and validation while each fold kept the benign/malignant ratio, and ran statistical-significance tests across architectures so the comparison rested on more than "this number was bigger".
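
To make the stratified k-fold step concrete, here is a minimal standard-library sketch of how such a split can be produced. It is a hand-rolled illustration under my own assumptions (round-robin dealing per class, a fixed seed, the hypothetical name `stratified_k_fold`). A real pipeline would more likely call a library splitter such as scikit-learn's `StratifiedKFold`, and this is not a copy of the thesis code.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs that preserve class ratios per fold."""
    if k < 2:
        raise ValueError("k must be at least 2")

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < k:
            raise ValueError(f"class {label!r} has fewer than {k} samples")
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold is the validation set once; the other k-1 folds form training.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


if __name__ == "__main__":
    # Toy labels for illustration only: 30 benign (0) and 20 malignant (1).
    toy_labels = [0] * 30 + [1] * 20
    for fold_no, (train, val) in enumerate(stratified_k_fold(toy_labels), 1):
        malignant = sum(toy_labels[i] for i in val)
        print(f"fold {fold_no}: {len(train)} train, {len(val)} val, "
              f"{malignant} malignant in val")
```

Stratifying keeps the benign/malignant ratio roughly constant across folds, so a fold-level accuracy difference between two architectures is less likely to come from one fold simply containing more of the easier class.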

Measured results

What I measured

83% accuracy

CNN (best single model)

Validation accuracy, validation loss 0.35.

80% accuracy

CNN to VGG hybrid

Validation accuracy, validation loss 0.30. Best hybrid.

77% accuracy

VGG to CNN hybrid

Validation accuracy, validation loss 0.32.

70% accuracy

VGG16 (transfer learning)

Validation accuracy, validation loss 0.45. Volatile loss curves.

69% accuracy

ResNet50 (transfer learning)

Validation accuracy, validation loss 0.50. Irregular training behaviour.

Findings

CNN came out on top at 83% validation accuracy with 0.35 validation loss, the best single model in the study.
CNN-to-VGG hybrid hit 80% accuracy at 0.30 loss, the second strongest on accuracy and the best of the hybrids.
VGG-to-CNN hybrid landed at 77% accuracy with 0.32 loss.
Standalone VGG16 and ResNet50 came in at 70% and 69% respectively, with more volatile loss curves and clear overfitting signals.
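
One way to check whether a gap between two architectures is larger than fold-to-fold noise is an exact paired sign-flip permutation test on per-fold accuracies. The sketch below is an assumption about how such a check could look, using only the standard library. The report's actual significance test may differ, and `paired_permutation_test` is a hypothetical name.

```python
from itertools import product


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip test on paired per-fold scores.

    Null hypothesis: both models perform equally, so the sign of each
    per-fold difference is arbitrary. Returns the p-value.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")

    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    extreme = 0
    total = 0
    # Enumerate all 2**n sign assignments (fine for the small n of k-fold CV).
    for signs in product((1, -1), repeat=n):
        total += 1
        flipped = sum(s * d for s, d in zip(signs, diffs)) / n
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(flipped) >= observed - 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Placeholder per-fold accuracies, invented for illustration only.
    # These are NOT results from the thesis.
    model_a = [0.84, 0.81, 0.85, 0.82, 0.83]
    model_b = [0.79, 0.80, 0.78, 0.81, 0.80]
    print(f"p-value: {paired_permutation_test(model_a, model_b):.4f}")
```

With only five folds the smallest p-value this exact test can return is 2/32 = 0.0625, so clearing a 0.05 threshold would need more folds or repeated cross-validation.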