AI-Enabled Oral Cancer Detection
Abstract
Oral cancer is a major health concern, especially in less-developed or resource-constrained environments. Its main causes include tobacco use, betel chewing, and poor oral hygiene, among other factors. Early detection of oral cancer symptoms can greatly improve survival rates. However, existing diagnostic methods rely heavily on limited unimodal data and may miss subtle indicators in resource-constrained settings. They also require large labelled datasets, which are difficult to obtain because of security and privacy concerns. Hence, there is a need for multimodal approaches that combine different data types (e.g., images with patient clinical data), which helps the model generalize and perform better. This thesis develops and evaluates a transformer-based multimodal pipeline that fuses histopathological images with structured metadata to improve oral cancer detection performance. Histopathological images are pre-processed and fed into pretrained Shifted Window Vision Transformer (Swin) and Data-Efficient Image Transformers (DeiT) pipelines to generate embeddings, while the structured metadata (e.g., demographics, risk factors) are cleaned, encoded, and embedded. A fusion network integrates these embeddings, and the combined model is trained end-to-end on a curated dataset. We used a publicly available dataset called NDB, which contains 237 histopathological images and related metadata; the images are split into patches for model training. Both multimodal approaches performed well, achieving accuracies of 92% with DeiT and 93% with the Swin Transformer. These results demonstrate that transformer-based fusion of images and metadata can mitigate data scarcity and enhance diagnostic reliability, offering a promising direction for practical oral cancer detection tools. Future work may integrate additional modalities (e.g., acoustic or IoT sensor data) and validate the pipeline on larger, multi-center cohorts.
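As a rough illustration of the data flow described in the abstract, the following minimal sketch shows patch extraction, metadata encoding, and a simple concatenation-based fusion. It is not the thesis implementation: the abstract does not specify the fusion network's design, so concatenation is an assumption, and the function names, the metadata fields (`age`, `tobacco_use`), and the category list are hypothetical. In the actual pipeline the image embedding would come from a pretrained Swin or DeiT model; here it is a placeholder vector.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]  # grayscale image stored as rows of pixel values

# Hypothetical category list for one risk-factor field (not from the thesis).
TOBACCO_CATEGORIES = ("never", "former", "current")


def extract_patches(image: Image, patch_size: int) -> List[Image]:
    """Split an image into non-overlapping square patches.

    Rows and columns that do not fill a whole patch are dropped.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch_size + 1, patch_size):
        for left in range(0, width - patch_size + 1, patch_size):
            patch = [row[left:left + patch_size]
                     for row in image[top:top + patch_size]]
            patches.append(patch)
    return patches


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Encode a categorical value as a one-hot vector."""
    return [1.0 if value == category else 0.0 for category in categories]


def encode_metadata(record: Dict[str, object]) -> List[float]:
    """Turn a structured metadata record into a numeric vector.

    Assumes hypothetical fields 'age' (years) and 'tobacco_use'.
    """
    scaled_age = float(record["age"]) / 100.0  # crude scaling, for illustration
    return [scaled_age] + one_hot(str(record["tobacco_use"]), TOBACCO_CATEGORIES)


def fuse(image_embedding: Sequence[float],
         metadata_vector: Sequence[float]) -> List[float]:
    """Fuse the two modalities by concatenation (an assumed fusion strategy)."""
    return list(image_embedding) + list(metadata_vector)


if __name__ == "__main__":
    image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    patches = extract_patches(image, patch_size=2)
    metadata = encode_metadata({"age": 50, "tobacco_use": "former"})
    placeholder_embedding = [0.1, 0.2]  # stands in for a Swin/DeiT embedding
    fused = fuse(placeholder_embedding, metadata)
    print(len(patches), len(fused))
```

In a trained model the fused vector would feed a classification head, and the whole stack would be optimized end-to-end as the abstract describes; the sketch only shows how the two modalities meet.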
