HETEROGENEOUS CNN-TRANSFORMER ENSEMBLE WITH PROGRESSIVE MULTI-RESOLUTION TRAINING FOR MUSCULOSKELETAL RADIOGRAPH CLASSIFICATION
| dc.contributor.author | Adity | |
| dc.contributor.supervisor | Kumar, Sumit | |
| dc.contributor.supervisor | Mehta, Rajesh | |
| dc.date.accessioned | 2026-06-24T06:56:37Z | |
| dc.date.issued | 2026-06-23 | |
| dc.description.abstract | Musculoskeletal disorders affect one billion people worldwide, producing large volumes of X-ray radiographs that strain radiologists and lead to non-trivial interpretation errors. Current single-model approaches struggle to learn distinguishing features across seven anatomical regions with different class distributions simultaneously. Region-specific class imbalance makes this harder still, because classifiers are typically biased towards the majority classes unless countermeasures are taken during training. The proposed method addresses this by combining complementary CNN and transformer backbones into a heterogeneous ensemble. In this work, an ablation study is performed on the MURA dataset for fracture detection using EfficientNet-B4-NoisyStudent, ConvNeXt-Small, and Swin Transformer. Comparative analysis demonstrates that feature extraction in the ensemble is more robust than in any individual model, even when all three backbones are trained under identical progressive multi-resolution schedules, bone-specific augmentation, and focal-loss-based regularisation. Before being fed to the CNN models, the MURA images are denoised with Gaussian filtering and enhanced with contrast-limited adaptive histogram equalisation (CLAHE) to extract more relevant features and improve classification accuracy. Six ensemble fusion strategies, including weighted soft voting, rank-based averaging, stacking, and Nelder-Mead optimisation, are systematically evaluated and compared. Gradient-based saliency maps highlight the features most influential for model predictions. The experimental results show an accuracy of 84.78% and a sensitivity of 0.9152 across all seven classes of the dataset. The proposed ensemble approach achieves higher classification accuracy than the individual models. | |
| dc.identifier.uri | https://hdl.handle.net/10266/7284 | |
| dc.language.iso | en | |
| dc.title | HETEROGENEOUS CNN-TRANSFORMER ENSEMBLE WITH PROGRESSIVE MULTI-RESOLUTION TRAINING FOR MUSCULOSKELETAL RADIOGRAPH CLASSIFICATION | |
| dc.type | Thesis |
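
The abstract names weighted soft voting and rank-based averaging among the fusion strategies evaluated. The sketch below illustrates the general idea of both on made-up numbers; it is not the thesis's implementation. The function names, the example probabilities, and the hand-picked weights are all hypothetical.

```python
from typing import List, Sequence


def weighted_soft_vote(probs: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Fuse per-model probabilities with a normalised weighted average.

    probs[m][i] is model m's predicted probability for study i.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(probs[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probs)) / total
        for i in range(n)
    ]


def rank_average(probs: Sequence[Sequence[float]]) -> List[float]:
    """Fuse models by averaging each study's normalised rank per model.

    Ranks are scaled to [0, 1]; ties are broken by input order.
    """
    n = len(probs[0])
    fused = [0.0] * n
    for p in probs:
        order = sorted(range(n), key=lambda i: p[i])
        for rank, i in enumerate(order):
            fused[i] += rank / (n - 1) if n > 1 else 0.0
    return [f / len(probs) for f in fused]


# Hypothetical outputs of three backbones on three studies (illustrative only).
probs = [
    [0.9, 0.2, 0.6],  # e.g. an EfficientNet-style model
    [0.8, 0.4, 0.3],  # e.g. a ConvNeXt-style model
    [0.7, 0.1, 0.5],  # e.g. a Swin-style model
]
weights = [0.5, 0.3, 0.2]  # assumed weights, not taken from the thesis

soft = weighted_soft_vote(probs, weights)
ranked = rank_average(probs)
```

Soft voting preserves each model's confidence, so it is sensitive to how well the models are calibrated. Rank averaging discards the probability scale and keeps only the ordering, which can help when the backbones produce scores on different effective scales.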
Files
Original bundle
- Name:
- METHESIS-8024320008-ADITY (1).pdf
- Size:
- 2.85 MB
- Format:
- Adobe Portable Document Format
License bundle
- Name:
- license.txt
- Size:
- 1.87 KB
- Format:
- Item-specific license agreed upon to submission
- Description:
