Interpreting decisions of AI models using XAI techniques

Abstract

This thesis explores the intersection of Artificial Intelligence, interpretability, and application-specific challenges across diverse domains, including hate speech detection, AI text classification, medical image analysis, and feature interpretability. The research underscores the critical need for robust, transparent, and high-performing AI models to address pressing challenges in these fields. In hate speech detection, this study evaluates various word embedding techniques (CountVectorizer, GloVe, and Bidirectional Encoder Representations from Transformers (BERT)) combined with machine learning and deep learning classifiers. The BERT-BiGRU model achieves an accuracy of 92%, with interpretability enhanced through eXplainable AI (XAI) techniques such as Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), which provide actionable insights into model predictions. Similarly, the growing prevalence of AI-generated content motivates an investigation into distinguishing human-authored text from AI-modified text. A comparative analysis highlights the effectiveness of word embeddings and text-level feature extraction techniques when integrated with machine learning classifiers, achieving high accuracy and F1-scores exceeding 91%. Transparency and interpretability are further improved using SHAP and LIME explanations. In medical image analysis, integrating XAI techniques such as LIME and Gradient-weighted Class Activation Mapping (Grad-CAM) with deep learning models enhances model transparency while maintaining clinical reliability. Performance metrics across tasks emphasize the importance of explainable models in healthcare, strengthening trust and applicability. To address the limitations of traditional interpretability methods, this thesis introduces two novel enhancements to LIME.
First, Radial Basis Function Based LIME (RBF-LIME) incorporates RBF interpolation so that the local decision boundary is modeled as a nonlinear relationship rather than assumed to be linear. Applied to stroke prediction using healthcare datasets, RBF-LIME demonstrates superior interpretability and effectiveness, outperforming conventional methods. Second, $l_2 + l_1$-LIME integrates Ridge and Lasso regression to improve feature selection, enhancing LIME’s stability and interpretability. This method reduces sensitivity to perturbations, particularly in categorical datasets, making it more reliable for high-stakes applications such as healthcare. Evaluations across multiple datasets confirm improved consistency in feature attributions, contributing to more transparent AI decision-making. By integrating XAI techniques with cutting-edge AI architectures, this thesis advances the development of transparent, reliable, and high-performing AI systems across domains. The proposed innovations pave the way for future research into generalizability, computational efficiency, and broader applicability in real-world scenarios.
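
As an illustration of what combining Ridge and Lasso penalties in a LIME-style explainer can look like, the following is a minimal sketch and not the thesis's implementation. It assumes an elastic-net-style objective (a weighted squared error plus both an $l_1$ and an $l_2$ penalty) fitted by coordinate descent on Gaussian perturbations around one instance. The names `black_box`, `soft_threshold`, and `explain`, along with every hyperparameter value, are hypothetical and chosen only for this example.

```python
import math
import random


def black_box(x):
    # Hypothetical model to explain: nonlinear in x[0], linear in x[1], ignores x[2].
    return math.tanh(2.0 * x[0]) + 0.5 * x[1]


def soft_threshold(value, threshold):
    # Proximal step for the l1 penalty: shrink toward zero, clip small values to zero.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def explain(instance, predict, n_samples=500, scale=0.5, kernel_width=1.0,
            l1=0.01, l2=0.01, n_iter=100, seed=0):
    """Fit a local linear surrogate with combined l1 and l2 penalties (assumed form)."""
    rng = random.Random(seed)
    d = len(instance)
    # 1. Sample Gaussian offsets around the instance and query the model.
    offsets = [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(n_samples)]
    preds = [predict([x + o for x, o in zip(instance, off)]) for off in offsets]
    # 2. Weight each sample by an exponential kernel on its distance to the instance.
    weights = [math.exp(-sum(o * o for o in off) / kernel_width ** 2) for off in offsets]
    total = sum(weights)
    # 3. Center with weighted means so no explicit intercept is needed.
    y_mean = sum(w * y for w, y in zip(weights, preds)) / total
    o_mean = [sum(w * off[j] for w, off in zip(weights, offsets)) / total for j in range(d)]
    Z = [[off[j] - o_mean[j] for j in range(d)] for off in offsets]
    y = [p - y_mean for p in preds]
    # 4. Coordinate descent on
    #    (1 / (2 * total)) * sum_i w_i * (y_i - z_i . beta)^2
    #    + l1 * ||beta||_1 + (l2 / 2) * ||beta||_2^2
    beta = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0
            norm = 0.0
            for w, z, yi in zip(weights, Z, y):
                partial = yi - sum(beta[k] * z[k] for k in range(d) if k != j)
                rho += w * z[j] * partial
                norm += w * z[j] * z[j]
            beta[j] = soft_threshold(rho / total, l1) / (norm / total + l2)
    return beta


attributions = explain([0.2, 1.0, -0.5], black_box)
print(attributions)
```

In this sketch the $l_1$ term can set the coefficient of an uninformative feature to exactly zero, while the $l_2$ term damps the remaining coefficients. Fixing `seed` makes repeated runs return identical attributions, which is a convenient baseline when comparing attribution stability across different perturbation samples.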
