Abstract
Generative artificial intelligence (AI), especially large language models (LLMs), is increasingly
deployed in domains such as recruitment, content creation, and education. While these systems
accelerate productivity, they also risk reproducing and amplifying societal biases (Ahuchogu et
al., 2025). This project addresses the urgent challenge of identifying, quantifying, and mitigating
gender bias in text-generative AI outputs, with a focus on job narratives. Building on an
independent study of more than 11,000 job narratives generated with Gemini,
we introduce a bias quantification framework based on mean bias, mean absolute bias, sentiment
skew (via TextBlob), and distributional measures (Kullback-Leibler divergence and related
distances). Preliminary results show measurable gendered patterns across the generated
narratives, supporting the hypothesis that LLM outputs exhibit gender bias.
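To make the framework concrete, a minimal sketch of the core metrics is given below. The per-narrative bias scores and the two trait distributions are hypothetical illustrative values, not results from the study; in the actual pipeline, scores would come from TextBlob sentiment differences over the 11,000+ narratives.

```python
import math

def mean_bias(scores):
    # Signed mean bias: sign indicates the direction of the gendered skew.
    return sum(scores) / len(scores)

def mean_abs_bias(scores):
    # Mean absolute bias: magnitude of skew regardless of direction.
    return sum(abs(s) for s in scores) / len(scores)

def kl_divergence(p, q, eps=1e-12):
    # D_KL(P || Q) over aligned discrete distributions; eps guards log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical per-narrative scores, e.g. sentiment(female-framed) - sentiment(male-framed).
scores = [0.12, -0.05, 0.30, 0.08, -0.02]

# Hypothetical trait-frequency distributions for male- vs. female-framed narratives.
p = [0.50, 0.30, 0.20]
q = [0.40, 0.35, 0.25]

print(round(mean_bias(scores), 3))       # signed skew
print(round(mean_abs_bias(scores), 3))   # magnitude of skew
print(round(kl_divergence(p, q), 4))     # distributional divergence
```

A nonzero mean bias with a larger mean absolute bias indicates partially offsetting per-narrative skews; the KL term captures divergence between the two framing conditions as whole distributions.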
The proposed work extends this foundation in three directions: expanding bias quantification
using probabilistic distribution distances (Devisetti, 2024; Chung et al., 1989); evaluating
prompt-construction bias and comparing models across GPT-3, GPT-4, Gemini, and
open-source LLMs (Blodgett et al., 2020); and integrating interpretable embedding methods
(e.g., SPINE; Subramanian et al., 2017) for transparency in downstream debiasing.
The expected contribution is both theoretical and practical: a robust bias quantification pipeline
grounded in probability theory, and actionable strategies for mitigating bias in LLM-generated
recruitment texts (Ferrara, 2024). Beyond recruitment, the proposed methodology aims to serve
as a standard for bias evaluation in generative AI applications more broadly.