Abstract
Legal documents are notorious for their length, density, and jargon-heavy language, making them challenging to navigate and comprehend. This highlights a strong need for clear, accessible documentation that serves a diverse audience. Text simplification at multiple levels, tailored to readers with different backgrounds and expertise, is essential to making legal content universally accessible. To this end, we focus on paragraph-level simplification of legal contracts and introduce Graded Simplification for Legal Data, a framework that adapts contract clauses to three competency levels: Skilled, Intermediate, and Basic. We employ large language models (LLMs) to perform graded simplification, supported by a Token-efficient Compression mechanism that incrementally encodes document context across paragraphs within a fixed token budget, making it well suited to lengthy contracts. To address the challenge of reliably evaluating legal simplification at scale, we design a multi-criteria evaluation framework that jointly assesses readability, lexical simplicity, semantic preservation, and entailment. This framework enables the creation of our key resource, the SimpLegal dataset, an English-language preference dataset of paragraph-level contract simplifications. Using this dataset for Direct Preference Optimization (DPO), we achieve notable gains (5 points) in readability and simplicity over zero-shot prompting-based baselines. Collectively, these contributions underscore the importance of graded, paragraph-level simplification for contracts and demonstrate that small- and medium-scale LLMs, when fine-tuned on preference data, can achieve performance comparable to larger models, providing a scalable pathway to accessible and comprehensible legal documentation. Our code and dataset are available at https://github.com/GSLD-SimpLegal/FromJargonToClarity.git.