Abstract
Background
Protest music has historically shaped collective identity and transformed struggle into shared resistance. While prior research has examined its usage and impact (Bianchi, 2018; Mondak, 1988), little is known about the linguistic and acoustic markers that drive its perceptual distinctiveness. This study investigates whether protest songs exhibit consistent structural and expressive patterns that differentiate them from non-protest music.
Methodology
We analyzed 458 protest songs extracted from Wikipedia (Jiang & Jin, 2022) and 370 non-protest songs matched by time period. Five linguistic features were computed from the lyrics: repetition rate (repeated bigrams and trigrams per lyrical line), lexical diversity (type-token ratio, TTR: unique word types divided by total word tokens, reflecting vocabulary variation), unique word ratio (unique word types divided by the number of lyrical lines, capturing how many new words appear per line), valence (emotional positivity or negativity), and rhyme density (number of rhyming word pairs per line). Four acoustic features were obtained from the Spotify API: speechiness (degree of spoken-word content), danceability (rhythmic stability, which makes the beat easy to follow), acousticness (likelihood of acoustic rather than electric timbre), and instrumentalness (absence of vocals).
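The two ratio-based lyric features can be illustrated with a minimal Python sketch. It follows only the definitions stated above. The tokenizer, the function names, and the toy lyric are assumptions made for illustration, not the study's actual pipeline.

```python
import re


def tokenize(line):
    """Lowercase a lyric line and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", line.lower())


def lexical_features(lyrics):
    """Return (TTR, unique word ratio) for lyrics given as one newline-separated string."""
    # Blank lines are ignored so that stanza breaks do not count as lyrical lines.
    lines = [ln for ln in lyrics.splitlines() if ln.strip()]
    tokens = [tok for ln in lines for tok in tokenize(ln)]
    if not lines or not tokens:
        raise ValueError("lyrics must contain at least one word")
    types = set(tokens)
    ttr = len(types) / len(tokens)               # unique types / total tokens
    unique_word_ratio = len(types) / len(lines)  # unique types / lyrical lines
    return ttr, unique_word_ratio


# Invented toy lyric: a repeated phrase with a few new words in later lines.
chant = "we will rise\nwe will rise\nwe will rise today\nwe will not be still"
ttr, uwr = lexical_features(chant)
print(f"TTR = {ttr:.3f}, unique word ratio = {uwr:.2f}")
```

In this toy lyric, 7 word types occur across 15 tokens and 4 lines, so the TTR is below 0.5 while the lyric still averages 1.75 distinct word types per line. This shows how the two measures can diverge when a phrase is repeated.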
Results
Protest songs showed significantly higher rhyme density, repetition rate, and unique word ratio, along with lower lexical diversity and more negative valence, than non-protest songs (all p < .001). A logistic regression classifier trained on these features achieved 90% accuracy, confirming their discriminative strength. Protest songs also had significantly higher values for all four acoustic features (p < .001). Deep learning models were then used to classify protest and non-protest songs: the lyrics-based model XLRoBERTa (94% accuracy) outperformed the audio-based model CLAP (89% accuracy). Because both models classified the songs accurately, we infer that both the message and the medium contribute to the distinctive character of protest music. Genre may act as a confounding factor, as protest songs tend to cluster in rock, metal, country, and hip-hop, whereas non-protest songs skew toward pop and disco. Future work should address this limitation.
Discussion
The strongly negative valence suggests that protest songs revolve mostly around negative themes. High repetition aids attention and recall. Low lexical diversity combined with a high unique word ratio reflects figurative, chant-like repetition, in which key phrases recur for impact while new words enrich new lines. High rhyme density indicates phonetic structuring and deliberate lyrical crafting. Protest songs exhibit high speechiness, indicating a spoken-word style of delivery. Their high danceability reflects stable rhythmic patterns that facilitate collective engagement. Elevated acousticness suggests a preference for raw, organic textures. High instrumentalness, meanwhile, points to moments of vocal sparseness in which instruments alone carry emotional weight.
Conclusion
This study sheds light on how protest songs use structural and acoustic cues to create emotionally charged experiences. It offers empirical grounding for the key features that distinguish protest music and deepens our understanding of its structural and expressive elements, providing insights for research in music, linguistics, and social movements.