TY - UNPB
T1 - Whose LLM is it Anyway?
T2 - Linguistic Comparison and LLM Attribution for GPT-3.5, GPT-4 and Bard
AU - Rosenfeld, Ariel
AU - Lazebnik, Teddy
PY - 2024/2/22
Y1 - 2024/2/22
N2 - Large Language Models (LLMs) are capable of generating text that is similar to or surpasses human quality. However, it is unclear whether LLMs tend to exhibit distinctive linguistic styles akin to how human authors do. Through a comprehensive linguistic analysis, we compare the vocabulary, Part-Of-Speech (POS) distribution, dependency distribution, and sentiment of texts generated by three of the most popular LLMs today (GPT-3.5, GPT-4, and Bard) to diverse inputs. The results point to significant linguistic variations which, in turn, enable us to attribute a given text to its LLM origin with a favorable 88% accuracy using a simple off-the-shelf classification model. Theoretical and practical implications of this intriguing finding are discussed.
AB - Large Language Models (LLMs) are capable of generating text that is similar to or surpasses human quality. However, it is unclear whether LLMs tend to exhibit distinctive linguistic styles akin to how human authors do. Through a comprehensive linguistic analysis, we compare the vocabulary, Part-Of-Speech (POS) distribution, dependency distribution, and sentiment of texts generated by three of the most popular LLMs today (GPT-3.5, GPT-4, and Bard) to diverse inputs. The results point to significant linguistic variations which, in turn, enable us to attribute a given text to its LLM origin with a favorable 88% accuracy using a simple off-the-shelf classification model. Theoretical and practical implications of this intriguing finding are discussed.
KW - cs.CL
U2 - 10.48550/arXiv.2402.14533
DO - 10.48550/arXiv.2402.14533
M3 - Preprint
BT - Whose LLM is it Anyway?
PB - arXiv preprint
ER -