WSEAS Transactions on Information Science and Applications
Print ISSN: 1790-0832, E-ISSN: 2224-3402
Volume 22, 2025
A Comparative Pattern Analysis of Qwen 2.5 and Gemma 3 Text Generation
Authors: ,
Abstract: This study examines whether two instruction-tuned language models, Qwen 2.5 (32B) and Gemma 3 (27B), exhibit linguistic patterns distinct enough to support accurate, automated text attribution. We created a dataset of 6,000 LLM-generated text outputs (3,000 per model) in response to 300 prompts spanning ten categories, providing diverse contexts for analysis. We then trained four classifiers (Logistic Regression, Support Vector Machine (SVM), Random Forest, and Gradient Boosting) using Term Frequency-Inverse Document Frequency (TF-IDF) features and various syntactic and stylistic cues. The findings show that TF-IDF features alone are effective for text attribution and that SVM is the most accurate attribution tool, achieving a 99% success rate. The ensemble methods, Random Forest and Gradient Boosting, improved when syntactic and stylistic markers were added. However, lexical frequency patterns remained the primary predictor, which indicates that simple methods can effectively categorize text. Further analysis revealed that Qwen 2.5 typically produces structured, formal outputs, while Gemma 3 favors a more expressive, narrative style. Our final results show that all classifiers can effectively identify AI-generated text, which may have implications for academic integrity, content moderation, and automated plagiarism detection. Given the constant evolution of Large Language Models (LLMs), better benchmarking methods and additional features are required to attribute AI-generated text precisely across different scenarios.
Keywords: Qwen 2.5, Gemma 3, Large Language Models (LLMs), Logistic Regression, Support Vector Machine, Random Forest, Gradient Boosting, AI-generated content, pattern analysis
Pages: 604-615
DOI: 10.37394/23209.2025.22.50
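Illustrative sketch: The abstract describes attribution from TF-IDF features fed to standard classifiers but gives no implementation details. The following is a minimal sketch of that idea, not the authors' code. It assumes whitespace tokenization, a smoothed IDF formula, and a small logistic regression trained by stochastic gradient descent (one of the four classifier families named in the abstract), written with the Python standard library only. The toy texts, labels, and all function names are hypothetical and invented for illustration; they are not drawn from the study's dataset.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, not the paper's method)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency per term, from tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(doc, idf):
    """L2-normalized TF-IDF weights for one tokenized doc; unseen terms are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def train_logreg(vectors, labels, epochs=200, lr=0.5):
    """Binary logistic regression trained by plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            z = bias + sum(weights.get(t, 0.0) * w for t, w in vec.items())
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            bias -= lr * g
            for t, w in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * g * w
    return weights, bias


def predict(vec, weights, bias):
    """Return 1 if the linear score is positive, else 0."""
    z = bias + sum(weights.get(t, 0.0) * w for t, w in vec.items())
    return 1 if z > 0 else 0


# Hypothetical toy corpus: label 0 = "model A" (formal), 1 = "model B" (narrative).
train_texts = [
    "in summary the following steps are required",
    "the following list outlines the key steps",
    "in summary the key points are listed below",
    "imagine a quiet morning by the sea",
    "once upon a time a story began",
    "imagine a story that began by the sea",
]
train_labels = [0, 0, 0, 1, 1, 1]

train_docs = [tokenize(t) for t in train_texts]
idf = fit_idf(train_docs)
train_vecs = [tfidf_vector(d, idf) for d in train_docs]
weights, bias = train_logreg(train_vecs, train_labels)


def attribute(text):
    """Predict the (hypothetical) source label for a new text."""
    return predict(tfidf_vector(tokenize(text), idf), weights, bias)
```

Calling `attribute("the key steps are listed in summary")` on this toy corpus should favor label 0, since every distinctive term in that sentence was seen only in the formal examples. In practice, a library implementation of TF-IDF and the classifiers would likely be preferable to hand-written code; the sketch only makes the lexical-frequency mechanism concrete.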