Abstract
Predicting the future influence of scholarly articles remains a critical challenge in bibliometrics and research evaluation. Decisions about funding, hiring, promotion and venue selection often rely on citation-based indicators that are only observed years after publication. This lag encourages the use of crude proxies such as journal-level metrics or author reputation, which can amplify existing inequities and do not reflect the intrinsic contribution of individual papers. In this work, we propose AI-CITE, a hybrid machine learning pipeline that provides an early, article-level estimate of citation influence at publication time, while remaining transparent about its inputs and limitations. AI-CITE jointly exploits three complementary sources of information available for new manuscripts: the article title, the full abstract and the author list. Textual content is encoded through neural embeddings trained specifically on scientific discourse, and author-centric features summarise historical citation behaviour and authorship positions. We benchmark three state-of-the-art encoders designed for scientific texts, namely SciBERT, SciNCL and SPECTER2, under a controlled, single-journal setup that removes venue and discipline variability. Among these models, SPECTER2, which encodes titles and abstracts using citation-aware training objectives, yields the most informative representations and is adopted as the core encoder in our final system. On top of these representations, AI-CITE formulates both a classification task, which predicts whether an article is likely to become highly cited, and a regression task, which estimates log-transformed citation counts after a fixed evaluation horizon. We compare several families of predictive models, including linear baselines, support vector machines, feed-forward neural networks and tree ensembles.
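The two prediction targets and the combined input vector described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not say which log transform is used (the sketch uses log(1 + c)), how "highly cited" is defined (the sketch uses a hypothetical top-fraction cutoff), or how text and author features are combined (the sketch simply concatenates them). The function names are hypothetical.

```python
import math


def log_citations(count):
    """Regression target: log-transformed citation count.

    Assumption: log(1 + c), so that zero-citation articles map to 0.0.
    """
    return math.log1p(count)


def highly_cited_labels(counts, top_fraction=0.1):
    """Classification target: 1 if an article falls in the top share by citations.

    Assumption: "highly cited" means the top `top_fraction` of the sample.
    Articles tied with the cutoff value are also labelled 1, so slightly
    more than `top_fraction` of the articles can end up positive.
    """
    if not counts:
        return []
    k = max(1, math.ceil(len(counts) * top_fraction))
    threshold = sorted(counts, reverse=True)[k - 1]
    return [1 if c >= threshold else 0 for c in counts]


def build_features(text_embedding, author_features):
    """Concatenate a title/abstract embedding with author-centric features."""
    return list(text_embedding) + list(author_features)


if __name__ == "__main__":
    counts = [0, 3, 10, 50, 200, 1, 7, 2, 4, 5]
    print([round(log_citations(c), 3) for c in counts])
    print(highly_cited_labels(counts, top_fraction=0.2))
    # Toy 2-dimensional "embedding" plus one author feature.
    print(build_features([0.1, 0.2], [3.0]))
```

In a real pipeline the embedding would come from an encoder such as SPECTER2 and would have several hundred dimensions; the toy vectors above only show the shape of the data handed to the downstream classifier or regressor.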
Across all settings, content-based neural embeddings substantially outperform traditional bag-of-words text features and simple bibliometric heuristics. Ablation studies further show that author features improve performance, especially for early-career versus established authors, but that title and abstract alone already provide strong predictive signals when author metadata are noisy, incomplete or intentionally down-weighted for fairness considerations. Beyond raw accuracy, the AI-CITE pipeline includes probability calibration, confidence gauges and SHAP-based explanations that identify which parts of the input most influence the prediction. This allows editors, reviewers and research evaluators to interpret the model as a decision-support tool rather than a black-box score. Trained on a curated dataset of 100,000 peer-reviewed articles, AI-CITE demonstrates the feasibility of early citation forecasting using interpretable, content-aware artificial intelligence. We discuss design choices that reduce the risk of misuse, and outline how such systems can support more responsible, article-focused research assessment and venue-specific impact modelling. We also highlight limitations related to field coverage, data quality and the evolving nature of citation practices, and discuss how these factors should temper over-reliance on automated scores. Although our experiments focus on a single journal, the design of AI-CITE is generic and can be extended to multi-journal or cross-discipline settings with appropriate retraining. The framework can incorporate additional signals such as social attention or open access status, providing a flexible foundation for future research on early impact prediction. Finally, we release our code and experimental protocol to encourage reproducibility, critical scrutiny and further methodological advances in this emerging area.
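The abstract mentions probability calibration without saying how it is performed or measured. As a hedged illustration only, the sketch below shows two common ways to check whether predicted "highly cited" probabilities are well calibrated: the Brier score and a binned expected calibration error (ECE). The choice of these metrics, the number of bins and the function names are assumptions, not details taken from the paper.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: weighted gap between mean confidence and observed frequency.

    Assumption: equal-width bins over [0, 1]; a probability of exactly 1.0
    is placed in the last bin.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(avg_conf - frac_pos)
    return ece


if __name__ == "__main__":
    # Toy predictions: probability that each article becomes highly cited.
    probs = [0.9, 0.8, 0.3, 0.1, 0.6]
    labels = [1, 1, 0, 0, 0]
    print(brier_score(probs, labels))
    print(expected_calibration_error(probs, labels, n_bins=5))
```

Lower values are better for both metrics. A model whose stated confidence of 0.9 corresponds to roughly 90% of such articles actually becoming highly cited would score near zero on the ECE.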
| Original language | English |
|---|---|
| Pages (from-to) | 217675-217691 |
| Number of pages | 17 |
| Journal | IEEE Access |
| Volume | 13 |
| State | Published - 2025 |
Bibliographical note
Publisher Copyright: © 2013 IEEE.
Keywords
- Academic impact prediction
- artificial intelligence
- bibliometrics
- citation analysis
- deep learning
- machine learning
- natural language processing
- probability calibration
- scholarly influence
- semantic embeddings
Fingerprint
Dive into the research topics of 'AI-Driven Prediction of Scholarly Article Influence'. Together they form a unique fingerprint.