AI-Driven Prediction of Scholarly Article Influence

Research output: Contribution to journal › Article › peer-review

Abstract

Predicting the future influence of scholarly articles remains a critical challenge in bibliometrics and research evaluation. Decisions about funding, hiring, promotion and venue selection often rely on citation-based indicators that can only be observed years after publication. This lag encourages the use of crude proxies such as journal-level metrics or author reputation, which can amplify existing inequities and do not reflect the intrinsic contribution of individual papers. In this work, we propose AI-CITE, a hybrid machine learning pipeline that provides an early, article-level estimate of citation influence at publication time while remaining transparent about its inputs and limitations. AI-CITE jointly exploits three complementary sources of information available for new manuscripts: the article title, the full abstract and the author list. Textual content is encoded with neural embeddings trained specifically on scientific discourse, and author-centric features summarise historical citation behaviour and authorship positions. We benchmark three state-of-the-art encoders designed for scientific texts, namely SciBERT, SciNCL and SPECTER2, under a controlled, single-journal setup that removes venue and discipline variability. Among these models, SPECTER2, which encodes titles and abstracts using citation-aware training objectives, yields the most informative representations and is adopted as the core encoder in our final system. On top of these representations, AI-CITE formulates two tasks: a classification task, which predicts whether an article is likely to become highly cited, and a regression task, which estimates log-transformed citation counts after a fixed evaluation horizon. We compare several families of predictive models, including linear baselines, support vector machines, feed-forward neural networks and tree ensembles.
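The two prediction targets described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AI-CITE implementation: the abstract does not specify the log base, the offset, or how "highly cited" is defined, so the sketch assumes a natural-log transform of 1 + c and labels the top fraction of a cohort (hypothetical parameter `top_fraction`) as highly cited. The helper name `build_targets` is likewise hypothetical.

```python
import math
from typing import List, Tuple


def build_targets(citations: List[int], top_fraction: float = 0.1) -> Tuple[List[float], List[int]]:
    """Derive regression and classification targets from citation counts.

    Assumptions (not taken from the paper):
    - regression target is log(1 + c), which compresses the heavy right tail
      of citation distributions and keeps uncited articles at 0.0;
    - "highly cited" means being in the top `top_fraction` of the cohort.
    """
    if not citations:
        return [], []
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")

    # Regression target: natural log of (1 + citations).
    y_reg = [math.log1p(c) for c in citations]

    # Size of the "highly cited" class (at least one article).
    k = max(1, math.ceil(top_fraction * len(citations)))
    # Citation count of the k-th most cited article; ties at this value are included.
    threshold = sorted(citations, reverse=True)[k - 1]
    y_cls = [1 if c >= threshold else 0 for c in citations]
    return y_reg, y_cls


if __name__ == "__main__":
    reg, cls = build_targets([0, 3, 12, 150, 7, 1, 45, 2], top_fraction=0.25)
    print(cls)
```

Because ties at the threshold are all labelled positive, the positive class can be slightly larger than the requested fraction; a fixed citation cut-off would be an equally plausible alternative definition.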
Across all settings, content-based neural embeddings substantially outperform traditional bag-of-words text features and simple bibliometric heuristics. Ablation studies further show that author features improve performance, especially when comparing early-career and established authors, but that title and abstract alone already provide strong predictive signals when author metadata are noisy, incomplete or intentionally down-weighted for fairness reasons. Beyond raw accuracy, the AI-CITE pipeline includes probability calibration, confidence gauges and SHAP-based explanations that identify which parts of the input most influence the prediction. This allows editors, reviewers and research evaluators to interpret the model as a decision-support tool rather than a black-box score. Trained on a curated dataset of 100,000 peer-reviewed articles, AI-CITE demonstrates the feasibility of early citation forecasting using interpretable, content-aware artificial intelligence. We discuss design choices that reduce the risk of misuse, and outline how such systems can support more responsible, article-focused research assessment and venue-specific impact modelling. We also highlight limitations related to field coverage, data quality and the evolving nature of citation practices, and discuss how these factors should temper over-reliance on automated scores. Although our experiments focus on a single journal, the design of AI-CITE is generic and can be extended to multi-journal or cross-discipline settings with appropriate retraining. The framework can also incorporate additional signals such as social attention or open-access status, providing a flexible foundation for future research on early impact prediction. Finally, we release our code and experimental protocol to encourage reproducibility, critical scrutiny and further methodological advances in this emerging area.
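One common way to check whether predicted "highly cited" probabilities are well calibrated is the expected calibration error (ECE) over equal-width probability bins. The sketch below is an illustrative assumption: the abstract does not state which calibration method or metric AI-CITE uses, and `expected_calibration_error` is a hypothetical helper name, not part of the released code.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """Equal-width-bin expected calibration error for binary predictions.

    probs  : predicted probability that each article becomes highly cited
    labels : observed outcome (1 = highly cited, 0 = not)
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equally long")

    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # p = 1.0 would map to index n_bins, so clamp it into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_prob = sum(p for p, _ in b) / len(b)      # average predicted probability
        observed_rate = sum(y for _, y in b) / len(b)  # fraction actually highly cited
        # Weight each bin's calibration gap by its share of the articles.
        ece += (len(b) / n) * abs(mean_prob - observed_rate)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1]))
```

A low ECE means that, among articles assigned roughly a 30% chance of becoming highly cited, about 30% actually are; equal-width bins are a simple choice, and equal-mass bins are a common alternative when most predictions cluster near 0.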

Original language: English
Pages (from-to): 217675-217691
Number of pages: 17
Journal: IEEE Access
Volume: 13
DOIs
State: Published - 2025

Bibliographical note

Publisher Copyright:
© 2013 IEEE.

Keywords

  • Academic impact prediction
  • artificial intelligence
  • bibliometrics
  • citation analysis
  • deep learning
  • machine learning
  • natural language processing
  • probability calibration
  • scholarly influence
  • semantic embeddings
