Abstract
Citation indicators can be used to measure the impact or usefulness of the research results reported in a scientific article, but this use is controversial. Both intrinsic and extrinsic factors influence how often an article is cited, and citation behavior differs between subject areas, which hinders comparison across articles and disciplines. Interpreting indicators properly requires understanding how context affects citation analysis; for this reason, we use Machine Learning algorithms to identify the factors that influence citation in Colombian biomedical journals indexed in Scopus. Using the ‘Gradient Boosting Classifier’ and ‘Light Gradient Boosting Machine’ algorithms, we find that the important characteristics include the h-index of the first and last authors, open access, the number of authors and keywords of the article, and the number of pages. These characteristics are relevant to the area of interest and can provide context for future analyses, always bearing in mind that what matters about an article is not how many citations it attracts but whether it helps to fill gaps in knowledge.
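As a rough illustration of how a gradient boosting classifier can rank article characteristics by importance, the following minimal sketch uses scikit-learn on synthetic data. Everything in it is an assumption made for illustration: the feature names, the value ranges, the rule that generates the labels, and the median split into "more cited" and "less cited" classes are hypothetical and do not come from the study. The ranking it prints therefore says nothing about the study's actual findings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature names, loosely mirroring the characteristics
# named in the abstract (not the study's actual variable names).
FEATURES = [
    "first_author_h_index",
    "last_author_h_index",
    "open_access",
    "n_authors",
    "n_keywords",
    "n_pages",
]


def make_synthetic_articles(n=600, seed=0):
    """Build a synthetic article table; all ranges are invented."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([
        rng.integers(0, 40, n),  # first_author_h_index
        rng.integers(0, 60, n),  # last_author_h_index
        rng.integers(0, 2, n),   # open_access (0 = no, 1 = yes)
        rng.integers(1, 12, n),  # n_authors
        rng.integers(3, 9, n),   # n_keywords
        rng.integers(4, 20, n),  # n_pages
    ]).astype(float)
    # Assumed labelling rule (illustration only): a noisy score driven
    # by the first author's h-index and open access, split at the median.
    score = 0.08 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(0, 0.5, n)
    y = (score > np.median(score)).astype(int)
    return X, y


def rank_features(X, y, seed=0):
    """Fit a gradient boosting classifier and sort features by importance."""
    model = GradientBoostingClassifier(n_estimators=100, random_state=seed)
    model.fit(X, y)
    importances = model.feature_importances_  # impurity-based, sums to 1
    order = np.argsort(importances)[::-1]
    return [(FEATURES[i], float(importances[i])) for i in order]


X, y = make_synthetic_articles()
ranking = rank_features(X, y)
for name, importance in ranking:
    print(f"{name:22s} {importance:.3f}")
```

Impurity-based importances of this kind describe what the fitted model relied on, not causal effects on citation, so they are best read as context for further analysis.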
Authors:
- They must send the publication authorization letter to Investigación Bibliotecológica: archivonomía, bibliotecología e información.
- They can share the submission with the scientific community in the following ways:
- As teaching support material.
- As the basis for lectures in academic conferences.
- Self-archiving in academic repositories.
- Dissemination in academic networks.
- Posting to authors’ blogs and personal websites.
These allowances shall remain in effect as long as the conditions of use of the journal's contents are duly observed pursuant to the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license that it holds. DOI links for downloading the full text of published papers are provided for the last three uses.
Self-archiving policy
For self-archiving, authors must comply with the following:
a) Acknowledge the copyright held by the journal Investigación Bibliotecológica: archivonomía, bibliotecología e información.
b) Establish a link to the original version of the paper on the journal page, using, for example, the DOI.
c) Disseminate the final version published in the journal.
Licensing of contents
The journal Investigación Bibliotecológica: archivonomía, bibliotecología e información allows access and use of its contents pursuant to the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license.
Investigación Bibliotecológica: archivonomía, bibliotecología e información by Universidad Nacional Autónoma de México is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Based on a work at http://rev-ib.unam.mx/ib.
This means that contents can only be read and shared as long as the authorship of the work is acknowledged and cited. The work shall not be exploited for commercial ends, nor shall it be modified.
Limitation of liability
The journal is not liable for academic fraud or plagiarism committed by authors, nor for the intellectual criteria they employ. Similarly, the journal shall not be liable for the services offered through third party hyperlinks contained in papers submitted by authors.
In support of this position, the journal provides the Author’s Duties notice at the following link: Responsibilities of authors.
The director or editor of the journal shall notify authors in the event that the contents of the journal’s official website are migrated to a different IP address or domain.