Abstract
Objective: to extract the research topics from the abstracts and bibliographic data of articles indexed in the Scopus database that take the department of Chocó (Colombia) as their object of study. Methods: the keywords Chocó AND Colombia were searched in Scopus. The bibliographic references were exported to EndNote, and the author(s), title, periodical, volume, issue, year and abstract were extracted and converted into a text file, from which references and symbols were removed. The PDF file was then processed in the RStudio integrated development environment (IDE), where text preparation, tokenization, lemmatization and extraction of the list of bigrams were carried out. Results: 668 bibliographic records of documents indexed in Scopus were found. The most frequent words are «species», «Colombia», «Chocó», «forest», «pacific» and «tropical», among others. A total of 89,841 bigrams were found, including «new species», «pacific coast» and «colombian pacific». Word collocations show that «gold» co-occurs with «mining», «mercury» and «platinum»; «Chocó» with «Colombia», «biogeographical», «rain» and «tropical»; «biodiversity» with «conservation», «tropical» and «agricultural»; and «climate» with «change», «variability» and «basin», among others. Conclusions: the most frequent words reveal research interest in mining, biodiversity, climate change, the tropical forest and the Pacific Ocean, among other topics.
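The text-processing steps summarized in the Methods and Results (tokenization, stopword removal, word frequencies, bigram counts and collocations) can be illustrated with a minimal Python sketch. It is not the authors' R/RStudio code: the three sample abstracts and the short stopword list are hypothetical, lemmatization is omitted (it would require an external lemmatizer), and a "collocate" is assumed here to be any word that forms a non-stopword bigram with the target word. The study may have used a different association measure.

```python
import re
from collections import Counter

# Hypothetical sample abstracts standing in for the exported Scopus records.
ABSTRACTS = [
    "New species of frogs from the Pacific coast of Chocó, Colombia.",
    "Gold mining and mercury pollution in the Chocó rain forest.",
    "Biodiversity conservation in the tropical forest of the Colombian Pacific.",
]

# Hypothetical, deliberately small stopword list; a real analysis would use a full list.
STOPWORDS = {"a", "an", "and", "from", "in", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_frequencies(docs):
    """Count non-stopword tokens across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts


def bigram_frequencies(docs):
    """Count adjacent token pairs within each document, never across documents.

    Bigrams are formed from the raw token sequence first and then dropped if
    either word is a stopword.
    """
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for w1, w2 in zip(tokens, tokens[1:]):
            if w1 not in STOPWORDS and w2 not in STOPWORDS:
                counts[(w1, w2)] += 1
    return counts


def collocates(bigrams, target):
    """Words that appear next to `target` in any kept bigram, with their counts."""
    result = Counter()
    for (w1, w2), n in bigrams.items():
        if w1 == target:
            result[w2] += n
        if w2 == target:
            result[w1] += n
    return result


if __name__ == "__main__":
    bigrams = bigram_frequencies(ABSTRACTS)
    print(word_frequencies(ABSTRACTS).most_common(5))
    print(bigrams.most_common(5))
    print(collocates(bigrams, "chocó"))
```

Forming bigrams before filtering stopwords means that deleting a stopword never creates an artificial pair: in the first sample abstract, «coast of chocó» yields no «coast chocó» bigram.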
Authors:
- They must send the publication authorization letter to Investigación Bibliotecológica: archivonomía, bibliotecología e información.
- They may share the submission with the scientific community in the following ways:
  - As teaching support material
  - As the basis for lectures at academic conferences
  - Self-archiving in academic repositories
  - Dissemination in academic networks
  - Posting on the authors' blogs and personal websites
These permissions remain in effect as long as the journal's terms of use are observed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license it holds. For the last three uses, the journal provides DOI links for downloading the full text of published papers.
Self-archiving policy
For self-archiving, authors must comply with the following requirements:
a) Acknowledge the copyright held by the journal Investigación Bibliotecológica: archivonomía, bibliotecología e información.
b) Establish a link to the original version of the paper on the journal page, using, for example, the DOI.
c) Disseminate the final version published in the journal.
Licensing of contents
The journal Investigación Bibliotecológica: archivonomía, bibliotecología e información allows access to and use of its contents under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license.

Investigación Bibliotecológica: archivonomía, bibliotecología e información by Universidad Nacional Autónoma de México is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Based on a work at http://rev-ib.unam.mx/ib.
This means that the contents may be read and shared only if the authorship of the work is acknowledged and cited. The work may not be used for commercial purposes or modified.
Limitation of liability
The journal is not liable for academic fraud or plagiarism committed by authors, nor for the intellectual criteria they employ. Likewise, the journal is not liable for services offered through third-party hyperlinks contained in papers submitted by authors.
In support of this position, the journal publishes an Author's Duties notice (Responsibilities of authors).
The director or editor of the journal will notify authors if the contents of the journal's official website are migrated to a different IP address or domain.

