Methodological Proposal for Document Information Retrieval: Integration of Knowledge Graphs and Neural Networks

Keywords

Information Retrieval
Knowledge Graphs (KG)
Graph Attention Network (GAT)
Embeddings

How to Cite

Polo-Bautista, L. R., & Casique Vasquez, R. (2025). Methodological Proposal for Document Information Retrieval: Integration of Knowledge Graphs and Neural Networks. Investigación Bibliotecológica: archivonomía, bibliotecología e información, 39(105), 141–163. https://doi.org/10.22201/iibi.24488321xe.2025.105.59051

Abstract

Graphs that model complex relationships between entities have become a valuable tool in document information retrieval. This work proposes a methodology based on graph neural networks (GNNs) to improve document information retrieval using knowledge graphs (KGs). We transformed the documents into a knowledge graph built from lemmas and noun chunks, on which we initialized embeddings that are processed with a graph attention network (GAT). When a query is made, the system extracts a subgraph from the global knowledge graph, adjusts the representations, and generates concise and factual responses. We compared the proposed architecture with Llama 3.1, a reference LLM, on three main metrics: number of tokens in the response, similarity to the source document, and processing time. The theoretical and experimental results show improvements in the accuracy and contextual relevance of the responses obtained.
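
The query-time step mentioned in the abstract, extracting a subgraph from the global knowledge graph, can be illustrated with a minimal sketch. It assumes that the graph links terms (lemmas or noun chunks) that co-occur in the same sentence and that the subgraph is the k-hop neighbourhood of the query terms; the abstract states neither detail, and the names `build_graph` and `k_hop_subgraph` are hypothetical. The GAT stage and response generation are left out.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(sentences):
    """Link every pair of terms that co-occur in the same sentence.

    `sentences` is a list of term lists, assumed to be already lemmatized.
    """
    graph = defaultdict(set)
    for terms in sentences:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def k_hop_subgraph(graph, seeds, k=1):
    """Return the nodes within k hops of any seed term (breadth-first search)."""
    seen = {s for s in seeds if s in graph}
    frontier = deque((s, 0) for s in seen)
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


# Toy corpus: each inner list stands for the terms of one sentence (assumed data).
sentences = [
    ["knowledge graph", "entity", "relation"],
    ["graph attention network", "embedding", "knowledge graph"],
    ["library", "catalog"],
]
graph = build_graph(sentences)
subgraph_nodes = k_hop_subgraph(graph, ["entity"], k=1)
```

With `k=1` the query term "entity" pulls in only the terms of its own sentence; raising `k` to 2 also reaches the terms connected through "knowledge graph", while the unrelated "library" and "catalog" nodes stay out of the subgraph.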

Authors:

  • They must send the publication authorization letter to Investigación Bibliotecológica: archivonomía, bibliotecología e información.
  • They can share the submission with the scientific community in the following ways:
    • As teaching support material.
    • As the basis for lectures at academic conferences.
    • Self-archiving in academic repositories.
    • Dissemination in academic networks.
    • Posting to authors' blogs and personal websites.

These allowances shall remain in effect as long as the conditions of use of the contents of the journal are duly observed pursuant to the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 license that it holds. DOI links for downloading the full text of published papers are provided for the last three uses.

Self-archiving policy

For self-archiving, authors must comply with the following:

a) Acknowledge the copyright held by the journal Investigación Bibliotecológica: archivonomía, bibliotecología e información.

b) Establish a link to the original version of the paper on the journal page, using, for example, the DOI.

c) Disseminate the final version published in the journal.

Licensing of contents

The journal Investigación Bibliotecológica: archivonomía, bibliotecología e información allows access and use of its contents pursuant to the Creative Commons license Attribution-NonCommercial-NoDerivatives 4.0.

Investigación Bibliotecológica: archivonomía, bibliotecología e información by Universidad Nacional Autónoma de México is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Based on a work at http://rev-ib.unam.mx/ib.

This means that contents can only be read and shared as long as the authorship of the work is acknowledged and cited. The work shall not be exploited for commercial ends, nor shall it be modified.

Limitation of liability

The journal is not liable for academic fraud or plagiarism committed by authors, nor for the intellectual criteria they employ. Similarly, the journal shall not be liable for the services offered through third-party hyperlinks contained in papers submitted by authors.

In support of this position, the journal provides the Author’s Duties notice at the following link: Responsibilities of authors.

The director or editor of the journal shall notify authors in the event that the contents of the journal's official website are migrated to a different IP address or domain.
