Advanced Information Retrieval System: Theoretical and Experimental Perspective

Name: Advanced Information Retrieval System: Theoretical and Experimental Perspective
Brand: Bentham Science Publishers Singapore Pte. Ltd.
Price: 51.33 EUR
Availability: OnlineOnly

Urmila Pilania, Manoj Kumar, Sanjay Singh (Authors)

Bentham Science Publishers Singapore Pte. Ltd.

Published on 16 March 2026

150 pages

E-Book

ePUB with Adobe-DRM

979-8-89881-366-6 (ISBN)

€51.33 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Cover
Title
Copyright
End User License Agreement
Contents
Foreword
Preface
Evaluating Traditional and Modern Information Retrieval Techniques
INTRODUCTION
RELATED PAPERS
A Comparison of Usability Techniques for Evaluating Information Retrieval System Interfaces (2009)
A Survey on Various Architectures, Models, and Methodologies for Information Retrieval (2013)
Comparative Study of Information Retrieval Models used in the Search Engine (2014)
A Comparison of Information Retrieval Models (2014)
Review: Information Retrieval Techniques and Applications (2015)
Generating Clarifying Questions for Information Retrieval (2020)
Comparison of Basic Information Retrieval Models (2021)
Information Retrieval Method (2021)
Information Retrieval: Recent Advances and Beyond (2023)
A Search Ranking Algorithm for Web Information Retrieval (2023)
METHODOLOGY
Dataset Selection
Boolean Retrieval
TF-IDF
BERT
Precision
Recall
F1-score
RESULTS
CONCLUSION
Comparative Analysis of Different Information Retrieval Methods
INTRODUCTION
LITERATURE REVIEW
PROPOSED METHODOLOGY
Step 1: Implementation of TF-IDF Representation
Step 2: Implementation of Combined Similarity
Implementation of Cosine Similarity
Implementation of Dot Product Similarity
Combined Similarity Approach
RESULTS AND DISCUSSION
CONCLUSION
Comparative Analysis of Collaborative and Content Filtering Techniques on Web-Scraped Data for a Tourism Recommender System
INTRODUCTION
LITERATURE SURVEY
PROPOSED METHOD
Web Scraping
Pre-Processing
Model Building
RESULTS AND DISCUSSION
CONCLUSION AND FUTURE WORK
An Information Retrieval-Based Framework for Analysing Viewer Sentiments in YouTube Comments
INTRODUCTION
LITERATURE SURVEY
PROBLEM STATEMENT
PROPOSED METHODOLOGY
RESULTS AND DISCUSSION
CONCLUSION AND FUTURE WORK
A Framework for Sentiment Mining in YouTube Comments Using Information Retrieval Methods
INTRODUCTION
LITERATURE SURVEY
PROBLEM STATEMENT
PROPOSED METHODOLOGY
RESULTS AND DISCUSSION
CONCLUSION AND FUTURE WORK
Sentence Interpretation and Semantic Role Classification Using BERT
INTRODUCTION
LITERATURE REVIEW
PROPOSED METHODOLOGY
RESULTS ANALYSIS
CONCLUSION
Image-Audio Based Recommendations System for Information Retrieval
INTRODUCTION
LITERATURE REVIEW
PROPOSED MODEL
RESULT AND DISCUSSION
CONCLUSION
Hybrid Book Recommendation System Integrating Collaborative and Content-Based Filtering Techniques
INTRODUCTION
LITERATURE SURVEY
PROPOSED METHODOLOGY
Content-Based Filtering Model (CBF)
Collaborative Filtering Model (CF)
Integration of Hybrid System
RESULTS AND DISCUSSION
Advantages of the Hybrid Approach
CONCLUSION
Medicine Recommendation System using TF-IDF and Machine Learning
INTRODUCTION
LITERATURE REVIEW
PROPOSED TECHNIQUE
RESULT AND DISCUSSION
CONCLUSION
Image-Based Recommendation System for Various Fashion Styles
INTRODUCTION
LITERATURE SURVEY
PROBLEM STATEMENT
PROPOSED METHODOLOGY
Dataset Preparation
Preprocessing
CNN Architecture Design
Model Structure
Model Training
RESULT ANALYSIS AND VISUALIZATION
CONCLUSION AND FUTURE WORK
Personalized Web Crawler for Retrieving Patent and Research Paper Information from Google Patents and IEEE Xplore
INTRODUCTION
LITERATURE REVIEW
PROBLEM STATEMENT
PROPOSED METHODOLOGY
CONCLUSION
References
Subject Index
Back Cover

Comparative Analysis of Different Information Retrieval Methods

Urmila Pilania, Manoj Kumar, Sanjay Singh

Abstract

Information Retrieval (IR) techniques have evolved steadily from keyword-based systems to advanced search. Current IR techniques use Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP) to provide more accurate and personalized results. This work analyses IR techniques for their merits and demerits and examines how contemporary research has reshaped query-document matching. It integrates Term Frequency-Inverse Document Frequency (TF-IDF) with two retrieval metrics, cosine similarity and dot product similarity, with the aim of improving retrieval results. Cosine similarity captures vector orientation, while dot product similarity also reflects vector magnitude. The two are combined into a single similarity weighted by a parameter a to enhance retrieval capacity. In simulation, the combined method performed well. In future work, the authors plan to incorporate machine learning or deep learning methods to further improve the performance of these IR techniques.

Keywords: Information retrieval, Term frequency-inverse document frequency, Cosine similarity, Dot product similarity, Retrieval Augmented Generation (RAG).
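
The combined similarity described in the abstract can be illustrated with a minimal sketch. The excerpt does not give the exact formula, so the linear blend `a * cosine + (1 - a) * dot`, the smoothed IDF variant, the whitespace tokenizer, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer on lower-cased text (illustrative assumption).
    return text.lower().split()


def vectorize(tokens, vocab, idf):
    # TF-IDF vector over a fixed vocabulary; terms outside it are ignored.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[t] / total * idf[t] for t in vocab]


def build_tfidf(docs):
    # Returns (vocabulary, idf weights, one TF-IDF vector per document).
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    # Smoothed IDF; the source does not state which variant it uses.
    idf = {
        t: math.log((1 + n) / (1 + sum(t in doc for doc in tokenized))) + 1.0
        for t in vocab
    }
    return vocab, idf, [vectorize(doc, vocab, idf) for doc in tokenized]


def dot(u, v):
    # Dot product: sensitive to both direction and magnitude.
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    # Cosine similarity: depends only on the angle between the vectors.
    norm_u, norm_v = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def combined(u, v, a=0.5):
    # Hypothetical blend: a weights cosine, (1 - a) weights dot product.
    return a * cosine(u, v) + (1 - a) * dot(u, v)


def rank(query, docs, a=0.5):
    # Score every document against the query, best match first.
    vocab, idf, vectors = build_tfidf(docs)
    q = vectorize(tokenize(query), vocab, idf)
    scores = [(combined(q, v, a), i) for i, v in enumerate(vectors)]
    return sorted(scores, reverse=True)
```

With `a = 1.0` the score reduces to pure cosine similarity, and with `a = 0.0` to the pure dot product. Because the dot product is unbounded while cosine lies in a fixed range, a real system would probably normalize the two scores before blending them.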

INTRODUCTION

As digital information grows day by day, IR techniques need to become more accurate so that the required information can be retrieved on time. To improve IR systems, the authors analyzed different IR techniques to identify the merits and demerits of existing methods. IR techniques have improved significantly in the move from traditional to modern approaches: modern techniques can handle diverse data and retrieve accurate results on time [16]. Due to the exponential growth of digital data, search now spans domains ranging from educational content to social media, transport, e-commerce, healthcare, and many more.

The user experience is improved by maintaining scalability and ensuring the relevance of the content [17]. Fig. (1) shows some of the functions that must be performed before the actual search starts, such as understanding how to formulate a query using special keywords like OR, AND, and NOT [18]. The classical methods need to be understood first, before modern methods for information retrieval are applied. The data needs to be stored in a structured way for efficient query retrieval. Text pre-processing, including tokenization, stop-word removal, stemming, and more, needs to be carried out. Semantic relationships in the data also need to be captured, and the dimensionality of the data needs to be reduced so that hidden relationships can be captured on time. Fig. (2) shows the different components of the IR system.

Fig. (1). Prior work for IR methods [18].
Fig. (2). Components of IR [19].
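
The preparatory steps mentioned above (tokenization, stop-word removal, stemming, structured storage, and Boolean queries with AND, OR, and NOT) can be sketched as follows. This is a minimal illustration under stated assumptions: the stop-word list is a tiny hand-picked sample, `stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and every function name is hypothetical rather than taken from the chapter.

```python
import re

# Tiny illustrative stop-word list; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "on"}


def stem(token):
    # Crude suffix stripping, a stand-in for a real stemming algorithm.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    # Tokenize, drop stop words, then stem what remains.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    # Inverted index: term -> set of ids of the documents containing it.
    index = {}
    for doc_id, text in enumerate(docs):
        for term in preprocess(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def postings(index, word):
    # Look up a query word after applying the same stemming as the index.
    return index.get(stem(word.lower()), set())


def boolean_and(index, left, right):
    return postings(index, left) & postings(index, right)


def boolean_or(index, left, right):
    return postings(index, left) | postings(index, right)


def boolean_not(index, word, n_docs):
    # Complement relative to the full collection of n_docs documents.
    return set(range(n_docs)) - postings(index, word)
```

Storing postings as sets keeps each Boolean operator a single set operation, which is the main reason an inverted index is the usual structured form for this kind of retrieval.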

The paper is organized into five sections. Section 1 introduces IR and explains the general preparatory steps and the components of IR. Section 2 reviews the literature with the help of a literature summary table. Section 3 describes the proposed methodology, discussing the proposed techniques in detail along with their merits and demerits. Section 4 presents the results and discussion, using graphs to explain them in detail. Section 5 concludes the paper and outlines the future scope.

LITERATURE REVIEW

In paper [16], novel IR methods employ generative models to link queries directly to the identifiers of related documents. The work analyzes how to enhance query generation quality, explore learnable identifiers, improve scalability, and integrate generative retrieval (GR) with multi-task learning frameworks. The authors of [17] proposed a model that integrates NLP and ML and is based on court case summary data. The method automates citation retrieval by applying textual and cutting-edge embedding techniques. It was validated on the Supreme Court of the United States dataset, achieving an accuracy of 90.9%.

Fardin Akhlaghian et al. [20] investigated personalizing search engine results with autonomous fuzzy concept networks, which use ontology concepts to augment a common fuzzy concept network according to user profiles. Experiments show that the personalized results outperform those of the common fuzzy concept network. Javed A. Aslam et al. [21] presented a method for measuring retrieval system performance without relevance judgments and showed that its results agree with actual assessments from the TREC competition. The researchers use a measure of similarity between retrieval systems and demonstrate that evaluating systems by their average similarity produces results comparable to Soboroff's methodology, which indicates a preference for popularity over performance.

Patrick Lewis et al. [22] examined a general-purpose fine-tuning method for RAG, a language generation approach that combines pre-trained parametric and non-parametric memory. Pre-trained language models achieve state-of-the-art results and retain factual knowledge when applied to downstream NLP tasks. However, because their ability to access and manipulate knowledge is limited, they perform worse than task-specific designs. RAG models outperform both task-specific architectures and parametric seq2seq models, and they produce more factual, diverse, and specific language than a state-of-the-art parametric-only seq2seq baseline. Yahui Chen [23] focused on identifying related candidates for a query, framed as a multi-label classification problem. Two CNNs were proposed: a parallel CNN and a deep CNN. Both models gather local semantic features and select global features using a max-over-time pooling layer. Experiments demonstrated that these models outperform classic SVC-based techniques, with the deep CNN performing better because of its greater semantic learning ability.

Wasseem N. Ibrahem Al-Obaydy et al. [24] described a document classification strategy for grouping research publications into meaningful categories based on a common scientific area. The method classifies documents using word tokens taken from topics relevant to a single group, together with the K-means clustering algorithm. Papers are categorized based on their title, abstract, keywords, and category subjects. Experimental results suggested that this technique outperformed the k-nearest neighbors algorithm in terms of information retrieval accuracy. Akram Roshdi et al. [25] examined a variety of IR models and methodologies, including indexing algorithms and classical models. IR arose in the 1950s in response to the need to archive and locate important information. Over the last 40 years, IR systems have expanded tremendously, and they are now an important subject of study in computer science.

Mei Kobayashi et al. [26] reviewed research on the development of the Internet and information search technology. The Internet has shown a persistent trend of exponential growth over previous decades, which is expected to continue in the coming ones, and 85% of consumers use search engines. However, users are dissatisfied with the performance of contemporary search engines, citing slow retrieval speed, communication delays, and poor result quality as major complaints. Aleksander Theo Strand et al. [27] discussed SoccerRAG, a methodology for extracting soccer-related information from multimodal datasets by combining RAG with Large Language Models. It enables dynamic querying, automated data validation, and improved user interaction, and its interactive user interface provides a chatbot-like visual experience. Shinnosuke Tanaka et al. [28] developed KnowledgeHub, a program that extracts information from scientific literature and answers questions by converting PDF documents into text and structured representations. It uses a browser-based annotation tool to annotate the information, trains Named Entity Recognition and Relation Classification models, and creates a knowledge graph. It also includes Large Language Models for QA and summarisation, giving users complete visibility into the knowledge discovery cycle.

Zhiwei Li et al. [29] examined existing methodologies, problems, prospective research directions, and benchmarks in the field of Federated Recommendation Systems (FRS) to provide context and support for investigating this new topic. FRS is a promising way to protect user privacy that combines federated learning with recommendation systems. However, FRS has limitations, such as data heterogeneity and scarcity. Foundation Models (FM) are models that understand human intent and perform specified tasks, resulting in high-quality content in the image and text...
