
Advanced Information Retrieval System: Theoretical and Experimental Perspective
Description
All about e-books | Find answers to questions about e-books, copy protection, and file formats in our Info & Help section.
Content
- Cover
- Title
- Copyright
- End User License Agreement
- Contents
- Foreword
- Preface
- Evaluating Traditional and Modern Information Retrieval Techniques
- INTRODUCTION
- RELATED PAPERS
- A Comparison of Usability Techniques for Evaluating Information Retrieval System Interfaces (2009)
- A Survey on Various Architectures, Models, and Methodologies for Information Retrieval (2013)
- Comparative Study of Information Retrieval Models used in the Search Engine (2014)
- A Comparison of Information Retrieval Models (2014)
- Review: Information Retrieval Techniques and Applications (2015)
- Generating Clarifying Questions for Information Retrieval (2020)
- Comparison of Basic Information Retrieval Models (2021)
- Information Retrieval Method (2021)
- Information Retrieval: Recent Advances and Beyond (2023)
- A Search Ranking Algorithm for Web Information Retrieval (2023)
- METHODOLOGY
- Dataset Selection
- Boolean Retrieval
- TF-IDF
- BERT
- Precision
- Recall
- F1-score
- RESULTS
- CONCLUSION
- Comparative Analysis of Different Information Retrieval Methods
- INTRODUCTION
- LITERATURE REVIEW
- PROPOSED METHODOLOGY
- Step 1: Implementation of TF-IDF Representation
- Step 2: Implementation of Combined Similarity
- Implementation of Cosine Similarity
- Implementation of Dot Product Similarity
- Combined Similarity Approach
- RESULTS AND DISCUSSION
- CONCLUSION
- Comparative Analysis of Collaborative and Content Filtering Techniques on Web-Scraped Data for a Tourism Recommender System
- INTRODUCTION
- LITERATURE SURVEY
- PROPOSED METHOD
- Web Scraping
- Pre-Processing
- Model Building
- RESULTS AND DISCUSSION
- CONCLUSION AND FUTURE WORK
- An Information Retrieval-Based Framework for Analysing Viewer Sentiments in YouTube Comments
- INTRODUCTION
- LITERATURE SURVEY
- PROBLEM STATEMENT
- PROPOSED METHODOLOGY
- RESULTS AND DISCUSSION
- CONCLUSION AND FUTURE WORK
- A Framework for Sentiment Mining in YouTube Comments Using Information Retrieval Methods
- INTRODUCTION
- LITERATURE SURVEY
- PROBLEM STATEMENT
- PROPOSED METHODOLOGY
- RESULTS AND DISCUSSION
- CONCLUSION AND FUTURE WORK
- Sentence Interpretation and Semantic Role Classification Using BERT
- INTRODUCTION
- LITERATURE REVIEW
- PROPOSED METHODOLOGY
- RESULTS ANALYSIS
- CONCLUSION
- Image-Audio Based Recommendations System for Information Retrieval
- INTRODUCTION
- LITERATURE REVIEW
- PROPOSED MODEL
- RESULT AND DISCUSSION
- CONCLUSION
- Hybrid Book Recommendation System Integrating Collaborative and Content-Based Filtering Techniques
- INTRODUCTION
- LITERATURE SURVEY
- PROPOSED METHODOLOGY
- Content-Based Filtering Model (CBF)
- Collaborative Filtering Model (CF)
- Integration of Hybrid System
- RESULTS AND DISCUSSION
- Advantages of the Hybrid Approach
- CONCLUSION
- Medicine Recommendation System using TF-IDF and Machine Learning
- INTRODUCTION
- LITERATURE REVIEW
- PROPOSED TECHNIQUE
- RESULT AND DISCUSSION
- CONCLUSION
- Image-Based Recommendation System for Various Fashion Styles
- INTRODUCTION
- LITERATURE SURVEY
- PROBLEM STATEMENT
- PROPOSED METHODOLOGY
- Dataset Preparation
- Preprocessing
- CNN Architecture Design
- Model Structure
- Model Training
- RESULT ANALYSIS AND VISUALIZATION
- CONCLUSION AND FUTURE WORK
- Personalized Web Crawler for Retrieving Patent and Research Paper Information from Google Patents and IEEE Xplore
- INTRODUCTION
- LITERATURE REVIEW
- PROBLEM STATEMENT
- PROPOSED METHODOLOGY
- CONCLUSION
- References
- Subject Index
- Back Cover
Comparative Analysis of Different Information Retrieval Methods
Urmila Pilania, Manoj Kumar, Sanjay Singh
Abstract
Information Retrieval (IR) techniques have evolved steadily from keyword-based systems to advanced search. Today, IR techniques use Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP) to provide more accurate and personalized results. This work analyses IR techniques for their merits and demerits and examines how contemporary research has shifted toward query-document matching. It combines Term Frequency-Inverse Document Frequency (TF-IDF) with two retrieval metrics: cosine similarity and dot product similarity. The combination aims to provide better results. Cosine similarity is good at capturing vector orientation, while dot product similarity also reflects vector magnitude. A combined similarity, weighted by a parameter a, is used to enhance retrieval capacity. Simulation results show that the combined method performed well. In the future, the authors will incorporate machine learning or deep learning methods to enhance the performance of these IR techniques.
Keywords: Information retrieval, Term frequency-inverse document frequency, Cosine similarity, Dot product similarity, Retrieval Augmented Generation (RAG).
INTRODUCTION
As digital information grows day by day, IR techniques need to become more accurate so that the required information can be retrieved on time. To improve IR systems, the authors analyzed different IR techniques to identify the merits and demerits of existing methods. IR techniques have improved significantly in the move from traditional to modern techniques. Modern techniques can handle diverse data and retrieve accurate results on time [16]. Due to the exponential growth of digital data, search now spans domains ranging from educational content to social media, transport, e-commerce, healthcare, and many more.
The user experience is improved by maintaining scalability and ensuring the relevance of the content [17]. Fig. (1) shows the preparatory steps that must be completed before the actual search starts, such as learning how to formulate a query with operators like OR, AND, and NOT [18]. The classical methods need to be understood first, and the modern methods for information retrieval can then be applied. The data needs to be stored in a structured way for efficient query retrieval. Text pre-processing, including tokenization, stop-word removal, stemming, and more, needs to be carried out. The semantic relationships in the data also need to be captured, and the dimensionality of the data needs to be reduced so that hidden relationships can be captured on time. Fig. (2) shows the different components of the IR system.
Fig. (1). Prior work for IR methods [18].
Fig. (2). Components of IR [19].
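The query formulation and text pre-processing steps mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the `crude_stem` suffix stripper (a stand-in for a real stemmer), the sample documents, and all function names are assumptions made for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption, not taken from the chapter).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "on"}


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, lowercase, drop stop words, and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def postings(index, word):
    """Return the ids of documents containing the (pre-processed) word."""
    terms = preprocess(word)
    return set(index.get(terms[0], set())) if terms else set()


# Hypothetical sample collection.
docs = {
    1: "Indexing documents for retrieval",
    2: "Ranking retrieved documents in search engines",
    3: "Social media search and healthcare records",
}
index = build_index(docs)
all_ids = set(docs)

# Boolean operators map directly onto set operations.
and_hits = postings(index, "documents") & postings(index, "search")     # AND
or_hits = postings(index, "retrieval") | postings(index, "healthcare")  # OR
not_hits = all_ids - postings(index, "documents")                       # NOT
```

Storing postings as sets keeps the sketch short; a production system would more likely use sorted postings lists and an established stemmer.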
The paper is organized into five sections. Section 1 introduces IR and explains the general preparatory steps and the components of an IR system. Section 2 reviews the literature with the help of a summary table. Section 3 describes the proposed methodology and discusses the proposed techniques in detail, along with their merits and demerits. Section 4 presents the results and discussion, supported by graphs. Section 5 concludes the paper and outlines the future scope.
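The combined similarity described in the abstract can be sketched in a few lines of Python. The excerpt does not give the exact formula, so the linear blend `alpha * cosine + (1 - alpha) * dot`, the smoothed IDF variant, the name `alpha` for the weighting parameter, and the toy documents are all assumptions for illustration, not the authors' method.

```python
import math
from collections import Counter


def fit_idf(token_lists):
    """Return the vocabulary and a smoothed IDF weight per term (assumed variant)."""
    n_docs = len(token_lists)
    vocab = sorted({t for tokens in token_lists for t in tokens})
    df = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf


def vectorize(tokens, vocab, idf):
    """Raw term frequency times IDF, one component per vocabulary term."""
    tf = Counter(tokens)
    return [tf[t] * idf[t] for t in vocab]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def combined(u, v, alpha=0.5):
    """Assumed blend: alpha weights orientation, (1 - alpha) weights magnitude."""
    return alpha * cosine(u, v) + (1 - alpha) * dot(u, v)


# Hypothetical, already tokenized documents and query.
docs = [
    ["information", "retrieval", "system"],
    ["deep", "learning", "retrieval"],
    ["tourism", "recommender", "system"],
]
query = ["information", "retrieval"]

vocab, idf = fit_idf(docs)
doc_vecs = [vectorize(d, vocab, idf) for d in docs]
query_vec = vectorize(query, vocab, idf)

scores = [combined(query_vec, d, alpha=0.5) for d in doc_vecs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Because the dot product is unbounded while cosine similarity lies in [0, 1] for non-negative TF-IDF vectors, the two terms are on different scales; in practice the dot product would probably need normalizing before blending, which this sketch leaves out.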
LITERATURE REVIEW
The work in [16] uses generative models to link queries to the identifiers of related documents, an approach referred to as generative retrieval (GR). It examines how to enhance query generation quality, explore learnable identifiers, improve scalability, and integrate GR with multi-task learning frameworks. The authors of [17] proposed a model that integrates NLP and ML, based on court case summary data. The method automates citation retrieval by applying textual and state-of-the-art embedding techniques. It was validated on the Supreme Court of the United States dataset, achieving an accuracy of 90.9%.
Fardin Akhlaghian et al. [20] investigated personalizing search engine results with autonomous fuzzy concept networks, which use ontology concepts to augment a common fuzzy network according to user profiles. Their experiments show that the personalized results outperform those of common fuzzy concept networks. Javed A. Aslam et al. [21] presented a method for measuring retrieval system performance without relevance judgments and showed that it agrees with the actual assessments in the TREC competition. They use a measure of similarity between retrieval systems and demonstrate that evaluating systems by average similarity produces results comparable to Soboroff's methodology, indicating a bias toward popularity rather than performance.
Patrick Lewis et al. [22] examined a general-purpose method for optimizing RAG, a language generation method that combines pre-trained parametric and non-parametric memory. Pre-trained language models retain factual knowledge and achieve state-of-the-art results on downstream NLP tasks, but they perform worse than task-specific designs because of their limited ability to access and manipulate that knowledge. RAG models outperform task-specific architectures and parametric seq2seq models, and they produce more factual, diverse, and specific language than a state-of-the-art parametric-only seq2seq baseline. Yahui Chen [23] treated the identification of related candidates for a query as a multi-label classification problem. Two CNNs were proposed: a parallel CNN and a deep CNN. Both models gather local semantic features and select global features using a max-over-time pooling layer. Experiments demonstrated that these models outperform classic SVC-based techniques, with the deep CNN performing better because of its greater semantic learning ability.
Wasseem N. Ibrahem Al-Obaydy et al. [24] described a document classification strategy for grouping research publications into meaningful categories based on a common scientific area. The method classified documents using word tokens taken from themes relevant to a single group together with the K-means clustering algorithm. Papers were categorized based on their title, abstract, keywords, and category subjects. Experimental results suggested that this technique outperformed the k-nearest neighbors algorithm in terms of information retrieval accuracy. Akram Roshdi et al. [25] examined a variety of IR models and methodologies, including indexing algorithms and classical models. IR arose in the 1950s in response to the need to archive and locate important information. Over the last 40 years, IR systems have expanded tremendously, and they are now an important research area in computer science.
Mei Kobayashi et al. [26] reviewed research on the development of the Internet and information search technology. They reported a persistent trend of exponential growth over past decades that is expected to continue, with 85% of users relying on search engines. However, users are disappointed with the performance of contemporary search engines, citing slow retrieval speed, communication delays, and poor result quality as major complaints. Aleksander Theo Strand et al. [27] discussed SoccerRAG, an approach for extracting soccer-related information from multimodal datasets by combining RAG with Large Language Models. It enables dynamic querying, automated data validation, and improved user interaction, and its interactive user interface provides a chatbot-like experience. Shinnosuke Tanaka et al. [28] developed KnowledgeHub, a tool that extracts information from scientific literature and answers questions by converting PDF documents into text and structured representations. It uses a browser-based annotation tool to annotate the information, trains Named Entity Recognition and Relation Classification models, and builds a knowledge graph. It also includes Large Language Models for question answering (QA) and summarisation, giving users complete visibility into the knowledge discovery cycle.
Federated Recommendation Systems (FRS) combine federated learning with recommendation systems and are a promising way to protect user privacy. However, FRS have limitations, such as data heterogeneity and scarcity. Zhiwei Li et al. [29] examined existing methodologies, open problems, research prospects, and benchmarks in the FRS field to provide context and guidance for investigating this new topic. Foundation Models (FM) are models that understand human intent and perform specified tasks, resulting in high-quality content in the image and text...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.