How Machine Learning is Innovating Today's World

Name: How Machine Learning is Innovating Today's World | A Concise Technical Guide
Brand: Wiley
Price: 204.99 EUR
Availability: OnlineOnly

A Concise Technical Guide

Arindam Dey, Sukanta Nayak, Ranjan Kumar, Sachi Nandan Mohanty (Editors)

Wiley (Publisher)

1st Edition

Published on 18 June 2024

797 pages

E-Book

ePUB with Adobe-DRM

System requirements

978-1-394-21413-6 (ISBN)

€204.99 incl. 7% VAT

E-Book Single Licence

Available for download

Description

Provides a comprehensive understanding of the latest advancements and practical applications of machine learning techniques.

Machine learning (ML), a branch of artificial intelligence, has gained tremendous momentum in recent years, revolutionizing the way we analyze data, make predictions, and solve complex problems. As researchers and practitioners in the field, the editors of this book recognize the importance of disseminating knowledge and fostering collaboration to further advance this dynamic discipline. How Machine Learning is Innovating Today's World is a timely book and presents a diverse collection of 25 chapters that delve into the remarkable ways that ML is transforming various fields and industries.

It provides a comprehensive understanding of the practical applications of ML techniques. The wide range of topics includes:

an analysis of various tokenization techniques and the sequence-to-sequence model in natural language processing
the evaluation of English language readability using ML models
a detailed study of text analysis for information retrieval through natural language processing
the application of reinforcement learning approaches to supply chain management
the performance analysis of converting algorithms to source code using natural language processing in Java
an alternate approach to solving differential equations using artificial neural networks with optimization techniques
a comparative study of different techniques of text-to-SQL query conversion
the classification of livestock diseases using ML algorithms
ML in image enhancement techniques
efficient leader selection for inter-cluster flying ad-hoc networks
a comprehensive survey of applications powered by GPT-3 and DALL-E
the domains of application of recommender systems
mood detection, emoji generation, and classification using tokenization and CNN
variations of the exam scheduling problem using graph coloring
the intersection of software engineering and machine learning applications
ML strategies for indeterminate information systems in complex bipolar neutrosophic environments
ML applications in healthcare and in battery management systems, and the rise of AI-generated news videos
the enhancement of resource management in precision farming through AI-based irrigation optimization.

Audience

The book will be extremely useful to professionals, post-graduate research scholars, policymakers, corporate managers, and anyone with technical interests looking to understand how machine learning and artificial intelligence can benefit their work.

Content

Preface

Part 1: Natural Language Processing (NLP) Applications

1 A Comprehensive Analysis of Various Tokenization Techniques and Sequence-to-Sequence Model in Natural Language Processing
Kuldeep Vayadande, Ashutosh M. Kulkarni, Gitanjali Bhimrao Yadav, R. Kumar and Aparna R. Sawant

2 A Review on Text Analysis Using NLP
Kuldeep Vayadande, Preeti A. Bailke, Lokesh Sheshrao Khedekar, R. Kumar and Varsha R. Dange

3 Text Generation & Classification in NLP: A Review
Kuldeep Vayadande, Dattatray Raghunath Kale, Jagannath Nalavade, R. Kumar and Hanmant D. Magar

4 Book Genre Prediction Using NLP: A Review
Kuldeep Vayadande, Preeti Bailke, Ashutosh M. Kulkarni, R. Kumar and Ajit B. Patil

5 Mood Detection Using Tokenization: A Review
Kuldeep Vayadande, Preeti A. Bailke, Lokesh Sheshrao Khedekar, R. Kumar and Varsha R. Dange

6 Converting Pseudo Code to Code: A Review
Kuldeep Vayadande, Preeti A. Bailke, Anita Bapu Dombale, Varsha R. Dange and Ashutosh M. Kulkarni

Part 2: Machine Learning Applications in Specific Domains

7 Evaluating the Readability of English Language Using Machine Learning Models
Shiplu Das, Abhishikta Bhattacharjee, Gargi Chakraborty and Debarun Joardar

8 Machine Learning in Maximizing Cotton Yield with Special Reference to Fertilizer Selection
G. Hannah Grace and Nivetha Martin

9 Machine Learning Approaches to Catalysis
Sachidananda Nayak and Selvakumar Karuthapandi

10 Classification of Livestock Diseases Using Machine Learning Algorithms
G. Hannah Grace, Nivetha Martin, I. Pradeepa and N. Angel

11 Image Enhancement Techniques to Modify an Image with Machine Learning Application
Shiplu Das, Sohini Sen, Debarun Joardar and Gargi Chakraborty

12 Software Engineering in Machine Learning Applications: A Comprehensive Study
Kuldeep Vayadande, Komal Sunil Munde, Amol A. Bhosle, Aparna R. Sawant and Ashutosh M. Kulkarni

13 Machine Learning Applications in Battery Management System
Ponnaganti Chandana and Ameet Chavan

14 ML Applications in Healthcare
Farooq Shaik, Rajesh Yelchurri, Noman Aasif Gudur and Jatindra Kumar Dash

15 Enhancing Resource Management in Precision Farming through AI-Based Irrigation Optimization
Salina Adinarayana, Matha Govinda Raju, Durga Prasad Srirangam, Devee Siva Prasad, Munaganuri Ravi Kumar and Sai Babu Veesam

16 An In-Depth Review on Machine Learning Infusion in an Agricultural Production System
Sarthak Dash, Sugyanta Priyadarshini and Sukanya Priyadarshini

Part 3: Artificial Intelligence and Optimization Techniques

17 Reinforcement Learning Approach in Supply Chain Management: A Review
Rajkanwar Singh, Pratik Mandal and Sukanta Nayak

18 Alternate Approach to Solve Differential Equations Using Artificial Neural Network with Optimization Technique
Ramanan R., Sukanta Nayak and Arun Kumar Gupta

19 GPT-3- and DALL-E-Powered Applications: A Complete Survey
Kuldeep Vayadande, Chaitanya B. Pednekar, Priya Anup Khune, Vinay Sudhir Prabhavalkar and Varsha R. Dange

20 New Variation of Exam Scheduling Problem Using Graph Coloring
Angshu Kumar Sinha, Soumyadip Laha, Debarghya Adhikari, Anjan Koner and Neha Deora

Part 4: Emerging Topics in Machine Learning

21 A Comparative Study of Different Techniques of Text-to-SQL Query Converter
Kuldeep Vayadande, Preeti A. Bailke, Vikas Janu Nandeshwar, R. Kumar and Varsha R. Dange

22 Trust-Based Leader Election in Flying Ad-Hoc Network
Joydeep Kundu, Sahabul Alam and Sukanta Oraw

23 A Survey on Domain of Application of Recommender System
Sudipto Dhar

24 New Approach on M/M/c/K Queueing Models via Single Valued Linguistic Neutrosophic Numbers and Perceptionization Using a Non-Linear Programming Technique
Antony Crispin Sweety C. and Vennila B.

25 The Rise of AI-Generated News Videos: A Detailed Review
Kuldeep Vayadande, Mustansir Bohri, Mohit Chawala, Ashutosh M. Kulkarni and Asif Mursal

References

Index

1 A Comprehensive Analysis of Various Tokenization Techniques and Sequence-to-Sequence Model in Natural Language Processing

Kuldeep Vayadande¹*, Ashutosh M. Kulkarni¹, Gitanjali Bhimrao Yadav¹, R. Kumar² and Aparna R. Sawant¹

¹Vishwakarma Institute of Technology, Pune, India

²VIT-AP University, Inavolu, Beside AP Secretariat, Amaravati AP, India

Abstract

This paper provides an in-depth examination of tokenization techniques and Sequence-to-Sequence (Seq2Seq) models, with an emphasis on the LSTM, Transformer, and attention-based LSTM models. Tokenization, which breaks text down into smaller units, plays a vital role in natural language processing (NLP). The study evaluates word-based, character-based, and sub-word-based tokenization methods. It also explores recent advances in Seq2Seq models, which have been successful in tasks such as machine translation, text summarization, and dialog systems. The paper compares the performance of the tokenization techniques and Seq2Seq models on benchmark datasets and highlights their strengths and limitations, which helps in assessing their suitability for various NLP applications. The aim of the study is a comprehensive understanding of current advances in tokenization and sequence-to-sequence modeling for NLP.

Keywords: RNN, CRNN, LSTM, bidirectional LSTM, text augmentation, tokenization, attention-based LSTM

1.1 Introduction

Tokenization is a fundamental step in natural language processing (NLP) that entails breaking down text into smaller units, such as words or characters. This process is critical for many NLP tasks, including text classification, machine translation, and text summarization. Different levels of granularity, such as word-level, character-level, and sub-word-level, can be used for tokenization.
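The difference between word-level and character-level granularity can be illustrated with a minimal sketch. The function names and the regular expression below are illustrative assumptions rather than part of the chapter; production tokenizers handle many more cases (contractions, Unicode normalization, and so on).

```python
import re


def word_tokenize(text):
    # Word-level: runs of word characters, plus each punctuation mark
    # as a separate token. Whitespace is discarded.
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokenize(text):
    # Character-level: every character, including spaces, is a token.
    return list(text)


sentence = "Tokenization matters."
print(word_tokenize(sentence))   # ['Tokenization', 'matters', '.']
print(len(char_tokenize(sentence)))
```

The word-level output is short but needs a large vocabulary, while the character-level output needs only a small vocabulary at the cost of much longer sequences.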

In recent years, various tokenization techniques have been proposed, each with its own advantages and disadvantages. The Multi-head Self-attention Mechanism in [1] is a type of attention mechanism that allows the model to concentrate on multiple parts of the input text simultaneously. Word-level tokenization is the most straightforward approach and is widely used in many NLP tasks. However, it may be less effective for languages with complex morphological structures, such as agglutinative languages. Character-level tokenization handles such languages better, but it may also introduce more noise into the data. Sub-word-level tokenization, such as byte-pair encoding (BPE) and unigram language modeling (ULM), has been proposed as a compromise between word-level and character-level tokenization.
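To make the sub-word compromise concrete, the following is a minimal sketch of how BPE merges can be learned from word frequencies: the most frequent adjacent symbol pair is merged repeatedly. The function names and the toy corpus are assumptions made for illustration; real implementations add end-of-word markers, tie-breaking rules, and efficiency optimizations.

```python
from collections import Counter


def get_pair_counts(vocab):
    # vocab maps a tuple of symbols to the frequency of that word.
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    # Replace every adjacent occurrence of `pair` with one merged symbol.
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    # Start from single characters and greedily merge the most frequent pair.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Hypothetical toy corpus: word -> frequency.
merges, vocab = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 4)
print(merges)
```

Frequent character sequences end up as single tokens, while rare words remain decomposable into smaller units, which is why the vocabulary stays bounded without losing coverage.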

The goal of this study is to provide a thorough understanding of recently introduced tokenization methods. It evaluates word-based, character-based, and sub-word-based methods and compares their performance on a set of benchmark datasets. It also examines the working principles of these techniques, their performance on various NLP tasks, and their advantages and limitations, which helps in assessing their suitability for different types of NLP applications. The study is intended as a resource for researchers, practitioners, and students in the field of NLP who need a thorough grasp of the most advanced tokenization algorithms currently available.

Sequence-to-Sequence (Seq2Seq) models have grown in popularity in NLP because they can process input and output sequences of varying lengths. Also known as encoder-decoder models, they have produced noteworthy results in a number of NLP applications, including dialogue systems, machine translation, and text summarization. Different Seq2Seq models have been proposed over time, each with its own merits and faults. One is the Long Short-Term Memory (LSTM) model, a popular choice for Seq2Seq tasks that has shown promise in a variety of NLP applications; its limitation is that it is computationally expensive. The Transformer model, which is built on the attention mechanism, is an alternative that has proven more effective. Attention-based LSTM models, which combine the advantages of LSTM and attention mechanisms, have also been proposed.

With an emphasis on the LSTM, Transformer, and attention-based LSTM models, this survey seeks to provide a thorough understanding of the Seq2Seq models proposed in recent years. We cover the details of these models, their working principles, and their performance on various NLP tasks. We also compare their advantages, limitations, and performance, which helps in assessing their suitability for different types of NLP applications.

Tokenization-free approaches have also been proposed: the work in [4] focuses on pre-training an efficient encoder that can operate without tokenization.

1.2 Literature Survey

The Multi-head Self-attention Mechanism [1] is a type of attention mechanism that enables the model to concentrate on multiple sections of the input text simultaneously. This is achieved by utilizing multiple "heads" to attend to different segments of the input. This can aid the model's understanding of the context and relationships between different parts of the input text, resulting in more accurate and coherent summaries.
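The idea of several heads attending to different segments can be sketched as follows. This is a simplified illustration under stated assumptions, not the mechanism of [1]: the learned query, key, value, and output projections are omitted, so each head simply applies scaled dot-product attention to its own slice of the embedding. All names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q, k, v):
    # Scaled dot-product attention; q, k, v are lists of vectors (one per token).
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out


def multi_head_self_attention(x, num_heads):
    # Assumption: no learned projections. Each head sees a slice of the embedding.
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        xs = [tok[lo:hi] for tok in x]
        heads.append(attention_head(xs, xs, xs))
    # Concatenate the head outputs for each token.
    return [[val for head in heads for val in head[i]] for i in range(len(x))]


tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
print(multi_head_self_attention(tokens, num_heads=2))
```

Because each head computes its own attention weights, different heads can focus on different positions of the same input.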

The pointer network is a type of encoder-decoder [2] model that uses an attention mechanism to point to the part of the input text that should be included in the summary. It helps the model to generate the summary by copying relevant words from the input text, rather than generating them from scratch.
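A minimal sketch of the copying step: the attention distribution over input positions is used directly as the output distribution, and the token at the most probable position is copied. Dot-product scoring is an assumption made for brevity (other scoring functions are possible), and all names and values are hypothetical.

```python
import math


def pointer_distribution(decoder_state, encoder_states):
    # One score per input position, normalized with a softmax.
    scores = [sum(a * b for a, b in zip(decoder_state, h)) for h in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def copy_token(decoder_state, encoder_states, source_tokens):
    # "Point" at the input position with the highest probability and copy it.
    probs = pointer_distribution(decoder_state, encoder_states)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return source_tokens[idx], probs


source = ["the", "cat", "sat"]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy values
token, probs = copy_token([0.0, 3.0], encoder_states, source)
print(token, probs)
```

The output vocabulary at each step is the set of input positions, which is what lets the model reproduce rare or unseen words from the source text.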

The paper [3] focuses on improving language generation performance by calibrating the likelihood of generated sequences, while the paper [4] focuses on long-document summarization and combines top-down and bottom-up inference to extract high-level concepts and specific details from the input text. The two papers offer different approaches to improving the performance of NLP tasks.

CANINE [4] is focused on pre-training an efficient encoder that can operate without tokenization, making the training process faster and more scalable. It uses a simple linear-layer-based architecture and employs a binary masking strategy to hide specific words during training, in order to predict them during inference.
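What operating without tokenization can look like at the input stage is sketched below: text is mapped directly to Unicode code points, so no vocabulary file is required. The hashing step is an illustrative assumption for bounding the embedding table; the bucket count and multiplier are placeholders, not values taken from [4].

```python
def to_code_points(text):
    # Each character becomes its Unicode code point; no vocabulary is needed.
    return [ord(ch) for ch in text]


def hash_bucket(code_point, num_buckets=16384, multiplier=31):
    # Hypothetical hashing step that maps the large code point space
    # to a fixed number of embedding rows.
    return (code_point * multiplier) % num_buckets


ids = to_code_points("AI")
print(ids)                          # [65, 73]
print([hash_bucket(i) for i in ids])
```

Any string in any script can be encoded this way, at the cost of longer input sequences than sub-word tokenization would produce.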

FNet [5], on the other hand, introduces a new method of mixing tokens with Fourier transforms to capture long-range dependencies. This method can produce highly expressive representations and has the advantage of being computationally efficient. The paper shows that the approach outperforms traditional pre-training methods on various NLP tasks.
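Token mixing with Fourier transforms can be sketched as a two-dimensional discrete Fourier transform over the hidden and sequence dimensions, of which only the real part is kept. The naive DFT below is written for clarity, not speed; a practical implementation would use an FFT routine. Function names are assumptions.

```python
import cmath


def dft(values):
    # Naive O(n^2) discrete Fourier transform of a 1-D sequence.
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fourier_mix(x):
    # x has shape (seq_len, hidden). Transform along the hidden dimension,
    # then along the sequence dimension, and keep the real part.
    rows = [dft(row) for row in x]
    cols = [dft([rows[i][j] for i in range(len(rows))]) for j in range(len(rows[0]))]
    return [[cols[j][i].real for j in range(len(cols))] for i in range(len(x))]


print(fourier_mix([[1.0, 2.0], [3.0, 4.0]]))
```

Every output entry depends on every input entry, so information is mixed across all tokens without any learned attention weights.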

Charformer [6] focuses on the tokenization stage of pre-processing and proposes a novel method of sub-word tokenization that utilizes gradient information to identify the best sub-word splits. The authors show that their method is fast and results in improved performance on several NLP tasks.

The paper [7] proposes a new pre-training method for language models that leverages retrieval-based techniques. The authors show that this approach can effectively pre-train models on large-scale text corpora, leading to improved performance on a variety of NLP benchmarks.

The paper [8] focuses on enhancing the training efficiency of large-scale transformers, which are frequently employed in NLP applications like language modeling, text classification, and machine translation. The authors propose a new training method, "Random-LTD," which involves randomly dropping tokens and layers during the training process to speed up convergence and reduce memory requirements. The authors show that this method can effectively train large-scale transformers with improved efficiency.
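The token-dropping part of this idea can be sketched as follows: for a given layer, a random subset of tokens is processed while the remaining tokens bypass the layer unchanged. This is an illustration under stated assumptions, not the implementation of [8]; `layer_fn`, `keep_ratio`, and the bypass rule are hypothetical names and choices.

```python
import random


def random_ltd_layer(tokens, layer_fn, keep_ratio, rng):
    # Choose a random subset of positions to send through the layer.
    n = len(tokens)
    n_keep = max(1, int(n * keep_ratio))
    kept = sorted(rng.sample(range(n), n_keep))
    processed = layer_fn([tokens[i] for i in kept])
    # Tokens that were not selected skip the layer and keep their values.
    out = list(tokens)
    for i, value in zip(kept, processed):
        out[i] = value
    return out, kept


def toy_layer(xs):
    # Stand-in for an expensive transformer layer.
    return [2.0 * x for x in xs]


rng = random.Random(0)
out, kept = random_ltd_layer([1.0, 2.0, 3.0, 4.0], toy_layer, keep_ratio=0.5, rng=rng)
print(kept, out)
```

Since the layer only sees `keep_ratio` of the sequence, its compute and activation memory shrink accordingly for that training step.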

Detecting Label Errors in Token Classification Data [9] addresses a different challenge in NLP: label errors in token classification datasets, which can degrade the performance of NLP models. The authors propose a detection method and present experiments demonstrating its effectiveness on real-world datasets.

1.3 Sequence-to-Sequence Models

1.3.1 Convolutional Seq2Seq Models

Convolutional Seq2Seq (ConvSeq2Seq) models [10] are a variant of Seq2Seq models that incorporate convolutional neural networks (CNNs) into the model architecture. ConvSeq2Seq models are particularly useful for processing sequences of data with a grid-like structure, such as image sequences or spectrograms.

Compared to traditional Seq2Seq models that use recurrent neural networks (RNNs), ConvSeq2Seq models have the advantage of being able to process sequences in parallel, which can lead to faster training and inference times. However, they may be less effective at capturing long-range dependencies in the data compared to RNN-based models.
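Both properties can be seen in a minimal one-dimensional convolution, sketched here with scalar values per position as a simplifying assumption (real models convolve over embedding vectors and stack many layers). The function name and kernel are hypothetical.

```python
def conv1d(sequence, kernel):
    # "Same" convolution with zero padding; assumes an odd kernel length.
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(sequence) + [0.0] * pad
    # Each output depends only on a fixed window of inputs, so all
    # positions could be computed independently (in parallel).
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(sequence))]


layer1 = conv1d([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0])
print(layer1)                           # [3.0, 6.0, 9.0, 7.0]
# Stacking a second layer widens the receptive field from 3 to 5 positions.
print(conv1d(layer1, [1.0, 1.0, 1.0]))
```

No output position waits on another, unlike the step-by-step recurrence of an RNN, but a single layer only sees a window of width equal to the kernel length, which is why distant dependencies require deeper stacks.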

1.3.2 Pointer Generator Model

Pointer Generator [11] models are...
