
How Machine Learning is Innovating Today's World
Description
Provides a comprehensive understanding of the latest advancements and practical applications of machine learning techniques.
Machine learning (ML), a branch of artificial intelligence, has gained tremendous momentum in recent years, revolutionizing the way we analyze data, make predictions, and solve complex problems. As researchers and practitioners in the field, the editors of this book recognize the importance of disseminating knowledge and fostering collaboration to further advance this dynamic discipline. How Machine Learning is Innovating Today's World is a timely book that presents a diverse collection of 25 chapters on the remarkable ways that ML is transforming various fields and industries.
It provides a comprehensive understanding of the practical applications of ML techniques. The wide range of topics includes:
- an analysis of various tokenization techniques and the sequence-to-sequence model in natural language processing
- the evaluation of English language readability using ML models
- a detailed study of text analysis for information retrieval through natural language processing
- the application of reinforcement learning approaches to supply chain management
- the performance analysis of converting algorithms to source code using natural language processing in Java
- an alternate approach to solving differential equations utilizing artificial neural networks with optimization techniques
- a comparative study of different techniques of text-to-SQL query conversion
- the classification of livestock diseases using ML algorithms
- ML in image enhancement techniques
- efficient leader selection for inter-cluster flying ad-hoc networks
- a comprehensive survey of applications powered by GPT-3 and DALL-E
- the application domains of recommender systems
- a review of mood detection, emoji generation, and classification using tokenization and CNN
- variations of the exam scheduling problem using graph coloring
- the intersection of software engineering and machine learning applications
- ML strategies for indeterminate information systems in complex bipolar neutrosophic environments
- ML applications in healthcare and in battery management systems, and the rise of AI-generated news videos
- the enhancement of resource management in precision farming through AI-based irrigation optimization.
Audience
The book will be extremely useful to professionals, post-graduate research scholars, policymakers, corporate managers, and anyone with technical interests looking to understand how machine learning and artificial intelligence can benefit their work.

Persons
Arindam Dey, PhD, is an associate professor at the School of Computer Science, VIT-AP University, India. He has published more than 50 research articles in national and international peer-reviewed journals. Dr. Dey has 14 years of teaching and research experience in the areas of optimization and genetic algorithms.
Sukanta Nayak, PhD, is an assistant professor in the Department of Mathematics, School of Advanced Sciences (SAS) at VIT-AP University, Amaravati, Andhra Pradesh, India. He completed his doctoral research at NIT Rourkela, has authored three books, and published numerous research articles in international journals.
Ranjan Kumar, PhD, is an assistant professor in the Department of Mathematics, School of Advanced Sciences (SAS) at VIT-AP University, Amaravati, Andhra Pradesh, India. He has numerous peer-reviewed research articles to his name and is the recipient of numerous awards and titles including an Honorary Professorship from Cypress International Institute University, Texas, USA.
Sachi Nandan Mohanty, PhD, is in the School of Computer Science and Engineering (SCOPE) at VIT-AP University, Amaravati, Andhra Pradesh, India. He has edited 25 books and published 60 articles in international journals of repute. His research areas include data mining, big data analysis, cognitive science, fuzzy decision-making, brain-computer interface, cognition, and computational intelligence. In 2015, he was awarded the first prize of the Best Thesis Award by the Computer Society of India.
Content
Preface
Part 1: Natural Language Processing (NLP) Applications
1 A Comprehensive Analysis of Various Tokenization Techniques and Sequence-to-Sequence Model in Natural Language Processing
Kuldeep Vayadande, Ashutosh M. Kulkarni, Gitanjali Bhimrao Yadav, R. Kumar and Aparna R. Sawant
2 A Review on Text Analysis Using NLP
Kuldeep Vayadande, Preeti A. Bailke, Lokesh Sheshrao Khedekar, R. Kumar and Varsha R. Dange
3 Text Generation & Classification in NLP: A Review
Kuldeep Vayadande, Dattatray Raghunath Kale, Jagannath Nalavade, R. Kumar and Hanmant D. Magar
4 Book Genre Prediction Using NLP: A Review
Kuldeep Vayadande, Preeti Bailke, Ashutosh M. Kulkarni, R. Kumar and Ajit B. Patil
5 Mood Detection Using Tokenization: A Review
Kuldeep Vayadande, Preeti A. Bailke, Lokesh Sheshrao Khedekar, R. Kumar and Varsha R. Dange
6 Converting Pseudo Code to Code: A Review
Kuldeep Vayadande, Preeti A. Bailke, Anita Bapu Dombale, Varsha R. Dange and Ashutosh M. Kulkarni
Part 2: Machine Learning Applications in Specific Domains
7 Evaluating the Readability of English Language Using Machine Learning Models
Shiplu Das, Abhishikta Bhattacharjee, Gargi Chakraborty and Debarun Joardar
8 Machine Learning in Maximizing Cotton Yield with Special Reference to Fertilizer Selection
G. Hannah Grace and Nivetha Martin
9 Machine Learning Approaches to Catalysis
Sachidananda Nayak and Selvakumar Karuthapandi
10 Classification of Livestock Diseases Using Machine Learning Algorithms
G. Hannah Grace, Nivetha Martin, I. Pradeepa and N. Angel
11 Image Enhancement Techniques to Modify an Image with Machine Learning Application
Shiplu Das, Sohini Sen, Debarun Joardar and Gargi Chakraborty
12 Software Engineering in Machine Learning Applications: A Comprehensive Study
Kuldeep Vayadande, Komal Sunil Munde, Amol A. Bhosle, Aparna R. Sawant and Ashutosh M. Kulkarni
13 Machine Learning Applications in Battery Management System
Ponnaganti Chandana and Ameet Chavan
14 ML Applications in Healthcare
Farooq Shaik, Rajesh Yelchurri, Noman Aasif Gudur and Jatindra Kumar Dash
15 Enhancing Resource Management in Precision Farming through AI-Based Irrigation Optimization
Salina Adinarayana, Matha Govinda Raju, Durga Prasad Srirangam, Devee Siva Prasad, Munaganuri Ravi Kumar and Sai babu veesam
16 An In-Depth Review on Machine Learning Infusion in an Agricultural Production System
Sarthak Dash, Sugyanta Priyadarshini and Sukanya Priyadarshini
Part 3: Artificial Intelligence and Optimization Techniques
17 Reinforcement Learning Approach in Supply Chain Management: A Review
Rajkanwar Singh, Pratik Mandal and Sukanta Nayak
18 Alternate Approach to Solve Differential Equations Using Artificial Neural Network with Optimization Technique
Ramanan R., Sukanta Nayak and Arun Kumar Gupta
19 GPT-3- and DALL-E-Powered Applications: A Complete Survey
Kuldeep Vayadande, Chaitanya B. Pednekar, Priya Anup Khune, Vinay Sudhir Prabhavalkar and Varsha R. Dange
20 New Variation of Exam Scheduling Problem Using Graph Coloring
Angshu Kumar Sinha, Soumyadip Laha, Debarghya Adhikari, Anjan Koner and Neha Deora
Part 4: Emerging Topics in Machine Learning
21 A Comparative Study of Different Techniques of Text-to-SQL Query Converter
Kuldeep Vayadande, Preeti A. Bailke, Vikas Janu Nandeshwar, R. Kumar and Varsha R. Dange
22 Trust-Based Leader Election in Flying Ad-Hoc Network
Joydeep Kundu, Sahabul Alam and Sukanta Oraw
23 A Survey on Domain of Application of Recommender System
Sudipto Dhar
24 New Approach on M/M/c/K Queueing Models via Single Valued Linguistic Neutrosophic Numbers and Perceptionization Using a Non-Linear Programming Technique
Antony Crispin Sweety C. and Vennila B.
25 The Rise of AI-Generated News Videos: A Detailed Review
Kuldeep Vayadande, Mustansir Bohri, Mohit Chawala, Ashutosh M. Kulkarni and Asif Mursal
References
Index
1
A Comprehensive Analysis of Various Tokenization Techniques and Sequence-to-Sequence Model in Natural Language Processing
Kuldeep Vayadande¹*, Ashutosh M. Kulkarni¹, Gitanjali Bhimrao Yadav¹, R. Kumar² and Aparna R. Sawant¹
¹Vishwakarma Institute of Technology, Pune, India
²VIT-AP University, Inavolu, Beside AP Secretariat, Amaravati AP, India
Abstract
This research paper provides an in-depth examination of various tokenization techniques and Sequence-to-Sequence (Seq2Seq) models, with an emphasis on the LSTM, Transformer, and Attention-based LSTM models. The process of tokenization, which breaks down text into smaller units, plays a vital role in natural language processing (NLP). This study evaluates different tokenization methods, including word-based, character-based, and sub-word-based methods. It also explores the latest advancements in Seq2Seq models, such as the LSTM, Transformer, and Attention-based LSTM models, which have been successful in tasks like machine translation, text summarization, and dialog systems. The paper compares the performance of different tokenization techniques and Seq2Seq models on benchmark datasets. Additionally, it highlights the strengths and limitations of these models, which helps in understanding their suitability for various NLP applications. The aim of this study is to comprehensively understand the current advancements in tokenization and sequence-to-sequence modeling for NLP, particularly with regard to LSTM, Transformer, and Attention-based LSTM models.
Keywords: RNN, CRNN, LSTM, bidirectional-LSTM, text augmentation, tokenization, attention-based LSTM
1.1 Introduction
Tokenization is a fundamental step in natural language processing (NLP) that entails breaking down text into smaller units, such as words or characters. This process is critical for many NLP tasks, including text classification, machine translation, and text summarization. Different levels of granularity, such as word-level, character-level, and sub-word-level, can be used for tokenization.
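The three levels of granularity can be illustrated with a minimal Python sketch. The function names, the example sentence, and the hand-picked `toy_vocab` are assumptions made for illustration only; practical sub-word tokenizers learn their vocabulary from data rather than using a fixed set.

```python
import re

def word_tokenize(text):
    """Split text into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def char_tokenize(text):
    """Split text into individual characters, keeping spaces."""
    return list(text)

def subword_tokenize(word, vocab):
    """Greedy longest-match split of one word against a sub-word vocabulary.

    `vocab` is a hypothetical, hand-picked set; pieces that are not in it
    fall back to single-character tokens.
    """
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary or one character long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        tokens.append(word[start:end])
        start = end
    return tokens

sentence = "Tokenizers split text."
toy_vocab = {"token", "izers", "split", "text"}  # illustrative only

word_tokens = word_tokenize(sentence)
char_tokens = char_tokenize(sentence)
subword_tokens = subword_tokenize("tokenizers", toy_vocab)
```

Word-level tokenization gives the shortest sequence, character-level the longest, and the sub-word split falls in between.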
In recent years, various tokenization techniques have been proposed, each with its own advantages and disadvantages. The Multi-head Self-attention Mechanism in [1] is a type of attention mechanism that allows the model to concentrate on multiple parts of the input text simultaneously. Word-level tokenization is the most straightforward approach and is widely used in many NLP tasks. However, it may be less effective for languages with complex morphological structures, such as agglutinative languages. Character-level tokenization handles such languages better, but it may also introduce more noise into the data. Sub-word-level tokenization, such as byte-pair encoding (BPE) and unigram language modeling (ULM), has been proposed as a compromise between word-level and character-level tokenization.
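As a rough illustration of how BPE builds a sub-word vocabulary, the sketch below repeatedly merges the most frequent pair of adjacent symbols in a toy corpus. The word frequencies in `toy_freqs` and the helper names are assumptions for illustration; this is a simplified version of the general idea, not the procedure of any particular library.

```python
from collections import Counter

def pair_counts(corpus):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in corpus.items():
        for left, right in zip(symbols, symbols[1:]):
            counts[(left, right)] += freq
    return counts

def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_freqs, num_merges):
    """Start from characters and apply `num_merges` greedy pair merges."""
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(corpus)
        if not counts:
            break
        best = max(counts, key=counts.get)  # ties: first pair encountered
        merges.append(best)
        corpus = merge_pair(corpus, best)
    return merges, corpus

toy_freqs = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # illustrative counts
merges, segmented = learn_bpe(toy_freqs, num_merges=3)
```

Frequent character sequences such as the suffix "est" end up as single symbols, while rare words remain split into smaller pieces.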
The goal of this study is to provide a thorough understanding of the tokenization methods that have been introduced recently. The research evaluates different tokenization methods, including word-based, character-based, and sub-word-based methods, and compares their performance on a set of benchmark datasets. It also examines the details of these techniques, their working principles, and their performance on various NLP tasks, and it analyzes their advantages and limitations, which helps in understanding their suitability for different types of NLP applications. The research is intended as a resource for researchers and practitioners in the field of NLP, giving readers a thorough understanding of the most advanced tokenization algorithms currently available.
The popularity of Sequence-to-Sequence (Seq2Seq) models in NLP has grown in recent times because of their capability to process input and output sequences of varying lengths. Seq2Seq models, also known as encoder-decoder models, have produced noteworthy outcomes in a number of NLP applications, including dialogue systems, machine translation, and text summarization. Different Seq2Seq models have been put forth over time, each with its own merits and faults. One such model is the Long Short-Term Memory (LSTM), a popular Seq2Seq model that has shown promise in a variety of NLP applications. Its main limitation is that it is computationally expensive. An alternative to the LSTM model that has proven to be more effective is the Transformer model, which is built on the attention mechanism. Attention-based LSTM models, which combine the advantages of LSTM and attention mechanisms, have also been proposed.
With an emphasis on the LSTM, Transformer, and Attention-based LSTM models, this survey seeks to provide a thorough understanding of the many Seq2Seq models that have been suggested in recent years. We cover the details of these models, their working principles, and their performance on various NLP tasks. We also cover the advantages, limitations, and performance comparison of these models, which helps in understanding their suitability for different types of NLP applications.
In addition, the tokenization-free approach described in [4] focuses on pre-training an efficient encoder that can operate without tokenization.
1.2 Literature Survey
The Multi-head Self-attention Mechanism [1] is a type of attention mechanism that enables the model to concentrate on multiple sections of the input text simultaneously. This is achieved by utilizing multiple "heads" to attend to different segments of the input. This can aid the model's understanding of the context and relationships between different parts of the input text, resulting in more accurate and coherent summaries.
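A minimal sketch of the multi-head idea, in plain Python, is given here under a simplifying assumption: the learned query, key, value, and output projections used in practice are omitted, so each head simply attends over its own slice of the feature dimension. The function names and the `tokens` values are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs

def multi_head_self_attention(x, num_heads):
    """Split features into heads, attend within each head, then concatenate.

    Assumption: no learned projections, so every head uses its slice of x
    as queries, keys, and values.
    """
    dim = len(x[0])
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        part = [row[lo:hi] for row in x]
        head_outputs.append(attention(part, part, part))
    # Concatenate the per-head outputs for each token position.
    return [
        [value for head in head_outputs for value in head[t]]
        for t in range(len(x))
    ]

tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
mixed = multi_head_self_attention(tokens, num_heads=2)
```

Because each head computes its own attention weights, different heads can place their focus on different input positions for the same token.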
The pointer network is a type of encoder-decoder [2] model that uses an attention mechanism to point to the part of the input text that should be included in the summary. It helps the model to generate the summary by copying relevant words from the input text, rather than generating them from scratch.
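The pointing step can be sketched as follows, assuming dot-product attention between a decoder state and the encoder states; the vectors and the function name `point_to_source` are hypothetical and chosen only to illustrate copying a word from the input.

```python
import math

def point_to_source(decoder_state, encoder_states, source_tokens):
    """Return the source token with the highest attention weight.

    The attention distribution over input positions doubles as the output
    distribution, so the model copies a token instead of generating one.
    """
    scores = [
        sum(a * b for a, b in zip(decoder_state, h)) for h in encoder_states
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return source_tokens[best], probs

source_tokens = ["the", "cat", "sat"]
encoder_states = [[1.0, 0.0], [0.0, 3.0], [1.0, 1.0]]  # illustrative values
decoder_state = [0.0, 2.0]                              # illustrative values

copied, pointer_probs = point_to_source(decoder_state, encoder_states, source_tokens)
```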
The paper [3] focuses on improving language generation performance by calibrating the likelihood of the generated sequences, whereas the paper [4] addresses long document summarization and combines top-down and bottom-up inference to extract high-level concepts and specific details from the input text. The two papers thus offer different approaches to improving performance on NLP tasks.
CANINE [4] is focused on pre-training an efficient encoder that can operate without tokenization, making the training process faster and more scalable. It uses a simple linear-layer-based architecture and employs a binary masking strategy to hide specific words during training, in order to predict them during inference.
FNet [5], on the other hand, introduces a new method of mixing tokens with Fourier transforms to capture long-range dependencies. This method can produce highly expressive representations and has the advantage of being computationally efficient. The paper shows that the approach outperforms traditional pre-training methods on various NLP tasks.
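A minimal sketch of Fourier token mixing is shown here: a discrete Fourier transform is applied along the hidden dimension and then along the sequence dimension, and only the real part is kept. The naive O(n²) `dft` helper and the tiny `embeddings` matrix are assumptions chosen for readability, not an efficient or official implementation.

```python
import cmath

def dft(values):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fourier_mix(x):
    """Mix a (sequence x hidden) matrix with a 2-D DFT and keep the real part."""
    # Transform along the hidden dimension (each token's feature vector).
    hidden_mixed = [dft(row) for row in x]
    # Transform along the sequence dimension (each feature across tokens).
    columns = list(zip(*hidden_mixed))
    seq_mixed = [dft(list(col)) for col in columns]
    return [
        [seq_mixed[d][t].real for d in range(len(columns))]
        for t in range(len(x))
    ]

embeddings = [[1.0, 2.0], [3.0, 4.0]]  # two tokens, hidden size two
mixed = fourier_mix(embeddings)
```

The mixing step has no learned parameters, which is where the efficiency advantage over attention comes from.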
Charformer [6] focuses on the tokenization stage of pre-processing and proposes a novel method of sub-word tokenization that utilizes gradient information to identify the best sub-word splits. The authors show that their method is fast and results in improved performance on several NLP tasks.
The paper [7] proposes a new pre-training method for language models that leverages retrieval-based techniques. The authors show that this approach can effectively pre-train models on large-scale text corpora, leading to improved performance on a variety of NLP benchmarks.
The paper [8] focuses on enhancing the training efficiency of large-scale transformers, which are frequently employed in NLP applications like language modeling, text classification, and machine translation. The authors propose a new training method, "Random-LTD," which involves randomly dropping tokens and layers during the training process to speed up convergence and reduce memory requirements. The authors show that this method can effectively train large-scale transformers with improved efficiency.
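The token-dropping idea can be sketched roughly as follows. The text above describes the method only at a high level, so the details here are assumptions: a layer processes a random subset of token positions while the remaining tokens bypass it unchanged. `toy_layer`, `keep_ratio`, and the hidden states are hypothetical stand-ins.

```python
import random

def layer_with_token_dropping(hidden, layer_fn, keep_ratio, rng):
    """Apply layer_fn to a random subset of positions; the rest bypass it."""
    seq_len = len(hidden)
    num_keep = max(1, int(seq_len * keep_ratio))
    kept = sorted(rng.sample(range(seq_len), num_keep))
    # Only the kept tokens go through the (expensive) layer computation.
    processed = layer_fn([hidden[i] for i in kept])
    output = list(hidden)
    for position, vector in zip(kept, processed):
        output[position] = vector
    return output, kept

def toy_layer(vectors):
    """Hypothetical stand-in for a transformer layer: doubles each value."""
    return [[2.0 * v for v in vec] for vec in vectors]

rng = random.Random(0)
hidden_states = [[float(i)] for i in range(1, 9)]  # eight one-dimensional tokens
output, kept = layer_with_token_dropping(hidden_states, toy_layer, 0.5, rng)
```

With `keep_ratio=0.5`, the layer sees half as many tokens, which is the source of the compute and memory savings in this sketch.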
Detecting Label Errors in Token Classification Data [9] addresses a different challenge in NLP: label errors in token classification datasets, which can degrade the performance of NLP models. The authors propose a method for detecting such errors and present experiments demonstrating its effectiveness on real-world datasets.
1.3 Sequence-to-Sequence Models
1.3.1 Convolutional Seq2Seq Models
Convolutional Seq2Seq (ConvSeq2Seq) models [10] are a variant of Seq2Seq models that incorporate convolutional neural networks (CNNs) into the model architecture. ConvSeq2Seq models are particularly useful for processing sequences of data with a grid-like structure, such as image sequences or spectrograms.
Compared to traditional Seq2Seq models that use recurrent neural networks (RNNs), ConvSeq2Seq models have the advantage of being able to process sequences in parallel, which can lead to faster training and inference times. However, they may be less effective at capturing long-range dependencies in the data compared to RNN-based models.
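The difference in parallelism can be seen in a small sketch that contrasts a 1-D convolution with a minimal recurrence. Both functions and their weights are simplified assumptions: each convolution output depends only on a fixed window of inputs, so all positions could be computed at once, whereas each recurrent state needs the previous one.

```python
def conv1d(sequence, kernel):
    """Valid 1-D convolution (cross-correlation) over a list of floats.

    Every output position depends only on a local window, so the
    positions are independent of one another.
    """
    width = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]

def rnn_scan(sequence, w_in, w_rec):
    """Minimal linear recurrence: each state depends on the previous state."""
    state, states = 0.0, []
    for x in sequence:
        state = w_in * x + w_rec * state
        states.append(state)
    return states

signal = [1.0, 2.0, 3.0, 4.0, 5.0]
conv_out = conv1d(signal, [0.5, 0.5])           # window of two inputs
rnn_out = rnn_scan(signal, w_in=1.0, w_rec=0.5)  # carries state across steps
```

The recurrent state accumulates information from every earlier input, while a single convolution layer sees only its window; this is the trade-off between parallelism and long-range dependencies noted above.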
1.3.2 Pointer Generator Model
Pointer Generator [11] models are...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.