Mathematics and Computer Science, Volume 2


Sharmistha Ghosh, M. Niranjanamurthy, Krishanu Deyasi, Biswadip Basu Mallik, Santanu Das (Editors)

Wiley (Publisher)

1st edition

Published on 13 July 2023

432 pages

E-Book

ePUB with Adobe DRM

978-1-119-89669-2 (ISBN)

€173.99 incl. 7% VAT

E-book single license

Available for download

People

Sharmistha Ghosh, PhD, is a professor at the Institute of Engineering and Management, India. She received her doctorate in mathematics from the Indian Institute of Technology, India. Her major fields of study include fuzzy and vague databases as well as computational fluid dynamics, and she has published several papers in scientific journals. She is also the editor of several scientific journals and works as a reviewer of journals as well as doctoral theses in India as well as abroad.

M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M S Ramaiah Institute of Technology, Bangalore, Karnataka. He earned his PhD in computer science at JJTU, Rajasthan, India. He has over 11 years of teaching experience and two years of industry experience as a software engineer. He has published several books, and he is working on numerous books for Scrivener Publishing. He has published over 60 papers in scholarly journals and conferences, and he works as a reviewer for 22 scientific journals. He also has numerous awards to his credit.

Krishanu Deyasi is an associate professor in the Department of Basic Sciences and Humanities at the Institute of Engineering and Management, India. He earned his PhD from the Indian Institute of Science Education and Research, and he has postdoctoral experience from The Institute of Mathematical Sciences, India. He has written three books and has published papers in scientific journals. He is also an editor for several scientific journals.

Biswadip Basu Mallik is a senior assistant professor of mathematics in the Department of Basic Sciences and Humanities at the Institute of Engineering and Management, India. He has been involved in teaching and research for more than 21 years and has published several research papers in various scientific journals along with book chapters with various publishers. He has authored five books and has five patents to his credit. He is a managing editor of the Journal of Mathematical Sciences & Computational Mathematics and is an editorial board member and reviewer for several scientific journals.

Santanu Das is an assistant professor in the Department of Basic Sciences and Humanities at the Institute of Engineering and Management, India. He completed his undergraduate degree from, and is pursuing his doctorate from, Jadavpur University.

Contents

1
A Comprehensive Review on Text Classification and Text Mining Techniques Using Spam Dataset Detection

Tamannas Siddiqui and Abdullah Yahya Abdullah Amer

Department of Computer Science, Aligarh Muslim University, Aligarh, UP, India

Abstract

Text data mining techniques are an essential tool for dealing with raw text data (future fortune). Text data mining extracts knowledge and information from unstructured text; its fundamental principle is to produce relevant insights by analyzing a huge volume of raw data with the help of Artificial Intelligence (AI), natural language processing (NLP), and Machine Learning algorithms. Contemporary business applications are drawn to the salient features of text data mining because of the benefits they bring to global operations. This chapter gives a brief review of text mining techniques, such as clustering, information extraction, text preprocessing, information retrieval, and text classification, and of text mining applications, demonstrating the significance of text mining, the predominant text mining techniques, and the predominant contemporary applications that use text mining. The review covers various existing algorithms, text feature extraction methods, compression methods, and evaluation techniques. Finally, we applied three classifier algorithms with TF-IDF feature extraction to a spam dataset for spam detection, and the model achieved higher accuracy with Naïve Bayes. Illustrations of text classification as an application in areas such as medicine, law, and education are also presented.

Keywords: Text mining, text classification, spam detection, text preprocessing, text analysis

1.1 Introduction

Text data mining techniques are predominantly used for extracting relevant and associated patterns based on specific words or sets of phrases. Text data mining is associated with text clustering, text classification, the production of granular taxonomies, sentiment analysis, entity relation modeling, and document summarization [1]. Prominent text mining techniques include extraction, summarization, categorization, retrieval, and clustering. These techniques are used to infer high-quality, previously unknown knowledge from text in different written resources, such as books, emails, reviews, and articles, with the help of information retrieval, linguistic analysis, pattern recognition, information extraction, or tagging and annotation [2]. Text preprocessing is the predominant functionality in text data mining. It is essential for bringing the text into a form that is predictable and analyzable, and it can be done in several phases: lowercasing, lemmatization, stemming, stop word removal, and tokenization. These important text preprocessing steps are predominantly performed in machine learning pipelines for natural language processing tasks. They implement data cleaning and transformation to eliminate outliers and standardize the data so that a suitable model can be built for the text data mining process [3]. Text data mining techniques are predominantly used for records management, distinct document searches, e-discovery, organizing large sets of text data, analyzing and monitoring online text in internet communication and blogs, identifying large textual datasets associated with patients in clinical settings, and clarifying knowledge for readers through a richer search experience [4].
Text data mining techniques are predominantly used in scientific literature mining, business, biomedical, and security applications, computational sociology, and digital humanities, as shown in Figure 1.1 below.

Figure 1.1 Overview of text classification.

Table 1.1 Text classification compared model classifiers.

Model classifiers     Authors                 Architecture                       Features extraction   Corpus
SVM and KNN           C. W. Lee et al. [7]    Gravity Inverse Moment Similarity  TF-IDF vectorizer     Wikipedia
Logistic Regression   L. Kumar et al. [13]    Bayesian Logistic Regression       TF-IDF                RCV1-v2
Naive Bayes NB        A. Swapna et al. [9]    Weight Enhancing Method            Weights words         Reuters-31678
SVM                   T. Singh et al. [11]    String Subsequence Kernel          TF-IDF vectorizer     20 Newsgroups

This paper reviews text data mining techniques, the steps involved in text preprocessing, and multiple applications that implement the text data mining methods discussed in Table 1.1.
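
The preprocessing phases named in this introduction can be illustrated with a minimal Python sketch. It covers lowercasing, tokenization, stop word removal, and a crude suffix-stripping stemmer; lemmatization is left out because it needs a dictionary. The stop word list, the suffix rules, and all function names are assumptions made for illustration and are not taken from the chapter.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Suffixes stripped by the toy stemmer, checked in this order (hypothetical rules).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run lowercasing, tokenization, stop word removal, and stemming."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The emails are filtered and classified in batches"))
# ['email', 'filter', 'classifi', 'batch']
```

The stem "classifi" shows why such a rule-based stemmer is only a rough stand-in: an established stemmer or a lemmatizer would normally replace `naive_stem` in practice.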

1.2 Text Mining Techniques

Text mining (TM) deals with the informational content found in sources such as newspapers, books, social media posts, email, and URLs. Text summarization and classification are typical applications of text mining across different fields. Some of the techniques applied to achieve them are discussed next, following the set of steps shown in Figure 1.2 below.

1.2.1 Data Mining

Within big data analytics, text mining analyzes unstructured textual data to extract new knowledge and to identify significant patterns and correlations hidden in huge data sets. Big data analytics is predominantly used to extract, automatically or semi-automatically, the information and patterns hidden implicitly in unstructured data or natural language text. To perform these text mining operations, unsupervised and supervised learning algorithms are predominantly used. These methods serve classification and prediction, using a set of predictors to reveal hidden structures in the information database [5]. In this process, text mining is performed using pattern matching on regular documents and unstructured manuscripts [6].
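
As one illustration of a supervised learning method used for classification, the sketch below trains a multinomial Naive Bayes classifier on a toy spam set. It is a minimal example under stated assumptions: the four training messages are made up, raw word counts with add-one smoothing stand in for the TF-IDF features mentioned in the abstract, and the function names are hypothetical. It is not the model evaluated in this chapter.

```python
import math
from collections import Counter, defaultdict

# Made-up training messages for illustration only.
TRAIN = [
    ("win free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log posterior for the text."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed log likelihood of the word.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "free prize offer"))     # spam
print(predict(model, "monday team meeting"))  # ham
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word from driving a class probability to zero.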

Figure 1.2 Text data mining techniques.

1.2.2 Information Retrieval

Information Retrieval (IR) is a prominent method among text data mining techniques. Its fundamental principle is to identify, within a large collection of documents stored in unstructured formats, those documents that meet a given information need. IR comes in three models: the Boolean Model, the Vector Model, and the Probabilistic Model. In text data mining, IR plays a vital role through its indexing system and document collection [7]. The method is predominantly used for locating a specific item in natural language documents. IR is also used for knowledge extraction, converting text into structured data in which interesting relationships can be mined [8]. Discovering appropriate patterns and analyzing text records in huge amounts of data has been identified as a major problem. IR addresses it by selecting interesting patterns from very large data sets. IR techniques are predominantly used for quickly choosing the appropriate text documents from a huge volume of data. IR extracts the required text documents from very large databases and presents accurate, relevant results [9].
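
Of the three IR models named above, the Vector Model is the easiest to sketch in a few lines. The example below represents documents and a query as TF-IDF vectors and ranks documents by cosine similarity. The three sample documents, the smoothed IDF formula, and the function names are assumptions chosen for illustration; the chapter does not prescribe a particular weighting scheme.

```python
import math
from collections import Counter

# Made-up document collection for illustration only.
DOCS = [
    "spam filters classify email messages",
    "information retrieval ranks documents for a query",
    "clustering groups similar documents",
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms without an IDF weight are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def build_index(documents):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed IDF (one common variant, assumed here).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [tfidf_vector(tokens, idf) for tokens in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, documents, vectors, idf):
    """Return documents with non-zero similarity, most similar first."""
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, vec), doc) for vec, doc in zip(vectors, documents)]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked if score > 0]


vectors, idf = build_index(DOCS)
for doc in search("retrieval of documents", DOCS, vectors, idf):
    print(doc)
```

For the query shown, the second document ranks first because it shares both "retrieval" and "documents" with the query, the third document follows with one shared term, and the first document is omitted because it shares none.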

1.2.3 Natural Language Processing (NLP)

NLP is a subfield of linguistics, computer science, and AI. Its fundamental principle is to deal with the interaction between computers and humans, so that machines can read, interpret, learn, and make sense of human languages in a valuable way. It is powered by AI, which enables machines to read, understand, interpret, manipulate, and derive meaning from human languages [10]. It is a prominent AI technology used in text data mining to transform the unstructured text in documents and databases into normalized, structured data suitable for analysis or for machine learning algorithms [11]. Long Short-Term Memory (LSTM) is one of the predominant machine learning algorithms, which remembers values with the help of a recurrent neural network. The seq2seq model is another predominant model used in NLP, which works with an encoder-decoder structure. It first builds a vocabulary list to identify the correct grammar syntax, and it works with tags to identify the structured and unstructured language in the documents. The named entity recognition model is another predominant model, which identifies relevant names and classifies them by entity type. It is used to find the names of people, the names of places, and any other important entities in the given text or documents. The NLP process features a Preferences' Graph [12], which is used to build a set of user preferences. While a document is written, the repeatedly chosen tenses, adjectives, conjunctions, and prepositions are identified, and NLP creates a User Preference Graph. Based on the graph, it...
