
Data-Driven Modeling
Description
Equip yourself with the essentials of informed decision-making with this practical guide to mastering data-driven modeling and extracting actionable, meaningful patterns from the vast sea of modern data.
In an era defined by data, the ability to transform raw information into actionable insights is a skill set that transcends industries and disciplines. This book is a comprehensive guide to extracting meaningful patterns from the vast sea of data that surrounds us. It explores the significance of data-driven modeling, compares it with traditional approaches, and sets the stage for understanding the transformative power and diverse applications of data-driven techniques. Whether you are a novice looking to grasp the fundamentals or an experienced professional seeking advanced techniques, this book serves as a practical guide through the dynamic landscape of data-driven modeling. Through clear explanations, hands-on examples, and real-world applications, readers will gain the skills needed to navigate the complexities of modern data analysis and to use data for informed decision-making.

Persons
Arindam Mondal, PhD, is a Professor at Dr. B.C. Roy Engineering College with more than 20 years of experience. He has published more than 35 papers in scientific and technical journals and conferences. He has 18 patents to his credit and has won several awards for his scholarship.
Souvik Ganguli, PhD is an Assistant Professor at the Thapar Institute of Engineering and Technology with more than 19 years of teaching experience. He has published more than 50 papers in leading journals, conferences, and book chapters. He has 15 granted patents to his credit and has won several awards for his scholarly activities.
Content
1 Fundamentals of Data Analysis and Preprocessing
Sudipta Hazra¹* and Arindam Mondal²
¹ MCKV Institute of Engineering, Howrah, West Bengal, India
² Dr. B. C. Roy Engineering College, Durgapur, West Bengal, India
Abstract
This chapter proposes a general structure for data curation and covers the stages of data preparation and preprocessing. The framework described can be fitted to many different datasets. Raw data that have not been cleaned and curated are typically unsuitable for drawing accurate conclusions. The chapter covers in detail the most widely used algorithms and strategies for data curation and preparation, including the methodology for data curation, imputation, feature extraction, correlation analysis, and the real-world implementation of these algorithms. We also present methods that we have developed from our own data processing experience. Lastly, we demonstrate with a real-world example how applying various imputation techniques affects the efficiency and performance of a support vector machine. The chapter outlines a process for turning unstructured, unprocessed data into well-organized data that can be used with sophisticated machine learning algorithms or other advanced data analysis techniques.
Keywords: Data analysis, data preprocessing, data mining
1.1 Introduction
Research in a wide range of disciplines, including science, engineering, management, and process control, starts with data analysis (DA). Data about a given topic are collected as symbolic and numeric attributes. These data come from a variety of sources, including sensors and people, with varying levels of complexity and reliability. Analyzing these data leads to a deeper understanding of the phenomenon in question. The primary goal of any DA is therefore to find information that can be applied to decision-making or problem-solving [1]. Problems with the data, however, can make this impossible. Most of the time, data errors are not detected until the DA phase begins. For instance, in the creation of knowledge-based systems, DA is carried out to find and produce new facts for assembling a trustworthy and extensive knowledge base. The reliability of the part of the knowledge base created using DA techniques such as induction therefore depends on the data.
Numerous efforts are being made either to develop analysis tools or to use commercially available solutions for DA. Some of these initiatives have disregarded the fact that real-world data often have problems and that some form of data preparation is typically necessary before the data can be analyzed effectively. Data preparation features should therefore be available in research and commercial tools so that they can be used either before or during the actual DA procedure. Data preparation may have multiple goals. Apart from addressing data problems such as corrupted data and missing or irrelevant attributes in datasets, one may also want to learn more about the nature of the data or to alter its structure (such as its level of granularity) to make it more suitable for an effective DA.
The authors of [2] draw a parallel between human information processing and data preprocessing (DP): "Think about the information processing mechanism used by humans" [2]. Sensory signals are received and processed through the sense organs. Low-level computing structures carry out the initial phases of analysis, after which the data are passed to other processing structures. Processing can be driven by events or by concepts. Event-driven processing usually works from the bottom up, looking for structures into which the input can be integrated, whereas conceptually driven processing usually works in the reverse direction, from the top down, propelled by objectives, motives, and expectations about the data.
Numerous justifications have been offered for the role of, and the need for, DP. In modeling, the desired information can be mixed with variations in the data that result from changes in process or system conditions, as well as from the gathering and transfer of the data. Appropriate DP can remove these effects in advance, leading to more parsimonious models. Although they might not be more accurate predictors, these models should be more reliable [3]. As a result of data preparation, fewer phenomena need to be represented in the model, although estimation errors might contribute to an increase in variance. In learning, data preprocessing enables users to choose which concepts to learn, how to present the DA results, and how to represent the data so that they are easier to understand and use in practice.
Preprocessing is typically the first action performed on any batch of data. DP is time-consuming and is frequently only semiautomated. Effective methods for automatic DP are crucial because the volume of data generated by contemporary process supervision and data acquisition systems keeps growing and demands more processing [4]. In this chapter, we address common data-related issues, strategies for resolving them, and the advantages of using these approaches for data preprocessing.
1.2 Data Preprocessing
DP is the act of transforming raw data into a form that can be understood. Real-world data are often incomplete, inconsistent, noisy, or redundant. Data preparation comprises several steps that help to organize raw data into a logical format. Figure 1.1 illustrates the stages of data preparation.
Figure 1.1 Steps in DP.
Data cleaning: Data cleaning is the process of locating incorrect and corrupt records in a record set or database table. Its main goal is to find inconsistent, erroneous, incomplete, and irrelevant data, which are then either corrected or removed.
Data integration: The primary objectives of data integration are to unify data from many sources and to present them in a coherent way. All conflicts that arise from merging data with different representations are resolved. This process is crucial because of its many industrial and scientific applications, and its importance increases as the volume of data grows exponentially.
Data transformation: Data transformation is essential for changing raw data into a form that can be understood. It consists of normalization, aggregation, and generalization. Data normalization facilitates the organization of a database's tables and columns to reduce redundancy, which reduces complexity and processing time. Data aggregation creates a concise summary that gives a quicker overview. Data generalization, also known as rolling up data, builds assessment databases at various levels of summarization.
Data reduction: The practice of organizing and simplifying digital information is known as data reduction. Most of the time, empirical and experimental methods are used to obtain these data. It entails breaking up massive volumes of data into more manageable and insightful pieces.
Data discretization: Data discretization is crucial when a large amount of numerical data must be classified using only nominal values. The continuous data are divided into discrete intervals, and each interval is assigned a nominal value. In essence, it is the process of mapping continuous attributes onto a defined set of intervals with the least possible loss of information.
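Three of the steps above (cleaning, aggregation, and discretization) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own method: the sensor records, the function names, the choice of mean imputation for missing readings, and the choice of equal-width intervals for discretization are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (sensor_id, reading); None marks a missing reading.
raw = [("a", 20.0), ("a", None), ("b", 30.0), ("b", 40.0), ("a", 10.0), ("b", None)]

def impute_mean(records):
    """Data cleaning: replace missing readings with the mean of the observed ones."""
    observed = [reading for _, reading in records if reading is not None]
    if not observed:
        raise ValueError("cannot impute when no readings are observed")
    fill = mean(observed)
    return [(sensor, fill if reading is None else reading) for sensor, reading in records]

def aggregate_mean(records):
    """Data aggregation: summarize the readings as one mean value per sensor."""
    groups = defaultdict(list)
    for sensor, reading in records:
        groups[sensor].append(reading)
    return {sensor: mean(readings) for sensor, readings in groups.items()}

def equal_width_bins(values, labels):
    """Data discretization: map each value to one of len(labels) equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [labels[0] for _ in values]
    width = (hi - lo) / len(labels)
    binned = []
    for value in values:
        # The maximum value falls on the upper edge, so clamp it into the last interval.
        index = min(int((value - lo) / width), len(labels) - 1)
        binned.append(labels[index])
    return binned

cleaned = impute_mean(raw)
summary = aggregate_mean(cleaned)
binned = equal_width_bins([reading for _, reading in cleaned], ["low", "medium", "high"])

print(cleaned)
print(summary)
print(binned)
```

Mean imputation and equal-width binning are used here only because they are simple to follow; other imputation and discretization strategies would slot into the same three-step structure.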
Everything done before the actual DA process begins is known as data preparation. In essence, it is a transformation $T$ that creates a set of new data vectors $Y_{ij}$ from the raw real-world data vectors $X_{ik}$:

$$Y_{ij} = T(X_{ik}) \tag{1.1}$$

such that: (i) $Y_{ij}$ preserves the "valuable information" in $X_{ik}$, (ii) $Y_{ij}$ eliminates at least one of the problems in $X_{ik}$, and (iii) $Y_{ij}$ is more useful than $X_{ik}$.
In the above relation:
- $i = 1, \ldots, n$, where $n$ = number of objects,
- $j = 1, \ldots, m$, where $m$ = number of features after preprocessing,
- $k = 1, \ldots, l$, where $l$ = number of attributes/features before preprocessing, and in general, $m \neq l$.
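As a concrete reading of the transformation T, the sketch below maps n objects with l raw attributes to n objects with m features by dropping attributes that never vary. This is only one hypothetical choice of T, picked because it removes one data problem (an irrelevant attribute) and makes the case where m differs from l easy to see; the function name and the sample data are assumptions.

```python
def drop_constant_attributes(X):
    """One possible T: keep only the attributes (columns) whose values vary across objects."""
    if not X:
        return []
    num_attributes = len(X[0])  # l, the number of attributes before preprocessing
    keep = [k for k in range(num_attributes) if len({row[k] for row in X}) > 1]
    return [[row[k] for k in keep] for row in X]

# Hypothetical raw data: n = 3 objects, l = 3 attributes; the middle attribute is constant.
X = [
    [1.0, 5.0, 0.0],
    [2.0, 5.0, 1.0],
    [3.0, 5.0, 0.0],
]

Y = drop_constant_attributes(X)

n = len(Y)     # the number of objects is unchanged
m = len(Y[0])  # the number of features after preprocessing
print(Y, n, m)
```

Here n stays at 3 while the number of columns drops from l = 3 to m = 2, so the output keeps the varying attributes and discards the one that carries no information.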
The aim of DA is to find and present important facts, such as meaningful patterns in the data. Valuable information is knowledge that exists in the data. Four characteristics are defined for meaningful information [5]: it is valid, novel, potentially useful, and ultimately understandable. Data problems are circumstances that make it difficult to use any DA tool effectively or that could lead to unacceptable results. Data can be preprocessed for a number of reasons: to resolve problems that would otherwise make any analysis impossible, to understand the nature of the data and conduct a more insightful analysis, and to derive deeper insights from a particular set of data. The majority of applications require more than one type of data preparation. Determining the kind of preprocessing to apply is consequently an essential task.
1.2.1 Issues with Data
Real-world data are seldom without issues. These are summarized in Figure 1.2 and discussed below. Problems can vary greatly in nature and severity for a...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.