Schweitzer Fachinformationen
When it comes to professional knowledge, Schweitzer Fachinformationen leads the way. Customers from law and consulting, as well as companies, public administrations, and libraries, receive complete solutions for procuring, managing, and using digital and printed media.
Feel confident navigating the fundamentals of data science
Data Science Essentials For Dummies is a quick reference on the core concepts of the exploding and in-demand data science field, which involves data collection and working on dataset cleaning, processing, and visualization. This direct and accessible resource helps you brush up on key topics and gets right to the point, eliminating review material, wordy explanations, and fluff, so you get what you need, fast.
Perfect for supplementing classroom learning, reviewing for a certification, or staying knowledgeable on the job, Data Science Essentials For Dummies is a reliable reference that's great to keep on hand at your desk every day.
Lillian Pierson, PE, is the founder and fractional CMO at Data-Mania, as well as a globally recognized growth leader in technology. To date, she has helped educate approximately 2 million professionals on how to leverage AI, data strategy, and data science to drive business growth.
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Where to Go from Here
Chapter 1: Wrapping Your Head Around Data Science
Seeing Who Can Make Use of Data Science
Inspecting the Pieces of the Data Science Puzzle
Collecting, querying, and consuming data
Applying mathematical modeling to data science tasks
Deriving insights from statistical methods
Coding, coding, coding - it's just part of the game
Applying data science to a subject area
Communicating data insights
Chapter 2: Tapping into Critical Aspects of Data Engineering
Defining the Three Vs
Grappling with data volume
Handling data velocity
Dealing with data variety
Identifying Important Data Sources
Grasping the Differences among Data Approaches
Defining data science
Defining machine learning engineering
Defining data engineering
Comparing machine learning engineers, data scientists, and data engineers
Storing and Processing Data for Data Science
Storing data and doing data science directly in the cloud
Processing data in real-time
Recognizing the Impact of Generative AI
The reshaping of data engineering
Tools and frameworks for supporting AI workloads
Chapter 3: Using a Machine to Learn from Data
Defining Machine Learning and Its Processes
Walking through the steps of the machine learning process
Becoming familiar with machine learning terms
Considering Learning Styles
Learning with supervised algorithms
Learning with unsupervised algorithms
Learning with reinforcement
Seeing What You Can Do
Selecting algorithms based on function
Generating real-time analytics with Spark
Chapter 4: Math, Probability, and Statistical Modeling
Exploring Probability and Inferential Statistics
Probability distributions
Conditional probability with Naïve Bayes
Quantifying Correlation
Calculating correlation with Pearson's r
Ranking variable pairs using Spearman's rank correlation
Reducing Data Dimensionality with Linear Algebra
Decomposing data to reduce dimensionality
Reducing dimensionality with factor analysis
Decreasing dimensionality and removing outliers with PCA
Modeling Decisions with Multiple Criteria Decision-Making
Turning to traditional MCDM
Focusing on fuzzy MCDM
Introducing Regression Methods
Linear regression
Logistic regression
Ordinary least squares regression methods
Detecting Outliers
Analyzing extreme values
Detecting outliers with univariate analysis
Detecting outliers with multivariate analysis
Introducing Time Series Analysis
Identifying patterns in time series
Modeling univariate time series data
Chapter 5: Grouping Your Way into Accurate Predictions
Starting with Clustering Basics
Getting to know clustering algorithms
Examining clustering similarity metrics
Identifying Clusters in Your Data
Clustering with the k-means algorithm
Estimating clusters with kernel density estimation
Clustering with hierarchical algorithms
Dabbling in the DBScan neighborhood
Categorizing Data with Decision Tree and Random Forest Algorithms
Drawing a Line between Clustering and Classification
Introducing instance-based learning classifiers
Getting to know classification algorithms
Making Sense of Data with Nearest Neighbor Analysis
Classifying Data with Average Nearest Neighbor Algorithms
Classifying with K-Nearest Neighbor Algorithms
Understanding how the k-nearest neighbor algorithm works
Knowing when to use the k-nearest neighbor algorithm
Exploring common applications of k-nearest neighbor algorithms
Solving Real-World Problems with Nearest Neighbor Algorithms
Seeing k-nearest neighbor algorithms in action
Seeing average nearest neighbor algorithms in action
Chapter 6: Coding Up Data Insights and Decision Engines
Seeing Where Python Fits into Your Data Science Strategy
Using Python for Data Science
Sorting out the various Python data types
Putting loops to good use in Python
Having fun with functions
Keeping cool with classes
Checking out some useful Python libraries
Chapter 7: Generating Insights with Software Applications
Choosing the Best Tools for Your Data Science Strategy
Getting a Handle on SQL and Relational Databases
Investing Some Effort into Database Design
Defining data types
Designing constraints properly
Normalizing your database
Narrowing the Focus with SQL Functions
Making Life Easier with Excel
Using Excel to quickly get to know your data
Reformatting and summarizing with PivotTables
Automating Excel tasks with macros
Chapter 8: Telling Powerful Stories with Data
Data Visualizations: The Big Three
Data storytelling for decision-makers
Data showcasing for analysts
Designing data art for activists
Designing to Meet the Needs of Your Target Audience
Step 1: Brainstorm (All about Eve)
Step 2: Define the purpose
Step 3: Choose the most functional visualization type for your purpose
Picking the Most Appropriate Design Style
Inducing a calculating, exacting response
Eliciting a strong emotional response
Selecting the Appropriate Data Graphic Type
Standard chart graphics
Comparative graphics
Statistical plots
Topology structures
Spatial plots and maps
Testing Data Graphics
Adding Context
Creating context with data
Creating context with annotations
Creating context with graphical elements
Chapter 9: Ten Free or Low-Cost Data Science Libraries and Platforms
Scraping the Web with Beautiful Soup
Wrangling Data with pandas
Visualizing Data with Looker Studio
Machine Learning with scikit-learn
Creating Interactive Dashboards with Streamlit
Doing Geospatial Data Visualization with Kepler.gl
Making Charts with Tableau Public
Doing Web-Based Data Visualization with RAWGraphs
Making Cool Infographics with Infogram
Making Cool Infographics with Canva
Index
Chapter 1
IN THIS CHAPTER
Deploying data science methods across various industries
Piecing together the core data science components
Identifying viable data science solutions to business challenges
Exploring data science career alternatives
For over a decade now, everyone has been absolutely deluged by data. It's coming from every computer, every mobile device, every camera, and every imaginable sensor - and now it's even coming from watches and other wearable technologies. Data is generated in every social media interaction we humans make, every file we save, every picture we take, and every query we submit; data is even generated when we do something as simple as ask a favorite search engine for directions to the closest ice cream shop.
If you're anything like I was, you may have wondered, "What's the point of all this data? Why use valuable resources to generate and collect it?" Although even just two decades ago, no one was in a position to make much use of most of the data that's generated, the tides today have definitely turned. Specialists known as data engineers are constantly finding innovative and powerful new ways to capture, collate, and condense unimaginably massive volumes of data. Other specialists known as data scientists are leading change by deriving valuable and actionable insights from that data.
In its truest form, data science represents the optimization of processes and resources. Data science produces data insights - actionable, data-informed conclusions or predictions that you can use to understand and improve your business, your investments, your health, and even your lifestyle and social life. Using data science insights is like being able to see in the dark. For any goal or pursuit you can imagine, you can find data science methods to help you predict the most direct route from where you are to where you want to be - and to anticipate every pothole in the road between both places.
In this chapter, I explain the difference between data science and data engineering.
The terms data science and data engineering are often misused and confused, so let me start off by clarifying that these two fields are, in fact, separate and distinct domains of expertise. Data science is the computational science of extracting meaningful insights from raw data and then effectively communicating those insights to generate value. Data engineering, on the other hand, is an engineering domain that's dedicated to building and maintaining systems that overcome data processing bottlenecks and data handling problems for applications that consume, process, and store large volumes, varieties, and velocities of data.
In both data science and data engineering, you commonly work with the following types of data:
In the past, only large tech companies with massive funding had the skills and computing resources required to implement data science methodologies to optimize and improve their business, but that hasn't been the case for quite a while now. The proliferation of data has created a demand for insights, and this demand is embedded in many aspects of modern culture - from the Uber passenger who expects the driver to show up exactly at the time and location predicted by the Uber app to the online shopper who expects the Amazon platform to recommend the best product alternatives for comparing similar goods before making a purchase. Data and the need for data-informed insights are ubiquitous. Because organizations of all sizes are beginning to recognize that they're immersed in a sink-or-swim, data-driven, competitive environment, data know-how has emerged as a core and requisite function in almost every line of business.
What does this mean for the average knowledge worker? It means that everyday employees are increasingly expected to support a progressively advancing set of technological and data requirements. Why? Because almost all industries are reliant on data technologies and the insights they spur. Consequently, many people are in continuous need of upgrading their data skills, or else they face the real possibility of being replaced by a more data-savvy employee.
The good news is that upgrading data skills doesn't usually require people to go back to college or earn a university degree in statistics, computer science, or data science. The bad news is that, even with professional training or self-teaching, it always takes extra work to stay industry-relevant and tech-savvy. In this respect, the data revolution isn't so different from any other change that has hit industry in the past. The fact is, in order to stay relevant, you need to take the time and effort to acquire the skills that keep you current. When you're learning how to do data science, you can take some courses, educate yourself using online resources, read books like this one, and attend events where you can learn what you need to know to stay on top of the game.
Who can use data science? You can. Your organization can. Your employer can. Anyone who has a bit of understanding and training can begin using data insights to improve their lives, their careers, and the well-being of their businesses. Data science represents a change in the way you approach the world. When determining outcomes, people once used to make their best guess, act on that guess, and then hope for the desired result. With data insights, however, people now have access to the predictive vision that they need to truly drive change and achieve the results they want.
Here are some examples of ways you can use data insights to make the world, and your company, a better place:
To practice data science, in the true meaning of the term, you need the analytical know-how of math and statistics, the coding skills necessary to work with data, and an area of subject matter expertise. Without that subject matter expertise, you may as well call yourself a mathematician or a statistician. Similarly, a programmer without subject matter expertise and analytical know-how may better be considered a software engineer or developer, but not a data scientist.
The need for data-informed business and product strategy has been increasing exponentially for about a decade now, forcing all business sectors and industries to adopt a data science approach. As such, different flavors of data science have emerged. The following are just a few titles under which experts of every discipline are required to know and regularly do data science:
Nowadays, it's almost impossible to differentiate between a proper data scientist and a subject matter expert (SME) whose success depends heavily on their ability to use data science to generate insights. Looking at a person's job title may or may not be helpful, simply because many roles are titled data scientist when they may as well be labeled data strategist or product manager, based on the actual requirements. In addition, many knowledge workers are doing daily data science and not working under the title of data scientist. It's an overhyped, often misleading label that's not always helpful if you're trying to find out what a data scientist does by looking at online job boards.
To shed some light,...
File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)
System requirements:
The ePUB file format is well suited to novels and nonfiction, that is, to "flowing" text without a complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small displays. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will unfortunately be unable to open the e-book, so you must prepare your reading hardware before downloading. Please note: we strongly recommend authorizing the reading software with your personal Adobe ID after installing it.
For more information, see our e-book help.