Schweitzer Fachinformationen
Wenn es um professionelles Wissen geht, ist Schweitzer Fachinformationen wegweisend. Kunden aus Recht und Beratung sowie Unternehmen, öffentliche Verwaltungen und Bibliotheken erhalten komplette Lösungen zum Beschaffen, Verwalten und Nutzen von digitalen und gedruckten Medien.
The first edition of this book set out to explain data analysis from an eminently practical perspective, using the familiar tools of SQL and Excel. The guiding principle of the book was to start with questions and guide the reader through the solutions, both from a business perspective and a technical perspective. This approach proved to be quite successful.
Much has changed in the ten years since I started writing the first edition. The tools themselves have changed. In those days, Excel did not have a Ribbon, for instance. And, window functions were rare in databases. The world that analysts inhabit has also changed, with tools such as Python and R and NoSQL databases becoming more common. However, relational databases are still in widespread use, and SQL is, if anything, even more relevant today as technology spreads through businesses big and small. Excel still seems to be the reporting and presentation tool of choice for many business users. Big data is no longer a future frontier; it is a problem, a challenge, and an opportunity that we face on a daily basis.
The second edition has been revised and updated to reflect the changes in the underlying software, with more examples and more techniques, and an additional chapter on database performance. In doing so, I have strived to keep the strengths from the first edition. The book is still organized around the principles of data, analysis, and presentation-three capabilities that are rarely treated together. Examples are organized around questions, with a discussion of both the business relevance and the technical approaches to the problems. The examples carry through to actual code. The data, the code, and the Excel examples are all available on the companion website.
The motivation for this approach originally came from a colleague, Nick Drake, who is a statistician by training. Once upon a time, he was looking for a book that would explain how to use SQL for the complex queries needed for data analysis. Books on SQL tend to cover either basic query constructs or the details of how databases work. None come strictly from a perspective of analyzing data, and none are structured around answering questions about data. Of the many books on statistics, none address the simple fact that most of the data being used resides in relational databases. This book fills that gap.
My other books on data mining, written with Michael Berry, focus on advanced algorithms and case studies. By contrast, this book focuses on the "how-to." It starts by describing data stored in databases and continues through preparing and producing results. Interspersed are stories based on my experience in the field, explaining how results might be applied and why some things work and other things do not. The examples are so practical that the data used for them is available on the book's companion website (www.wiley.com/go/dataanalysisusingsqlandexcel2e).
One of the truisms about data warehouses and analysis databases in general is that they don't actually do anything. Yes, they store data. Yes, they bring together data from different sources, cleansing and clarifying along the way. Yes, they define business dimensions, store transactions about customers, and, perhaps, summarize important data. (And, yes, all these are very important!) However, data in a database resides on many spinning disks and in complex data structures in a computer's memory. So much data, so little information.
How can we exploit this data, particularly data that describes customers? The many fancy algorithms for statistical modeling and data mining all have a simple rule: "garbage-in, garbage-out." The results of even the most sophisticated techniques are only as good as the data being used (and the assumptions being fed into the model). Data is central to the task of understanding customers, understanding products, and understanding markets.
The chapters in this book cover different aspects of data and several important analytic techniques that are readily supported by SQL and Excel. The analytic techniques range from exploratory data analysis to survival analysis, from market basket analysis to naïve Bayesian models, and from simple animations to regression. Of course, the potential range of possible techniques is much larger than can be presented in one book. These methods have proven useful over time and are applicable in many different areas.
And finally, data and analysis are not enough. Data must be analyzed, and the results must be presented to the right audience. To fully exploit its value, we must transform data into stories and scenarios, charts and metrics and insights.
This book focuses on three key technological areas used for transforming data into actionable information:
These three technologies are presented together because they are all interrelated. SQL answers the question "How do we access data?" Statistics answers the question "How is it relevant?" And Excel makes it possible to convince other people of the veracity of our findings and to provide them results that they can play with.
The description of data processing is organized around the SQL language. Databases such as Oracle, Postgres, MySQL, IBM DB2, and Microsoft SQL Server are common in the business world, storing the vast majority of business data transactions. The good news is that all relational databases support SQL as a query language. However, just as England and the United States have been described as "two countries separated by a common language," each database supports a slightly different dialect of SQL. The Appendix contains a list of how commonly used functionality is represented in various different dialects.
Similarly, beautiful presentation tools and professional graphics packages are available. However, rare and exceptional is the workplace computer that does not have Excel or an equivalent spreadsheet.
Statistics and data mining techniques do not always require advanced tools. Some very important techniques are readily available using the combination of SQL and Excel, including survival analysis, look-alike models, naïve Bayesian models, and association rules. In fact, the methods in this book are often more powerful than the methods available in such tools, precisely because they are close to the data and readily customizable. The explanation of the techniques covers both the basic ideas and the extensions that may not be available in other tools.
The chapters describing the various techniques provide a solid introduction to modeling and data exploration, in the context of familiar tools and data. They also highlight when more advanced tools are useful because the problem exceeds the capabilities of the simpler tools.
The 14 chapters in this book fall into four parts. The first three introduce key concepts of SQL, Excel, and statistics. The seven middle chapters discuss various methods of exploring data and analytic techniques specifically suited to SQL and Excel. More formal ideas about modeling, in the sense of statistics and data mining, are in the next three chapters. And, finally, a new chapter discusses performance issues when writing SQL queries.
Each chapter explains some aspect of data analysis using SQL and Excel from several different perspectives, including:
Examples in the chapters are generally available in Excel at www.wiley.com/go/dataanalysisusingsqlandexcel2e.
SQL is a concise language that is sometimes difficult to follow. Dataflows, graphical representations of data processing, are used to illustrate how SQL works. These dataflow diagrams are a reasonable approximation of how SQL engines actually process the data, although the details necessarily differ based on the underlying engine.
Results are presented in charts and tables, sprinkled throughout the book. In addition, important features of Excel are highlighted, and interesting uses of Excel graphics are explained. Each chapter has technical asides, typically explaining some aspect of a technique or an interesting bit of history associated with the methods described in the chapter.
Chapter 1, "A Data Miner Looks at SQL," introduces SQL from the perspective of data analysis. This is the querying part of the SQL language, used to extract data from databases using SELECT queries.
This chapter introduces entity-relationship diagrams to describe the structure of the data-the tables and columns and how they relate to each other. It also introduces dataflow diagrams to describe the processing of queries; dataflow diagrams give a visual explanation of how data is processed. This chapter introduces the important functionality used throughout the book-such as joins, aggregations, and window...
Dateiformat: ePUBKopierschutz: Adobe-DRM (Digital Rights Management)
Systemvoraussetzungen:
Das Dateiformat ePUB ist sehr gut für Romane und Sachbücher geeignet – also für „fließenden” Text ohne komplexes Layout. Bei E-Readern oder Smartphones passt sich der Zeilen- und Seitenumbruch automatisch den kleinen Displays an. Mit Adobe-DRM wird hier ein „harter” Kopierschutz verwendet. Wenn die notwendigen Voraussetzungen nicht vorliegen, können Sie das E-Book leider nicht öffnen. Daher müssen Sie bereits vor dem Download Ihre Lese-Hardware vorbereiten.Bitte beachten Sie: Wir empfehlen Ihnen unbedingt nach Installation der Lese-Software diese mit Ihrer persönlichen Adobe-ID zu autorisieren!
Weitere Informationen finden Sie in unserer E-Book Hilfe.