When I was first brainstorming topics for this book, I used two questions to narrow down my list: "Who is my audience?" and "What topic do I know well enough to write a book that would be worth publishing for that audience?"
The first question had an easy initial answer: I already have an audience of data-science-learning Twitter followers with whom I share resources and advice on "Becoming a Data Scientist" that I could keep in mind while narrowing down the topics.
So then I was left to figure out what I know that I could teach to people who want to become data scientists.
I have been designing and querying relational databases professionally for about 17 years: first as a database and web developer, then as a data analyst, and for the last 5 years, as a data scientist. SQL (Structured Query Language) has been a key tool for me throughout, whether I was working with MS Access, MS SQL Server, MySQL, Oracle, or Redshift databases, and whether I was summarizing data into reporting views in a data mart, extracting data to use in a data visualization tool like Tableau, or preparing a dataset for a machine learning project.
Since SQL is a tool I have used throughout my career, and because creating and retrieving datasets for analysis has been such an integral part of my job as a data scientist, I was surprised to learn that some data scientists don't know SQL or don't regularly write SQL code. But in an informal Twitter poll I conducted, which received responses from 979 data scientists, 19% of them reported wanting to learn, or learn more, SQL (74% reported already using SQL professionally). Additionally, 55% of 713 respondents who were working toward becoming data scientists said they wanted to learn, or learn more, SQL. So, my target audience had an interest in this topic.
According to an analysis of online job postings conducted by Jeff Hale of Towards Data Science, SQL is in the top three technology skills that data scientist jobs require. (See towardsdatascience.com/the-most-in-demand-skills-for-data-scientists-4a4a8db896db.) In an Indeed BeSeen article, Joy Garza lists SQL as one of the top-five in-demand tech skills for data scientists. (See https://web.archive.org/web/20200624031802/https://www.beseen.com/blog/talent/data-scientist-skills/.)
After learning how many working and prospective data scientists wanted to learn SQL, and how much need there is in the industry for people who know how to use it, SQL dataset development started to move to the top of my list of topics I could teach.
There are many SQL books on the market that can be used to learn query syntax and advanced SQL functions. After all, the language has been around for 45 years and has been standardized since the late 1980s. But I hadn't found any definitive resources to refer people to when they asked me if I knew of any books that taught how to use SQL to construct datasets for machine learning, so I decided to write this book to cover SQL from a data scientist's point of view.
So, my goal in writing this book is not only to teach you how to write SQL code but to teach you how to think about summarizing data into analytical datasets that can be used for reports and machine learning: to use SQL like a data scientist does. Like I do.
SQL for Data Scientists is designed to be a learning resource for anyone who wants to become (or who already is) a data analyst or data scientist, and who wants to be able to pull data from databases to build their own datasets without having to rely on others in the organization to query the source system and transform the data into flat files (or spreadsheets) for them.
There are plenty of SQL books out there, but many are either written as syntax references or written for people in other roles who create, query, and maintain databases. This book, by contrast, is written from the perspective of a data scientist and is aimed at those who will primarily be extracting data from existing databases in order to generate datasets for analysis.
I won't assume that you've ever written SQL queries before, and we'll start with the basics, but I do assume that you have some basic understanding of what databases are and a general idea of how data might be used in reports, analyses, and machine learning algorithms. This book is meant to fill in the steps between finding a database that contains the data you need and starting the analysis. I aim to teach you how to think about structuring datasets for analysis and how to use SQL to extract the data from the database and get it into that form.
If you can use SQL to pull your own datasets, you don't have to rely on others in your organization to pull them for you, enabling you to work more efficiently. Requesting datasets usually involves a process of filling out a form or ticket describing in detail what data you need, waiting for your request to be fulfilled, then often clarifying your request after seeing the initial results, and then waiting again for modifications. If you can edit your own queries, you can not only design and retrieve your own datasets but also adjust calculations or add fields as needed.
Additionally, running a SQL query that writes to a database table or exports to a file (effectively snapshotting the data in the form you need for your analysis) means you don't have to retrieve and reprocess the data in your machine learning script every time you run your code, which speeds up the usually iterative model development process.
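As a minimal sketch of that snapshotting idea, the following Python script runs a summary query once and stores its result as a table. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere; the sales and daily_sales_snapshot tables and their columns are hypothetical and are not part of the Farmer's Market database.

```python
import sqlite3

# In-memory SQLite stands in for a MySQL database (an assumption made so the
# sketch is runnable anywhere). Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, vendor_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("2020-07-01", 1, 12.5), ("2020-07-01", 2, 7.0), ("2020-07-02", 1, 4.5)],
)

# Run the dataset query once and store its result as a table (the snapshot).
conn.execute(
    """
    CREATE TABLE daily_sales_snapshot AS
    SELECT sale_date, SUM(amount) AS total_sales
    FROM sales
    GROUP BY sale_date
    """
)

# Later runs of a modeling script can read the snapshot directly instead of
# re-running the aggregation against the source table.
snapshot_rows = conn.execute(
    "SELECT sale_date, total_sales FROM daily_sales_snapshot ORDER BY sale_date"
).fetchall()
print(snapshot_rows)
```

The CREATE TABLE ... AS SELECT form used here is also accepted by MySQL, though a real snapshot would more likely live in a persistent schema than in an in-memory database.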
Some summaries and calculations can be done more efficiently in SQL than in other types of code, as well, so even if you are running the queries "live" each time you run your script, you may be able to lower the computational cost of your code by doing some of the transformations in SQL.
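To illustrate the kind of transformation that can move into SQL, here is a small sketch that computes the same per-customer totals two ways: by pulling every row into Python and summing there, and by letting the database do the GROUP BY. SQLite again stands in for MySQL, and the purchases table is hypothetical. On a table this small the difference is invisible; the point is only that the second approach returns one row per customer rather than every purchase.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the purchases table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 8.0), (2, 2.0), (2, 1.0)],
)

# Option A: fetch every row and summarize in the script.
totals_in_python = {}
for customer_id, amount in conn.execute("SELECT customer_id, amount FROM purchases"):
    totals_in_python[customer_id] = totals_in_python.get(customer_id, 0.0) + amount

# Option B: let the database summarize, so only one row per customer comes back.
totals_in_sql = dict(
    conn.execute(
        "SELECT customer_id, SUM(amount) FROM purchases GROUP BY customer_id"
    )
)

print(totals_in_python == totals_in_sql)
```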
Finally, because it is a high-demand tech skill in data scientist job postings, learning SQL will increase your marketability and value to employers.
My goal is that by the time you finish reading this book and practicing the queries within (ideally both on the provided example database and on another database of your choosing, so you have to modify the example queries and apply them in another context), you will be able to think through the process of creating an analytical dataset and develop the SQL code necessary to generate your intended output.
I hope that even if you end up needing to use a SQL function that's not covered in this book, you will have gained enough baseline knowledge from the book to go look it up online and determine how to best use it in the query you are developing.
I also hope that this book will help you feel confident that you can pull your own data at work and get it into the form you need it in for your report or model without having to wait on others to do it for you.
This book uses MySQL version 8.0-style SQL. No matter what type of database system you use (MS SQL Server, Redshift, PostgreSQL, Oracle, etc.), the query design concepts and syntax are very similar, when not identical, across platforms. So, if you work with a database system other than MySQL, you might have to search for the equivalent code syntax for a few functions in the book, but the overall dataset design concepts are platform-independent, and the SQL keywords are cross-platform standards.
When you see code displayed in the following style:
SELECT * FROM Product
that means it is a complete SQL query that you can use to select data from the Farmer's Market database described in Chapter 1, "Data Sources." If you're reading the printed version of this book, you can go to the book's website to get digital versions of the queries that you can copy and paste to try them out yourself.
Reserved SQL keywords like SELECT will appear in all-uppercase throughout the book, and column names will appear in all-lowercase. This isn't a requirement of SQL syntax (neither are line breaks), but is a convention used for readability.
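A quick way to convince yourself that keyword case and line breaks are only conventions is to run the same query in both styles and compare the results. The sketch below does that in Python, with an in-memory SQLite database standing in for MySQL; the product table and its two columns are hypothetical stand-ins rather than the book's actual schema. Only the keywords and whitespace differ between the two queries, since the case sensitivity of table names can vary by database system.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; this product table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (product_id INTEGER, product_name TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(1, "Carrots"), (2, "Honey")],
)

# The readable convention: uppercase keywords, one clause per line.
formatted = """
SELECT
    product_id,
    product_name
FROM product
ORDER BY product_id
"""

# The same query with lowercase keywords on a single line.
one_line_lowercase = "select product_id, product_name from product order by product_id"

rows_formatted = conn.execute(formatted).fetchall()
rows_one_line = conn.execute(one_line_lowercase).fetchall()
print(rows_formatted == rows_one_line)
```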
Be aware that the Farmer's Market database will continue to evolve, and I will likely continue adding rows to its tables after this book goes to print, so the data values you see in the output when you run the queries yourself may not exactly match the screenshots included in the printed book.
As you work through the examples in this book, you may choose either to type in all the code manually or to use the source code files that accompany the book. All the source code used in this book, along with the Farmer's Market database, is available for download from both sqlfordatascientists.com and www.wiley.com/go/sqlfordatascientists.
If you believe you've found a mistake in this book, please bring it to our attention. At John Wiley & Sons, we understand how important it is to provide our customers with accurate content, but even with our best efforts an error may occur.
In order to submit your possible errata, please email it to our Customer Service Team at wileysupport@wiley.com with the subject...