
Hands-On Data Science with SQL Server 2017
Description
Alles über E-Books | Antworten auf Fragen rund um E-Books, Kopierschutz und Dateiformate finden Sie in unserem Info- & Hilfebereich.
Find, explore, and extract big data to transform into actionable insights
Key Features
- Perform end-to-end data analysis-from exploration to visualization
- Real-world examples, tasks, and interview queries to be a proficient data scientist
- Understand how SQL is used for big data processing using HiveQL and SparkSQL
Book Description
SQL Server is a relational database management system that enables you to cover end-to-end data science processes using various inbuilt services and features.
Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL to understand the core tasks in data science. You will learn intermediate-to-advanced level concepts to perform analytical tasks on data using SQL Server. The book has a unique approach, covering best practices, tasks, and challenges to test your abilities at the end of each chapter. You will explore the ins and outs of performing various key tasks such as data collection, cleaning, manipulation, aggregations, and filtering techniques. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server integration services to transform data into a useful format and delve into advanced examples covering machine learning concepts such as predictive analytics using real-world examples.
By the end of this book, you will be in a position to handle the growing amounts of data and perform everyday activities that a data science professional performs.
What you will learn
- Understand what data science is and how SQL Server is used for big data processing
- Analyze incoming data with SQL queries and visualizations
- Create, train, and evaluate predictive models
- Make predictions using trained models and establish regular retraining courses
- Incorporate data source querying into SQL Server
- Enhance built-in T-SQL capabilities using SQLCLR
- Visualize data with Reporting Services, Power View, and Power BI
- Transform data with R, Python, and Azure
Who this book is for
Hands-On Data Science with SQL Server 2017 is intended for data scientists, data analysts, and big data professionals who want to master their skills learning SQL and its applications. This book will be helpful even for beginners who want to build their career as data science professionals using the power of SQL Server 2017. Basic familiarity with SQL language will aid with understanding the concepts covered in this book.
More details
Other editions
Additional editions

Content
- Cover
- Title Page
- Copyright and Credits
- About Packt
- Contributors
- Table of Contents
- Preface
- Chapter 1: Data Science Overview
- Introducing data science
- Data science project life cycle
- Business understanding
- Getting data
- Modelling and analysis
- Deployment and visualization
- Final acceptance
- Data science domains
- Math and statistics
- Visualizing the types of data
- Statistics 101
- Central tendency
- Skewness
- Variability
- Machine learning
- SQL Server and machine learning
- Choosing the right algorithm
- Big data
- SQL Server and big data
- Summary
- Chapter 2: SQL Server 2017 as a Data Science Platform
- Technical requirements
- SQL Server evolution
- What's available in the pack?
- History of SQL Server
- SQL Server in the cloud
- Azure SQL Database
- Azure SQL Data Warehouse
- SQL Server Services and their use with data science
- SQL Server Integration Services
- SQL Server Analysis Services
- Tabular Mode
- Multidimensional mode
- PowerPivot Mode
- Querying languages
- Reporting Services
- Development tools for Reporting Services
- Power BI Report Server
- Machine Learning Services
- Summary
- Chapter 3: Data Sources for Analytics
- Technical requirements
- Getting data from databases
- Importing data from SQL Server
- Importing data from other database systems
- Importing flat files
- Working with XML data
- Working with JSON
- Retrieve data as JSON
- Processing stored JSON data
- External data with PolyBase
- Installing and configuring
- Summary
- Chapter 4: Data Transforming and Cleaning with T-SQL
- Technical requirements
- The need for data transformation
- Database architectures for data transformations
- Direct source for data analysis
- Staging-target scenario
- Landing-staging-target scenario
- Tools eligible for data movement
- Distributed queries
- SQL Server Integration Services
- Why should we use SSIS?
- What is needed to develop an SSIS solution?
- Where should SSIS be used?
- Is there an alternative to SSIS?
- Transforming data
- Full data load
- Incremental data load
- The MERGE statement
- CHECKSUM
- Temporal tables
- Denormalizing data
- Relational normalization
- First normal form
- Second normal form
- Third normal form
- Need for denormalization
- Ways of denormalization
- Computed columns
- Denormalization using joins
- Using views and stored procedures
- Database applications
- Using views
- Using stored procedures
- Performance considerations
- Writing correct code
- Using indexes
- B-tree indexes
- COLUMNSTORE INDEX
- Summary
- Questions
- Chapter 5: Data Exploration and Statistics with T-SQL
- Technical requirements
- T-SQL aggregate queries
- Common properties of aggregate functions
- Aggregate functions
- COUNT, COUNT(*), and COUNT_BIG
- MIN and MAX
- SUM
- AVG
- VAR and VARP
- STDEV and STDEVP
- Using groups
- Using the HAVING clause
- Ranking, framing, and windowing
- Ranking functions
- ROW_NUMBER
- RANK
- DENSE_RANK
- NTILE
- Running aggregates
- Using aggregate functions in running aggregates
- Using aggregate functions
- Using the LEAD and LAG functions
- Calculating with percentiles
- The PERCENT_RANK and CUME_DIST functions
- The PERCENTILE_CONT and PERCENTILE_DISC functions
- Summary
- Questions
- Chapter 6: Custom Aggregations on SQL Server
- Technical requirements
- Overview of SQLCLR
- Use cases of using SQLCLR
- How to work with SQLCLR
- Instance and database configurations to use with SQLCLR
- Creating CLR aggregations
- Example goal and assignment
- Skeleton of CLR aggregation
- Implementing methods
- Implementing custom serialization
- Implementing the Init method
- Implementing the Accumulate method
- Implementing the Merge method
- Implementing the terminate method
- Deployment and testing
- Limitations and performance considerations
- Development issues and risks
- Maintenance issues and risks
- Performance issues and risks
- Summary
- Questions
- Chapter 7: Data Visualization
- Technical requirements
- Data visualization - preparation phase
- Power BI Report Server
- Starting with Power BI Desktop
- Defining the data source
- Adding visualizations to the Report
- Visual interactions
- Publishing reports
- SQL Server Reporting Services
- Adding charts to Reports
- Using SQL Server Data Tools
- Summary
- Chapter 8: Data Transformations with Other Tools
- Technical requirements
- Categorization, missing values, and normalization
- Categorization
- Missing values
- Normalization
- Z-score
- Feature-scaling
- Using Integration Services for data transformation
- Setting up a SSIS project
- Categorizing the products
- Using R for data transformation
- Preparing client R environment
- R Syntax first steps
- Working example of Z-score computed in R
- Using Data Factory for data transformation
- Creating Azure Data Factory
- Creating simple copy data with ADF
- Summary
- Questions
- Chapter 9: Predictive Model Training and Evaluation
- Technical requirements
- Preparing SQL Server
- Setting up and configuring ML services
- Preparing to install our own R packages
- Creating data structures
- The concept of machine learning in databases
- Creating physical data structures
- Creating common objects
- Creating objects using filestreams
- Creating objects using temporal tables
- Deploying, training, and evaluating a predictive model
- Saving our machine learning model to filestreams
- Saving a machine learning model to temporal tables
- Summary
- Questions
- Chapter 10: Making Predictions
- Technical requirements
- Reading models from a database
- Reading the model from a common table
- Reading the model from a temporal table
- Submitting values to an external script
- Submitting values into the external script
- Deserializing a predictive model
- Making the prediction
- Using the PREDICT keyword
- Making the predictive model self-training
- Re-calculating a predictive model regularly
- Re-calculating a predictive model asynchronously
- Creating a message type
- Creating a contract
- Creating queues and services
- Sending a request to train a new model
- Consuming requests and sending responses
- Testing the asynchronous solution
- Summary
- Questions
- Chapter 11: Getting It All Together - A Real-World Example
- Technical requirements
- Assignment and preparation
- SQL Server
- Data description
- Data exploration
- Exploring data using T-SQL
- Exploring data using the SSIS Data Profiling Task
- Exploring the SourceData.Actions table
- Exploring data using R
- Data transformation
- Training and using predictive models for estimations
- Preparing the schema for the model
- Training the model
- Using the rxLinMod function and finishing the model
- Using the model in predictions
- Summary
- Questions
- Chapter 12: Next Steps with Data Science and SQL
- Data science next steps
- Next steps with SQL Server
- Big data clusters
- Machine learning
- Machine learning services on Linux
- Machine learning high availability
- Data science in the cloud
- Summary
- Other Books You May Enjoy
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., „flowing” text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a „hard” copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.