Fuzzy Data Matching with SQL

Name: Fuzzy Data Matching with SQL
Brand: O'Reilly
Price: 50.49 EUR
Availability: Online only

Jim Lehmer (Author)

O'Reilly (Publisher)

Published on 3 October 2023

284 pages

E-Book

PDF with Adobe DRM

978-1-0981-5224-6 (ISBN)

€50.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Cover
Copyright
Table of Contents
Preface
What Problems Are We Trying to Solve?
What Will We Cover?
Part I: Review
Part II: Various Data Problems
Part III: Bringing It Together
Appendix
Who Is This Book For?
Why SQL?
Warning! Opinions Ahead!
Typographical Conventions Used in This Book
Additional Information on the Book's Conventions
The Data "Model"
Environment Layout
Customer Table
"Normalized" View
Meet the Snedleys
Using Code Examples
O'Reilly Online Learning
How to Contact Us
Acknowledgments
Part I. Review
Chapter 1. A SELECT Review
Simple SELECT Statements
Common Table Expressions
In CASE of Emergency
Joins
A Diversion into NULL Values
OUTER JOINs
Finding the Most Current Value
Final Thoughts on SELECT
Chapter 2. Function Junction
Aggregate Functions
MAX
MIN
COUNT
SUM
AVG
Conversion Functions
CAST and CONVERT
COALESCE
TRY_CONVERT
Cryptographic Functions: HASHBYTES
Date and Time Functions
GETDATE
DATEADD
DATEDIFF
DATEPART
ISDATE
Logical Functions: IIF
String Functions
CHARINDEX and PATINDEX
LEN
LEFT, RIGHT, and SUBSTRING
LTRIM, RTRIM, and TRIM
LOWER and UPPER
REPLACE and TRANSLATE
REVERSE
STRING_AGG
System Functions
ISNULL
ISNUMERIC
Final Thoughts on Functions
Part II. Various Data Problems
Chapter 3. Names, Names, Names
What's in a Name?
Last Names
Punctuation
Suffixes
First Names
Middle Name
Nicknames
Company Name
Full Name
"Person-Like Entities"
Final Thoughts on Names
Chapter 4. Location, Location, Location
What Makes an Address?
Street Address
Box, Suite, Lot, or Apartment Number
Don't Overdo It!
City
County
State or State Abbreviation
ZIP or Postal Code
Country
Final Thoughts on Locations
Chapter 5. Dates, Dates, Dates
Time Is Relative
Final Thoughts on Dates
Chapter 6. Email
What Makes a Valid Email Address?
Final Thoughts on Email
Chapter 7. Phone Numbers
What Makes a "Phone Number"?
One Final Note on Tax IDs
Final Thoughts on Phone Numbers (and Tax IDs)
Chapter 8. Bad Characters
Data Representations
Invisible Whitespace
COLLATE
Cleaning Up the Input Data
Final Thoughts on Bad Characters
Chapter 9. Orthogonal Data
A Common Problem, A Common Solution, A New Common Problem
Lather, Rinse, Repeat
Final Thoughts on Orthogonal Data
Part III. Bringing It Together
Chapter 10. The Big Score
What Will We Want?
Tuning Scores
Eliminating Duplicates
Duplicate Data
Duplicated Data
Final Thoughts on Scoring
Chapter 11. Data Quality, or GIGO
Sneaking Data Quality In
Impossible Data
Simply Wrong
Semantically Wrong
ETL Your Way to Success
Final Thoughts on Data Quality
Chapter 12. Tying It All Together
Approach
What's the Score?
First Pass: Naive Matching
Second Pass: Normalizing Relations
Impossible Data
Now Let's Normalize
Third Pass: Score!
What About Tuning?
Final Thoughts on Practical Matters
Chapter 13. Code Is Data, Too!
Working with XML Data
Working with JSON Data
Extracting Data from HTML
Code-Generating Code
Impact Analysis: The Second Case Study
Gather Together Every Code "Artifact" You Can
Import Artifacts into SQL
And Now, for My Next Trick
Final Thoughts on Code As Data
Final Thoughts on All of It
Appendix. The Data "Model"
Customer Table
NormalizedCustomer View
PotentialMatches Table
CustomerCountByState View
PostalAbbreviations Table
Glossary
Index
About the Author
Colophon
Tech Stack

Schweitzer Fachinformationen