
Sharing Big Data Safely
Description
Many big data-driven companies today are moving to protect certain types of data against intrusion, leaks, or unauthorized eyes. But how do you lock down data while granting access to people who need to see it? In this practical book, authors Ted Dunning and Ellen Friedman offer two novel and practical solutions that you can implement right away.
Ideal for both technical and non-technical decision makers, group leaders, developers, and data scientists, this book shows you how to:
- Share original data in a controlled way so that different groups within your organization only see part of the whole. You'll learn how to do this with the new open source SQL query engine Apache Drill.
- Provide synthetic data that emulates the behavior of sensitive data. This approach enables external advisors to work with you on projects involving data that you can't show them.
If you're intrigued by the synthetic data solution, explore the log-synth program that Ted Dunning developed as open source code (available on GitHub), along with how-to instructions and tips for best practice. You'll also get a collection of use cases.
Providing lock-down security while safely sharing data is a significant challenge for a growing number of organizations. With this book, you'll discover new options to share data safely without sacrificing security.

Content
- Cover
- Copyright
- Table of Contents
- Preface
- Who Should Use This Book
- Chapter 1. So Secure It's Lost
- Safe Access in Secure Big Data Systems
- Chapter 2. The Challenge: Sharing Data Safely
- Surprising Outcomes with Anonymity
- The Netflix Prize
- Unexpected Results from the Netflix Contest
- Implications of Breaking Anonymity
- Be Alert to the Possibility of Cross-Reference Datasets
- New York Taxicabs: Threats to Privacy
- Sharing Data Safely
- Chapter 3. Data on a Need-to-Know Basis
- Views: A Secure Way to Limit What Is Seen
- Why Limit Access?
- Apache Drill Views for Granular Security
- How Views Work
- Summary of Need-to-Know Methods
- Chapter 4. Fake Data Gives Real Answers
- The Surprising Thing About Fake Data
- Keep It Simple: log-synth
- Log-synth Use Case 1: Broken Large-Scale Hive Query
- Log-synth Use Case 2: Fraud Detection Model for Common Point of Compromise
- What Thieves Do
- Why Machine Learning Experts Were Consulted
- Using log-synth to Generate Fake User Histories
- Summary: Fake Data and log-synth to Safely Work with Secure Data
- Chapter 5. Fixing a Broken Large-Scale Query
- A Description of the Problem
- Determining What the Synthetic Data Needed to Be
- Schema for the Synthetic Data
- Generating the Synthetic Data
- Tips and Caveats
- What to Do from Here?
- Chapter 6. Fraud Detection
- What Is Really Important?
- The User Model
- Sampler for the Common Point of Compromise
- How the Breach Model Works
- Results of the Entire System Together
- Handy Tricks
- Summary
- Chapter 7. A Detailed Look at log-synth
- Goals
- Maintaining Simplicity: The Role of JSON in log-synth
- Structure
- Sampling Complex Values
- Structuring and De-structuring Samplers
- Extending log-synth
- Using log-synth with Apache Drill
- Choice of Data Generators
- R is for Random
- Benchmark Systems
- Probabilistic Programming
- Differential Privacy Preserving Systems
- Future Directions for log-synth
- Chapter 8. Sharing Data Safely: Practical Lessons
- Appendix A. Additional Resources
- Log-synth Open Source Software
- Apache Drill and Drill SQL Views
- General Resources and References
- Cheapside Hoard and Treasures
- Codes and Cipher
- Netflix Prize
- Problems with Data Sharing
- Additional O'Reilly Books by Dunning and Friedman
- About the Authors
- Strata+Hadoop World
System requirements
File format: PDF
Copy-Protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (Kindle: limited support only).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so you need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID as soon as you have installed it.
For more information, see our eBook Help page.