Building Secure and Reliable Systems

Best Practices for Designing, Implementing, and Maintaining Systems
 
 
O'Reilly (publisher)
  • published on March 16, 2020
  • 558 pages
 
E-book | ePUB with Adobe DRM | System requirements
978-1-4920-8307-8 (ISBN)
 
Can a system be considered truly reliable if it isn't fundamentally secure? Or can it be considered secure if it's unreliable? Security is crucial to the design and operation of scalable systems in production, as it plays an important part in product quality, performance, and availability. In this book, experts from Google share best practices to help your organization design scalable and reliable systems that are fundamentally secure.

Two previous O'Reilly books from Google, Site Reliability Engineering and The Site Reliability Workbook, demonstrated how and why a commitment to the entire service lifecycle enables organizations to successfully build, deploy, monitor, and maintain software systems. In this latest guide, the authors offer insights into system design, implementation, and maintenance from practitioners who specialize in security and reliability. They also discuss how building and adopting their recommended best practices requires a culture that's supportive of such change.

You'll learn about secure and reliable systems through:
  • Design strategies
  • Recommendations for coding, testing, and debugging practices
  • Strategies to prepare for, respond to, and recover from incidents
  • Cultural best practices that help teams across your organization collaborate effectively

  • English
  • Sebastopol
  • USA
  • 5.38 MB
978-1-4920-8307-8 (9781492083078)
  • Intro
  • Foreword by Royal Hansen
  • Foreword by Michael Wildpaner
  • Preface
  • Why We Wrote This Book
  • Who This Book Is For
  • A Note About Culture
  • How to Read This Book
  • Conventions Used in This Book
  • O'Reilly Online Learning
  • How to Contact Us
  • Acknowledgments
  • I. Introductory Material
  • 1. The Intersection of Security and Reliability
  • On Passwords and Power Drills
  • Reliability Versus Security: Design Considerations
  • Confidentiality, Integrity, Availability
  • Confidentiality
  • Integrity
  • Availability
  • Reliability and Security: Commonalities
  • Invisibility
  • Assessment
  • Simplicity
  • Evolution
  • Resilience
  • From Design to Production
  • Investigating Systems and Logging
  • Crisis Response
  • Recovery
  • Conclusion
  • 2. Understanding Adversaries
  • Attacker Motivations
  • Attacker Profiles
  • Hobbyists
  • Vulnerability Researchers
  • Governments and Law Enforcement
  • Intelligence gathering
  • Military purposes
  • Policing domestic activity
  • Protecting your systems from nation-state actors
  • Activists
  • Protecting your systems from hacktivists
  • Criminal Actors
  • Protecting your systems from criminal actors
  • Automation and Artificial Intelligence
  • Protecting your systems from automated attacks
  • Insiders
  • First-party insiders
  • Third-party insiders
  • Related insiders
  • Threat modeling insider risk
  • Designing for insider risk
  • Attacker Methods
  • Threat Intelligence
  • Cyber Kill Chains™
  • Tactics, Techniques, and Procedures
  • Risk Assessment Considerations
  • Conclusion
  • II. Designing Systems
  • 3. Case Study: Safe Proxies
  • Safe Proxies in Production Environments
  • Google Tool Proxy
  • Conclusion
  • 4. Design Tradeoffs
  • Design Objectives and Requirements
  • Feature Requirements
  • Nonfunctional Requirements
  • Features Versus Emergent Properties
  • Example: Google Design Document
  • Balancing Requirements
  • Example: Payment Processing
  • Security and reliability considerations
  • Using a third-party service provider to handle sensitive data
  • Benefits
  • Costs and nontechnical risks
  • Reliability risks
  • Security risks
  • Managing Tensions and Aligning Goals
  • Example: Microservices and the Google Web Application Framework
  • Aligning Emergent-Property Requirements
  • Initial Velocity Versus Sustained Velocity
  • Conclusion
  • 5. Design for Least Privilege
  • Concepts and Terminology
  • Least Privilege
  • Zero Trust Networking
  • Zero Touch
  • Classifying Access Based on Risk
  • Best Practices
  • Small Functional APIs
  • Breakglass
  • Auditing
  • Collecting good audit logs
  • Choosing an auditor
  • Testing and Least Privilege
  • Testing of least privilege
  • Testing with least privilege
  • Diagnosing Access Denials
  • Graceful Failure and Breakglass Mechanisms
  • Worked Example: Configuration Distribution
  • POSIX API via OpenSSH
  • Software Update API
  • Custom OpenSSH ForceCommand
  • Custom HTTP Receiver (Sidecar)
  • Custom HTTP Receiver (In-Process)
  • Tradeoffs
  • A Policy Framework for Authentication and Authorization Decisions
  • Using Advanced Authorization Controls
  • Investing in a Widely Used Authorization Framework
  • Avoiding Potential Pitfalls
  • Advanced Controls
  • Multi-Party Authorization (MPA)
  • Three-Factor Authorization (3FA)
  • Business Justifications
  • Temporary Access
  • Proxies
  • Tradeoffs and Tensions
  • Increased Security Complexity
  • Impact on Collaboration and Company Culture
  • Quality Data and Systems That Impact Security
  • Impact on User Productivity
  • Impact on Developer Complexity
  • Conclusion
  • 6. Design for Understandability
  • Why Is Understandability Important?
  • System Invariants
  • Analyzing Invariants
  • Mental Models
  • Designing Understandable Systems
  • Complexity Versus Understandability
  • Breaking Down Complexity
  • Centralized Responsibility for Security and Reliability Requirements
  • System Architecture
  • Understandable Interface Specifications
  • Prefer narrow interfaces that offer less room for interpretation
  • Prefer interfaces that enforce a common object model
  • Pay attention to idempotent operations
  • Understandable Identities, Authentication, and Access Control
  • Identities
  • Example: Identity model for the Google production system
  • Authentication and transport security
  • Access control
  • Security Boundaries
  • Small TCBs and strong security boundaries
  • Security boundaries and threat models
  • TCBs and understandability
  • Software Design
  • Using Application Frameworks for Service-Wide Requirements
  • Understanding Complex Data Flows
  • Considering API Usability
  • Example: Secure cryptographic APIs and the Tink crypto framework
  • Conclusion
  • 7. Design for a Changing Landscape
  • Types of Security Changes
  • Designing Your Change
  • Architecture Decisions to Make Changes Easier
  • Keep Dependencies Up to Date and Rebuild Frequently
  • Release Frequently Using Automated Testing
  • Use Containers
  • Use Microservices
  • Example: Google's frontend design
  • Different Changes: Different Speeds, Different Timelines
  • Short-Term Change: Zero-Day Vulnerability
  • Example: Shellshock
  • Medium-Term Change: Improvement to Security Posture
  • Example: Strong second-factor authentication using FIDO security keys
  • Long-Term Change: External Demand
  • Example: Increasing HTTPS usage
  • Complications: When Plans Change
  • Example: Growing Scope-Heartbleed
  • Conclusion
  • 8. Design for Resilience
  • Design Principles for Resilience
  • Defense in Depth
  • The Trojan Horse
  • Threat modeling and vulnerability discovery
  • Deployment of the attack
  • Execution of the attack
  • Compromise
  • Google App Engine Analysis
  • Risky APIs
  • Runtime layers
  • Controlling Degradation
  • Differentiate Costs of Failures
  • Computing resources
  • User experience
  • Speed of mitigation
  • Deploy Response Mechanisms
  • Load shedding
  • Throttling
  • Automated response
  • Automate Responsibly
  • Failing safe versus failing secure
  • A foothold for humans
  • Controlling the Blast Radius
  • Role Separation
  • Location Separation
  • Aligning physical and logical architecture
  • Isolation of trust
  • Limitations of location-based trust
  • Isolation of confidentiality
  • Time Separation
  • Failure Domains and Redundancies
  • Failure Domains
  • Functional isolation
  • Data isolation
  • Practical aspects
  • Component Types
  • High-capacity components
  • High-availability components
  • Low-dependency components
  • Controlling Redundancies
  • Failover strategies
  • Common pitfalls
  • Continuous Validation
  • Validation Focus Areas
  • Validation in Practice
  • Inject anticipated changes of behavior
  • Exercise emergency components as part of normal workflows
  • Split when you cannot mirror traffic
  • Oversubscribe but prevent complacency
  • Measure key rotation cycles
  • Practical Advice: Where to Begin
  • Conclusion
  • 9. Design for Recovery
  • What Are We Recovering From?
  • Random Errors
  • Accidental Errors
  • Software Errors
  • Malicious Actions
  • Design Principles for Recovery
  • Design to Go as Quickly as Possible (Guarded by Policy)
  • Limit Your Dependencies on External Notions of Time
  • Rollbacks Represent a Tradeoff Between Security and Reliability
  • Deny lists
  • Minimum Acceptable Security Version Numbers
  • Rotating signing keys
  • Rolling back firmware and other hardware-centric constraints
  • Use an Explicit Revocation Mechanism
  • A centralized service to revoke certificates
  • Failing open
  • Handling emergencies directly
  • Removing dependency on accurate notions of time
  • Revoking credentials at scale
  • Avoiding risky exceptions
  • Know Your Intended State, Down to the Bytes
  • Host management
  • Device firmware
  • Global services
  • Persistent data
  • Design for Testing and Continuous Validation
  • Emergency Access
  • Access Controls
  • Communications
  • Responder Habits
  • Unexpected Benefits
  • Conclusion
  • 10. Mitigating Denial-of-Service Attacks
  • Strategies for Attack and Defense
  • Attacker's Strategy
  • Defender's Strategy
  • Designing for Defense
  • Defendable Architecture
  • Defendable Services
  • Mitigating Attacks
  • Monitoring and Alerting
  • Graceful Degradation
  • A DoS Mitigation System
  • Strategic Response
  • Dealing with Self-Inflicted Attacks
  • User Behavior
  • Client Retry Behavior
  • Conclusion
  • III. Implementing Systems
  • 11. Case Study: Designing, Implementing, and Maintaining a Publicly Trusted CA
  • Background on Publicly Trusted Certificate Authorities
  • Why Did We Need a Publicly Trusted CA?
  • The Build or Buy Decision
  • Design, Implementation, and Maintenance Considerations
  • Programming Language Choice
  • Complexity Versus Understandability
  • Securing Third-Party and Open Source Components
  • Testing
  • Resiliency for the CA Key Material
  • Data Validation
  • Conclusion
  • 12. Writing Code
  • Frameworks to Enforce Security and Reliability
  • Benefits of Using Frameworks
  • Example: Framework for RPC Backends
  • Example code snippets
  • Common Security Vulnerabilities
  • SQL Injection Vulnerabilities: TrustedSqlString
  • Preventing XSS: SafeHtml
  • Lessons for Evaluating and Building Frameworks
  • Simple, Safe, Reliable Libraries for Common Tasks
  • Rollout Strategy
  • Incremental rollout
  • Legacy conversions
  • Simplicity Leads to Secure and Reliable Code
  • Avoid Multilevel Nesting
  • Eliminate YAGNI Smells
  • Repay Technical Debt
  • Refactoring
  • Security and Reliability by Default
  • Choose the Right Tools
  • Use memory-safe languages
  • Use strong typing and static type checking
  • Use Strong Types
  • Sanitize Your Code
  • C++: Valgrind or Google Sanitizers
  • Go: Race Detector
  • Conclusion
  • 13. Testing Code
  • Unit Testing
  • Writing Effective Unit Tests
  • When to Write Unit Tests
  • How Unit Testing Affects Code
  • Integration Testing
  • Writing Effective Integration Tests
  • Dynamic Program Analysis
  • Fuzz Testing
  • How Fuzz Engines Work
  • Writing Effective Fuzz Drivers
  • An Example Fuzzer
  • Continuous Fuzzing
  • Example: ClusterFuzz and OSSFuzz
  • Static Program Analysis
  • Automated Code Inspection Tools
  • Integration of Static Analysis in the Developer Workflow
  • Abstract Interpretation
  • Formal Methods
  • Conclusion
  • 14. Deploying Code
  • Concepts and Terminology
  • Threat Model
  • Best Practices
  • Require Code Reviews
  • Rely on Automation
  • Verify Artifacts, Not Just People
  • Treat Configuration as Code
  • Securing Against the Threat Model
  • Advanced Mitigation Strategies
  • Binary Provenance
  • What to put in binary provenance
  • Provenance-Based Deployment Policies
  • Implementing policy decisions
  • Verifiable Builds
  • Verifiable build architectures
  • Implementing verifiable builds
  • Untrusted inputs
  • Unauthenticated inputs
  • Deployment Choke Points
  • Post-Deployment Verification
  • Practical Advice
  • Take It One Step at a Time
  • Provide Actionable Error Messages
  • Ensure Unambiguous Provenance
  • Create Unambiguous Policies
  • Include a Deployment Breakglass
  • Securing Against the Threat Model, Revisited
  • Conclusion
  • 15. Investigating Systems
  • From Debugging to Investigation
  • Example: Temporary Files
  • Debugging Techniques
  • Distinguish horses from zebras
  • Set aside time for debugging and investigations
  • Record your observations and expectations
  • Know what's normal for your system
  • Reproduce the bug
  • Isolate the problem
  • Be mindful of correlation versus causation
  • Test your hypotheses with actual data
  • Reread the docs
  • Practice!
  • What to Do When You're Stuck
  • Improve observability
  • Take a break
  • Clean up code
  • Delete it!
  • Stop when things start to go wrong
  • Improve access and authorization controls, even for nonsensitive systems
  • Collaborative Debugging: A Way to Teach
  • How Security Investigations and Debugging Differ
  • Collect Appropriate and Useful Logs
  • Design Your Logging to Be Immutable
  • Take Privacy into Consideration
  • Determine Which Security Logs to Retain
  • Operating system logs
  • Host agents
  • Application logs
  • Cloud logs
  • Network-based logging and detection
  • Budget for Logging
  • Robust, Secure Debugging Access
  • Reliability
  • Security
  • Conclusion
  • IV. Maintaining Systems
  • 16. Disaster Planning
  • Defining "Disaster"
  • Dynamic Disaster Response Strategies
  • Disaster Risk Analysis
  • Setting Up an Incident Response Team
  • Identify Team Members and Roles
  • Establish a Team Charter
  • Establish Severity and Priority Models
  • Define Operating Parameters for Engaging the IR Team
  • Develop Response Plans
  • Create Detailed Playbooks
  • Ensure Access and Update Mechanisms Are in Place
  • Prestaging Systems and People Before an Incident
  • Configuring Systems
  • Training
  • Processes and Procedures
  • Testing Systems and Response Plans
  • Auditing Automated Systems
  • Conducting Nonintrusive Tabletops
  • Testing Response in Production Environments
  • Single system testing/fault injection
  • Human resource testing
  • Multicomponent testing
  • System-wide failures/failovers
  • Red Team Testing
  • Evaluating Responses
  • Google Examples
  • Test with Global Impact
  • DiRT Exercise Testing Emergency Access
  • Industry-Wide Vulnerabilities
  • Conclusion
  • 17. Crisis Management
  • Is It a Crisis or Not?
  • Triaging the Incident
  • Compromises Versus Bugs
  • Taking Command of Your Incident
  • The First Step: Don't Panic!
  • Beginning Your Response
  • Establishing Your Incident Team
  • Operational Security
  • Trading Good OpSec for the Greater Good
  • The Investigative Process
  • Sharding the investigation
  • Keeping Control of the Incident
  • Parallelizing the Incident
  • Handovers
  • Morale
  • Communications
  • Misunderstandings
  • Hedging
  • Meetings
  • Keeping the Right People Informed with the Right Levels of Detail
  • Putting It All Together
  • Triage
  • Declaring an Incident
  • Communications and Operational Security
  • Beginning the Incident
  • Handover
  • Handing Back the Incident
  • Preparing Communications and Remediation
  • Closure
  • Conclusion
  • 18. Recovery and Aftermath
  • Recovery Logistics
  • Recovery Timeline
  • Planning the Recovery
  • Scoping the Recovery
  • Recovery Considerations
  • How will your attacker respond to your recovery effort?
  • Is your recovery infrastructure or tooling compromised?
  • What variants of the attack exist?
  • Will your recovery reintroduce attack vectors?
  • What are your mitigation options?
  • Recovery Checklists
  • Initiating the Recovery
  • Isolating Assets (Quarantine)
  • System Rebuilds and Software Upgrades
  • Data Sanitization
  • Recovery Data
  • Credential and Secret Rotation
  • After the Recovery
  • Postmortems
  • Examples
  • Compromised Cloud Instances
  • Large-Scale Phishing Attack
  • Targeted Attack Requiring Complex Recovery
  • Conclusion
  • V. Organization and Culture
  • 19. Case Study: Chrome Security Team
  • Background and Team Evolution
  • Security Is a Team Responsibility
  • Help Users Safely Navigate the Web
  • Speed Matters
  • Design for Defense in Depth
  • Be Transparent and Engage the Community
  • Conclusion
  • 20. Understanding Roles and Responsibilities
  • Who Is Responsible for Security and Reliability?
  • The Roles of Specialists
  • Understanding Security Expertise
  • Certifications and Academia
  • Integrating Security into the Organization
  • Embedding Security Specialists and Security Teams
  • Example: Embedding Security at Google
  • Special Teams: Blue and Red Teams
  • External Researchers
  • Conclusion
  • 21. Building a Culture of Security and Reliability
  • Defining a Healthy Security and Reliability Culture
  • Culture of Security and Reliability by Default
  • Culture of Review
  • Culture of Awareness
  • Culture of Yes
  • Culture of Inevitably
  • Culture of Sustainability
  • Changing Culture Through Good Practice
  • Align Project Goals and Participant Incentives
  • Reduce Fear with Risk-Reduction Mechanisms
  • Make Safety Nets the Norm
  • Increase Productivity and Usability
  • Overcommunicate and Be Transparent
  • Build Empathy
  • Convincing Leadership
  • Understand the Decision-Making Process
  • Build a Case for Change
  • Pick Your Battles
  • Escalations and Problem Resolution
  • Conclusion
  • Conclusion
  • A. A Disaster Risk Assessment Matrix
  • Index

File format: ePUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free Adobe Digital Editions software before downloading (see E-book help).

Tablet/smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see E-book help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)

The ePUB file format is well suited to novels and nonfiction, that is, to "flowing" text without complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the e-book, so you must prepare your reading hardware before downloading.

When using the Adobe Digital Editions reading software, please note: we strongly recommend authorizing it with your personal Adobe ID after installation.

Further information is available in our E-book help.


Download (available immediately)

€40.49
incl. 5% VAT
Download / single license
ePUB with Adobe DRM
see system requirements