Red Hat Enterprise Linux Troubleshooting Guide

 
 
Packt Publishing Limited
  • 1st edition
  • published on 19 October 2015
  • 458 pages
 
E-book | ePUB with Adobe DRM | System requirements
978-1-78528-787-9 (ISBN)
 
Identify, capture and resolve common issues faced by Red Hat Enterprise Linux administrators using best practices and advanced troubleshooting techniques

About This Book
  • Develop a strong understanding of the base tools available within Red Hat Enterprise Linux (RHEL) and how to utilize these tools to troubleshoot and resolve real-world issues
  • Gain hidden tips and techniques to help you quickly detect the reason for poor network/storage performance
  • Troubleshoot your RHEL to isolate problems using this example-oriented guide full of real-world solutions

Who This Book Is For
If you have a basic knowledge of Linux from administration or consultant experience and wish to add to your Red Hat Enterprise Linux troubleshooting skills, then this book is ideal for you. The ability to navigate and use basic Linux commands is expected.

What You Will Learn
  • Identify issues that need rapid resolution versus long-term root cause analysis
  • Discover commands for testing network connectivity such as telnet, netstat, ping, ip and curl
  • Spot performance issues with commands such as top, ps, free, iostat, and vmstat
  • Use tcpdump for traffic analysis
  • Repair a degraded file system and rebuild a software RAID
  • Identify and troubleshoot hardware issues using dmesg
  • Troubleshoot custom applications with strace and knowledge of Linux resource limitations

In Detail
Red Hat Enterprise Linux is an operating system that allows you to modernize your infrastructure, boost efficiency through virtualization, and finally prepare your data center for an open, hybrid cloud IT architecture. It provides the stability to take on today's challenges and the flexibility to adapt to tomorrow's demands.

In this book, you begin with simple troubleshooting best practices and get an overview of the Linux commands used for troubleshooting. The book will cover the troubleshooting methods for web applications and services such as Apache and MySQL. Then, you will learn to identify system performance bottlenecks and troubleshoot network issues, all while learning about vital troubleshooting steps such as understanding the problem statement, establishing a hypothesis, and understanding trial, error, and documentation. Next, the book will show you how to capture and analyze network traffic, use advanced system troubleshooting tools such as strace, tcpdump, and dmesg, and discover common issues with system defaults.

Finally, the book will take you through a detailed root cause analysis of an unexpected reboot where you will learn to recover a downed system.

Style and approach
This is an easy-to-follow guide packed with examples of real-world core Linux concepts. All the topics are presented in detail while you're performing the actual troubleshooting steps.
  • English
  • Birmingham
  • United Kingdom
978-1-78528-787-9 (9781785287879)
1785287877 (1785287877)
Benjamin Cane has nearly 10 years of experience in Linux systems administration. His first systems administration role was in 2006. At that time, he worked for a web hosting company supporting thousands of FreeBSD and Linux systems.
Afterwards, he joined a managed services company that specialized in managing mission-critical systems. There, he worked his way to the position of a lead systems engineer, providing 24x7 support for highly critical enterprise systems that ran Red Hat Enterprise Linux.
Now, Benjamin is a systems architect. He focuses on building High and Continuous Availability environments within the financial services industry. He is also currently a Red Hat Certified Engineer and Certified Ethical Hacker.
With his experience in mission-critical environments, he has learned to identify and troubleshoot very complex issues quickly, because often these environments have a low tolerance for downtime. Being able to identify the root causes of very complex systems issues quickly is a skill that requires extensive knowledge of Linux and troubleshooting best practices.
In addition to this book, Benjamin writes about Linux systems administration and DevOps topics on his blog at http://bencane.com. He is also the project founder for Runbook (https://github.com/Runbook/runbook), an open source application designed to monitor and automatically resolve infrastructure and application issues.
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Table of Contents
  • Preface
  • Chapter 1: Troubleshooting Best Practices
  • Styles of troubleshooting
  • The Data Collector
  • The Educated Guesser
  • The Adaptor
  • Choosing the appropriate style
  • Troubleshooting steps
  • Understanding the problem statement
  • Asking questions
  • Attempting to duplicate the issue
  • Running investigatory commands
  • Establishing a hypothesis
  • Putting together patterns
  • Is this something that I've encountered before?
  • Trial and error
  • Start by creating a backup
  • Getting help
  • Books
  • Team Wikis or Runbooks
  • Google
  • Man pages
  • Red Hat kernel docs
  • People
  • Documentation
  • Root cause analysis
  • The anatomy of a good RCA
  • The problem as it was reported
  • The actual root cause of the problem
  • A timeline of events and actions taken
  • Any key data points to validate the root cause
  • A plan of action to prevent the incident from reoccurring
  • Establishing a root cause
  • Sometimes you must sacrifice a root cause analysis
  • Understanding your environment
  • Summary
  • Chapter 2: Troubleshooting Commands and Sources of Useful Information
  • Finding useful information
  • Log files
  • The default location
  • Common log files
  • Finding logs that are not in the default location
  • Configuration files
  • Default system configuration directory
  • Finding configuration files
  • The proc filesystem
  • Troubleshooting commands
  • Command-line basics
  • Command flags
  • The piping command output
  • Gathering general information
  • w - show who is logged on and what they are doing
  • rpm - RPM package manager
  • df - report file system space usage
  • free - display memory utilization
  • ps - report a snapshot of current running processes
  • Networking
  • ip - show and manipulate network settings
  • netstat - network statistics
  • Performance
  • iotop - a simple top-like I/O monitor
  • iostat - report I/O and CPU statistics
  • vmstat - report virtual memory statistics
  • sar - collect, report, or save system activity information
  • Summary
  • Chapter 3: Troubleshooting a Web Application
  • A small back story
  • The reported issue
  • Data gathering
  • Asking questions
  • Duplicating the issue
  • Understanding the environment
  • Where is this blog hosted?
  • Ok, it's within our environment; now what?
  • What services are installed and running?
  • Looking for error messages
  • Apache logs
  • Verifying the database
  • Verifying the WordPress database
  • Establishing a hypothesis
  • Resolving the issue
  • Understanding database data files
  • Finding the MariaDB data folder
  • Resolving data file issues
  • Validating
  • Final validation
  • Summary
  • Chapter 4: Troubleshooting Performance Issues
  • Performance issues
  • It's slow
  • Performance
  • Application
  • CPU
  • Top - a single command to look at everything
  • Determining the number of CPUs available
  • ps - Drill down deeper on individual processes with ps
  • Putting it all together
  • A quick look with top
  • Memory
  • free - Looking at free and used memory
  • Checking for oomkill
  • ps - Checking individual processes memory utilization
  • vmstat - Monitoring memory allocation and swapping
  • Putting it all together
  • Disk
  • iostat - CPU and device input/output statistics
  • Who is writing to these devices?
  • iotop - A top-like command for disk I/O
  • Putting it all together
  • Network
  • ifstat - Review interface statistics
  • Quick review of what we have identified
  • Comparing historical metrics
  • sar - System activity report
  • CPU
  • Memory
  • Disk
  • Network
  • Review what we learned by comparing historical statistics
  • Summary
  • Chapter 5: Network Troubleshooting
  • Database connectivity issues
  • Data collection
  • Duplicating the issue
  • Finding the database server
  • Testing connectivity
  • Telnet from blog.example.com
  • Telnet from our laptop
  • Ping
  • Troubleshooting DNS
  • Checking DNS with dig
  • Looking up DNS with nslookup
  • What did dig and nslookup tell us?
  • DNS summary
  • Pinging from another location
  • Testing port connectivity with cURL
  • Showing current network connections with netstat
  • Using netstat to watch for new connections
  • Breakdown of netstat states
  • Capturing network traffic with tcpdump
  • Taking a look at the server's network interfaces
  • Specifying the interface with tcpdump
  • Reading the captured data
  • A quick primer on TCP
  • Reviewing collected data
  • Taking a look on the other side
  • Identifying the network configuration
  • Testing connectivity from db.example.com
  • Looking for connections with netstat
  • Tracing network connections with tcpdump
  • Routing
  • Viewing the routing table
  • Utilizing IP to show the routing table
  • Looking for routing misconfigurations
  • Hypothesis
  • Trial and error
  • Removing the invalid route
  • Configuration files
  • Summary
  • Chapter 6: Diagnosing and Correcting Firewall Issues
  • Diagnosing firewalls
  • Déjà vu
  • Troubleshooting from historic issues
  • Basic troubleshooting
  • Validating the MariaDB service
  • Troubleshooting with tcpdump
  • Understanding ICMP
  • Understanding connection rejections
  • A quick summary of what you have learned so far
  • Managing the Linux firewall with iptables
  • Verify that iptables is running
  • Show iptables rules being enforced
  • Understanding iptables rules
  • Ordering matters
  • Default policies
  • Breaking down the iptables rules
  • Putting the rules together
  • Viewing iptables counters
  • Correcting the iptables rule ordering
  • Summary
  • Chapter 7: Filesystem Errors and Recovery
  • Diagnosing filesystem errors
  • Read-only filesystems
  • Using the mount command to list mounted filesystems
  • A mounted filesystem
  • Using fdisk to list available partitions
  • Back to troubleshooting
  • NFS - Network Filesystem
  • NFS and network connectivity
  • Using the showmount command
  • NFS server configuration
  • Exploring /etc/exports
  • Identifying the current exports
  • Testing NFS from another client
  • Making mounts permanent
  • Unmounting the /mnt filesystem
  • Troubleshooting the NFS server, again
  • Finding the NFS log messages
  • Reading /var/log/messages
  • Read-only filesystems
  • Identifying disk issues
  • Recovering the filesystem
  • Unmounting the filesystem
  • Filesystem checks with fsck
  • The fsck and xfs filesystems
  • How do these tools repair a filesystem?
  • Mounting the filesystem
  • Repairing the other filesystems
  • Recovering the / (root) filesystem
  • Validation
  • Summary
  • Chapter 8: Hardware Troubleshooting
  • Starting with a log entry
  • What is a RAID?
  • RAID 0 - striping
  • RAID 1 - mirroring
  • RAID 5 - striping with distributed parity
  • RAID 6 - striping with double distributed parity
  • RAID 10 - mirrored and striped
  • Back to troubleshooting our RAID
  • How RAID recovery works
  • Checking the current RAID status
  • Summarizing the key information
  • Looking at md status with /proc/mdstat
  • Using both /proc/mdstat and mdadm
  • Identifying a bigger issue
  • Understanding /dev
  • More than just disk drives
  • Device messages with dmesg
  • Summarizing what dmesg has provided
  • Using mdadm to examine the superblock
  • Checking /dev/sdb2
  • What we have learned so far
  • Re-adding the drives to the arrays
  • Adding a new disk device
  • When disks are not added cleanly
  • Another way to watch the rebuild status
  • Summary
  • Chapter 9: Using System Tools to Troubleshoot Applications
  • Open source versus home-grown applications
  • When the application won't start
  • Exit codes
  • Is the script failing, or the application?
  • A wealth of information in the configuration file
  • Watching log files during startup
  • Checking whether the application is already running
  • Checking open files
  • Understanding file descriptors
  • Getting back to the lsof output
  • Using lsof to check whether we have a previously running process
  • Finding out more about the application
  • Tracing an application with strace
  • What is a system call?
  • Using strace to identify why the application will not start
  • Resolving the conflict
  • Summary
  • Chapter 10: Understanding Linux User and Kernel Limits
  • A reported issue
  • Why is the job failing?
  • Background questions
  • Is the cron job even running?
  • User crontabs
  • Understanding user limits
  • The file size limit
  • The max user processes limit
  • The open files limit
  • Changing user limits
  • The limits.conf file
  • Future proofing the scheduled job
  • Running the job again
  • Kernel tunables
  • Finding the kernel parameter for open files
  • Changing kernel tunables
  • Permanently changing a tunable
  • Temporarily changing a tunable
  • Running the job one last time
  • A look back
  • Too many open files
  • A bit of clean up
  • Summary
  • Chapter 11: Recovering from Common Failures
  • The reported problem
  • Is Apache really down?
  • Why is it down?
  • What else was happening at that time?
  • Searching the messages log
  • Breaking down this useful one-liner
  • The uniq command
  • Tying it all together
  • What happens when a Linux system runs out of memory?
  • Minimum free memory
  • How oom-kill works
  • Determining whether our process was killed by oom-kill
  • Why did the system run out of memory?
  • Resolving the issue in the long-term and short-term
  • Long-term resolution
  • Short-term resolution
  • Summary
  • Chapter 12: Root Cause Analysis of an Unexpected Reboot
  • A late night alert
  • Identifying the issue
  • Did someone reboot this server?
  • What do the logs tell us?
  • Learning about new processes and services
  • What caused the high load average?
  • What are the run queue and load average?
  • Load average
  • Investigating the filesystem being full
  • The du command
  • Why wasn't the queue directory processed?
  • A checkpoint on what you learned
  • Sometimes you cannot prove everything
  • Preventing reoccurrence
  • Immediate action
  • Long-term actions
  • A sample Root Cause Analysis
  • Problem summary
  • Problem details
  • Root cause
  • Action plan
  • Further actions to be taken
  • Summary
  • Index

File format: EPUB
Copy protection: Adobe DRM (Digital Rights Management)

System requirements:

Computer (Windows; MacOS X; Linux): Install the free Adobe Digital Editions software before downloading (see E-book help).

Tablet/Smartphone (Android; iOS): Install the free Adobe Digital Editions app before downloading (see E-book help).

E-book readers: Bookeen, Kobo, Pocketbook, Sony, Tolino, and many others (not Kindle)

The EPUB file format is well suited to novels and non-fiction, that is, to "flowing" text without complex layout. On e-readers and smartphones, line and page breaks adapt automatically to the small display. Adobe DRM is a "hard" form of copy protection. If the necessary requirements are not met, you will not be able to open the e-book, so prepare your reading hardware before downloading.

For more information, see our E-book help.


Download (available immediately)

43,65 €
incl. 19% VAT
Download / single license
