
Survival Models and Data Analysis
Persons
About the authors
REGINA C. ELANDT-JOHNSON has been Professor of Biostatistics at the University of North Carolina at Chapel Hill since 1964. She is the author of Probability Models and Statistical Methods in Genetics (Wiley, 1971). Dr. Elandt-Johnson received her Ph.D. in statistics from Poznan Agricultural University in 1957.
NORMAN L. JOHNSON is Alumni Distinguished Professor at the University of North Carolina at Chapel Hill. Dr. Johnson served as Chairman of the Fisher Memorial Lecture Committee, American Statistical Association from 1976 until 1979. He is co-author of Distributions in Statistics (Wiley, 1969-1972); URN Models and their Applications (Wiley, 1977); and Statistics and Experimental Design in Engineering and Physical Sciences (Wiley, 1977). Dr. Johnson received his D.Sc. in statistics from University College, London in 1963.
Content
CHAPTER 1
Survival Data
1.1 SCOPE OF THE BOOK
The title of this book indicates that we discuss the treatment of "mortality data." The direct meaning of this term is data that arise from recording times of death of individuals in a specified group. There will usually be additional data from observations of characters (other than survival or death) on the individuals in the group. These may be made at or near the moment of death (e.g., cause of death, length of illness, physical characteristics near the moment of death) or at earlier times (e.g., sex, age, family history, physical characteristics at earlier epochs). Certain of these variables, most commonly age (time elapsed since birth) and/or time elapsed since other important events (e.g., commencement of illness, date of operation), are regarded as being of primary interest. It is often desired to assess the relationship between mortality and these primary variables, allowing, as far as possible, for some of the other characteristics. The latter, in this context, are called concomitant variables. (Note that, for a given set of data, the distinction between primary and concomitant variables depends on the relationships to be studied.)
Individuals in the group may be humans, animals, fishes, insects, and so on. The group itself may be defined in various ways: by geographical location (e.g., population of a town or state, patients in a hospital or in a set of hospitals) or by previous history (e.g., medical treatment, type of sickness, employment).
Occasionally we consider situations in which the replacement of "mortality" by the more general term "failure" is appropriate. In such contexts, the individuals are not necessarily (although they may be) living organisms. They may, for example, be mass-produced articles, such as electric lamps, with failure meaning inability to function in a specified role.
We are not primarily concerned with reversible changes of status, such as sickness causing temporary inability to work or repairable failure of electrical or mechanical systems. However, there are occasional references to these matters, and Chapter 14 is devoted to discussing the distribution of age of onset of a disease.
Also, we are not concerned with statistics of birth, except as defining entry into a specific group of individuals and contributing to the assessment of mortality at juvenile ages. In particular, we do not study the measurement of fertility or the general province of demography.
Primarily, we are concerned with the study of failure data, and the relation of failure to a few important variables, such as age or time elapsed since some event (other than birth or manufacture). Other variables (concomitant variables) are introduced because of a possible relationship with failure but are not studied for their own sake.
1.2 SOURCES OF DATA
From the foregoing description, it can be seen that the methods discussed are applicable to a wide variety of situations. The sources of data are correspondingly varied. We first describe sources of mortality data, later turning to the topic of failure data in general.
A major subdivision of mortality data is between data relating to populations under more or less uncontrolled conditions (such as statistics of human deaths in a state or nation) and those observed under controlled conditions of a more or less experimental nature (as in a clinical trial).
Usually, the amount of data collected in the former situation is considerably greater than in the latter, though this need not be so. On the other hand, we almost always have more detailed information on each individual exposed to risk in the latter situation. In fact, in the uncontrolled situation we rarely have an exact enumeration of all the individuals who might be observed to fail (those exposed to risk). (A more precise discussion and definition of exposed to risk can be found in Chapter 2.)
When the date of death is recorded in a specific area over a specific period of time, estimates of the number exposed to risk are usually based on census data. For convenience, we use the term census-type data generally to describe data in which the numbers exposed to risk are estimated indirectly. When records are available from which the numbers exposed to risk can be ascertained directly, we use, again for convenience, the term experimental-type data. Sometimes these terms may not appear to be very relevant to the data actually under consideration. Their function is to remind ourselves what type of data we are considering.
As we have already mentioned, experimental-type data are usually considerably smaller in volume than census-type data. An exception arises in the mortality experience of insurance companies. The records of such companies contain information on all persons insured with them, from which it is possible to determine exactly the exposed to risk, among whom the deaths (resulting in claims) are also recorded. For a given year of age, the numbers exposed to risk can quite easily be in the tens of thousands or more, and in practice some approximations to the exposed to risk may be used, corresponding to various groupings of ages (according to last birthday or nearest birthday), dates of entry, withdrawal, and death.
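The two age groupings mentioned above can be illustrated with a minimal sketch. The function names, the sample ages, and the convention of rounding exact half-years upward are assumptions made for this example only; they are not taken from the book.

```python
import math
from collections import Counter


def age_last_birthday(exact_age: float) -> int:
    """Completed years of age: the exact age rounded down."""
    return math.floor(exact_age)


def age_nearest_birthday(exact_age: float) -> int:
    """Age at the nearest birthday; exact half-years are rounded up (an assumption)."""
    return math.floor(exact_age + 0.5)


# Hypothetical exact ages, in years, of five insured persons.
exact_ages = [42.2, 42.7, 43.1, 43.5, 43.9]

# Number of persons falling in each age group under the two conventions.
by_last = Counter(age_last_birthday(a) for a in exact_ages)
by_nearest = Counter(age_nearest_birthday(a) for a in exact_ages)

print(dict(by_last))
print(dict(by_nearest))
```

The same five persons are distributed differently across age groups under the two conventions, which is one reason the grouping rule needs to be stated whenever the exposed to risk are approximated.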
1.3 TYPES OF VARIABLES
We have already introduced the concept of concomitant variables in Section 1.1. Here we examine our classification of variables in somewhat greater detail.
The basic variable, representing survival or failure, is essentially a variable taking just two values (a binary variable) that can be chosen arbitrarily and are usually, and conveniently, taken to be 0 and 1. It can be measured by direct counting, as in experimental-type situations, or by indirect estimation, as in some census-type situations. We are most often concerned with studying the proportion of individuals surviving specified periods as a function of a few important variables. By far the most important variable is age, although in clinical trials, duration since an event such as initiation of treatment is sometimes taken as the "variable of interest."
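As a rough illustration of the binary survival variable, the sketch below computes the proportion surviving a period within each age group. The records, the age groups, and the coding 1 = survived, 0 = failed are hypothetical choices for this example; as noted above, the two values can be chosen arbitrarily.

```python
from collections import defaultdict

# Hypothetical records: (age group at start of the period, survival indicator).
# Coding assumed here: 1 = survived the period, 0 = failed during it.
records = [
    ("40-49", 1), ("40-49", 1), ("40-49", 0), ("40-49", 1),
    ("50-59", 1), ("50-59", 0), ("50-59", 0), ("50-59", 1),
]


def proportion_surviving(records):
    """Return the proportion of survivors in each age group."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for group, survived in records:
        totals[group] += 1
        survivors[group] += survived  # adding the 0/1 indicator counts survivors
    return {group: survivors[group] / totals[group] for group in totals}


proportions = proportion_surviving(records)
print(proportions)
```

With the 0/1 coding, the proportion surviving is simply the mean of the indicator within each group, which is why this coding is convenient.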
Thus life tables (which will be discussed in Chapter 4) usually represent the pattern of mortality (or failure) as a function of age, for particular groups of individuals, but this is not always the case. For human or animal populations, the time since a specific event may be used as the variable of interest. Occasionally, as in select life tables (see Sections 4.12-4.14), both age and time since a specific event are variables of interest.
The remaining measured variables, beyond the basic (survival) variable and the variable(s) of interest, are concomitant variables. In so far as they do affect the mortality (failure) pattern, it is desirable to allow for them. This may be done analytically, (1) by introducing some fairly simple model that (it is hoped) will represent adequately the influence of concomitant variables, or (2) by constructing separate life tables for different values of the concomitant variable(s). The latter is the safer method, but can be usefully applied only when the concomitant variable(s) can be defined in terms of very few categories. An important example, in living populations, is sex. It is quite common to have separate life tables for females and males.
Other important concomitant variables are geographical location, social class (often measured as an index based on income, education, etc.), and physical characteristics such as blood pressure, weight, and vital capacity. This last group is especially relevant in most clinical trial data.
In mechanical and electrical systems, the variable age (duration of effective service) is again of major importance. Other variables, mainly representing conditions of use, include temperature, pressure, operator training, chemical content of contact materials, and so on.
In particular, we study (in Chapter 13) the relationship between failure and one or more variables of interest. Age is very often the variable of interest, but in clinical trials time since some specified event, other than birth, is usually the variable of interest. There is a wide variety of other concomitant variables that may need attention.
A concomitant variable of some importance is chronological year (e.g., 1965, 1975). Although not always recognized as such, its importance is acknowledged, for example, in national life tables which always relate to a specific period of time (e.g., U.S. Life Tables for 1959-1961, 1969-1971, etc.).
1.4 EXPOSURE TO RISK
In most analyses of survival data we are interested in studying the proportions of failure among groups of individuals under specified conditions. Clearly, the longer the period for which an individual is under observation, the more likely it is that failure will be observed, sooner or later. Comparability of numbers of failures requires that they be referred to some unit period of observation. To do this, we would like to know, for each individual, the period of exposure to risk, that is, the period of time during which the failure (or death) of an individual will actually be recorded and contribute to the observed failures. For census-type data this period is usually not known and has to be estimated. For experimental-type data, information from which the period of exposure to risk can be determined for each individual is usually available, although when the volume of data is large, approximate evaluation may be used.
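For experimental-type data, the idea of referring failures to a unit period of observation can be sketched as follows. The records, the time unit (years since the start of the study), and the variable names are assumptions for illustration; the book's own treatment of exposure to risk is given in Chapter 2.

```python
# Hypothetical individuals: (entry time, exit time, failed at exit?).
# Times are in years since the start of the study; an individual who has not
# failed leaves observation at the exit time (e.g., withdrawal or end of study).
individuals = [
    (0.0, 5.0, False),
    (0.0, 2.5, True),
    (1.0, 4.0, True),
    (2.0, 5.0, False),
]

# Period of exposure to risk for each individual: time actually under observation.
exposures = [exit_time - entry_time for entry_time, exit_time, _ in individuals]

total_exposure = sum(exposures)  # total years of exposure
failures = sum(1 for _, _, failed in individuals if failed)

# Failures per year of exposure, so that groups observed for
# different lengths of time can be compared.
rate = failures / total_exposure

print(total_exposure, failures, rate)
```

Dividing by total exposure rather than by the number of individuals is what makes the counts comparable: an individual observed for five years contributes more opportunity for a recorded failure than one observed for three.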
1.5 USE OF PROBABILITY THEORY
Since we are studying proportions, it is natural to represent them in terms of probabilities. We are then able to use...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.