
Administrative Records for Survey Methodology
Description
Addresses the international use of administrative records for large-scale surveys, censuses, and other statistical purposes
Administrative Records for Survey Methodology is a comprehensive guide to improving the quality, cost-efficiency, and interpretability of surveys and censuses using administrative data research. Contributions from a team of internationally recognized experts provide practical approaches for integrating administrative data in statistical surveys, and discuss the methodological issues (including concerns of privacy, confidentiality, and legality) involved in collecting and analyzing administrative records. Numerous real-world examples highlight technological and statistical innovations, helping readers gain a better understanding of both fundamental methods and advanced techniques for controlling data quality and reducing total survey error.
The book is divided into four sections. The first describes the basics of administrative records research and addresses disclosure limitation and confidentiality protection in linked data. Section two focuses on data quality and linking methodology, covering topics such as quality evaluation, measuring and controlling for non-consent bias, and cleaning and using administrative lists. The third section examines the use of administrative records in surveys and includes case studies of the Swedish register-based census and the administrative records applications used for the US 2020 Census. The book's final section discusses combining administrative and survey data to improve income measurement, enhancing health surveys with data linkage, and other uses of administrative data in evidence-based policymaking. This state-of-the-art resource:
* Discusses important administrative data issues and suggests how administrative data can be integrated with more traditional surveys
* Describes practical uses of administrative records for evidence-driven decisions in both public and private sectors
* Emphasizes using interdisciplinary methodology and linking administrative records with other data sources
* Explores techniques to leverage administrative data to improve the survey frame, reduce nonresponse follow-up, assess coverage error, measure linkage non-consent bias, and perform small area estimation.
Administrative Records for Survey Methodology is an indispensable reference and guide for statistical researchers and methodologists in academia, industry, and government, particularly census bureaus and national statistical offices, and an ideal supplemental text for undergraduate and graduate courses in data science, survey methodology, data collection, and data analysis methods.
Persons
Asaph Young Chun, PhD, is Director-General, Statistics Research Institute, Statistics Korea, Republic of Korea.
Michael D. Larsen, PhD, is Professor and Chair, Department of Mathematics and Statistics, Saint Michael's College, Vermont, USA.
Gabriele Durrant, PhD, is Professor, Department of Social Statistics and Demography, University of Southampton, UK.
Jerome P. Reiter, PhD, is Professor and Chair, Department of Statistical Science, Duke University, North Carolina, USA.
Content
Preface
Acknowledgments
List of Contributors
Part I Fundamentals of Administrative Records Research and Applications
1 On the Use of Proxy Variables in Combining Register and Survey Data
Li-Chun Zhang
1.1 Introduction
1.1.1 A Multisource Data Perspective
1.1.2 Concept of Proxy Variable
1.2 Instances of Proxy Variable
1.2.1 Representation
1.2.2 Measurement
1.3 Estimation Using Multiple Proxy Variables
1.3.1 Asymmetric Setting
1.3.2 Uncertainty Evaluation: A Case of Two-Way Data
1.3.3 Symmetric Setting
1.4 Summary
References
2 Disclosure Limitation and Confidentiality Protection in Linked Data
John M. Abowd, Ian M. Schmutte, and Lars Vilhuber
2.1 Introduction
2.2 Paradigms of Protection
2.2.1 Input Noise Infusion
2.2.2 Formal Privacy Models
2.3 Confidentiality Protection in Linked Data: Examples
2.3.1 HRS-SSA
2.3.1.1 Data Description
2.3.1.2 Linkages to Other Data
2.3.1.3 Disclosure Avoidance Methods
2.3.2 SIPP-SSA-IRS (SSB)
2.3.2.1 Data Description
2.3.2.2 Disclosure Avoidance Methods
2.3.2.3 Disclosure Avoidance Assessment
2.3.2.4 Analytical Validity Assessment
2.3.3 LEHD: Linked Establishment and Employee Records
2.3.3.1 Data Description
2.3.3.2 Disclosure Avoidance Methods
2.3.3.3 Disclosure Avoidance Assessment for QWI
2.3.3.4 Analytical Validity Assessment for QWI
2.4 Physical and Legal Protections
2.4.1 Statistical Data Enclaves
2.4.2 Remote Processing
2.4.3 Licensing
2.4.4 Disclosure Avoidance Methods
2.4.5 Data Silos
2.5 Conclusions
2.A.1 Other Abbreviations
2.A.2 Concepts
Acknowledgments
References
Part II Data Quality of Administrative Records and Linking Methodology
3 Evaluation of the Quality of Administrative Data Used in the Dutch Virtual Census
Piet Daas, Eric S. Nordholt, Martijn Tennekes, and Saskia Ossen
3.1 Introduction
3.2 Data Sources and Variables
3.3 Quality Framework
3.3.1 Source and Metadata Hyper Dimensions
3.3.2 Data Hyper Dimension
3.4 Quality Evaluation Results for the Dutch 2011 Census
3.4.1 Source and Metadata: Application of Checklist
3.4.2 Data Hyper Dimension: Completeness and Accuracy Results
3.4.2.1 Completeness Dimension
3.4.2.2 Accuracy Dimension
3.4.2.3 Visualizing with a Tableplot
3.4.3 Discussion of the Quality Findings
3.5 Summary
3.6 Practical Implications for Implementation with Surveys and Censuses
3.7 Exercises
References
4 Improving Input Data Quality in Register-Based Statistics: The Norwegian Experience
Coen Hendriks
4.1 Introduction
4.2 The Use of Administrative Sources in Statistics Norway
4.3 Managing Statistical Populations
4.4 Experiences from the First Norwegian Purely Register-Based Population and Housing Census of 2011
4.5 The Contact with the Owners of Administrative Registers Was Put into System
4.5.1 Agreements on Data Processing
4.5.2 Agreements of Cooperation on Data Quality in Administrative Data Systems
4.5.3 The Forums for Cooperation
4.6 Measuring and Documenting Input Data Quality
4.6.1 Quality Indicators
4.6.2 Operationalizing the Quality Checks
4.6.3 Quality Reports
4.6.4 The Approach is Being Adopted by the Owners of Administrative Data
4.7 Summary
4.8 Exercises
References
5 Cleaning and Using Administrative Lists: Enhanced Practices and Computational Algorithms for Record Linkage and Modeling/Editing/Imputation
William E. Winkler
5.1 Introductory Comments
5.1.1 Example 1
5.1.2 Example 2
5.1.3 Example 3
5.2 Edit/Imputation
5.2.1 Background
5.2.2 Fellegi-Holt Model
5.2.3 Imputation Generalizing Little-Rubin
5.2.4 Connecting Edit with Imputation
5.2.5 Achieving Extreme Computational Speed
5.3 Record Linkage
5.3.1 Fellegi-Sunter Model
5.3.2 Estimating Parameters
5.3.3 Estimating False Match Rates
5.3.3.1 The Data Files
5.3.4 Achieving Extreme Computational Speed
5.4 Models for Adjusting Statistical Analyses for Linkage Error
5.4.1 Scheuren-Winkler
5.4.2 Lahiri-Larsen
5.4.3 Chambers and Kim
5.4.4 Chipperfield, Bishop, and Campbell
5.4.4.1 Empirical Data
5.4.5 Goldstein, Harron, and Wade
5.4.6 Hof and Zwinderman
5.4.7 Tancredi and Liseo
5.5 Concluding Remarks
5.6 Issues and Some Related Questions
References
6 Assessing Uncertainty When Using Linked Administrative Records
Jerome P. Reiter
6.1 Introduction
6.2 General Sources of Uncertainty
6.2.1 Imperfect Matching
6.2.2 Incomplete Matching
6.3 Approaches to Accounting for Uncertainty
6.3.1 Modeling Matching Matrix as Parameter
6.3.2 Direct Modeling
6.3.3 Imputation of Entire Concatenated File
6.4 Concluding Remarks
6.4.1 Problems to Be Solved
6.4.2 Practical Implications
6.5 Exercises
Acknowledgment
References
7 Measuring and Controlling for Non-Consent Bias in Linked Survey and Administrative Data
Joseph W. Sakshaug
7.1 Introduction
7.1.1 What is Linkage Consent? Why is Linkage Consent Needed?
7.1.2 Linkage Consent Rates in Large-Scale Surveys
7.1.3 The Impact of Linkage Non-Consent Bias on Survey Inference
7.1.4 The Challenge of Measuring and Controlling for Linkage Non-Consent Bias
7.2 Strategies for Measuring Linkage Non-Consent Bias
7.2.1 Formulation of Linkage Non-Consent Bias
7.2.2 Modeling Non-Consent Using Survey Information
7.2.3 Analyzing Non-Consent Bias for Administrative Variables
7.3 Methods for Minimizing Non-Consent Bias at the Survey Design Stage
7.3.1 Optimizing Linkage Consent Rates
7.3.2 Placement of the Consent Request
7.3.3 Wording of the Consent Request
7.3.4 Active and Passive Consent Procedures
7.3.5 Linkage Consent in Panel Studies
7.4 Methods for Minimizing Non-Consent Bias at the Survey Analysis Stage
7.4.1 Controlling for Linkage Non-Consent Bias via Statistical Adjustment
7.4.2 Weighting Adjustments
7.4.3 Imputation
7.5 Summary
7.5.1 Key Points for Measuring Linkage Non-Consent Bias
7.5.2 Key Points for Controlling for Linkage Non-Consent Bias
7.6 Practical Implications for Implementation with Surveys and Censuses
7.7 Exercises
References
Part III Use of Administrative Records in Surveys
8 A Register-Based Census: The Swedish Experience
Martin Axelson, Anders Holmberg, Ingegerd Jansson, and Sara Westling
8.1 Introduction
8.2 Background
8.3 Census 2011
8.4 A Register-Based Census
8.4.1 Registers at Statistics Sweden
8.4.2 Facilitating a System of Registers
8.4.3 Introducing a Dwelling Identification Key
8.4.4 The Census Household and Dwelling Populations
8.5 Evaluation of the Census
8.5.1 Introduction
8.5.2 Evaluating Household Size and Type
8.5.2.1 Sampling Design
8.5.2.2 Data Collection
8.5.2.3 Reconciliation
8.5.2.4 Results
8.5.3 Evaluating Ownership
8.5.4 Lessons Learned
8.6 Impact on Population and Housing Statistics
8.7 Summary and Final Remarks
References
9 Administrative Records Applications for the 2020 Census
Vincent T. Mule Jr. and Andrew Keller
9.1 Introduction
9.2 Administrative Record Usage in the U.S. Census
9.3 Administrative Record Integration in 2020 Census Research
9.3.1 Administrative Record Usage Determinations
9.3.2 NRFU Design Incorporating Administrative Records
9.3.3 Administrative Records Sources and Data Preparation
9.3.4 Approach to Determine Administrative Record Vacant Addresses
9.3.5 Extension of Vacant Methodology to Nonexistent Cases
9.3.6 Approach to Determine Occupied Addresses
9.3.7 Other Aspects and Alternatives of Administrative Record Enumeration
9.4 Quality Assessment
9.4.1 Microlevel Evaluations of Quality
9.4.2 Macrolevel Evaluations of Quality
9.5 Other Applications of Administrative Record Usage
9.5.1 Register-Based Census
9.5.2 Supplement Traditional Enumeration with Adjustments for Estimated Error for Official Census Counts
9.5.3 Coverage Evaluation
9.6 Summary
9.7 Exercises
References
10 Use of Administrative Records in Small Area Estimation
Andreea L. Erciulescu, Carolina Franco, and Partha Lahiri
10.1 Introduction
10.2 Data Preparation
10.3 Small Area Estimation Models for Combining Information
10.3.1 Area-level Models
10.3.2 Unit-level Models
10.4 An Application
10.5 Concluding Remarks
10.6 Exercises
Acknowledgments
References
Part IV Use of Administrative Data in Evidence-Based Policymaking
11 Enhancement of Health Surveys with Data Linkage
Cordell Golden and Lisa B. Mirel
11.1 Introduction
11.1.1 The National Center for Health Statistics (NCHS)
11.1.2 The NCHS Data Linkage Program
11.1.3 Initial Linkages with NCHS Surveys
11.2 Examples of NCHS Health Surveys that Were Enhanced Through Linkage
11.2.1 National Health Interview Survey (NHIS)
11.2.2 National Health and Nutrition Examination Survey (NHANES)
11.2.3 National Health Care Surveys
11.3 NCHS Health Surveys Linked with Vital Records and Administrative Data
11.3.1 National Death Index (NDI)
11.3.2 Centers for Medicare and Medicaid Services (CMS)
11.3.3 Social Security Administration (SSA)
11.3.4 Department of Housing and Urban Development (HUD)
11.3.5 United States Renal Data System and the Florida Cancer Data System
11.4 NCHS Data Linkage Program: Linkage Methodology and Processing Issues
11.4.1 Informed Consent in Health Surveys
11.4.2 Informed Consent for Child Survey Participants
11.4.3 Adaptive Approaches to Linking Health Surveys with Administrative Data
11.4.4 Use of Alternate Records
11.4.5 Protecting the Privacy of Health Survey Participants and Maintaining Data Confidentiality
11.4.6 Updates Over Time
11.5 Enhancements to Health Survey Data Through Linkage
11.6 Analytic Considerations and Limitations of Administrative Data
11.6.1 Adjusting Sample Weights for Linkage-Eligibility
11.6.2 Residential Mobility and Linkages to State Programs and Registries
11.7 Future of the NCHS Data Linkage Program
11.8 Exercises
Acknowledgments
Disclaimer
References
12 Combining Administrative and Survey Data to Improve Income Measurement
Bruce D. Meyer and Nikolas Mittag
12.1 Introduction
12.2 Measuring and Decomposing Total Survey Error
12.3 Generalized Coverage Error
12.4 Item Nonresponse and Imputation Error
12.5 Measurement Error
12.6 Illustration: Using Data Linkage to Better Measure Income and Poverty
12.7 Accuracy of Links and the Administrative Data
12.8 Conclusions
12.9 Exercises
Acknowledgments
References
13 Combining Data from Multiple Sources to Define a Respondent: The Case of Education Data
Peter Siegel, Darryl Creel, and James Chromy
13.1 Introduction
13.1.1 Options for Defining a Unit Respondent When Data Exist from Sources Instead of or in Addition to an Interview
13.1.2 Concerns with Defining a Unit Respondent Without Having an Interview
13.2 Literature Review
13.3 Methodology
13.3.1 Computing Weights for Interview Respondents and for Unit Respondents Who May Not Have Interview Data (Usable Case Respondents)
13.3.1.1 How Many Weights Are Necessary?
13.3.2 Imputing Data When All or Some Interview Data Are Missing
13.3.3 Conducting Nonresponse Bias Analyses to Appropriately Consider Interview and Study Nonresponse
13.4 Example of Defining a Unit Respondent for the National Postsecondary Student Aid Study (NPSAS)
13.4.1 Overview of NPSAS
13.4.2 Usable Case Respondent Approach
13.4.2.1 Results
13.4.3 Interview Respondent Approach
13.4.3.1 Results
13.4.4 Comparison of Estimates, Variances, and Nonresponse Bias Using Two Approaches to Define a Unit Respondent
13.5 Discussion: Advantages and Disadvantages of Two Approaches to Defining a Unit Respondent
13.5.1 Interview Respondents
13.5.2 Usable Case Respondents
13.6 Practical Implications for Implementation with Surveys and Censuses
13.A Appendix
13.A.1 NPSAS:08 Study Respondent Definition
13.B Appendix
References
Index
Preface
Sample surveys are used by governments to describe the populations of their countries and provide estimates for use in policy decision making. Surveys can focus on individuals, households, businesses, students and schools, patients and hospitals, plots of land, or other entities. For surveys to be useful for official purposes they must cover the target population, represent the entirety of the population, collect information on key variables with accurate measurement methods, and have large enough sample sizes so that estimates are sufficiently precise at national and subnational levels. Achieving these four goals in a nationwide sample survey conducted on a limited budget and within a short time interval is very challenging. The purpose of this book is to explore developments in the use of administrative records for improving sample surveys.
Sample surveys aim to gather information on a population. The target population is the specific part of the population that one aims to survey. Some parts of the broader population typically are excluded from the target population based on contact mode, data collection mode, the survey frame or list, or convenience. Individuals without a regular address, residing in some forms of group quarters, or without phone or Internet access, for example, might be effectively ineligible to serve as respondents. Survey frames record contact information and some other variables on members of a population, but of course they do not necessarily include all members of the population or have up-to-date information on everyone. Some individuals with accurate contact information in the frame will prove harder than others to contact or even refuse to participate. Surveys then are potentially limited to reporting about respondents and the population to which they are similar. Surveys cannot be overly long or else they risk deterring potential respondents and costing a lot of money per respondent. As a result, surveys can accommodate only so many questions. For sensitive and complex items, self-report and less detailed questions, with their inherent limitations, often must be used for expediency. Budgets for national surveys compete with other government interests. Even large surveys typically have smaller-than-desired sample sizes in local areas and in subsets of the population. Despite these significant challenges, official statistical agencies around the world gather critically useful data on a myriad of topics.
The conditions for conducting sample surveys have changed immensely in the past 100 years. There is little chance that change will slow down. In-person surveys have been replaced and augmented by surveys by mail, by phone, and by Internet. Contact and data collection via multiple modes now are standard. The social environment, too, has evolved. Response rates are lower. Despite technological advances, people are increasingly busy. Official government surveys compete for attention with ever-more marketing and polling. Concerns over privacy and confidentiality have been elevated, rightly so, in the public consciousness. Simultaneously, government, researchers, and the public want more from data and surveys. Official surveys contribute to identifying challenges and to improvements in society. It is not practical, or maybe even possible, to get more out of old ways of conducting surveys.
Administrative records in a general sense are records kept for administrative purposes of the government. Administrative records can pertain to almost all aspects of life, including taxes, wages, education, health, residence, voting, crime, and property and business ownership. Does an individual have a license for a dog, for fishing at public lakes, to drive a car or motorcycle, or to own a gun? Does an individual receive public assistance through a government program? Administrative records, essential for government operations, contain a wealth of information on large segments of the population, but there are limitations. The records contain information on only some variables on subsets of the overall population. Information is collected so that a government can execute its program, but not typically for other purposes. Additional variables that might be interesting for study purposes likely are not recorded. Methods of recording variables might not be those that would be used in a scientific study. Those included in an administrative data file are not a random sample from the population. Some administrative records are collected over the course of several months or years, instead of only during a succinct time interval.
The use of administrative records has been part of the survey process for many decades. Survey textbooks since at least the 1960s (Cochran 1977; Kish 1967; Hansen, Hurwitz, and Madow 1953; Särndal, Swensson, and Wretman 1992) present methods for using auxiliary variables. It typically is assumed that values of auxiliary variables are available for all members of the population without error, or at least that aggregate totals are known. They might have come from a census, from a large survey at a previous time, or as part of the sample frame. Auxiliary variables are used for stratified surveys, probability proportional to size sampling, difference estimation, and ratio estimation. Often, they are treated in classic literature as known, fixed values.
Despite the limitations of administrative records, researchers, including the authors in this book, have been exploring how "adrecs" can be used to improve sample surveys in today's world and build on the record of past successes. They have examined new possibilities for using administrative record information to address four goals (coverage, response, variables, and accuracy) of official surveys. Increasing timeliness and decreasing costs through use of administrative records also are of continuing interest.
The book is organized into four sections. The first section contains two chapters. Chapter 1, by Li-Chun Zhang, presents fundamental challenges and approaches to integrating survey and administrative data for statistical purposes. The chapter focuses on administrative data, also called register or registry data, as a source for proxy variables. The proxy variables obtained from administrative sources can, for example, enhance a survey by providing additional information, be used for quality assessment of responses, and provide substitutes for missing values. Chapter 2, by John Marion Abowd, Ian Schmutte, and Lars Vilhuber, addresses confidentiality protection and disclosure limitation in linked data. Linking data on population elements is an essential step for many uses of administrative records in conjunction with survey data. If individuals from a survey can be located uniquely in administrative records, then variables in those administrative records can be meaningfully associated with their originating units, thereby generating useful proxy variables. Data files from surveys, both from those linked to administrative information and those not, are made available to researchers and policy analysts. In standard practice, values of personally identifying information, such as names, fine-level geographic information including addresses, birthdates, and identification numbers, are suppressed. A data file containing a rich set of variables for analysis, however, increases the chance that someone could identify a unique individual from the survey in the population based on the values for several variables. The concern is that such an identification violates legal promises of confidentiality, causes harm to individuals who view their survey responses and administrative information as sensitive, and endangers future survey operations. Chapter 2 describes three applications, traditional statistical disclosure limitation methods, and new developments. The chapter includes discussion of how researchers access data (access modalities) and the usefulness (analytic validity) of data made available after modification for enhanced disclosure limitation.
Section 2 groups together five chapters on data quality and record linkage. Chapter 3, by Piet Daas, Eric Schulte Nordholt, Martijn Tennekes, and Saskia Ossen, examines the quality of administrative data used in the Dutch virtual census. A challenge in assessing quality of a data source is having better information on some variables for at least a subset of the population. Coen Hendriks, in Chapter 4, reports on improving the quality of data going into Norwegian register-based statistics. In Chapter 5, William Winkler considers a wide range of topics, from initial cleaning of data files to record linkage and integrated modeling, editing, and imputation. The impact of cleaning data files through standardizing variables, parsing variables such as addresses into separable components, and checking for logical errors cannot be overstated. Various approaches are in use for linking records from two files on the same population. Dr. Winkler reviews several enhancements, including variations in string comparator metrics and memory indexing, that have been put into practice at the U.S. Census Bureau. Jerry Reiter writes about assessing uncertainty when using administrative records in Chapter 6. Along with survey estimates, one typically needs to provide estimates of standard error. How do the quality of administrative records and the performance of the linkage to the survey impact the accuracy of estimates? Multiple imputation (Rubin 1986, 1987) could be one area for further exploration. In Chapter 7, Joseph Sakshaug addresses the specific question of measuring and controlling non-consent bias when surveys and administrative data are linked...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.