Preparing Data for Analysis with JMP

Name: Preparing Data for Analysis with JMP
Brand: SAS Institute
Price: 29.49 EUR
Availability: Online only

Robert Carver (Author)

SAS Institute (Publisher)

Published on 1 May 2017

216 pages

E-Book

ePUB with Adobe-DRM

978-1-63526-148-6 (ISBN)

€29.49 incl. 7% VAT

E-Book Single Licence

Available for download

Content

Intro
Contents
About This Book
What Does This Book Cover?
Is This Book for You?
What Are the Prerequisites for This Book?
What Should You Know about the Examples?
Software Used to Develop the Book's Content
Example Data
Output and Graphics
Where Are the Exercise Solutions?
Acknowledgments
We Want to Hear from You
About The Author
Data Management in the Analytics Process
Introduction
A Continuous Process
Figure 1.1: One Model of the Analytics Life Cycle (SAS)
Asking Questions That Data Can Help to Answer
Sourcing Relevant Data
Reproducibility
Combining and Reconciling Multiple Sources
Identifying and Addressing Data Issues
Data Requirements Shaped by Modeling Strategies
Plan of the Book
Conclusion
References
Data Management Foundations
Introduction
Matching Form to Function
Figure 2.1: Directory of Members of the U.S. House of Representatives
JMP Data Tables
Figure 2.2: The Column Info Window
Data Types and Modeling Types
Data Types
Modeling Types
Table 2.1: JMP Modeling Types
Figure 2.3: Modeling Type Icons
Basics of Relational Databases
Conclusion
References
Sources of Data and Their Challenges
Introduction
Internal Data in Flat Files
Relational Databases
External Data on the World Wide Web
User-Facing Query Interfaces
Figure 3.1: Access Options
Figure 3.2: The World DataBank Query Page
Figure 3.3: Detail of a Series Selection
Tabular Data Pages
Evolving WWW Data Standards
Ethical and Legal Considerations
Conclusion
References
Single Files
Introduction
Review of JMP File Types
Figure 4.1: File Formats Readable in JMP
Common Formats Other than JMP
MS Excel
Importing an Excel File
Figure 4.2: 2013 Revision of the World Population Policies Database Extract
Figure 4.3: The Excel Import Wizard
Figure 4.4: Individual Worksheet Settings Updated
Figure 4.5: Identifying the Final Data Row for Import
Figure 4.6: The JSL Script to Import the UN Data
Using the Excel Add-In (Windows Only)
Figure 4.7: The JMP Tab in Excel
Figure 4.8: The JMP Preferences Dialog
Figure 4.9: A Data Table Created from Excel
Text Files
Setting File Import Preferences
Figure 4.10: Setting Preferences for Text Data Files
Using the Text Import Wizard
Figure 4.11: World Development Indicators Download Site
Figure 4.12: Preparing to Use the Text Import Wizard
Figure 4.13: The Preview Window for CSV File Import
Figure 4.14: Inspect and Edit Column Names and Types Before Import
Figure 4.15: The Source Script
SAS Files
Files Stored Locally
Files Stored on a SAS Metadata Server
Figure 4.16: Establishing a Connection to a SAS Metadata Server
Figure 4.17: Importing Data Using the Browse SAS Folders Platform
Figure 4.18: Metadata about a SAS File Imported from a Server
Other Data File Formats
Conclusion
References
Database Queries
Introduction
Sample Databases in This Chapter
Connecting to a Database
Figure 5.1: Initiate Database Connection from File Menu
Figure 5.2: Database Operations for One Table
Figure 5.3: Select Data Source
Figure 5.4: Supplying Credentials to Complete the Connection
Figure 5.5: Database Open Table Dialog Showing a Connection
Figure 5.6: Machine Data Sources
Figure 5.7: Entering Credentials
Extracting Data from One Table in a Database
Import an Entire Table
Figure 5.8: Selecting a Table to Open
Figure 5.9: Data Table Imported from a Database
Import a Subset of a Table
Figure 5.10: Starting to Build SQL Code
Figure 5.11: Constructing a WHERE Clause
Figure 5.12: Building the Formula in the First WHERE Clause
Figure 5.13: The Completed Formula
Querying a Database from JMP
Query Builder
Figure 5.14: Query Builder Icon on Toolbar
Figure 5.15: Query Builder Launch from JMP Starter
Revisiting the Olympic Medals Query
Figure 5.16: First Step in Query Builder
Figure 5.17: Specifying Query Conditions
An Illustrative Scenario: Bicycle Parts
Table 5.1: Components Data to Import from the Adventure Works Database
Designing a Query with Query Builder
Figure 5.18: Selecting Tables for the Query
Figure 5.19: List of Tables to Join
Figure 5.20: Edit Join Dialog
Figure 5.21: A Potential Join in Need of Adjustment
Figure 5.22: Editing a Join Condition
Figure 5.23: Main Query Builder Dialog
Figure 5.24: Included Columns Panel with Customized Column Attributes
Figure 5.25: Filter and Order by Conditions
Figure 5.26: The Final SQL Code
Query Builder for SAS Server Data
Figure 5.27: Selecting Tables for Query
Figure 5.28: Selecting Columns and Rows for the Query
Figure 5.29: Drawing a Random Sample of Rows from a Query Result Set
Conclusion
References
Importing Data from Websites
Introduction
Variety of Web Formats
Internet Open
Figure 6.1: IOC Country Codes
Figure 6.2: Internet Open
Figure 6.3: Selecting Web Page Tables to Import
Figure 6.4: The First Imported Table
Figure 6.5: National Flags Unfurled
Common Issues to Anticipate
Figure 6.6: MLB Home Run Distance Leaderboard for the 2016 Season
Figure 6.7: Internet Open Alert
Figure 6.8: The Imported MLB Leaderboard Data
Conclusion
References
Reshaping a Data Table
Introduction
What Shape Is a Data Table?
Wide versus Long Format
Table 7.1: Wide Array of Artificial Experimental Data
Table 7.2: Long Arrangement of the Same Data
Reasons for Wide and Long Formats
Stacking Wide Data
Figure 7.1: Example Data Table in Wide Format
Figure 7.2: The Stack Dialog
Figure 7.3: The Experimental Data in Narrow, or Stacked, Format
Unstacking Narrow Data
Figure 7.4: The Split Dialog to Reshape a Narrow Table
Figure 7.5: The Experimental Data after Split to Wide Format
Additional Examples
Stacking Wide Data
Figure 7.6: Burtin's Antibiotics Data
Scripting for Reproducibility
Figure 7.7: Table Variables as Inputs to a Script
Figure 7.8: JSL Code to Stack the Antibiotics MIC Data
Figure 7.9: Antibiotics Data in Stacked (Long) Format
Splitting Long Data
Figure 7.10: Smartphone OS Data Table
Figure 7.11: Column Properties
Figure 7.12: The Split Dialog
Figure 7.13: The Smartphone OS Data in Wide Format
Transposing Rows and Columns
Figure 7.14: Transpose Dialog
Figure 7.15: Transposed Operating System Market Shares
Reshaping the WDI Data
Figure 7.16: Specifications for the Stack Command
Figure 7.17: The Split Dialog
Figure 7.18: The WDI Data Reshaped
Conclusion
References
Joining, Subsetting, and Filtering
Introduction
Combining Data from Multiple Tables with Join
Figure 8.1: Columns in Movies.jmp and Ratings.jmp
Figure 8.2: The Join Dialog
Figure 8.3: Choosing the Columns to Include in the Joined Table
Figure 8.4: The Results of the Join Operation
Saving Memory with a Virtual Join
Figure 8.5: Two Virtually Joined Tables
Why and How to Select a Subset
A Brief Detour: Creating a New Column from an Existing Column
Figure 8.6: Starting to Create a New Column
Figure 8.7: Character Functions in the Formula Editor
Figure 8.8: Substringing the Title Column
Figure 8.9: Defining the Starting Position of the Substring
Row Filters: Global and Local
Global Filter
Figure 8.10: Selecting Criteria for the Global Data Filter
Figure 8.11: Graph Builder Exploration of Filtered Data
Local Filter
Figure 8.12: The Local Data Filter
A More Durable Subset
Figure 8.13: The Subset Dialog
Combining Rows with Concatenate
Figure 8.15: Comparing Columns in the Two Insurance Example Tables
Figure 8.16: Concatenating the Two Insurance Client Tables
Query Builder for Tables
Back to the Movies
Figure 8.18: Filtering to Find Release Years
Olympic Medals and Development Indicators
More Wrangling before the Query
Figure 8.19: Aggregating Data Using Table Summary
Additional Complications
Selecting Tables and Key Columns for the Query
Figure 8.20: Query Builder after Identifying Primary and Secondary Tables
Figure 8.21: Establishing Join Columns
Figure 8.22: Choosing an Inner Join
Building the Query: Column Selection
Figure 8.23: Starting to Select Columns for the Olympics Query
The Query Table
Figure 8.24: The Query Result
Conclusion
References
Data Exploration: Visual and Automated Tools to Detect Problems
Introduction
Common Issues to Anticipate
On the Hunt for Dirty Data
Distribution
Figure 9.1: Distributions of Continuous Columns
Columns Viewer
Figure 9.2: Columns Viewer Summary
Figure 9.3: Columns Viewer Summary for Categorical Columns
Figure 9.4: Distribution of Categorical Columns (Detail)
Multivariate (Correlations and Scatterplot Matrix)
Figure 9.5: Scatterplot Matrix
More Tools within the Multivariate Platform
Figure 9.6: Multivariate Platform Menu
Principal Components
Outlier Analysis
Item Reliability
Explore Outliers
Figure 9.7: Explore Outliers Dialog
Quantile Range Outliers
Figure 9.8: Quantile Range Outliers Report
Robust Fit Outliers
Figure 9.9: Robust Fit Outliers
Multivariate Robust Outliers
Figure 9.10: Multivariate Robust Outlier Report of Mahalanobis Distances
Multivariate k-Nearest Neighbors Outliers
Figure 9.11: K Nearest Neighbors Report
Explore Missing
Figure 9.12: Missing Value Report on Wealth Measures
Figure 9.13: Missing Value Clustering
Conclusion
References
Missing Data Strategies
Introduction
Much Ado about Nothing?
Four Basic Approaches
Working with Complete Cases
Analysis with Sampling Weights
Figure 10.1: Sampling Weights and Other Background Variables
Figure 10.2: Linear Regression with and without Sampling Weights
Imputation-based Methods
Recode
Figure 10.3: Recoding Missing Values as a Constant
Informative Missing
Figure 10.4: Initial Quadratic Model
Figure 10.5: Tag a Column Informative Missing
Figure 10.6: Model Results Using Informative Missing
Multivariate Normal Imputation
Figure 10.7: Available Estimation Methods
Figure 10.8: Observed and Imputed GDP per Capita in Afghanistan
Multivariate SVD Imputation
Figure 10.9: Settings for SVD Imputation
Figure 10.10: JMP Alert Regarding Imputation Results
Figure 10.11: GDP Per Capita, Afghanistan, with Observed and Imputed Values
Special Considerations for Time Series
Figure 10.12: GDP Growth in Bhutan
Figure 10.13: Log-Linear Imputation for Smooth Time Series
Conclusion and a Note of Caution
References
Data Preparation for Analysis
Introduction
Common Issues and Appropriate Strategies
Table 11.1: Common Issues Addressed by Transformation
Distribution of Observations
Noisy Data
Figure 11.1: Initial "Noisy" Distribution
Figure 11.2: Save Options for the Distribution Report
Figure 11.3: Comparing Raw Data and Discretized Values
Skewness or Outliers
Figure 11.4: Distribution of Population
Figure 11.5: Distribution of the Log of Population
Figure 11.6: Available Transformation Functions
Scale Differences among Model Variables
Figure 11.7: Effect of Standardizing Columns
Too Many Levels of a Categorical Variable
Figure 11.8: Three Categorical Variables with Many Levels
Figure 11.9: Grouping a Few Categorical Levels
Figure 11.10: Recode Options for a Categorical Column
Figure 11.11: Automatic Suggested Groupings of the Lead Studio Names
High Dimensionality: Abundance of Columns
Correlated or Redundant Variables
Missing or Sparse Observations across Columns
A PCA Example
Figure 11.12: Scatterplot Matrix from the Multivariate Report
Figure 11.13: Default PCA Report
Figure 11.14: Eigenvalues Report for the Olympics Data Table
Figure 11.15: Eigenvector Coefficients for the Olympics Data Table
Figure 11.16: Formatted Loading Matrix
Figure 11.17: Comparing Principal Components with and without Imputation
Abundance of Rows
Partitioning into Training, Validation, and Test Sets
Figure 11.18: Defining a Column to Split a Data Table into Training and Test Sets
Figure 11.19: Fit Model Platform including a Validation Column
Figure 11.20: The Crossvalidation Report
Aggregating Rows with Summary Tables
Figure 11.21: Summary Launch Window
Oversampling Rare Events
Figure 11.22: Subset Tables Launch Window
Date and Time-Related Issues
Formatting Dates and Times
Figure 11.23: Date and Time Format Options
Some Date Functions: Extracting Parts
Figure 11.24: Creating a New Variable from a Date
Aggregation
Row Functions Especially Useful in Time-Ordered Data
Elapsed Time and Date Arithmetic
Conclusion
References
Exporting Work to Other Platforms
Introduction
Why Export or Exchange Data?
Fit the Method to the Purpose
Save As
Figure 12.1: File Format Options in Save As
Figure 12.2: File Save As Alert Message
Export to a Database
Figure 12.3: Saving JMP Data Tables to a Database
Export to a SAS Library
Figure 12.4: Exporting Data to SAS
Exporting Reports
Figure 12.5: First Seasonal Flu Bubble Plot
Interactive Graphics
Figure 12.6: Controls in the Interactive HTML File Display
Figure 12.7: Interactive Bubble Plot as Flash File
Static Images: Graphics Formats, PowerPoint, and Word
Figure 12.8: Save As Options for JMP Output
Conclusion
References
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
Y