
Preparing Data for Analysis with JMP
Description
Access and clean up data easily using JMP®!
Data acquisition and preparation commonly consume about 75% of the time and effort in a data analysis project. JMP provides many visual, intuitive, and innovative data-preparation capabilities that help you make the most of your organization's data.
Preparing Data for Analysis with JMP® is organized within a framework of statistical investigations and model-building and illustrates the new data-handling features in JMP, such as the Query Builder. It is useful to students and programmers with little or no JMP experience, and to those who want to learn the new data-management features and techniques, and it takes a practical, example-rich approach to getting started. Using step-by-step demonstrations and screenshots, the book walks you through the most commonly used data-management techniques and offers many tips on avoiding common problems.
With this book, you will learn how to:
- Manage database operations using the JMP Query Builder
- Get data into JMP from other formats, such as Excel, CSV, SAS, HTML, JSON, and the web
- Identify and avoid problems with the help of JMP's visual and automated data-exploration tools
- Consolidate data from multiple sources with Query Builder for tables
- Deal with common issues and repairs, including the following tasks:
  - reshaping tables (stack/unstack)
  - managing missing data with techniques such as imputation and Principal Components Analysis
  - cleaning and correcting dirty data
  - computing new variables
  - transforming variables for modeling
  - reconciling time and date
- Subset and filter your data
- Save data tables for exchange with other platforms
Content
- Intro
- Contents
- About This Book
- What Does This Book Cover?
- Is This Book for You?
- What Are the Prerequisites for This Book?
- What Should You Know about the Examples?
- Software Used to Develop the Book's Content
- Example Data
- Output and Graphics
- Where Are the Exercise Solutions?
- Acknowledgments
- We Want to Hear from You
- About The Author
- Data Management in the Analytics Process
- Introduction
- A Continuous Process
- Figure 1.1: One Model of the Analytics Life Cycle (SAS)
- Asking Questions That Data Can Help to Answer
- Sourcing Relevant Data
- Reproducibility
- Combining and Reconciling Multiple Sources
- Identifying and Addressing Data Issues
- Data Requirements Shaped by Modeling Strategies
- Plan of the Book
- Conclusion
- References
- Data Management Foundations
- Introduction
- Matching Form to Function
- Figure 2.1: Directory of Members of the U.S. House of Representatives
- JMP Data Tables
- Figure 2.2: The Column Info Window
- Data Types and Modeling Types
- Data Types
- Modeling Types
- Table 2.1: JMP Modeling Types
- Figure 2.3: Modeling Type Icons
- Basics of Relational Databases
- Conclusion
- References
- Sources of Data and Their Challenges
- Introduction
- Internal Data in Flat Files
- Relational Databases
- External Data on the World Wide Web
- User-Facing Query Interfaces
- Figure 3.1: Access Options
- Figure 3.2: The World DataBank Query Page
- Figure 3.3: Detail of a Series Selection
- Tabular Data Pages
- Evolving WWW Data Standards
- Ethical and Legal Considerations
- Conclusion
- References
- Single Files
- Introduction
- Review of JMP File Types
- Figure 4.1: File Formats Readable in JMP
- Common Formats Other than JMP
- MS Excel
- Importing an Excel File
- Figure 4.2: 2013 Revision of the World Population Policies Database Extract
- Figure 4.3: The Excel Import Wizard
- Figure 4.4: Individual Worksheet Settings Updated
- Figure 4.5: Identifying the Final Data Row for Import
- Figure 4.6: The JSL Script to Import the UN Data
- Using the Excel Add-In (Windows Only)
- Figure 4.7: The JMP Tab in Excel
- Figure 4.8: The JMP Preferences Dialog
- Figure 4.9: A Data Table Created from Excel
- Text Files
- Setting File Import Preferences
- Figure 4.10: Setting Preferences for Text Data Files
- Using the Text Import Wizard
- Figure 4.11: World Development Indicators Download Site
- Figure 4.12: Preparing to Use the Text Import Wizard
- Figure 4.13: The Preview Window for CSV File Import
- Figure 4.14: Inspect and Edit Column Names and Types Before Import
- Figure 4.15: The Source Script
- SAS Files
- Files Stored Locally
- Files Stored on a SAS Metadata Server
- Figure 4.16: Establishing a Connection to a SAS Metadata Server
- Figure 4.17: Importing Data Using the Browse SAS Folders Platform
- Figure 4.18: Metadata about a SAS File Imported from a Server
- Other Data File Formats
- Conclusion
- References
- Database Queries
- Introduction
- Sample Databases in This Chapter
- Connecting to a Database
- Figure 5.1: Initiate Database Connection from File Menu
- Figure 5.2: Database Operations for One Table
- Figure 5.3: Select Data Source
- Figure 5.4: Supplying Credentials to Complete the Connection
- Figure 5.5: Database Open Table Dialog Showing a Connection
- Figure 5.6: Machine Data Sources
- Figure 5.7: Entering Credentials
- Extracting Data from One Table in a Database
- Import an Entire Table
- Figure 5.8: Selecting a Table to Open
- Figure 5.9: Data Table Imported from a Database
- Import a Subset of a Table
- Figure 5.10: Starting to Build SQL Code
- Figure 5.11: Constructing a WHERE Clause
- Figure 5.12: Building the Formula in the First WHERE Clause
- Figure 5.13: The Completed Formula
- Querying a Database from JMP
- Query Builder
- Figure 5.14: Query Builder Icon on Toolbar
- Figure 5.15: Query Builder Launch from JMP Starter
- Revisiting the Olympic Medals Query
- Figure 5.16: First Step in Query Builder
- Figure 5.17: Specifying Query Conditions
- An Illustrative Scenario: Bicycle Parts
- Table 5.1: Components Data to Import from the Adventure Works Database
- Designing a Query with Query Builder
- Figure 5.18: Selecting Tables for the Query
- Figure 5.19: List of Tables to Join
- Figure 5.20: Edit Join Dialog
- Figure 5.21: A Potential Join in Need of Adjustment
- Figure 5.22: Editing a Join Condition
- Figure 5.23: Main Query Builder Dialog
- Figure 5.24: Included Columns Panel with Customized Column Attributes
- Figure 5.25: Filter and Order by Conditions
- Figure 5.26: The Final SQL Code
- Query Builder for SAS Server Data
- Figure 5.27: Selecting Tables for Query
- Figure 5.28: Selecting Columns and Rows for the Query
- Figure 5.29: Drawing a Random Sample of Rows from a Query Result Set
- Conclusion
- References
- Importing Data from Websites
- Introduction
- Variety of Web Formats
- Internet Open
- Figure 6.1: IOC Country Codes
- Figure 6.2: Internet Open
- Figure 6.3: Selecting Web Page Tables to Import
- Figure 6.4: The First Imported Table
- Figure 6.5: National Flags Unfurled
- Common Issues to Anticipate
- Figure 6.6: MLB Home Run Distance Leaderboard for the 2016 Season
- Figure 6.7: Internet Open Alert
- Figure 6.8: The Imported MLB Leaderboard Data
- Conclusion
- References
- Reshaping a Data Table
- Introduction
- What Shape Is a Data Table?
- Wide versus Long Format
- Table 7.1: Wide Array of Artificial Experimental Data
- Table 7.2: Long Arrangement of the Same Data
- Reasons for Wide and Long Formats
- Stacking Wide Data
- Figure 7.1: Example Data Table in Wide Format
- Figure 7.2: The Stack Dialog
- Figure 7.3: The Experimental Data in Narrow, or Stacked, Format
- Unstacking Narrow Data
- Figure 7.4: The Split Dialog to Reshape a Narrow Table
- Figure 7.5: The Experimental Data after Split to Wide Format
- Additional Examples
- Stacking Wide Data
- Figure 7.6: Burtin's Antibiotics Data
- Scripting for Reproducibility
- Figure 7.7: Table Variables as Inputs to a Script
- Figure 7.8: JSL Code to Stack the Antibiotics MIC Data
- Figure 7.9: Antibiotics Data in Stacked (Long) Format
- Splitting Long Data
- Figure 7.10: Smartphone OS Data Table
- Figure 7.11: Column Properties
- Figure 7.12: The Split Dialog
- Figure 7.13: The Smartphone OS Data in Wide Format
- Transposing Rows and Columns
- Figure 7.14: Transpose Dialog
- Figure 7.15: Transposed Operating System Market Shares
- Reshaping the WDI Data
- Figure 7.16: Specifications for the Stack Command
- Figure 7.17: The Split Dialog
- Figure 7.18: The WDI Data Reshaped
- Conclusion
- References
- Joining, Subsetting, and Filtering
- Introduction
- Combining Data from Multiple Tables with Join
- Figure 8.1: Columns in Movies.jmp and Ratings.jmp
- Figure 8.2: The Join Dialog
- Figure 8.3: Choosing the Columns to Include in the Joined Table
- Figure 8.4: The Results of the Join Operation
- Saving Memory with a Virtual Join
- Figure 8.5: Two Virtually Joined Tables
- Why and How to Select a Subset
- A Brief Detour: Creating a New Column from an Existing Column
- Figure 8.6: Starting to Create a New Column
- Figure 8.7: Character Functions in the Formula Editor
- Figure 8.8: Substringing the Title Column
- Figure 8.9: Defining the Starting Position of the Substring
- Row Filters: Global and Local
- Global Filter
- Figure 8.10: Selecting Criteria for the Global Data Filter
- Figure 8.11: Graph Builder Exploration of Filtered Data
- Local Filter
- Figure 8.12: The Local Data Filter
- A More Durable Subset
- Figure 8.13: The Subset Dialog
- Combining Rows with Concatenate
- Figure 8.15: Comparing Columns in the Two Insurance Example Tables
- Figure 8.16: Concatenating the Two Insurance Client Tables
- Query Builder for Tables
- Back to the Movies
- Figure 8.18: Filtering to Find Release Years
- Olympic Medals and Development Indicators
- More Wrangling before the Query
- Figure 8.19: Aggregating Data using Table Summary
- Additional Complications
- Selecting Tables and Key Columns for the Query
- Figure 8.20: Query Builder after Identifying Primary and Secondary Tables
- Figure 8.21: Establishing Join Columns
- Figure 8.22: Choosing an Inner Join
- Building the Query: Column Selection
- Figure 8.23: Starting to Select Columns for the Olympics Query
- The Query Table
- Figure 8.24: The Query Result
- Conclusion
- References
- Data Exploration: Visual and Automated Tools to Detect Problems
- Introduction
- Common Issues to Anticipate
- On the Hunt for Dirty Data
- Distribution
- Figure 9.1: Distributions of Continuous Columns
- Columns Viewer
- Figure 9.2: Columns Viewer Summary
- Figure 9.3: Columns Viewer Summary for Categorical Columns
- Figure 9.4: Distribution of Categorical Columns (Detail)
- Multivariate (Correlations and Scatterplot Matrix)
- Figure 9.5: Scatterplot Matrix
- More Tools within the Multivariate Platform
- Figure 9.6: Multivariate Platform Menu
- Principal Components
- Outlier Analysis
- Item Reliability
- Explore Outliers
- Figure 9.7: Explore Outliers Dialog
- Quantile Range Outliers
- Figure 9.8: Quantile Range Outliers Report
- Robust Fit Outliers
- Figure 9.9: Robust Fit Outliers
- Multivariate Robust Outliers
- Figure 9.10: Multivariate Robust Outlier Report of Mahalanobis Distances
- Multivariate k-Nearest Neighbors Outliers
- Figure 9.11: K Nearest Neighbors Report
- Explore Missing
- Figure 9.12: Missing Value Report on Wealth Measures
- Figure 9.13: Missing Value Clustering
- Conclusion
- References
- Missing Data Strategies
- Introduction
- Much Ado about Nothing?
- Four Basic Approaches
- Working with Complete Cases
- Analysis with Sampling Weights
- Figure 10.1: Sampling Weights and Other Background Variables
- Figure 10.2: Linear Regression with and without Sampling Weights
- Imputation-based Methods
- Recode
- Figure 10.3: Recoding Missing Values as a Constant
- Informative Missing
- Figure 10.4: Initial Quadratic Model
- Figure 10.5: Tag a Column Informative Missing
- Figure 10.6: Model Results Using Informative Missing
- Multivariate Normal Imputation
- Figure 10.7: Available Estimation Methods
- Figure 10.8: Observed and Imputed GDP per Capita in Afghanistan
- Multivariate SVD Imputation
- Figure 10.9: Settings for SVD Imputation
- Figure 10.10: JMP Alert Regarding Imputation Results
- Figure 10.11: GDP Per Capita, Afghanistan, with Observed and Imputed Values
- Special Considerations for Time Series
- Figure 10.12: GDP Growth in Bhutan
- Figure 10.13: Log-Linear Imputation for Smooth Time Series
- Conclusion and a Note of Caution
- References
- Data Preparation for Analysis
- Introduction
- Common Issues and Appropriate Strategies
- Table 11.1: Common Issues Addressed by Transformation
- Distribution of Observations
- Noisy Data
- Figure 11.1: Initial "Noisy" Distribution
- Figure 11.2: Save Options for the Distribution Report
- Figure 11.3: Comparing Raw Data and Discretized Values
- Skewness or Outliers
- Figure 11.4: Distribution of Population
- Figure 11.5: Distribution of the Log of Population
- Figure 11.6: Available Transformation Functions
- Scale Differences among Model Variables
- Figure 11.7: Effect of Standardizing Columns
- Too Many Levels of a Categorical Variable
- Figure 11.8: Three Categorical Variables with Many Levels
- Figure 11.9: Grouping a Few Categorical Levels
- Figure 11.10: Recode Options for a Categorical Column
- Figure 11.11: Automatic Suggested Groupings of the Lead Studio Names
- High Dimensionality: Abundance of Columns
- Correlated or Redundant Variables
- Missing or Sparse Observations across Columns
- A PCA Example
- Figure 11.12: Scatterplot Matrix from the Multivariate Report
- Figure 11.13: Default PCA Report
- Figure 11.14: Eigenvalues Report for the Olympics Data Table
- Figure 11.15: Eigenvector Coefficients for the Olympics Data Table
- Figure 11.16: Formatted Loading Matrix
- Figure 11.17: Comparing Principal Components with and without Imputation
- Abundance of Rows
- Partitioning into Training, Validation, and Test Sets
- Figure 11.18: Defining a Column to Split a Data Table into Training and Test Sets
- Figure 11.19: Fit Model Platform including a Validation Column
- Figure 11.20: The Crossvalidation Report
- Aggregating Rows with Summary Tables
- Figure 11.21: Summary Launch Window
- Oversampling Rare Events
- Figure 11.22: Subset Tables Launch Window
- Date and Time-Related Issues
- Formatting Dates and Times
- Figure 11.23: Date and Time Format Options
- Some Date Functions: Extracting Parts
- Figure 11.24: Creating a New Variable from a Date
- Aggregation
- Row Functions Especially Useful in Time-Ordered Data
- Elapsed Time and Date Arithmetic
- Conclusion
- References
- Exporting Work to Other Platforms
- Introduction
- Why Export or Exchange Data?
- Fit the Method to the Purpose
- Save As
- Figure 12.1: File Format Options in Save As
- Figure 12.2: File Save As Alert Message
- Export to a Database
- Figure 12.3: Saving JMP Data Tables to a Database
- Export to a SAS Library
- Figure 12.4: Exporting Data to SAS
- Exporting Reports
- Figure 12.5: First Seasonal Flu Bubble Plot
- Interactive Graphics
- Figure 12.6: Controls in the Interactive HTML File Display
- Figure 12.7: Interactive Bubble Plot as Flash File
- Static Images: Graphics Formats, PowerPoint, and Word
- Figure 12.8: Save As Options for JMP Output
- Conclusion
- References
- Index
- A
- B
- C
- D
- E
- F
- G
- H
- I
- J
- K
- L
- M
- N
- O
- P
- Q
- R
- S
- T
- U
- V
- W
- Y
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks adjust automatically to fit the small display.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorize any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.