
Mastering Python for Data Science
Description
- Create data visualizations and mine for patterns
- Advanced techniques for the four fundamentals of Data Science with Python - data mining, data analysis, data visualization, and machine learning
Book Description
Data science is a relatively new knowledge domain that organizations use to make data-driven decisions. Data scientists have to wear various hats to work with data and to derive value from it. The Python programming language, beyond having conquered the scientific community in the last decade, is now an indispensable tool for the data science practitioner and a must-know tool for every aspiring data scientist. Using Python will offer you a fast, reliable, cross-platform, and mature environment for data analysis, machine learning, and algorithmic problem solving. This comprehensive guide helps you move beyond the hype and transcend the theory by providing you with a hands-on, advanced study of data science. Beginning with the essentials of Python in data science, you will learn to manage data and perform linear algebra in Python. You will move on to deriving inferences from the analysis by performing inferential statistics, and mining data to reveal hidden patterns and trends. You will use the Matplotlib library to create high-end visualizations in Python and uncover the fundamentals of machine learning. Next, you will apply the linear regression and logistic regression techniques to your applications, before creating recommendation engines with various collaborative filtering algorithms and improving your predictions by applying ensemble methods. Finally, you will perform k-means clustering, analyze unstructured data with different text mining techniques, and leverage the power of Python in big data analytics.
What you will learn
- Manage data and perform linear algebra in Python
- Derive inferences from the analysis by performing inferential statistics
- Solve data science problems in Python
- Create high-end visualizations using Python
- Evaluate and apply the linear regression technique to estimate the relationships among variables
- Build recommendation engines with the various collaborative filtering algorithms
- Apply the ensemble methods to improve your predictions
- Work with big data technologies to handle data at scale
Who this book is for
If you are a Python developer who wants to master the world of data science, then this book is for you. Some knowledge of data science is assumed.

Person
Samir Madhavan has been working in the field of data science since 2010. He is an industry expert on machine learning and big data. He has also reviewed R Machine Learning Essentials by Packt Publishing. He was part of the ubiquitous Aadhar project of the Unique Identification Authority of India, which is in the process of helping every Indian get a unique number that is similar to a social security number in the United States. He was also the first employee of Flutura Decision Sciences and Analytics and is a part of the core team that has helped scale the number of employees in the company to 50. His company is now recognized as one of the most promising Internet of Things-Decision Sciences companies in the world.
Content
- Cover
- Copyright
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Chapter 1: Getting Started with Raw Data
- The world of arrays with NumPy
- Creating an array
- Mathematical operations
- Array subtraction
- Squaring an array
- A trigonometric function performed on the array
- Conditional operations
- Matrix multiplication
- Indexing and slicing
- Shape manipulation
- Empowering data analysis with pandas
- The data structure of pandas
- Series
- DataFrame
- Panel
- Inserting and exporting data
- CSV
- XLS
- JSON
- Database
- Data cleansing
- Checking the missing data
- Filling the missing data
- String operations
- Merging data
- Data operations
- Aggregation operations
- Joins
- The inner join
- The left outer join
- The full outer join
- The groupby function
- Summary
- Chapter 2: Inferential Statistics
- Various forms of distribution
- A normal distribution
- A normal distribution from a binomial distribution
- A Poisson distribution
- A Bernoulli distribution
- A z-score
- A p-value
- One-tailed and two-tailed tests
- Type 1 and Type 2 errors
- A confidence interval
- Correlation
- Z-test vs T-test
- The F distribution
- The chi-square distribution
- Chi-square for the goodness of fit
- The chi-square test of independence
- ANOVA
- Summary
- Chapter 3: Finding a Needle in a Haystack
- What is data mining?
- Presenting an analysis
- Studying the Titanic
- Which passenger class has the maximum number of survivors?
- What is the distribution of survivors based on gender among the various classes?
- What is the distribution of nonsurvivors among the various classes who have family aboard the ship?
- What was the survival percentage among different age groups?
- Summary
- Chapter 4: Making Sense of Data through Advanced Visualization
- Controlling the line properties of a chart
- Using keyword arguments
- Using the setter methods
- Using the setp() command
- Creating multiple plots
- Playing with text
- Styling your plots
- Box plots
- Heatmaps
- Scatter plots with histograms
- A scatter plot matrix
- Area plots
- Bubble charts
- Hexagon bin plots
- Trellis plots
- A 3D plot of a surface
- Summary
- Chapter 5: Uncovering Machine Learning
- Different types of machine learning
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Decision trees
- Linear regression
- Logistic regression
- The naive Bayes classifier
- The k-means clustering
- Hierarchical clustering
- Summary
- Chapter 6: Performing Predictions with a Linear Regression
- Simple linear regression
- Multiple regression
- Training and testing a model
- Summary
- Chapter 7: Estimating the Likelihood of Events
- Logistic regression
- Data preparation
- Creating training and testing sets
- Building a model
- Model evaluation
- Evaluating a model based on test data
- Model building and evaluation with SciKit
- Summary
- Chapter 8: Generating Recommendations with Collaborative Filtering
- Recommendation data
- User-based collaborative filtering
- Finding similar users
- The Euclidean distance score
- The Pearson correlation score
- Ranking the users
- Recommending items
- Item-based collaborative filtering
- Summary
- Chapter 9: Pushing Boundaries with Ensemble Models
- The census income dataset
- Exploring the census data
- Hypothesis 1: People who are older earn more
- Hypothesis 2: Income bias based on working class
- Hypothesis 3: People with more education earn more
- Hypothesis 4: Married people tend to earn more
- Hypothesis 5: There is a bias in income based on race
- Hypothesis 6: There is a bias in the income based on occupation
- Hypothesis 7: Men earn more
- Hypothesis 8: People who clock in more hours earn more
- Hypothesis 9: There is a bias in income based on the country of origin
- Decision trees
- Random forests
- Summary
- Chapter 10: Applying Segmentation with k-means Clustering
- The k-means algorithm and its working
- A simple example
- The k-means clustering with countries
- Determining the number of clusters
- Clustering the countries
- Summary
- Chapter 11: Analyzing Unstructured Data with Text Mining
- Preprocessing data
- Creating a wordcloud
- Word and sentence tokenization
- Parts of speech tagging
- Stemming and lemmatization
- Stemming
- Lemmatization
- The Stanford Named Entity Recognizer
- Performing sentiment analysis on world leaders using Twitter
- Summary
- Chapter 12: Leveraging Python in the World of Big Data
- What is Hadoop?
- The programming model
- The MapReduce architecture
- The Hadoop DFS
- Hadoop's DFS architecture
- Python MapReduce
- The basic word count
- A sentiment score for each review
- The overall sentiment score
- Deploying the MapReduce code on Hadoop
- File handling with Hadoopy
- Pig
- Python with Apache Spark
- Scoring the sentiment
- The overall sentiment
- Summary
- Index
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.
File format: PDF
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (only limited: Kindle).
The PDF file format always displays a book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers or smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.