
Data Science Essentials in Python
Description
Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work. This one-stop solution covers the essential data science you need in Python.
Data science is one of the fastest-growing disciplines in terms of academic research, student enrollment, and employment. Python, with its flexibility and scalability, is quickly overtaking the R language for data-scientific projects. Keep Python data-science concepts at your fingertips with this modular, quick reference to the tools used to acquire, clean, analyze, and store data.
This one-stop solution covers essential Python, databases, network analysis, natural language processing, elements of machine learning, and visualization. Access structured and unstructured text and numeric data from local files, databases, and the Internet. Arrange, rearrange, and clean the data. Work with relational and non-relational databases, data visualization, and simple predictive analysis (regressions, clustering, and decision trees). See how typical data analysis problems are handled. And try your hand at your own solutions to a variety of medium-scale projects that are fun to work on and look good on your resume.
Keep this handy quick guide at your side whether you're a student, an entry-level data science professional converting from R to Python, or a seasoned Python developer who doesn't want to memorize every function and option.
What You Need:
You need a decent distribution of Python 3.3 or above that includes at least NLTK, Pandas, NumPy, Matplotlib, Networkx, SciKit-Learn, and BeautifulSoup. A great distribution that meets the requirements is Anaconda, available for free from www.continuum.io. If you plan to set up your own database servers, you also need MySQL (www.mysql.com) and MongoDB (www.mongodb.com). Both packages are free and run on Windows, Linux, and Mac OS.
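One quick way to confirm that a Python installation includes the packages listed above is to ask the interpreter whether each one can be located. The following is a minimal sketch, not part of the book: the import names (for example `sklearn` for SciKit-Learn and `bs4` for BeautifulSoup) and the helper `missing_packages` are assumptions introduced here for illustration, and `importlib.util.find_spec` requires Python 3.4 or later.

```python
from importlib.util import find_spec

# Assumed mapping from the package names in the requirements text
# to the module names used in an import statement.
REQUIRED = {
    "NLTK": "nltk",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "Matplotlib": "matplotlib",
    "Networkx": "networkx",
    "SciKit-Learn": "sklearn",
    "BeautifulSoup": "bs4",
}


def missing_packages(required):
    """Return the display names whose module cannot be located (hypothetical helper)."""
    return [name for name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only locates each module without importing it, so it runs quickly and does not fail if a package is installed but broken; it says nothing about package versions or about the MySQL and MongoDB servers.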

Person
Dmitry Zinoviev has an MS in Physics from Moscow State University and a PhD in Computer Science from Stony Brook University. His research interests include computer simulation and modeling, network science, social network analysis, and digital humanities. He has been teaching at Suffolk University in Boston, MA since 2001.
Content
- Cover
- Table of Contents
- Acknowledgments
- Preface
- About This Book
- About the Audience
- About the Software
- Notes on Quotes
- The Book Forum
- Your Turn
- 1. What Is Data Science?
- Unit 1. Data Analysis Sequence
- Unit 2. Data Acquisition Pipeline
- Unit 3. Report Structure
- Your Turn
- 2. Core Python for Data Science
- Unit 4. Understanding Basic String Functions
- Unit 5. Choosing the Right Data Structure
- Unit 6. Comprehending Lists Through List Comprehension
- Unit 7. Counting with Counters
- Unit 8. Working with Files
- Unit 9. Reaching the Web
- Unit 10. Pattern Matching with Regular Expressions
- Unit 11. Globbing File Names and Other Strings
- Unit 12. Pickling and Unpickling Data
- Your Turn
- 3. Working with Text Data
- Unit 13. Processing HTML Files
- Unit 14. Handling CSV Files
- Unit 15. Reading JSON Files
- Unit 16. Processing Texts in Natural Languages
- Your Turn
- 4. Working with Databases
- Unit 17. Setting Up a MySQL Database
- Unit 18. Using a MySQL Database: Command Line
- Unit 19. Using a MySQL Database: pymysql
- Unit 20. Taming Document Stores: MongoDB
- Your Turn
- 5. Working with Tabular Numeric Data
- Unit 21. Creating Arrays
- Unit 22. Transposing and Reshaping
- Unit 23. Indexing and Slicing
- Unit 24. Broadcasting
- Unit 25. Demystifying Universal Functions
- Unit 26. Understanding Conditional Functions
- Unit 27. Aggregating and Ordering Arrays
- Unit 28. Treating Arrays as Sets
- Unit 29. Saving and Reading Arrays
- Unit 30. Generating a Synthetic Sine Wave
- Your Turn
- 6. Working with Data Series and Frames
- Unit 31. Getting Used to Pandas Data Structures
- Unit 32. Reshaping Data
- Unit 33. Handling Missing Data
- Unit 34. Combining Data
- Unit 35. Ordering and Describing Data
- Unit 36. Transforming Data
- Unit 37. Taming Pandas File I/O
- Your Turn
- 7. Working with Network Data
- Unit 38. Dissecting Graphs
- Unit 39. Network Analysis Sequence
- Unit 40. Harnessing Networkx
- Your Turn
- 8. Plotting
- Unit 41. Basic Plotting with PyPlot
- Unit 42. Getting to Know Other Plot Types
- Unit 43. Mastering Embellishments
- Unit 44. Plotting with Pandas
- Your Turn
- 9. Probability and Statistics
- Unit 45. Reviewing Probability Distributions
- Unit 46. Recollecting Statistical Measures
- Unit 47. Doing Stats the Python Way
- Your Turn
- 10. Machine Learning
- Unit 48. Designing a Predictive Experiment
- Unit 49. Fitting a Linear Regression
- Unit 50. Grouping Data with K-Means Clustering
- Unit 51. Surviving in Random Decision Forests
- Your Turn
- A1. Further Reading
- A2. Solutions to Single-Star Projects
- Bibliography
- Index
- - SYMBOLS -
- - A -
- - B -
- - C -
- - D -
- - E -
- - F -
- - G -
- - H -
- - I -
- - J -
- - K -
- - L -
- - M -
- - N -
- - O -
- - P -
- - Q -
- - R -
- - S -
- - T -
- - U -
- - V -
- - W -
- - X -
- - Y -
- - Z -
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (not Kindle).
The ePUB file format works well for novels and non-fiction books – i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small display.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our eBook Help page.