
Social Media Data Mining and Analytics
Description
Social media is the biggest source of Big Data. Because of this, 90% of Fortune 500 companies are investing in Big Data initiatives that will help them predict consumer behavior to produce better sales results. Social Media Data Mining and Analytics shows analysts how to use sophisticated techniques to mine social media data, obtaining the information they need to generate amazing results for their businesses.
Social Media Data Mining and Analytics isn't just another book on the business case for social media. Rather, this book provides hands-on examples for applying state-of-the-art tools and technologies to mine social media. The data sources covered include Twitter, Wikipedia, Stack Exchange, LiveJournal, movie reviews, and other rich datasets. In it, you will learn:
* The four key characteristics of online services: users, social networks, actions, and content
* The full data discovery lifecycle: data extraction, storage, analysis, and visualization
* How to work with code and extract data to create solutions
* How to use Big Data to make accurate customer predictions
* How to personalize the social media experience using machine learning
The techniques the authors detail give organizations the competitive advantage they need to harness the rich data available from social media platforms.

Authors
GUNGOR POLATKAN, PHD, is a Tech Lead/Engineering Manager designing and implementing end-to-end machine learning and artificial intelligence offline/online pipelines for the LinkedIn Learning relevance backend. He was previously a machine learning scientist at Twitter, where he worked on topics such as ad targeting and user modeling.
P. OSCAR BOYKIN, PHD, is a software engineer at Stripe where he works on machine learning infrastructure. He was previously a Senior Staff Engineer at Twitter, where he worked on data infrastructure problems. He is coauthor of the Scala big-data libraries Algebird, Scalding and Summingbird.
ANTONIOS CHALKIOPOULOS, MSC, is a Distributed Systems Specialist. A system engineer who has delivered fast/big data projects in media, betting, and finance, he is now leading the effort on the Lenses platform for data streaming as a co-founder and CEO at https://lenses.stream.
Contents
2 - Title Page
3 - Copyright
4 - Contents
5 - Introduction
5.1 - Human Interactions Measured
5.1.1 - Online Behavior Through Data Collection
5.1.2 - What Types of Data Are Essential to Collect?
5.2 - Asking and Answering Questions with Data
5.3 - The Datasets Used in This Book
5.3.1 - Wikipedia
5.3.2 - Twitter
5.3.3 - Stack Exchange
5.3.4 - LiveJournal
5.3.5 - Scientific Documents from Cora
5.3.6 - Amazon Fine Food Reviews
5.3.7 - MovieLens Movie Ratings
5.4 - The Languages and Frameworks Used in This Book
5.4.1 - Python
5.4.2 - Scalding
5.5 - System Requirements to Run the Examples
5.6 - Overview of the Chapters
5.7 - Online Repository for the Book
6 - Chapter 1 Users: The Who of Social Media
6.1 - Measuring Variations in User Behavior in Wikipedia
6.1.1 - The Diversity of User Activities
6.1.1.1 - The Origin of the User Activity Distribution
6.1.1.2 - The Consequences of the Power Law
6.1.2 - The Long Tail in Human Activities
6.2 - Long Tails Everywhere: The 80/20 Rule (p/q Rule)
6.3 - Online Behavior on Twitter
6.3.1 - Retrieving Tweets for Users
6.3.2 - Logarithmic Binning
6.3.3 - User Activities on Twitter
6.4 - Summary
7 - Chapter 2 Networks: The How of Social Media
7.1 - Types and Properties of Social Networks
7.1.1 - When Users Create the Connections: Explicit Networks
7.1.2 - Directed Versus Undirected Graphs
7.1.3 - Node and Edge Properties
7.1.4 - Weighted Graphs
7.1.5 - Creating Graphs from Activities: Implicit Networks
7.2 - Visualizing Networks
7.3 - Degrees: The Winner Takes All
7.3.1 - Counting the Number of Connections
7.3.2 - The Long Tail in User Connections
7.3.3 - Beyond the Idealized Network Model
7.4 - Capturing Correlations: Triangles, Clustering, and Assortativity
7.4.1 - Local Triangles and Clustering
7.4.2 - Assortativity
7.5 - Summary
8 - Chapter 3 Temporal Processes: The When of Social Media
8.1 - What Traditional Models Tell You About Events in Time
8.1.1 - When Events Happen Uniformly in Time
8.2 - Inter-Event Times
8.2.1 - Comparing to a Memoryless Process
8.2.2 - Autocorrelations
8.2.3 - Deviations from Memorylessness
8.2.4 - Periodicities in Time in User Activities
8.3 - Bursty Activities of Individuals
8.3.1 - Correlations and Bursts
8.3.1.1 - Reservoir Sampling
8.4 - Forecasting Metrics in Time
8.4.1 - Finding Trends
8.4.2 - Finding Seasonality
8.4.3 - Forecasting Time Series with ARIMA
8.4.3.1 - The Autoregressive Part ("AR")
8.4.3.2 - The Moving Average Part ("MA")
8.4.3.3 - The Full ARIMA(p, d, q) Model
8.5 - Summary
9 - Chapter 4 Content: The What of Social Media
9.1 - Defining Content: Focus on Text and Unstructured Data
9.1.1 - Creating Features from Text: The Basics of Natural Language Processing
9.1.2 - The Basic Statistics of Term Occurrences in Text
9.2 - Using Content Features to Identify Topics
9.2.1 - The Popularity of Topics
9.2.2 - How Diverse Are Individual Users' Interests?
9.3 - Extracting Low-Dimensional Information from High-Dimensional Text
9.3.1 - Topic Modeling
9.3.1.1 - Unsupervised Topic Modeling
9.3.1.2 - Supervised Topic Modeling
9.3.1.3 - Relational Topic Modeling
9.4 - Summary
10 - Chapter 5 Processing Large Datasets
10.1 - MapReduce: Structuring Parallel and Sequential Operations
10.1.1 - Counting Words
10.1.2 - Skew: The Curse of the Last Reducer
10.2 - Multi-Stage MapReduce Flows
10.2.1 - Fan-Out
10.2.2 - Merging Data Streams
10.2.3 - Joining Two Data Sources
10.2.4 - Joining Against Small Datasets
10.2.5 - Models of Large-Scale MapReduce
10.3 - Patterns in MapReduce Programming
10.3.1 - Static MapReduce Jobs
10.3.2 - Iterative MapReduce Jobs
10.3.2.1 - PageRank for Ranking in Graphs
10.3.2.2 - k-means Clustering
10.3.3 - Incremental MapReduce Jobs
10.3.4 - Temporal MapReduce Jobs
10.3.4.1 - Rollups and Data Cubing
10.3.4.2 - Expanding Rollup Jobs
10.3.5 - Challenges with Processing Long-Tailed Social Media Data
10.4 - Sampling and Approximations: Getting Results with Less Computation
10.4.1 - HyperLogLog
10.4.1.1 - HyperLogLog Example
10.4.1.2 - HyperLogLog on the Stack Exchange Dataset
10.4.1.3 - Performance of HLL on Large Datasets
10.4.2 - Bloom Filters
10.4.2.1 - A Bloom Filter Example
10.4.2.2 - Bloom Filter as Pre-Computed Membership Knowledge
10.4.2.3 - Bloom Filters on Large Social Datasets
10.4.3 - Count-Min Sketch
10.4.3.1 - Count-Min Sketch: Heavy Hitters Example
10.4.3.2 - Count-Min Sketch: Top Percentage Example
10.4.3.3 - Aggregating Approximate Data Structures
10.4.3.4 - Summary of Approximations
10.5 - Executing on a Hadoop Cluster (Amazon EC2)
10.5.1 - Installing a CDH Cluster on Amazon EC2
10.5.2 - Providing IAM Access to Collaborators
10.5.3 - Adding On-Demand Cluster Capabilities
10.6 - Summary
11 - Chapter 6 Learn, Map, and Recommend
11.1 - Social Media Services Online
11.1.1 - Search Engines
11.1.2 - Content Engagement
11.1.3 - Interactions with the Real World
11.1.4 - Interactions with People
11.2 - Problem Formulation
11.3 - Learning and Mapping
11.3.1 - Matrix Factorization
11.3.2 - Learning, Training
11.3.2.1 - Under- and Overfitting
11.3.2.2 - Regularizing in Matrix Factorization
11.3.2.3 - Non-Negative Matrix Factorization and Sparsity
11.3.3 - Demonstration on Movie Ratings
11.3.3.1 - Interpreting the Learned Stereotypes
11.3.3.2 - Exploratory Analysis
11.4 - Prediction and Recommendation
11.4.1 - Evaluation
11.4.2 - Overview of Methodologies
11.4.2.1 - Nearest Neighbor-Based Approaches
11.4.2.2 - Approaches Based on Supervised Learning
11.4.2.3 - Predicting Movie Ratings with Logistic Regression
11.4.2.4 - Common Issues with Features
11.4.2.5 - Domain-Specific Applications
11.5 - Summary
12 - Chapter 7 Conclusions
12.1 - The Surprising Stability of Human Interaction Patterns
12.2 - Averages, Standard Deviations, and Sampling
12.3 - Removing Outliers
13 - Index
14 - EULA
System requirements
File format: PDF
Copy protection: Adobe DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino, and many more (Kindle support is limited).
The PDF file format displays each book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe DRM, a "hard" copy protection scheme. If the necessary requirements are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorize any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.