Social Media Data Mining and Analytics

Name: Social Media Data Mining and Analytics
Brand: Wiley
Price: 28.99 EUR
Availability: OnlineOnly

Gabor Szabo, Gungor Polatkan, P. Oscar Boykin, Antonios Chalkiopoulos (Authors)

Wiley (Publisher)

1st Edition

Published on 18 September 2018

352 pages

E-Book

PDF with Adobe-DRM

978-1-118-82490-0 (ISBN)

€28.99 incl. 7% VAT

E-Book Single Licence

Available for download

Content

1 - Cover
2 - Title Page
3 - Copyright
4 - Contents
5 - Introduction
5.1 - Human Interactions Measured
5.1.1 - Online Behavior Through Data Collection
5.1.2 - What Types of Data Are Essential to Collect?
5.2 - Asking and Answering Questions with Data
5.3 - The Datasets Used in This Book
5.3.1 - Wikipedia
5.3.2 - Twitter
5.3.3 - Stack Exchange
5.3.4 - LiveJournal
5.3.5 - Scientific Documents from Cora
5.3.6 - Amazon Fine Food Reviews
5.3.7 - MovieLens Movie Ratings
5.4 - The Languages and Frameworks Used in This Book
5.4.1 - Python
5.4.2 - Scalding
5.5 - System Requirements to Run the Examples
5.6 - Overview of the Chapters
5.7 - Online Repository for the Book
6 - Chapter 1 Users: The Who of Social Media
6.1 - Measuring Variations in User Behavior in Wikipedia
6.1.1 - The Diversity of User Activities
6.1.1.1 - The Origin of the User Activity Distribution
6.1.1.2 - The Consequences of the Power Law
6.1.2 - The Long Tail in Human Activities
6.2 - Long Tails Everywhere: The 80/20 Rule (p/q Rule)
6.3 - Online Behavior on Twitter
6.3.1 - Retrieving Tweets for Users
6.3.2 - Logarithmic Binning
6.3.3 - User Activities on Twitter
6.4 - Summary
7 - Chapter 2 Networks: The How of Social Media
7.1 - Types and Properties of Social Networks
7.1.1 - When Users Create the Connections: Explicit Networks
7.1.2 - Directed Versus Undirected Graphs
7.1.3 - Node and Edge Properties
7.1.4 - Weighted Graphs
7.1.5 - Creating Graphs from Activities: Implicit Networks
7.2 - Visualizing Networks
7.3 - Degrees: The Winner Takes All
7.3.1 - Counting the Number of Connections
7.3.2 - The Long Tail in User Connections
7.3.3 - Beyond the Idealized Network Model
7.4 - Capturing Correlations: Triangles, Clustering, and Assortativity
7.4.1 - Local Triangles and Clustering
7.4.2 - Assortativity
7.5 - Summary
8 - Chapter 3 Temporal Processes: The When of Social Media
8.1 - What Traditional Models Tell You About Events in Time
8.1.1 - When Events Happen Uniformly in Time
8.2 - Inter-Event Times
8.2.1 - Comparing to a Memoryless Process
8.2.2 - Autocorrelations
8.2.3 - Deviations from Memorylessness
8.2.4 - Periodicities in Time in User Activities
8.3 - Bursty Activities of Individuals
8.3.1 - Correlations and Bursts
8.3.1.1 - Reservoir Sampling
8.4 - Forecasting Metrics in Time
8.4.1 - Finding Trends
8.4.2 - Finding Seasonality
8.4.3 - Forecasting Time Series with ARIMA
8.4.3.1 - The Autoregressive Part ("AR")
8.4.3.2 - The Moving Average Part ("MA")
8.4.3.3 - The Full ARIMA(p, d, q) Model
8.5 - Summary
9 - Chapter 4 Content: The What of Social Media
9.1 - Defining Content: Focus on Text and Unstructured Data
9.1.1 - Creating Features from Text: The Basics of Natural Language Processing
9.1.2 - The Basic Statistics of Term Occurrences in Text
9.2 - Using Content Features to Identify Topics
9.2.1 - The Popularity of Topics
9.2.2 - How Diverse Are Individual Users' Interests?
9.3 - Extracting Low-Dimensional Information from High-Dimensional Text
9.3.1 - Topic Modeling
9.3.1.1 - Unsupervised Topic Modeling
9.3.1.2 - Supervised Topic Modeling
9.3.1.3 - Relational Topic Modeling
9.4 - Summary
10 - Chapter 5 Processing Large Datasets
10.1 - MapReduce: Structuring Parallel and Sequential Operations
10.1.1 - Counting Words
10.1.2 - Skew: The Curse of the Last Reducer
10.2 - Multi-Stage MapReduce Flows
10.2.1 - Fan-Out
10.2.2 - Merging Data Streams
10.2.3 - Joining Two Data Sources
10.2.4 - Joining Against Small Datasets
10.2.5 - Models of Large-Scale MapReduce
10.3 - Patterns in MapReduce Programming
10.3.1 - Static MapReduce Jobs
10.3.2 - Iterative MapReduce Jobs
10.3.2.1 - PageRank for Ranking in Graphs
10.3.2.2 - k-means Clustering
10.3.3 - Incremental MapReduce Jobs
10.3.4 - Temporal MapReduce Jobs
10.3.4.1 - Rollups and Data Cubing
10.3.4.2 - Expanding Rollup Jobs
10.3.5 - Challenges with Processing Long-Tailed Social Media Data
10.4 - Sampling and Approximations: Getting Results with Less Computation
10.4.1 - HyperLogLog
10.4.1.1 - HyperLogLog Example
10.4.1.2 - HyperLogLog on the Stack Exchange Dataset
10.4.1.3 - Performance of HLL on Large Datasets
10.4.2 - Bloom Filters
10.4.2.1 - A Bloom Filter Example
10.4.2.2 - Bloom Filter as Pre-Computed Membership Knowledge
10.4.2.3 - Bloom Filters on Large Social Datasets
10.4.3 - Count-Min Sketch
10.4.3.1 - Count-Min Sketch: Heavy Hitters Example
10.4.3.2 - Count-Min Sketch: Top Percentage Example
10.4.3.3 - Aggregating Approximate Data Structures
10.4.3.4 - Summary of Approximations
10.5 - Executing on a Hadoop Cluster (Amazon EC2)
10.5.1 - Installing a CDH Cluster on Amazon EC2
10.5.2 - Providing IAM Access to Collaborators
10.5.3 - Adding On-Demand Cluster Capabilities
10.6 - Summary
11 - Chapter 6 Learn, Map, and Recommend
11.1 - Social Media Services Online
11.1.1 - Search Engines
11.1.2 - Content Engagement
11.1.3 - Interactions with the Real World
11.1.4 - Interactions with People
11.2 - Problem Formulation
11.3 - Learning and Mapping
11.3.1 - Matrix Factorization
11.3.2 - Learning, Training
11.3.2.1 - Under- and Overfitting
11.3.2.2 - Regularizing in Matrix Factorization
11.3.2.3 - Non-Negative Matrix Factorization and Sparsity
11.3.3 - Demonstration on Movie Ratings
11.3.3.1 - Interpreting the Learned Stereotypes
11.3.3.2 - Exploratory Analysis
11.4 - Prediction and Recommendation
11.4.1 - Evaluation
11.4.2 - Overview of Methodologies
11.4.2.1 - Nearest Neighbor-Based Approaches
11.4.2.2 - Approaches Based on Supervised Learning
11.4.2.3 - Predicting Movie Ratings with Logistic Regression
11.4.2.4 - Common Issues with Features
11.4.2.5 - Domain-Specific Applications
11.5 - Summary
12 - Chapter 7 Conclusions
12.1 - The Surprising Stability of Human Interaction Patterns
12.2 - Averages, Standard Deviations, and Sampling
12.3 - Removing Outliers
13 - Index
14 - EULA

Schweitzer Fachinformationen