
Literary Detective Work on the Computer

Content
- Literary Detective Work on the Computer
- Editorial page
- Title page
- LCC data
- Table of contents
- Preface
- 1. Author identification
- 1. Introduction
- 2. Feature selection
- 2.1 Evaluation of feature sets for authorship attribution
- 3. Inter-textual distances
- 3.1 Manhattan distance and Euclidean distance
- 3.2 Labbé and Labbé's measure
- 3.3 Chi-squared distance
- 3.4 The cosine similarity measure
- 3.5 Kullback-Leibler Divergence (KLD)
- 3.6 Burrows' Delta
- 3.7 Evaluation of feature-based measures for inter-textual distance
- 3.8 Inter-textual distance by semantic similarity
- 3.9 Stemmatology as a measure of inter-textual distance
- 4. Clustering techniques
- 4.1 Introduction to factor analysis
- 4.2 Matrix algebra
- 4.3 Use of matrix algebra for PCA
- 4.4 PCA case studies
- 4.5 Correspondence analysis
- 5. Comparisons of classifiers
- 6. Other tasks related to authorship
- 6.1 Stylochronometry
- 6.2 Affect dictionaries and psychological profiling
- 6.3 Evaluation of author profiling
- 7. Conclusion
- 2. Plagiarism and spam filtering
- 1. Introduction
- 2. Plagiarism detection software
- 2.1 Collusion and plagiarism, external and intrinsic
- 2.2 Preprocessing of corpora and feature extraction
- 2.3 Sequence comparison and exact match
- 2.4 Source-suspicious document similarity measures
- 2.5 Fingerprinting
- 2.6 Language models
- 2.7 Natural Language Processing
- 2.8 Intrinsic plagiarism detection
- 2.9 Plagiarism of program code
- 2.10 Distance between translated and original text
- 2.11 Direction of plagiarism
- 2.12 The search engine-based approach used at PAN-13
- 2.13 Case study 1: Hidden influences from printed sources in the Gaelic tales
- 2.14 Case study 2: General George Pickett and related writings
- 2.15 Evaluation methods
- 2.16 Conclusion
- 3. Spam filters
- 3.1 Content-based techniques
- 3.2 Building a labelled corpus for training
- 3.3 Exact matching techniques
- 3.4 Rule-based methods
- 3.5 Machine learning
- 3.5.1 Naïve Bayes
- 3.5.2 Logistic regression
- 3.5.3 Boosting
- 3.6 Unsupervised machine learning approaches
- 3.7 Other spam-filtering problems
- 3.8 Evaluation of spam filters
- 3.9 Non-linguistic techniques
- 3.9.1 Safelists
- 3.9.2 Human challenges
- 3.9.3 Reputation analysis
- 3.9.4 Networking considerations
- 3.9.5 Web harvesting
- 3.9.6 Payment and legislation
- 3.10 Conclusion
- 4. Recommendations for further reading
- 3. Computer studies of Shakespearean authorship
- 1. Introduction
- 2. Shakespeare, Wilkins and Pericles
- 2.1 Correspondence analysis for "Pericles" and related texts
- 3. Shakespeare, Fletcher and The Two Noble Kinsmen
- 4. King John
- 5. The Raigne of King Edward III
- 5.1 Neural networks in stylometry
- 5.2 Cusum charts in stylometry
- 5.3 Burrows' Zeta and Iota
- 6. Hand D in "Sir Thomas More"
- 6.1 Elliott, Valenza and the Earl of Oxford
- 6.2 Elliott and Valenza: Hand D
- 6.3 Bayesian approach to questions of Shakespearian authorship
- 6.4 Bayesian analysis of Shakespeare's second-person pronouns
- 6.5 Vocabulary differences, LDA and the authorship of Hand D
- 6.6 Hand D: Conclusions
- 7. The three parts of Henry VI
- 8. Timon of Athens
- 9. The "Puritan" and the "Yorkshire Tragedy"
- 10. Arden of Faversham
- 11. Estimation of the extent of Shakespeare's vocabulary and the authorship of the "Taylor" poem
- 12. The chronology of Shakespeare
- 13. Conclusion
- 4. Stylometric analysis of religious texts
- 1. Introduction
- 1.1 Overview of the New Testament by correspondence analysis
- 1.2 Q
- 1.2.1 The work of Tony Honoré
- 1.2.2 Correspondence analysis of Luke
- 1.2.3 Correspondence analysis of the Synoptic Gospels
- 1.2.4 Correspondence analysis of Matthew
- 5. Conclusion
- 5. Computers and decipherment
- 1. Introduction
- 1.1 Differences between cryptography and decipherment
- 1.2 Cryptological techniques for automatic language recognition
- 1.3 Dictionary approaches to language recognition
- 1.4 Sinkov's test
- 1.5 Index of coincidence
- 1.6 The Log-Likelihood ratio
- 1.7 The Chi-Squared test statistic
- 1.8 Entropy of language
- 1.9 Zipf's Law and Heaps' Law coefficients
- 1.10 Modal token length
- 1.11 Autocorrelation analysis
- 1.12 Vowel identification
- 2. Rongorongo
- 2.1 History of Rongorongo
- 2.2 Characteristics of Rongorongo
- 2.3 Obstacles to decipherment
- 2.4 Encoding of Rongorongo symbols
- 2.5 The "Mamari" lunar calendar
- 2.6 Basic statistics of the Rongorongo corpus
- 2.7 Alignment of the Rongorongo corpus
- 2.8 A concordance for Rongorongo
- 2.9 Collocations and collostructions
- 2.10 Classification by genre
- 2.11 Vocabulary richness
- 2.12 Pozdniakov's approach to matching frequency curves
- 3. The Indus Valley texts
- 3.1 Why decipherment of the Indus texts is difficult
- 3.2 Are the Indus texts writing?
- 3.3 Other evidence for the Indus Script being writing
- 3.4 Determining the order of the Markov model
- 3.5 Missing symbols
- 3.6 Text segmentation and the Log-likelihood measure
- 3.7 Network analysis of the Indus Signs
- 4. Linear A
- 5. The Phaistos disk
- 6. Iron Age Pictish symbols
- 7. Mayan glyphs
- 8. Conclusion
- References
- Index
System requirements
File format: PDF
Copy protection: Adobe DRM (Digital Rights Management)
System requirements:
- Computer (Windows, Mac OS X, Linux): Install the free Adobe Digital Editions reader before downloading (see eBook Help).
- Tablet/smartphone (Android, iOS): Install the free Adobe Digital Editions app or the PocketBook app before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, PocketBook, Sony, Tolino and many more (Kindle: limited support only).
The PDF format displays each book page identically on any hardware. This makes PDF suitable for complex layouts such as those used in textbooks and reference books (images, tables, columns, footnotes). On the small screens of e-readers and smartphones, however, PDFs are awkward to read because they require a lot of scrolling.
This eBook uses Adobe DRM, a "hard" form of copy protection. If the requirements above are not met, you will not be able to open the eBook, so prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise any reading software with your personal Adobe ID after installing it.
For more information, see our eBook Help page.