During the last few years, a new approach to linguistic analysis has started to emerge. This approach, which has come to be known under various labels such as 'data-oriented parsing', 'corpus-based interpretation' and 'treebank grammar', assumes that human language comprehension and production work with representations of concrete past language experiences rather than with abstract grammatical rules. It operates by decomposing the given representations into fragments and recomposing those pieces to analyze (infinitely many) new utterances. This book shows how this general approach can be applied to various kinds of linguistic representations. Experiments with this approach suggest that the productive units of natural language cannot be defined by a minimal set of rules or principles, but need to be defined by a large, redundant set of previously experienced structures. Bod argues that this outcome has important consequences for linguistic theory, leading to an entirely new view of the nature of linguistic competence.
1. Introduction: what are the productive units of natural language?; 2. A DOP model for tree representations; 3. Formal stochastic language theory; 4. Parsing and disambiguation; 5. Testing DOP: redundancy vs. minimality; 6. Learning new words; 7. Learning new structures; 8. A DOP model for compositional semantic representations; 9. Speech understanding and dialogue processing; 10. DOP models for non-context-free representations; 11. Conclusion: linguistics reconsidered; References.
'Bod develops a theory of human language based on linguistic experience. Instead of rules or principles, previously derived chunks of representations constitute the knowledge base for language use. With empirical rigor and compelling argumentation the author develops the theoretical foundations for his data-oriented approach and extends it to semantics and the processing of spoken dialogue. All computational linguists with a sincere interest in corpus-based methods should definitely read this well-written book. Theoretical linguists and psycholinguists will find it illuminating and thought-provoking.' Hans Uszkoreit, DFKI Saarbrücken and Saarland University
'Beyond Grammar should be read by all theoretical linguists who feel intrigued or threatened by the renaissance of statistical natural language processing. Bod argues for the provocative thesis that knowledge of language should be understood not as a grammar, but as a "statistical ensemble of language experience that changes slightly every time a new utterance is perceived or produced". By building a conceptual theory that integrates formal language theory with statistical linguistics, he also shows why the coming statistical revolution need not put theoretical linguists out of business. This is a beautifully written, important, and accessible work.' Joan Bresnan, Stanford University
Dewey Decimal Classification (DDC)