Linguistically annotated corpora are becoming a central part of corpus linguistics. One of their main strengths is the level of searchability they offer, but annotation also makes queries and query tools more complex at the outset. This book gives a full, pedagogic account of this burgeoning field.
Beginning with an overview of corpus linguistics, its prerequisites and goals, the book then introduces linguistically annotated corpora. It explores the different levels of linguistic annotation, including morphological, part-of-speech, syntactic, semantic and discourse-level annotation, as well as the advantages and challenges of such annotations. It covers the main annotated corpora for English (the Penn Treebank, the International Corpus of English, and OntoNotes) as well as a wide range of corpora for other languages. In its final part, it explores the search strategies required for different types of data. All chapters are accompanied by exercises and by sections on further reading.
Publishing group
Bloomsbury Publishing Plc
Target audience
For upper secondary school and university study
For professional use and research
Dimensions
Height: 234 mm
Width: 156 mm
ISBN-13
978-1-4411-1675-8 (9781441116758)
Sandra Kuebler is Associate Professor of Computational Linguistics in the Linguistics Department at Indiana University, Indiana, USA.
Heike Zinsmeister is Professor of German Linguistics and Corpus Linguistics at the University of Hamburg, Germany.
Preface
Part I Introduction
1. Corpus Linguistics
2. Corpora and Linguistic Annotation
Part II Linguistic Annotation
3. Linguistic Annotation on the Word Level
4. Syntactic Annotation
5. Semantic Annotation
6. Discourse Annotation
Part III Using Linguistic Annotation in Corpus Linguistics
7. Advantages and Limitations of Using Linguistically Annotated Corpora
8. Corpus Linguistics Using Linguistically Annotated Corpora
Part IV Querying Linguistically Annotated Corpora
9. Concordances
10. Regular Expressions
11. Searching on the Word Level
12. Querying Syntactic Structures
13. Searching for Semantic and Discourse Phenomena
Appendix A. Penn Treebank POS Tagset
Appendix B. ICE POS Tagset
Bibliography
Index