This book comprises a set of articles that specify the methodology of text mining, describe the creation of lexical resources in the framework of text mining, and use text mining for various tasks in natural language processing (NLP). The analysis of large amounts of textual data is a prerequisite for building lexical resources such as dictionaries and ontologies, and it also has direct applications in automated text processing in fields such as history, healthcare and mobile applications, to name just a few. This volume provides an update on recent advances in text mining methods and reflects the most recent achievements in the automatic construction of large lexical resources. It addresses researchers who already perform text mining as well as those who want to enrich their battery of methods. Selected articles can be used to support graduate-level teaching.
The book is suitable for all readers who have completed undergraduate studies in computational linguistics, quantitative linguistics, computer science or computational humanities. It assumes basic knowledge of computer science, corpus processing and statistics.
After completing his doctoral dissertation with Gerhard Heyer at the University of Leipzig (Germany), Chris Biemann joined the semantic search startup Powerset (San Francisco) in 2008, which was acquired in the same year and became part of Microsoft's Bing. In 2011, he joined TU Darmstadt (Germany) as an assistant professor (W1) of Language Technology. His interests lie in statistical semantics, unsupervised and knowledge-free natural language processing, and leveraging the wisdom of the crowds for language data acquisition.
Alexander Mehler is professor (W3) of Computational Humanities / Text Technology at the Goethe University Frankfurt am Main, where he heads the Text Technology Lab as part of the Institute of Informatics. His research interests focus on the empirical analysis and simulative synthesis of discourse units in spoken and written communication. He aims at a quantitative theory of networking in linguistic systems to enable multi-agent simulations of their life cycle. He integrates models of semantic spaces with simulation models of language evolution and topological models of network theory to capture the complexity of linguistic information systems. Currently, he is heading several research projects on the analysis of linguistic networks in historical semantics. Most recently, he started a research project on kinetic text-technologies that integrates the paradigm of games with a purpose with the wiki way of collaborative writing and kinetic HCI.
Foreword.- PART I. Text Mining Techniques and Methodologies.- Thomas Eckart, Dirk Goldhahn and Uwe Quasthoff: Building large resources for text mining.- Hristo Tanev: Learning Textologies: Networks of Linked Word Clusters.- Zornitsa Kozareva: Simple, Fast and Accurate Taxonomy Learning.- Patrick Oesterling, Christian Heine, Gunther H. Weber and Gerik Scheuermann: A Topology-Based Approach to Visualize the Thematic Composition of Document Collections.- Alexander Mehler, Tim vor der Brück, Rüdiger Gleim and Tim Geelhaar: Towards a Network Model of the Coreness of Texts: An Experiment in Classifying Latin Texts using the TTLab Latin Tagger.- PART II. Text Mining Applications.- Stefan Bordag, Christian Hänig and Christian Beutenmüller: A structuralist approach for personal knowledge exploration systems on mobile devices.- Frank Oemig and Bernd Blobel: Natural Language Processing Supporting Interoperability in Healthcare.- Veronica Perez-Rosas, Cristian Bologa, Mihai Burzo and Rada Mihalcea: Deception Detection Within and Across Cultures.- Jonathan Sonntag and Manfred Stede: Sentiment Analysis: What's your Opinion?.- Marten Düring and Antal van den Bosch: Multi-perspective Event Detection in Texts Documenting the 1944 Battle of Arnhem.- Marco Büchler, Philip R. Burns, Martin Müller, Emily Franzini and Greta Franzini: Towards a Historical Text Re-use Detection.