In this chapter, you will learn
Learning how to perform text mining should be an interesting and exciting journey throughout this book. A fun artifact of learning text mining is that you can use the methods in this book on your own social media or online exchanges. Beyond these everyday online applications to your personal interactions, this book provides business use cases in an effort to show how text mining can improve products, customer service, marketing or human resources.
There are many technical definitions of text mining, both on the Internet and in textbooks. Because the primary goal of text mining in this book is to extract a useful output, such as a visualization or a structured table to be used elsewhere, this is my definition:
Text mining is the process of distilling actionable insights from text.
Text mining within the context of this book is a commitment to real-world cases that impact business. Therefore, the definition and this book are aimed at meaningful distillation of text, with the end goal of aiding a decision-maker. While there may be some differences, the terms text mining and text analytics can be used interchangeably. Word choice is important; I use text mining because it more adequately describes the uncovering of insights and the use of specific algorithms beyond basic statistical analysis.
In this book, text mining is more than an academic exercise. I hope to show that text mining has enterprise value and can contribute to various business units. Specifically, text mining can be used to identify actionable social media posts for a customer service organization. It can be used in human resources for purposes such as understanding candidate perceptions of the organization or matching job descriptions with resumes. In marketing, text mining can be used to measure campaign salience. It can even be used to identify brand evangelists and to inform customer propensity modeling. Presently, the state of text mining is somewhere between a novelty and a source of real, actionable business intelligence. The book gives you not only the tools to perform text mining but also the case studies to help identify practical business applications and get your creative text mining efforts started.
Text mining fits within many disciplines. These include private and academic uses. For academics, text mining may aid in the analytical understanding of qualitatively collected transcripts or the study of language and sociology. For the private enterprise, text mining skills are often contained in a data science team. This is because text mining may yield interesting and important inputs for predictive modeling, and also because the text mining skill set has been highly technical. However, text mining can be applied beyond a data science modeling workflow. Business intelligence could benefit from the skill set by quickly reviewing internal documents such as customer satisfaction surveys. Competitive intelligence analysts and marketers can review external text to provide insightful recommendations to the organization. As businesses save more textual data, they will need to extend text mining skills beyond the data science team. In the end, text mining could be used in any data-driven decision where text naturally fits as an input.
We should care about textual information for a variety of reasons.
Yet today's successful technology companies largely rely on numeric and categorical inputs for information gains, machine learning algorithms or operational optimization. It is illogical for an organization to study only structured information yet still devote precious resources to recording unstructured natural language. Text represents an untapped input that can further increase competitive advantage. Lastly, enterprises are transitioning from an industrial age to an information age; one could argue that the most successful companies are transitioning again to a customer-centric age. These companies realize that taking a long-term view of customer wellbeing ensures long-term success and helps the company to remain salient. Large companies can no longer merely create a product and forcibly market it to end-users. In an age of increasing customer expectations, customers want to be heard by corporations. As a result, to be truly customer-centric in a hypercompetitive environment, an organization should listen to its constituents whenever possible. Yet the amount of textual information from these interactions can be immense, so text mining offers a way to extract insights quickly.
Text mining will make it easier for an analyst or data scientist to understand vast amounts of text, and it will help ensure credibility with internal decision-makers. The alternative to text mining may mean ignoring text sources or merely sampling and manually reviewing text.
There are numerous consequences of ignoring text.
Explicitly ignoring text may be a conscious analyst decision, but doing so ignores text's insightful possibilities. This is analogous to an ostrich that sticks its head in the ground when confronted. If the aim is robust investigative quantitative analysis, then ignoring text is inappropriate. Of course, there are constraints to data science or business analysis, such as strict budgets or timelines. Therefore, it is not always appropriate to use text for analytics, but if the problem being investigated has a text component, and resource constraints do not forbid it, then ignoring text is not suitable.
As an alternative, some organizations will sample text and manually review it. This may mean having a single assessor or a panel of readers, or even outsourcing analytical efforts to human-based services like mturk or crowdflower. Communication theory often does not support these methods as a sound way to score text or to extract meaning. Setting aside sampling biases and logistical tabulation difficulties, communication theory states that the meaning of a message relies on the recipient. Therefore, a single evaluator introduces biases in meaning or numerical scoring, e.g. sentiment as a numbered scale. Additionally, the idea behind a group of people scoring text relies on Sir Francis Galton's theory of "Vox Populi" or wisdom of crowds.
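To exploit the wisdom of crowds, four elements must be considered: participants judge independently of one another, they rely on their own observation rather than on expert consultation, the group is diverse, and the individual answers are recorded so that they can be aggregated.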
Sir Francis Galton's experiment exploring the wisdom of crowds met these conditions with about 800 participants. At an English country fair, people were asked to guess the weight of a single ox. Participants guessed separately from each other without sharing their guesses. Participants were free to look at the ox themselves but did not receive expert consultation. The contestants also had diverse backgrounds. For example, there were no prerequisites stating that they needed to be a certain age, demographic or profession. Lastly, guesses were recorded on paper for Sir Francis to tabulate and study. In the end, the experiment showed the merit of the wisdom of crowds. No individual guess was correct. However, the median of the group's guesses was almost exactly right, within about one percent of the true weight. It was even better than the guesses of the individual farming experts.
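The aggregation step in this kind of experiment can be sketched in a few lines of Python. This is only a toy illustration, not Galton's tabulation: the guesses and the true weight below are invented, and the function names `crowd_estimate` and `share_beaten_by_crowd` are hypothetical helpers introduced for this example.

```python
from statistics import median


def crowd_estimate(guesses):
    """Aggregate independent guesses into a single estimate using the median."""
    return median(guesses)


def share_beaten_by_crowd(guesses, true_value):
    """Fraction of individual guesses that miss by more than the crowd estimate does."""
    crowd_error = abs(crowd_estimate(guesses) - true_value)
    worse = [g for g in guesses if abs(g - true_value) > crowd_error]
    return len(worse) / len(guesses)


# Hypothetical guesses in pounds, invented for illustration (not Galton's data).
guesses = [1100, 1250, 1200, 1190, 1300]
true_weight = 1198  # assumed true value for this toy example

print(crowd_estimate(guesses))
print(share_beaten_by_crowd(guesses, true_weight))
```

The median is used here because a few wild guesses barely move it, which is one reason it suits a panel of untrained scorers. The same pattern would apply if a panel scored text, for example by taking the median of several readers' sentiment ratings for one document.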
If these conditions are not met explicitly, then the results of the panel are suspect. Meeting them may seem easy, but in practice it is hard to ensure within an organization. For example, a former colleague at a major technology company in California shared a story about the company's effort to create Internet-connected eyeglasses. The eyeglasses were shared with internal employees, and feedback was then solicited. The text feedback was...