This chapter discusses the definition of a data mining project, including its initial concept, motivation, objective, viability, estimated costs, and expected benefit (returns). Key considerations are defined, and a way of quantifying the cost and benefit is presented in terms of the factors that most influence the project. Two case studies illustrate how the cost/benefit evaluation can be applied to real-world projects.
Keywords
business objectives
cost
benefit
evaluation
influential factors
customer call center
mobile applications advertising
A commercial data analysis project that lives up to its expectations will probably do so because sufficient time was dedicated at the outset to defining the project’s business objectives. What is meant by business objectives? The following are some examples:
• Reduce the loss of existing customers by 3 percent.
• Augment the contract signings of new customers by 2 percent.
• Augment the sales from cross-selling products to existing customers by 5 percent.
• Predict the television audience share with a probability of 70 percent.
• Predict, with a precision of 75 percent, which clients are most likely to contract a new product.
• Identify new categories of clients and products.
• Create a new customer segmentation model.
The first three examples define a specific percentage of precision and improvement as part of the objective.
Business Objective
Assigning a Value for Percent Improvement
The percentage improvement should always be considered with regard to the current precision of an existing index as a baseline. Also, the new precision objective should not get lost in the error bars of the current precision. That is, if the current precision has an error margin of ± 3% in its measurement or calculation, this should be taken into account.
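The error-margin check described above can be sketched in a few lines. This is only an illustration: the function name `improvement_is_measurable` and the sample figures (a 70% baseline with a ±3% margin) are hypothetical and not taken from the chapter.

```python
def improvement_is_measurable(baseline, target, error_margin):
    """Return True if the target precision lies outside the error band
    of the baseline precision (all values are fractions between 0 and 1)."""
    return abs(target - baseline) > error_margin


# Hypothetical figures: current precision 70%, measured to within +/- 3%.
baseline_precision = 0.70
margin = 0.03

# A target of 72% is inside the error band, so a gain could not be verified.
print(improvement_is_measurable(baseline_precision, 0.72, margin))  # False

# A target of 75% is outside the error band.
print(improvement_is_measurable(baseline_precision, 0.75, margin))  # True
```

A target that fails this check is not necessarily a bad objective, but its achievement could not be distinguished from measurement noise.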
In the fourth and fifth examples, an absolute value is specified for the desired precision for the data model. In the final two examples the desired improvement is not quantified; instead, the objective is expressed in qualitative terms.
This section lists the main issues and poses some key questions for evaluating the viability of a potential data mining project. The checklists of general and specific considerations provided here form the basis for the rest of the chapter, which specifies benefit and cost criteria in more detail and applies these definitions to two case studies.
The following is a list of questions to ask when considering a data analysis project:
• Is data available that is consistent and correlated with the business objectives?
• What is the capacity for improvement with respect to the current methods? (The greater the capacity for improvement, the greater the economic benefit.)
• Is there an operational business need for the project results?
• Can the problem be solved by other techniques or methods? (If the answer is no, the profitability return on the project will be greater.)
• Does the project have a well-defined scope? (If this is the first instance of a project of this type, reducing the scale of the project is recommended.)
The following list provides specific considerations for evaluating the viability of a data mining project in terms of the available data:
• Does the necessary data for the business objectives exist, and does the business have access to it?
• If part or all of the data does not exist, can processes be defined to capture or obtain it?
• What is the coverage of the data with respect to the business objectives?
• What is the availability of a sufficient volume of data over a required period of time, for all clients, product types, sales channels, and so on? (The data should cover all the business factors to be analyzed and modeled. The historical data should cover the current business cycle.)
• Is it necessary to evaluate the quality of the available data in terms of reliability? (The reliability depends on the percentage of erroneous data and incomplete or missing data. The ranges of values must be sufficiently wide to cover all cases of interest.)
• Are people available who are familiar with the relevant data and the operational processes that generate the data?
Several factors influence the benefits of a project. A qualitative assessment of current functionality is required first: how satisfactory is the way the task is currently being done? A value between 0 and 1 is assigned, where 1 is the highest grade of satisfaction and 0 is the lowest. The lower the current grade of satisfaction, the greater the potential improvement and, consequently, the benefit.
The potential quality of the result (the evaluation of future functionality) can be estimated from three aspects of the data: coverage, reliability, and correlation.
• The coverage or completeness of the data, assigned a value between 0 and 1, where 1 indicates total coverage.
• The quality or reliability of the data, assigned a value between 0 and 1, where 1 indicates the highest quality. (Both the coverage and the reliability are normally measured variable by variable, giving a total for the whole dataset. Good coverage and reliability for the data help to make the analysis a success, thus giving a greater benefit.)
• The correlation between the data and the business objective, that is, the degree to which the objective depends on the data, can be measured statistically. A correlation is typically measured as a value from –1 (total negative correlation) through 0 (no correlation) to 1 (total positive correlation). For example, if the business objective is that clients buy more products, the correlation would be calculated between each customer variable (age, time as a customer, zip code of postal address, etc.) and the customer’s sales volume.
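The three measures above could be computed per variable roughly as follows. This is a minimal sketch under stated assumptions: missing values are represented by `None`, coverage is taken as the fraction of non-missing values, reliability as the fraction of present values that pass a caller-supplied validity check, and correlation as the Pearson coefficient. The function names, the validity rule, and the sample data are all hypothetical.

```python
from math import sqrt


def coverage(values):
    """Fraction of records that have a value (None marks a missing value)."""
    return sum(v is not None for v in values) / len(values)


def reliability(values, is_valid):
    """Fraction of the non-missing values that pass the validity check."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(1 for v in present if is_valid(v)) / len(present)


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant variable carries no linear relationship
    return cov / sqrt(var_x * var_y)


# Hypothetical customer data: age (one missing, one implausible) and sales volume.
ages = [34, 51, None, 28, 240, 45]
sales = [1200, 2100, 900, 800, 1500, 1900]

print(coverage(ages))                                # 5 of 6 values present
print(reliability(ages, lambda a: 18 <= a <= 100))   # 4 of 5 present values plausible
print(pearson(ages, sales))                          # age vs. sales volume
```

In practice the implausible age would usually be cleaned or excluded before the correlation is calculated, since a single outlier can dominate the coefficient.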
Once individual values for coverage, reliability, and correlation are acquired, an estimation of the future functionality can be obtained using the formula:
An estimate of the possible improvement is then obtained as the difference between the future and the current functionality:

$$\text{improvement} = \text{future functionality} - \text{current functionality}$$
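The calculation can be illustrated with a short sketch. The combining formula for future functionality is not reproduced in this text, so the version below assumes an unweighted mean of the three values and uses the absolute value of the correlation (so that a strong negative correlation counts as informative). Both choices are assumptions made for illustration, as are the function names and the sample figures.

```python
def future_functionality(coverage, reliability, correlation):
    """Estimate of future functionality on a 0-1 scale.

    ASSUMPTION: unweighted mean of the three factors, with the correlation
    taken in absolute value; the chapter's own formula may differ.
    """
    return (coverage + reliability + abs(correlation)) / 3


def improvement(current, future):
    """Possible improvement: future functionality minus current functionality."""
    return future - current


# Hypothetical values: good coverage and reliability, moderate correlation,
# and a current grade of satisfaction of 0.5.
future = future_functionality(coverage=0.9, reliability=0.8, correlation=0.7)
print(round(future, 2))                    # 0.8
print(round(improvement(0.5, future), 2))  # 0.3
```

A negative result would indicate that the data is unlikely to do better than the current way of working.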
A fourth aspect, volatility, concerns the amount of time the results of the analysis or data modeling will remain valid.
The volatility of the business objective's environment can be defined as a value between 0 and 1, where 0 is minimum volatility and 1 is maximum volatility. High volatility can cause models and conclusions to go out of date quickly with respect to the data; even the business objective can lose relevance. Volatility depends on whether the results are applicable over the long, medium, or short term with respect to the business cycle.
Note that this a priori evaluation gives an idea for the viability of a data mining project. However, it is clear that the quality and precision of the end result will also depend on how well the project is executed: analysis, modeling, implementation, deployment, and so on. The next section, which deals with the estimation of the cost of the project, includes a factor (expertise) that evaluates the availability of the people and skills necessary to guarantee the a posteriori success of the project.
There are numerous factors that influence how much a project costs. These include:
• Accessibility: The more data sources, the higher the cost. Typically, there are at least two different data sources.
• Complexity: The greater the number of variables in the data, the greater the cost. Categorical-type variables (zones, product types, etc.) must especially be taken into account, given that each variable may have many possible values (for example, 50). On the other hand, there could be just 10 other variables, each of which has only two possible...