Schweitzer Fachinformationen
Wenn es um professionelles Wissen geht, ist Schweitzer Fachinformationen wegweisend. Kunden aus Recht und Beratung sowie Unternehmen, öffentliche Verwaltungen und Bibliotheken erhalten komplette Lösungen zum Beschaffen, Verwalten und Nutzen von digitalen und gedruckten Medien.
In This Chapter
Although Excel is traditionally considered the premier tool for data analysis and reporting, it has some inherent characteristics that often lead to issues revolving around scalability, transparency of analytic processes, and confusion between data and presentation. Over the last several years, Microsoft has recognized this and created tools that allow you to develop reporting and business intelligence by connecting to various external databases. Microsoft has gone a step further with Excel 2013, offering business intelligence (BI) tools like Power Pivot natively; it effectively allows you to build robust relational data models within Excel.
With the introduction of these BI tools, it’s becoming increasingly important for you to understand core database fundamentals. Unlike traditional Excel concepts, where the approach to developing solutions is relatively intuitive, good database-driven development requires a bit of prior knowledge. There are a handful of fundamentals you should know before jumping into the BI tools. These include database terminology, basic database concepts, and database best practices.
The topics covered in this chapter explain the concepts and techniques necessary to successfully use database environments and give you the skills needed to normalize data and plan and implement effective tables.
If you’re already familiar with the concepts involved in database design, you may want to skim this chapter. If you’re new to the world of databases, spend some time in this chapter gaining a thorough understanding of these important topics.
Managers, accountants, and analysts have had to accept one simple fact over the years: Their analytical needs had outgrown Excel. They all met with fundamental issues that stemmed from one or more of Excel’s three problem areas: scalability, transparency of analytical processes, and separation of data and presentation.
Scalability is the ability for an application to develop flexibly to meet growth and complexity requirements. In the context of Excel, scalability refers to Excel’s ability to handle ever-increasing volumes of data. Most Excel aficionados are quick to point out that as of Excel 2007, you can place 1,048,576 rows of data into a single Excel worksheet. This is an overwhelming increase from the limitation of 65,536 rows imposed by previous versions of Excel. However, this increase in capacity does not solve all of the scalability issues that inundate Excel.
Imagine that you're working in a small company and using Excel to analyze your daily transactions. As time goes on, you build a robust process complete with all the formulas, PivotTables, and macros you need to analyze the data that is stored in your neatly maintained worksheet.
As your data grows, you start to notice performance issues. Your spreadsheet becomes slow to load and then slow to calculate. Why does this happen? It has to do with the way Excel handles memory. When an Excel file is loaded, the entire file is loaded into RAM. Excel does this to allow for quick data processing and access. The drawback to this behavior is that each time something changes in your spreadsheet, Excel has to reload the entire spreadsheet into RAM. A large spreadsheet takes a great deal of RAM to process even the smallest change. Eventually, each action you take in your gigantic worksheet will result in an excruciating wait.
Your PivotTables will require bigger pivot caches (memory containers), almost doubling your Excel workbook’s file size. Eventually, your workbook will become too big to distribute easily. You may even consider breaking down the workbook into smaller workbooks (possibly one for each region). This causes you to duplicate your work.
In time, you may eventually reach the 1,048,576-row limit of your worksheet. What happens then? Do you start a new worksheet? How do you analyze two datasets on two different worksheets as one entity? Are your formulas still good? Will you have to write new macros?
These are all issues that need to be dealt with.
You can find various clever ways to work around these limitations. In the end, though, they are just workarounds. Eventually you will begin to think less about the most effective way to perform and present analysis of your data and more about how to make something “fit” into Excel without breaking your formulas and functions. Excel is flexible enough that you can make most things “fit” into Excel just fine. However, when you think only in terms of Excel, you’re limiting yourself, albeit in an incredibly functional way.
In addition, these capacity limitations often force you to have the data prepared for you. That is, someone else extracts large chunks of data from a large database, then aggregates and shapes the data for use in Excel. Should you always depend on someone else for your data needs? What if you have the tools to “access” vast quantities of data without relying on others to provide data? Could you be more valuable to the organization? Could you focus on the accuracy of the analysis and the quality of the presentation instead of routing Excel data maintenance?
A relational database system (like Access or SQL Server) is a logical next step. Most database system tables take very few performance hits with larger datasets and have no predetermined row limitations. This allows you to handle larger datasets without requiring the data to be summarized or prepared to fit into Excel. Also, if a process becomes more crucial to the organization and needs to be tracked in a more “enterprise-acceptable” environment, it's easier to upgrade and scale up if that process is already in a relational database system.
One of Excel’s most attractive features is its flexibility. Each individual cell can contain text, a number, a formula, or practically anything else you define. Indeed, this is one of the fundamental reasons Excel is such an effective tool for data analysis. You can use named ranges, formulas, and macros to create an intricate system of interlocking calculations, linked cells, and formatted summaries that work together to create a final analysis.
The problem with that is there is no transparency of analytical processes, meaning it is extremely difficult to determine what is actually going on in a spreadsheet. If you've ever had to work with a spreadsheet created by someone else you know all too well the frustration that comes with deciphering the various gyrations of calculations and links being used to perform an analysis. Small spreadsheets that perform a modest analysis are painful to decipher but are usually still workable, while large, elaborate, multi-worksheet workbooks are virtually impossible to decode, often leaving you to start from scratch.
Compared to Excel, database systems might seem rigid, strict, and unwavering in their rules. However, all this rigidity comes with a benefit.
Because only certain actions are allowable, you can more easily come to understand what is being done within structured database objects, such as queries or stored procedures. If a dataset is being edited, a number is being calculated, or any portion of the dataset is being affected as a part of an analytical process, you can readily see that action by reviewing the query syntax or reviewing the stored procedure code. Indeed, in a relational database system, you never encounter hidden formulas, hidden cells, or dead named ranges.
Data should be separate from presentation; you do not want the data to become too tied into any one particular way of presenting it. For example, when you receive an invoice from a company, you don’t assume that the financial data on that invoice is the true source of your data. It is a presentation of your data. It can be presented to you in other manners and styles on charts or on Web sites, but such representations are never the actual source of the data.
What exactly does this concept have to do with Excel? People who perform data analysis with Excel tend to fuse the data, the analysis, and the presentation together. For example, you often see an Excel workbook that has 12 worksheets, each representing a month. On each worksheet, data for that month is listed along with formulas, PivotTables, and summaries. What happens when you're asked to provide a summary by quarter? Do you add more formulas and worksheets to consolidate the data on each of the month worksheets? The fundamental problem in this scenario is that the worksheets actually represent data values that are fused into the presentation of your analysis. The point here is that data should not be tied to a particular presentation, no matter how apparently logical or useful it may be. However, in Excel, it happens all the time.
In addition, because all manners and phases of analysis can be done directly within a spreadsheet, Excel cannot effectively provide adequate transparency to the analysis. Each cell has the potential of holding hidden formulas and containing links to other cells. In Excel, the line between analysis and data is blurred, which makes it difficult to determine exactly what is going on in a...
Dateiformat: ePUBKopierschutz: Adobe-DRM (Digital Rights Management)
Systemvoraussetzungen:
Das Dateiformat ePUB ist sehr gut für Romane und Sachbücher geeignet – also für „fließenden” Text ohne komplexes Layout. Bei E-Readern oder Smartphones passt sich der Zeilen- und Seitenumbruch automatisch den kleinen Displays an. Mit Adobe-DRM wird hier ein „harter” Kopierschutz verwendet. Wenn die notwendigen Voraussetzungen nicht vorliegen, können Sie das E-Book leider nicht öffnen. Daher müssen Sie bereits vor dem Download Ihre Lese-Hardware vorbereiten.Bitte beachten Sie: Wir empfehlen Ihnen unbedingt nach Installation der Lese-Software diese mit Ihrer persönlichen Adobe-ID zu autorisieren!
Weitere Informationen finden Sie in unserer E-Book Hilfe.
Dateiformat: PDFKopierschutz: Adobe-DRM (Digital Rights Management)
Das Dateiformat PDF zeigt auf jeder Hardware eine Buchseite stets identisch an. Daher ist eine PDF auch für ein komplexes Layout geeignet, wie es bei Lehr- und Fachbüchern verwendet wird (Bilder, Tabellen, Spalten, Fußnoten). Bei kleinen Displays von E-Readern oder Smartphones sind PDF leider eher nervig, weil zu viel Scrollen notwendig ist. Mit Adobe-DRM wird hier ein „harter” Kopierschutz verwendet. Wenn die notwendigen Voraussetzungen nicht vorliegen, können Sie das E-Book leider nicht öffnen. Daher müssen Sie bereits vor dem Download Ihre Lese-Hardware vorbereiten.
Bitte beachten Sie: Wir empfehlen Ihnen unbedingt nach Installation der Lese-Software diese mit Ihrer persönlichen Adobe-ID zu autorisieren!