Relational databases are compared with traditional file systems. Rows are not records, columns are not fields, and tables are not files.
Keywords
Database
File system
US standard railroad gauge
COBOL
FORTRAN
C
BASIC
PL/I
Java
Procedural programming language
OO programming language
E-R diagrams
Peter Chen
Data Declaration Language (DDL)
Data Control Language (DCL)
It ain't so much the things we don't know that get us in trouble. It's the things we know that ain't so.
-Artemus Ward (Charles Farrar Browne), American Writer and Humorist, 1834-1867
Perfecting oneself is as much unlearning as it is learning.
-Edsger Dijkstra
If you already have a background in data processing with traditional file systems, the first things to unlearn are
(0) Databases are not file sets.
(1) Tables are not files.
(2) Rows are not records.
(3) Columns are not fields.
(4) Values in RDBMS are scalar, not structured (arrays, lists, meta-data).
Do not feel ashamed of getting stuck in a conceptual rut; every new technology has this problem.
The US standard railroad gauge (distance between the rails) is 4 ft, 8.5 in. This gauge is used because the English built railroads to that gauge and US railroads were built by English expatriates.
Why did the English build railroads to that gauge? Because the first rail lines were built by the same people who built the pre-railroad tramways, and that's the gauge they used. Why did those wheelwrights use that gauge then? Because the people who built the horse-drawn trams used the same jigs and tools that they used for building wagons, which used that wheel spacing.
Why did the wagons use that odd wheel spacing? For the practical reason that any other spacing would break an axle on some of the old, long distance roads, because this is the measure of the old wheel ruts.
So who built these old rutted roads? The first long distance roads in Europe were built by Imperial Rome for their legions and used ever since. The initial ruts were first made by Roman war chariots, which were of uniform military issue. The Imperial Roman chariots were made to be just wide enough to accommodate the back-ends of two war horses (this example is originally due to Professor Tom O'Hare, Germanic Languages, University of Texas at Austin; email: tohare@mail.utexas.edu).
This story does not end there, however. Look at a NASA Space Shuttle and the two big booster rockets attached to the sides of the main fuel tank. These are solid rocket boosters, or SRBs. The SRBs are made by Thiokol at their factory in Utah. The engineers who designed the SRBs might have preferred to make them a bit fatter, but the SRBs had to be shipped by train from the factory to the launch site in Florida. The railroad line from the factory runs through a tunnel in the mountains, and the SRBs have to fit through that tunnel. The tunnel is slightly wider than the railroad track. So, the major design feature of what is arguably the world's most advanced transportation system was determined by the width of a horse's ass.
In a similar fashion, modern data processing began with punch cards (Hollerith cards if you are really old) used by the Bureau of the Census. Their original size was that of a US dollar bill. This was set by their inventor, Herman Hollerith, because he could get furniture to store the cards from the US Treasury Department, just across the street. Likewise, physical constraints limited each card to 80 columns of holes in which to record a symbol.
The influence of the punch card lingered on long after the invention of magnetic tape and disk for data storage. This is why early video display terminals were 80 columns across. Even today, files that were migrated from cards to magnetic tape or disk storage still use 80-column physical records.
But the influence was not just on the physical side of data processing. The methods for handling data from the prior media were imitated in the new media.
Data processing first consisted of sorting and merging decks of punch cards (later, sequential magnetic tape files) in a series of distinct steps. The result of each step feeds into the next step in the process. Think of the assembly line in a factory.
Databases, and RDBMS in particular, are nothing like the file systems that came with COBOL, FORTRAN, C, BASIC, PL/I, Java, or any of the procedural and OO programming languages. We used to say that SQL means "Scarcely Qualifies as a Language" because it has no I/O of its own. SQL depends on a host language to pass data to and from end users.
Programming languages are usually based on some underlying model; if you understand the model, the language makes much more sense. For example, FORTRAN is based on algebra. This does not mean that FORTRAN is exactly like algebra. But if you know algebra, FORTRAN does not look all that strange to you the way that LISP or APL would. You can write an expression in an assignment statement or make a good guess as to the names of library functions you have never seen before.
Likewise, COBOL is based on English narratives of business processes. The design of files in COBOL (and in almost every other early programming language) was derived from paper forms. The most primitive form of a file is a sequence of records that are ordered within the file and referenced by physical position.
You open a file (think file folder or in-basket on your desk) and then read a first record (think of the first paper form on the stack), followed by a series of next records (process the stack of paperwork, one paper form at a time) until you come to the last record to raise the end-of-file condition (put the file folder in the out-basket). Notice the work flow:
1. The records (paper forms) have to physically exist to be processed. Files are not virtual by nature. In fact, this mindset is best expressed by a quote from Samuel Goldwyn: "A verbal contract ain't worth the paper it is written on!"
2. You navigate among these records and perform actions, one record at a time. You can go backward or forward in the stack but nowhere else.
3. The actions you take on one file (think of a clerk with rubber stamps) have no effect on other files that are not in the same program. The files are like file folders in another in-basket.
4. Only programs (the clerk processing the paperwork) can change files. The in-basket will not pick up a rubber stamp and mark the papers by itself.
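The work flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory "file", the 80-column record length, and the names `process_file` and `deck` are hypothetical and do not come from any particular system.

```python
import io

RECORD_LENGTH = 80  # punch-card width; an assumption for this sketch


def process_file(stream):
    """Read fixed-length records one at a time until end-of-file."""
    processed = []
    while True:
        record = stream.read(RECORD_LENGTH)  # read the next record in physical order
        if record == "":                     # end-of-file condition
            break
        # The program (the clerk with a rubber stamp) acts on one record.
        processed.append(record.rstrip().upper())
    return processed


# A hypothetical three-record file; each record is padded to 80 columns.
deck = "".join(name.ljust(RECORD_LENGTH) for name in ["smith", "jones", "brown"])
result = process_file(io.StringIO(deck))
print(result)
```

Nothing happens to the data unless the program walks through it, and the program only ever sees the records in their physical order.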
The model for SQL is data kept in abstract sets, not in physical files. The "unit of work" in SQL is the whole schema, not individual tables. This is a totally different model of work! Sets are those mathematical abstractions you studied in school. Sets are not ordered, and the members of a set are all of the same type. When you do an operation on a set, the action happens "all at once" to the entire membership. That is, if I ask for the subset of odd numbers from the set of positive integers, I get all of them back as a single set. I do not build the set of odd numbers by sequentially inspecting one element at a time. I define odd numbers with a rule ("If the remainder is ±1 when you divide the number by 2, it is odd") that could test any integer and classify it. Parallel processing is one of many, many advantages of having a set-oriented model. In RDBMS, everything happens all at once.
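The odd-number example can be made concrete with a small sketch that uses Python's standard sqlite3 module as the host language. The table name `Integers` and the ten sample rows are assumptions made for illustration. The first version navigates the rows one at a time, file-style; the second states the rule and lets the database return the whole subset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Integers (n INTEGER NOT NULL PRIMARY KEY CHECK (n > 0))"
)
conn.executemany(
    "INSERT INTO Integers (n) VALUES (?)", [(i,) for i in range(1, 11)]
)

# File-system mindset: fetch every row and inspect each one in the host program.
odd_by_loop = []
for (n,) in conn.execute("SELECT n FROM Integers"):
    if n % 2 != 0:
        odd_by_loop.append(n)

# Set-oriented mindset: declare the rule; the result is the whole subset.
odd_by_rule = {
    n for (n,) in conn.execute("SELECT n FROM Integers WHERE n % 2 <> 0")
}

print(sorted(odd_by_rule))
```

Both versions produce the same members, but only the second leaves the engine free to decide how (and in what order, or in parallel) the rule is applied.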
The Data Declaration Language (DDL) in SQL is what defines and controls access to the database content and maintains the integrity of that data for all programs that access the database. Data in a file is passive. It has no meaning until a program reads it. In COBOL, each program has a DATA DIVISION; in FORTRAN, each program has the FORMAT/READ statements; in Pascal, there is a RECORD declaration that serves the same purpose. Pick your non-SQL language.
These constructs provide a template or parsing rules to overlay upon the records in the file, split them into fields, and get the data into the host program. Each program can split up the sequence of characters in a record any way it wishes and name the fields as it wishes. This can lead to "job security" programming; I worked in a shop in the 1970s where one programmer would pick a theme (nations of the world, flowers, etc.) and name his fields "Afghanistan" or "Chrysanthemum" or worse. Nobody could read his code, so we could not fire him.
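A rough sketch of the difference, again in Python with the standard sqlite3 module. Everything named here is hypothetical: the record layout, the two "programs", and the `Personnel` table are invented for illustration only. The first half shows two programs overlaying different templates on the same 80-column record; the second half shows a table whose meaning and integrity rules are declared once, in the schema, for every program.

```python
import sqlite3

# One 80-column record. By itself it is only a sequence of characters.
record = ("CELKO".ljust(20) + "19760704" + "0004250").ljust(80)


def program_a(rec):
    """One program's template: name, hire date, amount in cents."""
    return {
        "name": rec[0:20].rstrip(),
        "hired": rec[20:28],
        "cents": int(rec[28:35]),
    }


def program_b(rec):
    """Another program splits and names the same characters its own way."""
    return {"afghanistan": rec[0:10], "chrysanthemum": rec[10:35]}


# In SQL the declaration lives in the schema, not in each program.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Personnel
    (emp_name   VARCHAR(20) NOT NULL PRIMARY KEY,
     hire_date  CHAR(10) NOT NULL,
     salary_amt INTEGER NOT NULL CHECK (salary_amt >= 0))
    """
)
conn.execute("INSERT INTO Personnel VALUES ('CELKO', '1976-07-04', 4250)")

try:
    # Any program that tries to store a negative amount is refused.
    conn.execute("INSERT INTO Personnel VALUES ('SMITH', '1980-01-01', -1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(program_a(record), program_b(record), rejected)
```

The file lets `program_b` do whatever it likes with the characters; the table refuses the bad row no matter which program sends it.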
Likewise, the Data Control Language (DCL) controls access to the schema objects that a user can create. Standard SQL divides the database users into USER and ADMIN roles. These schema objects require ADMIN privileges to be created, altered, or dropped (CREATE, ALTER, DROP, etc.). Those with...