Chapter 2 Code Relationally It is necessary to develop a relational mindset to become a successful Db2 programmer who codes with performance in mind. But what does this mean? First, we must understand what a relational database system is and how that differs from other types of data storage. What is a Database? Before we talk about relational database, let's first answer the question: "What is a database?" A database is a large structured set of persistent data. So, a phone book is a database. But within the world of IT, a database usually is associated with software. A simple database might be a single file containing many records, each of which contains the same fields having the same data type and length. In short, a database is an organized store of data wherein the data is accessible by named data elements. A Database Management System (DBMS) is a software package designed to create, store, and manage databases. The DBMS software enables end users or application programmers to share data. It provides a systematic method of creating, updating, retrieving and storing information in a database. DBMSs are generally responsible for data integrity, data access control, and automated rollback, restart and recovery. In layman's terms, you can think of a database as a file of information. You can think of the filing cabinet itself along with the file folders and labels as the DBMS. A DBMS manages databases. You implement and access database instances using the capabilities of the DBMS. Db2 is a database management system. Your payroll application uses the payroll database, which may be implemented using Db2 (or some other DBMS). It is important to understand this distinction to avoid confusion as we move forward. Relational Database Systems Relational database systems became the norm in IT in the 1980s as low-cost servers became powerful enough to make them widely practical and relatively affordable. There are other types of database systems available (such as NoSQL, hierarchical, and network) but the RDBMS, of which Db2 is one of the leading offerings, continues to be the leader in terms of usage, revenue, and installed base. Relational technology is based on the mathematics of set theory. Relational databases provide data storage, access and protection with reasonable performance for most applications, whether operational or analytical in nature. The RDBMS is adaptable to most use cases in a reliable and efficient way. The term relational comes from the mathematical term relation. In set theory, a relation is a set of unordered elements - all of the same type. A relational DBMS is based on relations. This overview of relational theory offers a quick introduction. It is important to note that today's database systems that are referred to as relational do not conform to all the requirements and definition of relational theory. For additional references that can offer more details, consult the Bibliography at the end of this chapter. How to Think About Data in a Db2 Database Working with a relational mindset is an important requirement for writing efficient Db2 application programs. Doing so requires an understanding of how data is stored and referenced by a relational DBMS like Db2. A database is a complex set of inter-related data designed for a specific intent. Do not think of the database as a set of files. Files have no relationships set within and among them, whereas your database does. Furthermore, do not think of tables as files because they are based on sets: Sets are not ordered whereas files have a specific order to them. Although there are performance-specific physical storage details that are important to learn (and we will cover them later) your relational mindset should be that tables are unordered sets of data. And members of each set are all of the same type. That means that each row has the same number of columns each of the same data type. When you perform an operation on a set, the action happens "all at once" to all the members of the set. Programmers tend to think in terms of sequential operations such as: Read x, multiply it by 2, save it to a new location, read another x until there are no more. We can accomplish all of this in one SQL statement with something like this: There is no looping and all the actions are contained in the single SQL statement. It is imperative to be able to think this way to write Db2 applications programs that perform well. Additionally, rows are not records. Records contained in files or data sets are sequential, stored in the order they were written. Db2 rows have no specific physical order and can be accessed by coding the appropriate SQL WHERE clauses. Finally, columns are not fields. Columns are typed and can be null. This is not so for fields. Without a program, a field has no meaning. How This Should Impact Your Coding Application developers accustomed to processing data a record-at-a-time will make very poor Db2 programmers without some training in the set-at-a-time nature of accessing data in a relational database. If you have experience programming with flat files you must unlearn the "flat file" mentality. Forget sequentially accessing data record by record. Access what you need using the features of SQL. Master file processing is not appropriate for optimal Db2 applications. With master file processing two or more files are read with one driving reads to the other. For example, consider a program designed to send offers to all customers who purchased dairy items in November. The master file approach would read the customer purchase history file looking for dairy items purchased in November. When it finds one it will take the customer id read from the history file and use it to read from the customer file to gather the customer address. The SQL approach simply joins the two tables (customer purchase history and customer) using the customer id with where conditions to limit the output to dairy items in November. Here is what the SQL solution looks like: Another aspect of coding relationally is to understand cursors. Remember, a SQL select statement can return multiple rows. A cursor is used to enable application programs to access individual rows. The select statement is assigned to a cursor, which is opened by the program, and then rows are fetched from the cursor one by one. For example: When you open a cursor in your program to process a SQL statement it is not the same as opening a file. Opening a cursor can cause a lot of activity to occur (e.g. sorting) whereas opening a file is a benign operation. Set-at-a-Time Processing and Relational Closure Every operation performed on a relational database operates on a table (or set of tables) and results in another table. This feature of relational databases is called relational closure. All SQL data manipulation operations-Select, Insert, Update, Delete-are performed at a set level. One retrieval statement can return multiple rows; one modification can modify multiple rows. Application developers must learn a different way of interacting with data because of the set-at-a-time nature of SQL. Most programming languages operate on data one record-at-a-time. When a program requires relational data, though, it must request the data using SQL, creating an impedance mismatch. The program expects data to be returned a single row-at-a-time, but SQL returns data a set-at-a-time. Db2 provides a feature called a cursor that accepts the input from a SQL request and provides a mechanism to fetch individual rows of the results set. Some programming tools automatically transform multi-row sets to single rows when communicating with Db2. Furthermore, many programmers are accustomed to hard-wiring data-navigational instructions into their programs. SQL specifies what to retrieve but not how to retrieve it. Db2 determines how best to retrieve the data based on the request. Programmers unaccustomed to database processing are unlikely to grasp this concept without some training. Relational Optimization Db2 determines the best method for accessing and modifying data based on your SQL statements, information about your system and database statistics. The same SQL statement can be optimized by Db2 to do the same work in many ways. This is a key benefit of using SQL instead of writing host language code. There is a component of Db2, known as the Optimizer, that processes SQL and creates executable code for it. The Optimizer is very complex and understanding all the nuances of how it works is not something most Db2 programmers need to know. There are important aspects of optimization, such as types of access paths, filter factors, and indexing, that will be covered later in this text. Let Db2 Do the Work An important guideline for coding relationally is to let Db2 do as much work as...