1 Introduction.
1.1 Database Concepts.
1.2 DBE Architectural Concepts.
1.3 Archetypical DBE Architectures.
1.4 A New Taxonomy.
1.5 An Example DDBE.
1.6 A Reference DDBE Architecture.
1.7 Transaction Management in Distributed Systems.
1.8 Summary.
1.9 Glossary.
References.
2 Data Distribution Alternatives.
2.1 Design Alternatives.
2.2 Fragmentation.
2.3 Distribution Transparency.
2.4 Impact of Distribution on User Queries.
2.5 A More Complex Example.
2.6 Summary.
2.7 Glossary.
Exercises.
3 Database Control.
3.1 Authentication.
3.2 Access Rights.
3.3 Semantic Integrity Control.
3.4 Distributed Semantic Integrity Control.
3.5 Cost of Semantic Integrity Enforcement.
3.6 Summary.
3.7 Glossary.
4 Query Optimization.
4.1 Sample Database.
4.2 Relational Algebra.
4.3 Computing Relational Algebra Operators.
4.4 Query Processing in Centralized Systems.
4.5 Query Processing in Distributed Systems.
4.6 Summary.
4.7 Glossary.
5 Controlling Concurrency.
5.1 Terminology.
5.2 Multitransaction Processing Systems.
5.3 Centralized DBE Concurrency Control.
5.4 Concurrency Control in Distributed Database Systems.
5.5 Summary.
5.6 Glossary.
6 Deadlock Handling.
6.1 Deadlock Definition.
6.2 Deadlocks in Centralized Systems.
6.3 Deadlocks in Distributed Systems.
6.4 Summary.
6.5 Glossary.
7 Replication Control.
7.1 Replication Control Scenarios.
7.2 Replication Control Algorithms.
7.3 Summary.
7.4 Glossary.
8 Failure and Commit Protocols.
8.1 Terminology.
8.2 Undo/Redo and Database Recovery.
8.3 Transaction States Revisited.
8.4 Database Recovery.
8.5 Other Types of Database Recovery.
8.6 Recovery Based on Redo/Undo Processes.
8.7 The Complete Recovery Algorithm.
8.8 Distributed Commit Protocols.
8.9 Summary.
8.10 Glossary.
9 DDBE Security (Bradley A. Rubin).
9.1 Cryptography.
9.2 Securing Communications.
9.3 Securing Data.
9.4 Architectural Issues.
9.5 A Typical Deployment.
9.6 Summary.
9.7 Glossary.
10 Data Modeling Overview.
10.1 Categorizing MLs and DMs.
10.2 The Conceptual Level of the CLP.
10.3 Conceptual Modeling Language Examples.
10.4 Working With Data Models.
10.5 Using Multiple Types of Modeling.
10.6 Summary.
10.7 Glossary.
11 Logical Data Models.
11.1 The RDM.
11.2 The Network Data Model.
11.3 The Hierarchical Data Model.
11.4 The OODM.
11.5 Summary.
11.6 Glossary.
12 Traditional DDBE Architectures.
12.1 Applying Our Taxonomy to Traditional DDBE Architectures.
12.2 The MDBS Architecture Classifications.
12.3 Approaches for Developing A DDBE.
12.4 Deployment of DDBE Software.
12.5 Integration Challenges.
12.6 Schema Integration Example.
12.7 Example of Existing Commercial DDBEs.
12.8 The Experiment.
12.9 Summary.
12.10 Glossary.
13 New DDBE Architectures.
13.1 Cooperative DBEs.
13.2 Peer-to-Peer DDBEs.
13.3 Comparing COOP and P2P.
13.4 Summary.
13.5 Glossary.
14 DDBE Platform Requirements.
14.1 DDBE Architectural Vocabulary.
14.2 Fundamental Platform Requirements.
14.3 Distributed Process Platform Requirements.
14.4 Distributed Data Platform Requirements.
14.5 Preview of the DDBE Platforms Used in Chapters 15-19.
14.6 Summary.
14.7 Glossary.
15 The JMS Starter Kit.
15.1 Java Message Service Overview.
15.2 JMS Provider Implementation Alternatives.
15.3 JMS Starter Kit (JMS-SKIT) Framework Overview.
15.4 Using the JMS-SKIT Framework.
15.5 Summary.
15.6 Glossary.
16 The J2EE Platform.
16.1 Java 2 Enterprise Edition (J2EE) Overview.
16.2 J2EE Support for Distributed Process Platform Requirements.
16.3 J2EE Support for Distributed Data Platform Requirements.
16.4 J2EE Platform Implementation Alternatives.
16.5 Summary.
16.6 Glossary.
17 The J2EE Starter Kit.
17.1 Java 2 Enterprise Edition Starter Kit (J2EE-SKIT) Overview.
17.2 J2EE-SKIT Design Overview.
17.3 Summary.
17.4 Glossary.
18 The Microsoft .NET Platform.
18.1 Platform Overview.
18.2 Support for Distributed Process Platform Requirements.
18.3 Distributed Data Platform Requirements.
18.4 Summary.
18.5 Glossary.
19 The DNET Starter Kit.
19.1 DNET-SKIT Overview.
19.2 DNET-SKIT Design Overview.
19.3 Summary.
19.4 Glossary.
Reference.
Index.
Distributed: (adjective) of, relating to, or being a computer network in which at least some of the processing is done by the individual workstations and information is shared by and often stored at the workstations.
-Merriam-Webster's 11th Collegiate Dictionary
Database (noun) a [sic] usually large collection of data organized especially for rapid search and retrieval (as by a computer).
Informally speaking, a database (DB) is simply a collection of data stored on a computer, and the term distributed simply means that more than one computer might cooperate to perform some task. Most people working with distributed databases would accept both of the preceding definitions without reservation. Unfortunately, achieving the same level of consensus is not as easy for any of the other concepts involved with distributed databases (DDBs). A DDB is not simply "more than one computer cooperating to store a collection of data." That definition is too broad, because it would include situations that are not really distributed databases, such as a machine that contains a DB and also mounts a remote file system from another machine. It is also too narrow, because it would exclude any scenario where we deploy a DDB on a single computer. Even when a DDB is deployed using only one computer, it remains a DDB, because it is still possible to deploy it across multiple computers. Often, in order to discuss a particular approach for implementing a DB, we need to use more restrictive and specific definitions. This means that the same terms might have conflicting definitions when we consider more than one DB implementation alternative. This can be very confusing when researching DBs in general, and especially confusing when focusing on DDBs. Therefore, in this chapter, we will present some definitions and archetypical examples along with a new taxonomy. We hope that these will help to minimize the confusion and make it easier to discuss multiple implementation alternatives throughout the rest of the book.
Whenever we use the term "DB" in this book, we are always contemplating a collection of persistent data. This means that we "save" the data to some form of secondary storage (the data is usually written to some form of hard disk). As long as we shut things down in an orderly fashion (following the correct procedures, as opposed to experiencing a power failure or hardware failure), all the data written to secondary storage should still exist when the system comes back online. We can usually think of the data in a DB as being stored in one or more files, possibly spanning several partitions or even several hard disk drives, even if the data is actually stored in something more sophisticated than a simple file.
Every DB captures data in two interdependent respects: it captures both the data structure and the data content. The term "data content" refers to the values actually stored in the DB; usually this is what we mean when we simply say "data." The term "data structure" refers to all the details that describe how the data is stored, including the format, length, and location of the data, as well as details that identify how the data's internal parts and pieces are interconnected. When we want to talk about the structure of data, we usually refer to it as the data model (DM), also called the DB's schema, or simply the schema. Often, we will use a special language or programmatic facility to create and modify the DM. Authors sometimes refer to this facility or language as "the data model" as well, but to be more precise, it is actually the data modeling language (ML), even when there is no textual language. The DM captures many details about the data being stored, but it does not include the actual data content. We call all of these details in the DM metadata, which is informally defined as "data about data" or "everything about data except the content itself." We will revisit data models and data modeling languages in Chapters 10 and 11.
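The distinction between data structure and data content can be sketched with a toy example. Everything below (the field names, the employee records, the representation of the schema as a list of dictionaries) is hypothetical and chosen purely for illustration; a real DBMS stores its metadata in far more sophisticated internal catalogs.

```python
# Hypothetical sketch: 'schema' is metadata (the data model); 'rows' is the
# data content. Names and structure are invented for illustration.

# Metadata: describes structure only -- field names, types, lengths. No values.
schema = [
    {"name": "emp_id",   "type": "INTEGER", "length": 4},
    {"name": "emp_name", "type": "VARCHAR", "length": 40},
]

# Content: the values actually stored, shaped by the schema above.
rows = [
    {"emp_id": 1, "emp_name": "Alice"},
    {"emp_id": 2, "emp_name": "Bob"},
]

# Every row must conform to the metadata: exactly the fields the schema names.
field_names = {col["name"] for col in schema}
assert all(set(row) == field_names for row in rows)
print(f"{len(schema)} columns, {len(rows)} rows")
```

Note that deleting every entry from `rows` would leave the DB empty but would not change the DM at all; conversely, a "schema" operation (say, adding a column) changes the metadata without itself adding any content.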
Usually, we want to perform several different kinds of operations on DBs. Every DB must at least support the ability to "create" new data content (store new data values in the DB) and the ability to retrieve existing data content. After all, if we could not create new data, then the DB would always be empty! Similarly, if we could not retrieve the data, then the data would serve no purpose. However, these operations do not need to share the same kind of interface; for example, the data creation facility might run as a batch process while the retrieval facility supports interactive requests from a program or user. We usually expect newer DB software to support much more sophisticated operations than these minimum requirements dictate. In particular, we usually want the ability to update and delete existing data content. We call this set of operations CRUD (which stands for "create, retrieve, update, and delete"). Most modern DBs also support similar operations involving the data structures and their constituent parts. Even when a DB supports these additional "schema CRUD" operations, complicated restrictions that depend on the ML, and sometimes on very idiosyncratic deployment details, can prevent some schema operations from succeeding.
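The four CRUD operations on data content can be sketched as a minimal in-memory store. The class and method names below are our own invention, not an API from the book; a real DBMS layers persistence, concurrency control, and recovery on top of these basics.

```python
# Minimal sketch of the four CRUD operations (create, retrieve, update,
# delete) on data content. All names here are illustrative.

class TinyStore:
    def __init__(self):
        self._data = {}        # key -> value; stands in for stored content
        self._next_key = 1

    def create(self, value):   # C: store a new value, return its new key
        key = self._next_key
        self._next_key += 1
        self._data[key] = value
        return key

    def retrieve(self, key):   # R: fetch an existing value (None if absent)
        return self._data.get(key)

    def update(self, key, value):   # U: overwrite an existing value
        if key not in self._data:
            raise KeyError(key)
        self._data[key] = value

    def delete(self, key):     # D: remove an existing value
        del self._data[key]

store = TinyStore()
k = store.create({"name": "Alice"})
store.update(k, {"name": "Alice B."})
print(store.retrieve(k))   # -> {'name': 'Alice B.'}
store.delete(k)
print(store.retrieve(k))   # -> None
```

The "schema CRUD" operations mentioned above would be the analogous quartet applied to the structure itself (adding, inspecting, altering, and dropping fields) rather than to the stored values.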
Some DBs support operations that are even more powerful than schema and data CRUD operations. For example, many DBs support the concept of a query, which we will define as "a request to retrieve a collection of data that can potentially use complex criteria to broaden or limit the collection of data involved." Likewise, many DBs support the concept of a command, which we will define as "a request to create new data, to update existing data, or to delete existing data, potentially using complex criteria similar to a query." Most modern DBs that support both queries and commands even allow us to use separate queries (called subqueries) to specify the complex criteria for these operations.
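The relationship between a query's complex criteria and a subquery can be sketched in plain Python, without committing to any particular query language. The department and employee data below are invented for illustration:

```python
# Sketch: an outer "query" whose criteria are computed by a "subquery."
# All data and field names here are hypothetical.

departments = [
    {"dept_id": 10, "name": "Sales",       "budget": 500_000},
    {"dept_id": 20, "name": "Research",    "budget": 2_000_000},
    {"dept_id": 30, "name": "Maintenance", "budget": 100_000},
]
employees = [
    {"emp_id": 1, "name": "Alice", "dept_id": 20},
    {"emp_id": 2, "name": "Bob",   "dept_id": 10},
    {"emp_id": 3, "name": "Carol", "dept_id": 20},
]

# "Subquery": find the departments with large budgets.
big_depts = {d["dept_id"] for d in departments if d["budget"] > 1_000_000}

# Outer "query": its retrieval criteria depend on the subquery's result.
result = [e["name"] for e in employees if e["dept_id"] in big_depts]
print(result)   # -> ['Alice', 'Carol']
```

A command could use the same pattern: for instance, deleting or updating exactly the employees selected by the subquery-driven criteria rather than retrieving them.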
Any DB that supports CRUD operations must consider concurrent access and conflicting operations. Anytime two or more requests (any combination of queries and commands) attempt to access overlapping collections of data, we have concurrent access. If all of the operations are only retrieving data (no creation, update, or deletion), then the DB can implement the correct behavior without needing any sophisticated logic. If any one of the operations needs to perform a write (create, update, or delete), then we have conflicting operations on overlapping data. Whenever this happens, there are potential problems: if the DB simply allows all of the operations to execute, then the execution order can change the results seen by the programs or users making the requests. In Chapters 5, 6, and 8, we will discuss the techniques that a DB might use to control these situations.
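Why execution order matters can be shown with the classic lost-update scenario, sketched here as two hand-interleaved "transactions" rather than real concurrent threads. The account balance and deposit amounts are invented for illustration:

```python
# Sketch of order-dependent results: two "transactions" each read a balance,
# add a deposit, and write the sum back. The numbers are illustrative.

balance = 100

# Serial execution: T1 completes before T2 starts -> both deposits survive.
b1 = balance + 50        # T1 reads 100, writes 150
b2 = b1 + 30             # T2 reads 150, writes 180
serial_result = b2

# Interleaved execution: both transactions read before either writes.
r1 = balance             # T1 reads 100
r2 = balance             # T2 also reads 100
w1 = r1 + 50             # T1 writes 150
w2 = r2 + 30             # T2 writes 130, silently overwriting T1's deposit
interleaved_result = w2

print(serial_result, interleaved_result)   # -> 180 130
```

The interleaved schedule loses T1's deposit entirely, which is exactly the kind of anomaly the concurrency-control techniques of Chapters 5, 6, and 8 are designed to prevent.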
When DBs are used to capture large amounts of data content or complex data structures, the potential for errors becomes an important concern, especially when the size and complexity make human verification impractical. In order to address these potential errors and other issues (like the conflicting-operation scenario that we mentioned earlier), we need to use some specialized software. The DB vendor can deploy this specialized software as a library, as a separate program, or as a collection of separate programs and libraries; the last of these is the most common deployment. Regardless of the deployment, we call this specialized software a database management system (DBMS).
There is no real standard definition for a DBMS, but when a DBMS is deployed using one or more programs, this collection of programs is usually referred to as the DB-Server. Any application program that needs to connect to a DB is usually referred to as the DB-Client. Some authors consider the DB-Server and the DBMS to be equivalent (if there is no DB-Server, then there is no DBMS), so the terms DBMS-Server and DBMS-Client are also very common. However, even when there is no DB-Server, the application using the DB is still usually called the DB-Client. Different DBMS implementations have different restrictions; for example, some DBMSs can manage more than one DB, while other implementations require a separate DBMS for each DB.
Because of these differences (and many more that we will not discuss here), it is sometimes difficult to compare different implementations and deployments. Simply using the term "DBMS" can suggest certain features or restrictions in the mind of the reader that the author did not intend. For example, we expect most modern DBMSs to provide certain facilities, such as some mechanism for defining and enforcing integrity constraints-but these facilities are not necessarily required for all situations. If we were to use the term "DBMS" in one of these situations where these "expected" facilities were not required, the reader might incorrectly assume that the...