1
Imperfection and Geographic Information
"We should learn to navigate on a sea of uncertainties, sailing in and around islands of certainty"
Edgar Morin, Seven Complex Lessons in Education for the Future (2000)
"Uncertainty is not in things but in our head: uncertainty is a lack of knowledge"
Jacques Bernoulli, Ars Conjectandi (1713)
1.1. Context
Today, geographic information is everywhere. With the constant development of new information and communication technologies, we are witnessing a significant increase in the number of sources of georeferenced data. Data are acquired by IT (information technology) means, such as connected objects, computers, mobile equipment, and through remote sensing, and are then processed in Geographic Information Systems (GISs). The increasing systematization of the automated acquisition of geographic data is paving the way for ever more numerous and complex applications.
In several fields, the terms "data" and "information" are often treated as interchangeable. Yet many authors distinguish between the two concepts [COO 17]. A piece of data corresponds to a value. It may be seen as the assignment of a value to a property, for example, City = "Paris". Sometimes, the types of data are complex, as is the case for multimedia data. When data are processed, organized together, and structured in a precise context, we refer to them as information. In IT, knowledge often corresponds to rules and models that rely on information [BEL 04]. A knowledge base makes it possible, among other things, to reason and make deductions [ABI 00, NIL 90].
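The distinction between data, information, and knowledge can be illustrated with a minimal sketch. The record fields, the approximate coordinates, and the `hemisphere` rule below are illustrative assumptions and are not taken from the cited works.

```python
# Data: a single value assigned to a property.
datum = ("City", "Paris")

# Information: several data organized together in a precise context
# (here, a hypothetical gazetteer record with approximate coordinates).
record = {
    "City": "Paris",
    "Country": "France",
    "Latitude": 48.86,   # decimal degrees, approximate
    "Longitude": 2.35,   # decimal degrees, approximate
}


# Knowledge: a rule that relies on information to make a deduction.
# This rule is a hypothetical example chosen for its simplicity.
def hemisphere(rec):
    """Deduce the hemisphere of a record from its latitude."""
    return "northern" if rec["Latitude"] >= 0 else "southern"


print(datum[0], "=", datum[1])
print(record["City"], "lies in the", hemisphere(record), "hemisphere")
```

The rule is kept separate from the record on purpose: the same piece of knowledge can be applied to any record that carries a latitude.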
Information and data may be geographic or spatial [BEA 19]. "Geographic" is the adjective used when we refer to the Earth. In the field on which this book focuses, the term "spatial" usually refers to a localization (coordinates, topology, etc.) in some type of space (whether geographic or not). A spatial or geographic object has a geometry (a dimension, a shape, some coordinates) that may be more or less known or established. Different properties may be assigned to the object depending on its meaning. The field that studies the methods and technologies linked to geographic information (from its acquisition to its dissemination) is called "geomatics". The geomatic paradigm was born in Canada [BÉD 07].
Objects are often affected by imperfections. In the literature, various terms are used to refer to these imperfections, so it is difficult to settle on a single terminology. Depending on the point of view, the same term may be defined differently.
The imperfection of geographic information and data is often neglected, which creates risks when they are used [BÉD 86, EDO 15]. These risks are significant, for example, when data are used to support decision-making. Imperfection often derives from a restriction that hinders the correct identification of an object and/or the accurate measurement of its properties [BÉD 86]. In most cases, a representation said to be certain is used even if the object has not been completely defined. There will then be a difference between the object and its representation. Determining this difference is a difficult and intricate task: conveying it implies an "actual world" independent of the observer, which is often difficult and complex [BÉD 86], as the objects of the actual world are in general perceived and known through observations. According to [FIS 99], the main problem concerns the way in which a data collector and a data user understand the nature of the uncertainties, which may be of different kinds.
As this book demonstrates, it is possible to avoid overlooking data imperfection. There are solutions that allow us to manipulate imperfect geographic data effectively. Over time, various specific techniques and methods have been put forward to define, represent, and deal with the imperfection of a geographic object. Each of them may be used in relation to the level of quality expected and the application targeted. As [EDO 15] and [BÉD 86] recall, using imperfect data may indeed be acceptable for some uses but not for others. This book aims to present some of the techniques and methods used to manage the imperfection of geographic data.
To give a (very general) indication of the trend in research on the uncertainty of spatial data, Table 1.1 and Figure 1.1 present the results of searches in Scopus, a bibliographic database of scientific literature. A 25-year interval (1994-2018) is considered. Column A indicates the number of scientific publications whose keywords include the terms "spatial data" and "uncertainty". Column B shows the number of publications that include "spatial data" in their keywords. Column C shows the ratio A/B, which amounts to 2.25% over these 25 years. The chosen terms, i.e. "spatial data" and "uncertainty", are quite emblematic of the topic we are focused on. Yet the Scopus searches could certainly be refined, especially by testing various keywords for concepts related to spatial data and information as well as imprecision.
In this chapter, we introduce the different parts of this book and the issues they tackle. We have chosen to structure this book into three parts: an introduction to the foundations and main concepts, a part on the modes of representation, and a description of reasoning systems and processes.
1.2. Concepts, representation, reasoning system, and data processing
1.2.1. Foundations and concepts
The first part describes the foundations and main concepts related to the imperfection of geographic data. The aim is to clarify and summarize the terminologies, the origins of imperfections, and the concepts of quality, integrity, and confidence.
The main goal of this chapter is to clarify the terminology and the definitions assigned to various concepts that revolve around the imperfection and uncertainty of geographic information. These terms have been used in different ways over the years. This chapter will underline some definitions that can be found in the field. The analysis put forward does not lead to a new terminology. Rather, it brings into relief the diversity of uses while also highlighting the main differences and similarities between the concepts and the terminologies.
Chapter 2 introduces the principal sources of imperfections. It attempts to answer the following question: "where do the imperfections of geographic data originate?". Naturally, there is more than one answer to this question. There are different causes behind these imperfections. One of the aims of this chapter is to show and illustrate imperfection at various points during the life cycle of geographic information.
Chapter 3 provides a basic explanation of the quality and integrity of data. On several occasions, it recalls standard quality criteria as well as the way in which they are assessed. This chapter establishes the notions of data integrity and confidence, and it concretely illustrates the various problems related to these concepts through examples drawn from the field of maritime navigation.
1.2.2. Representations of imperfection
Part 2 tackles the main representations of imperfection and their applications for geographic information.
Chapter 5 describes various modeling formalisms, especially fuzzy sets and the means of representing confidence and certainty (probability, possibility, necessity, etc.). It also presents the operations used to manipulate these concepts and reveals how spatial entities like broad boundary objects and fuzzy objects can be modeled.
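As a hedged illustration of the formalisms named above, the following minimal sketch shows a fuzzy membership function and the possibility and necessity measures of standard possibility theory. The "near" thresholds, the land-cover labels, and the possibility degrees are hypothetical values chosen for the example; they do not come from this book.

```python
def membership_near(distance_km, full=1.0, zero=5.0):
    """Degree in [0, 1] to which a distance counts as 'near'.

    Distances up to `full` are fully near, distances from `zero` onwards
    are not near at all, with a linear decrease in between.
    Both thresholds are hypothetical.
    """
    if distance_km <= full:
        return 1.0
    if distance_km >= zero:
        return 0.0
    return (zero - distance_km) / (zero - full)


def possibility(event, pi):
    """Possibility of an event: the highest degree among its outcomes."""
    return max((pi[x] for x in event), default=0.0)


def necessity(event, pi):
    """Necessity of an event: 1 minus the possibility of its complement."""
    complement = set(pi) - set(event)
    return 1.0 - possibility(complement, pi)


# Hypothetical possibility distribution over the land cover of one parcel.
pi = {"forest": 1.0, "shrubland": 0.7, "grassland": 0.2}
wooded = {"forest", "shrubland"}

print(membership_near(3.0))     # partial membership between the thresholds
print(possibility(wooded, pi))  # how possible it is that the parcel is wooded
print(necessity(wooded, pi))    # how certain it is that the parcel is wooded
```

Necessity never exceeds possibility here because the distribution is normalized (at least one outcome has degree 1); the gap between the two values expresses how much remains unknown.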
Chapter 6 focuses on the representation of classes of objects. When several objects share the same properties and are of the same type, they can be grouped into classes. Elements of the same class share the same characteristics. Thus, establishing classes means identifying what the various entities have in common. Defining classes is very important when a dataset must be described. This chapter shows how data imperfection can be described when drawing class diagrams.
1.2.3. Reasoning systems and data processing
Part 3 introduces a few data processing and reasoning systems that involve spatial objects. Imperfection is considered in relation to our knowledge about the objects.
Chapter 7 concerns the spatial relations among objects. It reveals how it is possible to reason specifically about these relations and then move on to modeling these relations on imperfect objects such as broad boundary objects in space.
Chapter 8 deals with a type of knowledge that is founded on rules and deduction. This chapter presents reasoning approaches based on rules expressed in first-order logic and then in modal logic. Modal logic can describe uncertainties. The chapter includes an example involving geographic data to make the approach easier to understand.
Chapter 9 deals with the case involving the gradual acquisition of information and the repeated revision of the state of knowledge. This necessity is all the more significant as geographic data may be acquired in various ways over time and their sources may be...