
Computational Models for Cognitive Vision
Description
Computational Models for Cognitive Vision formulates computational models for the cognitive principles found in biological vision, and applies those models to computer vision tasks. Such principles include perceptual grouping, attention, visual quality and aesthetics, knowledge-based interpretation and learning, to name a few. The author's ultimate goal is to provide a framework for creating a machine vision system with the capability and versatility of human vision.
Written by Dr. Hiranmay Ghosh, the book takes readers through the basic principles and the computational models for cognitive vision, Bayesian reasoning for perception and cognition, and other related topics, before establishing the relationship of cognitive vision with the multi-disciplinary field broadly referred to as "artificial intelligence". The principles are illustrated with diverse application examples in computer vision, such as computational photography, digital heritage and social robots. The author concludes with suggestions for future research and salient observations about the state of the field of cognitive vision.
Other topics covered in the book include:
· knowledge representation techniques
· evolution of cognitive architectures
· deep learning approaches for visual cognition
Undergraduate students, graduate students, engineers, and researchers interested in cognitive vision will consider this an indispensable and practical resource in the development and study of computer vision.

Person
HIRANMAY GHOSH, PHD, was a Research Advisor to TATA Consultancy Services and an Adjunct Faculty Member with the National Institute of Technology Karnataka. During his long professional career, he has served several reputed organizations, including CMC, ECIL, C-DOT, and TCS. He was also an Adjunct Faculty Member with IIT Delhi. He is a Senior Member of IEEE, a Life Member of IUPRAI, and a Member of ACM.
Content
About the Author
Acknowledgments
Preface
Acronyms
1 Introduction
1.1 What Is Cognitive Vision
1.2 Computational Approaches for Cognitive Vision
1.3 A Brief Review of Human Vision System
1.4 Perception and Cognition
1.5 Organization of the Book
2 Early Vision
2.1 Feature Integration Theory
2.2 Structure of Human Eye
2.3 Lateral Inhibition
2.4 Convolution: Detection of Edges and Orientations
2.5 Color and Texture Perception
2.6 Motion Perception
2.6.1 Intensity-Based Approach
2.6.2 Token-Based Approach
2.7 Peripheral Vision
2.8 Conclusion
3 Bayesian Reasoning for Perception and Cognition
3.1 Reasoning Paradigms
3.2 Natural Scene Statistics
3.3 Bayesian Framework of Reasoning
3.4 Bayesian Networks
3.5 Dynamic Bayesian Networks
3.6 Parameter Estimation
3.7 On Complexity of Models and Bayesian Inference
3.8 Hierarchical Bayesian Models
3.9 Inductive Reasoning with Bayesian Framework
3.9.1 Inductive Generalization
3.9.2 Taxonomy Learning
3.9.3 Feature Selection
3.10 Conclusion
4 Late Vision
4.1 Stereopsis and Depth Perception
4.2 Perception of Visual Quality
4.3 Perceptual Grouping
4.4 Foreground-Background Separation
4.5 Multi-stability
4.6 Object Recognition
4.6.1 In-Context Object Recognition
4.6.2 Synthesis of Bottom-Up and Top-Down Knowledge
4.6.3 Hierarchical Modeling
4.6.4 One-Shot Learning
4.7 Visual Aesthetics
4.8 Conclusion
5 Visual Attention
5.1 Modeling of Visual Attention
5.2 Models for Visual Attention
5.2.1 Cognitive Models
5.2.2 Information-Theoretic Models
5.2.3 Bayesian Models
5.2.4 Context-Based Models
5.2.5 Object-Based Models
5.3 Evaluation
5.4 Conclusion
6 Cognitive Architectures
6.1 Cognitive Modeling
6.1.1 Paradigms for Modeling Cognition
6.1.2 Levels of Abstraction
6.2 Desiderata for Cognitive Architectures
6.3 Memory Architecture
6.4 Taxonomies of Cognitive Architectures
6.5 Review of Cognitive Architectures
6.5.1 STAR: Selective Tuning Attentive Reference
6.5.2 LIDA: Learning Intelligent Distribution Agent
6.6 Biologically Inspired Cognitive Architectures
6.7 Conclusions
7 Knowledge Representation for Cognitive Vision
7.1 Classicist Approach to Knowledge Representation
7.1.1 First Order Logic
7.1.2 Semantic Networks
7.1.3 Frame-Based Representation
7.2 Symbol Grounding Problem
7.3 Perceptual Knowledge
7.3.1 Representing Perceptual Knowledge
7.3.2 Structural Description of Scenes
7.3.3 Qualitative Spatial and Temporal Relations
7.3.4 Inexact Spatiotemporal Relations
7.4 Unifying Conceptual and Perceptual Knowledge
7.5 Knowledge-Based Visual Data Processing
7.6 Conclusion
8 Deep Learning for Visual Cognition
8.1 A Brief Introduction to Deep Neural Networks
8.1.1 Fully Connected Networks
8.1.2 Convolutional Neural Networks
8.1.3 Recurrent Neural Networks
8.1.4 Siamese Networks
8.1.5 Graph Neural Networks
8.2 Modes of Learning with DNN
8.2.1 Supervised Learning
8.2.1.1 Image Segmentation
8.2.1.2 Object Detection
8.2.2 Unsupervised Learning with Generative Networks
8.2.3 Meta-Learning: Learning to Learn
8.2.3.1 Reinforcement Learning
8.2.3.2 One-Shot and Few-Shot Learning
8.2.3.3 Zero-Shot Learning
8.2.3.4 Incremental Learning
8.2.4 Multi-task Learning
8.3 Visual Attention
8.3.1 Recurrent Attention Models
8.3.2 Recurrent Attention Model for Video
8.4 Bayesian Inferencing with Neural Networks
8.5 Conclusion
9 Applications of Visual Cognition
9.1 Computational Photography
9.1.1 Color Enhancement
9.1.2 Intelligent Cropping
9.1.3 Face Beautification
9.2 Digital Heritage
9.2.1 Digital Restoration of Images
9.2.2 Curating Dance Archives
9.3 Social Robots
9.3.1 Dynamic and Shared Spaces
9.3.2 Recognition of Visual Cues
9.3.3 Attention to Socially Relevant Signals
9.4 Content Re-purposing
9.5 Conclusion
10 Conclusion
10.1 "What Is Cognitive Vision" Revisited
10.2 Divergence of Approaches
10.3 Convergence on the Anvil?
References
Index
1
Introduction
The human vision system (HVS) has a remarkable capability of building three-dimensional models of the environment from the visual signals received through the eyes. The goal of computer vision research is to emulate this capability on man-made apparatus, such as computers. The twentieth century saw tremendous growth in the field of computer vision. Starting with signal processing techniques for demarcating objects in the space-time continuum of visual signals, the field has embraced several other disciplines like artificial intelligence and machine learning for interpreting visual content. As research in computer vision matured, it was pushed to address several real-life problems toward the turn of the century. Examples of such challenging applications include visual surveillance, medical image analysis, computational photography, digital heritage, robotic navigation, and so on.
Though computer vision has shown extremely promising results in many applications in restricted domains, its performance lags that of HVS by a large margin. While HVS can effortlessly interpret complex scenes, e.g. those shown in Figure 1.1, artificial vision fails to do so. It is "intuitive" for humans to comprehend the semantics of the scenes at multiple levels of abstraction, and to predict the next movements with some degree of certainty. Derivation of such semantics remains a formidable challenge for artificial vision systems. Further, many real-life applications demand analysis of imperfect imagery, for example with poor lighting, blur, occlusions, noise, background clutter, and so forth. While human vision is robust to such imperfections, computer vision systems often fail to perform in such cases. These observations have motivated deeper study of HVS and the application of the principles involved to computer vision.
Figure 1.1 Hard challenges for computer vision. (a) "The offensive player is about to shoot the ball at the goal." (b) A facial expression in Bharatnatyam dance.
Source: File shared by Rick Dikeman through Wikimedia Commons, file name: Football_iu_1996.jpg.
Source: File shared by Suyash Dwivedi through Wikimedia Commons, file name: Bharatnatyam_different_facial_expressions_(9).jpg.
1.1 What Is Cognitive Vision
Though there is broad agreement in the scientific community that cognitive vision pertains to the application of principles of biological (especially human) vision systems to computer vision applications, the space of cognitive vision studies is not well defined (Vernon 2006). The boundary between vision and cognition is thin, and cognitive vision operates in that gray area. Broadly speaking, cognitive vision involves the ability to survey a visual scene, recognize and locate objects of interest, act based on visual stimuli, learn and generate new knowledge, dynamically update a visual map that represents reality, and so on. Perception and reasoning are two important pillars on which cognitive vision stands. A crucial point is that the entire gamut of activities must take place in real time to enable an agent to engage with the real world. It is an emerging area of research integrating methodologies from various disciplines like artificial intelligence, computer vision, machine learning, cognitive science, and psychology. There is no single approach to cognitive vision, and the proposed solutions to the different problems appear like islands in an ocean. In this book, we have attempted to put together computational theories for a set of cognitive vision problems and organized them in an attempt to develop a coherent narrative for the subject. We shall get more insight into what cognitive vision is as we proceed through the book, and shall characterize it in clearer terms in Chapter 10.
1.2 Computational Approaches for Cognitive Vision
Two branches of science have significantly contributed to the understanding of the processes for cognition from visual as well as other sensory signals. One of them is psychophysics, which is defined as the "study of quantitative relations between psychological events and physical events or, more specifically, between sensations and the stimuli that produce them" (Encyclopedia Britannica). The subject was established by Gustav Fechner and is a marriage between the study of sensory processes and the study of physical stimuli. The other branch of science that has facilitated our understanding of perception and cognition is neurophysiology, which combines physiology and neural sciences for an understanding of the functions of the nervous system. The two approaches are complementary to each other. While psychophysics answers what happens during cognition, neurophysiology explains how it is realized in the biological nervous system.
Researchers in cognitive vision have long recognized it as an information processing activity of the biological neural system. However, the formal computational approach to understanding cognition was a fundamental contribution of David Marr (1976). Marr abstracted vision into three separable layers, namely (i) hardware, (ii) representation and algorithms, and (iii) computational theory. This abstraction enables computational theories of cognitive vision to be formulated independently of their implementation in the biological vision system. It also provides a theory for realizing cognitive functions in artificial systems made up of altogether different hardware, and possibly using different representations and algorithms. Further, Marr's model of vision assumes modularity and a pipelined architecture, two important properties of information processing systems that allow independent formulation of the different cognitive processes with defined interfaces. Marr identifies three stages of processing for vision. The first involves finding the basic contours that mark the object boundaries. The second stage discovers the surfaces and their orientations, resulting in an observer-centric 2½-dimensional model. The third involves knowledge-based interpretation of the model into an observer-neutral set of objects that constitute the 3D environment. These three stages roughly correspond to the early vision, perception, and cognition stages of vision, as recognized in the modern literature, which we shall describe shortly.
As suggested by David Marr, it is possible to study computational theories of cognitive vision in isolation from the biological systems, and we propose to do exactly that in this book. However, such computational models need to explain the what part of cognition. For that purpose, we shall refer to the results of psychophysical experiments, wherever relevant, without going into details of the experimental setups. Further, though the goal of computational modeling is to support alternate (artificial) implementations of cognition that need not be based on biological implementation models, analysis of the latter often provides clues to plausible implementation schemes. We shall discuss the results of some relevant neurophysiological studies in the book. We shall consciously keep such discussions to a superficial level, so that the text can be followed without a deep knowledge of either psychology or neuroscience.
1.3 A Brief Review of Human Vision System
In this section, we briefly look into how human vision works, in order to put the rest of the text in this book in context. A broad overview of HVS is presented in Figure 1.2. It comprises a pair of eyes connected to the brain via the optic nerves. When one looks at a scene, the light rays enter the eyes to form a pair of inverted images on screens at the back of the eyes, which are known as the retinas. This corresponds to a mapping of the external 3D world to a pair of 2D images, with slightly different perspectives. Internal representations of the images are transmitted by the optic nerves to the visual cortex at the rear end of the brain, where the images are correlated and interpreted to reconstruct a symbolic description of the 3D world.
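The mapping of a 3D point to a pair of 2D images with slightly different perspectives can be illustrated with a small sketch. It is not taken from the book: it assumes an idealized pinhole model for each eye, and the function names (`project`, `stereo_pair`, `disparity`) as well as the baseline and focal length values are hypothetical choices made only for illustration.

```python
from typing import Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(point: Point3D, eye_x: float, focal_length: float) -> Point2D:
    """Pinhole projection for an eye at (eye_x, 0, 0) looking along +z.

    Assumption: an idealized pinhole eye; real eyes have lenses and a
    curved retina, which this sketch ignores.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the eye (z > 0)")
    # The negative signs model the inversion of the image on the retina.
    u = -focal_length * (x - eye_x) / z
    v = -focal_length * y / z
    return (u, v)


def stereo_pair(point: Point3D, baseline: float,
                focal_length: float) -> Tuple[Point2D, Point2D]:
    """Project one 3D point into the left and right eye images."""
    half = baseline / 2.0
    left = project(point, -half, focal_length)
    right = project(point, +half, focal_length)
    return left, right


def disparity(point: Point3D, baseline: float, focal_length: float) -> float:
    """Horizontal shift of the point between the two images."""
    left, right = stereo_pair(point, baseline, focal_length)
    return right[0] - left[0]


if __name__ == "__main__":
    # Illustrative values only (in meters): eye separation and focal length.
    B, F = 0.065, 0.017
    for depth in (0.5, 2.0, 10.0):
        print(depth, disparity((0.2, 0.1, depth), B, F))
```

Under these assumptions the shift between the two images works out to `focal_length * baseline / z`, so nearer points are displaced more than distant ones. This difference is the cue that makes it possible to correlate the two images and recover depth.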
In this simple model of biological vision, the eyes primarily act as the image capture devices in the system, and the brain as the interpreter. In reality, things are much more complex. The output from the eyes is not a faithful reproduction of the images received. Significant transformations take place on the retina, which enable efficient identification of object contours and their movements. These transformations are collectively referred to as early vision. Further processing in the neural circuits of the brain, which results in interpretation of the signals received from the eyes, is known as late vision. The goal of late vision is to establish the what and where of the objects located in the scene. It is believed that there are two distinct pathways in the human brain, ventral and dorsal, through which visual information is processed to answer these two aspects of vision (Milner and Goodale 1995). This has been emulated in several artificial vision systems, as we shall see in the following chapters of this book.
One of the initial tasks in the late vision system is to correlate the images received from the two eyes, which is facilitated by the criss-cross connection of the optic nerves connecting the eyes with the brain. Further, the late vision system achieves progressive abstraction of the correlated images and leads to perception and cognition, which we discuss in some detail in...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).
The file format ePub works well for novels and non-fiction books, i.e., "flowing" text without complex layout. On an e-reader or smartphone, line and page breaks automatically adjust to fit the small displays.
This eBook uses Adobe-DRM, a "hard" copy protection. If the necessary requirements are not met, unfortunately you will not be able to open the eBook. You will therefore need to prepare your reading hardware before downloading.
Please note: We strongly recommend that you authorise using your personal Adobe ID after installation of any reading software.
For more information, see our ebook Help page.