Preface xiii
Part 1. Integrated Analysis in Geography: The Way to Cloud Computing xix
Introduction to Part 1 (Dominique LAFFLY) xxi
Chapter 1. Geographical Information and Landscape, Elements of Formalization (Dominique LAFFLY) 1
Chapter 2. Sampling Strategies (Dominique LAFFLY) 7
2.1. References 18
Chapter 3. Characterization of the Spatial Structure (Dominique LAFFLY) 19
Chapter 4. Thematic Information Structures (Dominique LAFFLY) 27
Chapter 5. From the Point to the Surface, How to Link Endogenous and Exogenous Data (Dominique LAFFLY) 35
5.1. References 44
Chapter 6. Big Data in Geography (Dominique LAFFLY) 45
Conclusion to Part 1 (Dominique LAFFLY) 55
Part 2. Basic Mathematical, Statistical and Computational Tools 59
Chapter 7. An Introduction to Machine Learning (Hichem SAHLI) 61
7.1. Predictive modeling: introduction 61
7.2. Bayesian modeling 61
7.2.1. Basic probability theory 62
7.2.2. Bayes rule 63
7.2.3. Parameter estimation 63
7.2.4. Learning Gaussians 64
7.3. Generative versus discriminative models 66
7.4. Classification 67
7.4.1. Naïve Bayes 68
7.4.2. Support vector machines 69
7.5. Evaluation metrics for classification 71
7.5.1. Confusion matrix-based measures 71
7.5.2. Area under the ROC curve (AUC) 73
7.6. Cross-validation and over-fitting 73
7.7. References 74
Chapter 8. Multivariate Data Analysis (Astrid JOURDAN and Dominique LAFFLY) 75
8.1. Introduction 75
8.2. Principal component analysis 77
8.2.1. How to measure the information 78
8.2.2. Scalar product and orthogonal variables 80
8.2.3. Construction of the principal axes 81
8.2.4. Analysis of the principal axes 84
8.2.5. Analysis of the data points 86
8.3. Multiple correspondence analysis 88
8.3.1. Indicator matrix 89
8.3.2. Cloud of data points 90
8.3.3. Cloud of levels 92
8.3.4. MCA or PCA? 94
8.4. Clustering 96
8.4.1. Distance between data points 97
8.4.2. Dissimilarity criteria between clusters 98
8.4.3. Variance (inertia) decomposition 99
8.4.4. k-means method 101
8.4.5. Agglomerative hierarchical clustering 104
8.5. References 105
Chapter 9. Sensitivity Analysis (Astrid JOURDAN and Peio LOUBIÈRE) 107
9.1. Generalities 107
9.2. Methods based on linear regression 109
9.2.1. Presentation 109
9.2.2. R practice 111
9.3. Morris' method 114
9.3.1. Elementary effects method (Morris' method) 114
9.3.2. R practice 117
9.4. Methods based on variance analysis 119
9.4.1. Sobol' indices 120
9.4.2. Estimation of the Sobol' indices 122
9.4.3. R practice 123
9.5. Conclusion 126
9.6. References 127
Chapter 10. Using R for Multivariate Analysis (Astrid JOURDAN) 129
10.1. Introduction 129
10.1.1. The dataset 131
10.1.2. The variables 134
10.2. Principal component analysis 136
10.2.1. Eigenvalues 137
10.2.2. Data points (Individuals) 139
10.2.3. Supplementary variables 143
10.2.4. Other representations 143
10.3. Multiple correspondence analysis 144
10.4. Clustering 145
10.4.1. k-means algorithm 145
10.5. References 151
Part 3. Computer Science 153
Chapter 11. High Performance and Distributed Computing (Sebastiano Fabio SCHIFANO, Eleonora LUPPI, Didin Agustian PERMADI, Thi Kim Oanh NGUYEN, Nhat Ha Chi NGUYEN and Luca TOMASSETTI) 155
11.1. High performance computing 155
11.2. Systems based on multi-core CPUs 157
11.2.1. Systems based on GPUs 159
Chapter 12. Introduction to Distributed Computing (Eleonora LUPPI) 163
12.1. Introduction 163
12.1.1. A brief history 163
12.1.2. Design requirements 165
12.1.3. Models 168
12.1.4. Grid computing 171
12.2. References 176
Chapter 13. Towards Cloud Computing (Peio LOUBIÈRE and Luca TOMASSETTI) 179
13.1. Introduction 179
13.1.1. Generalities 179
13.1.2. Benefits and drawbacks 180
13.2. Service model 180
13.2.1. Software as a Service 181
13.2.2. Platform as a Service 182
13.2.3. Infrastructure as a Service 182
13.2.4. And many more: XaaS 182
13.3. Deployment model 183
13.3.1. Public cloud 183
13.3.2. Private cloud 183
13.3.3. Hybrid cloud 184
13.4. Under the hood: a technological overview 184
13.4.1. Structure 184
13.4.2. Virtualization 185
13.4.3. Scalability 186
13.4.4. Web-Oriented Architecture 187
13.5. Conclusion 187
13.6. References 188
Chapter 14. Web-Oriented Architecture: How to Design a RESTful API (Florent DEVIN) 191
14.1. Introduction 191
14.2. Web services 192
14.2.1. Introduction 192
14.2.2. SOAP web services 193
14.2.3. REST web services 195
14.3. Web-Oriented Applications - Microservice applications 198
14.3.1. Statelessness and scalability 199
14.3.2. API 200
14.3.3. HTTP Methods 201
14.3.4. Example of an API 202
14.4. WSDL example 203
14.5. Conclusion 205
14.6. References 205
Chapter 15. SCALA: Functional Programming (Florent DEVIN) 207
15.1. Introduction 207
15.1.1. Programming languages 208
15.1.2. Paradigm 208
15.2. Functional programming 212
15.2.1. Introduction 212
15.2.2. Why now? 212
15.2.3. Higher-order functions 213
15.2.4. Basic functional blocks 215
15.3. Scala 217
15.3.1. Type systems 218
15.3.2. Basic manipulation of collections 222
15.4. Rationale 224
15.5. Why immutability matters 224
15.6. Conclusion 226
15.7. References 227
Chapter 16. Spark and Machine Learning Library (Yannick LE NIR) 229
16.1. Introduction 229
16.2. Spark 230
16.2.1. Spark introduction 230
16.2.2. RDD presentation 230
16.2.3. RDD lifecycle 231
16.2.4. Operations on RDD 232
16.2.5. Exercises for environmental sciences 236
16.3. Spark machine learning library 237
16.3.1. Local vectors 237
16.3.2. Labeled points 237
16.3.3. Learning dataset 238
16.3.4. Classification and regression algorithms in Spark 238
16.3.5. Exercises for environmental sciences 239
16.4. Conclusion 242
Chapter 17. Database for Cloud Computing (Peio LOUBIÈRE) 245
17.1. Introduction 245
17.2. From RDBMS to NoSQL 245
17.2.1. CAP theorem 246
17.2.2. From ACID to BASE 247
17.3. NoSQL database storage paradigms 248
17.3.1. Column-family oriented storage 249
17.3.2. Key/value-oriented storage 249
17.3.3. Document-oriented storage 250
17.3.4. Graph-oriented storage 251
17.4. SQL versus NoSQL, the war will not take place 251
17.5. Example: a dive into MongoDB 252
17.5.1. Presentation 253
17.5.2. First steps 254
17.5.3. Database level commands 254
17.5.4. Data types 255
17.5.5. Modifying data 255
17.6. Conclusion 273
17.7. References 273
Chapter 18. WRF Performance Analysis and Scalability on Multicore High Performance Computing Systems (Didin Agustian PERMADI, Sebastiano Fabio SCHIFANO, Thi Kim Oanh NGUYEN, Nhat Ha Chi NGUYEN, Eleonora LUPPI and Luca TOMASSETTI) 275
18.1. Introduction 276
18.2. The weather research and forecast model and experimental set-up 276
18.2.1. Model architecture 276
18.3. Architecture of multicore HPC system 282
18.4. Results 283
18.4.1. Results of experiment E1 283
18.4.2. Results of experiment E2 286
18.5. Conclusion 288
18.6. References 288
List of Authors 291
Index 293
Summaries of other volumes 295
Geography, Ecology, Urbanism, Geology and Climatology: in short, all environmental disciplines are inspired by the great paradigms of science. They were first descriptive before evolving toward systemic approaches and complexity, and their methods followed the same evolution, from the induction of initial observations to the deduction of predictive models based on learning. The Bayesian approach, for example, is the one preferred in this book (see Volume 1, Chapter 5), but random trees, neural networks, classifications and data reductions could all be developed as well. In the end, the methods of artificial intelligence (AI) are ubiquitous today, in the era of Big Data. We are not unaware, however, that the term artificial intelligence, coined at Dartmouth in 1956 by John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon, is, after a long period of neglect, at the heart of the future issues of exploiting massive data (just like the functional and logical languages that accompanied the theory: LISP in 1958, PROLOG in 1977 and SCALA today; see Chapter 8).
All the environmental disciplines are confronted with this reality of massive data, summed up by the rule of the 3+2 Vs: Volume, Velocity ("Vitesse" in the French original), Variety, plus Veracity and Value. Every five days, or even less, and for the optical remote sensing data of the Sentinel-2A and 2B satellites alone, we have complete coverage of the Earth at a spatial resolution of 10 m in a dozen wavelengths. How do we integrate all this? How do we rethink the environmental disciplines when we must now consider, at the pixel scale (10 m), an overall analysis of 510 million km², that is, more than 5 trillion pixels, of which 1.53 trillion cover land alone? And, more importantly, how do we validate automatic processes and the accuracy of the results?
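As a quick sanity check of these orders of magnitude, the pixel counts follow directly from the round figures of 510 million km² for the Earth's surface and roughly 153 million km² of land (the land area is an assumption consistent with the land pixel count quoted above):

```python
# Back-of-the-envelope pixel counts for global coverage at 10 m resolution.
M2_PER_KM2 = 1_000_000
PIXEL_AREA_M2 = 10 * 10  # one 10 m x 10 m pixel covers 100 m^2

earth_km2 = 510e6  # total surface of the Earth
land_km2 = 153e6   # approximate land surface (assumed)

earth_pixels = earth_km2 * M2_PER_KM2 / PIXEL_AREA_M2
land_pixels = land_km2 * M2_PER_KM2 / PIXEL_AREA_M2

print(f"Earth: {earth_pixels:.2e} pixels")  # about 5.1e12, i.e. over 5 trillion
print(f"Land:  {land_pixels:.2e} pixels")   # about 1.53e12
```

Multiply by a dozen spectral bands and a five-day revisit, and the data volume per year is measured in petabytes.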
Figure P.1. At the beginning of AI: the Dartmouth Summer Research Project, 1956.
Source: http://www.oezratty.net/wordpress/2017/semantique-intelligence-artificielle/
Add social network data, the Internet of Things (IoT) and archive data for topics such as Smart Cities, and it is not surprising that the environmental disciplines are interested in cloud computing.
Before examining the technique, consider the name itself: why this shape, why a cloud? For the last 50 years, to represent a connection node in a network, we have drawn a freehand potatoid which, over time, took the form of a cloud. Figure P.2 gives a perfect illustration on the left, while on the right we see that the cloud icon is now the norm (a search-engine screenshot for the keywords "Internet" and "network").
What is cloud computing? Let us remember that, even before the term was dedicated to it, cloud computing was based on networks (see Chapter 4) and the Internet: "since the 50s, when users accessed, from their terminals, applications running on central systems" (Wikipedia). The cloud as we understand it today has evolved considerably since the 2000s; it consists of pooling remote computing resources to store data and dynamically use services (that is, software) accessed via browser interfaces.
Figure P.2. From freehand potatoid to the cloud icon. The first figure is a schematic illustration of a distributed SFPS switch. For a color version of this figure, see www.iste.co.uk/laffly/torus1.zip
This answers the needs of environmental sciences overwhelmed by massive data flows: everything is stored in the cloud, everything is processed in the cloud, and end-users retrieve the results they expect according to their needs. It is no wonder that, one after the other, Google and NASA offered cloud-based solutions for the management and processing of satellite data in December 2016 (mid-term of TORUS!): Google Earth Engine and NASA Earth Exchange.
But how do you do it? Why is it preferable (or not) to HPC (High Performance Computing) and grids? How do we evaluate "Cloud & High Scalability Computing" versus "Grid & High Performance Computing"? What are the costs? How do you transfer the applications commonly used in the environmental sciences to the cloud? What is the added value for the environmental sciences? In short, how does it work?
All these questions and more are at the heart of the TORUS program, developed so that its participants learn from each other, understand each other and communicate in a common, mastered language: between geoscience, computer science and information science; among the geosciences themselves; and between computer science and the information sciences. TORUS is not a research program. It is an action that aims to bring together (too) often remote scientific communities, in order to bridge the gap that now separates contemporary computing from most environmental disciplines: one side evolving at speeds the others cannot follow, one greedy for the data the others provide, one able to offer technical solutions to the scientific questions the others are developing, and so on.
TORUS is also the result of multiple scientific collaborations initiated in 2008-2010: between the geographer and the computer scientist; between France and Vietnam, with an increasing diversity of specialties involved (e.g. remote sensing and image processing, mathematics and statistics, optimization and modeling, erosion and geochemistry, temporal dynamics and social surveys), all within various scientific and university structures (universities, engineering schools, research institutes such as IRD, SFRI and IAE Vietnam, and central administrations, namely the Midi-Pyrénées region and Son La district under the France-Vietnam partnership); and between research and higher education through national and international PhDs.
Naturally, I would like to say, the Erasmus+ capacity building program of the European Union appeared to be a solution adapted to our project:
"The objectives of the Capacity Building projects are: to support the modernization, accessibility and internationalization of higher education in partner countries; improve the quality, relevance and governance of higher education in partner countries; strengthen the capacity of higher education institutions in partner countries and in the EU, in terms of international cooperation and the process of permanent modernization in particular; and to help them open up to society at large and to the world of work in order to reinforce the interdisciplinary and transdisciplinary nature of higher education, to improve the employability of university graduates, to give the European higher education more visibility and attractiveness in the world, foster the reciprocal development of human resources, promote a better understanding between the peoples and cultures of the EU and partner countries."1
In 2015, TORUS, funded to the tune of 1 million euros for three years, was among the 120 projects selected from a pool of more than 575 applications. The partnership (Figure P.3) brings together the University of Toulouse 2 Jean Jaurès (coordinator, France), the International School of Information Processing Sciences (EISTI, France), the University of Ferrara in Italy, the Vrije Universiteit Brussel, the Vietnam National University in Hanoi, Nong Lam University in Ho Chi Minh City and two Thai institutions: the Asian Institute of Technology (AIT) in Pathumthani and Walailak University in Nakhon Si Thammarat.
Figure P.3. The heart of TORUS, partnership between Asia and Europe. For a color version of this figure, see www.iste.co.uk/laffly/torus1.zip
With an equal share between Europe and Asia, 30 researchers, teacher-researchers and engineers are involved in learning from each other during these three years, punctuated by eight workshops across France, Vietnam, Italy, Thailand and Belgium. Finally, after the installation of the two servers in Asia (Asian Institute of Technology, Thailand; and Vietnam National University Hanoi, Vietnam), more than 400 cores will beat in unison with TORUS to bring cloud computing closer to the environmental sciences. More than 400 computer hearts beating in unison for TORUS, alongside those of Nathalie, Astrid, Eleonora, Ann, Imeshi, Thanh, Sukhuma, Janitra, Kim, Daniel, Yannick, Florent, Peio, Alex, Luca, Stefano, Hichem, Hung(s), Thuy, Huy, Le Quoc, Kim Loi, Agustian, Hong, Sothea, Tongchai, Stephane, Simone, Marco, Mario, Trinh, Thiet, Massimiliano, Nikolaos, Minh Tu, Vincent and Dominique.
To all of you, a big thank you.
This book is divided into three volumes.
Volume 1 raises the problem of voluminous data in geosciences before presenting the main methods of analysis and computer solutions mobilized to meet them.
Volume 2 presents remote sensing, geographic information systems (GIS) and spatial data infrastructures (SDI) that are central to all disciplines that deal with geographic space.
Volume 3 is a collection of thematic application cases representative of the specificities of the teams involved in TORUS and which motivated their needs in terms of cloud computing.
Dominique LAFFLY
January...