
The Book of Alternative Data
Description
Harnessing non-traditional data sources to generate alpha, analyze markets, and forecast risk is a subject of intense interest for financial professionals. A growing number of regularly held conferences on alternative data are being established, complemented by an upsurge in new papers on the subject. Alternative data is steadily being incorporated by conventional institutional investors and risk managers throughout the financial world. However, methodologies for analyzing and extracting value from alternative data, along with guidance on how to source data and integrate data flows within existing systems, are not currently treated in the literature. Filling this significant gap in knowledge, The Book of Alternative Data is the first and only book to offer a coherent, systematic treatment of the subject.
This groundbreaking volume provides readers with a roadmap for navigating the complexities of an array of alternative data sources, and delivers the appropriate techniques to analyze them. The authors, leading experts in financial modeling, machine learning, and quantitative research and analytics, employ a step-by-step approach to guide readers through the dense jungle of generated data. A first-of-its-kind treatment of alternative data types, sources, and methodologies, this innovative book:
* Provides an integrated modeling approach to extract value from multiple types of datasets
* Treats the processes needed to make alternative data signals operational
* Helps investors and risk managers rethink how they engage with alternative datasets
* Features practical case studies and real-world techniques across many different financial markets
* Describes how to avoid potential pitfalls and missteps in starting the alternative data journey
* Explains how to integrate information from different datasets to maximize informational value
The Book of Alternative Data is an indispensable resource for anyone wishing to analyze or monetize different non-traditional datasets, including Chief Investment Officers, Chief Risk Officers, risk professionals, investment professionals, traders, economists, and machine learning developers and users.

Persons
SAEED AMEN is the founder of Cuemacro, where he consults on systematic trading. For 15 years, he has developed systematic trading strategies and quantitative indices, including at the major investment banks Lehman Brothers and Nomura. He is also a visiting lecturer at Queen Mary University of London and a co-founder of the Thalesians, a quant think tank.
Content
- Intro
- Table of Contents
- Preface
- Acknowledgments
- PART 1: Introduction and Theory
- CHAPTER 1: Alternative Data: The Lay of the Land
- 1.1 INTRODUCTION
- 1.2 WHAT IS "ALTERNATIVE DATA"?
- 1.3 SEGMENTATION OF ALTERNATIVE DATA
- 1.4 THE MANY VS OF BIG DATA
- 1.5 WHY ALTERNATIVE DATA?
- 1.6 WHO IS USING ALTERNATIVE DATA?
- 1.7 CAPACITY OF A STRATEGY AND ALTERNATIVE DATA
- 1.8 ALTERNATIVE DATA DIMENSIONS
- 1.9 WHO ARE THE ALTERNATIVE DATA VENDORS?
- 1.10 USAGE OF ALTERNATIVE DATASETS ON THE BUY SIDE
- 1.11 CONCLUSION
- NOTES
- CHAPTER 2: The Value of Alternative Data
- 2.1 INTRODUCTION
- 2.2 THE DECAY OF INVESTMENT VALUE
- 2.3 DATA MARKETS
- 2.4 THE MONETARY VALUE OF DATA (PART I)
- 2.5 EVALUATING (ALTERNATIVE) DATA STRATEGIES WITH AND WITHOUT BACKTESTING
- 2.6 THE MONETARY VALUE OF DATA (PART II)
- 2.7 THE ADVANTAGES OF MATURING ALTERNATIVE DATASETS
- 2.8 SUMMARY
- NOTES
- CHAPTER 3: Alternative Data Risks and Challenges
- 3.1 LEGAL ASPECTS OF DATA
- 3.2 RISKS OF USING ALTERNATIVE DATA
- 3.3 CHALLENGES OF USING ALTERNATIVE DATA
- 3.4 AGGREGATING THE DATA
- 3.5 SUMMARY
- NOTES
- CHAPTER 4: Machine Learning Techniques
- 4.1. INTRODUCTION
- 4.2. MACHINE LEARNING: DEFINITIONS AND TECHNIQUES
- 4.3. WHICH TECHNIQUE TO CHOOSE?
- 4.4. ASSUMPTIONS AND LIMITATIONS OF THE MACHINE LEARNING TECHNIQUES
- 4.5. STRUCTURING IMAGES
- 4.6. NATURAL LANGUAGE PROCESSING (NLP)
- 4.7. SUMMARY
- NOTES
- CHAPTER 5: The Processes behind the Use of Alternative Data
- 5.1. INTRODUCTION
- 5.2. STEPS IN THE ALTERNATIVE DATA JOURNEY
- 5.3. STRUCTURING TEAMS TO USE ALTERNATIVE DATA
- 5.4. DATA VENDORS
- 5.5. SUMMARY
- NOTES
- CHAPTER 6: Factor Investing
- 6.1. INTRODUCTION
- 6.2. FACTOR MODELS
- 6.3. THE DIFFERENCE BETWEEN CROSS-SECTIONAL AND TIME SERIES TRADING APPROACHES
- 6.4. WHY FACTOR INVESTING?
- 6.5. SMART BETA INDICES USING ALTERNATIVE DATA INPUTS
- 6.6. ESG FACTORS
- 6.7. DIRECT AND INDIRECT PREDICTION
- 6.8. SUMMARY
- NOTES
- PART 2: Practical Applications
- CHAPTER 7: Missing Data: Background
- 7.1. INTRODUCTION
- 7.2. MISSING DATA CLASSIFICATION
- 7.3. LITERATURE OVERVIEW OF MISSING DATA TREATMENTS
- 7.4. SUMMARY
- NOTES
- CHAPTER 8: Missing Data: Case Studies
- 8.1. INTRODUCTION
- 8.2. CASE STUDY: IMPUTING MISSING VALUES IN MULTIVARIATE CREDIT DEFAULT SWAP TIME SERIES
- 8.3. CASE STUDY: SATELLITE IMAGES
- 8.4. SUMMARY
- 8.5. APPENDIX: GENERAL DESCRIPTION OF THE MICE PROCEDURE
- 8.6. APPENDIX: SOFTWARE LIBRARIES USED IN THIS CHAPTER
- NOTES
- CHAPTER 9: Outliers (Anomalies)
- 9.1. INTRODUCTION
- 9.2. OUTLIERS DEFINITION, CLASSIFICATION, AND APPROACHES TO DETECTION
- 9.3. TEMPORAL STRUCTURE
- 9.4. GLOBAL VERSUS LOCAL OUTLIERS, POINT ANOMALIES, AND MICRO-CLUSTERS
- 9.5. OUTLIER DETECTION PROBLEM SETUP
- 9.6. COMPARATIVE EVALUATION OF OUTLIER DETECTION ALGORITHMS
- 9.7. APPROACHES TO OUTLIER EXPLANATION
- 9.8. CASE STUDY: OUTLIER DETECTION ON FED COMMUNICATIONS INDEX
- 9.9. SUMMARY
- 9.10. APPENDIX
- NOTES
- CHAPTER 10: Automotive Fundamental Data
- 10.1. INTRODUCTION
- 10.2. DATA
- 10.3. APPROACH 1: INDIRECT APPROACH
- 10.4. APPROACH 2: DIRECT APPROACH
- 10.5. GAUSSIAN PROCESSES EXAMPLE
- 10.6. SUMMARY
- 10.7. APPENDIX
- NOTES
- CHAPTER 11: Surveys and Crowdsourced Data
- 11.1. INTRODUCTION
- 11.2. SURVEY DATA AS ALTERNATIVE DATA
- 11.3. THE DATA
- 11.4. THE PRODUCT
- 11.5. CASE STUDIES
- 11.6. SOME TECHNICAL CONSIDERATIONS ON SURVEYS
- 11.7. CROWDSOURCING ANALYST ESTIMATES SURVEY
- 11.8. ALPHA CAPTURE DATA
- 11.9. SUMMARY
- 11.10. APPENDIX
- NOTES
- CHAPTER 12: Purchasing Managers' Index
- 12.1. INTRODUCTION
- 12.2. PMI PERFORMANCE
- 12.3. NOWCASTING GDP GROWTH
- 12.4. IMPACTS ON FINANCIAL MARKETS
- 12.5. SUMMARY
- NOTES
- CHAPTER 13: Satellite Imagery and Aerial Photography
- 13.1. INTRODUCTION
- 13.2. FORECASTING US EXPORT GROWTH
- 13.3. CAR COUNTS AND EARNINGS PER SHARE FOR RETAILERS
- 13.4. MEASURING CHINESE PMI MANUFACTURING WITH SATELLITE DATA
- 13.5. SUMMARY
- CHAPTER 14: Location Data
- 14.1. INTRODUCTION
- 14.2. SHIPPING DATA TO TRACK CRUDE OIL SUPPLIES
- 14.3. MOBILE PHONE LOCATION DATA TO UNDERSTAND RETAIL ACTIVITY
- 14.4. TAXI RIDE DATA AND NEW YORK FED MEETINGS
- 14.5. CORPORATE JET LOCATION DATA AND M&A
- 14.6. SUMMARY
- NOTE
- CHAPTER 15: Text, Web, Social Media, and News
- 15.1. INTRODUCTION
- 15.2. COLLECTING WEB DATA
- 15.3. SOCIAL MEDIA
- 15.4. NEWS
- 15.5. OTHER WEB SOURCES
- 15.6. SUMMARY
- NOTES
- CHAPTER 16: Investor Attention
- 16.1. INTRODUCTION
- 16.2. READERSHIP OF PAYROLLS TO MEASURE INVESTOR ATTENTION
- 16.3. GOOGLE TRENDS DATA TO MEASURE MARKET THEMES
- 16.4. INVESTOPEDIA SEARCH DATA TO MEASURE INVESTOR ANXIETY
- 16.5. USING WIKIPEDIA TO UNDERSTAND PRICE ACTION IN CRYPTOCURRENCIES
- 16.6. ONLINE ATTENTION FOR COUNTRIES TO INFORM EMFX TRADING
- 16.7. SUMMARY
- CHAPTER 17: Consumer Transactions
- 17.1. INTRODUCTION
- 17.2. CREDIT AND DEBIT CARD TRANSACTION DATA
- 17.3. CONSUMER RECEIPTS
- 17.4. SUMMARY
- NOTE
- CHAPTER 18: Government, Industrial, and Corporate Data
- 18.1. INTRODUCTION
- 18.2. USING INNOVATION MEASURES TO TRADE EQUITIES
- 18.3. QUANTIFYING CURRENCY CRISIS RISK
- 18.4. MODELING CENTRAL BANK INTERVENTION IN CURRENCY MARKETS
- 18.5. SUMMARY
- CHAPTER 19: Market Data
- 19.1. INTRODUCTION
- 19.2. RELATIONSHIP BETWEEN INSTITUTIONAL FX FLOW DATA AND FX SPOT
- 19.3. UNDERSTANDING LIQUIDITY USING HIGH-FREQUENCY FX DATA
- 19.4. SUMMARY
- NOTE
- CHAPTER 20: Alternative Data in Private Markets
- 20.1. INTRODUCTION
- 20.2. DEFINING PRIVATE EQUITY AND VENTURE CAPITAL FIRMS
- 20.3. PRIVATE EQUITY DATASETS
- 20.4. UNDERSTANDING THE PERFORMANCE OF PRIVATE FIRMS
- 20.5. SUMMARY
- Conclusions
- SOME LAST WORDS
- References
- About the Authors
- Index
- End User License Agreement
CHAPTER 1
Alternative Data: The Lay of the Land
1.1 INTRODUCTION
There is a considerable amount of buzz around the topic of alternative data in finance. In this book, we seek to discuss the topic in detail, showing how alternative data can be used to enhance understanding of financial markets, improve returns, and manage risk better.
This book is aimed at investors who are in search of superior returns through nontraditional approaches. These methods differ from fundamental analysis and from quantitative methods that rely solely on data widely available in financial markets. It is also aimed at risk managers who want to identify early signals of events that could have a negative impact, using information that is not yet present in any standard, broadly used datasets.1
At the time of writing, there are mixed opinions in the industry about whether alternative data can add any value to the investment process on top of more standardized data sources. There have been reports in the press about hedge funds and banks that have tried but failed to extract value from it (see e.g. Risk, 2019). We must stress, however, that the absence of predictive signals in alternative data is only one possible cause of such a failure. In fact, we will try to convince the reader, through the practical examples that we will examine, that useful signals can be gleaned from alternative data in many cases. At the same time, we will also explain why any strategy that aims to extract and make successful use of signals is a combination of algorithms, processes, technology, and careful cost-benefit analysis. Failure to tackle any of these aspects in the right way will lead to a failure to extract usable insights from alternative data. Hence, proving that a signal exists in a dataset is not sufficient to benefit from a superior investment strategy, given that many other subtle issues are at play, most of which are dynamic in nature, as we will explain later.
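The point that a genuine signal may still fail to pay for itself can be made concrete with a back-of-the-envelope calculation. The sketch below is purely illustrative: the cost categories and every number in it are hypothetical assumptions, not figures from this book. It shows the same signal destroying value on a small book and adding value on a larger one, once data, engineering, and trading costs are deducted.

```python
from dataclasses import dataclass


@dataclass
class SignalEconomics:
    """Hypothetical inputs for a rough, annual cost-benefit check of a dataset."""
    gross_alpha_bps: float   # expected annual gross alpha from the signal, in basis points
    capital: float           # capital allocated to the strategy
    data_cost: float         # annual licence fee for the dataset
    engineering_cost: float  # annual cost of ingestion, cleaning, and infrastructure
    trading_cost_bps: float  # expected annual transaction costs, in basis points of capital


def net_annual_value(s: SignalEconomics) -> float:
    """Gross P&L attributable to the signal minus trading, data, and engineering costs."""
    gross_pnl = s.capital * s.gross_alpha_bps / 10_000
    trading_costs = s.capital * s.trading_cost_bps / 10_000
    return gross_pnl - trading_costs - s.data_cost - s.engineering_cost


# Same (hypothetical) signal and costs, two different amounts of capital.
small_book = SignalEconomics(gross_alpha_bps=50, capital=20_000_000,
                             data_cost=150_000, engineering_cost=100_000,
                             trading_cost_bps=20)
large_book = SignalEconomics(gross_alpha_bps=50, capital=200_000_000,
                             data_cost=150_000, engineering_cost=100_000,
                             trading_cost_bps=20)
print(net_annual_value(small_book), net_annual_value(large_book))
```

Because data and engineering costs are largely fixed while P&L scales with capital (up to the strategy's capacity), the verdict on a dataset can flip with the size of the book that uses it.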
In this book, we will also discuss in detail the techniques that can be used to make alternative data usable for the purposes we have already noted. These techniques belong to what are labeled today as the fields of Machine Learning (ML) and Artificial Intelligence (AI). However, we do not want to give the impression up front of being unnecessarily complex with these "sophisticated" catchall terms. Hence, we will also include simpler and more traditional techniques, such as linear and logistic regression,2 with which the financial community is already familiar. Indeed, in many instances simpler techniques can be very useful when seeking to extract signals from alternative datasets in finance. Nevertheless, this is not a machine learning textbook, so we will not delve into the details of each technique we use; instead, we will provide only a succinct introduction and refer the reader to the appropriate texts where necessary.
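To show how little machinery one of these "traditional" techniques needs, here is a minimal logistic regression sketch in plain Python, fitted by batch gradient descent on the log-loss. The data is synthetic: the feature stands in for a hypothetical standardized alternative-data score, and the label for whether the next-period return is positive. The learning rate, number of epochs, and data-generating process are all illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.5, epochs=300):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Synthetic stand-in for an alternative-data feature and the sign of the next return.
random.seed(42)
signal = [random.gauss(0.0, 1.0) for _ in range(500)]
up = [1 if 0.8 * s + random.gauss(0.0, 1.0) > 0 else 0 for s in signal]

w, b = fit_logistic(signal, up)
hits = sum((sigmoid(w * s + b) > 0.5) == bool(u) for s, u in zip(signal, up))
print(f"weight={w:.3f}, bias={b:.3f}, in-sample hit rate={hits / len(up):.2%}")
```

The hit rate printed here is in-sample only; any real evaluation would need an out-of-sample test, and in practice a library implementation would usually replace the hand-written optimizer.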
This is also not a book about the technology and the infrastructure that underlie any real-world implementation of alternative data. These topics, which encompass data engineering, are of course still very important. Indeed, they are necessary for anything found to be a signal in the data to be of any use in real life. However, given their variety and the deep expertise needed to treat them in detail, we believe that these topics deserve a book of their own. Nevertheless, we must stress that the methodologies we use in practice to extract a signal are often constrained by technological limitations. Do we need an algorithm to work fast and deliver results in almost real time, or can we live with some latency? The type of algorithm we choose will be very much determined by technological constraints like these. We will hint at these important aspects throughout, although this book will not be, strictly speaking, technological.
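As a small illustration of how a latency constraint can shape the choice of algorithm, the sketch below scores a new reading against its history in two ways: by recomputing the mean and standard deviation over the full history (O(n) per new observation), and by maintaining them incrementally with Welford's online algorithm (O(1) per observation). Welford's algorithm is a standard technique chosen here for illustration, and the stream of readings is hypothetical.

```python
import math
import statistics


class RunningStats:
    """Welford's online algorithm: O(1) update of mean and variance per new observation."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator), matching statistics.stdev below.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        sd = math.sqrt(self.variance)
        return (x - self.mean) / sd if sd > 0 else 0.0


ticks = [1.2, 0.8, 1.1, 0.9, 1.0, 3.5]  # hypothetical stream of readings
stats = RunningStats()
for t in ticks[:-1]:
    stats.update(t)

# Low-latency path: score the newest reading without revisiting the history.
z_stream = stats.zscore(ticks[-1])
# Batch path: recompute from the full history each time a reading arrives.
hist = ticks[:-1]
z_batch = (ticks[-1] - statistics.mean(hist)) / statistics.stdev(hist)
print(round(z_stream, 6), round(z_batch, 6))
```

Both paths give the same answer; what differs is the cost per update, which matters when readings arrive quickly and results are needed in almost real time.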
In this book, we will go through practical case studies showing how different alternative data sources can be profitably employed for different purposes within finance. These case studies will cover a variety of data sources, and for each of them we will explore in detail how to solve a specific problem, for example predicting equity returns from fundamental industrial data or forecasting economic variables from survey indices. The case studies will be self-contained and representative of a wide array of situations that could appear in real-world applications, across a number of different asset classes.
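A minimal sketch of the second kind of problem, forecasting an economic variable from a survey index, is a simple linear regression evaluated on a held-out period against a naive benchmark. Everything here is an assumption for illustration: the PMI-like survey series and the growth series are synthetic, and the coefficients used to generate them are invented rather than estimated from real data.

```python
import math
import random


def fit_ols(xs, ys):
    """Closed-form simple linear regression y = alpha + beta * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    return my - beta * mx, beta


def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


# Synthetic survey index (PMI-like, centred near 50) and a target growth series.
# The generating coefficients are invented purely for illustration.
random.seed(7)
survey = [random.gauss(52.0, 3.0) for _ in range(40)]
growth = [2.0 + 0.5 * (s - 50.0) + random.gauss(0.0, 0.3) for s in survey]

# Fit on the first 30 periods and evaluate on the last 10 (no look-ahead).
alpha, beta = fit_ols(survey[:30], growth[:30])
model_pred = [alpha + beta * s for s in survey[30:]]
naive_pred = [sum(growth[:30]) / 30] * 10  # benchmark: historical average
print(f"beta={beta:.2f}  model RMSE={rmse(model_pred, growth[30:]):.2f}  "
      f"naive RMSE={rmse(naive_pred, growth[30:]):.2f}")
```

Comparing against a simple benchmark on data the model has not seen is the important habit here; on real survey data the relationship is noisier and may shift over time.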
Finally, this book will not be a catalogue of all the alternative data sources existing at the time of writing. We deem this to be futile because, in our dynamic world, the number and variety of such datasets increase every day. What is more important, in our view, is the process and the techniques for making the available data useful. In doing so, we will be quite practical, also examining the mundane problems that appear when sifting through datasets and the missteps and mistakes that any practical application entails.
This book is structured as follows. Part I will be a general introduction to alternative data, the processes and the techniques to make it usable in an investment strategy. In Chapter 1, we will define alternative data and create a taxonomy. In Chapter 2 we will discuss the subtle problem of how to price datasets. This subject is currently being actively debated in the industry. Chapter 3 will talk about the risks associated with alternative data, in particular the legal risks, and we will also delve more into the details of the technical problems that one faces when implementing alternative data strategies. Chapter 4 introduces many of the machine learning and structuring techniques that can be relevant for understanding alternative data. Again, we will refer the reader to the appropriate literature for a more in-depth understanding of those techniques.
Chapter 5 will examine the processes behind testing and implementing strategies based on alternative data signals. We will recommend a fail-fast approach to the problem: in a world where datasets are many and still proliferating, we believe that this is the best way to proceed.
Chapter 6 explains factor investing and discusses how alternative data can be incorporated in this framework. Part II will then focus on some real-world use cases. One of these is not directly related to an investment strategy but addresses a problem that arises at the entry point of any project and must be treated before anything else is attempted: missing data (see Chapters 7 and 8). We also address another ubiquitous problem, outliers in data (see Chapter 9). We will then examine use cases for investment strategies and economic forecasting based on a broad array of alternative datasets, in many different asset classes, including public markets such as equities and FX. We also look at the applicability of alternative data to understanding private markets (see Chapter 20), which are typically more opaque given the lack of publicly available information. The alternative datasets we shall discuss include automotive supply chain data (see Chapter 10), satellite imagery (see Chapter 13), and machine-readable news (see Chapter 15). In many instances, we shall also illustrate the use case with trading strategies on various asset classes.
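Before any modeling, a quick first-pass triage of a new dataset usually checks where values are missing and which readings look anomalous. The sketch below is a deliberately simple stand-in, not the imputation or detection methods covered in Chapters 7 to 9 (such as MICE): it only reports gaps and flags outliers using the modified z-score, 0.6745 × (x − median) / MAD, with the commonly used cut-off of 3.5. The vendor feed, with gaps encoded as `None`, is hypothetical.

```python
import statistics


def data_quality_report(series, threshold=3.5):
    """Report missing positions and flag outliers via the modified z-score
    0.6745 * (x - median) / MAD, a common robust rule of thumb."""
    observed = [(i, x) for i, x in enumerate(series) if x is not None]
    missing = [i for i, x in enumerate(series) if x is None]
    values = [x for _, x in observed]
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    outliers = []
    if mad > 0:  # a zero MAD would make the score undefined, so flag nothing
        outliers = [i for i, x in observed
                    if abs(0.6745 * (x - med) / mad) > threshold]
    return {"missing": missing, "outliers": outliers}


# Hypothetical daily readings from a vendor feed; gaps appear as None.
feed = [10.1, 10.3, None, 10.2, 9.9, 55.0, 10.0, None, 10.4]
report = data_quality_report(feed)
print(report)
```

Median-based statistics are used because a single extreme value barely moves them, whereas it can inflate a mean and standard deviation enough to hide itself.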
So, to start this journey, let's explain a little bit more about what the financial community means by "alternative data" and why it is considered to be such a hot topic.
1.2 WHAT IS "ALTERNATIVE DATA"?
It is widely known that information can provide an edge. Hence, financial practitioners have historically tried to gather as much data as is feasible. The nature of this information, however, has changed over time, especially since the beginning of the Big Data revolution.3 From "standard" sources like market prices and balance sheet information, it has evolved to include others, in particular sources that are not, strictly speaking, financial. These include, for example, satellite imagery, social media, ship movements, and the Internet of Things (IoT). The data from these "nonstandard" sources is labeled alternative data.
In practice, alternative data has several characteristics, which we list below. It is data that has at least one of the following features:
- Is less commonly used by market participants
- Tends to be more costly to collect, and hence more expensive to purchase
- Usually comes from outside financial markets
- Has a shorter history
- Is more challenging to use
We must note from this list that what constitutes alternative data can vary significantly over time, according to how widely available it is as well as how embedded in a process it is. Obviously, most financial market data today is far more commoditized and more widely available than it was decades ago. Hence, it is not generally labeled as alternative. For example, a daily time series of equity closing prices is easily accessible from many sources and is considered nonalternative. In contrast, very high-frequency FX data, although financial, is far more expensive, specialized, and niche. The same is true of comprehensive FX volume and flow data, which is less readily available. Hence, these market-derived datasets may be considered alternative. The cost and availability of a dataset depend heavily on several factors, such as asset class and frequency. Hence, these factors determine whether the label "alternative" should...
System requirements
File format: ePUB
Copy protection: Adobe-DRM (Digital Rights Management)
System requirements:
- Computer (Windows; MacOS X; Linux): Install the free reader Adobe Digital Editions prior to download (see eBook Help).
- Tablet/smartphone (Android; iOS): Install the free app Adobe Digital Editions or the app PocketBook before downloading (see eBook Help).
- E-reader: Bookeen, Kobo, Pocketbook, Sony, Tolino and many more (not Kindle).