Machine Learning and Big Data with kdb+/q


Jan Novotny, Paul A. Bilokon, Aris Galiotos, Frederic Deleze (authors)

Wiley (publisher)

1st edition

Published 15 November 2019

640 pages

E-book

ePUB with Adobe DRM


978-1-119-40473-6 (ISBN)

€63.99 incl. 7% VAT


E-book single licence

Available for download

Description

Upgrade your programming language to more effectively handle high-frequency data

Machine Learning and Big Data with kdb+/q offers quants, programmers and algorithmic traders a practical entry into the powerful but non-intuitive kdb+ database and q programming language. Ideally designed to handle the speed and volume of high-frequency financial data at sell- and buy-side institutions, these tools have become the de facto standard; this book provides the foundational knowledge practitioners need to work effectively with this rapidly evolving approach to analytical trading.

The discussion follows the natural progression of working strategy development to allow hands-on learning in a familiar sphere, illustrating the contrast of efficiency and capability between the q language and other programming approaches. Rather than an all-encompassing "bible"-type reference, this book is designed with a focus on real-world practicality, to help you quickly get up to speed and become productive with the language.

* Understand why kdb+/q is the ideal solution for high-frequency data

* Delve into the "meat" of q programming to solve practical economic problems

* Perform everyday operations including basic regressions, cointegration, volatility estimation, modelling and more

* Learn advanced techniques from market impact and microstructure analyses to machine learning techniques including neural networks

The kdb+ database and its underlying programming language q offer unprecedented speed and capability. As trading algorithms and financial models grow ever more complex against the markets they seek to predict, they encompass an ever-larger swath of data -- more variables, more metrics, more responsiveness and altogether more "moving parts."

Traditional programming languages are increasingly failing to accommodate the growing speed and volume of data, and lack the necessary flexibility that cutting-edge financial modelling demands. Machine Learning and Big Data with kdb+/q opens up the technology and flattens the learning curve to help you quickly adopt a more effective set of tools.

Contents

Preface

HISTORY OF kdb+ AND q

kdb+ and q are intellectual descendants of an older programming language, APL. The acronym "APL" stands for "A Programming Language", the name of a book (Iverson, 1962) written by the Canadian computer scientist Kenneth (Ken) Eugene Iverson (1920-2004). Iverson worked on automatic data processing during his years at Harvard (1955-1960), when he found that the conventional mathematical notation wasn't well suited to the task. He proceeded to develop his own notation, borrowing ideas from linear algebra, tensor analysis, and operators à la Oliver Heaviside. This notation was further elaborated at IBM, where Iverson worked alongside Adin Falkoff (1921-2010) from 1960 until 1980. The collaboration between Iverson and Falkoff would span nearly two decades.

The two main ideas behind APL are the efficient-notation idea (Montalbano, 1982) and the stored-program idea. The stored-program idea, which dates back to John von Neumann (1903-1957), see von Neumann (1945), and amounts to being able to store (and process) code as data, has been taken a step further in languages such as q, where function names evaluate to their source code. The efficient-notation idea, the idea that developing a concise and expressive syntax is critical to solving complex iterative problems correctly and efficiently, was pioneered by Iverson. In the preface of Iverson (1962) he defines a programming language in the following terms:

Applied mathematics is largely concerned with the design and analysis of explicit procedures for calculating the exact or approximate values of various functions. Such explicit procedures are called algorithms or programs. Because an effective notation for the description of programs exhibits considerable syntactic structure, it is called a programming language.

Later, in 1979, Iverson would give a Turing Award Lecture with the title Notation as a Tool of Thought (Iverson, 1979). Iverson's notation was as effective as it was simple. It relied on simple rules of precedence based on right-to-left evaluation. The fundamental data structure in APL is a multidimensional array. Languages such as APL and its descendants are sometimes referred to as array, vector, or multidimensional programming languages because they implicitly generalise scalar operations to higher-dimensional objects.

Iverson's notation was known as "Iverson's notation" within IBM until the name "APL" was suggested by Falkoff. After the publication of A Programming Language in 1962, the notation was used to describe the IBM System/360 computer. Iverson and Falkoff then focused on the implementation of the programming language. An implementation on System/360 was made available at IBM in 1966 and released to the outside world in 1968.

In 1980, Iverson moved to I. P. Sharp Associates (IPSA), a Canadian software firm based in Toronto. There he was joined by Roger K. W. Hui (b. 1953) and Arthur Whitney (b. 1957). The three of them continued to work on APL, adding new ideas to the programming language. Hui, whose family emigrated to Canada from Hong Kong in 1966, was first exposed to APL at the University of Alberta. Whitney had a background in pure mathematics and had worked with APL at the University of Toronto and Stanford. He met Iverson for the first time well before he joined IPSA, at the age of 11, in 1969. Iverson had been his father's friend at Harvard. Whitney's family then lived in Alberta but would visit Iverson in his house in Mount Kisco. There Iverson introduced Whitney to programming and APL.

In 1988, Whitney joined Morgan Stanley, where he helped develop A+, an APL-like programming language with a smaller set of primitive functions optimised for fast processing of large volumes of time series data. Unlike APL, A+ allowed functions to have up to nine formal parameters, used semicolons to separate statements (so a single statement could be split into multiple lines), used the last statement of a function as its result, and introduced the so-called dependencies which functioned as global variables. The programming language is now available online, http://www.aplusdev.org/. Programmers can also download the kapl font, which includes the special characters used by APL and A+.

One summer weekend in 1989, Whitney visited Iverson at Kiln Farm and produced, "on one page and in one afternoon" (Hui, 1992), an interpreter fragment on the AT&T 3B1 computer. Hui studied this fragment and on its basis developed an interpreter for another APL variant, J. Unlike APL and A+, J used the ASCII character set. It included advanced features, such as support for parallel MIMD operations. Whitney's original fragment appears under the name Incunabulum in an appendix in Hui's book; see Hui (1992). Other ideas by Whitney found their way into J: orienting primitives on the leading axis, using prefix rather than suffix for agreement, and total array ordering (Hui, 1995, 2006). Ken Iverson, his son Eric Iverson, and Hui all ended up working at a company called Jsoftware in the 1990s and 2000s.

Whitney left Morgan Stanley in 1993 and co-founded Kx Systems with Janet Lustgarten, where he developed another APL variant, called k. On its basis he developed a columnar in-memory time series database called kdb. Kx Systems was initially under an exclusive agreement with UBS. The agreement expired in 1996, and k and kdb became generally available. ksql was added in 1998 as a layer on top of k. Some developers regard it as part of the k language. ksql includes SQL-like constructs, such as select.

kdb+ was released in June 2003. This was more or less a total rewrite of kdb for 64-bit systems based on the 4th version of k and q, a macro language layer (or a query language, hence the name) on top of k, defined in terms of k. Both q and k compile to the same byte code that is executed in the kdb+ byte interpreter. For example, type in q is the equivalent of @: in k. q is much more readable than k and most kdb+ developers write their code in q, not k.

The q programming language contains its own table query syntax called q-sql, which in some ways resembles traditional SQL.

MOTIVATION FOR THIS BOOK

q and kdb+ occupy a special ground between, and overlapping, the purely technical world of software engineering and the world of data science. On the one hand, they can be used for building data services which communicate with each other, whether this is simply to expose a time series database (something q excels at) or a chain of Complex Event Processing (CEP) engines calculating analytics and signals in real time. On the other hand, q's fast execution on vectorised time series enables its application to data science, hypothesis testing, research, pattern recognition, and general statistical and machine learning, these being the fields more traditionally focused on by quantitative analysts.

In practice, the distinction between the two worlds can become blurry: the two are closely interconnected. An idea cannot be validated without rigorous and fast statistical analysis and backtesting, where the algorithms and their parameters are well understood, avoiding black-box solutions. It is then natural to expand our model validation code into the actual production predictive analytics. This is facilitated by the reliability, resilience, scalability, and durability of kdb+.

q's notation as a tool of thought and expression enables it to succeed at both these tasks; its roots, traceable from APL and lambda calculus (Church, 1941), its vector language, the fast columnar database, and its query language make q a unique all-in-one tool.

Although previously notorious as a language which "cannot be googled" due to its short name and lack of documentation, q and kdb+ are nowadays well-documented from the programming language and infrastructural perspectives, with some excellent sources of material both on https://code.kx.com/ and in books, such as Borror (2015); Psaris (2015).

Moreover, kdb+ is widely used in the market as a time series database combined with real-time analytics, whereas a lot of the data science, statistical and machine learning (ML) work is done outside of q, by extracting data into, or interfacing with, Python or R.

The aim of this book is to demonstrate that a lot of the power of q can be harnessed to deal with a large part of everyday data analysis, from data retrieval and data operations - specifically on very large data sets - to performing a range of...
