ALIREZA JAVAHERI is the head of Equities Quantitative Research Americas at JP Morgan and an adjunct professor of Mathematical Finance at the Courant Institute of New York University, as well as Baruch College. He has worked in the field of derivatives quantitative research since 1994 in a variety of investment banks, including Goldman Sachs and Citigroup.
Introduction (First Edition)
This book focuses on developing Methodologies for Estimating Stochastic Volatility (SV) parameters from the Stock-Price Time-Series under a Classical framework. The text contains three chapters and is structured as follows:
In the first chapter, we shall introduce and discuss various parametric SV models. This chapter represents a brief survey of the existing literature on the subject of nondeterministic volatility.
We start with the concept of the log-normal distribution and historic volatility. We will then introduce the Black-Scholes framework. We shall also mention alternative interpretations as suggested by Cox and Rubinstein. We shall state how these models are unable to explain the negative skewness and the leptokurticity commonly observed in the stock markets. Moreover, the famous implied-volatility smile would not exist under these assumptions.
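The log-normal setting above can be made concrete with a minimal, self-contained Black-Scholes call pricer; the parameter values in the usage line are illustrative only, and the closed form is the standard one rather than anything specific to this book:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    # Black-Scholes price of a European call under a log-normal stock:
    # spot S, strike K, maturity T (years), rate r, volatility sigma
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# Illustrative at-the-money example: S = K = 100, T = 1, r = 0, sigma = 20%
price = bs_call(100.0, 100.0, 1.0, 0.0, 0.2)
```

Under these toy inputs the call is worth roughly 8% of spot, and the price is increasing in the volatility input, which is what makes implied volatility well defined in the first place.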
At this point we consider the notion of level-dependent volatility as advanced by researchers such as Cox and Ross [69, 70] as well as Bensoussan, Crouhy, and Galai. Either an artificial expression of the instantaneous variance will be used, as is the case for Constant Elasticity of Variance (CEV) models, or an implicit expression will be deduced from a Firm model similar to Merton's, for instance.
We will also bring up the subject of Poisson Jumps in the distributions, providing negative skewness and larger kurtosis. These jump-diffusion models offer a link between the volatility smile and credit phenomena.
We then discuss the idea of Local Volatility and its link to the instantaneous unobservable volatility. Work by researchers such as Dupire, Derman, and Kani will be cited. We shall also describe the limitations of this idea due to an ill-posed inversion phenomenon, as revealed by Avellaneda and others.
Unlike Non-Parametric Local Volatility models, Parametric Stochastic Volatility (SV) models define a specific stochastic differential equation for the unobservable instantaneous variance. We will therefore introduce the notion of two-factor Stochastic Volatility and its link to one-factor Generalized Auto-Regressive Conditionally Heteroskedastic (GARCH) processes. The SV model class is the one we shall focus on. Studies by scholars such as Engle, Nelson, and Heston will be discussed at this juncture. We will briefly mention related works on Stochastic Implied Volatility by Schonbucher, as well as Uncertain Volatility by Avellaneda.
Having introduced SV, we then discuss the two-factor Partial Differential Equations (PDE) and the incompleteness of the markets when only cash and the underlying asset are used for hedging.
We will then examine Option Pricing techniques such as Inversion of the Fourier transform, Mixing Monte-Carlo, as well as a few asymptotic pricing techniques, as explained for instance by Lewis.
At this point we shall tackle the subject of pure-jump models such as Madan's Variance Gamma or its variants VG with Stochastic Arrivals (VGSA). The latter adds to the traditional VG a way to introduce the volatility clustering (persistence) phenomenon. We will mention the distribution of the stock market as well as various option pricing techniques under these models. The inversion of the characteristic function is clearly the method of choice for option pricing in this context.
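The mechanics of characteristic-function inversion can be sketched in a few lines. As a toy stand-in, the sketch below inverts a Gaussian characteristic function on a truncated grid and recovers the density; the VG/VGSA characteristic functions would simply be substituted for `cf`, and the grid bounds here are illustrative only:

```python
import numpy as np

def density_from_cf(cf, x, u_max=50.0, n=4096):
    # Recover a density f(x) = (1 / 2*pi) * Integral exp(-i*u*x) * cf(u) du
    # by plain numerical quadrature on a truncated u-grid.
    u = np.linspace(-u_max, u_max, n)
    du = u[1] - u[0]
    integrand = np.exp(-1j * np.outer(x, u)) * cf(u)
    return np.real(integrand.sum(axis=1)) * du / (2.0 * np.pi)

# Gaussian stand-in: cf(u) = exp(-0.5 * sigma^2 * u^2) for N(0, sigma^2)
sigma = 0.2
f0 = density_from_cf(lambda u: np.exp(-0.5 * (sigma * u) ** 2),
                     np.array([0.0]))[0]
# exact N(0, sigma^2) density at 0 is 1 / (sigma * sqrt(2*pi)), about 1.9947
```

In practice one uses an FFT over a whole strike or log-price grid at once, but the one-point quadrature above already shows why a closed-form characteristic function is all a pure-jump model needs for pricing.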
In the second chapter we will tackle the notion of Inference (or Parameter-Estimation) for Parametric SV models. We shall first briefly analyze the Cross-Sectional Inference and will then focus on the Time-Series Inference.
We start with a concise description of cross-sectional estimation of SV parameters in a risk-neutral framework. A Least Squares Estimation (LSE) algorithm will be discussed. The Direction-Set optimization algorithm will also be introduced at this point. The fact that this optimization algorithm does not use the gradient of the input function is important, since we shall later deal with functions that contain jumps and are not necessarily differentiable everywhere.
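A stripped-down sketch of a gradient-free direction-set search follows: it cycles line minimizations along the coordinate directions, which is the core idea, although Powell's full algorithm also replaces directions as it proceeds, a refinement this toy version omits. The kinked test objective and the search span are illustrative only:

```python
import math

def golden_section(f, a, b, tol=1e-6):
    # 1-D line minimization without derivatives (golden-section search)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - phi * (b - a)
    d = a + phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2.0

def direction_set_minimize(f, x0, iters=20, span=5.0):
    # Cycle through the coordinate directions, line-minimizing along each.
    x = list(x0)
    for _ in range(iters):
        for i in range(len(x)):
            line = lambda t: f(x[:i] + [t] + x[i + 1:])
            x[i] = golden_section(line, x[i] - span, x[i] + span)
    return x

# A least-squares-style objective with a kink (not differentiable at a = 1),
# the kind of function a gradient-based method would struggle with
f = lambda p: abs(p[0] - 1.0) + (p[1] + 2.0) ** 2
x_min = direction_set_minimize(f, [0.0, 0.0])
```

The search converges to (1, -2) without ever evaluating a derivative, which is exactly the property needed when the likelihood surface contains jumps.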
We then discuss the parameter inference from a Time-Series of the underlying asset in the real world. We shall do this in a Classical (Non-Bayesian) framework and in particular we will estimate the parameters via a Maximum Likelihood Estimation (MLE) methodology. We shall explain the idea of MLE, its link to the Kullback-Leibler distance, as well as the calculation of the Likelihood function for a two-factor SV model.
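MLE in miniature: for i.i.d. Gaussian data with known variance, maximizing the log-likelihood in the mean recovers the sample average, the same estimate that minimizes the empirical Kullback-Leibler distance to the data-generating distribution. The data points and the grid search below are illustrative only:

```python
import math

def log_likelihood(mu, xs, sigma=1.0):
    # Gaussian log-likelihood of the sample xs for a candidate mean mu
    return sum(-0.5 * math.log(2.0 * math.pi * sigma**2)
               - 0.5 * ((x - mu) / sigma) ** 2 for x in xs)

xs = [1.2, 0.8, 1.1, 0.9, 1.0]   # toy sample with mean exactly 1.0

# Grid-search the maximizer; it should land on the sample mean
grid = [i / 1000.0 for i in range(0, 2001)]
mu_hat = max(grid, key=lambda mu: log_likelihood(mu, xs))
```

For SV models the likelihood is not available in this closed, integrated form, which is precisely why the filtering machinery of the next paragraphs is needed.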
We will see that unlike GARCH models, SV models do not admit an analytic (integrated) likelihood function. This is why we will need to introduce the concept of Filtering.
The idea behind Filtering is to obtain the best possible estimation of a hidden state given all the available information up to that point. This estimation is done in an iterative manner in two stages: The first step is a Time Update where the prior distribution of the hidden state, at a given point in time, is determined from all the past information via a Chapman-Kolmogorov equation. The second step would then involve a Measurement Update where this prior distribution is used together with the conditional likelihood of the newest observation in order to compute the posterior distribution of the hidden state. Bayes' rule is used for this purpose. Once the posterior distribution is determined, it could be exploited for the optimal estimation of the hidden state.
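The two-stage recursion can be written out exactly on a discrete state space, where the Chapman-Kolmogorov integral becomes a matrix product. The two-state chain and Gaussian observation model below are purely illustrative, not any model from the book:

```python
import numpy as np

P = np.array([[0.9, 0.1],       # transition matrix p(x_k | x_{k-1})
              [0.2, 0.8]])
means = np.array([0.0, 1.0])    # observation mean attached to each state

def filter_step(posterior_prev, y, obs_sigma=0.5):
    # Time update (Chapman-Kolmogorov):
    #   prior_k(x) = sum over x' of p(x | x') * posterior_{k-1}(x')
    prior = posterior_prev @ P
    # Measurement update (Bayes' rule):
    #   posterior proportional to prior * likelihood of new observation y
    lik = np.exp(-0.5 * ((y - means) / obs_sigma) ** 2)
    posterior = prior * lik
    return posterior / posterior.sum()

post = np.array([0.5, 0.5])      # flat initial belief
for y in [0.9, 1.1, 1.0]:        # observations clustered near state 1's mean
    post = filter_step(post, y)
```

After three observations near 1.0 the posterior concentrates on the second state; the continuous-state filters discussed next approximate this same recursion when the sum becomes an intractable integral.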
We shall start with the Gaussian case where the first two moments characterize the entire distribution. For the Gaussian-Linear case, the optimal Kalman Filter (KF) is introduced. Its nonlinear extension, the Extended KF (EKF), is described next. A more suitable version of KF for strongly nonlinear cases, the Unscented KF (UKF), is also analyzed. In particular we will see how this filter is related to Kushner's Nonlinear Filter (NLF) [181, 182].
EKF uses a first-order Taylor approximation of the nonlinear transition and observation functions, in order to bring us back into a simple KF framework. UKF, on the other hand, uses the true nonlinear functions without any approximation; it supposes, however, that the Gaussianity of the distribution is preserved through these functions. UKF determines the first two moments via integrals that are computed over a few appropriately chosen "sigma points." NLF does exactly the same thing via a Gauss-Hermite quadrature. However, NLF often introduces an extra centering step, which will avoid poor performance due to an insufficient overlap between the prior distribution and the conditional likelihood.
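In the Gaussian-linear special case the recursion collapses to closed-form updates of a mean and a variance. The sketch below filters a scalar random walk observed in noise; the process and measurement variances are toy values, not calibrated quantities:

```python
# Minimal 1-D Kalman Filter for the linear-Gaussian model
#   x_k = x_{k-1} + process noise   (variance q)
#   y_k = x_k     + measurement noise (variance r)

def kalman_1d(ys, q=0.01, r=0.25, m0=0.0, p0=1.0):
    m, p = m0, p0                 # mean and variance of the state belief
    estimates = []
    for y in ys:
        # Time update: propagate belief through the (linear) transition
        p = p + q
        # Measurement update: the Kalman gain blends prior and observation
        k = p / (p + r)
        m = m + k * (y - m)
        p = (1.0 - k) * p
        estimates.append(m)
    return estimates

# Noisy observations fluctuating around a true level of 1.0 (illustrative)
est = kalman_1d([1.1, 0.9, 1.05, 0.95, 1.0])
```

EKF and UKF reduce to exactly these updates when the transition and observation functions happen to be linear; their differences only appear in how the two moments are propagated through genuine nonlinearities.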
As we shall observe, in addition to their use in the MLE approach, the Filters above could be applied to a direct estimation of the parameters via a Joint Filter (JF). The JF would simply involve the estimation of the parameters together with the hidden state via a dimension augmentation; in other words, one would treat the parameters as hidden states. After choosing initial conditions and applying the filter to an observation data set, one would then disregard a number of initial points and take the average over the remaining estimates. This initial rejected period is known as the "burn-in" period.
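The joint-filtering idea can be shown in miniature: treat an unknown constant parameter as a hidden state with (near) zero process noise, filter it from noisy observations, discard a burn-in period, and average the rest. The model, the deterministic pseudo-noise, and all the numbers below are illustrative only:

```python
import math

def estimate_parameter(ys, q=1e-6, r=0.5, burn_in=20):
    # Kalman recursion where the "state" is the parameter theta itself;
    # q ~ 0 encodes that theta is (almost) constant over time.
    m, p = 0.0, 10.0              # diffuse initial guess for theta
    path = []
    for y in ys:
        p = p + q                 # time update
        k = p / (p + r)           # measurement update
        m = m + k * (y - m)
        p = (1.0 - k) * p
        path.append(m)
    # Burn-in: drop the transient, average the remaining estimates
    return sum(path[burn_in:]) / len(path[burn_in:])

# Observations of a true parameter theta = 2.0 with zero-mean pseudo-noise
ys = [2.0 + 0.5 * math.sin(7.0 * i) for i in range(200)]
theta_hat = estimate_parameter(ys)
```

The averaging after burn-in is what turns a sequence of state estimates into a single parameter estimate, which is the essence of the JF approach described above.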
We will test various representations or State Space Models of the Stochastic Volatility models such as Heston's. The concept of Observability will be introduced in this context. We will see that the parameter estimation is not always accurate given a limited amount of daily data.
Before a closer analysis of the performance of these estimation methods, we shall introduce simulation-based Particle Filters (PF) [84, 128], which can be applied to non-Gaussian distributions. In a PF algorithm, the Importance Sampling technique is applied to the distribution. Points are simulated via a chosen proposal distribution and the resulting weights, proportional to the conditional likelihood, are computed. Since the variance of these weights tends to increase over time and cause the algorithm to diverge, the simulated points go through a variance reduction technique commonly referred to as Resampling. During this stage, points with too small a weight are disregarded, and points with large weights are repeated. This technique could cause a Sample Impoverishment, which can be corrected via a Metropolis-Hastings Accept/Reject test. Work by researchers such as Doucet, Smith, and Gordon is cited and used in this context.
Needless to say, the choice of the proposal distribution could be fundamental to the success of the PF algorithm. The most natural choice would be to take a proposal distribution equal to the prior distribution of the hidden state. Even if this makes the computations simpler, the danger would be a non-alignment between the prior and the conditional likelihood, as we previously mentioned. To avoid this, other proposal distributions taking into account the observation should be considered. The Extended PF (EPF) and the Unscented PF (UPF) precisely do this by adding an extra Gaussian Filtering step to the process. Other techniques such as the Auxiliary PF (APF) have been developed by Pitt and Shephard.
Interestingly, we will see that PF brings only marginal improvement over the traditional KFs when applied to daily data. However, for a larger time-step where the nonlinearity is stronger, the PF does help more.
At this point we also compare the Heston model to other SV models such as the "3/2" model  using real market data, and we will see that the...