Applied Modeling Techniques and Data Analysis 2

Name: Applied Modeling Techniques and Data Analysis 2 | Financial, Demographic, Stochastic and Statistical Models and Methods
Brand: Wiley
Price: 139.99 EUR
Availability: OnlineOnly

Financial, Demographic, Stochastic and Statistical Models and Methods

Yiannis Dimotikalis, Alex Karagrigoriou, Christina Parpoula, Christos H. Skiadas (Editors)

Wiley (Publisher)

1st edition

Published on 13 April 2021

288 pages

E-Book

ePUB with Adobe DRM

978-1-119-82162-5 (ISBN)

€139.99 incl. 7% VAT

E-book single license

Available for download

Contents

Preface xi
Yannis DIMOTIKALIS, Alex KARAGRIGORIOU, Christina PARPOULA and Christos H. SKIADAS

Part 1. Financial and Demographic Modeling Techniques 1

Chapter 1. Data Mining Application Issues in the Taxpayer Selection Process 3
Mauro BARONE, Stefano PISANI and Andrea SPINGOLA

1.1. Introduction 3

1.2. Materials and methods 5

1.2.1. Data 5

1.2.2. Interesting taxpayers 6

1.2.3. Enforced tax recovery proceedings 9

1.2.4. The models 11

1.3. Results 13

1.4. Discussion 23

1.5. Conclusion 23

1.6. References 24

Chapter 2. Asymptotics of Implied Volatility in the Gatheral Double Stochastic Volatility Model 27
Mohammed ALBUHAYRI, Anatoliy MALYARENKO, Sergei SILVESTROV, Ying NI, Christopher ENGSTRÖM, Finnan TEWOLDE and Jiahui ZHANG

2.1. Introduction 27

2.2. The results 30

2.3. Proofs 30

2.4. References 38

Chapter 3. New Dividend Strategies 39
Ekaterina BULINSKAYA

3.1. Introduction 39

3.2. Model 1 41

3.3. Model 2 48

3.4. Conclusion and further results 51

3.5. Acknowledgments 51

3.6. References 52

Chapter 4. Introduction of Reserves in Self-adjusting Steering of Parameters of a Pay-As-You-Go Pension Plan 53
Keivan DIAKITE, Abderrahim OULIDI and Pierre DEVOLDER

4.1. Introduction 53

4.2. The pension system 54

4.3. Theoretical framework of the Musgrave rule 57

4.4. Transformation of the retirement fund 60

4.5. Conclusion 63

4.6. References 64

Chapter 5. Forecasting Stochastic Volatility for Exchange Rates using EWMA 65
Jean-Paul MURARA, Anatoliy MALYARENKO, Milica RANCIC and Sergei SILVESTROV

5.1. Introduction 65

5.2. Data 66

5.3. Empirical model 67

5.4. Exchange rate volatility forecasting 69

5.5. Conclusion 73

5.6. Acknowledgments 73

5.7. References 74

Chapter 6. An Arbitrage-free Large Market Model for Forward Spread Curves 75
Hossein NOHROUZIAN, Ying NI and Anatoliy MALYARENKO

6.1. Introduction and background 75

6.1.1. Term-structure (interest rate) models 76

6.1.2. Forward-rate models versus spot-rate models 77

6.1.3. The Heath-Jarrow-Morton framework 77

6.1.4. Construction of our model 78

6.2. Construction of a market with infinitely many assets 79

6.2.1. The Cuchiero-Klein-Teichmann approach 79

6.2.2. Adapting Cuchiero-Klein-Teichmann's results to our objective 82

6.3. Existence, uniqueness and non-negativity 82

6.3.1. Existence and uniqueness: mild solutions 83

6.3.2. Non-negativity of solutions 85

6.4. Conclusion and future works 87

6.5. References 88

Chapter 7. Estimating the Healthy Life Expectancy (HLE) in the Far Past: The Case of Sweden (1751-2016) with Forecasts to 2060 91
Christos H. SKIADAS and Charilaos SKIADAS

7.1. Life expectancy and healthy life expectancy estimates 92

7.2. The logistic model 94

7.3. The HALE estimates and our direct calculations 95

7.4. Conclusion 96

7.5. References 96

Chapter 8. Vaccination Coverage Against Seasonal Influenza of Workers in the Primary Health Care Units in the Prefecture of Chania 97
Aggeliki MARAGKAKI and George MATALLIOTAKIS

8.1. Introduction 98

8.2. Material and method 98

8.3. Results 101

8.4. Discussion 105

8.5. References 107

Chapter 9. Some Remarks on the Coronavirus Pandemic in Europe 109
Konstantinos ZAFEIRIS and Marianna KOUKLI

9.1. Introduction 109

9.2. Background 110

9.2.1. CoV pathogens 110

9.2.2. Clinical characteristics of COVID-19 111

9.2.3. Diagnosis 113

9.2.4. Epidemiology and transmission of COVID-19 113

9.2.5. Country response measures 115

9.2.6. The role of statistical research in the case of COVID-19 and its challenges 119

9.3. Materials and analyses 119

9.4. The first phase of the pandemic 121

9.5. Concluding remarks 126

9.6. References 127

Part 2. Applied Stochastic and Statistical Models and Methods 135

Chapter 10. The Double Flexible Dirichlet: A Structured Mixture Model for Compositional Data 137
Roberto ASCARI, Sonia MIGLIORATI and Andrea ONGARO

10.1. Introduction 138

10.1.1. The flexible Dirichlet distribution 139

10.2. The double flexible Dirichlet distribution 140

10.2.1. Mixture components and cluster means 141

10.3. Computational and estimation issues 144

10.3.1. Parameter estimation: the EM algorithm 145

10.3.2. Simulation study 148

10.4. References 151

Chapter 11. Quantization of Transformed Lévy Measures 153
Mark Anthony CARUANA

11.1. Introduction 153

11.2. Estimation strategy 156

11.3. Estimation of masses and the atoms 159

11.4. Simulation results 165

11.5. Conclusion 166

11.6. References 167

Chapter 12. A Flexible Mixture Regression Model for Bounded Multivariate Responses 169
Agnese M. DI BRISCO and Sonia MIGLIORATI

12.1. Introduction 169

12.2. Flexible Dirichlet regression model 170

12.3. Inferential issues 172

12.4. Simulation studies 173

12.4.1. Simulation study 1: presence of outliers 174

12.4.2. Simulation study 2: generic mixture of two Dirichlet distributions 179

12.4.3. Simulation study 3: FD distribution 180

12.5. Discussion 182

12.6. References 183

Chapter 13. On Asymptotic Structure of the Critical Galton-Watson Branching Processes with Infinite Variance and Allowing Immigration 185
Azam A. IMOMOV and Erkin E. TUKHTAEV

13.1. Introduction 185

13.2. Invariant measures of GW process 187

13.3. Invariant measures of GWPI 190

13.4. Conclusion 193

13.5. References 194

Chapter 14. Properties of the Extreme Points of the Joint Eigenvalue Probability Density Function of the Wishart Matrix 195
Asaph Keikara MUHUMUZA, Karl LUNDENGÅRD, Sergei SILVESTROV, John Magero MANGO and Godwin KAKUBA

14.1. Introduction 195

14.2. Background 196

14.3. Polynomial factorization of the Vandermonde and Wishart matrices 197

14.4. Matrix norm of the Vandermonde and Wishart matrices 200

14.5. Condition number of the Vandermonde and Wishart matrices 203

14.6. Conclusion 206

14.7. Acknowledgments 206

14.8. References 207

Chapter 15. Forecast Uncertainty of the Weighted TAR Predictor 211
Francesco GIORDANO and Marcella NIGLIO

15.1. Introduction 211

15.2. SETAR predictors and bootstrap prediction intervals 214

15.3. Monte Carlo simulation 218

15.4. References 222

Chapter 16. Revisiting Transitions Between Superstatistics 223
Petr JIZBA and Martin PROKŠ

16.1. Introduction 223

16.2. From superstatistic to transition between superstatistics 224

16.3. Transition confirmation 225

16.4. Beck's transition model 227

16.5. Conclusion 230

16.6. Acknowledgments 231

16.7. References 231

Chapter 17. Research on Retrial Queue with Two-Way Communication in a Diffusion Environment 233
Viacheslav VAVILOV

17.1. Introduction 233

17.2. Mathematical model 234

17.3. Asymptotic average characteristics 236

17.4. Deviation of the number of applications in the system 241

17.5. Probability distribution density of device states 247

17.6. Conclusion 248

17.7. References 248

List of Authors 251

Index 255

Chapter 1. Data Mining Application Issues in the Taxpayer Selection Process

This chapter provides a data analysis framework designed to build an effective learning scheme aimed at improving the Italian Revenue Agency's ability to identify non-compliant taxpayers, with special regard to self-employed individuals allowed to keep simplified registers. Our procedure involves building two C4.5 decision trees, both trained and validated on a sample of 8,000 audited taxpayers, but predicting two different class values, based on two different predictive attribute sets. That is, the first model is built in order to identify the most likely non-compliant taxpayers, while the second identifies the ones that are less likely to pay the additional due tax bill. This twofold selection target is needed in order to maximize the overall audit effectiveness. Once both models are in place, taxpayers will be selected in such a way that businesses are only audited if they are judged as worthy by both models. This methodology will soon be validated on real cases: a sample of taxpayers will be selected according to the classification criteria developed in this chapter and will subsequently be audited.

1.1. Introduction

Fraud detection systems are designed to automate and help reduce the manual parts of a screening/checking process (Phua et al. 2005). Data mining plays an important role in fraud detection, as it is often applied to extract fraudulent behavior profiles hidden in large quantities of data and may thus be useful in decision support systems for planning effective audit strategies. Indeed, huge amounts of resources (to put it bluntly, money) may be recovered from well-targeted audits. This explains the increasing interest and investment of both governments and fiscal agencies in intelligent systems for audit planning. The Italian Revenue Agency (hereafter, IRA) itself has been studying data mining techniques to detect tax evasion, focusing, for instance, on the tax credit system intended to support investments in disadvantaged areas (de Sisti and Pisani 2007); on fraud related to credit mechanisms in value-added tax, a tax levied on the price of a product or service at each stage of production, distribution or sale to the end consumer, except where a business is the end consumer, which will reclaim this input value (Basta et al. 2009); and on income indicator audits (Barone et al. 2017).

This chapter contributes to the empirical literature on the development of classification models applied to the tax evasion field, presenting a case study that focuses on a dataset of 8,000 taxpayers audited for fiscal year 2012, each described by a set of features concerning, among others, their tax returns, their properties and their tax notice.

In this context, all the taxpayers are in some way "unfaithful", since all of them have received a tax notice that somehow rectified the tax return they had filed. Thus, the predictive analysis tool we develop is designed to find patterns in data that may help tax offices recognize only the riskiest taxpayers' profiles.

Evidence from the data at hand shows that our first model, which is described in detail later, is able to distinguish the taxpayers who are worthy of closer investigation from those who are not.

However, by defining the class value as a function of the higher due taxes, we satisfy the need of focusing on the taxpayers who are more likely to be "significant" tax evaders, but we do not ensure an efficient collection of their tax debt. Indeed, data shows that as the tax bill increases, the number of coercive collection procedures put in place also increases. Unfortunately, these procedures are highly inefficient, as they are able to only collect about 5% of the overall credits claimed against the audited taxpayers (Italian Court of Auditors 2016). As a result, the tax authorities' ability to collect the due taxes may be jeopardized.

Further analysis is thus devoted to finding a way to discover, among the "significant" evaders, the most solvent ones. We recall that the 2018-2020 Agreement between the IRA and the Ministry of Finance states that audit effectiveness is measured, among others, by an indicator that is simply equal to the sum of the collected due taxes, which summarizes the effectiveness of the IRA's efforts to tackle tax evasion (Ministry of Economy and Finance - IRA Agreement for 2018-2020 2018). This is a reasonable indicator: the ordinary activities undertaken in the fight against tax evasion are crucial from the State budget point of view, because public expenditures (i.e. public services) strictly depend on the amount of public revenue. Of course, fraud and other incorrect fiscal behaviors may be tackled, even though no tax collection is guaranteed, in order to reach the maximum tax compliance. Such extra activities may also be jointly conducted with the Finance Guard or the Public Prosecutor if tax offenses arise.

Therefore, to tackle our second problem, i.e. to guarantee a certain degree of due tax collection, a trivial fact that we start from is that a taxpayer with no properties will not be willing to pay his dues, whereas if he had something to lose (a home or a car that could be seized), then, if the IRA's claim is right, it is more probable that he might reach an agreement with the tax authorities.

A second model, focusing only on a few features that indicate whether or not the taxpayer owns some kind of assets, is therefore built to predict each tax notice's final status (in this case, we only distinguish between statuses ending with an enforced recovery proceeding and statuses where no such proceeding takes place). Once both models are available, taxpayers are selected in such a way that businesses will only be audited if they are judged as worthy by both models.
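The twofold selection rule described above can be sketched as a simple conjunction of two classifiers. This is only an illustration under stated assumptions: the `Taxpayer` record, the feature names `tax_gap` and `owns_assets`, the cut-off value and the two stand-in predicates are hypothetical and do not come from the chapter, where both roles are played by trained C4.5 decision trees.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Taxpayer:
    """Hypothetical record: an identifier plus a dictionary of features."""
    taxpayer_id: str
    features: Dict[str, object]


# A "model" here is any function mapping a taxpayer to True (worthy) or False.
Classifier = Callable[[Taxpayer], bool]


def select_for_audit(
    taxpayers: Iterable[Taxpayer],
    first_model: Classifier,
    second_model: Classifier,
) -> List[Taxpayer]:
    """Keep only the taxpayers judged as worthy by BOTH models."""
    return [t for t in taxpayers if first_model(t) and second_model(t)]


# Stand-ins for the two trained trees (assumptions, not the authors' models).
def flags_significant_evasion(t: Taxpayer) -> bool:
    # Hypothetical rule: a large expected tax gap marks the taxpayer as worthy.
    return float(t.features.get("tax_gap", 0.0)) > 10_000.0


def flags_likely_to_pay(t: Taxpayer) -> bool:
    # Hypothetical rule: owning seizable assets suggests no enforced recovery.
    return bool(t.features.get("owns_assets", False))
```

Requiring agreement between the two models trades audit coverage for a higher expected collection per audit, which matches the stated aim of maximizing overall audit effectiveness.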

The key feature of our procedure is the twofold selection target, needed to maximize the effectiveness of the IRA's audit processes. The methodology we suggest will soon be validated on real cases, i.e. a sample of taxpayers will be selected according to the classification criteria developed in this chapter and will subsequently be audited.

1.2. Materials and methods

1.2.1. Data

The data at hand refer to a sample of 8,028 self-employed individuals audited for fiscal year 2012, each described by a set of features concerning, among others, their tax returns, their properties and their tax notice.

Just for descriptive purposes, we can depict the statistical distribution of the revenues achieved by the businesses in our sample, grouped in classes (in thousands of euros), in Figure 1.1.

Most of our dataset is made up of small-sized taxpayers, of which almost 50% show revenues lower than €75,000 per year and only 4% higher than €500,000, with a sample average of €146,348.

Figure 1.1. Revenues distribution

For each taxpayer in the dataset, both his tax notice status and the additional due taxes (i.e. the additional requested tax amount) are known.

Here comes the first problem that needs to be tackled: the additional due tax is a numeric attribute which measures the seriousness of the taxpayer's tax evasion, whereas our algorithms, as we will show later on, need categorical values in order to predict. Thus, we cannot directly use the additional due taxes, but we need to define a class variable and decide both which values it will take and how to map each numeric value referred to the additional due taxes into such categorical values.
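To see why a categorical class is required, recall that C4.5 ranks candidate splits by the gain ratio, which is computed from the entropy of the class labels. The following is a minimal sketch of that criterion for categorical attributes only; it is not the authors' implementation, and the function names are chosen here for illustration.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of categorical labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(attribute_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on the attribute, divided by split information."""
    total = len(labels)
    # Group the class labels by the value taken by the attribute.
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the attribute itself.
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0
```

Entropy is defined over a finite set of class values, so a continuous quantity such as the additional due tax has to be mapped into classes before this criterion can be applied to it as a prediction target.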

1.2.2. Interesting taxpayers

We must define a function f(x) which associates, to each element x in the dataset, a categorical value that shows its fraud risk degree and represents the class our first model will try to predict. Of course, a function that labels all the taxpayers in the dataset as tax evaders would be useless. Thus, a distinction needs to be drawn between serious tax evasion cases and those that are less relevant. To this end, we loosely follow Basta et al. (2009) and choose to divide the taxpayers into two groups, the interesting ones and the not interesting ones, from the tax administration point of view (to a certain extent, interesting stands for "it might be interesting for the tax administration to go and check what's going on ..."), based on two criteria: profitability (i.e. the ability to identify the most serious cases of tax evasion, independently from all other factors) and fairness (i.e. the ability to identify the most serious cases of tax evasion, with respect to the taxpayer's turnover).
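A labeling function of this kind might look like the sketch below. Everything specific in it is an assumption made for illustration: the two threshold values, the field names, and the choice to require both criteria at once are hypothetical, since the excerpt does not state how profitability and fairness are actually combined.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, chosen only to make the example concrete.
PROFITABILITY_THRESHOLD_EUR = 10_000.0  # absolute size of the additional due tax
FAIRNESS_THRESHOLD_RATIO = 0.10         # additional due tax relative to turnover


@dataclass
class AuditedTaxpayer:
    """Hypothetical record holding the two quantities the criteria refer to."""
    additional_due_tax: float  # euros
    turnover: float            # euros


def f(x: AuditedTaxpayer) -> str:
    """Map a taxpayer to the class 'interesting' or 'not interesting'."""
    # Profitability: how serious the evasion is in absolute terms.
    profitable = x.additional_due_tax >= PROFITABILITY_THRESHOLD_EUR
    # Fairness: how serious the evasion is relative to the taxpayer's turnover.
    ratio = x.additional_due_tax / x.turnover if x.turnover > 0 else float("inf")
    fair = ratio >= FAIRNESS_THRESHOLD_RATIO
    # Assumption: both criteria must hold for the taxpayer to be interesting.
    return "interesting" if profitable and fair else "not interesting"
```

The relative criterion keeps a large business with a modest irregularity from being ranked above a small one that evaded a substantial share of its turnover.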

Honest taxpayers are treated as not interesting taxpayers, even though this label is used to indicate moderate tax evasion cases. We are somehow forced to use this approximation since we only have data on taxpayers who received a tax notice, and not on taxpayers for which an audit process may have been closed without qualifications, or may have not even been started.

Therefore, in order to take the profitability issue into account, we define a new variable, called the tax claim, which represents the higher assessed taxes if the tax notice stage is still open, or the higher settled...
