Universidade Federal de Santa Maria

Ci. e Nat., Santa Maria, v.42, e111, 2020

DOI:10.5902/2179460X39914

ISSN 2179-460X

Received: 09/09/19  Accepted: 16/04/20  Published: 23/12/20

Statistics


The Zeta-G Class: some computational and analytical aspects

 

 

Ana Carla Percontini I

Frank Gomes-Silva II

Gauss Moutinho Cordeiro III

Pedro Rafael Diniz Marinho IV

 

 

I Universidade Estadual de Feira de Santana, Feira de Santana, BA - anappaixao@gmail.com

II  Universidade Federal Rural de Pernambuco, Recife, PE - franksinatrags@gmail.com

III Universidade Federal de Pernambuco, Recife, PE - gausscordeiro@gmail.com

IV Universidade Federal da Paraíba, João Pessoa, PB - pedro.rafael.marinho@gmail.com

 

 

ABSTRACT

We define a new class of distributions with one extra shape parameter including some special cases. We provide numerical and computational aspects of the new class. We propose functions using the R language to fit any distribution in this family to a data set. In addition, such functions are implemented efficiently using the library Rcpp that enables the incorporation of the codes C++ in R automatically. Some examples are presented for using the implemented routines in practice. We derive some mathematical properties of this class including explicit expressions for the moments, generating function and mean deviations. We discuss the estimation of the model parameters by maximum likelihood and provide an application to a real data set.

Keywords: Computational aspects; Generating function; Mean deviations; Moment; Zeta distribution

 

 


 

 

1 INTRODUCTION

Recently, new families have been proposed by compounding any continuous baseline G distribution with a discrete distribution supported on integers n ≥ 1. By this method, we can obtain new classes with additional parameters to govern skewness and generate densities with heavier or lighter tails. These parameters are sought as a way to provide a more flexible distribution for modeling the hazard rate function (hrf). Another important method for generating continuous distributions was proposed by Alzaatreh et al. (2013). Accordingly, several new distributions have been published, for example, an extended Weibull distribution (Cordeiro and Lemonte, 2013) that includes the Weibull as a special case and gives more flexibility to model various types of data.

We propose a general family of continuous distributions called the Zeta-G class with one additional shape parameter. The Zeta-G class is generated from the Riemann Zeta distribution studied by Lin and Hu (2001). They proved that the Riemann Zeta random variable can be represented by a linear function of infinitely many independent geometric random variables. Gupta et al. (2008) studied the Hurwitz-Lerch Zeta (HLZ) distribution and investigated its structural properties. In recent years, the HLZ distribution and its variants have been studied by various authors including Vilaplana (1988), Zörnig and Altmann (1995), Doray and Luong (1995), Doray and Luong (1997) and Gut (2005). Zörnig and Altmann (1995) have shown that many well-known discrete distributions are special cases of the HLZ family. The Riemann Zeta distribution is one of them, which has been used for describing ranking problems in linguistics.

The Zeta-G class can generate new distributions from specified baseline distributions. We demonstrate that the Zeta-G class density is a linear combination of exponentiated-G (“exp-G” for short) density functions.

Let $W_1, \ldots, W_Z$ be a random sample from a continuous cumulative distribution function (cdf) G(·) with positive support, where Z is an unknown positive integer. We assume that the random variable Z has a Zeta probability mass function (pmf)

$$P(Z = z) = \frac{z^{-s}}{\zeta(s)}, \quad z = 1, 2, \ldots,$$

where ζ(s) is the Riemann Zeta function. All real zeros of the function are on the negative real axis, located at the negative even integers, that is, ζ(−2k) = 0 for k = 1, 2, .... In the particular case of the origin, we have ζ(0) = −1/2. The Riemann Zeta function is undefined for s = 1 and ζ(s) < 0 for s ∈ (0,1). Further, ζ(s) > 1 for s > 1 and ζ(s) → 1 when s → ∞. For details and other analytic properties of the Riemann Zeta function, see Lin and Hu (2001) and Gut (2006).

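Let Z and the $W_i$'s be independent random variables and $X = \min\{W_1, \ldots, W_Z\}$. Then, the conditional cdf of X given Z = z is

$$F(x \mid Z = z) = 1 - [1 - G(x)]^{z}.$$

The unconditional cdf of X has the form (for x > 0)

$$F(x) = 1 - \frac{1}{\zeta(s)} \sum_{z=1}^{\infty} \frac{[1 - G(x)]^{z}}{z^{s}},$$

where s > 1 is a shape parameter. After some algebra, the cdf of X can be expressed as

$$F(x) = \frac{\zeta(s) - \mathrm{Li}_s[1 - G(x)]}{\zeta(s)}, \tag{1}$$

where $\mathrm{Li}_s(z)$ is the polylogarithm function [Oldham et al. (2009), Section 25:12] given by the power series

$$\mathrm{Li}_s(z) = \sum_{k=1}^{\infty} \frac{z^{k}}{k^{s}}, \tag{2}$$

and |z| < 1. For s > 1, the series also converges at z = 1, where $\mathrm{Li}_s(1) = \zeta(s)$. Equation (1) defines the cdf of the Zeta-G class.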

The polylogarithm function can be represented by more general functions, for example, using the generalized hypergeometric function, the Lerch transcendent function and the Meijer G-function given in Wolfram website (http://functions.wolfram.com/10.08.26.0008.01 - Accessed 13/06/2018.).

We provide two motivations for the Zeta-G class. Let Z have a Zeta distribution. First, suppose the failure of a device occurs due to the presence of an unknown number Z of initial defects of the same kind, which can be identified only after causing failure and are repaired perfectly. Denote by $W_i$ the time to failure of the device due to the ith defect, for i ≥ 1. Under the assumption that the $W_i$'s are independent and identically distributed (iid) random variables with cdf G(x), independent of Z, equation (1) is appropriate for modeling the time to the first failure. Second, suppose that an individual in the population is susceptible to a certain type of cancer. Let Z be the number of carcinogenic cells for that individual left active after the initial treatment and denote by $W_i$ the time spent for the ith carcinogenic cell to produce a detectable cancer mass, for i ≥ 1. Under the assumption that $\{W_i\}_{i \ge 1}$ is a sequence of iid random variables with cdf G(x), independent of Z, where Z has a Zeta distribution, the time to relapse of cancer of a susceptible individual is $X = \min\{W_1, \ldots, W_Z\}$, which follows (1).

The probability density function (pdf) corresponding to (1) is

$$f(x) = \frac{g(x)\, \mathrm{Li}_{s-1}[1 - G(x)]}{\zeta(s)\, [1 - G(x)]}, \tag{3}$$

where g(x) = dG(x)/dx. We can verify using Mathematica that $\int_0^{\infty} f(x)\, dx = 1$. The density function (3) will be most tractable when G(x) and g(x) have simple analytic expressions.

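An attractive feature of the Zeta-G model is that it includes the G distribution as a limiting special case when s → ∞. Hereafter, a random variable X having density (3) is denoted by X ~ Zeta-G(τ, s), where τ is the parameter vector associated with G. The survival function and hrf of X are given by

$$S(x) = \frac{\mathrm{Li}_s[1 - G(x)]}{\zeta(s)}$$

and

$$h(x) = \frac{g(x)\, \mathrm{Li}_{s-1}[1 - G(x)]}{[1 - G(x)]\, \mathrm{Li}_s[1 - G(x)]},$$

respectively.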
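The limiting case s → ∞ mentioned above can be checked in two lines. The following is a short worked step, written with the series form of the polylogarithm and the cdf $F(x) = 1 - \mathrm{Li}_s[1 - G(x)]/\zeta(s)$; it is a sketch of the argument, not a derivation taken from the original text.

```latex
% For 0 <= z <= 1, every term beyond the first is bounded by k^{-s}:
\[
0 \le \mathrm{Li}_s(z) - z = \sum_{k=2}^{\infty} \frac{z^{k}}{k^{s}}
  \le \sum_{k=2}^{\infty} \frac{1}{k^{s}} = \zeta(s) - 1 \to 0
  \quad (s \to \infty).
\]
% Hence Li_s(z) -> z and zeta(s) -> 1, so that
\[
F(x) = 1 - \frac{\mathrm{Li}_s[1 - G(x)]}{\zeta(s)}
  \;\longrightarrow\; 1 - [1 - G(x)] = G(x).
\]
```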

The rest of the paper is organized as follows. In Section 2, we present some new distributions in the Zeta-G class. Some of its numerical and computational aspects are addressed in Section 3. We obtain a useful linear representation for its density, explicit expressions for the ordinary and incomplete moments, moment generating function (mgf) and mean deviations in Sections 4 to 7. The estimation of the model parameters using the method of maximum likelihood is presented in Section 8. An application to a real data set is performed in Section 9. Finally, some conclusions are offered in Section 10.

 

 

2 SPECIAL ZETA-G DISTRIBUTIONS

The Zeta-G class of density functions (3) allows for greater flexibility of its tails and can be widely applied in many areas of engineering and biology. This class extends several widely-known distributions in the literature. Next, we present four special cases.

2.1 Zeta-Weibull (ZW) distribution

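If G(x) is the Weibull cdf with scale parameter β > 0 and shape parameter α > 0, say $G(x) = 1 - \exp(-\beta x^{\alpha})$, the pdf (for x > 0) and cdf of the ZW distribution are, respectively,

$$f(x) = \frac{\alpha \beta\, x^{\alpha-1}\, \mathrm{Li}_{s-1}\!\left(e^{-\beta x^{\alpha}}\right)}{\zeta(s)} \quad \text{and} \quad F(x) = 1 - \frac{\mathrm{Li}_s\!\left(e^{-\beta x^{\alpha}}\right)}{\zeta(s)}.$$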

Figure 1 displays some possible shapes of the ZW density function.

2.2 Zeta-Fréchet (ZFr) distribution

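Consider the Fréchet distribution (for x, σ, λ > 0) with cdf and pdf $G(x) = e^{-(\sigma/x)^{\lambda}}$ and $g(x) = \lambda \sigma^{\lambda} x^{-\lambda-1} e^{-(\sigma/x)^{\lambda}}$, respectively. The pdf and cdf of the ZFr distribution, for x > 0, are

$$f(x) = \frac{\lambda \sigma^{\lambda} x^{-\lambda-1} e^{-(\sigma/x)^{\lambda}}\, \mathrm{Li}_{s-1}\!\left[1 - e^{-(\sigma/x)^{\lambda}}\right]}{\zeta(s) \left[1 - e^{-(\sigma/x)^{\lambda}}\right]} \tag{4}$$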

Figure 1 - The ZW density functions for: (a) s = 3 and β = 1; (b) s = 5 and α = 0.7

 

and

$$F(x) = 1 - \frac{\mathrm{Li}_s\!\left[1 - e^{-(\sigma/x)^{\lambda}}\right]}{\zeta(s)},$$

respectively, where σ > 0 is a scale parameter and λ > 0 is a shape parameter. Plots of (4) for selected parameter values are displayed in Figure 2.

 

Figure 2 - The ZFr density function for: (a) s = 2 and σ = 0.5; (b) s = 2 and λ = 1.5

2.3 Zeta-Burr XII (ZBXII) distribution

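Let G(x) be the Burr XII distribution with cdf $G(x) = 1 - (1 + x^{c})^{-k}$ and pdf $g(x) = c k x^{c-1}/(1 + x^{c})^{k+1}$, where c > 0 and k > 0 are shape parameters. The pdf and cdf of the ZBXII distribution, for x > 0, are

$$f(x) = \frac{c k\, x^{c-1}\, \mathrm{Li}_{s-1}\!\left[(1 + x^{c})^{-k}\right]}{\zeta(s)\, (1 + x^{c})} \tag{5}$$

and

$$F(x) = 1 - \frac{\mathrm{Li}_s\!\left[(1 + x^{c})^{-k}\right]}{\zeta(s)},$$

respectively. Plots of (5) for some parameter values are displayed in Figure 3.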

 

Figure 3 - The ZBXII density function for some parameter values: (a) s = 3 and k = 0.5; (b) s = 3 and c = 2

2.4 Zeta-Lomax (ZLo) distribution

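The pdf and cdf of the Lomax distribution are (for x ≥ 0 and α, λ > 0) $g(x) = (\alpha/\lambda)(1 + x/\lambda)^{-\alpha-1}$ and $G(x) = 1 - (1 + x/\lambda)^{-\alpha}$, respectively. The pdf and cdf of the ZLo distribution, for x ≥ 0, are

$$f(x) = \frac{\alpha\, \mathrm{Li}_{s-1}\!\left[(1 + x/\lambda)^{-\alpha}\right]}{\zeta(s)\, (\lambda + x)}$$

and

$$F(x) = 1 - \frac{\mathrm{Li}_s\!\left[(1 + x/\lambda)^{-\alpha}\right]}{\zeta(s)},$$

respectively. Some plots of the ZLo density function are displayed in Figure 4.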

 

Figure 4 - The ZLo density function for: (a) s = 3 and α = 1.5; (b) s = 3 and λ = 1

 

 

3 NUMERICAL AND COMPUTATIONAL ASPECTS

The use and acceptance of a family of distributions is closely related to the ease of use and implementation of its particular models. Many of these families have closed-form analytical expressions but can still require infinite sums and numerical approximations, which are tedious to implement in a computationally efficient form. We facilitate the implementation of this family using some simple functions.

The functions are implemented in C++ and can easily be invoked from R. The R programmer does not need to spend time understanding the C++ code, nor even configuring R to compile it. For configuration, the R programmer only needs to run the function config_cpp(dir), defined in Code 2 (Appendix B). This function is responsible for installing the dependencies needed to compile the C++ code and link it with R; its argument dir is the directory in which the user saved the C++ code.

Since the Zeta-G family involves the polylogarithm and Riemann Zeta functions, $\mathrm{Li}_s(x)$ and ζ(s), we must obtain numerical approximations by summing a large number of terms, which can be computationally intensive depending on the problem in which these functions are applied. This justifies writing these functions in a computationally efficient language such as C++. Connecting the C++ code with R takes the three simple steps described below. A diagram after the steps helps in the explanation.

1. Create, in some directory, the file fast.cpp and save in it the C++ Code 1 (Appendix A). This is the file with the C++ code that will be compiled in R;

2. Run, in R, the code that defines the function config_cpp (see Code 2, Appendix B);

3. After the previous step, run config_cpp(dir = "path"), where "path" is the path of the directory in which the user saved the file fast.cpp in step 1.

Code 1 refers to the C++ code that should be saved in a file named fast.cpp. The user may save this file in any directory. The code makes use of the Rcpp library, which provides a clean API for writing high-performance R code using C++. It presents the derivationcpp(), polylogcpp() and riemann_zetacpp() functions, which are used to obtain numerical derivatives and approximations of the polylogarithm and Riemann Zeta functions, respectively.
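How many terms the truncated series needs depends strongly on s. The sketch below, in plain C++ without Rcpp, pairs a partial sum of the zeta series with the standard integral-test bound on the neglected tail. The function names zeta_partial and zeta_tail_bound are hypothetical and are not part of Code 1.

```cpp
#include <cmath>

// Partial sum of the Riemann zeta series: sum_{k=1}^{n} k^{-s}, for s > 1.
// Terms are added from the smallest to the largest to limit round-off error.
double zeta_partial(double s, long n) {
    double sum = 0.0;
    for (long k = n; k >= 1; --k) {
        sum += std::pow(static_cast<double>(k), -s);
    }
    return sum;
}

// Integral-test bound on the neglected tail:
// sum_{k>n} k^{-s} <= integral_n^inf t^{-s} dt = n^{1-s} / (s - 1).
double zeta_tail_bound(double s, long n) {
    return std::pow(static_cast<double>(n), 1.0 - s) / (s - 1.0);
}
```

With n = 10,000 terms, the bound is 1e-4 for s = 2 but about 0.02 for s = 1.5, so values of s close to 1 call for many more terms or for a tail correction.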

Appendix B lists the R code for the config_cpp() function, which is responsible for compiling the C++ code and making the functions implemented in fast.cpp available in R. The user should pass to config_cpp() the path of the directory containing the file fast.cpp. If no argument is passed, the function assumes that fast.cpp is in the current working directory of R. To find out what this path is, open an R session and run getwd(). That is, if the dir argument of config_cpp() is not given, the fast.cpp file should be saved in the path returned by getwd().

After the previous steps, the user can finally make use of the functions cdf_zeta() and pdf_zeta() implemented in R. These functions call the compiled C++ functions. Given any baseline cdf G, the function cdf_zeta(G) provides the Zeta-G cdf. Consider the example below:

Example (Implementing the ZW cdf): At the end, the ZW cdf (α = 0.5, β = 0.5, s = 2) is evaluated at x = 0 and x = 10000, returning 0 and 1, respectively.

 

# Zeta-G class of distributions.
cdf_zeta <- function(G) {
  # Using the concept of closures: the returned function remembers G.
  function(par, x) {
    s <- tail(par, n = 1)  # s is always the last element of par
    stopifnot(s > 1)

    zeta_s <- riemann_zetacpp(s)
    w <- polylogcpp(z = 1 - G(par = par[-length(par)], x), s = s)
    return((zeta_s - w) / zeta_s)
  }
}

# Weibull distribution (shape alpha, scale beta).
cdf_weibull <- function(par, x) {
  alpha <- par[1]
  beta <- par[2]
  1 - exp(-(x / beta)^alpha)
}

# Zeta-Weibull distribution.
cdf_zetag_weibull <- cdf_zeta(G = cdf_weibull)
cdf_zetag_weibull(par = c(0.5, 0.5, 2), x = 0)
#> 0
cdf_zetag_weibull(par = c(0.5, 0.5, 2), x = 1e4)
#> 1

 

It is important to note that, when using the cdf_zeta() function, the user only needs to implement the baseline cdf G. Just as cdf_weibull() was implemented above, other G cdfs can be implemented to generate new Zeta-G distributions. It is also important to note that the parameter s will always be the last element of the vector par of the function returned by cdf_zeta(). In this example, in the call cdf_zetag_weibull(par = c(0.5, 0.5, 2), x = 1e4), we set s = 2. Building on the example above, we can easily obtain the ZW density, as can be seen in the next example.

Example (Obtaining the ZW density by means of the pdf_zeta() function): Note that we take as an argument to pdf_zeta() the ZW cdf obtained in the previous example. At the end of the example, note that the integral of the ZW density (α = 0.5, β = 0.5, s = 2) is one.

 

pdf_zeta <- function(cdf_zeta) {
  # Using the concept of closures: the returned function remembers cdf_zeta.
  function(par, x) {
    # Numerical derivative of the Zeta-G cdf with respect to x.
    derivationcpp(cdf_zeta, par = par, x)
  }
}

pdf_zetag_weibull <- pdf_zeta(cdf_zeta = cdf_zetag_weibull)
integrate(f = pdf_zetag_weibull, par = c(0.5, 0.5, 2), lower = 0,
          upper = Inf)
#> 1

 

The cdf_zeta() and pdf_zeta() functions are useful because they create new functions (the cdf and pdf of the Zeta-G model) in the layout accepted by the AdequacyModel package, version 2.0.0, developed by Diniz Marinho et al. (2016). This package is widely used in the area of distributions to obtain goodness-of-fit statistics, being one of the most cited packages in the literature for this purpose. Details regarding the AdequacyModel package can be found at https://cran.r-project.org/package=AdequacyModel. In this way, we automatically obtain the cdfs and pdfs of Zeta-G distributions in the forms accepted by the AdequacyModel package, which facilitates the implementation of distributions in the proposed class.

The cdf_zeta() and pdf_zeta() functions make use of what are known in computing as lambda functions, that is, anonymous functions that can be applied for various purposes. These are used in the programming concept known as "closures". A closure is any function that encloses the environment in which it was defined, being able to access variables that are not in its parameter list. Introducing these concepts into cdf_zeta() and pdf_zeta() is what allows them to construct new functions from G and from the Zeta-G cdf, respectively, in the form accepted by the AdequacyModel package. It is a metaprogramming technique that can be exploited very well in this class of problems.
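The closure idea is not specific to R. The following C++ fragment is a minimal illustration of the same pattern and not the paper's implementation: the names polylog, Cdf and make_zeta_cdf, and the truncation at 10,000 terms, are assumptions made for this example.

```cpp
#include <cmath>
#include <functional>

// Truncated polylogarithm Li_s(z) = sum_{k=1}^{n} z^k / k^s, for 0 <= z <= 1.
double polylog(double z, double s, int n = 10000) {
    double sum = 0.0, zk = 1.0;
    for (int k = 1; k <= n; ++k) {
        zk *= z;                      // z^k, updated incrementally
        sum += zk / std::pow(k, s);
    }
    return sum;
}

using Cdf = std::function<double(double)>;

// Returns the Zeta-G cdf F(x) = 1 - Li_s(1 - G(x)) / zeta(s) as a closure.
// The lambda captures G, s and the precomputed zeta(s) from this scope.
Cdf make_zeta_cdf(Cdf G, double s) {
    const double zeta_s = polylog(1.0, s);   // Li_s(1) = zeta(s) for s > 1
    return [G, s, zeta_s](double x) {
        return 1.0 - polylog(1.0 - G(x), s) / zeta_s;
    };
}
```

For instance, passing the exponential baseline G(x) = 1 − exp(−x) as a lambda, make_zeta_cdf(G, 2.0) returns a function F with F(0) = 0 and F(x) ≥ G(x), as expected for the minimum of Z ≥ 1 baseline variables.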

 

 

4 USEFUL REPRESENTATIONS

Some useful expansions for (1) and (3) can be derived using the concept of exponentiated distributions. For an arbitrary baseline cdf G(x), a random variable is said to have the exponentiated-G ("exp-G") distribution with parameter r > 0, say $Y_r \sim$ exp-G(r), if its pdf and cdf are

$$h_r(x) = r\, g(x)\, G(x)^{r-1} \quad \text{and} \quad H_r(x) = G(x)^{r},$$

respectively. The properties of exponentiated distributions have been studied by several authors in recent years. See Nadarajah et al. (2013) for exponentiated Weibull, Kundu and Gupta (1999) for exponentiated exponential, Nadarajah and Kotz (2006) for exponentiated Fréchet and Nadarajah and Gupta (2007) for exponentiated gamma distributions.

Using expansion (2), we can write (3) as

$$f(x) = \frac{g(x)}{\zeta(s)} \sum_{k=1}^{\infty} \frac{[1 - G(x)]^{k-1}}{k^{s-1}}.$$

Expanding the binomial term in this equation, we have

$$f(x) = \sum_{r=1}^{\infty} w_r\, h_r(x), \tag{6}$$

where $h_r(x)$ denotes the exp-G(r) density function and (for r = 1, 2, ...)

 

 

 

We can prove, for example, using Mathematica that $\sum_{r=1}^{\infty} w_r = 1$. Equation (6) reveals that the Zeta-G density function is a linear combination of exp-G densities. So, several mathematical properties of the Zeta-G class can be obtained from those of the exp-G distribution; see, for example, Nadarajah et al. (2013) and Nadarajah and Kotz (2006), among others.

By integrating (6), we can express F(x) as

$$F(x) = \sum_{r=1}^{\infty} w_r\, H_r(x),$$

where $H_r(x)$ denotes the exp-G(r) cdf.

 

 

5 MOMENTS

A first formula for the nth moment of X, say $\mu'_n = E(X^{n})$, can be obtained from (6) (let $Y_r \sim$ exp-G(r) for r ≥ 1) as

$$\mu'_n = \sum_{r=1}^{\infty} w_r\, E(Y_r^{n}). \tag{7}$$

 

Expressions for moments of some exponentiated distributions are given by Nadarajah and Kotz (2006), which can be used to obtain $E(X^{n})$. We now provide an application of (7) by taking the baseline Weibull introduced in Section 2.1. The exp-Weibull density with power parameter r is $h_r(x) = r \alpha \beta\, x^{\alpha-1} e^{-\beta x^{\alpha}} (1 - e^{-\beta x^{\alpha}})^{r-1}$, which gives the nth moment of the ZW distribution

 

 

 

Plots of the skewness and kurtosis of the ZW distribution for some choices of α and β as functions of s are displayed in Figure 5.

A second formula for $\mu'_n$ can be derived from (7) in terms of the baseline quantile function $Q_G(u) = G^{-1}(u)$

 

(8)

 

where

(9)

 

The ordinary moments of several Zeta-G distributions can be determined directly from equations (8) and (9).
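The change of variable behind a quantile-based moment formula can be sketched as follows. This is a worked step that uses the exp-G(r) density $h_r(x) = r\, g(x)\, G(x)^{r-1}$ from Section 4; it is not a reproduction of equations (8) and (9), and the exponential baseline with rate λ is an illustrative assumption.

```latex
% Substituting u = G(x), so that x = Q_G(u) and du = g(x) dx:
\[
E(Y_r^{n}) = r \int_0^{\infty} x^{n}\, g(x)\, G(x)^{r-1}\, dx
           = r \int_0^{1} Q_G(u)^{n}\, u^{r-1}\, du .
\]
% Illustration (assumed baseline): G(x) = 1 - e^{-\lambda x} gives
\[
Q_G(u) = -\frac{\log(1-u)}{\lambda}, \qquad
E(Y_r^{n}) = \frac{r}{\lambda^{n}} \int_0^{1} [-\log(1-u)]^{n}\, u^{r-1}\, du .
\]
```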

For empirical purposes, the shapes of many distributions can be described by the incomplete moments. These moments play an important role in measuring inequality, for example, through mean deviations and Lorenz and Bonferroni curves, which depend upon the incomplete moments of a distribution. The nth incomplete moment of X can be determined from (6) as

$$m_n(y) = \int_0^{y} x^{n} f(x)\, dx = \sum_{r=1}^{\infty} w_r \int_0^{y} x^{n}\, h_r(x)\, dx. \tag{10}$$

The last integral can be computed for most baseline G distributions, at least numerically.

The symbolic computation software packages Maple, Mathematica and Matlab can automate all previous formulae, since they can currently deal with analytic recurrence equations and sums of formidable size and complexity. In practical terms, we can replace ∞ in the sums by a large number such as 20 for most practical applications. Equations (7)–(10) are the main results of this section.

 

Figure 5 - Skewness and kurtosis measures of the ZW distribution for some parameter values

 

 

6 GENERATING FUNCTION

The mgf $M(t) = E(e^{tX})$ of X follows from (6) as

$$M(t) = \sum_{r=1}^{\infty} w_r\, M_r(t),$$

where $M_r(t)$ is the mgf of $Y_r$. Hence, M(t) can be immediately determined from the generating function of the exp-G distribution.

Another formula for M(t) follows from (6) as

 

(11)

 

where γr(t) can be expressed in terms of QG(u) as

 

(12)

 

We can obtain the mgfs of several Zeta-G distributions directly from equations (11) and (12). For example, the mgfs of the Zeta-Exponential (with parameter λ and for $t < \lambda^{-1}$) and Zeta-Standard Logistic (for t < 1) distributions are

 

 

 

respectively.

 

 

7 MEAN DEVIATIONS

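The mean deviations about the mean ($\delta_1(X) = E(|X - \mu'_1|)$) and about the median ($\delta_2(X) = E(|X - M|)$) of X can be expressed as

$$\delta_1(X) = 2 \mu'_1 F(\mu'_1) - 2 m_1(\mu'_1) \quad \text{and} \quad \delta_2(X) = \mu'_1 - 2 m_1(M), \tag{13}$$

respectively, where $\mu'_1 = E(X)$, M = Median(X) is the median, $F(\mu'_1)$ is easily calculated from (1) and $m_1(z)$ is the first incomplete moment given by (10) with n = 1.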

We have two alternative ways to compute δ1(X) and δ2(X). A general equation for m1(z) can be derived from (6) as

 

(13)

 

where

(14)

 

Equation (14) is the basic quantity to compute the mean deviations of the exp-G distributions. Hence, the mean deviations in (13) depend only on the mean deviations of the exp-G distribution. So, alternative representations for δ1(X) and δ2(X) are

 

 

 

In a similar manner, the mean deviations of the Zeta-G class can be determined from equation (10) with n = 1.

 

 

8 ESTIMATION

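We calculate the maximum likelihood estimates (MLEs) of the parameters of the Zeta-G class from complete samples only. Let $x_1, \ldots, x_n$ be an observed sample of size n from the Zeta-G(s, τ) distribution, where τ is a p × 1 vector of unknown parameters in the baseline distribution G(x; τ). The log-likelihood function for the vector of parameters $\theta = (s, \tau^{T})^{T}$ is

$$\ell(\theta) = -n \log \zeta(s) + \sum_{i=1}^{n} \log g(x_i; \tau) + \sum_{i=1}^{n} \log \mathrm{Li}_{s-1}[1 - G(x_i; \tau)] - \sum_{i=1}^{n} \log[1 - G(x_i; \tau)]. \tag{15}$$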

The log-likelihood can be maximized by using well-established routines such as nlm or optim in the R statistical software, or by solving the nonlinear likelihood equations obtained by differentiating (15). The components of the score vector U(θ) are

 

 

 

 

for j = 1, ... p, and ζ (1,s) = ζ (s)’.
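Before handing the log-likelihood to an optimizer, it helps to evaluate it directly and check it against the cdf. The sketch below does this for the ZW model under the parametrization of Section 2.1, $G(x) = 1 - \exp(-\beta x^{\alpha})$. The function names and the truncation of the polylogarithm at 10,000 terms are assumptions of this example, not part of the paper's code.

```cpp
#include <cmath>
#include <vector>

// Truncated polylogarithm Li_s(z) = sum_{k=1}^{n} z^k / k^s, for 0 <= z <= 1.
double polylog(double z, double s, int n = 10000) {
    double sum = 0.0, zk = 1.0;
    for (int k = 1; k <= n; ++k) {
        zk *= z;
        sum += zk / std::pow(k, s);
    }
    return sum;
}

// Zeta-Weibull cdf: F(x) = 1 - Li_s(exp(-beta x^alpha)) / zeta(s).
double zw_cdf(double x, double alpha, double beta, double s) {
    const double surv = std::exp(-beta * std::pow(x, alpha));  // 1 - G(x)
    return 1.0 - polylog(surv, s) / polylog(1.0, s);
}

// Zeta-Weibull log-density. For the Weibull baseline,
// log g(x) - log[1 - G(x)] = log(alpha * beta) + (alpha - 1) log(x).
double zw_logpdf(double x, double alpha, double beta, double s) {
    const double surv = std::exp(-beta * std::pow(x, alpha));
    return std::log(alpha * beta) + (alpha - 1.0) * std::log(x)
         + std::log(polylog(surv, s - 1.0)) - std::log(polylog(1.0, s));
}

// Log-likelihood of a complete sample: the sum of the log-densities.
double zw_loglik(const std::vector<double>& x,
                 double alpha, double beta, double s) {
    double ll = 0.0;
    for (double xi : x) ll += zw_logpdf(xi, alpha, beta, s);
    return ll;
}
```

A quick sanity check is to compare exp(zw_logpdf(x, ...)) with a central finite difference of zw_cdf at the same point; the two should agree to several digits. For observations very close to zero the argument of the polylogarithm approaches 1 and the truncated series converges slowly, so more terms may be needed there.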

 

For interval estimation and hypothesis tests on the model parameters, we require the (p+1)×(p+1) observed information matrix J = J(θ) given in Appendix C. Under standard regularity conditions, the asymptotic distribution of $(\hat{\theta} - \theta)$ is $N_{p+1}(0, I(\theta)^{-1})$, where I(θ) is the Fisher information matrix. In practice, we can replace I(θ) by the observed information matrix evaluated at $\hat{\theta}$, say $J(\hat{\theta})$, to construct approximate confidence intervals for the parameters based on the multivariate normal $N_{p+1}(0, J(\hat{\theta})^{-1})$ distribution for $\hat{\theta}$.

 

 

9 APPLICATION

In this section, we apply the Zeta-G class to a real data set. We compare the ZW distribution with the Exponentiated Weibull (ExpW), Modified Weibull (MW), Kumaraswamy-inverse Weibull (Kw-IW), Exponentiated Nadarajah-Haghighi (ExpNH), Fréchet (Fr) and Chen distributions.

We use the AdequacyModel package, see Diniz Marinho et al. (2016), available on the Comprehensive R Archive Network (CRAN), currently in its stable version 2.0.0. The AdequacyModel project is maintained on GitHub at https://github.com/prdm0/AdequacyModel under the terms of the GNU General Public License, GNU-GPL (≥ 3), and suggestions for improvement can be sent to https://github.com/prdm0/AdequacyModel/issues. The package can also be installed directly from GitHub, which allows development versions to be tested and used before they are available on CRAN. To install the development version, first remove any pre-installed version of the package with the remove.packages() function; then, with the devtools package installed, run the code below:

 

devtools::install_github(repo = "prdm0/AdequacyModel", ref = "development",
                         force = TRUE, dependencies = TRUE,
                         type = "source")

 

The data refer to the failure times of the secondary pumps of a reactor installed in an RSG-GAS, see Barlow and Davis (1977). The primary pump is a single phase centrifugal type and uses mechanical seals. The primary pump design parameters are pump flow of 1570 m³/h, engine power of 160 kW and total discharge height of 27 m. The secondary pump design parameters are pump flow of 1950 m³/h, engine power of 200 kW and total discharge height of 29 m.

The data, composed of 23 observations (Suprawhardana and Prayoto, 1999), are: 2.1600, 0.7460, 0.4040, 0.9540, 0.4910, 6.5600, 4.9920, 0.3470, 0.1500, 0.3580, 0.1010, 1.3590, 3.4650, 1.0600, 0.6140, 1.9210, 4.0820, 0.1990, 0.6050, 0.2730, 0.0700, 0.0620, 5.3200. Table 1 reports some descriptive statistics that help in understanding the observations. Figure 6 displays the histogram of the data.

 

Table 1 - Descriptive statistics

Statistics    Time between failures (hours)
Mean          1.5779
Median        0.6140
Mode          0.5000
Variance      3.7275
Skewness      1.3643
Kurtosis      0.5445
Minimum       0.0620
Maximum       6.5600
n             23

 

Figure 6 - Failure time of secondary pumps of a reactor installed in an RSG-GAS

 

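Table 2 gives the MLEs and corresponding standard errors (SEs) and the values of the Cramér-von Mises (W*) and Anderson-Darling (A*) statistics (Chen and Balakrishnan, 1995). In general, the smaller the values of these statistics, the better the fit to the data. To obtain the statistics, one can proceed as follows: (i) compute $v_i = F(x_i; \hat{\theta})$ and $y_i = \Phi^{-1}(v_i)$, where the $x_i$'s are in ascending order, $\hat{\theta}$ is an estimate of θ, Φ(·) is the standard normal cumulative function and $\Phi^{-1}(\cdot)$ denotes its inverse; (ii) compute $u_i = \Phi\{(y_i - \bar{y})/s_y\}$, where $\bar{y}$ is the sample mean of the $y_i$'s and $s_y^{2}$ is the sample variance; (iii) compute $W^{2} = \sum_{i=1}^{n} \{u_i - (2i-1)/(2n)\}^{2} + 1/(12n)$ and $A^{2} = -n - n^{-1} \sum_{i=1}^{n} \{(2i-1) \log(u_i) + (2n+1-2i) \log(1-u_i)\}$, and then $W^{*} = W^{2}(1 + 0.5/n)$ and $A^{*} = A^{2}(1 + 0.75/n + 2.25/n^{2})$. According to Chen and Balakrishnan (1995), these steps provide an approximation to the statistics below: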

 

 

 

 

 

Table 2 lists the MLEs of the parameters and their SEs (in parentheses) for the compared models, obtained using the BFGS method. The goodness-of-fit statistics are also presented, i.e., A*, W*, AIC (Akaike Information Criterion) by Akaike (1981), AICc (Corrected Akaike Information Criterion) by Hurvich and Tsai (1989), BIC (Bayesian Information Criterion) by Schwarz (1978) and HQIC (Hannan-Quinn Information Criterion) by Hannan and Quinn (1979).
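The four information criteria are simple functions of the maximized log-likelihood, the number of parameters k and the sample size n. The sketch below uses their standard textbook definitions; the struct Criteria and the function information_criteria are hypothetical names, and this is not the code of the AdequacyModel package.

```cpp
#include <cmath>

struct Criteria {
    double aic, aicc, bic, hqic;
};

// loglik: maximized log-likelihood; k: number of parameters; n: sample size.
Criteria information_criteria(double loglik, int k, int n) {
    Criteria c;
    c.aic  = -2.0 * loglik + 2.0 * k;
    c.aicc = c.aic + 2.0 * k * (k + 1.0) / (n - k - 1.0);   // small-sample correction
    c.bic  = -2.0 * loglik + k * std::log(static_cast<double>(n));
    c.hqic = -2.0 * loglik + 2.0 * k * std::log(std::log(static_cast<double>(n)));
    return c;
}
```

As a consistency check, taking k = 3, n = 23 and a log-likelihood of −44.6304 (a value back-computed here from the ZW AIC in Table 2, so an assumption rather than a reported figure) gives AIC, AICc, BIC and HQIC values that agree with the ZW row of Table 2 to three decimal places.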

Using AdequacyModel, these statistics can be easily obtained through the goodness.fit() function. The figures in Table 2 indicate that the ZW distribution provides the best fit to these data, according to the A* and W* statistics. Figure 7 displays the plots of the fitted densities.
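The corrected statistics W* and A* can also be computed by hand from the fitted cdf values. The sketch below follows the usual statement of the Chen and Balakrishnan (1995) procedure; it is not the goodness.fit() implementation. The names norm_cdf, norm_quantile, GofStats and chen_balakrishnan are hypothetical, and the normal quantile is obtained by plain bisection, an assumption made to keep the example dependency-free.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Standard normal cdf via the complementary error function.
double norm_cdf(double y) { return 0.5 * std::erfc(-y / std::sqrt(2.0)); }

// Standard normal quantile by bisection on [-40, 40]; requires 0 < p < 1.
double norm_quantile(double p) {
    double lo = -40.0, hi = 40.0;
    for (int it = 0; it < 200; ++it) {
        const double mid = 0.5 * (lo + hi);
        if (norm_cdf(mid) < p) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}

struct GofStats {
    double w_star, a_star;
};

// v: fitted cdf values F(x_i; theta_hat), each strictly inside (0, 1).
GofStats chen_balakrishnan(std::vector<double> v) {
    std::sort(v.begin(), v.end());   // F is increasing, so this orders the x_i
    const int n = static_cast<int>(v.size());
    std::vector<double> y(n);
    double mean = 0.0;
    for (int i = 0; i < n; ++i) {
        y[i] = norm_quantile(v[i]);  // step (i)
        mean += y[i] / n;
    }
    double var = 0.0;
    for (double yi : y) var += (yi - mean) * (yi - mean) / (n - 1);
    const double sd = std::sqrt(var);

    double w2 = 1.0 / (12.0 * n);
    double a2 = -static_cast<double>(n);
    for (int i = 0; i < n; ++i) {
        const double u = norm_cdf((y[i] - mean) / sd);   // step (ii)
        const double c = 2.0 * (i + 1) - 1.0;            // 2i - 1, with i counted from 1
        w2 += std::pow(u - c / (2.0 * n), 2);            // step (iii)
        a2 -= (c * std::log(u) + (2.0 * n - c) * std::log(1.0 - u)) / n;
    }
    return {w2 * (1.0 + 0.5 / n), a2 * (1.0 + 0.75 / n + 2.25 / (n * n))};
}
```

In use, v would hold the fitted Zeta-G cdf evaluated at each observation; evenly spread values give small statistics, while values clustered near 0 and 1 give large ones.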

 

Table 2 - MLEs, SEs (in parentheses) and the adequacy statistics (A*, W*, AIC, AICc, BIC and HQIC)

Distributions     Estimates                        A*      W*      AIC      AICc     BIC      HQIC
ZW(α, β, s)       0.0033    0.4800    1.8012       0.1910  0.0194  95.2608  96.5240  98.6673  96.1176
                  (0.0032)  (0.2863)  (0.0001)
ExpW(α, β, a)     2.9924    0.3030    9.9812       0.2129  0.0234  69.6642  70.9273  73.0707  70.5209
                  (2.8568)  (0.2752)  (27.6582)
MW(β, γ, λ)       0.7523    0.7924    0.0090       0.4443  0.0682  71.0165  72.2796  74.4229  71.8732
                  (0.2199)  (0.1925)  (0.0849)
Kw-IW(α, β, θ)    3.1975    21.5102   0.2416       0.2170  0.0244  69.6465  70.9097  73.053   70.5032
                  (4.6991)  (99.5367) (0.3046)
ExpNH(α, λ, β)    0.3239    19.7504   2.4286       0.2027  0.0214  69.4723  70.7354  72.8788  70.3290
                  (0.1514)  (55.9603) (2.9442)
Fr(a, b)          0.3569    0.7832                 0.3732  0.0448  69.8834  70.4834  72.1544  70.4546
                  (0.1007)  (0.1234)
Chen(β, λ)        0.4569    0.4045                 0.7408  0.1235  71.6804  72.2804  73.9514  72.2515
                  (0.0628)  (0.1026)

 

Figure 7 - Failure time of secondary pumps of a reactor installed in an RSG-GAS

 

 

10 CONCLUDING REMARKS

We propose a general family of continuous distributions called the Zeta-G class. It extends several common distributions such as the Weibull, Fréchet, Burr XII and Lomax distributions. In fact, for each distribution G, we can define a new Zeta-G distribution using a simple equation. We demonstrate that some mathematical properties of the Zeta-G distribution can be readily obtained from those of the exponentiated-G distribution. The ordinary and incomplete moments, generating function and mean deviations of the Zeta-G class can be expressed explicitly in terms of the baseline quantile function. We discuss maximum likelihood estimation and inference on the parameters, and assess goodness of fit with the Cramér-von Mises and Anderson-Darling statistics. An application to real data illustrates the potential of the new class.

 

Appendix A: Code 1

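#include <Rcpp.h>
#include <cmath>

using namespace Rcpp;

// http://www.rcpp.org/

// Forward-difference approximation to the derivative of f(par, x) in x.
// [[Rcpp::export]]
NumericVector derivationcpp(Function f, NumericVector par, NumericVector x,
                            double h = 1.0e-8) {
  if (h <= 0 || h >= 1) {
    stop("h should assume values in the interval (0, 1).");
  }

  NumericVector xh = x + h;
  NumericVector a = as<NumericVector>(f(par, xh));
  NumericVector b = as<NumericVector>(f(par, x));

  return (a - b) / h;
}

// Polylogarithm Li_s(z), approximated by the first n terms of its series.
// s is a double so that non-integer values of s are not truncated.
// [[Rcpp::export]]
NumericVector polylogcpp(NumericVector z, double s = 2, long int n = 1e4) {
  NumericVector result(z.size());  // initialized with zeros

  for (long int k = 1; k <= n; ++k) {
    const double kd = static_cast<double>(k);
    result = result + Rcpp::pow(z, kd) / std::pow(kd, s);
  }

  return result;
}

// Riemann zeta function, approximated by the first n terms of its series.
// [[Rcpp::export]]
double riemann_zetacpp(double s, long int n = 1e4) {
  double result = 0.0;  // must start at zero before accumulating

  if (s <= 1) stop("s > 1 is not TRUE");

  for (long int k = 1; k <= n; ++k) {
    result += 1.0 / std::pow(static_cast<double>(k), s);
  }
  return result;
}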

Code 1: C++ code that implements the derivationcpp(), polylogcpp() and riemann_zetacpp() functions

 

Appendix B: Code 2

config_cpp <- function(dir = NULL) {
  # Move to the directory holding fast.cpp (default: current working directory).
  if (!is.null(dir)) base::setwd(dir)

  if (!("fast.cpp" %in% base::dir())) {
    stop("The file fast.cpp is not found in the directory: ",
         base::getwd())
  }

  # Checks whether the Rcpp package is installed.
  depend <- function(...) {
    return("Rcpp" %in% utils::installed.packages(...)[, "Package"])
  }

  if (depend()) {
    Rcpp::sourceCpp("fast.cpp")
    return(message("All ready. Setup completed!"))
  } else {
    message("==> The Rcpp package will need to be installed.")
    # install.packages() returns NULL on success; a warning yields NA.
    install <- function(...) {
      tryCatch(expr = utils::install.packages(...),
               warning = function(w) NA)
    }

    pkg <- install("Rcpp")

    if (!(is.null(pkg))) {
      stop("==> Check your internet connection!")
    }

    Rcpp::sourceCpp("fast.cpp")
    return(message("All ready. Setup completed!"))
  }
}

Code 2: Setup code and compilation of fast.cpp file

 

Appendix C: Information Matrix

The elements of the observed information matrix J(θ) for the model parameters (s,τ) of the Zeta-G class are given by

 

 

 

 

 

where , , , , , and .

 

 

REFERENCES

Akaike, H. (1981). This week’s citation classic. Current Contents Engineering, Technology, and Applied Sciences, 12, 42.

Alzaatreh, A., Lee, C., Famoye, F. (2013). A new method for generating families of continuous distributions. Metron, 71, 63–79, URL http://dx.doi.org/10.1007/s40300-013-0007-y.

Barlow, R. E., Davis, B. (1977). Analysis of time between failures for repairable components. Technical Report, University of California, Berkeley, Operations Research Center.

Chen, G., Balakrishnan, N. (1995). A general purpose approximate goodness-of-fit test. Journal of Quality Technology, 27(2), 154–161.

Cordeiro, G. M., Lemonte, A. J. (2013). On the Marshall-Olkin extended Weibull distribution. Statistical Papers, 54, 333–353, URL http://dx.doi.org/10.1007/s00362-012-0431-8.

Diniz Marinho, P. R., Bourguignon, M., Barros Dias, C. R. (2016). AdequacyModel: Adequacy of Probabilistic Models and General Purpose Optimization. URL https://CRAN.R-project.org/package=AdequacyModel, R package version 2.0.0.

Doray, L. G., Luong, A. (1995). Quadratic distance estimators for the Zeta family. Insurance: Mathematics and Economics, 16, 255–260.

Doray, L. G., Luong, A. (1997). Efficient estimators for the good family. Communications in Statistics-Simulation and Computation, 26, 1075–1088.

Gupta, P. L., Gupta, R. C., Ong, S. H., Srivastava, H. M. (2008). A class of Hurwitz–Lerch Zeta distributions and their applications in reliability. Applied Mathematics and Computation, 196, 521–531.

Gut, A. (2005). Probability: A Graduate Course. Springer Verlag, New York.

Gut, A. (2006). Some remarks on the Riemann Zeta distribution. Revue Roumaine de Mathematiques Pures et Appliquees, 51, 205–217.

Hannan, E. J., Quinn, B. G. (1979). The determination of the order of an autoregression. Journal of the Royal Statistical Society: Series B (Methodological), 41(2), 190–195.

Hurvich, C. M., Tsai, C. L. (1989). Regression and time series model selection in small samples. Biometrika, 76(2), 297–307.

Oldham, K., Myland, J., Spanier, J. (2009). An Atlas of Functions. Springer.

Kundu, D., Gupta, R. D. (1999). Generalized exponential distribution. The Australian and New Zealand Journal of Statistics, 41, 173–188.

Lin, G. D., Hu, C. Y. (2001). The Riemann Zeta distribution. Bernoulli, 7, 817–828.

Nadarajah, S., Gupta, A. K. (2007). The exponentiated gamma distribution with application to drought data. Bulletin of the Calcutta Statistical Association, 59, 29–54.

Nadarajah, S., Kotz, S. (2006). The exponentiated type distributions. Acta Applicandae Mathematica, 92, 97–111, URL http://dx.doi.org/10.1007/s10440-006-9055-0.

Nadarajah, S., Cordeiro, G. M., Ortega, E. M. M. (2013). The exponentiated Weibull distribution: a survey. Statistical Papers, 54, 839–877, URL http://dx.doi.org/10.1007/s00362-012-0466-x.

Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.

Suprawhardana, M. S., Prayoto, S. (1999). Total time on test plot analysis for mechanical components of the RSG-GAS reactor. Atom Indonesia, 25(2), 155–161.

Vilaplana, J. P. (1988). The Hurwitz distribution. in: M. L. Puri, J. P. Vilaplana, W. Wertz (Eds.), New Perspectives in Theoretical and Applied Statistics (Bilbao, 1986), Wiley Series in Probability and Mathematical Statistics, John Wiley and Sons, New York.

Zörnig, P., Altmann, G. (1995). Unified representation of Zipf distributions. Computational Statistics & Data Analysis, 19, 461–473.