Finite sample estimation of non-linear equations

The FS1 Estimator

Gilles Bélanger

January 24, 2006

The FS4 and FS5 estimators were the fourth and fifth attempts at creating finite sample estimators. FS2 and FS3 are all but abandoned. FS1 can only be used on a limited range of models and does not minimize variance. It is presented here as a curiosity and to show that many finite sample estimators can be constructed.

The FS1 estimator is not as useful as FS4 and FS5. Unlike FS4 and FS5, it needs the error term to have a linear relationship with the dependent variable. Furthermore, and this is what makes it useless in practice, the error term has to be a fully specified normal. In FS4 and FS5, the error term can be non-normal and have unknown parameters.

The estimator presented below consists of a weighted sum of biased estimators. The respective biases of these estimators are controlled in a way that makes them easy to track. The intuition is to take a biased estimator and modify it to create a bigger bias, but in a systematic way. Even if we can't eliminate the bias of an estimator, we know how to make it worse.

For example, suppose we have two biased estimators, and the second estimator has exactly twice the bias of the first one. Then two times the first estimator minus one times the second is an unbiased estimator. Understanding this makes the rest of this section easy to follow.
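
As a worked version of this example, write $\theta_0$ for the true value and $b$ for the bias of the first estimator. The symbols $\tilde\theta_a$, $\tilde\theta_b$ and $b$ are introduced here for illustration only.

\begin{displaymath}
I\!\!E[2\tilde\theta_a-\tilde\theta_b] = 2(\theta_0+b) - (\theta_0+2b) = \theta_0
\end{displaymath}

The weights 2 and $-1$ sum to one, which preserves $\theta_0$, while the weighted biases $2b$ and $-2b$ cancel.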

We note here that this estimator is not strictly a transformation of the maximum likelihood estimator. We do not have a minimized variance. A second drawback is that the error term has to be totally specified. In the case of a Gaussian, variance must be known.

Notation

Consider the model

\begin{displaymath}
y_t = g_t(X_t,\theta) + \varepsilon_t~,~~t=1,... ,T
\end{displaymath} (1)

where $g_t$ is many times (enough for our purposes, see below) continuously differentiable. The subscript $t$ on $g$ is for generality. As usual, $y_t$ and $\varepsilon_t$ are scalars, while $X_t$ and $\theta$ are vectors, not necessarily of the same dimension, with $\theta$ identified either by the form of $g$ or by a restriction of the parameter space $\Theta$. $\varepsilon$ follows a distribution with the property that the sum of two such deviates follows a distribution of the same family. For simplicity, we will use a homoskedastic normal.

We can concatenate all $t=1, ..., T$ using the notation

\begin{displaymath}
y = g(X,\theta) + \varepsilon
\end{displaymath} (2)

Also we write the maximum likelihood estimator as

\begin{displaymath}
\hat\theta_{\hbox{\tiny ML}} = \arg\max_\theta {\cal L}(y,X,\theta)
\equiv {\cal M}(y,X)
\end{displaymath}

where ${\cal L}$ is the likelihood and ${\cal M}$ is the maximum likelihood estimator.

Estimator

This is the simplest case; it shows how the estimator can be built and then extended to include heteroskedasticity or non-normality. Consider

\begin{displaymath}
{\cal L}(y,X,\theta) = {\cal L}(g(X,\theta_0) + \varepsilon,X,\theta)
\end{displaymath}

If $\varepsilon_t=0, ~\forall t=1, ..., T$, we have $\hat\theta_{\hbox{\tiny ML}}={\cal M}(g(X,\theta_0),X)=\theta_0$, which will be the central point of the Taylor expansion below.

To construct the unbiased estimator, we start by defining a group of biased estimators $\{\tilde\theta_j\}_{j=1}^J$ where

\begin{displaymath}
\tilde\theta_j = {\cal M}\left(y+(j-1)\eta,X\right)
\end{displaymath} (3)

where $\eta$ is a vector of $iid~{\cal N}(0,\sigma^2)$ deviates, $\tilde\theta_1$ is $\hat\theta_{\hbox{\tiny ML}}$ and $j\in I\!\!N$. We make heavy use here of the additive properties of the Gaussian distribution.

These estimators will form an unbiased estimator of the form

\begin{displaymath}
\hat\theta_{\hbox{\tiny FS1}} = \sum_{j=1}^J \alpha_j\tilde\theta_j
\end{displaymath} (4)

where $\alpha_j$ is independent of $y$ (it will in fact depend on $k$ and the wanted, user specified, level of precision).
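
The construction in (3) and (4) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `fs1_estimate`, the callable `M` standing for ${\cal M}$, and the use of a scalar $\theta$ are choices made here, not part of the original text. The same draw of $\eta$ is reused for every $j$, which is one reading of (3).

```python
import random

def fs1_estimate(M, y, X, sigma, alphas, rng=None):
    """Weighted sum of perturbed estimators, following (3) and (4).

    M      : callable (y, X) -> scalar estimate (assumed to play the role of M)
    y      : list of observations
    X      : regressors, passed through to M unchanged
    sigma  : known standard deviation of the error term
    alphas : weights alpha_1, ..., alpha_J
    """
    rng = rng or random.Random(0)
    # One vector of N(0, sigma^2) deviates, shared by all j (assumption).
    eta = [rng.gauss(0.0, sigma) for _ in y]
    total = 0.0
    for j, alpha in enumerate(alphas, start=1):
        # theta_j = M(y + (j - 1) * eta, X); j = 1 is the unperturbed estimator.
        y_j = [yt + (j - 1) * e for yt, e in zip(y, eta)]
        total += alpha * M(y_j, X)
    return total
```

With `sigma` set to zero every perturbed estimator coincides with `M(y, X)`, so any weights that sum to one return the unperturbed estimate, which is a convenient sanity check.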

If we do a Taylor approximation of a coordinate $k$ of vector $\tilde\theta_j$ around $\varepsilon=0$ and $\eta=0$, we have

\begin{displaymath}
\tilde\theta_j^{(k)} = \theta_0^{(k)} + \sum_{i=1}^\infty
\frac{\partial^i{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}{\partial g_{(k)}(X,\theta_0)^i}
\frac{\left((j-1)\eta+\varepsilon\right)^i}{i!}
\end{displaymath} (5)

Forgive the abuse of notation. The index $(k)$ on vectors and vector functions simply denotes the $k$th coordinate. Given that both $\varepsilon$ and $\eta$ are normal, we can write
\begin{displaymath}
I\!\!E[\tilde\theta_j^{(k)}] = \theta_0^{(k)} + \sum_{i=1}^\infty
\frac{\partial^i{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}{\partial g_{(k)}(X,\theta_0)^i}
\frac{\left(j\sigma\right)^i M^{(i)}(0)}{i!}
\end{displaymath} (6)

where $M^{(i)}(0)$ is the $i$th moment of a ${\cal N}(0,1)$, specifically $i!/(2^{i/2}(i/2)!)$ when $i$ is even and 0 when $i$ is odd. This produces
\begin{displaymath}
I\!\!E[\tilde\theta_j^{(k)}] = \theta_0^{(k)} + \sum_{i=1}^\infty
\frac{\partial^{2i}{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}{\partial g_{(k)}(X,\theta_0)^{2i}}
\frac{\left(j\sigma\right)^{2i}}{2^ii!}
\end{displaymath} (7)
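
The step from the moments of a ${\cal N}(0,1)$ to the coefficient $(j\sigma)^{2i}/(2^ii!)$ can be checked numerically. The short Python sketch below is illustrative only; the function names `normal_moment` and `bias_coefficient` are hypothetical names chosen here.

```python
from math import factorial

def normal_moment(i):
    """i-th moment of a N(0,1): i!/(2^(i/2) (i/2)!) if i is even, 0 if i is odd."""
    if i % 2:
        return 0
    h = i // 2
    # The quotient is an integer (it equals (i-1)!!), so // is exact here.
    return factorial(i) // (2 ** h * factorial(h))

def bias_coefficient(j, sigma, i):
    """Coefficient (j*sigma)^(2i) / (2^i i!) multiplying the 2i-th derivative in (7)."""
    return (j * sigma) ** (2 * i) / (2 ** i * factorial(i))

# The even-order term (j*sigma)^(2i) * M^(2i)(0) / (2i)! reduces to the same coefficient.
j, sigma, i = 2, 1.0, 2
lhs = (j * sigma) ** (2 * i) * normal_moment(2 * i) / factorial(2 * i)
print(lhs, bias_coefficient(j, sigma, i))
```

The odd moments vanish, which is why only even orders survive in (7).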

To find the right values of the $\alpha_j$, we need to find the order of approximation that is relevant to the level of precision we want. In plain English, if we want to publish the results with three digits after the decimal point, we end the approximation at the point where further terms have no effect on the published results. We fully control the precision here. For our three-digit example, we want $\ell$ such that

\begin{displaymath}
\sum_{j=1}^J \alpha_j \sum_{i=\ell+1}^\infty
\frac{\partial^{2i}{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}{\partial g_{(k)}(X,\theta_0)^{2i}}
\frac{\left(j\sigma\right)^{2i}}{2^ii!}
=o(\vert\xi\vert)<0.0005
\end{displaymath} (8)

Then our estimator becomes

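\begin{displaymath}
I\!\!E[\hat\theta_{\hbox{\tiny FS1}(k)}]
= \sum_{j=1}^J \alpha_j \sum_{i=1}^{\ell}
\frac{\partial^{2i}{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}{\partial g_{(k)}(X,\theta_0)^{2i}}
\frac{\left(j\sigma\right)^{2i}}{2^ii!}
+\sum_{j=1}^{J}\alpha_j\theta_{0(k)}
\end{displaymath} (9)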

Suppose for example that the expansion is truncated at order 2 in $\sigma$, so that only the $i=1$ term is retained and $J=2$. We would have

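\begin{displaymath}\begin{array}{l}
I\!\!E[\hat\theta_{\hbox{\tiny FS1}(k)}] = (\alpha_1+\alpha_2)\theta_{0(k)} \\
~~~ +\alpha_1\frac{\partial^2{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}
{\partial g_{(k)}(X,\theta_0)^2}\frac{\sigma^2}{2}
+\alpha_2\frac{\partial^2{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}
{\partial g_{(k)}(X,\theta_0)^2}2\sigma^2
\end{array}\end{displaymath} (10)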

This equation lets us determine the conditions on the $\alpha_j$.

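\begin{displaymath}\begin{array}{l}
(\alpha_1+\alpha_2)\theta_{0(k)} = \theta_{0(k)} \\
\alpha_1\frac{\partial^2{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}
{\partial g_{(k)}(X,\theta_0)^2}\frac{\sigma^2}{2}
+\alpha_2\frac{\partial^2{\cal M}_{(k)}\left(g(X,\theta_0),X\right)}
{\partial g_{(k)}(X,\theta_0)^2}2\sigma^2 = 0 \\
\end{array} \end{displaymath} (11)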

which simplifies to
\begin{displaymath}\begin{array}{l}
(\alpha_1+\alpha_2) = 1 \\
\alpha_1\frac{1}{2}
+\alpha_22 = 0 \\
\end{array} \end{displaymath} (12)

It is a linear system that is easy to solve; it yields $\alpha_1=4/3$ and $\alpha_2=-1/3$. The conditions are always linear in the $\alpha_j$, so the system always has a single solution. We simply take the first column of the inverse of the matrix made from the conditions. This matrix is such that every element $(i,j)$ is equal to $j^{2(i-1)}/(2^{i-1}(i-1)!)$.


\begin{displaymath}
{ 1 ~~~ 1 \brack 1/2 ~~ 2 }^{-1} = { 4/3 ~~~ -2/3 \brack -1/3 ~~~ 2/3 }
\end{displaymath}
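
For larger $J$ the same weights can be obtained mechanically. The following Python sketch builds the matrix whose element $(i,j)$ is $j^{2(i-1)}/(2^{i-1}(i-1)!)$ and solves for the first column of its inverse using exact rational arithmetic. The function name `fs1_weights` is a hypothetical name chosen here, and the Gauss-Jordan solver is just one convenient way to do the inversion.

```python
from fractions import Fraction
from math import factorial

def fs1_weights(J):
    """Return alpha_1..alpha_J, the first column of the inverse of the condition matrix."""
    # Augmented matrix [A | e_1], with A[i][j] = j^(2(i-1)) / (2^(i-1) (i-1)!),
    # using the 1-based indices i, j of the text.
    rows = []
    for i in range(1, J + 1):
        row = [Fraction(j ** (2 * (i - 1)), 2 ** (i - 1) * factorial(i - 1))
               for j in range(1, J + 1)]
        row.append(Fraction(1 if i == 1 else 0))
        rows.append(row)
    # Gauss-Jordan elimination with exact fractions.
    for col in range(J):
        pivot = next(r for r in range(col, J) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        for r in range(J):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    # The last column now holds the solution of A alpha = e_1.
    return [rows[r][J] for r in range(J)]

print(fs1_weights(2))  # [Fraction(4, 3), Fraction(-1, 3)]
```

Solving $A\alpha=e_1$ directly avoids forming the full inverse, since only its first column is needed.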

For example, with model $y_t=\theta\exp\{\sigma\varepsilon_t\}$ with $\sigma$ known, we have

\begin{displaymath}
\hat\theta_{(j)} =
\exp\left\{T^{-1}\sum_{t=1}^T\left(\log(y_t)+(j-1)\eta\right)\right\}
~\hbox{with}~ \eta \sim {\cal N}(0,\sigma^2)
\end{displaymath}

and

\begin{displaymath}
\frac{\partial^i{\cal M}(g(\theta_0))}{\partial g(\theta_0)^i}
=\frac{\partial^i\exp\{\log(\theta_0)\}}{\partial[\log(\theta_0)]^i}
=\exp\{\log(\theta_0)\} = \theta_0 ~,~~ \forall i
\end{displaymath}

When $j=1$, we have a simple OLS estimator. The results are shown here.
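
A minimal Python sketch of the perturbed estimators for this model is given below. It assumes that the perturbation enters as $\log(y_t)+(j-1)\eta_t$ with one deviate per observation, in line with (3); the function names `perturbed_estimate` and `fs1_from_sample` are hypothetical names chosen here, and the weights are supplied by the caller.

```python
import math
import random

def perturbed_estimate(y, j, eta):
    """exp of the mean of log(y_t) + (j - 1) * eta_t; j = 1 is the unperturbed estimator."""
    T = len(y)
    return math.exp(sum(math.log(yt) + (j - 1) * e for yt, e in zip(y, eta)) / T)

def fs1_from_sample(y, sigma, alphas, seed=0):
    """Weighted sum of the perturbed estimators, as in (4)."""
    rng = random.Random(seed)
    # One N(0, sigma^2) deviate per observation, shared across j (assumption).
    eta = [rng.gauss(0.0, sigma) for _ in y]
    return sum(a * perturbed_estimate(y, j, eta)
               for j, a in enumerate(alphas, start=1))
```

For instance, `fs1_from_sample(y, sigma, [4/3, -1/3])` combines the first two perturbed estimators with the weights found above.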

Conclusion

We rarely want to fully specify the error term in estimation, except for identification purposes only, as in a Probit or similar model, and the present estimator cannot be used for such models.

The only practical use would be as an asymptotic estimator in which we would have better confidence. In that case, the estimate of $\theta$ would be based on an asymptotic estimate of $\sigma$ (call it $\tilde\theta_{(\hat\sigma)}$). This estimator would not even minimize asymptotic variance. The estimator is therefore unusable as a finite sample alternative to asymptotic estimation for almost all practical applications. Furthermore, much better asymptotic estimators are available.

This estimator is strongly related to the one in Gray, Watkins and Schucany (1973). A usable estimator has been developed from it in Stefanski, Novick and Devanarayan (2005). It applies to the same model, but with unknown variance. In that respect it is not as general as FS4 and FS5, which apply to models that have a nonlinear relationship between the dependent variable and a not necessarily normal error term.

References

  • Bélanger, J. G. (2006). Unbiased Finite Sample Estimators, forthcoming.

  • Gray, H. L., Watkins, T. A. and Schucany, W. R. (1973). On the Jackknife Statistic and its Relation to UMVU Estimators in the Normal Case, Communications in Statistics, 2(4), pp 285-320.

  • Stefanski, L. A., Novick, S. J. and Devanarayan, V. (2005). Monte Carlo Estimation of $g(\mu)$ from Normally Distributed Data with Applications, Biometrika, 92(3), pp 737-746.
