\ifx\wholebook\relax\else
\documentclass[twoside]{book}
\usepackage[active]{srcltx}
\usepackage[LY1]{fontenc}
\input{DhbDefinitions}
\begin{document}
\fi
\chapter{Statistical analysis}
\label{ch:estimation}
\begin{flushright}
{\sl L'exp\'erience instruit plus s\^{u}rement que le
conseil.}\footnote{Experience teaches more surely than
advice.}\\ Andr\'e Gide
\end{flushright}
\vspace{1 ex}This chapter is dedicated to extracting
information from large amounts of data using statistical analysis.
One of the best books I have read on this subject is titled {\sl
How to Lie with Statistics}\footnote{D. Huff, {\sl How to Lie with
Statistics}, Norton and Co., New York 1954.}. This admittedly
slanted title seems a little pessimistic. The truth, however, is
that most people in their daily job ignore the little statistics
they have learned in high school. As a result, statistical
argumentation is often used wrongly and produces wrong
conclusions.
The problems addressed in this section pertain to the
interpretation of experimental measurements. For a long time this
discipline was the preserve of physicists. More recently, the
natural sciences discovered that statistics could be used
effectively to verify hypotheses or to determine parameters based
on experimental observations. Today, the best papers on statistics
and estimation are found primarily in natural science
publications (Biometrika, \eg). Lately, the use of statistics has
been extended to financial analysis.
Statistics can be applied to experimental data in two ways.
Firstly, one can test the consistency and/or accuracy of the data.
These tests are the subject of the first three sections. Secondly,
the values of unknown parameters can be derived from experimental
data. This very important aspect of statistical analysis is
treated in the remainder of the chapter.
Figure \ref{fig:estimationclasses} shows the classes described in
this chapter.
\begin{figure}
\centering\includegraphics[width=11cm]{Figures/EstimationClasses}
\caption{Classes related to estimation}
\label{fig:estimationclasses}
\end{figure}
The reader should be aware that the techniques used for non-linear
least square fits (section \ref{sec:lsfnonlin}) can also be
applied to solve systems of non-linear equations.
\section{$F$-test and the Fisher-Snedecor distribution}
\label{sec:Ftest} The $F$-test tries to answer the following
question: given two series of measurements, $x_1,\ldots,x_n$ and
$y_1,\ldots,y_m$, what is the probability that the two
measurements have the same standard deviation? The $F$-test is
used when there are not enough statistics to perform a detailed
analysis of the data.
Let us assume that the distribution of the two random variables,
$x$ and $y$, are normal distributions with respective averages
$\mu_x$ and $\mu_y$, and respective standard deviations $\sigma_x$
and $\sigma_y$. Then, $\bar{s}_x$, the standard deviation of
$x_1,\ldots,x_n$ is an estimator of $\sigma_x$; $\bar{s}_y$, the
standard deviation of $y_1,\ldots,y_m$ is an estimator of
$\sigma_y$. The following statistic
\begin{equation}
F={\displaystyle \bar{s}_x^2\over\sigma_x^2}\cdot{\displaystyle\sigma_y^2\over\bar{s}_y^2}
\end{equation}
can be shown to be distributed according to a Fisher-Snedecor
distribution with degrees of freedom $n$ and $m$. In particular,
to test for the equality of the two standard deviations, one
constructs the following statistic:
\begin{equation}
\label{eq:Ftest}
F={\displaystyle \bar{s}_x^2\over\bar{s}_y^2}.
\end{equation}
Traditionally one chooses $\bar{s}_x>\bar{s}_y$ so that the
variable $F$ is always greater than one.
It is important to recall that the expression above is distributed
according to a Fisher-Snedecor distribution if and only if the two
sets of data are distributed according to a normal distribution.
For experimental measurements this is often the case unless
systematic errors are present. Nevertheless this assumption must
be verified before making an $F$-test.
Table \ref{tb:Fdist} shows the properties of the Fisher-Snedecor
distribution. The Fisher-Snedecor distribution is itself rarely
used as a probability density function, however.
\begin{table}[h]
\centering
\caption{Properties of the Fisher-Snedecor distribution}\label{tb:Fdist}
\vspace{1 ex}
\begin{tabular}{|l|c|} \hline
\vbox to 3ex{}Range of random variable & $\left[0,+\infty\right[$\\ *[1ex] \hline
\vbox to 5ex{}Probability density function & $\displaystyle P\left(x\right)=
{n_1^{n_1\over 2}n_2^{n_2\over 2}x^{{n_1\over 2}-1}
\over B\left({n_1\over 2},{n_2\over 2}\right)
\left(n_2+n_1 x\right)^{n_1+n_2\over 2}
}$ \\*[3ex] \hline
\vbox to 3ex{}Parameters & $n_1,n_2$ \\
& two positive integers\\*[1ex] \hline
\vbox to 5ex{}Distribution function & $\displaystyle F\left(x\right)=
B\left({n_1x\over n_1x+n_2};{n_1\over 2},{n_2\over 2}\right)$ \\*[1ex] \hline
\vbox to 4ex{}Average & $\displaystyle{n_2\over n_2-2}$\quad for $n_2>2$ \\*[2ex]
& undefined otherwise\\*[1ex] \hline
\vbox to 5ex{}Variance & $\displaystyle{2n_2^2\left(n_1+n_2-2\right)\over
n_1\left(n_2-2\right)^2\left(n_2-4\right)}$\quad for $n_2>4$ \\*[3ex]
& undefined otherwise\\*[1ex] \hline
\vbox to 3ex{}Skewness & $ $ \\*[1ex] \hline
\vbox to 3ex{}Kurtosis & $ $ \\*[1ex] \hline
\end{tabular}
\end{table}
The main part of figure \ref{fig:fsnedecorDistr} shows the shape
of the Fisher-Snedecor distribution for some values of the degrees
of freedom.
\begin{figure}
\centering\includegraphics[width=12cm]{Figures/FisherSnedecorDistribution}
\caption{Fisher-Snedecor distribution for a few parameters}\label{fig:fsnedecorDistr}
\end{figure}
For large $n$ and $m$, the Fisher-Snedecor distribution
tends toward a normal distribution.
The confidence level of a Fisher-Snedecor distribution is defined
as the probability expressed in percent of finding a value larger
than $F$ defined in equation \ref{eq:Ftest} or lower than $1/F$.
The confidence level is thus related to the distribution function.
We have:
\begin{equation}
\label{eq:fcl}
{\cl}_F\left(x,n\right)=100\left\{1-\left[F\left(x\right)-F\left({1\over x}\right)
\right]\right\},
\end{equation}
where $x$ is the value $F$ defined in equation \ref{eq:Ftest}. The
change of notation was made to avoid confusion between the
acceptance function and the random variable. The confidence level
corresponds to the surface of the shaded areas in the insert in
the upper right corner of figure \ref{fig:fsnedecorDistr}.
\rubrique{Example.} Here is an example investigating the
goodness of the two random number generators discussed in section
\ref{sec:random}. The collected data are the errors of the
covariance test described in section \ref{sec:random}. The
dimension of the covariance matrix is 7 and the number of trials
for each measurement was 1000. Table \ref{tb:Ftest} presents the
results, expressed in units of ${1\over 1000}$.
\begin{table}[h]
\vspace{1 ex}
\centering
\caption{Covariance test of random number generator}\label{tb:Ftest}
\vspace{1 ex}
\begin{tabular}{|c|c|c|} \hline
& Congruential & Mitchell-Moore \\ \hline
1 & 5.56 & 7.48 \\
2 & 5.89 & 6.75 \\
3 & 4.66 & 3.77 \\
4 & 5.69 & 5.71 \\
5 & 5.34 & 7.25 \\
6 & 4.79 & 4.73 \\
7 & 4.80 & 6.23 \\
8 & 7.86 & 5.60 \\
9 & 3.64 & 5.94 \\
10 & 5.70 & 4.58 \\ \hline
Average & 5.40 & 5.80 \\
Std. dev. & 1.10 & 1.19 \\ \hline
\end{tabular}
\end{table}
The ratio of the two variances is $1.18$ and the degrees of
freedom are both 10. The $F$-test applied to the two variances
gives a confidence level of $20\%$. This means that there is only
a $20\%$ probability that the variances of the two series of
measurements are different.
\subsection{Fisher-Snedecor distribution --- Smalltalk implementation}
\marginpar{Figure \ref{fig:estimationclasses} with the box {\bf
FisherSnedecorDistribution} grayed.} Listing \ref{ls:Fdist} shows
the implementation of the Fisher-Snedecor distribution in
Smalltalk. Listing \ref{ls:Ftest} shows the implementation of the
$F$-test in Smalltalk. The following code example shows how to
perform an $F$-test between two sets of experimental measurements.
\begin{codeExample}
\label{exs:Ftest}
\begin{verbatim}
| mom1 mom2 confidenceLevel |
mom1 := DhbFixedStatisticalMoments new.
\end{verbatim}
\hfil{\tt<\sl Collecting measurements of set 1 into \tt
mom1>}\hfil
\begin{verbatim}
mom2 := DhbFixedStatisticalMoments new.
\end{verbatim}
\hfil{\tt<\sl Collecting measurements of set 2 into \tt
mom2>}\hfil
\begin{verbatim}
confidenceLevel := mom1 fConfidenceLevel: mom2.
\end{verbatim}
\end{codeExample}
Two instances of statistical moments (\cf section
\ref{sec:srobustmoment}) are created. Experimental data are
accumulated into each set separately (\cf code example
\ref{ex:smoments}). The last line returns the probability in
percent that the two sets of data have the same standard
deviation.
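For a concrete illustration, the following sketch performs the
$F$-test on the data of table \ref{tb:Ftest}. It assumes that
statistical moments collect values with the message {\tt
accumulate:}, as in code example \ref{ex:smoments}; following the
convention $\bar{s}_x>\bar{s}_y$, the series with the larger
standard deviation is used as the receiver. The final expression
should return a confidence level of about $20\%$, as quoted in the
example above.
\begin{quote}
\begin{verbatim}
| mom1 mom2 |
mom1 := DhbFixedStatisticalMoments new.
#(5.56 5.89 4.66 5.69 5.34 4.79 4.80 7.86 3.64 5.70)
    do: [ :x | mom1 accumulate: x ].    "congruential"
mom2 := DhbFixedStatisticalMoments new.
#(7.48 6.75 3.77 5.71 7.25 4.73 6.23 5.60 5.94 4.58)
    do: [ :x | mom2 accumulate: x ].    "Mitchell-Moore"
mom2 fConfidenceLevel: mom1
\end{verbatim}
\end{quote}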
The class {\tt DhbFisherSnedecorDistribution} is implemented as a
subclass of {\tt DhbProbabilityDensity} because its distribution
function can be computed numerically using the incomplete beta
function (\cf section \ref{sec:incbeta}).
\begin{listing} Smalltalk implementation of the Fisher-Snedecor distribution \label{ls:Fdist}
\input{Smalltalk/Statistics/DhbFisherSnedecorDistribution}
\end{listing}
The computation of the confidence level for the $F$-test is
implemented in the method {\tt fConfidenceLevel:} of the class
{\tt DhbStatisticalMoments}. It calculates the statistic $F$
according to equation \ref{eq:Ftest}, creates an instance of a
Fisher-Snedecor distribution and passes the value of $F$ to the
method {\tt confidenceLevel:} of the distribution. The method {\tt
fConfidenceLevel:} is also implemented by the class {\tt
DhbHistogram}, where it is simply delegated to the statistical
moments accumulated by the histogram. The argument of the method
can be a statistical moment or a histogram since the messages sent
by the method are polymorphic in both classes.
\begin{listing} Smalltalk implementation of the $F$-test \label{ls:Ftest}
\input{Smalltalk/Estimation/DhbStatisticalMoments(DhbEstimationF)}
\input{Smalltalk/Estimation/DhbHistogram(DhbEstimationF)}
\end{listing}
\section{$t$-test and the Student distribution}
\label{sec:ttest} The $t$-test tries to answer the following
question: given two series of measurements, $x_1,\ldots,x_n$ and
$y_1,\ldots,y_m$, what is the probability that the two
measurements have the same average? The $t$-test is used when
there are not enough statistics to perform a detailed analysis of
the data.
Let us assume that the distribution of the two random variables,
$x$ and $y$, are normal distributions with respective averages
$\mu_x$ and $\mu_y$, and the same standard deviation $\sigma$.
Then $\bar{x}$, the average of $x_1,\ldots,x_n$ is an estimator of
$\mu_x$; $\bar{y}$, the average of $y_1,\ldots,y_m$ is an
estimator of $\mu_y$. An estimation $\bar{s}$ of the standard
deviation $\sigma$ can be made using both measurement samples. We
have:
\begin{equation}
\label{eq:sbart}
\bar{s}^2={\displaystyle\sum_{i=1}^n\left(x_i-\bar{x}\right)^2
+ \sum_{i=1}^m\left(y_i-\bar{y}\right)^2\over\displaystyle n+m-2}.
\end{equation}
One can prove that the following statistic:
\begin{equation}
t={\displaystyle \left(\bar{x}-\bar{y}\right)-\left(\mu_x
-\mu_y\right)
\over\displaystyle \bar{s}\sqrt{{1\over n}+{1\over m}}}
\end{equation}
is distributed according to a Student distribution with $n+m-2$
degrees of freedom. In particular, to test for the probability
that the two series of measurements have the same average, one
uses the following statistic:
\begin{equation}
\label{eq:tTest}
t={\displaystyle \bar{x}-\bar{y}
\over\displaystyle \bar{s}\sqrt{{1\over n}+{1\over m}}}.
\end{equation}
It is important to recall the two fundamental hypotheses that have
been made so far.
\begin{enumerate}
\item The two sets of data must be distributed according to a normal distribution.
\item The two sets of data must have the same standard deviation.
\end{enumerate}
Too many people use the $t$-test without first checking these
assumptions. Assumption 1 is usually fulfilled by experimental
measurements in the absence of systematic errors. Assumption 2,
however, must be checked, for example using the $F$-test discussed
in section \ref{sec:Ftest}.
Because the random variable of the distribution is traditionally
labeled $t$, this distribution is often called the
$t$-distribution. Table \ref{tb:tdist} shows the properties of the
Student distribution. The Student distribution is itself rarely
used as a probability density function, however.
\begin{table}[h]
\centering
\caption{Properties of the Student distribution}\label{tb:tdist}
\vspace{1 ex}
\begin{tabular}{|l|c|} \hline
\vbox to 3ex{}Range of random variable & $\left]-\infty,+\infty\right[$\\ *[1ex] \hline
\vbox to 5ex{}Probability density function & $\displaystyle P\left(x\right)=
{1\over\sqrt{n}B\left({n\over 2},{1\over 2}\right)}
\left(1+{x^2\over n}\right)^{-{n+1\over 2}}$ \\*[2ex] \hline
\vbox to 3ex{}Parameters & $n$ \\
& a positive integer\\*[1ex] \hline
\vbox to 4ex{}Distribution function &
\parbox{6cm}{$$F\left(x\right)=\left\{
\begin{array}{ll}
1 - {B\left({n\over n + x^2};{n\over 2},{1\over 2}\right)\over 2}&\mbox{\quad for
$x\ge 0$}\\*[1ex]
{B\left({n\over n + x^2};{n\over 2},{1\over 2}\right)\over 2}&\mbox{\quad for $x<0$}
\end{array}\right.$$} \\*[1ex] \hline
\vbox to 3ex{}Average & $0$ \\*[1ex] \hline
\vbox to 3ex{}Variance & ${n\over n-2}$\quad for $n>2$\\
& undefined otherwise\\*[1ex] \hline
\vbox to 3ex{}Skewness & $0$ \\*[1ex] \hline
\vbox to 3ex{}Kurtosis & ${6\over n-4}$\quad for $n>4$\\
& undefined otherwise\\*[1ex] \hline
\end{tabular}
\end{table}
For $n=1$, the Student distribution is identical to a Cauchy
distribution with $\mu=0$ and $\beta=1$. For large $n$, the
Student distribution tends toward a normal distribution with
average 0 and variance 1. The main part of figure
\ref{fig:studentDistr} shows the shapes of the Student
distribution for a few values of the degrees of freedom. The
normal distribution is also given for comparison.
\begin{figure}
\centering\includegraphics[width=12cm]{Figures/StudentDistribution}
\caption{Student distribution for a few degrees of freedom
}\label{fig:studentDistr}
\end{figure}
The confidence level of a Student distribution is defined as the
probability of finding a value whose absolute value is larger than
a given value. Thus, it estimates the level of confidence that the
hypothesis --- namely, that the two sets of measurements have the
same average --- cannot be accepted. Traditionally the confidence
level is given in percent. The confidence level corresponds to the
surface of shaded area in the insert in the upper left corner of
figure \ref{fig:studentDistr}. By definition, the confidence level
is related to the interval acceptance function:
\begin{equation}
\label{eq:tcl}
{\cl}_t\left(t,n\right)=100\left[1-
F\left(-\left|t\right|,\left|t\right|\right)\right],
\end{equation}
using the definition of the interval acceptance function (equation
\ref{eq:defacceptdiff}). The value of $t$ in equation \ref{eq:tcl}
is obtained from equation \ref{eq:tTest}.
The distribution function of the Student distribution is
calculated with the incomplete beta function (\cf section
\ref{sec:incbeta}). Using the fact that the distribution is
symmetric, one can derive the following expression
\begin{equation}
\label{eq:tsymacc}
F\left(-\left|t\right|,\left|t\right|\right)=B\left({n\over n + t^2};{n\over 2},{1\over
2}\right),
\end{equation}
from the properties of the distribution (\cf table \ref{tb:tdist})
and using equations \ref{eq:tcl} and \ref{eq:incBetaFunction}.
\rubrique{Example} We shall now continue the analysis of the
results of table \ref{tb:Ftest}. The $t$ value computed from the
two sets of measurements is $0.112$ for 18 degrees of freedom.
The corresponding confidence level is $8.76\%$. That is, there is
only an $8.76\%$ probability that the two generators have a
different behavior. Thus, we can conclude that the Mitchell-Moore
random generator is as good as the congruential random generator.
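As a complement, the following sketch evaluates equations
\ref{eq:sbart} and \ref{eq:tTest} directly in plain Smalltalk for
two hypothetical series of measurements; it uses no library
classes and makes the construction of the statistic explicit.
\begin{quote}
\begin{verbatim}
| x y n m xBar yBar sSquare t |
x := #(1.2 1.4 1.1 1.3).        "first series of measurements"
y := #(1.5 1.6 1.4).            "second series of measurements"
n := x size.
m := y size.
xBar := (x inject: 0 into: [ :a :b | a + b ]) / n.
yBar := (y inject: 0 into: [ :a :b | a + b ]) / m.
"equation (eq:sbart): pooled estimate of the variance"
sSquare := ((x inject: 0 into: [ :a :b | a + (b - xBar) squared ])
    + (y inject: 0 into: [ :a :b | a + (b - yBar) squared ]))
    / (n + m - 2).
"equation (eq:tTest): the t statistic"
t := (xBar - yBar) / (sSquare * (1 / n + (1 / m))) sqrt
\end{verbatim}
\end{quote}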
\subsection{Student distribution --- Smalltalk implementation}
\marginpar{Figure \ref{fig:estimationclasses} with the box {\bf
StudentDistribution} grayed.} Listing \ref{ls:tdist} shows the
implementation of the Student distribution in Smalltalk. Listing
\ref{ls:tTest} shows the implementation of the $t$-test in
Smalltalk. Performing a $t$-test between two sets of experimental
measurements is very similar to performing an $F$-test. In code
example \ref{exs:Ftest} it suffices to replace the last line with
the following:
\begin{quote}
\begin{verbatim}
confidenceLevel := mom1 tConfidenceLevel: mom2.
\end{verbatim}
\end{quote}
This last line returns the probability in percent that the two
sets of data have the same average provided that the two sets have
the same standard deviation.
The class {\tt DhbStudentDistribution} is implemented as a
subclass of {\tt DhbProbabilityDensity} because its distribution
function can be computed numerically using the incomplete beta
function (\cf section \ref{sec:incbeta}).
The method {\tt symmetricAcceptance:} computes the symmetric
acceptance function defined by equation \ref{eq:tsymacc}. This
method is used to compute the distribution function and the
confidence level. The method {\tt confidenceLevel:} gives the
confidence level in percent.
\begin{listing} Smalltalk implementation of the Student distribution \label{ls:tdist}
\input{Smalltalk/Statistics/DhbStudentDistribution}
\end{listing}
The computation of the confidence level for the $t$-test is
implemented in the method {\tt tConfidenceLevel:} of the class
{\tt DhbStatisticalMoments}. It calculates the statistic $t$
according to equation \ref{eq:tTest}, creates an instance of a
Student distribution and passes the value of $t$ to the method
{\tt confidenceLevel:} of the distribution. The method {\tt
tConfidenceLevel:} is also implemented by the class {\tt
DhbHistogram}, where it is simply delegated to the statistical
moments accumulated by the histogram. The argument of the method
can be a statistical moment or a histogram since the messages sent
by the method are polymorphic in both classes.
The method {\tt unnormalizedVariance} of the class {\tt
DhbStatisticalMoments} corresponds to each sum in the numerator
of equation \ref{eq:sbart}. To allow performing a $t$-test with
instances of the class {\tt DhbFastStatisticalMoments} as well,
this method had to be defined in that class too.
\begin{listing} Smalltalk implementation of the $t$-test \label{ls:tTest}
\input{Smalltalk/Estimation/DhbStatisticalMoments(DhbEstimation)}
\input{Smalltalk/Estimation/DhbFastStatisticalMoments(DhbEstimation)}
\input{Smalltalk/Estimation/DhbHistogram(DhbEstimation)}
\end{listing}
\section{$\chi^2$-test and $\chi^2$ distribution}
\label{sec:chitest} The $\chi^2$-test tries to answer the
following question: how well is a theory able to predict observed
results? Alternatively, a $\chi^2$-test can also tell whether two
independent sets of observed results are compatible. This latter
formulation is used less frequently than the former. Admittedly
these two questions are somewhat vague. We shall now put them into
mathematical terms for a more precise definition.
Let us assume that the measurement of an observable quantity
depends on some parameters. These parameters cannot be adjusted by
the experimenter but can be measured exactly\footnote{Of course,
there is no such thing as an exact measurement. The measurement of
the parameters must be far more precise than that of the observed
quantity.}. Let $x_p$ be the measured values of the observed
quantity where $p$ is a label for the parameters; let $\sigma_p$
be the standard deviation of $x_p$.
The first question assumes that one can predict the values of the
observed quantity: let $\mu_p$ be the predicted value of $x_p$.
Then the quantity:
\begin{equation}
y_p = { x_p - \mu_p \over \sigma_p}
\end{equation}
is distributed according to a normal distribution with average 0
and standard deviation 1 if and only if the quantities $x_p$ are
distributed according to a normal distribution with average
$\mu_p$ and standard deviation $\sigma_p$.
A $\chi^2$ distribution with $n$ degrees of freedom describes the
distribution of the sum of the squares of $n$ random variables
distributed according to a normal distribution with mean 0 and
standard deviation 1. Thus, the following quantity
\begin{equation}
\label{eq:defchitest}
S=\sum_p { \left(x_p - \mu_p\right)^2\over \sigma_p^2}
\end{equation}
is distributed according to a $\chi^2$ distribution with $n$
degrees of freedom where $n$ is the number of available
measurements (that is the number of terms in the sum of equation
\ref{eq:defchitest}).
To formulate the second question one must introduce a second set
of measurements of the same quantity at the same values of the
parameters. Let $x^{\prime}_p$ be the second set of measured
values and $\sigma^{\prime}_p$ the corresponding standard
deviations. The estimated standard deviation for the difference
$x_p - x^{\prime}_p$ is $\sqrt{\sigma_p^2+\sigma^{\prime 2}_p }$.
If the two sets of measurements are compatible, the quantity
\begin{equation}
y^{\prime}_p = { x_p - x^{\prime}_p \over
\sqrt{\sigma_p^2+\sigma^{\prime 2}_p}}
\end{equation}
is distributed according to a normal distribution with average 0
and standard deviation 1. Then the following quantity
\begin{equation}
\label{eq:defchitestcmp}
S=\sum_p { \left(x_p - x^{\prime}_p\right)^2\over \sigma_p^2+\sigma^{\prime 2}_p}
\end{equation}
is distributed according to a $\chi^2$ distribution with $n$
degrees of freedom.
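The following sketch shows the direct evaluation of equation
\ref{eq:defchitest} in plain Smalltalk for hypothetical arrays of
measured values, predicted values and standard deviations; the
resulting sum can then be handed to the $\chi^2$ distribution
discussed below.
\begin{quote}
\begin{verbatim}
| measured predicted sigma s |
measured  := #(10.2  9.8 10.5  9.6).    "x_p"
predicted := #(10.0 10.0 10.0 10.0).    "mu_p"
sigma     := #( 0.3  0.3  0.3  0.3).    "sigma_p"
s := 0.
1 to: measured size do: [ :p |
    s := s + (((measured at: p) - (predicted at: p)) squared
                / (sigma at: p) squared) ].
s    "distributed as chi-square with 4 degrees of freedom"
\end{verbatim}
\end{quote}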
Table \ref{tb:chi2dist} shows the properties of the $\chi^2$
distribution.
\begin{table}[h]
\centering
\caption{Properties of the $\chi^2$ distribution}\label{tb:chi2dist}
\vspace{1 ex}
\begin{tabular}{|l|c|} \hline
\vbox to 3ex{}Range of random variable & $\left[0,+\infty\right[$\\ *[1ex] \hline
\vbox to 5ex{}Probability density function & $\displaystyle P\left(x\right)=
{\displaystyle x^{{n\over 2}-1}e^{-{x\over 2}}
\over\displaystyle 2^{n\over 2}\Gamma\left({n\over 2}\right)}$ \\*[4ex] \hline
\vbox to 3ex{}Parameters & $n$ \\
& a positive integer\\*[1ex] \hline
\vbox to 4ex{}Distribution function & $\displaystyle F\left(x\right)=
\Gamma\left({x\over 2};{n\over 2}\right)$ \\*[1ex] \hline
\vbox to 3ex{}Average & $n$ \\*[1ex] \hline
\vbox to 3ex{}Variance & $2n$ \\*[1ex] \hline
\vbox to 4ex{}Skewness & $2\sqrt{\displaystyle 2\over\displaystyle n}$ \\*[1ex] \hline
\vbox to 4ex{}Kurtosis & ${\displaystyle 12\over\displaystyle n}$ \\*[1ex] \hline
\end{tabular}
\end{table}
It is a special case of the gamma distribution with
$\alpha={n\over 2}$ and $\beta=2$ (\cf section
\ref{sec:gammadist}). For $n>30$ one can prove that the variable
$y=\sqrt{2x}-\sqrt{2n-1}$ is approximately distributed according
to a normal distribution with average 0 and standard deviation 1.
If $n$ is very large the $\chi^2$ distribution tends toward a
normal distribution with average $n$ and standard deviation
$\sqrt{2n}$.
Figure \ref{fig:chi2Distr} shows the shape of the $\chi^2$
distribution for a few values of the degree of freedom.
\begin{figure}
\centering\includegraphics[width=12cm]{Figures/Chi2Distribution}
\caption{$\chi^2$ distribution for a few degrees of freedom
}\label{fig:chi2Distr}
\end{figure}
To perform a $\chi^2$-test, it is customary to evaluate the
probability of finding a value larger than the value obtained in
equations \ref{eq:defchitest} or \ref{eq:defchitestcmp}. In this
form, the result of a $\chi^2$-test gives the probability that the
set of measurements is {\sl not} compatible with the prediction or
with another set of measurements. The confidence level of a
$\chi^2$ value is defined as the probability of finding a value
larger than $\chi^2$ expressed in percent. It is thus related to
the distribution function as follows:
\begin{equation}
{\cl}_{S}=100\left[ 1-F\left(S\right)\right],
\end{equation}
where $S$ is the quantity defined in equation \ref{eq:defchitest}
or \ref{eq:defchitestcmp}. The confidence level corresponds to the
surface of the shaded area of the insert in the upper right corner
of figure \ref{fig:chi2Distr}.
Since the $\chi^2$ distribution is a special case of the gamma
distribution the confidence level can be expressed with the
incomplete gamma function (\cf section \ref{sec:incGamma}):
\begin{equation}
\label{eq:chiTest}
{\cl}_{S}=100\left[ 1-\Gamma\left({S \over 2};{n\over 2}\right)\right].
\end{equation}
For large $n$ the $\chi^2$ confidence level can be computed from
the error function (\cf section \ref{sec:errorFunctionDef}):
\begin{equation}
\label{eq:chiTestAsymp}
{\cl}_{S}=100\left[ 1-\erf\left(\sqrt{2S}-\sqrt{2n-1}\right)\right].
\end{equation}
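In terms of the classes of this chapter, the confidence level can
be obtained directly from the distribution. The sketch below
assumes that the class creation method takes the number of degrees
of freedom as argument under the name {\tt degreeOfFreedom:} ---
an assumption about the creation protocol --- and uses the method
{\tt confidenceLevel:} discussed for the other distributions; the
value 12.5 stands for a hypothetical $\chi^2$ value obtained from
equation \ref{eq:defchitest} with 10 measurements.
\begin{quote}
\begin{verbatim}
| chi2Value dist |
chi2Value := 12.5.    "hypothetical value of S"
dist := DhbChiSquareDistribution degreeOfFreedom: 10.
dist confidenceLevel: chi2Value
\end{verbatim}
\end{quote}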
\subsection{$\chi^2$ distribution --- Smalltalk implementation}
\marginpar{Figure \ref{fig:estimationclasses} with the box {\bf
ChiSquaredDistribution} grayed.} Listing \ref{ls:chidist} shows
the implementation of the $\chi^2$ distribution in Smalltalk. The
asymptotic limit is implemented directly in the class creation
method.
\begin{listing} Smalltalk implementation of the $\chi^2$ distribution \label{ls:chidist}
\input{Smalltalk/Statistics/DhbChiSquareDistribution}
\end{listing}
\subsection{Weighted point implementation}
\label{sec:weightedPoint} \marginpar{Figure
\ref{fig:estimationclasses} with the box {\bf WeightedPoint}
grayed.} As we shall see in the rest of this chapter, the
evaluation of equation \ref{eq:defchitest} is performed in many
places. Thus, it is convenient to create a new class handling this
type of calculation. The new class is called {\tt
DhbWeightedPoint} in Smalltalk.
\noindent A weighted point has the following instance variables:
\begin{description}
\item[\tt xValue] the $x$ value of the data point, that is $x_i$,
\item[\tt yValue] the $y$ value of the data point, that is $y_i$,
\item[\tt weight] the weight of the point, that is $1/\sigma_i^2$
and
\item[\tt error] the error of the $y$ value, that is $\sigma_i$.
\end{description}
Accessor methods for each of these instance variables are
provided. The accessor method for the error uses lazy
initialization to compute the error from the weight in case the
error has not yet been defined.
The method {\tt chi2Contribution} --- with a colon added at the
end of the name in Smalltalk --- implements the computation
of one term of the sum in equation \ref{eq:defchitest}. The
argument of the method is any object implementing the behavior of
a one-variable function as defined in section \ref{sec:function}.
Instances of the class can be created in many ways. The
fundamental method takes as arguments $x_i$, $y_i$ and the weight
$1/\sigma_i^2$. However, convenience methods are provided for
frequent cases:
\begin{enumerate}
\item $x_i$, $y_i$ and the error on $y_i$, $\sigma_i$;
\item $x_i$ and the content of a histogram bin; the weight is derived
from the bin contents as explained in section \ref{sec:chitesthist};
\item $x_i$, $y_i$ without known error; the weight of the point is set to
1; points without error should not be used together with points
with errors;
\item $x_i$ and a statistical moment; in this case, the value $y_i$
is an average over a set of measurements; the weight is determined
from the error on the average (\cf section \ref{sec:moments}).
\end{enumerate}
Examples of use of weighted points appear in many sections of this
chapter (\ref{sec:chitesthist}, \ref{sec:lsfpol},
\ref{sec:lsfnonlin}).
In the Smalltalk class {\tt DhbWeightedPoint} the values $x_i$ and
$y_i$ are always supplied as an instance of the class {\tt Point}.
The class {\tt DhbWeightedPoint} has the following class creation
methods:
\begin{description}
\item[\tt point:weight:] fundamental method;
\item[\tt point:error:] convenience method 1;
\item[\tt point:count:] convenience method 2;
\item[\tt point:] convenience method 3
\end{description}
Convenience method 4 is implemented by the method {\tt
asWeightedPoint} of the class {\tt DhbStatisticalMoments}. This
kind of technique is quite common in Smalltalk, used instead of a
class creation method with an explicit name ({\tt fromMoment:},
\eg).
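The following sketch illustrates one of the creation methods and
the computation of one term of equation \ref{eq:defchitest}. It
assumes that, as for the one-variable functions of section
\ref{sec:function}, the argument of {\tt chi2Contribution:} only
needs to respond to the message {\tt value:}, so that a block can
play that role; the numbers are of course hypothetical.
\begin{quote}
\begin{verbatim}
| wp |
"x = 2, y = 4.1, known error 0.2 on y (convenience method 1)"
wp := DhbWeightedPoint point: 2 @ 4.1 error: 0.2.
"contribution of this point against the function y = x squared"
wp chi2Contribution: [ :x | x squared ]
\end{verbatim}
\end{quote}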
\begin{listing} Smalltalk implementation of the weighted point class
\label{ls:weightedPoint}
\input{Smalltalk/Estimation/DhbWeightedPoint}
\input{Smalltalk/Estimation/DhbStatisticalMoments(DhbEstimationChi2)}
\end{listing}
\section{$\chi^2$-test on histograms}
\label{sec:chitesthist} As we have seen in section
\ref{sec:histogram} histograms are often used to collect
experimental data. Performing a $\chi^2$-test of data accumulated
into a histogram against a function is a frequent task of data
analysis.
The $\chi^2$ statistic defined by equation \ref{eq:defchitest}
requires an estimate of the standard deviation of the content of
each bin. One can show that the content of a histogram bin is
distributed according to a Poisson distribution. The Poisson
distribution is a discrete distribution\footnote{A discrete
distribution is a probability distribution whose random variable
is an integer.} whose average is equal to its variance. The
probability of observing the integer $k$ is given by:
\begin{equation}
P_{\mu}\left(k\right)= {\mu^k\over k!}e^{-\mu},
\end{equation}
where $\mu$ is the average of the distribution. In the case of a
histogram, the estimated variance of the bin content is then the
bin content itself. Therefore equation \ref{eq:defchitest}
becomes:
\begin{equation}
\label{eq:defchitesthist}
S=\sum_{i=1}^n
{ \left\{n_i - \mu\left[x_{\min} + \left(i+{1\over 2} \right)w\right]\right\}^2 \over
n_i},
\end{equation}
where $n$ is the number of bins of the histogram, $x_{\min}$ its
minimum and $w$ its bin width. The estimate of the bin content
against which the $\chi^2$ statistic is computed, $\mu$, is now a
function evaluated at the middle of each bin to average out
variations of the function over the bin interval.
In fact, the function $\mu$ is often related to a probability
density function since histograms are measuring probability
distributions. In this case the evaluation is somewhat different.
Let $P\left(x\right)$ be the probability density function against
which the $\chi^2$ statistic is computed. Then, the predicted bin
content for bin $i$ is given by:
\begin{equation}
\label{eq:probbincontents}
\mu_i = wNP\left[x_{\min} + \left(i+{1\over 2} \right)w\right],
\end{equation}
where $N$ is the total number of values accumulated in the
histogram. This is a symmetric version of the definition of a
probability density function: $w$ plays the role of $dx$ in
equation \ref{eq:probdensity}. Plugging equation
\ref{eq:probbincontents} into equation \ref{eq:defchitesthist}
yields the expression of the $\chi^2$ statistic for a histogram
computed against a probability density function $P\left(x\right)$:
\begin{equation}
\label{eq:defchitesthistprob}
S=\sum_{i=1}^n
{ \left\{n_i - wNP\left[x_{\min} + \left(i+{1\over 2} \right)w\right]\right\}^2 \over
n_i}.
\end{equation}
This equation cannot be applied to empty bins. If a bin is
empty, one can set its weight to 1. This corresponds to a $63\%$
probability of observing no counts when the expected number of
measurements is larger than 0.
In the implementation a single class is in charge of evaluating
the predicted bin contents. This class is called a scaled
probability density function. It is defined by a probability
distribution and a histogram.
\subsection{$\chi^2$-test on histograms --- Smalltalk implementation}
\marginpar{Figure \ref{fig:estimationclasses} with the box {\bf
ScaledProbabilityDistribution} grayed.} Listing
\ref{ls:scaleddist} shows the implementation of a scaled
probability density function in Smalltalk. Listing
\ref{ls:chitesthist} shows the additional methods for the class
{\tt DhbHistogram} needed to perform a $\chi^2$-test. Examples of
use are given in sections \ref{sec:slsfnonlin} and
\ref{sec:smlfhist}. Here is a simple example showing how to
compute a $\chi^2$-confidence level to estimate the goodness of a
random number generator.
\begin{codeExample}
\label{exs:chitest}
\begin{verbatim}
| trials probDistr histogram |
trials := 5000.
probDistr := DhbNormalDistribution new.
histogram := DhbHistogram new.
histogram freeExtent: true; setDesiredNumberOfBins: 100.
trials timesRepeat: [ histogram accumulate: probDistr random ].
histogram chi2ConfidenceLevelAgainst:
( DhbScaledProbabilityDensityFunction histogram: histogram
against: probDistr)
\end{verbatim}
\end{codeExample}
The first line after the declaration sets the number of data
points to be generated to 5000. Next, a new instance of a
probability distribution --- in this case a normal distribution
with average 0 and variance 1 --- is created. Then a new instance
of a histogram is created; the next line configures it with a
desired number of bins of 100 and the ability to automatically
adjust its limits. After all instances have been created, random
data produced by the probability distribution are accumulated into
the histogram. The last statement --- extending over the last
three lines --- calculates the confidence level. The argument of
the method {\tt chi2ConfidenceLevelAgainst:} is a scaled
probability distribution constructed over the histogram and the
probability distribution used to generate the accumulated data.
The class {\tt DhbScaledProbabilityDensityFunction} has two class
creation methods. The class method {\tt histogram:against:} takes
two arguments, a histogram and a probability distribution. This
method is used to perform a $\chi^2$-test of the specified
histogram against the given probability distribution. The class
method {\tt histogram:distributionClass:} first creates a
probability distribution of the given class using parameters
estimated from the histogram. This method is used to create a
scaled probability density function whose parameters will be
determined with least square or maximum likelihood fits.
\begin{listing} Smalltalk implementation of a scaled
probability density function \label{ls:scaleddist}
\input{Smalltalk/Statistics/DhbScaledProbabilityDensityFunction}
\end{listing}
The evaluation of equation \ref{eq:defchitesthistprob} is
performed by the method {\tt chi2Against:} of the class {\tt
DhbHistogram}. This method uses the iterator method {\tt
pointsAndErrorsDo:}, which iterates over all bins and evaluates a
block on each of them, using as argument a weighted point as
described in section \ref{sec:weightedPoint}. This iterator method
is also used for least square and maximum likelihood fits (\cf
sections \ref{sec:slsfnonlin} and \ref{sec:smlfhist}).
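Schematically, and leaving aside the handling of the number of
degrees of freedom, the evaluation performed by {\tt chi2Against:}
amounts to the loop below; the first lines repeat code example
\ref{exs:chitest} so that the sketch is self-contained.
\begin{quote}
\begin{verbatim}
| trials probDistr histogram scaledDensity sum |
trials := 5000.
probDistr := DhbNormalDistribution new.
histogram := DhbHistogram new.
histogram freeExtent: true; setDesiredNumberOfBins: 100.
trials timesRepeat: [ histogram accumulate: probDistr random ].
scaledDensity := DhbScaledProbabilityDensityFunction
                        histogram: histogram
                        against: probDistr.
"sum the contributions of all bins (equation eq:defchitesthistprob)"
sum := 0.
histogram pointsAndErrorsDo: [ :weightedPoint |
    sum := sum + (weightedPoint chi2Contribution: scaledDensity) ].
sum
\end{verbatim}
\end{quote}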
\begin{listing} Smalltalk implementation of $\chi^2$-test on histograms \label{ls:chitesthist}
\input{Smalltalk/Estimation/DhbHistogram(DhbEstimationChi2)}
\end{listing}
\section{Definition of estimation}
Let us assume that an observable quantity $y$ follows a
probability distribution described by a set of observable
quantities $x_1,x_2\ldots$ --- called the experimental conditions ---
and a set of parameters $p_1,p_2\ldots$. In other words, the
probability density function of the random variable\footnote{For
simplicity we shall use the same notation for the random variable
and the observable quantity.} corresponding to the observable
quantity $y$ can be written as
\begin{equation}
\label{eq:estimprob}
P\left(y\right)=P\left(y;{\bf x},{\bf p}\right),
\end{equation}
where ${\bf x}$ is the vector $\left(x_1,x_2\ldots\right)$ and
${\bf p}$ the vector $\left(p_1,p_2\ldots\right)$.
The estimation of the parameters $p_1,p_2\ldots$ is their
determination by performing several measurements of the observable
$y$ under different experimental conditions $x_1,x_2\ldots$.
Let $N$ be the number of measurements; let $y_i$ be the $\mbox{\rm
i}^{\mbox{\rm th}}$ measured value of the observable $y$ under the
experimental conditions ${\bf x}_i$.
\subsection{Maximum likelihood estimation}
\label{sec:mlf} The maximum likelihood estimation of the
parameters ${\bf p}$ is the set of values $\bar{\bf p}$ maximizing
the following function:
\begin{equation}
\label{eq:maxlike}
L\left({\bf p}\right)=\prod_{i=1}^N P\left(y_i;{\bf x}_i,{\bf p}\right)
\end{equation}
By definition the likelihood function $L\left({\bf p}\right)$ is
the probability of making the $N$ measurements. The maximum
likelihood estimation determines the estimation $\bar{\bf p}$ of
the parameters ${\bf p}$ such that the series of measurements
performed is the most probable, hence the name maximum likelihood.
One can show that the maximum likelihood estimation is robust and
unbiased. The robustness and the bias of an estimation are defined
mathematically. In short, {\sl robust} means that the estimation
converges toward the true value of the parameters for an infinite
number of measurements; {\sl unbiased} means that the deviations
between the estimated parameters and their true value are
symmetrically distributed around 0 for any finite number of
measurements.
Equation \ref{eq:maxlike} is often rewritten in logarithmic form
to ease the computation of the likelihood function.
\begin{equation}
\label{eq:logmaxlike}
I\left({\bf p}\right)=\ln L\left({\bf p}\right)=\sum_{i=1}^N\ln P\left(y_i;{\bf x}_i,{\bf p}\right)
\end{equation}
The function $I\left({\bf p}\right)$ is related to information and
is used in information theory.
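As a small illustration of equation \ref{eq:logmaxlike}, the
following sketch computes the logarithm of the likelihood of a few
hypothetical measurements under a standard normal distribution. It
assumes that probability densities respond to the message {\tt
value:} with the value of their density function, as the scaled
probability density function of the previous section suggests.
\begin{quote}
\begin{verbatim}
| distr measurements logLikelihood |
distr := DhbNormalDistribution new.       "P(y; x, p)"
measurements := #(0.13 -0.75 0.42 1.10).  "hypothetical y_i"
"sum of ln P(y_i) over all measurements"
logLikelihood := measurements
    inject: 0
    into: [ :sum :y | sum + (distr value: y) ln ].
logLikelihood
\end{verbatim}
\end{quote}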
\subsection{Least square estimation}
Let us assume that the random variable $y$ is distributed
according to a normal distribution of given standard deviation
$\sigma$ and that the average of the normal distribution is given
by a function $F\left({\bf x},{\bf p}\right)$ of the experimental
conditions and the parameters. In this case equation
\ref{eq:estimprob} becomes:
\begin{equation}
\label{eq:estimprobnorm}
P\left(y\right)={1\over\sqrt{2\pi\sigma^2}}e^{-{
\left[y-F\left({\bf x},{\bf p}\right)\right]^2\over 2\sigma^2}}
\end{equation}
Plugging equation \ref{eq:estimprobnorm} into equation
\ref{eq:logmaxlike} yields:
\begin{equation}
I\left({\bf p}\right)=-N\ln\sqrt{2\pi\sigma^2}-
\sum_{i=1}^N {
\left[y_i-F\left({\bf x}_i,{\bf p}\right)\right]^2\over 2\sigma^2}
\end{equation}
The problem of finding the maximum of $I\left({\bf p}\right)$ is
now equivalent to the problem of finding the minimum of the
function:
\begin{equation}
\label{eq:lsmlestim}
S_{\mathop{ML}}\left({\bf p}\right)=\sum_{i=1}^N {
\left[y_i-F\left({\bf x}_i,{\bf p}\right)\right]^2\over \sigma^2},
\end{equation}
where a redundant factor 2 has been removed. This kind of
estimation is called least square estimation. Written as in
equation \ref{eq:lsmlestim} least square estimation is fully
equivalent to maximum likelihood estimation. By definition, the
quantity $S_{\mathop{ML}}\left({\bf p}\right)$ is distributed as a
$\chi^2$ random variable with $N-m$ degrees of freedom where $m$
is the number of parameters, that is the dimension of the vector
${\bf p}$.
In practice, however, the standard deviation $\sigma$ is not known
and frequently depends on the parameters ${\bf p}$. In that case,
one uses instead an estimation for the standard deviation. Either
the standard deviation of each measurement is determined
experimentally by making several measurements under the same
experimental conditions or it is estimated from the measurement
error. Then, equation \ref{eq:lsmlestim} can be rewritten as:
\begin{equation}
\label{eq:lsestim}
S\left({\bf p}\right)=\sum_{i=1}^N {
\left[y_i-F\left({\bf x}_i,{\bf p}\right)\right]^2\over \sigma_i^2}.
\end{equation}
The least square estimation is obtained by minimizing the quantity
$S\left({\bf p}\right)$ with respect to the parameters ${\bf p}$.
This kind of estimation can be used to determine the parameters of
a functional dependence of the variable $y$ from the observable
quantities ${\bf x}$. For this reason it is also called a least
square fit when it is used to fit the parameter of a functional
dependence to the measurements.
In general the distribution of the random variable $y$ may not be
a normal distribution. One can nevertheless show that the least
square estimation is robust. However, it is biased. Depending on
the nature of the distribution of the random variable $y$ the
parameters may be over- or underestimated. This is especially the
case when working with histograms.
We have said that all measurements of the observable quantities
$y$ must be distributed according to a normal distribution so that
the quantity $S\left({\bf p}\right)$ of equation \ref{eq:lsestim}
is distributed as a $\chi^2$ random variable. In general this is
often the case\footnote{This is a consequence of a theorem known
as the law of large numbers.} when dealing with a large quantity
of measurements. Thus, a least square fit is also called a
$\chi^2$ fit. In this case one can apply the $\chi^2$-test
described in section \ref{sec:chitest} to assess the goodness of
the fitted function.
If $S\left({\bf p}\right)$ has a minimum with respect to ${\bf p}$,
then all partial derivatives of the function $S\left({\bf
p}\right)$ with respect to each of the components of the vector
${\bf p}$ are zero. Since the function is positive and quadratic
in ${\bf p}$, it is clear that the function must have at least one
minimum. Under these circumstances the minimum can be obtained by
solving the following set of equations:
\begin{equation}
\label{eq:lsderiv}
{\displaystyle \partial\over\displaystyle\partial p_j}
S\left(p_1,\ldots,p_m\right)=0\mbox{\quad for $j=1,\ldots,m$}
\end{equation}
where $m$ is the number of parameters, that is the dimension of
the vector ${\bf p}$. When a solution is found, one should in
principle verify that it is really a minimum. Solving equation
\ref{eq:lsderiv} gives the following system of equations:
\begin{equation}
\label{eq:lsequs}
\sum_{i=1}^N {
y_i-F\left({\bf x}_i,{\bf p}\right)\over \sigma_i^2}\cdot
{\displaystyle \partial\over\displaystyle\partial p_j} F\left({\bf x}_i,{\bf p}\right)
=0\mbox{\quad for $j=1,\ldots,m$}
\end{equation}
Once the system above has been solved, one can compute the value
of $S\left({\bf p}\right)$ using equations \ref{eq:lsestim} or,
better, the value $S_{\mathop{ML}}\left({\bf p}\right)$ using
equation \ref{eq:lsmlestim}. Computing the $\chi^2$ confidence
level of that value (\cf section \ref{sec:chitest}) using a
$\chi^2$ distribution with $N-m$ degrees of freedom gives the
probability that the fit is acceptable.
\section{Least square fit with linear dependence}
\label{eq:lslinear} If the function $F\left({\bf x},{\bf
p}\right)$ is a linear function of the vector ${\bf p}$, it can be
written in the following form:
\begin{equation}
F\left({\bf x},{\bf p}\right)=\sum_{j=1}^m f_j\left({\bf
x}\right)\cdot p_j.
\end{equation}
In that case, equation \ref{eq:lsequs} becomes a system of linear
equations of the form:
\begin{equation}
\label{eq:lsmequs}
{\bf M}\cdot{\bf p}={\bf c},
\end{equation}
where the coefficients of the matrix ${\bf M}$ are given by:
\begin{equation}
M_{jk}=\sum_{i=1}^N {\displaystyle f_j\left({\bf x}_i\right)f_k\left({\bf x}_i\right)
\over\displaystyle \sigma_i^2}\mbox{\quad for $j,k=1,\ldots,m$},
\end{equation}
and the components of the constant vector ${\bf c}$ are given by:
\begin{equation}
c_j=\sum_{i=1}^N {\displaystyle y_if_j\left({\bf x}_i\right)
\over\displaystyle \sigma_i^2}\mbox{\quad for $j=1,\ldots,m$}.
\end{equation}
Equation \ref{eq:lsmequs} is a system of linear equations which
can be solved with the algorithms presented in sections
\ref{sec:lineqs} and \ref{sec:lup}. If one is interested only in
the solution, this is all there is to do.
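As an illustration, the sketch below carries out the computation
for the simplest linear case, a straight line $F\left(x,{\bf
p}\right)=p_1+p_2x$, that is $f_1\left(x\right)=1$ and
$f_2\left(x\right)=x$. The matrix ${\bf M}$ and the vector ${\bf
c}$ are accumulated explicitly and the resulting 2-by-2 system is
solved by hand; the data points are hypothetical and all weights
are set to 1.
\begin{quote}
\begin{verbatim}
| points m11 m12 m22 c1 c2 det p1 p2 |
points := Array with: 1 @ 2.1 with: 2 @ 3.9 with: 3 @ 6.2.
m11 := 0. m12 := 0. m22 := 0.
c1 := 0. c2 := 0.
points do: [ :each | | w x y |
    w := 1.                      "weight 1 / sigma squared"
    x := each x. y := each y.
    m11 := m11 + w.              "sum of f1 * f1 over sigma squared"
    m12 := m12 + (w * x).        "sum of f1 * f2 over sigma squared"
    m22 := m22 + (w * x * x).    "sum of f2 * f2 over sigma squared"
    c1 := c1 + (w * y).
    c2 := c2 + (w * x * y) ].
det := (m11 * m22) - (m12 * m12).
p1 := ((m22 * c1) - (m12 * c2)) / det.    "intercept"
p2 := ((m11 * c2) - (m12 * c1)) / det.    "slope"
\end{verbatim}
\end{quote}
For a larger number of parameters the system of equation
\ref{eq:lsmequs} is of course solved with the general algorithms
mentioned above.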
A proper fit, however, should give an estimation of the error in
estimating the parameters. The inverse of the matrix ${\bf M}$ is
the error matrix for the fit. The error matrix is used to compute
the estimation of variance on the function $F\left({\bf x},{\bf
p}\right)$ as follows:
\begin{equation}
\label{eq:fitError}
\var\left[F\left({\bf x},{\bf p}\right)\right]=\sum_{j=1}^m \sum_{k=1}^m
M_{jk}^{-1}f_j\left({\bf x}\right)f_k\left({\bf x}\right).
\end{equation}
The estimated error on the function $F\left({\bf x},{\bf
p}\right)$ is the square root of the estimated variance.
The diagonal elements of the error matrix are the variance of the
corresponding parameter. That is:
\begin{equation}
\var\left(p_j\right)=M_{jj}^{-1}\mbox{\quad for $j=1,\ldots,m$}.
\end{equation}
The off-diagonal elements describe the correlation between the
errors on the parameters. One defines the correlation coefficient
of parameters $p_j$ and $p_k$ by:
\begin{equation}
\cor\left(p_j,p_k\right)={\displaystyle M_{jk}^{-1}
\over\displaystyle\sqrt{M_{jj}^{-1}M_{kk}^{-1}}}\mbox{\quad for
$j,k=1,\ldots,m$ and $j\ne k$}.
\end{equation}
All correlation coefficients lie between $-1$ and $1$. If
the absolute value of a correlation coefficient is close to 1, it
means that one of the two corresponding parameters is
redundant for the fit. In other words, one parameter can be
expressed as a function of the other.
\section{Linear regression}
A linear regression is a least square fit with a linear function