Parallel Colt 0.7.2

cern.colt.matrix.tdouble.algo
Class DoubleStatistic

java.lang.Object
  extended by cern.colt.matrix.tdouble.algo.DoubleStatistic

public class DoubleStatistic
extends Object

Basic statistics operations on matrices. Computation of covariance, correlation, distance matrix. Random sampling views. Conversion to histograms with and without OLAP cube operators. Conversion to bins with retrieval of statistical bin measures. Also see cern.jet.stat and hep.aida.tdouble.bin, in particular DynamicDoubleBin1D.

Examples:

A (4 x 3 matrix):
1  2   3
2  4   6
3  6   9
4 -8 -10

covariance(A) (3 x 3 matrix):
 1.25 -3.5 -4.5
-3.5  29   39
-4.5  39   52.5

correlation(covariance(A)) (3 x 3 matrix):
 1        -0.581318 -0.555492
-0.581318  1         0.999507
-0.555492  0.999507  1

distance(A,EUCLID) (3 x 3 matrix):
 0        12.569805 15.874508
12.569805  0         4.242641
15.874508  4.242641  0
     

Version:
1.0, 09/24/99
Author:
wolfgang.hoschek@cern.ch

Nested Class Summary
static interface DoubleStatistic.VectorVectorFunction
          Interface that represents a function object: a function that takes two argument vectors and returns a single value.
 
Field Summary
static DoubleStatistic.VectorVectorFunction BRAY_CURTIS
          Bray-Curtis distance function; Sum( abs(x[i]-y[i]) ) / Sum( x[i]+y[i] ).
static DoubleStatistic.VectorVectorFunction CANBERRA
          Canberra distance function; Sum( abs(x[i]-y[i]) / abs(x[i]+y[i]) ).
static DoubleStatistic.VectorVectorFunction EUCLID
          Euclidean distance function; Sqrt(Sum( (x[i]-y[i])^2 )).
static DoubleStatistic.VectorVectorFunction MANHATTAN
          Manhattan distance function; Sum( abs(x[i]-y[i]) ).
static DoubleStatistic.VectorVectorFunction MAXIMUM
          Maximum distance function; Max( abs(x[i]-y[i]) ).
 
Method Summary
static DoubleMatrix2D aggregate(DoubleMatrix2D matrix, DoubleBinFunction1D[] aggr, DoubleMatrix2D result)
          Applies the given aggregation functions to each column and stores the results in the result matrix.
static DynamicDoubleBin1D bin(DoubleMatrix1D vector)
          Fills all cell values of the given vector into a bin from which statistics measures can be retrieved efficiently.
static DoubleMatrix2D correlation(DoubleMatrix2D covariance)
          Modifies the given covariance matrix to be a correlation matrix (in-place).
static DoubleMatrix2D covariance(DoubleMatrix2D matrix)
          Constructs and returns the covariance matrix of the given matrix.
static DoubleIHistogram2D cube(DoubleMatrix1D x, DoubleMatrix1D y, DoubleMatrix1D weights)
          2-d OLAP cube operator; Fills all cells of the given vectors into a histogram built from their distinct values.
static DoubleIHistogram3D cube(DoubleMatrix1D x, DoubleMatrix1D y, DoubleMatrix1D z, DoubleMatrix1D weights)
          3-d OLAP cube operator; Fills all cells of the given vectors into a histogram built from their distinct values.
static void demo1()
          Demonstrates usage of this class.
static void demo2(int rows, int columns, boolean print)
          Demonstrates usage of this class.
static void demo3(DoubleStatistic.VectorVectorFunction norm)
          Demonstrates usage of this class.
static DoubleMatrix2D distance(DoubleMatrix2D matrix, DoubleStatistic.VectorVectorFunction distanceFunction)
          Constructs and returns the distance matrix of the given matrix.
static DoubleIHistogram1D[][] histogram(DoubleIHistogram1D[][] histo, DoubleMatrix2D matrix, int m, int n)
          Splits the given matrix into m x n pieces and computes 1D histogram of each piece.
static DoubleIHistogram1D histogram(DoubleIHistogram1D histo, DoubleMatrix1D vector)
          Fills all cells of the given vector into the given histogram.
static DoubleIHistogram1D histogram(DoubleIHistogram1D histo, DoubleMatrix2D matrix)
          Fills all cells of the given matrix into the given histogram.
static DoubleIHistogram2D histogram(DoubleIHistogram2D histo, DoubleMatrix1D x, DoubleMatrix1D y)
          Fills all cells of the given vectors into the given histogram.
static DoubleIHistogram2D histogram(DoubleIHistogram2D histo, DoubleMatrix1D x, DoubleMatrix1D y, DoubleMatrix1D weights)
          Fills all cells of the given vectors into the given histogram.
static DoubleIHistogram3D histogram(DoubleIHistogram3D histo, DoubleMatrix1D x, DoubleMatrix1D y, DoubleMatrix1D z, DoubleMatrix1D weights)
          Fills all cells of the given vectors into the given histogram.
static void main(String[] args)
          Benchmarks covariance computation.
static DoubleMatrix1D viewSample(DoubleMatrix1D matrix, double fraction, DoubleRandomEngine randomGenerator)
          Constructs and returns a sampling view with a size of round(matrix.size() * fraction).
static DoubleMatrix2D viewSample(DoubleMatrix2D matrix, double rowFraction, double columnFraction, DoubleRandomEngine randomGenerator)
          Constructs and returns a sampling view with round(matrix.rows() * rowFraction) rows and round(matrix.columns() * columnFraction) columns.
static DoubleMatrix3D viewSample(DoubleMatrix3D matrix, double sliceFraction, double rowFraction, double columnFraction, DoubleRandomEngine randomGenerator)
          Constructs and returns a sampling view with round(matrix.slices() * sliceFraction) slices and round(matrix.rows() * rowFraction) rows and round(matrix.columns() * columnFraction) columns.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EUCLID

public static final DoubleStatistic.VectorVectorFunction EUCLID
Euclidean distance function; Sqrt(Sum( (x[i]-y[i])^2 )).


BRAY_CURTIS

public static final DoubleStatistic.VectorVectorFunction BRAY_CURTIS
Bray-Curtis distance function; Sum( abs(x[i]-y[i]) ) / Sum( x[i]+y[i] ).


CANBERRA

public static final DoubleStatistic.VectorVectorFunction CANBERRA
Canberra distance function; Sum( abs(x[i]-y[i]) / abs(x[i]+y[i]) ).


MAXIMUM

public static final DoubleStatistic.VectorVectorFunction MAXIMUM
Maximum distance function; Max( abs(x[i]-y[i]) ).


MANHATTAN

public static final DoubleStatistic.VectorVectorFunction MANHATTAN
Manhattan distance function; Sum( abs(x[i]-y[i]) ).
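
The five formulas above can be written out in plain Java as a rough sketch. The class and method names below are hypothetical and work on double[] arrays rather than on DoubleMatrix1D; they illustrate the arithmetic only and are not the library's implementation.

```java
/** Hypothetical plain-Java versions of the distance formulas listed above. */
final class DistanceFunctionsSketch {

    /** Sqrt(Sum( (x[i]-y[i])^2 )). */
    static double euclid(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Sum( abs(x[i]-y[i]) ). */
    static double manhattan(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) sum += Math.abs(x[i] - y[i]);
        return sum;
    }

    /** Max( abs(x[i]-y[i]) ). */
    static double maximum(double[] x, double[] y) {
        double max = 0.0;
        for (int i = 0; i < x.length; i++) max = Math.max(max, Math.abs(x[i] - y[i]));
        return max;
    }

    /** Sum( abs(x[i]-y[i]) / abs(x[i]+y[i]) ); a term with x[i]+y[i] == 0 is not guarded here. */
    static double canberra(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) sum += Math.abs(x[i] - y[i]) / Math.abs(x[i] + y[i]);
        return sum;
    }

    /** Sum( abs(x[i]-y[i]) ) / Sum( x[i]+y[i] ). */
    static double brayCurtis(double[] x, double[] y) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (int i = 0; i < x.length; i++) {
            numerator += Math.abs(x[i] - y[i]);
            denominator += x[i] + y[i];
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // First two columns of the example matrix A from the class description.
        double[] x = {1, 2, 3, 4};
        double[] y = {2, 4, 6, -8};
        System.out.println("EUCLID      " + euclid(x, y));
        System.out.println("MANHATTAN   " + manhattan(x, y));
        System.out.println("MAXIMUM     " + maximum(x, y));
        System.out.println("CANBERRA    " + canberra(x, y));
        System.out.println("BRAY_CURTIS " + brayCurtis(x, y));
    }
}
```

For these two columns the Euclidean value is sqrt(158), about 12.569805, which agrees with the distance(A,EUCLID) example in the class description.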

Method Detail

aggregate

public static DoubleMatrix2D aggregate(DoubleMatrix2D matrix,
                                       DoubleBinFunction1D[] aggr,
                                       DoubleMatrix2D result)
Applies the given aggregation functions to each column and stores the results in the result matrix. If matrix has shape m x n, then result must have shape aggr.length x n. Tip: To do aggregations on rows use dice views (transpositions), as in aggregate(matrix.viewDice(),aggr,result.viewDice()).

Parameters:
matrix - any matrix; a column holds the values of a given variable.
aggr - the aggregation functions to be applied to each column.
result - the matrix to hold the aggregation results.
Returns:
result (for convenience only).
See Also:
DoubleFormatter, DoubleBinFunction1D, DoubleBinFunctions1D
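
To make the shape rule concrete (an m x n input yields an aggr.length x n result), here is a minimal plain-Java sketch. It is an illustration under assumptions: AggregateSketch and its methods are hypothetical names, java.util.function.ToDoubleFunction stands in for DoubleBinFunction1D, and the result is allocated inside the method instead of being passed in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of per-column aggregation on a double[][]. */
final class AggregateSketch {

    static double sum(double[] column) {
        return Arrays.stream(column).sum();
    }

    static double max(double[] column) {
        return Arrays.stream(column).max().orElse(Double.NaN);
    }

    /** Row a of the result holds aggregation function a applied to every column. */
    static double[][] aggregate(double[][] matrix, List<ToDoubleFunction<double[]>> aggr) {
        int rows = matrix.length;
        int cols = rows == 0 ? 0 : matrix[0].length;
        double[][] result = new double[aggr.size()][cols];
        for (int c = 0; c < cols; c++) {
            // Copy column c so each function sees one variable's values.
            double[] column = new double[rows];
            for (int r = 0; r < rows; r++) column[r] = matrix[r][c];
            for (int a = 0; a < aggr.size(); a++) {
                result[a][c] = aggr.get(a).applyAsDouble(column);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {2, 4, 6}, {3, 6, 9}, {4, -8, -10}};
        List<ToDoubleFunction<double[]>> aggr =
                Arrays.asList(AggregateSketch::sum, AggregateSketch::max);
        // prints [[10.0, 4.0, 8.0], [4.0, 6.0, 9.0]]
        System.out.println(Arrays.deepToString(aggregate(a, aggr)));
    }
}
```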

bin

public static DynamicDoubleBin1D bin(DoubleMatrix1D vector)
Fills all cell values of the given vector into a bin from which statistics measures can be retrieved efficiently. Cell values are copied.
Tip: Use System.out.println(bin(vector)) to print most measures computed by the bin. Example:
         Size: 20000
         Sum: 299858.02350278624
         SumOfSquares: 5399184.154095971
         Min: 0.8639113139711261
         Max: 59.75331890541892
         Mean: 14.992901175139313
         RMS: 16.43043540825375
         Variance: 45.17438077634358
         Standard deviation: 6.721188940681818
         Standard error: 0.04752598277592142
         Geometric mean: 13.516615397064466
         Product: Infinity
         Harmonic mean: 11.995174297952191
         Sum of inversions: 1667.337172700724
         Skew: 0.8922838940067878
         Kurtosis: 1.1915828121825598
         Sum of powers(3): 1.1345828465808412E8
         Sum of powers(4): 2.7251055344494686E9
         Sum of powers(5): 7.367125643433887E10
         Sum of powers(6): 2.215370909100143E12
         Moment(0,0): 1.0
         Moment(1,0): 14.992901175139313
         Moment(2,0): 269.95920770479853
         Moment(3,0): 5672.914232904206
         Moment(4,0): 136255.27672247344
         Moment(5,0): 3683562.8217169433
         Moment(6,0): 1.1076854545500715E8
         Moment(0,mean()): 1.0
         Moment(1,mean()): -2.0806734113421045E-14
         Moment(2,mean()): 45.172122057305664
         Moment(3,mean()): 270.92018671421
         Moment(4,mean()): 8553.8664869067
         Moment(5,mean()): 153357.41712233616
         Moment(6,mean()): 4273757.570142922
         25%, 50% and 75% Quantiles: 10.030074811938091, 13.977982089912224,
         18.86124362967137
         quantileInverse(mean): 0.559163335012079
         Distinct elements & frequencies not printed (too many).
 
 

Parameters:
vector - the vector to analyze.
Returns:
a bin holding the statistics measures of the vector.
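
A few of the printed measures can be related to each other with a short plain-Java sketch. The names below are hypothetical, and the formulas are inferred from the example output rather than taken from the library source: RMS matches sqrt(SumOfSquares/Size), Variance matches the n-1 divisor (it is slightly larger than Moment(2,mean())), and Standard error matches sqrt(Variance/Size).

```java
/** Hypothetical sketch of a few bin measures, inferred from the example output above. */
final class BinMeasuresSketch {

    static double mean(double[] v) {
        double sum = 0.0;
        for (double x : v) sum += x;
        return sum / v.length;
    }

    /** Root mean square: sqrt(sumOfSquares / size). */
    static double rms(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) sumOfSquares += x * x;
        return Math.sqrt(sumOfSquares / v.length);
    }

    /** Sample variance with divisor n - 1 (assumption based on the printed values). */
    static double sampleVariance(double[] v) {
        double m = mean(v);
        double sum = 0.0;
        for (double x : v) sum += (x - m) * (x - m);
        return sum / (v.length - 1);
    }

    /** Standard error of the mean: sqrt(variance / size). */
    static double standardError(double[] v) {
        return Math.sqrt(sampleVariance(v) / v.length);
    }

    public static void main(String[] args) {
        double[] v = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("Mean: " + mean(v));
        System.out.println("RMS: " + rms(v));
        System.out.println("Variance: " + sampleVariance(v));
        System.out.println("Standard error: " + standardError(v));
    }
}
```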

correlation

public static DoubleMatrix2D correlation(DoubleMatrix2D covariance)
Modifies the given covariance matrix to be a correlation matrix (in-place). The correlation matrix is a square, symmetric matrix of correlation coefficients. The rows and the columns represent the variables; each cell holds the correlation coefficient of the corresponding pair of variables. The diagonal cells equal 1, because the correlation of a variable with itself is 1. The correlation of two column vectors x and y is given by corr(x,y) = cov(x,y) / (stdDev(x)*stdDev(y)) (Pearson's correlation coefficient). A correlation coefficient ranges from -1 (a perfect negative relationship) to +1 (a perfect positive relationship). Compares two column vectors at a time. Use dice views to compare two row vectors at a time.

Parameters:
covariance - a covariance matrix, as, for example, returned by method covariance(DoubleMatrix2D).
Returns:
the modified covariance, now correlation matrix (for convenience only).
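
The in-place conversion can be sketched in plain Java as follows. This is only an illustration of the formula corr(x,y) = cov(x,y) / (stdDev(x)*stdDev(y)); CorrelationSketch is a hypothetical name, and the sketch works on a double[][] rather than a DoubleMatrix2D.

```java
import java.util.Arrays;

/** Hypothetical sketch: turn a covariance matrix into a correlation matrix in place. */
final class CorrelationSketch {

    static double[][] correlation(double[][] covariance) {
        int n = covariance.length;
        // Standard deviations come from the diagonal (the variances);
        // read them all before any cell is overwritten.
        double[] stdDev = new double[n];
        for (int i = 0; i < n; i++) stdDev[i] = Math.sqrt(covariance[i][i]);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                covariance[i][j] /= stdDev[i] * stdDev[j];
            }
        }
        return covariance; // same array, returned for convenience
    }

    public static void main(String[] args) {
        // covariance(A) from the example in the class description.
        double[][] cov = {{1.25, -3.5, -4.5}, {-3.5, 29, 39}, {-4.5, 39, 52.5}};
        System.out.println(Arrays.deepToString(correlation(cov)));
    }
}
```

Applied to covariance(A) from the class description, the off-diagonal values come out at about -0.581318, -0.555492 and 0.999507, in line with the example there.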

covariance

public static DoubleMatrix2D covariance(DoubleMatrix2D matrix)
Constructs and returns the covariance matrix of the given matrix. The covariance matrix is a square, symmetric matrix of covariance coefficients. The rows and the columns represent the variables; each cell holds the covariance of the corresponding pair of variables. The diagonal cells (i.e. the covariance between a variable and itself) equal the variances. The covariance of two column vectors x and y is given by cov(x,y) = (1/n) * Sum((x[i]-mean(x)) * (y[i]-mean(y))). Compares two column vectors at a time. Use dice views to compare two row vectors at a time.

Parameters:
matrix - any matrix; a column holds the values of a given variable.
Returns:
the covariance matrix (n x n, n=matrix.columns).
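
As a worked check of the formula cov(x,y) = (1/n) * Sum((x[i]-mean(x)) * (y[i]-mean(y))), the following plain-Java sketch recomputes covariance(A) for the example matrix in the class description. CovarianceSketch is a hypothetical name and the code operates on a double[][]; it is not the library's implementation.

```java
import java.util.Arrays;

/** Hypothetical sketch of the population covariance (divisor n) between columns. */
final class CovarianceSketch {

    static double[][] covariance(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;

        // Column means.
        double[] mean = new double[cols];
        for (double[] row : matrix) {
            for (int c = 0; c < cols; c++) mean[c] += row[c];
        }
        for (int c = 0; c < cols; c++) mean[c] /= rows;

        // The matrix is symmetric, so compute the upper triangle and mirror it.
        double[][] cov = new double[cols][cols];
        for (int i = 0; i < cols; i++) {
            for (int j = i; j < cols; j++) {
                double sum = 0.0;
                for (int r = 0; r < rows; r++) {
                    sum += (matrix[r][i] - mean[i]) * (matrix[r][j] - mean[j]);
                }
                cov[i][j] = cov[j][i] = sum / rows;
            }
        }
        return cov;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {2, 4, 6}, {3, 6, 9}, {4, -8, -10}};
        // prints [[1.25, -3.5, -4.5], [-3.5, 29.0, 39.0], [-4.5, 39.0, 52.5]]
        System.out.println(Arrays.deepToString(covariance(a)));
    }
}
```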

cube

public static DoubleIHistogram2D cube(DoubleMatrix1D x,
                                      DoubleMatrix1D y,
                                      DoubleMatrix1D weights)
2-d OLAP cube operator; Fills all cells of the given vectors into a newly built histogram. If you use hep.aida.ref.Converter.toString(histo) on the result, the OLAP cube of x-"column" vs. y-"column", summing the weights "column", will be printed. For example, aggregate sales by product by region.

Computes the distinct values of x and y, yielding histogram axes that capture one distinct value per bin. Then fills the histogram.

Example output:

         Cube:
            Entries=5000, ExtraEntries=0
            MeanX=4.9838, RmsX=NaN
            MeanY=2.5304, RmsY=NaN
            xAxis: Min=0, Max=10, Bins=11
            yAxis: Min=0, Max=5, Bins=6
         Heights:
               | X
               | 0   1   2   3   4   5   6   7   8   9   10  | Sum 
         ----------------------------------------------------------
         Y 5   |  30  53  51  52  57  39  65  61  55  49  22 |  534
           4   |  43 106 112  96  92  94 107  98  98 110  47 | 1003
           3   |  39 134  87  93 102 103 110  90 114  98  51 | 1021
           2   |  44  81 113  96 101  86 109  83 111  93  42 |  959
           1   |  54  94 103  99 115  92  98  97 103  90  44 |  989
           0   |  24  54  52  44  42  56  46  47  56  53  20 |  494
         ----------------------------------------------------------
           Sum | 234 522 518 480 509 470 535 476 537 493 226 |
 
 

Returns:
the histogram containing the cube.
Throws:
IllegalArgumentException - if x.size() != y.size() || y.size() != weights.size().
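
The two steps described above (collect the distinct values of x and y as axes, then sum the weights per cell) can be sketched in plain Java. CubeSketch is a hypothetical name, and the result is a plain double[][] of heights indexed by [x bin][y bin] instead of a DoubleIHistogram2D.

```java
import java.util.Arrays;

/** Hypothetical sketch of a 2-d cube: one bin per distinct value, weights summed per cell. */
final class CubeSketch {

    /** Sorted distinct values of v; these play the role of the histogram axis. */
    static double[] distinctSorted(double[] v) {
        return Arrays.stream(v).distinct().sorted().toArray();
    }

    static double[][] cube(double[] x, double[] y, double[] weights) {
        if (x.length != y.length || y.length != weights.length) {
            throw new IllegalArgumentException("vectors must have the same size");
        }
        double[] xAxis = distinctSorted(x);
        double[] yAxis = distinctSorted(y);
        double[][] heights = new double[xAxis.length][yAxis.length];
        for (int i = 0; i < x.length; i++) {
            // Every value is present in its axis, so binarySearch returns its bin index.
            int xi = Arrays.binarySearch(xAxis, x[i]);
            int yi = Arrays.binarySearch(yAxis, y[i]);
            heights[xi][yi] += weights[i];
        }
        return heights;
    }

    public static void main(String[] args) {
        double[] product = {0, 0, 1, 1, 0};
        double[] region = {0, 1, 0, 1, 0};
        double[] sales = {10, 20, 30, 40, 5};
        // prints [[15.0, 20.0], [30.0, 40.0]]
        System.out.println(Arrays.deepToString(cube(product, region, sales)));
    }
}
```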

cube

public static DoubleIHistogram3D cube(DoubleMatrix1D x,
                                      DoubleMatrix1D y,
                                      DoubleMatrix1D z,
                                      DoubleMatrix1D weights)
3-d OLAP cube operator; Fills all cells of the given vectors into a newly built histogram. If you use hep.aida.ref.Converter.toString(histo) on the result, the OLAP cube of x-"column" vs. y-"column" vs. z-"column", summing the weights "column", will be printed. For example, aggregate sales by product by region by time.

Computes the distinct values of x and y and z, yielding histogram axes that capture one distinct value per bin. Then fills the histogram.

Returns:
the histogram containing the cube.
Throws:
IllegalArgumentException - if x.size() != y.size() || x.size() != z.size() || x.size() != weights.size().

demo1

public static void demo1()
Demonstrates usage of this class.


demo2

public static void demo2(int rows,
                         int columns,
                         boolean print)
Demonstrates usage of this class.


demo3

public static void demo3(DoubleStatistic.VectorVectorFunction norm)
Demonstrates usage of this class.


distance

public static DoubleMatrix2D distance(DoubleMatrix2D matrix,
                                      DoubleStatistic.VectorVectorFunction distanceFunction)
Constructs and returns the distance matrix of the given matrix. The distance matrix is a square, symmetric matrix consisting of nothing but distance coefficients. The rows and the columns represent the variables, the cells represent distance coefficients. The diagonal cells (i.e. the distance between a variable and itself) will be zero. Compares two column vectors at a time. Use dice views to compare two row vectors at a time.

Parameters:
matrix - any matrix; a column holds the values of a given variable (vector).
distanceFunction - the distance function to apply (EUCLID, CANBERRA, ..., or any user-defined distance function operating on two vectors).
Returns:
the distance matrix (n x n, n=matrix.columns).
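
A minimal plain-Java sketch of the column-by-column comparison follows. It assumes a symmetric distance function, so only the upper triangle is computed and mirrored, and it leaves the diagonal at zero. DistanceMatrixSketch is a hypothetical name, and java.util.function.ToDoubleBiFunction stands in for DoubleStatistic.VectorVectorFunction.

```java
import java.util.Arrays;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch of a distance matrix between the columns of a double[][]. */
final class DistanceMatrixSketch {

    /** Euclidean distance, Sqrt(Sum( (x[i]-y[i])^2 )). */
    static double euclid(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static double[] column(double[][] matrix, int c) {
        double[] col = new double[matrix.length];
        for (int r = 0; r < matrix.length; r++) col[r] = matrix[r][c];
        return col;
    }

    static double[][] distance(double[][] matrix,
                               ToDoubleBiFunction<double[], double[]> distanceFunction) {
        int cols = matrix[0].length;
        double[][] result = new double[cols][cols]; // diagonal stays 0
        for (int i = 0; i < cols; i++) {
            for (int j = i + 1; j < cols; j++) {
                double d = distanceFunction.applyAsDouble(column(matrix, i), column(matrix, j));
                result[i][j] = d;
                result[j][i] = d; // assumes a symmetric distance function
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {2, 4, 6}, {3, 6, 9}, {4, -8, -10}};
        System.out.println(Arrays.deepToString(distance(a, DistanceMatrixSketch::euclid)));
    }
}
```

For the example matrix A in the class description, the off-diagonal entries are sqrt(158), sqrt(252) and sqrt(18), which agree with the distance(A,EUCLID) values shown there.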

histogram

public static DoubleIHistogram1D histogram(DoubleIHistogram1D histo,
                                           DoubleMatrix1D vector)
Fills all cells of the given vector into the given histogram.

Returns:
histo (for convenience only).

histogram

public static DoubleIHistogram1D histogram(DoubleIHistogram1D histo,
                                           DoubleMatrix2D matrix)
Fills all cells of the given matrix into the given histogram.

Returns:
histo (for convenience only).

histogram

public static DoubleIHistogram1D[][] histogram(DoubleIHistogram1D[][] histo,
                                               DoubleMatrix2D matrix,
                                               int m,
                                               int n)
Splits the given matrix into m x n pieces and computes 1D histogram of each piece.

Returns:
histo (for convenience only).

histogram

public static DoubleIHistogram2D histogram(DoubleIHistogram2D histo,
                                           DoubleMatrix1D x,
                                           DoubleMatrix1D y)
Fills all cells of the given vectors into the given histogram.

Returns:
histo (for convenience only).
Throws:
IllegalArgumentException - if x.size() != y.size().

histogram

public static DoubleIHistogram2D histogram(DoubleIHistogram2D histo,
                                           DoubleMatrix1D x,
                                           DoubleMatrix1D y,
                                           DoubleMatrix1D weights)
Fills all cells of the given vectors into the given histogram.

Returns:
histo (for convenience only).
Throws:
IllegalArgumentException - if x.size() != y.size() || y.size() != weights.size().

histogram

public static DoubleIHistogram3D histogram(DoubleIHistogram3D histo,
                                           DoubleMatrix1D x,
                                           DoubleMatrix1D y,
                                           DoubleMatrix1D z,
                                           DoubleMatrix1D weights)
Fills all cells of the given vectors into the given histogram.

Returns:
histo (for convenience only).
Throws:
IllegalArgumentException - if x.size() != y.size() || x.size() != z.size() || x.size() != weights.size().

main

public static void main(String[] args)
Benchmarks covariance computation.


viewSample

public static DoubleMatrix1D viewSample(DoubleMatrix1D matrix,
                                        double fraction,
                                        DoubleRandomEngine randomGenerator)
Constructs and returns a sampling view with a size of round(matrix.size() * fraction). Samples "without replacement" from the uniform distribution.

Parameters:
matrix - any matrix.
fraction - the fraction (between 0 and 1) of cells to be included in the view.
randomGenerator - a uniform random number generator; set this parameter to null to use a default generator seeded with the current time.
Returns:
the sampling view.
Throws:
IllegalArgumentException - if ! (0 <= fraction <= 1).
See Also:
DoubleRandomSampler
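
Sampling "without replacement" with a fixed result size can be sketched in plain Java using selection sampling, which visits the indexes in order and therefore keeps the sampled cells in their original order. This is an illustration under assumptions, not necessarily the algorithm the library uses: SamplingSketch is a hypothetical name, java.util.Random stands in for DoubleRandomEngine, and the method returns a copy rather than a view.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of sampling round(size * fraction) cells without replacement. */
final class SamplingSketch {

    /** Picks exactly k of the indexes 0..n-1, in increasing order, each subset equally likely. */
    static int[] sampleIndexes(int n, int k, Random random) {
        int[] chosen = new int[k];
        int picked = 0;
        for (int i = 0; i < n && picked < k; i++) {
            // Select index i with probability (k - picked) / (n - i).
            if (random.nextDouble() * (n - i) < (k - picked)) {
                chosen[picked++] = i;
            }
        }
        return chosen;
    }

    static double[] viewSample(double[] vector, double fraction, Random random) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0,1]: " + fraction);
        }
        int k = (int) Math.round(vector.length * fraction);
        int[] indexes = sampleIndexes(vector.length, k, random);
        double[] sample = new double[k];
        for (int i = 0; i < k; i++) sample[i] = vector[indexes[i]];
        return sample;
    }

    public static void main(String[] args) {
        double[] v = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        // Three distinct cells of v, in their original order; the choice depends on the seed.
        System.out.println(Arrays.toString(viewSample(v, 0.3, new Random())));
    }
}
```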

viewSample

public static DoubleMatrix2D viewSample(DoubleMatrix2D matrix,
                                        double rowFraction,
                                        double columnFraction,
                                        DoubleRandomEngine randomGenerator)
Constructs and returns a sampling view with round(matrix.rows() * rowFraction) rows and round(matrix.columns() * columnFraction) columns. Samples "without replacement". Rows and columns are randomly chosen from the uniform distribution. Examples:

matrix (10 x 10 matrix):
 1  2  3  4  5  6  7  8  9  10
11 12 13 14 15 16 17 18 19  20
21 22 23 24 25 26 27 28 29  30
31 32 33 34 35 36 37 38 39  40
41 42 43 44 45 46 47 48 49  50
51 52 53 54 55 56 57 58 59  60
61 62 63 64 65 66 67 68 69  70
71 72 73 74 75 76 77 78 79  80
81 82 83 84 85 86 87 88 89  90
91 92 93 94 95 96 97 98 99 100

rowFraction=0.2, columnFraction=0.2 (2 x 2 matrix):
43 50
53 60

rowFraction=0.2, columnFraction=1.0 (2 x 10 matrix):
41 42 43 44 45 46 47 48 49  50
91 92 93 94 95 96 97 98 99 100

rowFraction=1.0, columnFraction=0.2 (10 x 2 matrix):
 4  8
14 18
24 28
34 38
44 48
54 58
64 68
74 78
84 88
94 98

Parameters:
matrix - any matrix.
rowFraction - the fraction (between 0 and 1) of rows to be included in the view.
columnFraction - the fraction (between 0 and 1) of columns to be included in the view.
randomGenerator - a uniform random number generator; set this parameter to null to use a default generator seeded with the current time.
Returns:
the sampling view.
Throws:
IllegalArgumentException - if ! (0 <= rowFraction <= 1 && 0 <= columnFraction <= 1).
See Also:
DoubleRandomSampler

viewSample

public static DoubleMatrix3D viewSample(DoubleMatrix3D matrix,
                                        double sliceFraction,
                                        double rowFraction,
                                        double columnFraction,
                                        DoubleRandomEngine randomGenerator)
Constructs and returns a sampling view with round(matrix.slices() * sliceFraction) slices and round(matrix.rows() * rowFraction) rows and round(matrix.columns() * columnFraction) columns. Samples "without replacement". Slices, rows and columns are randomly chosen from the uniform distribution.

Parameters:
matrix - any matrix.
sliceFraction - the fraction (between 0 and 1) of slices to be included in the view.
rowFraction - the fraction (between 0 and 1) of rows to be included in the view.
columnFraction - the fraction (between 0 and 1) of columns to be included in the view.
randomGenerator - a uniform random number generator; set this parameter to null to use a default generator seeded with the current time.
Returns:
the sampling view.
Throws:
IllegalArgumentException - if ! (0 <= sliceFraction <= 1 && 0 <= rowFraction <= 1 && 0 <= columnFraction <= 1).
See Also:
DoubleRandomSampler
