DynamicFloatBin1D (Parallel Colt 0.7.2)

hep.aida.tfloat.bin
Class DynamicFloatBin1D

java.lang.Object
  cern.colt.PersistentObject
      hep.aida.tfloat.bin.AbstractFloatBin
          hep.aida.tfloat.bin.AbstractFloatBin1D
              hep.aida.tfloat.bin.StaticFloatBin1D
                  hep.aida.tfloat.bin.MightyStaticFloatBin1D
                      hep.aida.tfloat.bin.QuantileFloatBin1D
                          hep.aida.tfloat.bin.DynamicFloatBin1D

All Implemented Interfaces: FloatBufferConsumer, Serializable, Cloneable

public class DynamicFloatBin1D
extends QuantileFloatBin1D

1-dimensional rebinnable bin holding float elements; efficiently computes advanced statistics of data sequences. Technically speaking, a multiset (or bag) with efficient statistics operations defined upon it. First see the package summary and javadoc tree view to get the broad picture.

The data filled into a DynamicFloatBin1D is internally preserved in the bin. As a consequence, this bin can compute more than just basic statistics. On the other hand, if you add huge amounts of elements, you may run out of memory (each element is a float and takes 4 bytes). If these drawbacks matter, consider using StaticFloatBin1D, which overcomes them at the expense of limited functionality.

This class is fully thread safe (all public methods are synchronized). Thus, you can have one or more threads adding to the bin as well as one or more threads reading and viewing the statistics of the bin while it is filled. For high performance, add data in large chunks (buffers) via method addAllOf rather than piecewise via method add.

If your favourite statistics measure is not directly provided by this class, check out FloatDescriptive in combination with methods elements() and sortedElements().

Implementation: Lazy evaluation, caching, incremental maintenance.

Version: 0.9, 03-Jul-99
Author: wolfgang.hoschek@cern.ch
See Also: FloatDescriptive, Serialized Form

Field Summary

Fields inherited from class cern.colt.PersistentObject
`serialVersionUID`

Constructor Summary
`DynamicFloatBin1D()` Constructs and returns an empty bin; implicitly calls `setFixedOrder(false)`.

Method Summary
`void`	`add(float element)` Adds the specified element to the receiver.
`void`	`addAllOfFromTo(FloatArrayList list, int from, int to)` Adds the part of the specified list between indexes `from` (inclusive) and `to` (inclusive) to the receiver.
`float`	`aggregate(FloatFloatFunction aggr, FloatFunction f)` Applies a function to each element and aggregates the results.
`void`	`clear()` Removes all elements from the receiver.
`Object`	`clone()` Returns a deep copy of the receiver.
`float`	`correlation(DynamicFloatBin1D other)` Returns the correlation of two bins, which is `corr(x,y) = covariance(x,y) / (stdDev(x)*stdDev(y))` (Pearson's correlation coefficient).
`float`	`covariance(DynamicFloatBin1D other)` Returns the covariance of two bins, which is `cov(x,y) = (1/size()) * Sum((x[i]-mean(x)) * (y[i]-mean(y)))`.
`FloatArrayList`	`elements()` Returns a copy of the currently stored elements.
`boolean`	`equals(Object object)` Returns whether two bins are equal.
`void`	`frequencies(FloatArrayList distinctElements, IntArrayList frequencies)` Computes the frequency (number of occurrences, count) of each distinct element.
`int`	`getMaxOrderForSumOfPowers()` Returns `Integer.MAX_VALUE`, the maximum order `k` for which sums of powers are retrievable.
`int`	`getMinOrderForSumOfPowers()` Returns `Integer.MIN_VALUE`, the minimum order `k` for which sums of powers are retrievable.
`boolean`	`isRebinnable()` Returns `true`.
`float`	`max()` Returns the maximum.
`float`	`min()` Returns the minimum.
`float`	`moment(int k, float c)` Returns the moment of `k`-th order with value `c`, which is `Sum( (x[i]-c)^k ) / size()`.
`float`	`quantile(float phi)` Returns the exact `phi-`quantile; that is, the smallest contained element `elem` for which holds that `phi` percent of elements are less than `elem`.
`float`	`quantileInverse(float element)` Returns exactly how many percent of the elements contained in the receiver are `<= element`.
`FloatArrayList`	`quantiles(FloatArrayList percentages)` Returns the exact quantiles of the specified percentages.
`boolean`	`removeAllOf(FloatArrayList list)` Removes from the receiver all elements that are contained in the specified list.
`void`	`sample(int n, boolean withReplacement, FloatRandomEngine randomGenerator, FloatBuffer buffer)` Uniformly samples (chooses) `n` random elements with or without replacement from the contained elements and adds them to the given buffer.
`DynamicFloatBin1D`	`sampleBootstrap(DynamicFloatBin1D other, int resamples, FloatRandomEngine randomGenerator, FloatBinBinFunction1D function)` Generic bootstrap resampling.
`void`	`setFixedOrder(boolean fixedOrder)` Determines whether the receiver's internally preserved elements may be reordered or not.
`int`	`size()` Returns the number of elements contained in the receiver.
`FloatArrayList`	`sortedElements()` Returns a copy of the currently stored elements, sorted ascending.
`void`	`standardize(float mean, float standardDeviation)` Modifies the receiver to be standardized.
`float`	`sum()` Returns the sum of all elements, which is `Sum( x[i] )`.
`float`	`sumOfInversions()` Returns the sum of inversions, which is `Sum( 1 / x[i] )`.
`float`	`sumOfLogarithms()` Returns the sum of logarithms, which is `Sum( Log(x[i]) )`.
`float`	`sumOfPowers(int k)` Returns the `k-th` order sum of powers, which is `Sum( x[i]^k )`.
`float`	`sumOfSquares()` Returns the sum of squares, which is `Sum( x[i] * x[i] )`.
`String`	`toString()` Returns a String representation of the receiver.
`void`	`trim(int s, int l)` Removes the `s` smallest and `l` largest elements from the receiver.
`float`	`trimmedMean(int s, int l)` Returns the trimmed mean.
`void`	`trimToSize()` Trims the capacity of the receiver to be the receiver's current size.

Methods inherited from class hep.aida.tfloat.bin.QuantileFloatBin1D
`compareWith, median, sizeOfRange, splitApproximately, splitApproximately`

Methods inherited from class hep.aida.tfloat.bin.MightyStaticFloatBin1D
`geometricMean, harmonicMean, hasSumOfInversions, hasSumOfLogarithms, hasSumOfPowers, kurtosis, product, skew`

Methods inherited from class hep.aida.tfloat.bin.AbstractFloatBin1D
`addAllOf, buffered, mean, rms, standardDeviation, standardError, variance`

Methods inherited from class hep.aida.tfloat.bin.AbstractFloatBin
`center, center, error, error, offset, offset, value, value`

Methods inherited from class java.lang.Object
`getClass, hashCode, notify, notifyAll, wait, wait, wait`

Constructor Detail

DynamicFloatBin1D

public DynamicFloatBin1D()

Constructs and returns an empty bin; implicitly calls setFixedOrder(false).

Method Detail

add

public void add(float element)

Adds the specified element to the receiver.

Overrides: add in class StaticFloatBin1D

Parameters: element - element to be appended.

addAllOfFromTo

public void addAllOfFromTo(FloatArrayList list,
                           int from,
                           int to)

Adds the part of the specified list between indexes from (inclusive) and to (inclusive) to the receiver.

Overrides: addAllOfFromTo in class QuantileFloatBin1D

Parameters:
    list - the list of which elements shall be added.
    from - the index of the first element to be added (inclusive).
    to - the index of the last element to be added (inclusive).
Throws: IndexOutOfBoundsException - if list.size()>0 && (from<0 || from>to || to>=list.size()).

aggregate

public float aggregate(FloatFloatFunction aggr,
                       FloatFunction f)

Applies a function to each element and aggregates the results. Returns a value v such that v==a(size()) where a(i) == aggr( a(i-1), f(x(i-1)) ) and terminators are a(1) == f(x(0)), a(0)==Float.NaN.

Example:

         cern.jet.math.tfloat.FloatFunctions F = cern.jet.math.tfloat.FloatFunctions.functions;
         bin = 0 1 2 3 
 
         // Sum( x[i]*x[i] ) 
         bin.aggregate(F.plus,F.square);
         --> 14

For further examples, see the package doc.

Parameters:
    aggr - an aggregation function taking as first argument the current aggregation and as second argument the transformed current element.
    f - a function transforming the current element.
Returns: the aggregated measure.
See Also: FloatFunctions
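
The recurrence above can be read as a left fold over the transformed elements. The following is a minimal plain-Java sketch of that idea, not the library's implementation; the interfaces FloatBinaryOp and FloatUnaryOp are hypothetical stand-ins for FloatFloatFunction and FloatFunction, and the data is held in a plain array rather than a bin.

```java
final class AggregateSketch {
    /** Hypothetical stand-in for the library's FloatFloatFunction. */
    interface FloatBinaryOp { float apply(float a, float b); }

    /** Hypothetical stand-in for the library's FloatFunction. */
    interface FloatUnaryOp { float apply(float x); }

    /** Folds f(x[0]), f(x[1]), ... with aggr; returns NaN for empty input. */
    static float aggregate(float[] x, FloatBinaryOp aggr, FloatUnaryOp f) {
        if (x.length == 0) return Float.NaN;   // terminator a(0) == Float.NaN
        float a = f.apply(x[0]);               // terminator a(1) == f(x(0))
        for (int i = 1; i < x.length; i++) {
            a = aggr.apply(a, f.apply(x[i]));  // combine running value with next transformed element
        }
        return a;
    }

    public static void main(String[] args) {
        float[] bin = {0f, 1f, 2f, 3f};
        // Sum( x[i]*x[i] ), the same measure as in the example above; prints 14.0
        System.out.println(aggregate(bin, (a, b) -> a + b, v -> v * v));
    }
}
```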

clear

public void clear()

Removes all elements from the receiver. The receiver will be empty after this call returns.

Overrides: clear in class QuantileFloatBin1D

clone

public Object clone()

Returns a deep copy of the receiver.

Overrides: clone in class QuantileFloatBin1D

Returns: a deep copy of the receiver.

correlation

public float correlation(DynamicFloatBin1D other)

Returns the correlation of two bins, which is corr(x,y) = covariance(x,y) / (stdDev(x)*stdDev(y)) (Pearson's correlation coefficient). A correlation coefficient varies from -1 (a perfect negative relationship) to +1 (a perfect positive relationship). See the math definition and another def.

Parameters: other - the bin to compare with.
Returns: the correlation.
Throws: IllegalArgumentException - if size() != other.size().

covariance

public float covariance(DynamicFloatBin1D other)

Returns the covariance of two bins, which is cov(x,y) = (1/size()) * Sum((x[i]-mean(x)) * (y[i]-mean(y))). See the math definition.

Parameters: other - the bin to compare with.
Returns: the covariance.
Throws: IllegalArgumentException - if size() != other.size().
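
As an illustration of the two formulas, here is a plain-Java sketch over float arrays. It is not the library's code. It assumes the population standard deviation (division by size()), so that the coefficient stays within [-1, +1]; the library's standardDeviation() may normalize differently.

```java
final class CorrelationSketch {
    static float mean(float[] x) {
        double s = 0;
        for (float v : x) s += v;
        return (float) (s / x.length);
    }

    /** cov(x,y) = (1/n) * Sum((x[i]-mean(x)) * (y[i]-mean(y))). */
    static float covariance(float[] x, float[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("sizes differ");
        double mx = mean(x), my = mean(y), s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - mx) * (y[i] - my);
        return (float) (s / x.length);
    }

    /** corr(x,y) = cov(x,y) / (stdDev(x)*stdDev(y)), with stdDev(x) = sqrt(cov(x,x)) (assumption). */
    static float correlation(float[] x, float[] y) {
        double sx = Math.sqrt(covariance(x, x));
        double sy = Math.sqrt(covariance(y, y));
        return (float) (covariance(x, y) / (sx * sy));
    }
}
```

For x = (1,2,3,4) and y = (2,4,6,8) the covariance is 2.5 and the correlation is 1, since y is an exact positive linear function of x.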

elements

public FloatArrayList elements()

Returns a copy of the currently stored elements. Concerning the order in which elements are returned, see setFixedOrder(boolean).

Returns: a copy of the currently stored elements.

equals

public boolean equals(Object object)

Returns whether two bins are equal. They are equal if the other object is of the same class or a subclass of this class and both have the same size, minimum, maximum, sum and sumOfSquares and have the same elements, order being irrelevant (multiset equality).

Definition of Equality for multisets: A,B are equal <=> A is a superset of B and B is a superset of A. (Elements must occur the same number of times, order is irrelevant.)

Overrides: equals in class AbstractFloatBin1D
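
Multiset equality can be checked by comparing sorted copies. The sketch below works on plain float arrays and only illustrates the definition; how the library itself performs the comparison is not stated here.

```java
import java.util.Arrays;

final class MultisetEqualsSketch {
    /** True if a and b contain the same elements the same number of times, in any order. */
    static boolean multisetEquals(float[] a, float[] b) {
        if (a.length != b.length) return false;
        float[] sa = a.clone();
        float[] sb = b.clone();
        Arrays.sort(sa);   // sorting removes the influence of element order
        Arrays.sort(sb);
        return Arrays.equals(sa, sb);
    }
}
```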

frequencies

public void frequencies(FloatArrayList distinctElements,
                        IntArrayList frequencies)

Computes the frequency (number of occurrences, count) of each distinct element. After this call returns, both distinctElements and frequencies have a new size (which is equal for both), which is the number of distinct elements currently contained.

Distinct elements are filled into distinctElements, starting at index 0. The frequency of each distinct element is filled into frequencies, starting at index 0. Further, both distinctElements and frequencies are sorted ascending by "element" (in sync, of course). As a result, the smallest distinct element (and its frequency) can be found at index 0, the second smallest distinct element (and its frequency) at index 1, ..., the largest distinct element (and its frequency) at index distinctElements.size()-1.

Example:
elements = (8,7,6,6,7) --> distinctElements = (6,7,8), frequencies = (2,2,1)

Parameters:
    distinctElements - a list to be filled with the distinct elements; can have any size.
    frequencies - a list to be filled with the frequencies; can have any size; set this parameter to null to ignore it.
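
The counting scheme described above can be sketched in plain Java as a single pass over a sorted copy. This sketch uses java.util lists instead of FloatArrayList and IntArrayList and is an illustration only, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FrequenciesSketch {
    /** Fills distinct (ascending) and, unless null, counts with the frequency of each distinct element. */
    static void frequencies(float[] elements, List<Float> distinct, List<Integer> counts) {
        distinct.clear();
        if (counts != null) counts.clear();
        float[] sorted = elements.clone();
        Arrays.sort(sorted);
        int i = 0;
        while (i < sorted.length) {
            int j = i;
            // Advance j past the run of elements equal to sorted[i].
            while (j < sorted.length && Float.compare(sorted[j], sorted[i]) == 0) j++;
            distinct.add(sorted[i]);
            if (counts != null) counts.add(j - i);
            i = j;
        }
    }

    public static void main(String[] args) {
        List<Float> distinct = new ArrayList<>();
        List<Integer> counts = new ArrayList<>();
        frequencies(new float[] {8f, 7f, 6f, 6f, 7f}, distinct, counts);
        System.out.println(distinct + " " + counts);   // prints [6.0, 7.0, 8.0] [2, 2, 1]
    }
}
```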

getMaxOrderForSumOfPowers

public int getMaxOrderForSumOfPowers()

Returns Integer.MAX_VALUE, the maximum order k for which sums of powers are retrievable.

Overrides: getMaxOrderForSumOfPowers in class MightyStaticFloatBin1D

See Also: MightyStaticFloatBin1D.hasSumOfPowers(int), sumOfPowers(int)

getMinOrderForSumOfPowers

public int getMinOrderForSumOfPowers()

Returns Integer.MIN_VALUE, the minimum order k for which sums of powers are retrievable.

Overrides: getMinOrderForSumOfPowers in class MightyStaticFloatBin1D

See Also: MightyStaticFloatBin1D.hasSumOfPowers(int), sumOfPowers(int)

isRebinnable

public boolean isRebinnable()

Returns true. Returns whether a client can obtain all elements added to the receiver. In other words, tells whether the receiver internally preserves all added elements. If the receiver is rebinnable, the elements can be obtained via elements() methods.

Overrides: isRebinnable in class StaticFloatBin1D

max

public float max()

Returns the maximum.

Overrides: max in class StaticFloatBin1D

min

public float min()

Returns the minimum.

Overrides: min in class StaticFloatBin1D

moment

public float moment(int k,
                    float c)

Returns the moment of k-th order with value c, which is Sum( (x[i]-c)^k ) / size().

Overrides: moment in class MightyStaticFloatBin1D

Parameters:
    k - the order; any number - can be less than zero, zero or greater than zero.
    c - any number.
Returns: Float.NaN if !hasSumOfPowers(k).
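
A direct plain-Java reading of the formula Sum( (x[i]-c)^k ) / size() is sketched below. It operates on a float array and accumulates in double precision, which is an assumption of this sketch rather than a statement about the library.

```java
final class MomentSketch {
    /** Returns Sum( (x[i]-c)^k ) / n; k may be negative, zero or positive. */
    static float moment(float[] x, int k, float c) {
        double s = 0;
        for (float v : x) s += Math.pow(v - c, k);
        return (float) (s / x.length);
    }
}
```

With c set to the mean and k = 2 this yields the population variance; for x = (1,2,3,4) and c = 2.5 the result is 1.25.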

quantile

public float quantile(float phi)

Returns the exact phi-quantile; that is, the smallest contained element elem for which holds that phi percent of elements are less than elem.

Overrides: quantile in class QuantileFloatBin1D

Parameters: phi - must satisfy 0 < phi < 1.
Returns: the phi quantile element.

quantileInverse

public float quantileInverse(float element)

Returns exactly how many percent of the elements contained in the receiver are <= element. Does linear interpolation if the element is not contained but lies in between two contained elements.

Overrides: quantileInverse in class QuantileFloatBin1D

Parameters: element - the element to search for.
Returns: the exact percentage phi of elements <= element (0.0 <= phi <= 1.0).
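
To make the relationship between quantile and quantileInverse concrete, here is a plain-Java sketch on float arrays. It assumes the nearest-rank convention for the quantile and omits the linear interpolation mentioned above, so its results can differ from the library's for elements that are not contained; it also assumes a non-empty array.

```java
import java.util.Arrays;

final class QuantileSketch {
    /** Nearest-rank phi-quantile (assumed convention), for 0 < phi <= 1 and non-empty x. */
    static float quantile(float[] x, float phi) {
        float[] s = x.clone();
        Arrays.sort(s);
        int rank = (int) Math.ceil(phi * s.length);   // 1-based rank into the sorted data
        return s[Math.max(rank, 1) - 1];
    }

    /** Fraction of elements that are <= element, without interpolation. */
    static float quantileInverse(float[] x, float element) {
        int count = 0;
        for (float v : x) if (v <= element) count++;
        return (float) count / x.length;
    }
}
```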

quantiles

public FloatArrayList quantiles(FloatArrayList percentages)

Returns the exact quantiles of the specified percentages.

Overrides: quantiles in class QuantileFloatBin1D

Parameters: percentages - the percentages for which quantiles are to be computed. Each percentage must be in the interval (0.0,1.0]. percentages must be sorted ascending.
Returns: the exact quantiles.

removeAllOf

public boolean removeAllOf(FloatArrayList list)

Removes from the receiver all elements that are contained in the specified list.

Parameters: list - the elements to be removed.
Returns: true if the receiver changed as a result of the call.

sample

public void sample(int n,
                   boolean withReplacement,
                   FloatRandomEngine randomGenerator,
                   FloatBuffer buffer)

Uniformly samples (chooses) n random elements with or without replacement from the contained elements and adds them to the given buffer. If the buffer is connected to a bin, the effect is that the chosen elements are added to the bin connected to the buffer. Also see buffered.

Parameters:
    n - the number of elements to choose.
    withReplacement - true samples with replacement, otherwise samples without replacement.
    randomGenerator - a random number generator. Set this parameter to null to use a default random number generator seeded with the current time.
    buffer - the buffer to which chosen elements will be added.
Throws: IllegalArgumentException - if !withReplacement && n > size().
See Also: cern.jet.random.tfloat.sampling
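
The difference between the two sampling modes can be sketched in plain Java as follows. This uses java.util.Random and a partial Fisher-Yates shuffle for the case without replacement; both are choices of this sketch, and the library's sampling classes may proceed differently (for example regarding the order of the chosen elements).

```java
import java.util.Random;

final class SampleSketch {
    /** Returns n elements chosen uniformly at random from x. */
    static float[] sample(float[] x, int n, boolean withReplacement, Random rng) {
        if (!withReplacement && n > x.length) {
            throw new IllegalArgumentException("cannot draw " + n + " distinct positions from " + x.length);
        }
        float[] out = new float[n];
        if (withReplacement) {
            // Each draw is independent; the same position may be chosen repeatedly.
            for (int i = 0; i < n; i++) out[i] = x[rng.nextInt(x.length)];
        } else {
            // Partial Fisher-Yates shuffle: each position is chosen at most once.
            float[] pool = x.clone();
            for (int i = 0; i < n; i++) {
                int j = i + rng.nextInt(pool.length - i);
                float t = pool[i]; pool[i] = pool[j]; pool[j] = t;
                out[i] = pool[i];
            }
        }
        return out;
    }
}
```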

sampleBootstrap

public DynamicFloatBin1D sampleBootstrap(DynamicFloatBin1D other,
                                         int resamples,
                                         FloatRandomEngine randomGenerator,
                                         FloatBinBinFunction1D function)

Generic bootstrap resampling. Quite optimized - Don't be afraid to try it. Executes resamples resampling steps. In each resampling step does the following:

1. Uniformly samples (chooses) size() random elements with replacement from this and fills them into an auxiliary bin b1.
2. Uniformly samples (chooses) other.size() random elements with replacement from other and fills them into another auxiliary bin b2.
3. Executes the comparison function function on both auxiliary bins (function.apply(b1,b2)) and adds the result of the function to an auxiliary bootstrap bin b3.

Finally returns the auxiliary bootstrap bin b3 from which the measure of interest can be read off.

Background:

Also see a more in-depth discussion on bootstrapping and related randomization methods. The classical statistical test for comparing the means of two samples is the t-test. Unfortunately, this test assumes that the two samples each come from a normal distribution and that these distributions have the same standard deviation. Quite often, however, data has a distribution that is non-normal in many ways. In particular, distributions are often asymmetric. For such data, the t-test may produce misleading results and should thus not be used. Sometimes asymmetric data can be transformed into normally distributed data by taking e.g. the logarithm and the t-test will then produce valid results, but this still requires postulation of a certain distribution underlying the data, which is often not warranted, because too little is known about the data composition.

Bootstrap resampling of means differences (and other differences) is a robust replacement for the t-test and does not require assumptions about the actual distribution of the data. The idea of bootstrapping is quite simple: simulation. The only assumption required is that the two samples a and b are representative of the underlying distribution with respect to the statistic that is being tested - this assumption is of course implicit in all statistical tests. We can now generate lots of further samples that correspond to the two given ones, by sampling with replacement. This process is called resampling. A resample can (and usually will) have a different mean than the original one, and by drawing hundreds or thousands of such resamples a_r from a and b_r from b we can compute the so-called bootstrap distribution of all the differences "mean of a_r minus mean of b_r". That is, a bootstrap bin filled with the differences. Now we can compute what fraction of these differences is, say, greater than zero. Let's assume we have computed 1000 resamples of both a and b and found that only 8 of the differences were greater than zero. Then 8/1000 or 0.008 is the p-value (probability) for the hypothesis that the mean of the distribution underlying a is actually larger than the mean of the distribution underlying b. From this bootstrap test, we can clearly reject the hypothesis.

Instead of using means differences, we can also use other differences, for example, the median differences.

Instead of p-values we can also read arbitrary confidence intervals from the bootstrap bin. For example, 90% of all bootstrap differences are left of the value -3.5, hence a left 90% confidence interval for the difference would be (-3.5,infinity); in other words: the difference is -3.5 or larger with probability 0.9.

Sometimes we would like to compare not only means and medians, but also the variability (spread) of two samples. The conventional method of doing this is the F-test, which compares the standard deviations. It is related to the t-test and, like the latter, assumes the two samples to come from a normal distribution. The F-test is very sensitive to data with deviations from normality. Instead we can again resort to more robust bootstrap resampling and compare a measure of spread, for example the inter-quartile range. This way we compute a bootstrap resampling of inter-quartile range differences in order to arrive at a test for inequality of variability.

Example:

         // v1,v2 - the two samples to compare against each other
         float[] v1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9,10,  21,  22,23,24,25,26,27,28,29,30,31};
         float[] v2 = {10,11,12,13,14,15,16,17,18,19,  20,  30,31,32,33,34,35,36,37,38,39};
         hep.aida.tfloat.bin.DynamicFloatBin1D X = new hep.aida.tfloat.bin.DynamicFloatBin1D();
         hep.aida.tfloat.bin.DynamicFloatBin1D Y = new hep.aida.tfloat.bin.DynamicFloatBin1D();
         X.addAllOf(new cern.colt.list.tfloat.FloatArrayList(v1));
         Y.addAllOf(new cern.colt.list.tfloat.FloatArrayList(v2));
         cern.jet.random.tfloat.engine.FloatRandomEngine random = new cern.jet.random.tfloat.engine.FloatMersenneTwister();
 
         // The three definitions of diff below are alternatives; use one at a time.
         // bootstrap resampling of differences of means:
         FloatBinBinFunction1D diff = new FloatBinBinFunction1D() {
            public float apply(DynamicFloatBin1D x, DynamicFloatBin1D y) {return x.mean() - y.mean();}
         };
 
         // bootstrap resampling of differences of medians:
         FloatBinBinFunction1D diff = new FloatBinBinFunction1D() {
            public float apply(DynamicFloatBin1D x, DynamicFloatBin1D y) {return x.median() - y.median();}
         };
 
         // bootstrap resampling of differences of inter-quartile ranges:
         FloatBinBinFunction1D diff = new FloatBinBinFunction1D() {
            public float apply(DynamicFloatBin1D x, DynamicFloatBin1D y) {return (x.quantile(0.75f)-x.quantile(0.25f)) - (y.quantile(0.75f)-y.quantile(0.25f)); }
         };
 
         DynamicFloatBin1D boot = X.sampleBootstrap(Y,1000,random,diff);
 
         cern.jet.math.tfloat.FloatFunctions F = cern.jet.math.tfloat.FloatFunctions.functions;
         System.out.println("p-value="+ (boot.aggregate(F.plus, F.greater(0)) / boot.size()));
         System.out.println("left 90% confidence interval = ("+boot.quantile(0.9f) + ",infinity)");
 
         -->
         // bootstrap resampling of differences of means:
         p-value=0.0080
         left 90% confidence interval = (-3.571428571428573,infinity)
 
         // bootstrap resampling of differences of medians:
         p-value=0.36
         left 90% confidence interval = (5.0,infinity)
 
         // bootstrap resampling of differences of inter-quartile ranges:
         p-value=0.5699
         left 90% confidence interval = (5.0,infinity)

Parameters:
    other - the other bin to compare the receiver against.
    resamples - the number of times resampling shall be done.
    randomGenerator - a random number generator. Set this parameter to null to use a default random number generator seeded with the current time.
    function - a difference function comparing two samples; takes as first argument a sample of this and as second argument a sample of other.
Returns: a bootstrap bin holding the results of function of each resampling step.
See Also: GenericPermuting.permutation(long,int)
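
The resampling loop described above can be sketched without the library. The following plain-Java version works on float arrays and java.util.Random; the Statistic interface is a hypothetical stand-in for FloatBinBinFunction1D, and the returned array plays the role of the bootstrap bin b3. It is meant to show the structure of the procedure only.

```java
import java.util.Random;

final class BootstrapSketch {
    /** Hypothetical stand-in for the library's FloatBinBinFunction1D. */
    interface Statistic { float apply(float[] a, float[] b); }

    /** Draws x.length elements from x uniformly, with replacement. */
    static float[] resample(float[] x, Random rng) {
        float[] r = new float[x.length];
        for (int i = 0; i < r.length; i++) r[i] = x[rng.nextInt(x.length)];
        return r;
    }

    static float mean(float[] x) {
        double s = 0;
        for (float v : x) s += v;
        return (float) (s / x.length);
    }

    /** Returns one value of fn per resampling step. */
    static float[] sampleBootstrap(float[] a, float[] b, int resamples, Random rng, Statistic fn) {
        float[] boot = new float[resamples];
        for (int r = 0; r < resamples; r++) {
            boot[r] = fn.apply(resample(a, rng), resample(b, rng));
        }
        return boot;
    }

    /** Fraction of bootstrap values greater than zero, read as a p-value in the text above. */
    static float fractionAboveZero(float[] boot) {
        int count = 0;
        for (float v : boot) if (v > 0f) count++;
        return (float) count / boot.length;
    }
}
```

A call such as sampleBootstrap(v1, v2, 1000, new Random(), (x, y) -> mean(x) - mean(y)) followed by fractionAboveZero corresponds to the differences-of-means case; the exact numbers depend on the random generator and seed.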

setFixedOrder

public void setFixedOrder(boolean fixedOrder)

Determines whether the receiver's internally preserved elements may be reordered or not.

fixedOrder==false allows the order in which elements are returned by method elements() to be different from the order in which elements are added.
fixedOrder==true guarantees that under all circumstances the order in which elements are returned by method elements() is identical to the order in which elements are added. However, the latter consumes twice as much memory if operations involving sorting are requested. This option is usually only required if a 2-dimensional bin, formed by two 1-dimensional bins, needs to be rebinnable.

Naturally, if fixedOrder is set to true you should not already have added elements to the receiver; it should be empty.

size

public int size()

Returns the number of elements contained in the receiver.

Overrides: size in class StaticFloatBin1D

Returns: the number of elements contained in the receiver.

sortedElements

public FloatArrayList sortedElements()

Returns a copy of the currently stored elements, sorted ascending. Concerning the memory required for operations involving sorting, see setFixedOrder(boolean).

Returns: a copy of the currently stored elements, sorted ascending.

standardize

public void standardize(float mean,
                        float standardDeviation)

Modifies the receiver to be standardized. Changes each element x[i] as follows: x[i] = (x[i]-mean)/standardDeviation.
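
As a small worked instance of the transformation x[i] = (x[i]-mean)/standardDeviation, the plain-Java sketch below standardizes a float array in place. It is an illustration on raw arrays, not the library's code.

```java
final class StandardizeSketch {
    /** Replaces each x[i] by (x[i] - mean) / standardDeviation, in place. */
    static void standardize(float[] x, float mean, float standardDeviation) {
        for (int i = 0; i < x.length; i++) {
            x[i] = (x[i] - mean) / standardDeviation;
        }
    }
}
```

For example, the data (2,4,6) standardized with mean 4 and standard deviation 2 becomes (-1,0,1).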

sum

public float sum()

Returns the sum of all elements, which is Sum( x[i] ).

Overrides: sum in class StaticFloatBin1D

sumOfInversions

public float sumOfInversions()

Returns the sum of inversions, which is Sum( 1 / x[i] ).

Overrides: sumOfInversions in class MightyStaticFloatBin1D

Returns: the sum of inversions; Float.NaN if !hasSumOfInversions().
See Also: MightyStaticFloatBin1D.hasSumOfInversions()

sumOfLogarithms

public float sumOfLogarithms()

Returns the sum of logarithms, which is Sum( Log(x[i]) ).

Overrides: sumOfLogarithms in class MightyStaticFloatBin1D

Returns: the sum of logarithms; Float.NaN if !hasSumOfLogarithms().
See Also: MightyStaticFloatBin1D.hasSumOfLogarithms()

sumOfPowers

public float sumOfPowers(int k)

Returns the k-th order sum of powers, which is Sum( x[i]^k ).

Overrides: sumOfPowers in class MightyStaticFloatBin1D

Parameters: k - the order of the powers.
Returns: the sum of powers.
See Also: MightyStaticFloatBin1D.hasSumOfPowers(int)

sumOfSquares

public float sumOfSquares()

Returns the sum of squares, which is Sum( x[i] * x[i] ).

Overrides: sumOfSquares in class StaticFloatBin1D

toString

public String toString()

Returns a String representation of the receiver.

Overrides: toString in class QuantileFloatBin1D

trim

public void trim(int s,
                 int l)

Removes the s smallest and l largest elements from the receiver. The receiver's size will be reduced by s + l elements.

Parameters:
    s - the number of smallest elements to trim away (s >= 0).
    l - the number of largest elements to trim away (l >= 0).

trimmedMean

public float trimmedMean(int s,
                         int l)

Returns the trimmed mean, that is, the mean of the data as if the s smallest and l largest elements were removed from the receiver (they are not actually removed).

Parameters:
    s - the number of smallest elements to trim away (s >= 0).
    l - the number of largest elements to trim away (l >= 0).
Returns: the trimmed mean.
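
A plain-Java sketch of the trimmed mean on a float array is shown below. It sorts a copy, so the input stays untouched, mirroring the remark that the elements are not removed. The argument check (s + l must be smaller than the number of elements) is an assumption of this sketch; the library's behaviour for such arguments is not described here.

```java
import java.util.Arrays;

final class TrimmedMeanSketch {
    /** Mean of x after ignoring the s smallest and l largest elements; x is not modified. */
    static float trimmedMean(float[] x, int s, int l) {
        if (s < 0 || l < 0 || s + l >= x.length) {
            throw new IllegalArgumentException("invalid trim counts: s=" + s + ", l=" + l);
        }
        float[] sorted = x.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = s; i < sorted.length - l; i++) sum += sorted[i];
        return (float) (sum / (sorted.length - s - l));
    }
}
```

For the data (100, 1, 2, 3, -50) with s = 1 and l = 1, the outliers -50 and 100 are ignored and the result is 2.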

trimToSize

public void trimToSize()

Trims the capacity of the receiver to be the receiver's current size. (This has nothing to do with trimming away smallest and largest elements. The method name is used to be consistent with JDK practice.)

Releases any superfluous internal memory. An application can use this operation to minimize the storage of the receiver. Does not affect functionality.

Overrides: trimToSize in class AbstractFloatBin1D