Performs chi-squared contingency table tests and goodness-of-fit tests.
If the optional argument :y is not provided then a goodness-of-fit test
is performed. In this case, the hypothesis tested is whether the
population probabilities equal those in :probs, or are all equal if
:probs is not given.
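The goodness-of-fit statistic described above can be sketched in plain
Clojure as follows. This is a minimal illustration, not Incanter's
implementation: gof-x-sq is a hypothetical helper that takes the level
counts directly and computes the sum of (observed - expected)^2 / expected,
with equal probabilities assumed when none are supplied. The degrees of
freedom for this test are the number of levels minus one.

```clojure
;; Hypothetical helper (not part of Incanter): Pearson's statistic
;; for a one-dimensional table of counts.
(defn gof-x-sq
  ([counts]
   ;; No probabilities given: assume every level is equally likely.
   (gof-x-sq counts (repeat (count counts) (/ 1.0 (count counts)))))
  ([counts probs]
   (let [n        (reduce + counts)          ; total number of observations
         expected (map #(* n %) probs)]      ; expected count per level
     (reduce + (map (fn [o e] (/ (* (- o e) (- o e)) e))
                    counts expected)))))

(gof-x-sq [1 3 3 1 1])                                ;; approx. 2.6667
(gof-x-sq [89 37 30 28 2] [0.40 0.20 0.20 0.19 0.01]) ;; approx. 5.7947
```

The counts [1 3 3 1 1] are the level counts of the data [1 2 3 2 3 2 4 3 5].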
If :y is provided, it must be a sequence of integers of the same
length as :x. A contingency table is computed from :x and :y, and
Pearson's chi-squared test is performed on it. The null hypothesis is
that the joint distribution of the cell counts in the 2-dimensional
contingency table is the product of the row and column marginals.
By default, Yates' continuity correction is applied to 2x2
contingency tables. It can be disabled by setting the :correct
option to false.
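For a two-way table, the statistic and the effect of the correction can be
sketched in plain Clojure. This is an illustration under stated
assumptions, not Incanter's implementation: table-x-sq is a hypothetical
helper, the table is given as a vector of row vectors, and clamping the
corrected difference at zero is an assumption about how small differences
are handled. The degrees of freedom are (rows - 1) * (columns - 1).

```clojure
;; Hypothetical helper (not part of Incanter): Pearson's statistic for a
;; two-way table of counts, with optional Yates' correction.
(defn table-x-sq
  [table correct?]
  (let [row-sums (map #(reduce + %) table)
        col-sums (apply map + table)
        n        (reduce + row-sums)
        ;; Yates' correction subtracts 0.5 from each |O - E|, 2x2 tables only.
        adj      (if (and correct?
                          (= 2 (count table))
                          (= 2 (count (first table))))
                   0.5
                   0.0)]
    (reduce +
            (for [[row r] (map vector table row-sums)
                  [o c]   (map vector row col-sums)]
              (let [e (/ (* r c) (double n))   ; expected count under independence
                    ;; Assumption: never let the corrected difference go negative.
                    d (max 0.0 (- (Math/abs (double (- o e))) adj))]
                (/ (* d d) e))))))

(table-x-sq [[31 12] [9 8]] true)  ;; approx. 1.24145
(table-x-sq [[31 12] [9 8]] false) ;; approx. 2.01094
```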
Options:
:x -- a sequence of numbers.
:y -- a sequence of numbers
:table -- a contingency table. If one-dimensional, a goodness-of-fit test is performed
:probs (default when :y is nil: (repeat n-levels (/ n-levels))) -- the hypothesized population probabilities for the goodness-of-fit test
:freq (default nil) -- if given, these are rescaled to probabilities
:correct (default true) -- use Yates' correction for continuity for 2x2 contingency tables
Returns:
:X-sq -- the Pearson X-squared test statistic
:p-value -- the p-value for the test statistic
:df -- the degrees of freedom
Examples:
(use '(incanter core stats))
(chisq-test :x [1 2 3 2 3 2 4 3 5]) ;; X-sq 2.6667
;; create a one-dimensional table of this data
(def table (matrix [1 3 3 1 1]))
(chisq-test :table table) ;; X-sq 2.6667
(chisq-test :table (trans table)) ;; throws exception
(chisq-test :x [1 0 0 0 1 1 1 0 0 1 0 0 1 1 1 1]) ;; X-sq 0.25
(use '(incanter core stats datasets))
(def math-prog (to-matrix (get-dataset :math-prog)))
(def x (sel math-prog :cols 1))
(def y (sel math-prog :cols 2))
(chisq-test :x x :y y) ;; X-sq = 1.24145, df=1, p-value = 0.26519
(chisq-test :x x :y y :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617
(def table (matrix [[31 12] [9 8]]))
(chisq-test :table table) ;; X-sq = 1.24145, df=1, p-value = 0.26519
(chisq-test :table table :correct false) ;; X-sq = 2.01094, df=1, p-value = 0.15617
;; use the detabulate function to create data rows corresponding to the table
(def detab (detabulate :table table))
(chisq-test :x (sel detab :cols 0) :y (sel detab :cols 1))
;; look at the hair-eye-color data
;; turn the count data for males into a contingency table
(def male (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16)) 4))
(chisq-test :table male) ;; X-sq = 41.280, df = 9, p-value = 4.44E-6
;; turn the count data for females into a contingency table
(def female (matrix (sel (get-dataset :hair-eye-color) :cols 3 :rows (range 16 32)) 4))
(chisq-test :table female) ;; X-sq = 106.664, df = 9, p-value = 7.014E-19
;; supply probabilities to goodness-of-fit test
(def table [89 37 30 28 2])
(def probs [0.40 0.20 0.20 0.19 0.01])
(chisq-test :table table :probs probs) ;; X-sq = 5.7947, df = 4, p-value = 0.215
;; use frequencies instead of probabilities
(def freq [40 20 20 15 5])
(chisq-test :table table :freq freq) ;; X-sq = 9.9901, df = 4, p-value = 0.04059
References:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
http://en.wikipedia.org/wiki/Pearson's_chi-square_test
http://en.wikipedia.org/wiki/Yates'_chi-square_test