mars.tensor.stats.chisquare#

mars.tensor.stats.chisquare(f_obs, f_exp=None, ddof=0, axis=0)[source]#

Calculate a one-way chi-square test.

The chi-square test tests the null hypothesis that the categorical data has the given frequencies.

Parameters
  • f_obs (array_like) – Observed frequencies in each category.

  • f_exp (array_like, optional) – Expected frequencies in each category. By default the categories are assumed to be equally likely.

  • ddof (int, optional) – “Delta degrees of freedom”: adjustment to the degrees of freedom for the p-value. The p-value is computed using a chi-squared distribution with k - 1 - ddof degrees of freedom, where k is the number of observed frequencies. The default value of ddof is 0.

  • axis (int or None, optional) – The axis of the broadcast result of f_obs and f_exp along which to apply the test. If axis is None, all values in f_obs are treated as a single data set. Default is 0.

Returns

  • chisq (float or ndarray) – The chi-squared test statistic. The value is a float if axis is None or f_obs and f_exp are 1-D.

  • p (float or ndarray) – The p-value of the test. The value is a float if ddof and the return value chisq are scalars.

Notes

This test is invalid when the observed or expected frequencies in each category are too small. A typical rule is that all of the observed and expected frequencies should be at least 5.

The default degrees of freedom, k-1, are for the case when no parameters of the distribution are estimated. If p parameters are estimated by efficient maximum likelihood then the correct degrees of freedom are k-1-p. If the parameters are estimated in a different way, then the dof can be between k-1-p and k-1. However, it is also possible that the asymptotic distribution is not chi-square, in which case this test is not appropriate.

References

1

Lowry, Richard. “Concepts and Applications of Inferential Statistics”. Chapter 8. https://web.archive.org/web/20171022032306/http://vassarstats.net:80/textbook/ch8pt1.html

2

“Chi-squared test”, https://en.wikipedia.org/wiki/Chi-squared_test

Examples

When just f_obs is given, it is assumed that the expected frequencies are uniform and given by the mean of the observed frequencies.

>>> import mars.tensor as mt
>>> from mars.tensor.stats import chisquare
>>> chisquare([16, 18, 16, 14, 12, 12])
(2.0, 0.84914503608460956)

With f_exp the expected frequencies can be given.

>>> chisquare([16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8]).execute()
(3.5, 0.62338762774958223)

When f_obs is 2-D, by default the test is applied to each column.

>>> obs = mt.array([[16, 18, 16, 14, 12, 12], [32, 24, 16, 28, 20, 24]]).T
>>> obs.shape
(6, 2)
>>> chisquare(obs).execute()
(array([ 2.        ,  6.66666667]), array([ 0.84914504,  0.24663415]))

By setting axis=None, the test is applied to all data in the array, which is equivalent to applying the test to the flattened array.

>>> chisquare(obs, axis=None).execute()
(23.31034482758621, 0.015975692534127565)
>>> chisquare(obs.ravel()).execute()
(23.31034482758621, 0.015975692534127565)

ddof is the change to make to the default degrees of freedom.

>>> chisquare([16, 18, 16, 14, 12, 12], ddof=1).execute()
(2.0, 0.73575888234288467)

f_obs and f_exp are also broadcast. In the following, f_obs has shape (6,) and f_exp has shape (2, 6), so the result of broadcasting f_obs and f_exp has shape (2, 6). To compute the desired chi-squared statistics, we use axis=1:

>>> chisquare([16, 18, 16, 14, 12, 12],
...           f_exp=[[16, 16, 16, 16, 16, 8], [8, 20, 20, 16, 12, 12]],
...           axis=1).execute()
(array([ 3.5 ,  9.25]), array([ 0.62338763,  0.09949846]))