mars.tensor.stats.ks_1samp
- mars.tensor.stats.ks_1samp(x: Union[ndarray, list, TileableType], cdf: Callable, args: Tuple = (), alternative: str = 'two-sided', mode: str = 'auto')[source]
Performs the one-sample Kolmogorov-Smirnov test for goodness of fit.
This test compares the underlying distribution F(x) of a sample against a given continuous distribution G(x). See Notes for a description of the available null and alternative hypotheses.
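The statistic is the largest vertical distance between the sample's empirical CDF and G(x). The sketch below is plain Python, not Mars's implementation, and `normal_cdf` and `ks_statistics` are hypothetical helper names. Run on the 9 evenly spaced points used in the first example, it reproduces the statistic that example reports.

```python
import math
from typing import Callable, Sequence, Tuple


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc so both tails stay accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def ks_statistics(
    sample: Sequence[float], cdf: Callable[[float], float]
) -> Tuple[float, float, float]:
    """Return (D, D+, D-) for `sample` against the reference CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    g = [cdf(x) for x in xs]
    # D+: how far the empirical CDF ((i + 1) / n, just after xs[i]) rises above G.
    d_plus = max((i + 1) / n - g[i] for i in range(n))
    # D-: how far G rises above the empirical CDF (i / n, just before xs[i]).
    d_minus = max(g[i] - i / n for i in range(n))
    # The two-sided statistic is the larger of the two one-sided gaps.
    return max(d_plus, d_minus), d_plus, d_minus


# Same data as the first example: 9 evenly spaced points on [-15, 15].
x = [-15 + 30 * k / 8 for k in range(9)]
d, d_plus, d_minus = ks_statistics(x, normal_cdf)
print(round(d, 6))  # 0.444356
```

Only the sorted sample and the CDF values at the sample points are needed, because the empirical CDF is a step function that only changes at those points.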
- Parameters
x (array_like) – a 1-D array of observations of iid random variables.
cdf (callable) – callable used to calculate the cdf.
args (tuple, sequence, optional) – Distribution parameters, used with cdf.
alternative ({'two-sided', 'less', 'greater'}, optional) – Defines the null and alternative hypotheses. Default is 'two-sided'. See the explanations in the Notes below.
mode ({'auto', 'exact', 'approx', 'asymp'}, optional) –
Defines the distribution used for calculating the p-value. The following options are available (default is 'auto'):
'auto': selects one of the other options.
'exact': uses the exact distribution of the test statistic.
'approx': approximates the two-sided probability with twice the one-sided probability.
'asymp': uses the asymptotic distribution of the test statistic.
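For intuition about 'asymp' and 'approx', here is a minimal sketch of the standard large-sample formulas in plain Python. It is not Mars's code, and the function names are hypothetical. The two-sided statistic scaled by sqrt(n) approaches the Kolmogorov distribution, whose survival function is 2 Σ (-1)^(k-1) exp(-2k²λ²). A one-sided statistic satisfies P(sqrt(n)·D+ > t) → exp(-2t²). 'approx' is shown as twice a one-sided p-value capped at 1. Note that here the asymptotic one-sided formula stands in for the exact one-sided distribution, which is an assumption of the sketch.

```python
import math


def kolmogorov_sf(lam: float, terms: int = 100) -> float:
    """P(K > lam) for the limiting Kolmogorov distribution (alternating series)."""
    if lam < 0.2:
        # The series converges poorly here; the true value is 1 to ~12 decimals.
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def asymp_pvalue_two_sided(d: float, n: int) -> float:
    """'asymp'-style two-sided p-value: sqrt(n) * D is roughly Kolmogorov-distributed."""
    return kolmogorov_sf(math.sqrt(n) * d)


def asymp_pvalue_one_sided(d: float, n: int) -> float:
    """Limiting one-sided p-value: P(sqrt(n) * D+ > t) -> exp(-2 t^2)."""
    return math.exp(-2.0 * n * d * d)


def approx_pvalue_two_sided(d: float, n: int) -> float:
    """'approx'-style two-sided p-value: twice the one-sided value, capped at 1."""
    return min(1.0, 2.0 * asymp_pvalue_one_sided(d, n))


print(asymp_pvalue_two_sided(0.2355, 100), approx_pvalue_two_sided(0.2355, 100))
```

With D = 0.2355 and n = 100, both two-sided helpers return about 3e-05. Only the first series term matters for large λ, which is why doubling the one-sided value works well in the tail.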
- Returns
statistic (float) – KS test statistic, either D, D+ or D- (depending on the value of 'alternative').
pvalue (float) – One-tailed or two-tailed p-value.
See also
ks_2samp, kstest
Notes
There are three options for the null and corresponding alternative hypothesis that can be selected using the alternative parameter.
two-sided: The null hypothesis is that the two distributions are identical, F(x)=G(x) for all x; the alternative is that they are not identical.
less: The null hypothesis is that F(x) >= G(x) for all x; the alternative is that F(x) < G(x) for at least one x.
greater: The null hypothesis is that F(x) <= G(x) for all x; the alternative is that F(x) > G(x) for at least one x.
Note that the alternative hypotheses describe the CDFs of the underlying distributions, not the observed values. For example, suppose x1 ~ F and x2 ~ G. If F(x) > G(x) for all x, the values in x1 tend to be less than those in x2.
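The direction conventions above can be checked numerically. This is a minimal pure-Python sketch, assuming (as in SciPy) that alternative='less' uses D- = max(G(x_(i)) - (i-1)/n) and alternative='greater' uses D+ = max(i/n - G(x_(i))). The helper `one_sided_statistics` and the dictionary `statistic_for` are hypothetical, not part of the Mars API. The sketch uses an idealised sample made of the quantiles of N(0.5, 1). Against G = N(0, 1), this sample has F(x) < G(x) everywhere, so its values tend to be larger, and D- is the statistic that becomes large.

```python
from statistics import NormalDist


def one_sided_statistics(sample, cdf):
    """Return (D+, D-) for a sample against a reference CDF G."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return d_plus, d_minus


# Hypothetical idealised sample: the n quantiles of N(0.5, 1), i.e. shifted
# to the right of the reference G = N(0, 1), so that F(x) < G(x) for all x.
n = 200
shifted = NormalDist(mu=0.5, sigma=1.0)
sample = [shifted.inv_cdf((i + 0.5) / n) for i in range(n)]

g = NormalDist().cdf  # reference CDF G
d_plus, d_minus = one_sided_statistics(sample, g)

# 'less' (F < G somewhere) is measured by D-, 'greater' (F > G somewhere) by D+.
statistic_for = {"less": d_minus, "greater": d_plus,
                 "two-sided": max(d_plus, d_minus)}
print({k: round(v, 4) for k, v in statistic_for.items()})
```

This mirrors the one-sided example in the Examples section, where a sample shifted to larger values produces a large 'less' statistic and a small 'greater' one.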
Examples
>>> import numpy as np
>>> from scipy import stats
>>> import mars.tensor as mt
>>> from mars.tensor.stats import ks_1samp
>>> np.random.seed(12345678)  # fix random seed to get the same result
>>> x = mt.linspace(-15, 15, 9, chunk_size=5)
>>> ks_1samp(x, stats.norm.cdf).execute()
(0.44435602715924361, 0.038850142705171065)
>>> ks_1samp(stats.norm.rvs(size=100), stats.norm.cdf).execute()
KstestResult(statistic=0.165471391799..., pvalue=0.007331283245...)
Test against a one-sided alternative hypothesis
Shift the distribution to larger values, so that CDF(x) < norm.cdf(x):
>>> x = stats.norm.rvs(loc=0.2, size=100)
>>> ks_1samp(x, stats.norm.cdf, alternative='less').execute()
KstestResult(statistic=0.235488541678..., pvalue=1.158315030683...e-05)
Reject null hypothesis in favor of alternative hypothesis: less
>>> ks_1samp(x, stats.norm.cdf, alternative='greater').execute()
KstestResult(statistic=0.010167165616..., pvalue=0.972494973653...)
Don't reject null hypothesis in favor of alternative hypothesis: greater
>>> ks_1samp(x, stats.norm.cdf).execute()
KstestResult(statistic=0.235488541678..., pvalue=2.316630061366...e-05)
Reject null hypothesis in favor of alternative hypothesis: two-sided
Testing t-distributed random variables against the normal distribution
With 100 degrees of freedom the t distribution looks close to the normal distribution, and the K-S test does not reject the hypothesis that the sample came from the normal distribution:
>>> ks_1samp(stats.t.rvs(100, size=100), stats.norm.cdf).execute()
KstestResult(statistic=0.077844250253..., pvalue=0.553155412513...)
With 3 degrees of freedom the t distribution differs more noticeably from the normal distribution, but with 100 observations the difference is only borderline detectable. In this run the p-value (about 0.109) lies just above the 10% level, so the hypothesis that the sample came from the normal distribution is not quite rejected at that level:
>>> ks_1samp(stats.t.rvs(3, size=100), stats.norm.cdf).execute()
KstestResult(statistic=0.118967105356..., pvalue=0.108627114578...)
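The borderline result is easier to understand by comparing the population distributions instead of a sample. This is a sketch in plain Python, not part of Mars. It uses the closed-form CDF of Student's t with 3 degrees of freedom, F(t) = 1/2 + (θ + sin θ cos θ)/π with θ = arctan(t/√3), and scans a grid for the largest gap to the standard normal CDF. The helper name `t3_cdf` is hypothetical, and 1.358 is the usual asymptotic two-sided 5% critical value of sqrt(n)·D.

```python
import math
from statistics import NormalDist


def t3_cdf(t: float) -> float:
    """Closed-form CDF of Student's t distribution with 3 degrees of freedom."""
    theta = math.atan(t / math.sqrt(3.0))
    return 0.5 + (theta + math.sin(theta) * math.cos(theta)) / math.pi


normal_cdf = NormalDist().cdf

# Scan a fine grid on [-10, 10] for the largest vertical gap between the CDFs.
grid = [k / 100 for k in range(-1000, 1001)]
gap = max(abs(t3_cdf(t) - normal_cdf(t)) for t in grid)

n = 100
critical_5pct = 1.358 / math.sqrt(n)  # asymptotic two-sided 5% critical value of D
print(f"population gap ~ {gap:.3f}, 5% critical D for n={n} ~ {critical_5pct:.3f}")
```

The population gap comes out at roughly 0.05, well below the critical value of about 0.136 for n = 100. A sample of this size therefore has limited power to detect the difference, and repeated draws can land on either side of a given significance level.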