mars.tensor.random.zipf
Draw samples from a Zipf distribution.
Samples are drawn from a Zipf distribution with specified parameter a > 1.
The Zipf distribution (also known as the zeta distribution) is a discrete probability distribution that satisfies Zipf’s law: the frequency of an item is inversely proportional to its rank in a frequency table.
a (float or array_like of floats) – Distribution parameter. Should be greater than 1.
size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a is a scalar. Otherwise, mt.array(a).size samples are drawn.
chunk_size (int or tuple of ints, optional) – Desired chunk size on each dimension.
gpu (bool, optional) – If True, allocate the tensor on GPU. Defaults to False.
dtype (data-type, optional) – Data-type of the returned tensor.
out – Drawn samples from the parameterized Zipf distribution.
Tensor or scalar
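The shape rules for size can be checked with a small sketch. It uses NumPy's Generator.zipf as a stand-in, on the assumption that this function follows the same size semantics; with Mars the calls would go through mt.random.zipf and need .execute() to produce concrete values. The names rng, block, single and per_param are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable

# size=(m, n, k) draws m * n * k samples arranged in that shape.
block = rng.zipf(2.0, size=(2, 3, 4))

# size=None with a scalar a yields a single value.
single = rng.zipf(2.0)

# size=None with an array-like a yields one sample per element of a.
per_param = rng.zipf([1.5, 2.0, 3.0])

print(block.shape, np.ndim(single), per_param.shape)
```

Every drawn value is an integer of at least 1, since the distribution is defined on the positive integers.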
See also
scipy.stats.zipf
probability density function, distribution, or cumulative density function, etc.
Notes
The probability mass function for the Zipf distribution is
\[p(x) = \frac{x^{-a}}{\zeta(a)},\]
for integers \(x \geq 1\), where \(\zeta\) is the Riemann zeta function.
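As a rough illustration, the function x^(-a) / ζ(a) for integers x ≥ 1 can be evaluated in plain Python. This is a minimal sketch, not part of Mars: zeta_approx and zipf_pmf are hypothetical helper names, and ζ(a) is approximated by a truncated series plus a tail estimate rather than taken from a library.

```python
import math


def zeta_approx(a, n_terms=10_000):
    """Approximate the Riemann zeta function for a > 1.

    Sums the first n_terms - 1 terms directly, then estimates the
    remaining tail with the leading Euler-Maclaurin terms.
    """
    if a <= 1:
        raise ValueError("a must be greater than 1")
    head = sum(k ** -a for k in range(1, n_terms))
    tail = n_terms ** (1 - a) / (a - 1) + 0.5 * n_terms ** -a
    return head + tail


def zipf_pmf(k, a):
    """Probability that a Zipf(a) variate equals the integer k."""
    if k < 1:
        return 0.0
    return k ** -a / zeta_approx(a)


# For a = 2, zeta(2) = pi**2 / 6, so the probability of rank 1 is 6 / pi**2.
print(zipf_pmf(1, 2.0))  # approximately 0.6079
```

For a = 2 the first rank alone carries about 61% of the probability, which is why histograms of Zipf samples are dominated by small values.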
It is named for the American linguist George Kingsley Zipf, who noted that the frequency of any word in a sample of a language is inversely proportional to its rank in the frequency table.
References
Zipf, G. K., “Selected Studies of the Principle of Relative Frequency in Language,” Cambridge, MA: Harvard Univ. Press, 1932.
Examples
Draw samples from the distribution:
>>> import mars.tensor as mt
>>> a = 2. # parameter
>>> s = mt.random.zipf(a, 1000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> from scipy import special
Truncate s values at 50 so plot is interesting:
>>> count, bins, ignored = plt.hist(s[s<50].execute(), 50, density=True)
>>> x = mt.arange(1., 50.)
>>> y = x**(-a) / special.zeta(a)
>>> plt.plot(x.execute(), (y/mt.max(y)).execute(), linewidth=2, color='r')
>>> plt.show()