Statistics
KernelDensitySample
sample a kernel density estimate
Calling Sequence
Parameters
Description
Options
Notes
Examples
KernelDensitySample(data, n, options)
data - data sample
n - positive integer; size of output sample
options - (optional) equation(s) of the form option=value where option is one of kernel, bandwidth, bins, left, right, ignore or weights; specify options for the KernelDensitySample function
The KernelDensitySample function estimates a probability density function for the data using kernel density estimation and then draws a random sample from the estimated density. The resulting sample is therefore distributed similarly to the original data.
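As a rough illustration of the idea, drawing from a Gaussian kernel density estimate is equivalent to picking one of the original data points at random and adding normally distributed noise whose standard deviation is the bandwidth. The Python sketch below shows this under those assumptions. The name kde_sample_gaussian is hypothetical, and the sketch is not the algorithm used by KernelDensitySample, which, judging by its bins option, works on binned data.

```python
import random

def kde_sample_gaussian(data, n, bandwidth, seed=None):
    """Draw n points from a Gaussian kernel density estimate of data.

    Illustrative sketch only: a Gaussian KDE is an equal-weight mixture of
    normal distributions centred on the data points, so sampling from it
    means choosing a centre at random and perturbing it.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    rng = random.Random(seed)
    # Pick a data point uniformly, then add N(0, bandwidth^2) noise.
    return [rng.gauss(rng.choice(data), bandwidth) for _ in range(n)]

data = [-1.2, -1.1, -1.0, -0.9, -0.8, 0.8, 0.9, 1.0, 1.1, 1.2]
sample = kde_sample_gaussian(data, 5, bandwidth=0.05, seed=1)
print(sample)
```

With a small bandwidth such as 0.05, every sampled value lands close to one of the original points, which is why the output resembles the input data.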
The first parameter data is the data set, given as, for example, a Vector.
The second parameter n is the size of the output sample, which must be a positive integer.
The options argument can contain one or more of the options shown below.
kernel = 'gaussian', 'biweight', 'epanechnikov', 'triangular', 'rectangular' -- The kernel used to build the estimate. By default this is 'gaussian'. The Gaussian and biweight kernels are smooth: if they are used, no sharp edges will be present in the final estimate. The Epanechnikov kernel is often considered optimal in the sense of minimizing the mean integrated squared error. The triangular and rectangular kernels are sharp: they impose sharp edges on the final estimate.
bandwidth = realcons -- The bandwidth is a positive quantity that specifies the width of the kernel (the amount each data point affects distant portions of the probability density estimate). Each kernel is scaled such that the bandwidth is equal to the standard deviation of the kernel.
bins = posint -- The number of bins in which to categorize data points (512 by default). The number of bins determines the accuracy of the kernel density estimate.
left = realcons -- This option specifies the lower boundary on valid data values. Any data values that are smaller than this value are discarded. By default the algorithm attempts to determine the minimum value in the data set and further includes padding of width 3*bandwidth.
right = realcons -- This option specifies the upper boundary on valid data values. Any data values that are larger than this value are discarded. By default the algorithm attempts to determine the maximum value in the data set and further includes padding of width 3*bandwidth.
ignore = truefalse -- This option is used to specify how to handle non-numeric data. If ignore is set to true all non-numeric items in data will be ignored.
weights = rtable -- Vector of weights (one-dimensional rtable). If weights are given, the KernelDensitySample function will weight each data point accordingly. Note that the weights provided must be of type realcons and the results are floating-point, even if the problem is specified with exact values. The data array and the weights array must have the same number of elements.
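The statement under bandwidth that each kernel is scaled so that the bandwidth equals its standard deviation can be made concrete for the compactly supported kernels. The Python sketch below uses the standard variances of kernels supported on [-1, 1] (1/3 rectangular, 1/6 triangular, 1/5 Epanechnikov, 1/7 biweight) to compute the half-width of the support that yields a given standard deviation, and checks the result by numerical integration. All names are hypothetical, and whether Maple parameterizes its kernels internally in this way is an assumption.

```python
import math

# Variance of each kernel when supported on [-1, 1] (standard results).
UNIT_VARIANCE = {
    "rectangular": 1 / 3,
    "triangular": 1 / 6,
    "epanechnikov": 1 / 5,
    "biweight": 1 / 7,
}

# Kernel shapes on [-1, 1]; each integrates to 1.
SHAPES = {
    "rectangular": lambda u: 0.5,
    "triangular": lambda u: 1 - abs(u),
    "epanechnikov": lambda u: 0.75 * (1 - u * u),
    "biweight": lambda u: 15 / 16 * (1 - u * u) ** 2,
}

def half_width(kernel, bandwidth):
    """Half-width a of the support [-a, a] giving standard deviation = bandwidth."""
    return bandwidth / math.sqrt(UNIT_VARIANCE[kernel])

def numeric_std(kernel, bandwidth, steps=20000):
    """Standard deviation of the scaled kernel, by midpoint-rule integration."""
    a = half_width(kernel, bandwidth)
    du = 2 / steps
    var = 0.0
    for i in range(steps):
        u = -1 + (i + 0.5) * du
        # Second moment of X = a*U, where U has the unit-support kernel density.
        var += (a * u) ** 2 * SHAPES[kernel](u) * du
    return math.sqrt(var)

for name in UNIT_VARIANCE:
    print(name, round(half_width(name, 0.05), 4), round(numeric_std(name, 0.05), 4))
```

Under this scaling, a bandwidth of 0.05 corresponds to a rectangular kernel spanning roughly ±0.087 and a biweight kernel spanning roughly ±0.132, so the same bandwidth value gives kernels of equal spread but different support.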
Note that points that do not fall into the range left..right and missing data are not considered for this operation and are normally discarded. If ignore=false and this procedure encounters missing data, it will spawn a userinfo message.
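The filtering described above can be sketched in Python as follows. The function name prepare_data is hypothetical, missing values are represented here by None or NaN, and the default bounds are taken to be the data minimum and maximum extended by 3*bandwidth, which is one reading of the left and right descriptions rather than confirmed behavior.

```python
import math

def prepare_data(data, bandwidth, left=None, right=None):
    """Return (kept, bounds, n_missing) after dropping missing and out-of-range values."""
    # Keep only real numbers; None and NaN count as missing data.
    numeric = [x for x in data
               if isinstance(x, (int, float)) and not math.isnan(x)]
    n_missing = len(data) - len(numeric)
    if not numeric:
        raise ValueError("no numeric data")
    # Assumed defaults: pad the observed range by 3*bandwidth on each side.
    lo = min(numeric) - 3 * bandwidth if left is None else left
    hi = max(numeric) + 3 * bandwidth if right is None else right
    # Discard points outside left..right.
    kept = [x for x in numeric if lo <= x <= hi]
    return kept, (lo, hi), n_missing

raw = [0.2, None, 1.5, float("nan"), 7.0, -0.4]
kept, bounds, n_missing = prepare_data(raw, bandwidth=0.1, left=0, right=5)
print(kept, bounds, n_missing)
```

Here the two missing entries are counted rather than silently dropped, so a caller could report them in the way the ignore=false case describes.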
with(Statistics):
Sample similar points from a distribution given by an array.
A := Array([-1.2, -1.1, -1.0, -0.9, -0.8, 0.8, 0.9, 1.0, 1.1, 1.2])
A := [-1.2, -1.1, -1.0, -0.9, -0.8, 0.8, 0.9, 1.0, 1.1, 1.2]
KernelDensitySample(A, 5, bandwidth = 0.05)
[1.06280310002384, 1.15380244369467, -1.12060471953509, 1.16158217515792, 0.880809677283239]
Sample from a distribution similar to the exponential of a normal distribution by remapping points generated by the normal distribution.
N := RandomVariable(Normal(0, 1)):
S := Sample(N, 100):
M := map(t -> exp(t), S):
KernelDensitySample(M, 8, bins = 512, kernel = biweight, left = 0, right = exp(5))
[2.80315926598845, 0.757478428312324, 4.79548103787407, 0.971988842633938, 0.690284197172163, 0.772709301728026, 1.52666047482434, 1.20132544102366]
See Also
Statistics[Computation]
Statistics[KernelDensity]