sDistance
sDistance computes and returns the distance matrix between the rows of a data matrix, using a specified distance metric.
sDistance(data, metric = c("pearson", "spearman", "kendall", "euclidean", "manhattan", "cos", "mi"))
dist: a symmetric distance matrix of size nRow x nRow, where nRow is the number of rows of the input data matrix
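To make the shape of the return value concrete, here is a base-R sketch for the correlation-based metrics. It assumes that a correlation r between two rows is turned into a distance as 1 - r; that transform and the helper name `cor_distance_sketch` are illustrative assumptions, not taken from supraHex.

```r
# Hypothetical helper, NOT the supraHex implementation.
# Assumption: correlation-based distance between rows is 1 - r.
cor_distance_sketch <- function(data, method = c("pearson", "spearman", "kendall")) {
  method <- match.arg(method)
  data <- as.matrix(data)
  # stats::cor() correlates columns, so transpose to correlate rows
  1 - stats::cor(t(data), method = method)
}

set.seed(1)
data <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)
d <- cor_distance_sketch(data, "pearson")
dim(d)          # 100 100, i.e. nRow x nRow
isSymmetric(d)  # TRUE
```

Each row is at distance (numerically) zero from itself, so the diagonal of `d` is 0 up to rounding.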
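The following distance metrics are supported: "pearson" (Pearson correlation), "spearman" (Spearman rank correlation), "kendall" (Kendall rank correlation), "euclidean" (Euclidean distance), "manhattan" (Manhattan, or city-block, distance), "cos" (cosine similarity) and "mi" (mutual information).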
For "kendall", all pairs of data points $(x_i, y_i)$ and $(x_j, y_j)$ are considered: a pair is called concordant ($N_c$ in total) if $(x_i - x_j)(y_i - y_j) > 0$, or discordant ($N_d$ in total) if $(x_i - x_j)(y_i - y_j) < 0$. Finally, the gamma coefficient $(N_c - N_d)/(N_c + N_d)$ is calculated as a measure of association that is highly resistant to tied data.
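The pair-counting rule above can be sketched in base R as follows. `gamma_coefficient` is a hypothetical helper written for illustration, not a supraHex function; it enumerates all index pairs with `utils::combn` and counts a tied pair (product exactly 0) as neither concordant nor discordant.

```r
# Hypothetical helper illustrating the concordant/discordant counting rule.
gamma_coefficient <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  # Each column of idx holds one pair of indices (i, j) with i < j
  idx <- utils::combn(length(x), 2)
  s <- (x[idx[1, ]] - x[idx[2, ]]) * (y[idx[1, ]] - y[idx[2, ]])
  nc <- sum(s > 0)  # concordant pairs
  nd <- sum(s < 0)  # discordant pairs; s == 0 (a tie) counts as neither
  (nc - nd) / (nc + nd)
}

gamma_coefficient(c(1, 2, 3, 4, 5), c(2, 1, 4, 3, 5))  # 0.6 (8 concordant, 2 discordant)
gamma_coefficient(c(1, 1, 2, 3), c(1, 2, 2, 3))        # 1 (the 2 tied pairs are ignored)
```

Because tied pairs drop out of both the numerator and the denominator, the second call returns 1 even though each vector contains a tie. Without ties, the result coincides with Kendall's tau.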
For "mi", mutual information (MI) provides a general measure of dependencies between variables, in particular positive, negative and nonlinear correlations. MI is calculated by applying an adaptive partitioning method that derives equal-probability bins (i.e., each bin contains approximately the same number of data points). The number of bins is determined heuristically from the lower bound $1 + \log_2 n$, where $n$ is the length of the vector. Because MI increases with entropy, we normalize it to allow comparison of different pairwise clone similarities:

$$\frac{2\,\mathrm{MI}}{H(x) + H(y)}$$

where $H(x)$ and $H(y)$ stand for the entropy of the vectors $x$ and $y$, respectively.
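A minimal sketch of the normalized MI follows. It rests on stated assumptions: the bin count is floor(1 + log2(n)), a simple rank-based equal-frequency binning stands in for the package's adaptive partitioning, and entropies use the natural logarithm. `equal_prob_bins`, `entropy` and `normalized_mi` are hypothetical names, not supraHex functions.

```r
# Hypothetical sketch, not the supraHex code.
equal_prob_bins <- function(v, nbins) {
  # Rank-based equal-frequency binning: each bin gets about length(v) / nbins
  # points; ties are broken by position so bin sizes stay balanced
  r <- rank(v, ties.method = "first")
  ceiling(r * nbins / length(v))
}

entropy <- function(labels) {
  p <- table(labels) / length(labels)  # empirical probabilities
  -sum(p * log(p))                     # Shannon entropy in nats
}

normalized_mi <- function(x, y) {
  stopifnot(length(x) == length(y))
  nbins <- floor(1 + log2(length(x)))  # assumed rounding of 1 + log2(n)
  bx <- equal_prob_bins(x, nbins)
  by <- equal_prob_bins(y, nbins)
  hx <- entropy(bx)
  hy <- entropy(by)
  hxy <- entropy(paste(bx, by))        # joint entropy over (bin_x, bin_y) cells
  mi <- hx + hy - hxy                  # MI = H(x) + H(y) - H(x, y)
  2 * mi / (hx + hy)                   # normalization 2*MI / [H(x) + H(y)]
}

set.seed(1)
x <- rnorm(100)
normalized_mi(x, exp(x))      # 1: monotone but nonlinear dependence
normalized_mi(x, rnorm(100))  # small for independent data, though not exactly 0
```

A strictly monotone transform such as exp() preserves ranks, so both vectors fall into identical bins and the normalized MI is exactly 1. For independent data the plug-in estimate stays slightly above 0 because of finite-sample bias.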
```r
library(supraHex)

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)

# 2) calculate the distance matrix using different metrics
# 2a) using "pearson" metric
dist <- sDistance(data=data, metric="pearson")
# 2b) using "cos" metric
# dist <- sDistance(data=data, metric="cos")
# 2c) using "spearman" metric
# dist <- sDistance(data=data, metric="spearman")
# 2d) using "kendall" metric
# dist <- sDistance(data=data, metric="kendall")
# 2e) using "euclidean" metric
# dist <- sDistance(data=data, metric="euclidean")
# 2f) using "manhattan" metric
# dist <- sDistance(data=data, metric="manhattan")
# 2g) using "mi" metric
# dist <- sDistance(data=data, metric="mi")
```
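For the "euclidean", "manhattan" and "cos" metrics, the following base-R sketch computes comparable row-wise matrices without loading supraHex, which can serve as a rough cross-check. `stats::dist` is the standard base-R function; converting cosine similarity to a distance as 1 - similarity is an assumption and may differ from what sDistance actually returns.

```r
# Base-R sketch, independent of supraHex.
# Assumption: "cos" becomes a distance as 1 - cosine similarity.
set.seed(1)
data <- matrix(rnorm(100 * 10), nrow = 100, ncol = 10)

# Row-wise Euclidean and Manhattan distances as full nRow x nRow matrices
d_euc <- as.matrix(stats::dist(data, method = "euclidean"))
d_man <- as.matrix(stats::dist(data, method = "manhattan"))

# Cosine similarity between rows: dot products divided by products of norms
norms <- sqrt(rowSums(data^2))
cos_sim <- tcrossprod(data) / outer(norms, norms)
d_cos <- 1 - cos_sim

dim(d_euc)  # 100 100
```

`tcrossprod(data)` is `data %*% t(data)`, so every entry of `cos_sim` is the dot product of two rows scaled by their lengths; no centering is applied, which is what distinguishes cosine similarity from Pearson correlation.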