## Function to compute the pairwise distance for a given data matrix

### Description

`sDistance` computes and returns the distance matrix between the rows of a data matrix using a specified distance metric.

### Usage

```
sDistance(data, metric = c("pearson", "spearman", "kendall", "euclidean", "manhattan",
    "cos", "mi"))
```

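The vector default for `metric` follows a common R idiom: when a function resolves such an argument with `match.arg()`, the first element is used if the caller supplies nothing. The sketch below illustrates that idiom with a hypothetical helper, `pick_metric`. It is an assumption that `sDistance` resolves `metric` this way, since this page does not show the function body.

```r
# Hypothetical helper (not part of supraHex) illustrating the match.arg() idiom
pick_metric <- function(metric = c("pearson", "spearman", "kendall", "euclidean",
                                   "manhattan", "cos", "mi")) {
    # With no argument supplied, match.arg() returns the first choice;
    # otherwise it validates the input against the allowed choices
    match.arg(metric)
}

default_metric <- pick_metric()        # no argument: first choice is used
chosen_metric  <- pick_metric("cos")   # explicit, valid choice

# An unsupported value raises an error, caught here and mapped to NA
bad_metric <- tryCatch(pick_metric("chebyshev"),
                       error = function(e) NA_character_)
```

Under this assumption, calling `sDistance(data)` without `metric` would behave like `metric = "pearson"`.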
### Arguments

• `data`: a data frame or matrix of input data
• `metric`: distance metric used to calculate a symmetric distance matrix. See 'Note' below for the available options

### Value

• `dist`: a symmetric distance matrix of nRow x nRow, where nRow is the number of rows of input data matrix
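To make the shape of the return value concrete, the sketch below builds a correlation-based distance matrix with base R only. It assumes that correlation is turned into a distance as `1 - r`, a common convention that this page does not confirm for `sDistance`. The names `toy` and `d` are illustrative.

```r
set.seed(1)
# 5 rows (the items being compared) by 10 columns (features)
toy <- matrix(rnorm(5 * 10), nrow = 5, ncol = 10)

# cor() correlates columns, so transpose to compare rows
# (assumption: distance = 1 - correlation)
d <- 1 - cor(t(toy), method = "pearson")

dim(d)           # nRow x nRow, here 5 x 5
isSymmetric(d)   # d[i, j] equals d[j, i]
diag(d)          # each row is at distance 0 from itself
```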

### Note

The following distance metrics are supported:

• "pearson": Pearson correlation. Note that two curves with identical shape but different magnitude still have a correlation of 1
• "spearman": Spearman rank correlation. As a nonparametric version of the Pearson correlation, it calculates the correlation between the ranks of the data values in the two vectors, which makes it more robust against outliers
• "kendall": Kendall tau rank correlation. Compared to the Spearman rank correlation, it goes a step further by using only the relative ordering to calculate the correlation. For every pair of data points `(x_i, y_i)` and `(x_j, y_j)`, it classifies the pair as concordant (`Nc` in total) if `(x_i - x_j)*(y_i - y_j)>0`, or as discordant (`Nd` in total) if `(x_i - x_j)*(y_i - y_j)<0`. Finally, it calculates the gamma coefficient `(Nc-Nd)/(Nc+Nd)` as a measure of association that is highly resistant to tied data
• "euclidean": Euclidean distance. Unlike the correlation-based distance measures, it takes the magnitude into account (input data should be suitably normalized)
• "manhattan": Cityblock distance. The distance between two vectors is the sum of the absolute values of their differences along each coordinate dimension
• "cos": Cosine similarity. As an uncentered version of the Pearson correlation, it measures the similarity between two vectors of an inner product space as the cosine of the angle between them (computed from their dot product and magnitudes)
• "mi": Mutual information (MI). `MI` provides a general measure of dependencies between variables, including positive, negative and nonlinear correlations. The calculation of `MI` applies an adaptive partitioning method to derive equal-probability bins (i.e., each bin contains approximately the same number of data points). The number of bins is determined heuristically (as a lower bound): `1+log2(n)`, where `n` is the length of the vector. Because `MI` increases with entropy, it is normalized to allow comparison of different pairwise clone similarities: `2*MI/[H(x)+H(y)]`, where `H(x)` and `H(y)` stand for the entropy of the vectors `x` and `y`, respectively
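The formulas in the list above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `gamma_coef`, `entropy` and `normalized_mi` are hypothetical names, rank-based equal-frequency binning stands in for the adaptive partitioning method, and rounding `1+log2(n)` up with `ceiling()` is an assumption.

```r
# Two curves with identical shape but different magnitude
x <- c(1, 3, 2, 5, 4)
y <- 10 * x

pearson_r <- cor(x, y)                # equals 1 up to floating point: scale is ignored
euclid_d  <- sqrt(sum((x - y)^2))     # strictly positive: magnitude matters
cosine    <- sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))

# Gamma coefficient (Nc - Nd) / (Nc + Nd) from concordant/discordant pairs
gamma_coef <- function(x, y) {
    idx <- utils::combn(length(x), 2)             # all pairs with i < j
    s <- (x[idx[1, ]] - x[idx[2, ]]) * (y[idx[1, ]] - y[idx[2, ]])
    Nc <- sum(s > 0)                              # concordant pairs
    Nd <- sum(s < 0)                              # discordant pairs; ties (s == 0) are skipped
    (Nc - Nd) / (Nc + Nd)
}

# Shannon entropy (in bits) from a table of counts
entropy <- function(counts) {
    p <- counts[counts > 0] / sum(counts)
    -sum(p * log2(p))
}

# Normalised mutual information: 2*MI / (H(x) + H(y))
normalized_mi <- function(x, y) {
    n <- length(x)
    nbins <- ceiling(1 + log2(n))                 # assumption: round the heuristic up
    # Equal-frequency bins via ranks (stand-in for adaptive partitioning)
    bin <- function(v) ceiling(rank(v, ties.method = "first") * nbins / n)
    bx <- bin(x)
    by <- bin(y)
    hx  <- entropy(table(bx))
    hy  <- entropy(table(by))
    hxy <- entropy(table(bx, by))                 # joint entropy
    mi  <- hx + hy - hxy                          # MI = H(x) + H(y) - H(x, y)
    2 * mi / (hx + hy)
}

g_same    <- gamma_coef(x, y)     # every pair is concordant
g_flipped <- gamma_coef(x, -y)    # every pair is discordant
v <- 1:16
nmi_self  <- normalized_mi(v, v)  # a vector carries full information about itself
```

With this normalisation the score lies between 0 and 1, which is what makes values comparable across pairs of vectors with different entropies.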

### Examples

```
# 1) generate an iid normal random matrix of 100x10
data <- matrix( rnorm(100*10,mean=0,sd=1), nrow=100, ncol=10)

# 2) calculate distance matrix using different metrics
sMap <- sPipeline(data=data)

# Console output of sPipeline (per-iteration progress lines condensed):
# Start at 2018-01-18 16:56:03
#
# First, define topology of a map grid...
# Second, initialise the codebook matrix (61 X 10) using 'linear' initialisation, given a topology and input data...
# Third, get training at the rough stage (7 iterations)...
# Fourth, get training at the finetune stage (25 iterations)...
# Next, identify the best-matching hexagon/rectangle for the input data...
# Finally, append the response data (hits and mqe) into the sMap object...
#
# Below are the summaries of the training results:
#    dimension of input data: 100x10
#    xy-dimension of map grid: xdim=9, ydim=9, r=5
#    grid lattice: hexa
#    grid shape: suprahex
#    dimension of grid coord: 61x2
#    initialisation method: linear
#    dimension of codebook matrix: 61x10
#    mean quantization error: 4.79218942397731
#
# Below are the details of trainology:
#    training algorithm: batch
#    alpha type: invert
#    training neighborhood kernel: gaussian
#    trainlength (x input data length): 7 at rough stage; 25 at finetune stage
#    radius (at rough stage): from 3 to 1
#    radius (at finetune stage): from 1 to 1
#
# End at 2018-01-18 16:56:03
# Runtime in total is: 0 secs

# 2a) using "pearson" metric
dist <- sDistance(data=data, metric="pearson")
# 2b) using "cos" metric
# dist <- sDistance(data=data, metric="cos")
# 2c) using "spearman" metric
# dist <- sDistance(data=data, metric="spearman")
# 2d) using "kendall" metric
# dist <- sDistance(data=data, metric="kendall")
# 2e) using "euclidean" metric
# dist <- sDistance(data=data, metric="euclidean")
# 2f) using "manhattan" metric
# dist <- sDistance(data=data, metric="manhattan")
# 2g) using "mi" metric
# dist <- sDistance(data=data, metric="mi")
```

## Source code

`sDistance.r`

## Source man

`sDistance.Rd` `sDistance.pdf`

## See also

`sDmatCluster`