Function to set up the pipeline for completing ab initio training given the input data

Description

sPipeline completes ab initio training for the input data. It returns an object of class "sMap".

Usage

sPipeline(data = NULL, xdim = NULL, ydim = NULL, nHex = NULL, lattice = c("hexa", 
  "rect"), shape = c("suprahex", "sheet", "triangle", "diamond", "hourglass", "trefoil", 
      "ladder", "butterfly", "ring", "bridge"), scale = 5, init = c("linear", "uniform", 
      "sample"), algorithm = c("batch", "sequential"), alphaType = c("invert", "linear", 
      "power"), neighKernel = c("gaussian", "bubble", "cutgaussian", "ep", "gamma"), 
      finetuneSustain = FALSE, verbose = TRUE)
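
Several arguments in the usage above list all permitted values as a character vector. By the usual R convention, the first element acts as the default, which agrees with the example output on this page reporting 'linear', 'batch', 'invert' and 'gaussian'. The sketch below illustrates that convention with base R's match.arg. The function demo_choice is hypothetical and is not part of supraHex, and whether sPipeline resolves its arguments through match.arg internally is an assumption.

```r
# Hypothetical helper (not part of supraHex) mimicking how a
# vector-valued default is resolved to a single choice.
demo_choice <- function(lattice = c("hexa", "rect"),
                        algorithm = c("batch", "sequential")) {
  lattice <- match.arg(lattice)      # missing argument -> first element
  algorithm <- match.arg(algorithm)
  list(lattice = lattice, algorithm = algorithm)
}

# No arguments supplied: the first element of each vector is used
default_choice <- demo_choice()

# Explicit values must be one of the listed choices
explicit_choice <- demo_choice(lattice = "rect", algorithm = "sequential")
```

Passing a value outside the listed choices makes match.arg raise an error, which is why a misspelt kernel or shape name fails early rather than silently.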

Arguments

data
a data frame or matrix of input data
xdim
an integer specifying x-dimension of the grid
ydim
an integer specifying y-dimension of the grid
nHex
the number of hexagons/rectangles in the grid
lattice
the grid lattice, either "hexa" for a hexagon or "rect" for a rectangle
shape
the grid shape, either "suprahex" for a supra-hexagonal grid or "sheet" for a hexagonal/rectangle sheet. Also supported are suprahex's variants (including "triangle" for the triangle-shaped variant, "diamond" for the diamond-shaped variant, "hourglass" for the hourglass-shaped variant, "trefoil" for the trefoil-shaped variant, "ladder" for the ladder-shaped variant, "butterfly" for the butterfly-shaped variant, "ring" for the ring-shaped variant, and "bridge" for the bridge-shaped variant)
scale
the scaling factor. Only used when automatically estimating the grid dimension from the input data matrix. By default, it is 5 (big map). Other suggested values: 1 for a small map, and 3 for a medium map
init
an initialisation method. It can be one of "uniform", "sample" and "linear" initialisation methods
algorithm
the training algorithm. It can be either "sequential" or "batch". By default, it uses the "batch" algorithm because it is fast (probably without compromising accuracy). However, the "batch" algorithm is not recommended if the input data contain lots of zeros, because the matrix multiplication it relies on can be problematic in this context. If ample computational resources are at hand, it is always safe to use the "sequential" algorithm
alphaType
the alpha type. It can be one of "invert", "linear" and "power" alpha types
neighKernel
the training neighborhood kernel. It can be one of "gaussian", "bubble", "cutgaussian", "ep" and "gamma" kernels
finetuneSustain
logical to indicate whether to sustain the "finetune" training. If true, it will repeat the "finetune" stage until the mean quantization error gets worse. By default, it is set to false
verbose
logical to indicate whether messages will be displayed on the screen. By default, it is set to true for display
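
The behaviour described for finetuneSustain (repeat the "finetune" stage until the mean quantization error gets worse) can be pictured as a simple loop. The sketch below is only an illustration in base R: sustain_finetune and finetune_once are hypothetical names, not supraHex functions, and keeping the last state before the error worsened, as well as the max_rounds safety cap, are assumptions rather than documented behaviour.

```r
# Hypothetical sketch: repeat a finetune round until the error gets worse.
# `finetune_once` is a placeholder that takes the current state and returns
# a list with the new state and its mean quantization error (mqe).
sustain_finetune <- function(state, mqe, finetune_once, max_rounds = 100) {
  rounds <- 0
  while (rounds < max_rounds) {
    candidate <- finetune_once(state)
    if (candidate$mqe > mqe) break   # error got worse: keep the previous state
    state <- candidate$state
    mqe <- candidate$mqe
    rounds <- rounds + 1
  }
  list(state = state, mqe = mqe, rounds = rounds)
}

# Toy stand-in for training: the error follows a fixed sequence that
# improves three times and then rises.
toy_errors <- c(4.0, 3.5, 3.2, 3.3, 3.0)
toy_step <- function(state) {
  list(state = state + 1, mqe = toy_errors[state + 1])
}
result <- sustain_finetune(state = 0, mqe = 5.0, finetune_once = toy_step)
```

In the toy run the loop accepts three rounds and stops at the fourth, where the error rises from 3.2 to 3.3.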

Value

an object of class "sMap", a list with the following components:

  • nHex: the total number of hexagons/rectangles in the grid
  • xdim: x-dimension of the grid
  • ydim: y-dimension of the grid
  • r: the hypothetical radius of the grid
  • lattice: the grid lattice
  • shape: the grid shape
  • coord: a matrix of nHex x 2, with rows corresponding to the coordinates of all hexagons/rectangles in the 2D map grid
  • polygon: a data frame of three columns ('x','y','id') storing polygon location per hexagon in the 2D map grid
  • init: an initialisation method
  • neighKernel: the training neighborhood kernel
  • codebook: a codebook matrix of nHex x ncol(data), with rows corresponding to prototype vectors in input high-dimensional space
  • hits: a vector of length nHex, each element giving the number of input data vectors that hit the corresponding hexagon/rectangle
  • mqe: the mean quantization error for the "best" BMH
  • call: the call that produced this result
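
For the "suprahex" shape, the components nHex, xdim, ydim and r are tied together. The sketch below assumes a supra-hexagonal grid built from one central hexagon surrounded by r-1 concentric rings, with ring k holding 6k hexagons, which gives nHex = 1 + 3r(r-1) and xdim = ydim = 2r-1. The helper suprahex_size is hypothetical and not part of supraHex; the relationship is checked only against the first example output on this page (r=5, xdim=9, ydim=9, 61 grid coordinates) and is not claimed for the other shapes.

```r
# Hypothetical helper (not part of supraHex): size of a supra-hexagonal
# grid of radius r, assuming one central hexagon plus r-1 concentric
# rings in which ring k contains 6*k hexagons.
suprahex_size <- function(r) {
  stopifnot(r >= 1)
  rings <- seq_len(r - 1)
  nHex <- 1 + sum(6 * rings)
  list(r = r, nHex = nHex, xdim = 2 * r - 1, ydim = 2 * r - 1)
}

# Radius reported in the first example output on this page
size_r5 <- suprahex_size(5)
```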

Note

The pipeline sequentially consists of:

  • i) sTopology used to define the topology of a grid (with "suprahex" shape by default) according to the input data;
  • ii) sInitial used to initialise the codebook matrix given the pre-defined topology and the input data (by default using "linear" initialisation method);
  • iii) sTrainology and sTrainSeq or sTrainBatch used to get the grid map trained at both "rough" and "finetune" stages. If instructed, sustain the "finetune" training until the mean quantization error gets worse;
  • iv) sBMH used to identify the best-matching hexagons/rectangles (BMH) for the input data, and these response data are appended to the resulting object of "sMap" class.
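
The last step above can be pictured with a small base-R sketch: each input vector is assigned to its nearest prototype in the codebook, the hits count how many vectors land on each node, and the mean quantization error averages the distances to the chosen nodes. The helper find_bmh is hypothetical and is not the supraHex implementation of sBMH; the use of plain Euclidean distance is an assumption, and the package's exact error definition may differ.

```r
# Hypothetical helper, not the supraHex implementation of sBMH.
find_bmh <- function(data, codebook) {
  data <- as.matrix(data)
  codebook <- as.matrix(codebook)
  # Euclidean distance from every input vector (rows) to every
  # prototype vector (columns)
  dists <- apply(codebook, 1, function(proto) {
    sqrt(rowSums(sweep(data, 2, proto)^2))
  })
  dists <- matrix(dists, nrow = nrow(data))  # keep matrix shape for one-row input
  # Best-matching node per input vector: the column with the smallest distance
  bmh <- max.col(-dists, ties.method = "first")
  qerr <- dists[cbind(seq_len(nrow(data)), bmh)]
  list(bmh = bmh,
       hits = tabulate(bmh, nbins = nrow(codebook)),
       mqe = mean(qerr))
}

# Toy data: two vectors near the first prototype, one on the second
toy_data <- rbind(c(0, 0), c(0.1, 0), c(5, 5))
toy_codebook <- rbind(c(0, 0), c(5, 5), c(10, 10))
toy_bmh <- find_bmh(toy_data, toy_codebook)
```

With a trained map, the same idea would apply to the codebook component of the returned "sMap" object, whose rows are the prototype vectors.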

References

Hai Fang and Julian Gough. (2014) supraHex: an R/Bioconductor package for tabular omics data analysis using a supra-hexagonal map. Biochemical and Biophysical Research Communications, 443(1), 285-289.

Examples

# 1) generate an iid normal random matrix of 100x10
data <- matrix(rnorm(100*10, mean=0, sd=1), nrow=100, ncol=10)
colnames(data) <- paste(rep('S',10), seq(1:10), sep="")

# 2) get trained using the default setup but with different neighborhood kernels
# 2a) with "gaussian" kernel
sMap <- sPipeline(data=data, neighKernel="gaussian")
Start at 2018-01-18 16:56:09

First, define topology of a map grid (2018-01-18 16:56:09)...
Second, initialise the codebook matrix (61 X 10) using 'linear' initialisation, given a topology and input data (2018-01-18 16:56:09)...
Third, get training at the rough stage (2018-01-18 16:56:09)...
    1 out of 7 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    2 out of 7 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    3 out of 7 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    4 out of 7 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    5 out of 7 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    6 out of 7 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    7 out of 7 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
Fourth, get training at the finetune stage (2018-01-18 16:56:09)...
    1 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    2 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    3 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    4 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    5 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    6 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    7 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    8 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    9 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    10 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    11 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    12 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    13 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    14 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    15 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    16 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    17 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    18 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    19 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    20 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    21 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    22 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    23 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    24 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
    25 out of 25 (2018-01-18 16:56:09)
        updated (2018-01-18 16:56:09)
Next, identify the best-matching hexagon/rectangle for the input data (2018-01-18 16:56:09)...
Finally, append the response data (hits and mqe) into the sMap object (2018-01-18 16:56:09)...

Below are the summaries of the training results:
    dimension of input data: 100x10
    xy-dimension of map grid: xdim=9, ydim=9, r=5
    grid lattice: hexa
    grid shape: suprahex
    dimension of grid coord: 61x2
    initialisation method: linear
    dimension of codebook matrix: 61x10
    mean quantization error: 4.92761300512866
Below are the details of trainology:
    training algorithm: batch
    alpha type: invert
    training neighborhood kernel: gaussian
    trainlength (x input data length): 7 at rough stage; 25 at finetune stage
    radius (at rough stage): from 3 to 1
    radius (at finetune stage): from 1 to 1
End at 2018-01-18 16:56:09
Runtime in total is: 0 secs
# 2b) with "bubble" kernel
# sMap <- sPipeline(data=data, neighKernel="bubble")
# 2c) with "cutgaussian" kernel
# sMap <- sPipeline(data=data, neighKernel="cutgaussian")
# 2d) with "ep" kernel
# sMap <- sPipeline(data=data, neighKernel="ep")
# 2e) with "gamma" kernel
# sMap <- sPipeline(data=data, neighKernel="gamma")

# 3) visualise multiple component planes of a supra-hexagonal grid
visHexMulComp(sMap, colormap="jet", ncolors=20, zlim=c(-1,1), gp=grid::gpar(cex=0.8))

# 4) get trained using the default setup but with the "trefoil" shape and the "sequential" algorithm
sMap <- sPipeline(data=data, shape="trefoil", algorithm=c("batch","sequential")[2])
Start at 2018-01-18 16:56:09

First, define topology of a map grid (2018-01-18 16:56:09)...
Second, initialise the codebook matrix (61 X 10) using 'linear' initialisation, given a topology and input data (2018-01-18 16:56:09)...
Third, get training at the rough stage (2018-01-18 16:56:09)...
    1 out of 700 (2018-01-18 16:56:09)
    70 out of 700 (2018-01-18 16:56:09)
    140 out of 700 (2018-01-18 16:56:09)
    210 out of 700 (2018-01-18 16:56:09)
    280 out of 700 (2018-01-18 16:56:09)
    350 out of 700 (2018-01-18 16:56:09)
    420 out of 700 (2018-01-18 16:56:09)
    490 out of 700 (2018-01-18 16:56:09)
    560 out of 700 (2018-01-18 16:56:09)
    630 out of 700 (2018-01-18 16:56:09)
    700 out of 700 (2018-01-18 16:56:09)
Fourth, get training at the finetune stage (2018-01-18 16:56:09)...
    1 out of 2500 (2018-01-18 16:56:09)
    250 out of 2500 (2018-01-18 16:56:10)
    500 out of 2500 (2018-01-18 16:56:10)
    750 out of 2500 (2018-01-18 16:56:10)
    1000 out of 2500 (2018-01-18 16:56:10)
    1250 out of 2500 (2018-01-18 16:56:10)
    1500 out of 2500 (2018-01-18 16:56:10)
    1750 out of 2500 (2018-01-18 16:56:10)
    2000 out of 2500 (2018-01-18 16:56:10)
    2250 out of 2500 (2018-01-18 16:56:10)
    2500 out of 2500 (2018-01-18 16:56:10)
Next, identify the best-matching hexagon/rectangle for the input data (2018-01-18 16:56:10)...
Finally, append the response data (hits and mqe) into the sMap object (2018-01-18 16:56:10)...

Below are the summaries of the training results:
    dimension of input data: 100x10
    xy-dimension of map grid: xdim=11, ydim=11, r=6
    grid lattice: hexa
    grid shape: trefoil
    dimension of grid coord: 61x2
    initialisation method: linear
    dimension of codebook matrix: 61x10
    mean quantization error: 5.91367829503479
Below are the details of trainology:
    training algorithm: sequential
    alpha type: invert
    training neighborhood kernel: gaussian
    trainlength (x input data length): 7 at rough stage; 25 at finetune stage
    radius (at rough stage): from 3 to 1
    radius (at finetune stage): from 1 to 1
End at 2018-01-18 16:56:10
Runtime in total is: 1 secs
visHexMulComp(sMap, colormap="jet", ncolors=20, zlim=c(-1,1), gp=grid::gpar(cex=0.8))