How to choose the training algorithm: sequential vs batch? @ supraHex 1.13.3

      # The choice of 'sequential vs batch' training algorithm largely depends on the compromise: runtime, accuracy, and the nature of the input data.

# Generally speaking, the 'batch' algorithm should be used when: 1) you care about the runtime, 2) input data is huge (both of rows and columns in number), and 3) input data do not contain too many zero entries. For these reasons, the function sPipeline uses 'batch' algorithm as a default choise.

# However, the 'sequential' algorithm should be favored when: 1) you really care about the accuracy and 2) are not sure the nature of the input data. For these reasons, the function sCompReorder uses 'sequential' algorithm as a default choise.

# Special note: if the input data do contain a great amount of zero entries (very sparse), the 'sequential' algorithm must be used. Otherwise, using the 'batch' algorithm will lead to most of data points being clustered into one or a few of dominant hexagons/nodes, which is usually abnormal (see the example below).

# Generate data with an iid matrix of 100 x 10
data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100,ncol=10)
colnames(data) <- paste('S', seq(1:10), sep="")
# Force those negatives to be zeros, and thus being very sparse
data[data<0] <- 0
# Train using the 'batch' algorthm
sMap <- sPipeline(data, algorithm="batch")

Start at 2017-03-27 18:59:44

First, define topology of a map grid (2017-03-27 18:59:44)...
Second, initialise the codebook matrix (61 X 10) using 'linear' initialisation, given a topology and input data (2017-03-27 18:59:44)...
Third, get training at the rough stage (2017-03-27 18:59:44)...
	1 out of 7 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	2 out of 7 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	3 out of 7 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	4 out of 7 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	5 out of 7 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	6 out of 7 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	7 out of 7 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
Fourth, get training at the finetune stage (2017-03-27 18:59:44)...
	1 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	2 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	3 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	4 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	5 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	6 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	7 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	8 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	9 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	10 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	11 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	12 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	13 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	14 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	15 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	16 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	17 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	18 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	19 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	20 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	21 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	22 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	23 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	24 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
	25 out of 25 (2017-03-27 18:59:44)
	updated (2017-03-27 18:59:44)
Next, identify the best-matching hexagon/rectangle for the input data (2017-03-27 18:59:44)...
Finally, append the response data (hits and mqe) into the sMap object (2017-03-27 18:59:44)...

Below are the summaries of the training results:
   dimension of input data: 100x10
   xy-dimension of map grid: xdim=9, ydim=9, r=5
   grid lattice: hexa
   grid shape: suprahex
   dimension of grid coord: 61x2
   initialisation method: linear
   dimension of codebook matrix: 61x10
   mean quantization error: 1.35513779440565

Below are the details of trainology:
   training algorithm: batch
   alpha type: invert
   training neighborhood kernel: gaussian
   trainlength (x input data length): 7 at rough stage; 25 at finetune stage
   radius (at rough stage): from 3 to 1
   radius (at finetune stage): from 1 to 1

End at 2017-03-27 18:59:44
Runtime in total is: 0 secs

## Look at the number of input data vectors hitting the hexagons
visHexMapping(sMap, mappingType="hits")
# Now, train using the 'sequential' algorthm
sMap <- sPipeline(data, algorithm="sequential")

Start at 2017-03-27 18:59:44

First, define topology of a map grid (2017-03-27 18:59:44)...
Second, initialise the codebook matrix (61 X 10) using 'linear' initialisation, given a topology and input data (2017-03-27 18:59:44)...
Third, get training at the rough stage (2017-03-27 18:59:44)...
	1 out of 700 (2017-03-27 18:59:44)
	70 out of 700 (2017-03-27 18:59:44)
	140 out of 700 (2017-03-27 18:59:44)
	210 out of 700 (2017-03-27 18:59:44)
	280 out of 700 (2017-03-27 18:59:44)
	350 out of 700 (2017-03-27 18:59:44)
	420 out of 700 (2017-03-27 18:59:44)
	490 out of 700 (2017-03-27 18:59:44)
	560 out of 700 (2017-03-27 18:59:44)
	630 out of 700 (2017-03-27 18:59:44)
	700 out of 700 (2017-03-27 18:59:44)
Fourth, get training at the finetune stage (2017-03-27 18:59:44)...
	1 out of 2500 (2017-03-27 18:59:44)
	250 out of 2500 (2017-03-27 18:59:44)
	500 out of 2500 (2017-03-27 18:59:44)
	750 out of 2500 (2017-03-27 18:59:44)
	1000 out of 2500 (2017-03-27 18:59:44)
	1250 out of 2500 (2017-03-27 18:59:44)
	1500 out of 2500 (2017-03-27 18:59:44)
	1750 out of 2500 (2017-03-27 18:59:44)
	2000 out of 2500 (2017-03-27 18:59:44)
	2250 out of 2500 (2017-03-27 18:59:44)
	2500 out of 2500 (2017-03-27 18:59:44)
Next, identify the best-matching hexagon/rectangle for the input data (2017-03-27 18:59:44)...
Finally, append the response data (hits and mqe) into the sMap object (2017-03-27 18:59:44)...

Below are the summaries of the training results:
   dimension of input data: 100x10
   xy-dimension of map grid: xdim=9, ydim=9, r=5
   grid lattice: hexa
   grid shape: suprahex
   dimension of grid coord: 61x2
   initialisation method: linear
   dimension of codebook matrix: 61x10
   mean quantization error: 1.88486291245215

Below are the details of trainology:
   training algorithm: sequential
   alpha type: invert
   training neighborhood kernel: gaussian
   trainlength (x input data length): 7 at rough stage; 25 at finetune stage
   radius (at rough stage): from 3 to 1
   radius (at finetune stage): from 1 to 1

End at 2017-03-27 18:59:44
Runtime in total is: 0 secs


## Look at the number of input data vectors hitting the hexagons
visHexMapping(sMap, mappingType="hits")
Source faq

FAQ6.r
How to choose the training algorithm: sequential vs batch?

Source faq

Functions used in this FAQ