How to choose the training algorithm: sequential vs batch?

Notes:
  • All results are based on supraHex (version 1.13.3).
    # The choice between the 'sequential' and 'batch' training algorithms is largely a trade-off among runtime, accuracy, and the nature of the input data.
    # Generally speaking, the 'batch' algorithm should be used when: 1) you care about the runtime, 2) the input data are huge (in the number of both rows and columns), and 3) the input data do not contain too many zero entries. For these reasons, the function sPipeline uses the 'batch' algorithm by default.
    # However, the 'sequential' algorithm should be favored when: 1) you really care about the accuracy, and 2) you are not sure about the nature of the input data. For these reasons, the function sCompReorder uses the 'sequential' algorithm by default.
    # Special note: if the input data contain a large number of zero entries (that is, they are very sparse), the 'sequential' algorithm must be used. Otherwise, the 'batch' algorithm will cluster most data points into one or a few dominant hexagons/nodes, which is usually abnormal (see the example below).

    # Load the package
    library(supraHex)
    # Generate an iid (standard normal) matrix of 100 x 10
    data <- matrix(rnorm(100*10,mean=0,sd=1), nrow=100,ncol=10)
    colnames(data) <- paste('S', 1:10, sep="")
    # Force the negative values to zero, making the data very sparse
    data[data<0] <- 0
    # Train using the 'batch' algorithm
    sMap <- sPipeline(data, algorithm="batch")
    Start at 2017-03-27 18:59:44
    First, define topology of a map grid (2017-03-27 18:59:44)...
    Second, initialise the codebook matrix (61 X 10) using 'linear' initialisation, given a topology and input data (2017-03-27 18:59:44)...
    Third, get training at the rough stage (2017-03-27 18:59:44)...
        1 out of 7 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        2 out of 7 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        3 out of 7 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        4 out of 7 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        5 out of 7 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        6 out of 7 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        7 out of 7 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
    Fourth, get training at the finetune stage (2017-03-27 18:59:44)...
        1 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        2 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        3 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        4 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        5 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        6 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        7 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        8 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        9 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        10 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        11 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        12 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        13 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        14 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        15 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        16 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        17 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        18 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        19 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        20 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        21 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        22 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        23 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        24 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
        25 out of 25 (2017-03-27 18:59:44) updated (2017-03-27 18:59:44)
    Next, identify the best-matching hexagon/rectangle for the input data (2017-03-27 18:59:44)...
    Finally, append the response data (hits and mqe) into the sMap object (2017-03-27 18:59:44)...
    Below are the summaries of the training results:
        dimension of input data: 100x10
        xy-dimension of map grid: xdim=9, ydim=9, r=5
        grid lattice: hexa
        grid shape: suprahex
        dimension of grid coord: 61x2
        initialisation method: linear
        dimension of codebook matrix: 61x10
        mean quantization error: 1.35513779440565
    Below are the details of trainology:
        training algorithm: batch
        alpha type: invert
        training neighborhood kernel: gaussian
        trainlength (x input data length): 7 at rough stage; 25 at finetune stage
        radius (at rough stage): from 3 to 1
        radius (at finetune stage): from 1 to 1
    End at 2017-03-27 18:59:44
    Runtime in total is: 0 secs
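
    The special note on sparse input can be turned into a quick check before training. The sketch below uses base R only; zero_fraction and suggest_algorithm are hypothetical helpers (not supraHex functions), and the 0.3 cut-off is an illustrative assumption rather than a threshold recommended by the package.

```r
# Hypothetical helper (not part of supraHex): proportion of exact zeros
# among the non-missing entries of a numeric matrix.
zero_fraction <- function(x) {
    x <- as.matrix(x)
    mean(x == 0, na.rm = TRUE)
}

# Hypothetical rule of thumb: suggest 'sequential' when the data look sparse.
# The default cut-off of 0.3 is an assumption chosen for illustration only.
suggest_algorithm <- function(x, max_zero_fraction = 0.3) {
    if (zero_fraction(x) > max_zero_fraction) "sequential" else "batch"
}

# Mimic the simulated data used in this FAQ (the seed is arbitrary)
set.seed(1)
demo <- matrix(rnorm(100 * 10, mean = 0, sd = 1), nrow = 100, ncol = 10)
suggest_algorithm(demo)    # dense data: no exact zeros expected

demo[demo < 0] <- 0        # roughly half of the entries become zero
zero_fraction(demo)
suggest_algorithm(demo)
```

    The returned string could then be passed on as the algorithm argument of sPipeline, in the same way as the literal "batch" and "sequential" values used in this FAQ.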
    ## Look at the number of input data vectors hitting the hexagons
    visHexMapping(sMap, mappingType="hits")

    # Now, train using the 'sequential' algorithm
    sMap <- sPipeline(data, algorithm="sequential")
    Start at 2017-03-27 18:59:44
    First, define topology of a map grid (2017-03-27 18:59:44)...
    Second, initialise the codebook matrix (61 X 10) using 'linear' initialisation, given a topology and input data (2017-03-27 18:59:44)...
    Third, get training at the rough stage (2017-03-27 18:59:44)...
        1 out of 700 (2017-03-27 18:59:44)
        70 out of 700 (2017-03-27 18:59:44)
        140 out of 700 (2017-03-27 18:59:44)
        210 out of 700 (2017-03-27 18:59:44)
        280 out of 700 (2017-03-27 18:59:44)
        350 out of 700 (2017-03-27 18:59:44)
        420 out of 700 (2017-03-27 18:59:44)
        490 out of 700 (2017-03-27 18:59:44)
        560 out of 700 (2017-03-27 18:59:44)
        630 out of 700 (2017-03-27 18:59:44)
        700 out of 700 (2017-03-27 18:59:44)
    Fourth, get training at the finetune stage (2017-03-27 18:59:44)...
        1 out of 2500 (2017-03-27 18:59:44)
        250 out of 2500 (2017-03-27 18:59:44)
        500 out of 2500 (2017-03-27 18:59:44)
        750 out of 2500 (2017-03-27 18:59:44)
        1000 out of 2500 (2017-03-27 18:59:44)
        1250 out of 2500 (2017-03-27 18:59:44)
        1500 out of 2500 (2017-03-27 18:59:44)
        1750 out of 2500 (2017-03-27 18:59:44)
        2000 out of 2500 (2017-03-27 18:59:44)
        2250 out of 2500 (2017-03-27 18:59:44)
        2500 out of 2500 (2017-03-27 18:59:44)
    Next, identify the best-matching hexagon/rectangle for the input data (2017-03-27 18:59:44)...
    Finally, append the response data (hits and mqe) into the sMap object (2017-03-27 18:59:44)...
    Below are the summaries of the training results:
        dimension of input data: 100x10
        xy-dimension of map grid: xdim=9, ydim=9, r=5
        grid lattice: hexa
        grid shape: suprahex
        dimension of grid coord: 61x2
        initialisation method: linear
        dimension of codebook matrix: 61x10
        mean quantization error: 1.88486291245215
    Below are the details of trainology:
        training algorithm: sequential
        alpha type: invert
        training neighborhood kernel: gaussian
        trainlength (x input data length): 7 at rough stage; 25 at finetune stage
        radius (at rough stage): from 3 to 1
        radius (at finetune stage): from 1 to 1
    End at 2017-03-27 18:59:44
    Runtime in total is: 0 secs
    ## Look at the number of input data vectors hitting the hexagons
    visHexMapping(sMap, mappingType="hits")
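
    Besides inspecting the hit maps visually, the concentration of hits can be summarised numerically. The sketch below is base R only; summarise_hits is a hypothetical helper (not a supraHex function), and the two hit vectors are made-up toy values chosen to illustrate a collapsed and an evenly spread map, not results from the runs above. It also assumes that the per-hexagon hits are available as the 'hits' component of the sMap object, as the training log suggests.

```r
# Hypothetical helper (not part of supraHex): summarise how unevenly the
# data points are spread over the hexagons, given the hits per hexagon.
summarise_hits <- function(hits) {
    hits <- as.numeric(hits)
    c(
        n_hexagons = length(hits),          # number of hexagons in the map
        n_empty    = sum(hits == 0),        # hexagons hit by no data point
        max_hits   = max(hits),             # hits in the busiest hexagon
        top_share  = max(hits) / sum(hits)  # share of data in that hexagon
    )
}

# Toy vectors (made up for illustration): 61 hexagons, 100 data points each
collapsed <- c(90, 10, rep(0, 59))       # most points in one dominant hexagon
spread    <- c(rep(2, 39), rep(1, 22))   # points spread over all hexagons

summarise_hits(collapsed)
summarise_hits(spread)

# With a trained map, assuming hits are stored in the 'hits' component:
# summarise_hits(sMap$hits)
```

    A large n_empty together with a top_share close to 1 would correspond to the abnormal pattern described in the special note, where most data points fall into one or a few dominant hexagons.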

    Source faq

    FAQ6.r
