Algorithm Overview - SVM

Algorithm Description 

SVM stands for Support Vector Machine. The objective of the SVM algorithm is to draw a line (in fact, often a very complex multi-dimensional curve) that separates the data in a way that minimizes error. For example, in a classification problem (see picture below), finding a line that best separates green from blue dots is straightforward. Once determined, that line can be used to decide whether a record is "blue" or "green" with 100% accuracy knowing just two inputs (x1 and x2). Of course, in real examples, the data do not separate so cleanly. The purpose of the SVM algorithm is to use the data to find a curve through multidimensional space that best separates blue from green.
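
As a minimal sketch of the simplest case, a straight-line boundary in two inputs can be written as w1*x1 + w2*x2 + b = 0, and a record is classified by the side of the line it falls on. The weights, intercept, and points below are made-up values chosen for illustration; they are not output from Lityx IQ.

```python
# Hypothetical separating line: x1 + x2 - 5 = 0 (illustrative values only).
WEIGHTS = (1.0, 1.0)   # assumed coefficients for x1 and x2
INTERCEPT = -5.0       # assumed intercept b


def decision_value(x1, x2, weights=WEIGHTS, intercept=INTERCEPT):
    """Signed score: positive on one side of the line, negative on the other."""
    return weights[0] * x1 + weights[1] * x2 + intercept


def classify(x1, x2):
    """Label a record by the side of the line on which it falls."""
    return "green" if decision_value(x1, x2) >= 0 else "blue"


# Four made-up records described by two inputs (x1, x2).
points = [(1.0, 2.0), (4.0, 3.0), (0.5, 0.5), (6.0, 1.0)]
labels = [classify(x1, x2) for x1, x2 in points]
print(labels)
```

With real data the boundary is learned from the records rather than fixed by hand, and it is usually not a straight line.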

 

 

Additional Links  

https://en.wikipedia.org/wiki/Support_vector_machine 

https://web.stanford.edu/~hastie/Papers/ESLII.pdf#page=153 

 

Lityx IQ Parameters  

 

Kernel - The kernel is the method by which the mathematical relationship is produced. Linear is the simplest, provides the fastest run times, and can be considered similar to a linear regression line. The other options allow more complex patterns to be detected, at the cost of longer run times and a greater risk of over-fitting. Options include:

  • Linear: fit a straight line
  • Polynomial: fit a curved line (see Polynomial Degree)
  • Radial basis: uses distance (similarity) between the observations as the basis for separation
  • Sigmoid (also known as Hyperbolic Tangent): uses the tanh function to model the relationship, so it can pick up highly non-linear relationships in the data
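
The four options can be sketched as functions of two records u and v. The formulas below are the conventional textbook definitions and are an assumption here; this article does not state the exact formulas Lityx IQ uses. The function and argument names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two records given as equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    # Straight-line relationship: just the inner product.
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, intercept=0.0, degree=3):
    # Curved relationship; degree controls how complex the curve can be.
    return (gamma * dot(u, v) + intercept) ** degree


def radial_basis_kernel(u, v, gamma=1.0):
    # Similarity based on distance: 1.0 for identical records,
    # approaching 0.0 as the records move apart.
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, intercept=0.0):
    # Hyperbolic tangent of a scaled, shifted inner product.
    return math.tanh(gamma * dot(u, v) + intercept)


u = (1.0, 2.0)
v = (2.0, 0.5)
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, gamma=0.5, intercept=1.0, degree=2))
print(radial_basis_kernel(u, v, gamma=0.5))
print(sigmoid_kernel(u, v, gamma=0.5))
```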

Polynomial Degree - This is only used if the polynomial kernel is selected. Higher values increase processing time but allow more complex relationships to be modeled.

Kernel Intercept - For the polynomial and sigmoid kernels, this is the additive intercept within the kernel definition.

Gamma - For all kernels except linear, this is the multiplicative parameter within the kernel definition. Leave it at 0 to use the internal default, which is 1 divided by the number of variables.
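
The default rule can be sketched as follows, together with its effect on a radial basis similarity. The helper names are hypothetical, and the radial basis formula is the conventional one, assumed here rather than taken from Lityx IQ.

```python
import math


def resolve_gamma(gamma, n_variables):
    """Return gamma, substituting 1 / n_variables when it is left at 0."""
    if gamma == 0:
        return 1.0 / n_variables
    return gamma


def rbf_similarity(u, v, gamma):
    """Conventional radial basis similarity (assumed form)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


# Two made-up records with four variables each.
u = (1.0, 0.0, 2.0, 1.0)
v = (0.0, 1.0, 2.0, 3.0)

default_gamma = resolve_gamma(0, n_variables=len(u))   # 1 / 4
similarity_default = rbf_similarity(u, v, default_gamma)
similarity_large = rbf_similarity(u, v, resolve_gamma(2.0, len(u)))

# A larger gamma makes similarity fall off faster with distance.
print(default_gamma, similarity_default, similarity_large)
```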

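Cost Coefficient - The cost factor associated with constraint violations in the regularization term of the Lagrange formulation. This parameter helps control overfitting: larger values penalize misclassified training records more heavily, so the model fits the training data more closely.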
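
One common way to see the role of this parameter is the standard soft-margin objective for a linear SVM, sketched below: a margin term plus the cost coefficient times the total constraint violation. This is the textbook formulation and is assumed here; the weights, records, and function names are made up for illustration.

```python
def hinge_loss(score, label):
    """Violation for one record; zero when it is on the correct side
    of the boundary with a margin of at least 1. Labels are +1 or -1."""
    return max(0.0, 1.0 - label * score)


def soft_margin_objective(weights, intercept, records, labels, cost):
    """0.5 * ||w||^2 + cost * (sum of hinge-loss violations)."""
    margin_term = 0.5 * sum(w * w for w in weights)
    violation_term = 0.0
    for record, label in zip(records, labels):
        score = sum(w * x for w, x in zip(weights, record)) + intercept
        violation_term += hinge_loss(score, label)
    return margin_term + cost * violation_term


# Made-up data: the third record lies exactly on the boundary,
# so it violates the margin by 1.
records = [(2.0, 2.0), (0.0, 0.0), (1.0, 1.0)]
labels = [1, -1, -1]
weights = (1.0, 1.0)
intercept = -2.0

low_cost = soft_margin_objective(weights, intercept, records, labels, cost=1.0)
high_cost = soft_margin_objective(weights, intercept, records, labels, cost=10.0)
print(low_cost, high_cost)
```

The same violation contributes ten times as much to the objective at the higher cost, which pushes training toward boundaries that fit the training records more tightly.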

Tolerance - Controls the stopping criterion of the training process. Higher values shorten processing time but may lead to underfitting.
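
The trade-off can be illustrated with a toy iterative optimizer that stops once its gradient falls below the tolerance. This is not the SVM training routine, and the actual stopping rule used by Lityx IQ is not described here; the function and its arguments are hypothetical.

```python
def minimize_with_tolerance(tolerance, start=0.0, target=3.0,
                            step=0.1, max_iterations=10_000):
    """Gradient descent on (x - target)^2, stopping when the gradient
    magnitude drops below the tolerance. Returns (x, iterations used)."""
    x = start
    iterations = 0
    while iterations < max_iterations:
        gradient = 2.0 * (x - target)
        if abs(gradient) < tolerance:
            break
        x -= step * gradient
        iterations += 1
    return x, iterations


loose_x, loose_iterations = minimize_with_tolerance(tolerance=0.1)
tight_x, tight_iterations = minimize_with_tolerance(tolerance=0.001)

# The looser tolerance stops sooner but ends farther from the optimum.
print(loose_iterations, abs(loose_x - 3.0))
print(tight_iterations, abs(tight_x - 3.0))
```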

Maximum Number of Model Terms - The maximum number of terms used during the variable selection process. Larger values may increase processing time, but too small a value may miss important variables.