Support Vector Machines:
- Suppose a 2D feature vector
  - As in, 2 features
  - That's not very many
- When we graph them, we get groups that can be separated
  - Good! So our hypothesis is a line
- We'll need to calculate a line that maximizes the margin between the points
- What if we had three features?
  - We'd have to graph them in 3D
  - Harder to draw, but I'm sure we can picture it
  - The hypothesis is now a plane instead of a line
- How about 4 features?
  - Now it's hard to imagine, not just draw!
  - It's a hyperplane
- Hyperplane = subspace with one less dimension than its space
  - So a line is a hyperplane in 2D
  - n-dimensional space has hyperplanes with n-1 dimensions

Representing a hyperplane:
- Normal vector and offset along that vector
  - w is the normal vector, b is the offset (formulas collected after these notes)
- That's not the only way to describe a hyperplane
  - Back to 3D: any two linearly independent vectors span a plane
- But in an SVM, we'll use w and b

Hard-margin SVM:
- Define two parallel hyperplanes, with the margin between them
- Follow the hyperplane's normal vector to the data to compute distance
  - Is there a closed form for that? And is it important that there be? Yes!
  - Distance from a point to a plane (written out below)
  - We'll avoid getting into the weeds on this point
- Data between the planes is uncertain
  - If pressed, we can use the middle
  - There wasn't any training data there
- To train it, we calculate that hyperplane
  - But how?
- Consider it a constraint that no training examples are in the margin
- The data closest to the hyperplanes are the support vectors
- This forms an optimization problem: find the largest margin
  - Given the support vectors
- Not quite done yet: what if the training data can't be separated?
  - As in, it's not linearly separable
  - Soft-margin SVM and the parameter C
- Solved with quadratic programming (not computer programming)

Kernel trick:
- Why does distance have to be calculated only in the standard dimensionality?
  - Couldn't we have a more flexible definition of distance?
- Kernel function: offload the distance calculation to a "kernel"
- The RBF kernel features in a lot of example pictures
  - Loosely, it's a function of distance from a given point
  - There's a bit more going on in there than just that, though
- There are a fair number of other kernels out there
  - You can substitute a different kernel
- Practical note: OK as long as your feature vectors match your kernel
  - So the feature vectors could be graphs!
- A use I made of SVM: graph kernel, no feature vectors
  - Precomputed as a table on a cluster (sketch at the end of these notes)
  - Looks like I was using Aeolus SVM in Weka
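For reference, the standard formulas behind the hyperplane representation and the point-to-plane distance mentioned above (textbook material, collected here rather than derived):

```latex
\[
\text{hyperplane: } \{\, x \;:\; w \cdot x + b = 0 \,\}, \qquad
d(x_0) \;=\; \frac{\lvert w \cdot x_0 + b \rvert}{\lVert w \rVert}
\]
```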
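And the optimization problems from the hard-margin and soft-margin discussion, in their usual form. With labels y_i in {-1, +1}, the two margin planes are w·x + b = ±1, the margin width is 2/||w||, and the soft-margin version adds slack variables ξ_i with C trading margin width against violations. Both are quadratic programs:

```latex
\[
\text{hard margin:} \quad \min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \ \forall i
\]
\[
\text{soft margin:} \quad \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w \rVert^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0
\]
```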
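"Loosely, distance from a given point" in formula form: the RBF kernel turns squared distance into a similarity, with γ controlling how quickly it falls off:

```latex
\[
K_{\mathrm{RBF}}(x, x') \;=\; \exp\!\left(-\gamma\,\lVert x - x' \rVert^2\right)
\]
```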
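The notes themselves used Weka; purely as an illustration, here is a minimal sketch of the 2-feature case in Python with scikit-learn (my substitution, with invented toy data). A large C approximates the hard margin, and w, b, and the support vectors can be read back off the fitted model:

```python
# Minimal sketch, assuming scikit-learn (the notes used Weka, not this library).
import numpy as np
from sklearn.svm import SVC

# Two features, two separable groups (toy data, invented for illustration).
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class -1
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C heavily penalizes margin violations, approximating a hard margin.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset along that vector
print("w =", w, "b =", b)
print("margin width =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```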
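Finally, the graph-kernel use case: when there are no feature vectors, the kernel values can be precomputed into a table (a Gram matrix) and handed to the SVM directly. A sketch of that pattern, again in scikit-learn rather than Weka; the graph_kernel function here is a hypothetical stand-in, not the kernel actually computed on the cluster:

```python
# Sketch of the precomputed-kernel pattern (scikit-learn stand-in for Weka).
import numpy as np
from sklearn.svm import SVC

def graph_kernel(g1, g2):
    """Hypothetical stand-in for a real graph kernel; in the notes this
    table was precomputed on a cluster. Shared-edge count is an
    intersection kernel, so it is symmetric positive semidefinite."""
    return float(len(g1 & g2))

# Inputs are graphs (here, just edge sets), not feature vectors.
graphs = [{("a", "b"), ("b", "c")},
          {("a", "b"), ("c", "d")},
          {("x", "y"), ("y", "z")},
          {("x", "y"), ("w", "z")}]
y = np.array([1, 1, -1, -1])

# Precompute the n x n kernel table (Gram matrix) over the training graphs.
n = len(graphs)
K = np.array([[graph_kernel(graphs[i], graphs[j]) for j in range(n)]
              for i in range(n)])

clf = SVC(kernel="precomputed")
clf.fit(K, y)

# Prediction needs kernel values between each new graph and every training graph.
K_test = np.array([[graph_kernel(g, gt) for gt in graphs]
                   for g in [{("a", "b"), ("b", "d")}]])
print(clf.predict(K_test))
```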