Why learn a solution? If you can solve the problem an easier way, that might be better. But some reasons to learn:
- The solution changes over time and needs to be updated
- The solution depends on the install location
- The solution has too many parameters
- The solution is too complicated to figure out by hand

Let's talk a little about how to deal with a bunch of parameters:
- This kind of situation comes up pretty often
- Example: detecting purple dye in microscope images
- Finding a local maximum or minimum
- Evolutionary strategies

Last time we covered decision trees:
- They're good at some things

Weka:
- Note: how's the online class supposed to see this?

Tennis dataset:
- Kind of a classic, but not many examples
- Training and test sets will be pretty small!
- .arff format, etc.
- Let's make a tree for it!

Cross-validation:
- Consider that the training set and test set are probably pulled from the same data
- Could we just pick a different set of examples to hold out for testing?
- Cross-validation: hold 10% back for testing, but do it 10 times, each time with a different 10%
- Each fold might generate a slightly different hypothesis
- So we're validating a process, not an exact hypothesis
- The final version could use all the data for training

Naive Bayes:
- Bayes' theorem and conditional probability
- The naive assumption, and why it's naive
- Note for classification: we only need to compare classifications
  - So the probabilities don't really have to sum to 1
  - In defiance of the usual practice for other things...

Think about a problem: spam detection
- One method is to create a feature vector for words
- Each feature is a word; the value in the vector is the number of occurrences
- So there will be a lot of features, but many of them will be only marginally significant
- This is a poor case for a decision tree
- Works well for naive Bayes!
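The 10-fold procedure above can be sketched in plain Python. The `k_fold_splits` helper and the majority-class "learner" here are invented for illustration; a real run would plug in the actual training algorithm (e.g. J48 in Weka) in place of `train_majority`.

```python
import random

def k_fold_splits(examples, k=10, seed=0):
    """Shuffle once, then yield (train, test) pairs, each holding
    out a different 1/k of the data as the test set."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

def train_majority(train):
    """Toy stand-in learner: predict the most common label seen in
    training (examples are (features, label) pairs; features unused)."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

# Invented toy dataset: 30 examples, two-thirds labeled "yes"
examples = [({"x": i}, "yes" if i % 3 else "no") for i in range(30)]

accuracies = []
for train, test in k_fold_splits(examples, k=10):
    hypothesis = train_majority(train)  # each fold may yield a different hypothesis
    correct = sum(1 for _, label in test if label == hypothesis)
    accuracies.append(correct / len(test))

# The mean estimates the quality of the *process*, not of one exact model
print(sum(accuracies) / len(accuracies))
```

The final model could then be trained once on all 30 examples; the cross-validated accuracy is the estimate of how well that training process generalizes.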
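A minimal sketch of the word-count idea for spam detection, assuming an invented four-message corpus. Note that `classify` compares unnormalized log-scores directly: since we only need to compare classifications, there is no need to normalize the scores into probabilities that sum to 1.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (word_list, label). Collect per-class document
    counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = {}  # label -> Counter of word occurrences
    vocab = set()
    for words, label in docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab

def score(words, label, class_docs, word_counts, vocab):
    """Unnormalized log-posterior: log P(label) + sum of log P(word|label),
    with Laplace (+1) smoothing so unseen words don't zero everything out."""
    s = math.log(class_docs[label] / sum(class_docs.values()))
    total_words = sum(word_counts[label].values())
    for w in words:
        s += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
    return s

def classify(words, model):
    class_docs, word_counts, vocab = model
    # Compare raw scores across labels; no normalization needed
    return max(class_docs,
               key=lambda lbl: score(words, lbl, class_docs, word_counts, vocab))

# Tiny invented corpus for illustration
docs = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule today".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train_nb(docs)
print(classify("free money".split(), model))
print(classify("meeting tomorrow".split(), model))
```

Training really is just counting: each marginally significant word nudges the log-score a little, which is exactly why a huge, weakly informative vocabulary suits naive Bayes better than a decision tree.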
A few benefits of naive Bayes:
- It's easy to see what each feature accomplishes
- Training consists of straightforward statistics (C4.5/J48 had this too)
- A large number of marginally significant features can be represented
- The output carries a level of certainty

Not everything is a benefit:
- Takes longer to run than a decision tree
  - This includes assessing the effect of useless features
- Doesn't really take "or" or "xor" relationships into account
  - Each feature is independent
  - Each feature always pushes toward the same classification
  - The effect of a feature can't change in response to other features
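The "xor" limitation can be seen numerically. In this sketch (an invented two-feature XOR dataset), each feature value is equally likely under both classes, so every conditional probability comes out to 1/2: both labels get identical scores for every input, and the classifier has no way to express "positive only when exactly one input is set."

```python
from collections import Counter

# XOR truth table: label = a ^ b. Four examples, perfectly balanced.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Per-class conditional counts: (feature_index, value, label) -> count
cond = {}
prior = Counter()
for (a, b), label in examples:
    prior[label] += 1
    for i, v in enumerate((a, b)):
        cond[(i, v, label)] = cond.get((i, v, label), 0) + 1

def nb_score(features, label):
    """Unnormalized naive Bayes score: P(label) * prod P(feature_i | label)."""
    p = prior[label] / sum(prior.values())
    for i, v in enumerate(features):
        p *= cond.get((i, v, label), 0) / prior[label]
    return p

# Every input scores 0.5 * 0.5 * 0.5 = 0.125 for BOTH labels: a tie,
# because no single feature, taken independently, favors either class.
for features, true_label in examples:
    print(features, nb_score(features, 0), nb_score(features, 1))
```

A decision tree handles XOR easily by splitting on one feature and then letting the second split's meaning depend on the first, which is exactly the feature interaction naive Bayes cannot model.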