Why learn a solution? If you can solve the problem an easier way, that might be better. But some reasons to learn:
- The solution changes over time and needs to be updated
- The solution depends on the install location
- The solution has too many parameters
- The solution is too complicated to figure out by hand

Let's talk a little about how to deal with a bunch of parameters:
- This kind of situation comes up pretty often
- Example: detecting purple dye in microscope images
- Finding a local maximum or minimum
- Evolutionary strategies

Last time we covered decision trees:
- They're good at some things

Weka:
- Note: how's the online class supposed to see this?

Tennis dataset:
- Kind of a classic, but not many examples
- Training and test sets will be pretty small!
- .arff format, etc.
- Let's make a tree for it!

Cross-validation:
- Consider that the training set and test set are probably pulled from the same data
- Could we just pick a different set of examples to hold out for testing?
- Cross-validation: hold 10% back for testing, but do it 10 times, each time with a different 10%
- Each fold might generate a slightly different hypothesis
- So we're validating a process, not an exact hypothesis
- The final version could use all the data for training

Naive Bayes:
- Bayes' theorem and conditional probability
- The naive assumption, and why it's naive
- Note for classification: we only need to compare classifications
  - So the probabilities don't really have to sum to 1
  - In defiance of the usual practice for other things...

Think about a problem: spam detection
- One method is to create a feature vector for words
- Each feature is a word; the value in the vector is the number of occurrences
- So there will be a lot of features, but many of them will be only marginally significant
- This is a poor case for a decision tree
- Works well for naive Bayes!
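The 10-fold procedure above can be sketched in plain Python. The `k_fold_splits` helper and the majority-class "learner" here are invented for illustration; a real run would plug in the actual training algorithm (e.g. J48 in Weka) in place of `train_majority`.

```python
import random

def k_fold_splits(examples, k=10, seed=0):
    """Shuffle once, then yield (train, test) pairs, each holding
    out a different 1/k of the data as the test set."""
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test

def train_majority(train):
    """Toy stand-in learner: predict the most common label seen in
    training (examples are (features, label) pairs; features unused)."""
    labels = [label for _, label in train]
    return max(set(labels), key=labels.count)

# Invented toy dataset: 30 examples, two-thirds labeled "yes"
examples = [({"x": i}, "yes" if i % 3 else "no") for i in range(30)]

accuracies = []
for train, test in k_fold_splits(examples, k=10):
    hypothesis = train_majority(train)  # each fold may yield a different hypothesis
    correct = sum(1 for _, label in test if label == hypothesis)
    accuracies.append(correct / len(test))

# The mean estimates the quality of the *process*, not of one exact model
print(sum(accuracies) / len(accuracies))
```

The final model could then be trained once on all 30 examples; the cross-validated accuracy is the estimate of how well that training process generalizes.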
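A minimal sketch of the word-count idea for spam detection, assuming an invented four-message corpus. Note that `classify` compares unnormalized log-scores directly: since we only need to compare classifications, there is no need to normalize the scores into probabilities that sum to 1.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (word_list, label). Collect per-class document
    counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = {}  # label -> Counter of word occurrences
    vocab = set()
    for words, label in docs:
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab

def score(words, label, class_docs, word_counts, vocab):
    """Unnormalized log-posterior: log P(label) + sum of log P(word|label),
    with Laplace (+1) smoothing so unseen words don't zero everything out."""
    s = math.log(class_docs[label] / sum(class_docs.values()))
    total_words = sum(word_counts[label].values())
    for w in words:
        s += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
    return s

def classify(words, model):
    class_docs, word_counts, vocab = model
    # Compare raw scores across labels; no normalization needed
    return max(class_docs,
               key=lambda lbl: score(words, lbl, class_docs, word_counts, vocab))

# Tiny invented corpus for illustration
docs = [
    ("win money now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule today".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train_nb(docs)
print(classify("free money".split(), model))
print(classify("meeting tomorrow".split(), model))
```

Training really is just counting: each marginally significant word nudges the log-score a little, which is exactly why a huge, weakly informative vocabulary suits naive Bayes better than a decision tree.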
A few benefits of naive Bayes:
- It's easy to see what each feature accomplishes
- Training consists of straightforward statistics (C4.5/J48 had this too)
- A large number of marginally significant features can be represented
- The output carries a level of certainty

Not everything is a benefit:
- Takes longer to run than a decision tree
  - This includes assessing the effect of useless features
- Doesn't really take "or" or "xor" relationships into account
  - Each feature is independent
  - Each feature always pushes toward the same classification
  - The effect of a feature can't change in response to other features
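The "xor" limitation can be seen numerically. In this sketch (an invented two-feature XOR dataset), each feature value is equally likely under both classes, so every conditional probability comes out to 1/2: both labels get identical scores for every input, and the classifier has no way to express "positive only when exactly one input is set."

```python
from collections import Counter

# XOR truth table: label = a ^ b. Four examples, perfectly balanced.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# Per-class conditional counts: (feature_index, value, label) -> count
cond = {}
prior = Counter()
for (a, b), label in examples:
    prior[label] += 1
    for i, v in enumerate((a, b)):
        cond[(i, v, label)] = cond.get((i, v, label), 0) + 1

def nb_score(features, label):
    """Unnormalized naive Bayes score: P(label) * prod P(feature_i | label)."""
    p = prior[label] / sum(prior.values())
    for i, v in enumerate(features):
        p *= cond.get((i, v, label), 0) / prior[label]
    return p

# Every input scores 0.5 * 0.5 * 0.5 = 0.125 for BOTH labels: a tie,
# because no single feature, taken independently, favors either class.
for features, true_label in examples:
    print(features, nb_score(features, 0), nb_score(features, 1))
```

A decision tree handles XOR easily by splitting on one feature and then letting the second split's meaning depend on the first, which is exactly the feature interaction naive Bayes cannot model.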