Artificial Intelligence Introduction:
- Artificial Intelligence includes Machine Learning
- But machine learning isn't all there is to AI

"Solved" games: think about tic-tac-toe
- 9 possible first moves, but only 3 unique (center, corner, or side)
- Each of these leads to 8 responses, but not all are unique:
  - Center: 2 (side or corner)
  - Corner: 5
  - Side: 5
- So there are 12 possible openings, each of which continues
- A game is no more than 9 moves, and the last move is forced
- We're 2 moves in, and the forced move doesn't count, leaving 6 choice points
- If the average number of options is 4: 4^6 = 4096
- But a lot of those lead to an immediate loss!
- So how many possible games are there? No more than 12 * 4096 = 49,152
- From any point, a computer could calculate all possible outcomes: a minimax tree (sketched in code below)

Chess:
- I can't find the quote I remember about this
- At any rate, chess computers don't have to be intelligent: minimax tree search
- No need for elaborate pruning with tic-tac-toe, but pruning is very important with chess!
  - More pruning = longer lookahead
  - Too much pruning = missed important moves
- How many moves can you make at each turn? This branching factor varies a lot: checkers vs. chess vs. Go vs. Mancala
- Main point here: AI is a bag of tricks, and sometimes they work really well!

Intro to machine learning:
- Machine learning is increasingly applied to chess
- An important principle in AI: simpler is better if two approaches perform equally well
  - Put another way: complicated is worse unless it performs better
- Perform? That really depends on the task
  - Chess or tic-tac-toe: does it defeat other players? (There are computer chess tournaments)
  - Classification: accuracy? Maybe! What about detecting rare conditions?
  - "How does this perform?" can be a pretty complicated question
- Machine learning is usually on the "more complicated" end

An example task that's really common: supervised learning for classification
- Problem setup:
  - We have a bunch of examples of two classes, X and Y
  - We know some that are X for sure and some that are Y for sure
  - We also have, or will have, a bunch where we're not sure which they are
  - Assumption: the future or unknown X and Y examples are similar to the old ones
- We can create a training set using the data we already have classified
  - A set where we know the answer already
- We can create a test set the same way
  - But we won't use it to train the model!
  - It must also consist of examples where we know the right classification
  - We expect test-set performance to carry over to the unseen examples
- Some examples of what this could be
  - One I've worked with: brain scans, some from patients with neurological conditions
- Balance and training:
  - Do we have more of one type of example than the other?
  - Is one type just plain more likely?
  - Accuracy as % correct is only meaningful for balanced datasets!

Representation of examples:
- Usually a feature vector: a series of features, consistent from one example to the next
- A lot of choices go into how to make these!
- (Will we have Weka available?)

First algorithm: C4.5 (J48) decision trees
- J48 is a widely-used Java implementation of C4.5
- Tree nodes split on a feature, trying to purify the resulting groups
- Entropy: the disorder at any point in the tree
  - Definition: https://en.wikipedia.org/wiki/Entropy_(information_theory)
- Let's make a quick Python function to calculate entropy
  - Easy enough! (I hope...) math.log2 does the heavy lifting
- Information gain: https://en.wikipedia.org/wiki/Information_gain_(decision_tree)
  - The decrease in entropy after a split, with each child's entropy weighted by its share of the examples
- OK, now we can calculate these:
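A minimal sketch (the function names and the list-of-labels representation are my choices, not from any particular library):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy decrease from splitting `parent` into `children`,
    weighting each child's entropy by its share of the examples."""
    total = len(parent)
    remainder = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder
```

Checking it against the classic Weka weather data (14 days, "play?" is 9 yes / 5 no; splitting on outlook gives sunny = 2 yes / 3 no, overcast = 4 yes, rainy = 3 yes / 2 no):

```python
play = ["yes"] * 9 + ["no"] * 5
print(round(entropy(play), 3))  # 0.94 bits

by_outlook = [["yes"] * 2 + ["no"] * 3,   # sunny
              ["yes"] * 4,                # overcast
              ["yes"] * 3 + ["no"] * 2]   # rainy
print(round(information_gain(play, by_outlook), 3))  # 0.247 bits
```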
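And, as promised back in the games section, a minimal exhaustive minimax for tic-tac-toe. This is a sketch under my own representation choices (a 9-cell tuple, scores from X's point of view), not anyone's reference implementation:

```python
# Board: a tuple of 9 cells, each "X", "O", or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return "X" or "O" if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, to_move):
    """Exhaustive game-tree search: +1 if X can force a win from here,
    -1 if O can, 0 if best play on both sides is a draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    open_cells = [i for i, cell in enumerate(board) if cell is None]
    if not open_cells:
        return 0  # board full with no winner: draw
    nxt = "O" if to_move == "X" else "X"
    scores = [minimax(board[:i] + (to_move,) + board[i + 1:], nxt)
              for i in open_cells]
    return max(scores) if to_move == "X" else min(scores)

# A few hundred thousand positions, so this takes a moment:
print(minimax((None,) * 9, "X"))  # 0 -- perfect play is a draw
```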
Back to decision trees: let's put together a tree for a classic Weka example (the weather data used above)
- Note: the classifications don't have to be binary
- Can, for example, try ranges for numerical values

What can a decision tree represent?
- And: if feature1 is A and feature2 is B
- Or: if feature1 is A or feature2 is B
- Xor: if feature1 is A xor feature2 is B
- Not: if not feature1
- You can reduce a decision tree to a logical expression: a disjunction of conjunctions, one conjunction per path to a leaf (see the sketch at the end of these notes)
  - It could possibly be simplified after that
  - You could even implement it directly as a logic circuit in silicon
  - So a C4.5 hypothesis can be applied very quickly and efficiently!

Another decision tree pro: comprehensibility
- It's possible to see why a decision was made
- And to figure out which features were important

Yet another pro: large numbers of useless features don't hurt
- If they're useless, they have little information gain, so they'll be ignored
- Good if you don't even know which features are important!
- Can use this as feature selection for other algorithms

A disadvantage: lots of significant features are a problem
- Imagine a feature vector built from a 4K image: the relations between neighboring pixels would be very important!
- Some other algorithms do a lot better here

Another disadvantage: doesn't really do degree
- Outputs are specific classes rather than floating-point values
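Circling back to the "reduce a tree to logic" point, here's the idea made concrete. This is the tree J48 famously learns on the weather data, hand-transcribed (treat the exact tree as an assumption; rebuild it in Weka to check), collapsed into a single boolean expression with one conjunction per "yes" leaf:

```python
def play(outlook, humidity, windy):
    """The weather tree as a disjunction of conjunctions:
    each clause is the path to one "yes" leaf."""
    return (outlook == "overcast"
            or (outlook == "sunny" and humidity == "normal")
            or (outlook == "rainy" and not windy))

print(play("sunny", "high", False))    # False: sunny and humid, stay home
print(play("overcast", "high", True))  # True: overcast always plays
```

No arithmetic, just a handful of comparisons: that's why a C4.5 hypothesis is so cheap to apply, and why you could wire it straight into logic gates.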