Artificial Neural Networks:

First, let's talk about real neural networks a bit.

Neurons:
  Electrical potential in biological tissue
  Has to do with the balance of ions

Chemistry review:
  Valence shell electrons
  Proton and electron balance
  Sharing and bonding
  Conductive materials and metallic bonding
  Stable vs. unstable atoms
  Ion engines for space exploration

Alright, the body is full of ions:
  Na+, Ca2+, K+, Cl-, H+
  The balance of these in a cell determines its charge
  Charge above the action potential threshold causes firing
  Firing does NOT happen at the speed of light
  Channels have to open to boost the signal
  Myelination, mammals, etc.

How do the ions get in? Through gates:
  Neurotransmitters affect gate opening
  Inhibitory vs. excitatory neurotransmitters are released into the synaptic cleft
  They're stored in synaptic vesicles first

Too much/little signalling:
  Change gates or neurotransmitter release
  This allows regulation

Changing signalling:
  Add neurotransmitter
  Block neurotransmitters
  Prevent or increase neurotransmitter reuptake
  Add electrical charge

Note on interacting with neurons:
  We can detect electrical signals
  We can produce electrical signals
  But neurons are small, so implants don't usually connect to individual ones

Scanning:
  EEG: record electrical activity in neurons
  fMRI: look for metabolism by-products (energy expenditure)

How about an artificial version:
  Inputs: numerical, usually
  Activation function: weights for each input, a threshold for activation
    That's phi, the transfer function
    Note that weights can be negative
  Real-number output: scale the result by something instead of applying an activation threshold
  Output: consolidate to an output node
    Or a few output nodes, if a few outputs are needed

Parameters that usually aren't learned:
  Number of layers
  Number of neurons in each layer
  Connection scheme
  Transfer function threshold, though the weights could just increase to compensate

Parameters that are usually learned:
  Weights for each input on each node

How could we figure out the parameters?
  Guess a bunch of times
  Use an evolutionary system
  Sorry, we don't have a closed form for it
  The common answer: backpropagation!

Backpropagation:
  Iterative process
  When do we stop? When it's good enough
  How long will it take? We don't know
  General idea: assign responsibility to the incorrect weights and correct them
  Another way to think of it: if the output is wrong, adjust the weights to be more right

Training balance:
  If we give it a bunch of one class, it'll move in that direction
  Can be a problem for an unbalanced class distribution
  One option: just run the uncommon examples more times

Overtraining:
  Don't become too specific to the training set
  The same applies to other algorithms, but in different ways

Starting out: the delta rule
  Only works for one-layer networks

Reminder from calculus: partial derivatives
  "Regular" derivatives: position, velocity, acceleration
  Notation: there are actually a bunch of these partial derivatives
    Multiple variables; hold all but one constant
  The partial derivative of the activation function is used for this
  So the activation function should be differentiable

Not going too far into gradient descent:
  The partial derivative of a function can indicate how to adjust a parameter
  So the update formula comes from calculating the partial derivative for each weight (see the short sketch below)
  Understanding this bit is key to understanding the whole thing

How do we figure out which weight was responsible?
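A minimal sketch of that idea, before the error function is defined formally: for a single node with a differentiable transfer function and a squared-error measure, the partial derivative of the error with respect to each weight says how "responsible" that weight was, and nudging each weight against its partial derivative is the delta rule step. The sigmoid, the 0.5 factor in the error, and the example numbers below are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def sigmoid(net):
    """Differentiable transfer function phi."""
    return 1.0 / (1.0 + np.exp(-net))

def squared_error(w, x, target):
    """Error for a single training example; 0 when the output is exactly right."""
    return 0.5 * (target - sigmoid(np.dot(w, x))) ** 2

# One training example for a 3-input node; weights can be negative.
x = np.array([1.0, 0.0, 1.0])
target = 1.0
w = np.array([0.2, -0.4, 0.1])

# Analytic partial derivative of the error with respect to each weight:
#   dE/dw_i = -(target - out) * out * (1 - out) * x_i
out = sigmoid(np.dot(w, x))
grad = -(target - out) * out * (1.0 - out) * x

# Numerical check: nudge one weight at a time, holding the others constant.
eps = 1e-6
numeric = np.array([
    (squared_error(w + eps * np.eye(3)[i], x, target) - squared_error(w, x, target)) / eps
    for i in range(3)
])
print("analytic:", np.round(grad, 4), " numeric:", np.round(numeric, 4))

# Delta rule step: move each weight a little against its partial derivative.
learning_rate = 0.5
w_new = w - learning_rate * grad
print("error before:", squared_error(w, x, target), " after:", squared_error(w_new, x, target))
```

Note that the weight attached to the zero-valued input gets a zero gradient: an input that contributed nothing bears no responsibility for the error, which is the intuition behind the formula derived next.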
First: the error function
  A function that returns the error over the training set
  It should be 0 if there were no errors
  The description on Wikipedia assumes multiple output nodes

Skipping past some of the derivation here:
  The change in each weight can be calculated by a straightforward formula

Learning rate:
  Too fast and it will "jump around"

Running time on this? (kinda long)

Looking at a hypothesis from an ANN:
  They're among the least comprehensible
  Some research has been dedicated to this
  SVMs are also difficult in this regard
  C4.5/J48 and Naive Bayes produce far more comprehensible hypotheses
  Nearest neighbor will yield some related examples, which is helpful

So what could we represent with this? (see the sketch at the end of these notes)
  not A
  A and B
  A or B
  A xor B

Remember: given the correct weighting, a node can be a logic gate

Small combinations:
  Enough of A, B, and C combined
  Enough of A and B, but not too much C
  A is greater than B

Runtime:
  Slow to train
  Not super fast to run, but not terrible either
  Depends on how many nodes and layers
  Nodes in a single layer can be run in parallel

Do we have time for Weka today?
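To make the representability claims above concrete, here is a small sketch with hand-picked weights and thresholds: a single hard-threshold node can act as NOT, AND, or OR, but XOR is not linearly separable, so no single node can compute it and a second layer is needed. The specific weight and threshold values are illustrative assumptions; many other choices work.

```python
import numpy as np

def step_node(inputs, weights, threshold):
    """A single neuron with a hard threshold: output 1 if the weighted sum clears it."""
    return 1 if np.dot(inputs, weights) >= threshold else 0

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# "not A": a negative weight and a threshold of 0 flip the input.
print("NOT A  :", [step_node([a], [-1.0], 0.0) for a in (0, 1)])

# "A and B": both inputs are needed to clear the threshold.
print("A AND B:", [step_node([a, b], [1.0, 1.0], 1.5) for a, b in pairs])

# "A or B": either input alone is enough.
print("A OR B :", [step_node([a, b], [1.0, 1.0], 0.5) for a, b in pairs])

# "A xor B" is not linearly separable, so it takes a second layer,
# e.g. AND applied to (A or B) and not-(A and B).
hidden_or   = lambda a, b: step_node([a, b], [1.0, 1.0], 0.5)
hidden_nand = lambda a, b: step_node([a, b], [-1.0, -1.0], -1.5)
print("A XOR B:", [step_node([hidden_or(a, b), hidden_nand(a, b)], [1.0, 1.0], 1.5)
                   for a, b in pairs])
```

The same weights scaled by any positive constant work equally well, which is one reason the threshold itself usually isn't learned: the weights can grow or shrink to compensate.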