Artificial Neural Networks:

First, let's talk about real neural networks a bit.

Neurons:
  Electrical potential in biological tissue
  Has to do with the balance of ions

Chemistry review:
  Valence shell electrons
  Proton and electron balance
  Sharing and bonding
  Conductive materials and metallic bonding
  Stable vs. unstable atoms
  Ion engines for space exploration

Alright, the body is full of ions:
  Na+, Ca2+, K+, Cl-, H+
  The balance of these in a cell determines its charge
  Charge above the action potential threshold causes firing
  Firing does NOT happen at the speed of light
  Channels have to open to boost the signal
  Myelination, mammals, etc.

How do the ions get in? Through gates:
  Neurotransmitters affect gate opening
  Inhibitory vs. excitatory neurotransmitters are released into the synaptic cleft
  They're stored in synaptic vesicles first

Too much/little signalling:
  Change gates or neurotransmitter release
  This allows regulation

Changing signalling:
  Add neurotransmitter
  Block neurotransmitters
  Prevent or increase neurotransmitter reuptake
  Add electrical charge

Note on interacting with neurons:
  We can detect electrical signals
  We can produce electrical signals
  But neurons are small, so implants don't usually connect to individual ones

Scanning:
  EEG: record electrical activity in neurons
  fMRI: look for metabolism by-products (energy expenditure)

How about an artificial version:
  Inputs: numerical, usually
  Activation function: weights for each input, a threshold for activation
    That's phi, the transfer function
    Note that weights can be negative
  Real-number output: scale the result by something instead of applying an activation threshold
  Output: consolidate to an output node
    Or a few output nodes, if a few outputs are needed

Parameters that usually aren't learned:
  Number of layers
  Number of neurons in each layer
  Connection scheme
  Transfer function threshold, though the weights could just increase to compensate

Parameters that are usually learned:
  Weights for each input on each node

How could we figure out the parameters?
  Guess a bunch of times
  Use an evolutionary system
  Sorry, we don't have a closed form for it
  The common answer: backpropagation!

Backpropagation:
  Iterative process
  When do we stop? When it's good enough
  How long will it take? We don't know
  General idea: assign responsibility to the incorrect weights and correct them
  Another way to think of it: if the output is wrong, adjust the weights to be more right

Training balance:
  If we give it a bunch of one class, it'll move in that direction
  Can be a problem for an unbalanced class distribution
  One option: just run the uncommon examples more times

Overtraining:
  Don't become too specific to the training set
  The same applies to other algorithms, but in different ways

Starting out: the delta rule
  Only works for one-layer networks

Reminder from calculus: partial derivatives
  "Regular" derivatives: position, velocity, acceleration
  Notation: there are actually a bunch of these partial derivatives
    Multiple variables; hold all but one constant
  The partial derivative of the activation function is used for this
  So the activation function should be differentiable

Not going too far into gradient descent:
  The partial derivative of a function can indicate how to adjust a parameter
  So the update formula comes from calculating the partial derivative for each weight (see the short sketch below)
  Understanding this bit is key to understanding the whole thing

How do we figure out which weight was responsible?
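A minimal sketch of that idea, before the error function is defined formally: for a single node with a differentiable transfer function and a squared-error measure, the partial derivative of the error with respect to each weight says how "responsible" that weight was, and nudging each weight against its partial derivative is the delta rule step. The sigmoid, the 0.5 factor in the error, and the example numbers below are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def sigmoid(net):
    """Differentiable transfer function phi."""
    return 1.0 / (1.0 + np.exp(-net))

def squared_error(w, x, target):
    """Error for a single training example; 0 when the output is exactly right."""
    return 0.5 * (target - sigmoid(np.dot(w, x))) ** 2

# One training example for a 3-input node; weights can be negative.
x = np.array([1.0, 0.0, 1.0])
target = 1.0
w = np.array([0.2, -0.4, 0.1])

# Analytic partial derivative of the error with respect to each weight:
#   dE/dw_i = -(target - out) * out * (1 - out) * x_i
out = sigmoid(np.dot(w, x))
grad = -(target - out) * out * (1.0 - out) * x

# Numerical check: nudge one weight at a time, holding the others constant.
eps = 1e-6
numeric = np.array([
    (squared_error(w + eps * np.eye(3)[i], x, target) - squared_error(w, x, target)) / eps
    for i in range(3)
])
print("analytic:", np.round(grad, 4), " numeric:", np.round(numeric, 4))

# Delta rule step: move each weight a little against its partial derivative.
learning_rate = 0.5
w_new = w - learning_rate * grad
print("error before:", squared_error(w, x, target), " after:", squared_error(w_new, x, target))
```

Note that the weight attached to the zero-valued input gets a zero gradient: an input that contributed nothing bears no responsibility for the error, which is the intuition behind the formula derived next.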
First: the error function
  A function that returns the error over the training set
  It should be 0 if there were no errors
  The description on Wikipedia assumes multiple output nodes

Skipping past some of the derivation here:
  The change in each weight can be calculated by a straightforward formula

Learning rate:
  Too fast and it will "jump around"

Running time on this? (kinda long)

Looking at a hypothesis from an ANN:
  They're among the least comprehensible
  Some research has been dedicated to this
  SVMs are also difficult in this regard
  C4.5/J48 and Naive Bayes produce far more comprehensible hypotheses
  Nearest neighbor will yield some related examples, which is helpful

So what could we represent with this? (see the sketch at the end of these notes)
  not A
  A and B
  A or B
  A xor B

Remember: given the correct weighting, a node can be a logic gate

Small combinations:
  Enough of A, B, and C combined
  Enough of A and B, but not too much C
  A is greater than B

Runtime:
  Slow to train
  Not super fast to run, but not terrible either
  Depends on how many nodes and layers
  Nodes in a single layer can be run in parallel

Do we have time for Weka today?
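To make the representability claims above concrete, here is a small sketch with hand-picked weights and thresholds: a single hard-threshold node can act as NOT, AND, or OR, but XOR is not linearly separable, so no single node can compute it and a second layer is needed. The specific weight and threshold values are illustrative assumptions; many other choices work.

```python
import numpy as np

def step_node(inputs, weights, threshold):
    """A single neuron with a hard threshold: output 1 if the weighted sum clears it."""
    return 1 if np.dot(inputs, weights) >= threshold else 0

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]

# "not A": a negative weight and a threshold of 0 flip the input.
print("NOT A  :", [step_node([a], [-1.0], 0.0) for a in (0, 1)])

# "A and B": both inputs are needed to clear the threshold.
print("A AND B:", [step_node([a, b], [1.0, 1.0], 1.5) for a, b in pairs])

# "A or B": either input alone is enough.
print("A OR B :", [step_node([a, b], [1.0, 1.0], 0.5) for a, b in pairs])

# "A xor B" is not linearly separable, so it takes a second layer,
# e.g. AND applied to (A or B) and not-(A and B).
hidden_or   = lambda a, b: step_node([a, b], [1.0, 1.0], 0.5)
hidden_nand = lambda a, b: step_node([a, b], [-1.0, -1.0], -1.5)
print("A XOR B:", [step_node([hidden_or(a, b), hidden_nand(a, b)], [1.0, 1.0], 1.5)
                   for a, b in pairs])
```

The same weights scaled by any positive constant work equally well, which is one reason the threshold itself usually isn't learned: the weights can grow or shrink to compensate.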