Chapter 8 Lessons from Others. Appendix 1: Algorithm Examples
There are at least five basic categories of learning rules for different kinds of NN: error-correction learning, memory-based learning, Hebbian learning, competitive learning, and Boltzmann learning. Even the back-propagation (BP) algorithm, the milestone in NN training, can still be viewed as a derivative of the error-correction learning process. For the integrity and better understanding of this paper, only BP is briefly introduced here; a more detailed explanation can be found in chapter 4 of "Neural Networks: A Comprehensive Foundation".
Two passes constitute the BP method: the forward pass and the backward pass (a brief proof is given in the appendix). In the forward phase, the weights of the neurons are fixed; they are adjusted in the backward phase according to the error-correction rule. The error is defined as $e_j(t) = d_j(t) - a_j(t)$, where $t$ represents the iteration (time step). The error energy over the set $C$ of output neurons is defined as $\xi(t) = \frac{1}{2}\sum_{j \in C} e_j^2(t)$. The BP method applies a correction $\Delta w_{ji}(t)$ to the weights $w_{ji}(t)$:

$$\Delta w_{ji}(t) = -\eta \frac{\partial \xi(t)}{\partial w_{ji}(t)},$$

where $\eta$ is the learning-rate parameter of the BP algorithm and can be set manually. Therefore, for an output neuron,

$$\Delta w_{ji}(t) = \eta\, e_j(t)\, f'_j\!\left(\sum_{i=0}^{R} w_{ji} p_i\right) p_i(t).$$

When dealing with a hidden-layer neuron, there is no specified desired response for it. According to the chain rule of calculus, the weight change for the hidden neurons that directly connect to the output layer is

$$\Delta w_{ji}(t) = \eta\, f'_j\!\left(\sum_{i=0}^{R} w_{ji} p_i\right) \sum_{k} e_k(t)\, f'_k\!\left(\sum_{j=0}^{m} w_{kj} p_j\right) w_{kj}(t),$$

where $m$ is the number of inputs applied to the output neuron $k$.
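To make the two passes concrete, the following is a minimal sketch (in Python/NumPy) of one BP iteration for a network with a single hidden layer, applying the update rules above. The sigmoid activation, the array shapes, and names such as bp_step, W1, W2, and eta are illustrative assumptions, not the paper's own implementation.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def bp_step(p, d, W1, W2, eta=0.1):
        """One forward/backward BP pass; returns updated weights and xi(t)."""
        # Forward pass: weights are held fixed.
        h = sigmoid(W1 @ p)          # hidden-layer outputs
        a = sigmoid(W2 @ h)          # network outputs a_j(t)
        e = d - a                    # error e_j(t) = d_j(t) - a_j(t)
        xi = 0.5 * np.sum(e ** 2)    # error energy xi(t)

        # Backward pass (error-correction rule).
        # Output layer: delta_j = e_j * f'(net_j); for a sigmoid, f' = a(1 - a).
        delta_out = e * a * (1 - a)
        # Hidden layer: the chain rule propagates the output deltas back.
        delta_hid = (W2.T @ delta_out) * h * (1 - h)

        # Weight corrections: Delta w_ji = eta * delta_j * p_i.
        W2 = W2 + eta * np.outer(delta_out, h)
        W1 = W1 + eta * np.outer(delta_hid, p)
        return W1, W2, xi

    # Example usage with random weights (shapes: 3 inputs, 4 hidden, 2 outputs).
    rng = np.random.default_rng(0)
    W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
    W1, W2, xi = bp_step(np.array([0.5, -0.2, 0.8]), np.array([1.0, 0.0]), W1, W2)

Note that adding $\eta\,\delta_j p_i$ with $e = d - a$ realizes the descent direction $-\eta\,\partial\xi/\partial w_{ji}$, since $\partial\xi/\partial w_{ji} = -e_j f'_j p_i$.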
Unfortunately, the BP method is really slow. The learning rate $\eta$ is constant during the training process, yet network performance is sensitive to its setting. As cited by Yang and Zheng (2003), when the learning rate is too high, the weights may oscillate around the stable value; when it is too low, the convergence process costs too much time. Therefore, when the topology is simple, a low-learning-rate strategy can still handle the problem; however, when the NN structure becomes more complicated, other fast training methods need to be considered.
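As a toy illustration of this sensitivity (not taken from the paper), consider gradient descent on the one-dimensional error energy $\xi(w) = \frac{1}{2}(w-1)^2$: the update $w \leftarrow w - \eta(w-1)$ converges monotonically for small $\eta$, oscillates around the minimum as $\eta$ approaches 2, and diverges beyond it.

    # Toy example: gradient step on xi(w) = 0.5 * (w - 1)**2, gradient (w - 1).
    def descend(eta, w=5.0, steps=6):
        path = [round(w, 3)]
        for _ in range(steps):
            w = w - eta * (w - 1.0)   # w <- w - eta * grad xi(w)
            path.append(round(w, 3))
        return path

    for eta in (0.1, 1.9, 2.1):
        print(f"eta={eta}: {descend(eta)}")
    # eta=0.1 creeps toward 1 slowly; eta=1.9 oscillates around 1 while
    # converging; eta=2.1 oscillates with growing amplitude (divergence).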
Since $\Delta w_{ji}(t) = \eta\, \delta_j(t)\, p_i(t)$, as cited from Haykin (1999), where $\eta$ is the learning-rate parameter, $\delta_j(t)$ is the local gradient, and $p_i(t)$ is the input signal of neuron $j$, heuristic techniques that focus on the analysis of the descent algorithm can offer some solutions for fast training. Because the speed of training depends on many factors, including the network structure, the data set, and even the required error precision, it is very difficult to recommend a single training algorithm for this paper. Hence, a number of candidate training algorithms are briefly introduced here: