Momentum BP: Δwji(t) = αΔwji(t−1) − η·∂ξ(t)/∂wji(t), cited by Yang and Zheng (2003). Its gradient form is Δwji(t) = αΔwji(t−1) + (1−α)η·δj(t)pi(t), where α (0 < α < 1) is the momentum factor. When α = 1, the weight change depends entirely on the previous weight change; without a first weight change, the algorithm collapses. When α = 0, Momentum BP reduces to standard BP. Therefore, Momentum BP makes each weight change the sum of a fraction of the previous weight change and the new change suggested by the BP rule.
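As a minimal sketch of the gradient form (in Python; the function name, the default values of alpha and eta, and the use of grad for ∂ξ/∂wji are illustrative assumptions, not from the text):

```python
def momentum_update(delta_w_prev, grad, alpha=0.9, eta=0.01):
    """Momentum BP weight change:
    dw(t) = alpha * dw(t-1) + (1 - alpha) * eta * (-grad),
    where grad is the error gradient d(xi)/dw, so the (1 - alpha)
    term is the usual BP step and the alpha term carries momentum."""
    return alpha * delta_w_prev - (1.0 - alpha) * eta * grad

# alpha = 0 reduces to plain BP; alpha = 1 simply repeats the
# previous weight change, ignoring the current gradient entirely.
```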
Variable Learning Rate (VLR):
As the optimal learning rate changes during the training process, the VLR method can improve the performance of the NN over the steepest-descent algorithm. First, three predefined parameters are needed: the learning rate increase ratio, the learning rate decrease ratio, and the max performance increase ratio. Then the new performance based on the gradient descent algorithm is calculated and compared with the old performance. If new performance / old performance > max performance increase ratio, then η = η · learning rate decrease ratio; otherwise, if new performance < old performance, then η = η · learning rate increase ratio. It is obvious that VLR tries to shorten the rough early phase of convergence by increasing the learning rate. When the performance starts to oscillate, the learning rate begins to decline to give a more precise adjustment.
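The rate update above can be sketched as follows (a hedged example: the factor values 1.05, 0.7, and 1.04 are illustrative defaults, not from the text, and "performance" is taken to be the error measure, so a lower value means improvement):

```python
def vlr_step(eta, new_perf, old_perf,
             inc_ratio=1.05, dec_ratio=0.7, max_perf_inc=1.04):
    """Variable-learning-rate rule: shrink eta when the error grew
    by more than the allowed ratio, grow it when the error fell,
    and leave it unchanged for a small, tolerated increase."""
    if new_perf / old_perf > max_perf_inc:
        return eta * dec_ratio   # error rose too much: decrease eta
    elif new_perf < old_perf:
        return eta * inc_ratio   # error fell: increase eta
    return eta                   # tolerated small increase: keep eta
```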
Hybrid of Momentum and VLR:
This method absorbs VLR into the Momentum BP weight change rule: Δwji(t) = αΔwji(t−1) + (1−α)η*·δj(t)pi(t), where η* follows the VLR process.
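A sketch of the hybrid step, under the same assumptions as before (illustrative factor values; grad stands for ∂ξ/∂wji, and "performance" for the error measure):

```python
def hybrid_update(delta_w_prev, grad, eta, new_perf, old_perf,
                  alpha=0.9, inc_ratio=1.05, dec_ratio=0.7,
                  max_perf_inc=1.04):
    """Momentum BP weight change whose learning rate (eta*) is
    first adapted by the VLR rule, then used in the momentum step:
    dw(t) = alpha * dw(t-1) + (1 - alpha) * eta_star * (-grad)."""
    # VLR adaptation of the learning rate
    if new_perf / old_perf > max_perf_inc:
        eta_star = eta * dec_ratio
    elif new_perf < old_perf:
        eta_star = eta * inc_ratio
    else:
        eta_star = eta
    # momentum weight change with the adapted rate
    dw = alpha * delta_w_prev - (1.0 - alpha) * eta_star * grad
    return dw, eta_star
```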
Resilient BP:
Most transfer functions compress the input data into a finite range. When an input value is close to the minimum or maximum of that range, its derivative will be close to zero, and therefore the weight adjustment will be small. Riedmiller and Braun (1993) introduced resilient BP to overcome this problem. First, there are also three parameters that need to be set: the plus learning ratio (η+), the negative learning ratio (η−), and the initial weight change. As pointed out by Riedmiller and Braun (1993), the initial value is not critical at all; a value around 0.1 is reasonable, though it still determines the size of the first weight step (Δij(0)). The algorithm decomposes the weight adjustment into two parts: the sign part and the update-value part Δij(t). Riedmiller and Braun (1993) propose the typical adjustment process: