The Turning Point of Quantitative Investing: The Analyst's Conscience 54

Momentum BP: Δw_ji(t) = αΔw_ji(t−1) − η·∂ξ(t)/∂w_ji(t), cited by Yang and Zheng (2003). Its gradient form is Δw_ji(t) = αΔw_ji(t−1) + (1−α)·η·δ_j(t)·p_i(t), where 0 < α < 1 is the momentum factor. When α = 1, the weight change depends entirely on the previous weight change, so without a non-zero first weight change the algorithm stalls. When α = 0, Momentum BP reduces to plain BP. Momentum BP therefore sets the weight change to the sum of a fraction of the previous weight change and the new change suggested by the BP rule.
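To make the gradient form concrete, here is a minimal NumPy sketch of one momentum-BP update for a single weight matrix; the function name, the single-layer setting, and the default values of α and η are illustrative assumptions, not taken from the text.

```python
import numpy as np

def momentum_bp_step(w, delta_w_prev, delta, p, alpha=0.9, eta=0.01):
    """One momentum-BP update in its gradient form:
    Δw(t) = α·Δw(t-1) + (1-α)·η·δ_j(t)·p_i(t).

    w            -- (n_out, n_in) weight matrix
    delta_w_prev -- previous weight change Δw(t-1), same shape as w
    delta        -- (n_out,) local error terms δ_j(t)
    p            -- (n_in,) inputs p_i(t) feeding this layer
    """
    delta_w = alpha * delta_w_prev + (1.0 - alpha) * eta * np.outer(delta, p)
    return w + delta_w, delta_w
```

With alpha = 0 this reduces to plain BP; with alpha = 1 the gradient term vanishes and the step merely repeats the previous change, which is why a non-zero first weight change is needed.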

Variable Learning Rate (VLR):

Because the optimal learning rate changes during training, the VLR method can improve the performance of a neural network trained with the steepest-descent algorithm. First, three parameters are predefined: the learning-rate increase ratio, the learning-rate decrease ratio, and the maximum performance increase ratio. The new performance of the gradient-descent step is then calculated and compared with the old performance.

If new performance / old performance > max performance increase ratio, then η = η · learning-rate decrease ratio; otherwise, if new performance < old performance, then η = η · learning-rate increase ratio. VLR thus tries to shorten the coarse phase of convergence by increasing the learning rate; when the performance starts to oscillate, the learning rate is reduced to allow more precise adjustment.
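Below is a minimal sketch of this learning-rate schedule, assuming "performance" means the training error to be minimized (lower is better); the function name and the default ratios (1.05, 0.7, 1.04, conventional choices in adaptive-learning-rate gradient descent) are assumptions rather than values given in the text.

```python
def vlr_update(eta, old_perf, new_perf,
               lr_inc=1.05, lr_dec=0.7, max_perf_inc=1.04):
    """Adjust the learning rate after one gradient-descent step.

    old_perf, new_perf -- training error before and after the step
    """
    if new_perf > max_perf_inc * old_perf:
        # Error grew too much: shrink the learning rate (implementations
        # typically also discard the weight step that caused this).
        eta *= lr_dec
    elif new_perf < old_perf:
        # Error improved: try a larger step next time.
        eta *= lr_inc
    # If the error grew only slightly, eta is left unchanged.
    return eta
```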

Hybrid of Momentum and VLR:

This method absorbs the VLR rule into the Momentum BP weight-change rule: Δw_ji(t) = αΔw_ji(t−1) + (1−α)·η*·δ_j(t)·p_i(t), where η* follows the VLR process.
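Combining the two sketches above, a hybrid step might look like the following; eta_star is assumed to be the learning rate currently maintained by the VLR schedule (for example, the value returned by vlr_update in the previous sketch).

```python
import numpy as np

def hybrid_step(w, delta_w_prev, delta, p, eta_star, alpha=0.9):
    """Momentum-BP update whose learning rate η* is supplied by the
    VLR schedule rather than being a fixed constant."""
    delta_w = alpha * delta_w_prev + (1.0 - alpha) * eta_star * np.outer(delta, p)
    return w + delta_w, delta_w
```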

Resilient BP:

Most transfer functions compress their input into a finite range. When an input value is close to the minimum or maximum of that range, the derivative of the transfer function is close to zero, so the weight adjustment becomes very small. Riedmiller and Braun (1993) introduced resilient BP to overcome this problem. As with VLR, three parameters need to be set first: the increase ratio (η+), the decrease ratio (η−), and the initial weight change. As pointed out by Riedmiller and Braun (1993), the initial value is not critical at all; a value around 0.1 is reasonable, although it still determines the size of the first weight step Δ_ji(0). The algorithm decomposes the weight adjustment into two parts: a sign part and an update-value part Δ_ji(t). Riedmiller and Braun (1993) propose the typical adjustment process:
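The adjustment process itself continues beyond this excerpt. For orientation, here is a minimal sketch of the standard update-value rule from Riedmiller and Braun (1993) in a simplified form that omits the weight-backtracking step of the original; the function name, the defaults η+ = 1.2 and η− = 0.5, and the step bounds are conventional values assumed here, not taken from the text.

```python
import numpy as np

def rprop_step(w, grad, grad_prev, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One element-wise resilient-BP update.

    step holds the per-weight update values Δ_ji(t), initialised to
    something like 0.1 as noted in the text.
    """
    sign_change = grad * grad_prev
    # Same gradient sign as last step: direction is stable, grow the update value.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    # Gradient sign flipped: a minimum was overshot, shrink the update value.
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # The weight change uses only the sign of the gradient, not its magnitude.
    w_new = w - np.sign(grad) * step
    return w_new, step
```

Separating the sign of the gradient from the size of the step is what frees the update from the near-zero derivatives that arise when the transfer function saturates.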