The Turning Point of Quantitative Investing: The Analyst's Conscience 54

Momentum BP: Δw_ji(t) = αΔw_ji(t−1) − η·∂ξ(t)/∂w_ji(t), cited by Yang and Zheng (2003). Its gradient form is Δw_ji(t) = αΔw_ji(t−1) + (1−α)·η·δ_j(t)·p_i(t), where 0 < α < 1 is the momentum factor. When α = 1, the weight change depends entirely on the previous weight change, so without a non-zero first weight change the algorithm stalls. When α = 0, Momentum BP reduces to plain BP. Momentum BP therefore sets the weight change to the sum of a fraction of the previous weight change and the new change suggested by the BP rule.
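To make the gradient form concrete, here is a minimal NumPy sketch of one momentum-BP update for a single weight matrix; the function name, the single-layer setting, and the default values of α and η are illustrative assumptions, not taken from the text.

```python
import numpy as np

def momentum_bp_step(w, delta_w_prev, delta, p, alpha=0.9, eta=0.01):
    """One momentum-BP update in its gradient form:
    Δw(t) = α·Δw(t-1) + (1-α)·η·δ_j(t)·p_i(t).

    w            -- (n_out, n_in) weight matrix
    delta_w_prev -- previous weight change Δw(t-1), same shape as w
    delta        -- (n_out,) local error terms δ_j(t)
    p            -- (n_in,) inputs p_i(t) feeding this layer
    """
    delta_w = alpha * delta_w_prev + (1.0 - alpha) * eta * np.outer(delta, p)
    return w + delta_w, delta_w
```

With alpha = 0 this reduces to plain BP; with alpha = 1 the gradient term vanishes and the step merely repeats the previous change, which is why a non-zero first weight change is needed.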

Variable Learning Rate (VLR):

Because the optimal learning rate changes during training, the VLR method can improve the performance of a neural network trained with the steepest-descent algorithm. First, three parameters are predefined: the learning-rate increase ratio, the learning-rate decrease ratio, and the maximum performance increase ratio. The new performance of the gradient-descent step is then calculated and compared with the old performance.

If new performance / old performance > max performance increase ratio, then η = η · learning-rate decrease ratio; otherwise, if new performance < old performance, then η = η · learning-rate increase ratio. VLR thus tries to shorten the coarse phase of convergence by increasing the learning rate; when the performance starts to oscillate, the learning rate is reduced to allow more precise adjustment.
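Below is a minimal sketch of this learning-rate schedule, assuming "performance" means the training error to be minimized (lower is better); the function name and the default ratios (1.05, 0.7, 1.04, conventional choices in adaptive-learning-rate gradient descent) are assumptions rather than values given in the text.

```python
def vlr_update(eta, old_perf, new_perf,
               lr_inc=1.05, lr_dec=0.7, max_perf_inc=1.04):
    """Adjust the learning rate after one gradient-descent step.

    old_perf, new_perf -- training error before and after the step
    """
    if new_perf > max_perf_inc * old_perf:
        # Error grew too much: shrink the learning rate (implementations
        # typically also discard the weight step that caused this).
        eta *= lr_dec
    elif new_perf < old_perf:
        # Error improved: try a larger step next time.
        eta *= lr_inc
    # If the error grew only slightly, eta is left unchanged.
    return eta
```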

Hybrid of Momentum and VLR:

This method absorbs the VLR rule into the Momentum BP weight-change rule: Δw_ji(t) = αΔw_ji(t−1) + (1−α)·η*·δ_j(t)·p_i(t), where η* follows the VLR process.
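Combining the two sketches above, a hybrid step might look like the following; eta_star is assumed to be the learning rate currently maintained by the VLR schedule (for example, the value returned by vlr_update in the previous sketch).

```python
import numpy as np

def hybrid_step(w, delta_w_prev, delta, p, eta_star, alpha=0.9):
    """Momentum-BP update whose learning rate η* is supplied by the
    VLR schedule rather than being a fixed constant."""
    delta_w = alpha * delta_w_prev + (1.0 - alpha) * eta_star * np.outer(delta, p)
    return w + delta_w, delta_w
```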

Resilient BP:

Most transfer functions compress their input into a finite range. When an input value is close to the minimum or maximum of that range, the derivative of the transfer function is close to zero, so the weight adjustment becomes very small. Riedmiller and Braun (1993) introduced resilient BP to overcome this problem. As with VLR, three parameters need to be set first: the increase ratio (η+), the decrease ratio (η−), and the initial weight change. As pointed out by Riedmiller and Braun (1993), the initial value is not critical at all; a value around 0.1 is reasonable, although it still determines the size of the first weight step Δ_ji(0). The algorithm decomposes the weight adjustment into two parts: a sign part and an update-value part Δ_ji(t). Riedmiller and Braun (1993) propose the typical adjustment process:
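The adjustment process itself continues beyond this excerpt. For orientation, here is a minimal sketch of the standard update-value rule from Riedmiller and Braun (1993) in a simplified form that omits the weight-backtracking step of the original; the function name, the defaults η+ = 1.2 and η− = 0.5, and the step bounds are conventional values assumed here, not taken from the text.

```python
import numpy as np

def rprop_step(w, grad, grad_prev, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One element-wise resilient-BP update.

    step holds the per-weight update values Δ_ji(t), initialised to
    something like 0.1 as noted in the text.
    """
    sign_change = grad * grad_prev
    # Same gradient sign as last step: direction is stable, grow the update value.
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    # Gradient sign flipped: a minimum was overshot, shrink the update value.
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # The weight change uses only the sign of the gradient, not its magnitude.
    w_new = w - np.sign(grad) * step
    return w_new, step
```

Separating the sign of the gradient from the size of the step is what frees the update from the near-zero derivatives that arise when the transfer function saturates.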