$$
\Delta_{ji}(t)=
\begin{cases}
\eta^{+}\cdot\Delta_{ji}(t-1), & \text{if } \dfrac{\partial\xi(t-1)}{\partial w_{ji}(t-1)}\cdot\dfrac{\partial\xi(t)}{\partial w_{ji}(t)}>0\\[2mm]
\eta^{-}\cdot\Delta_{ji}(t-1), & \text{if } \dfrac{\partial\xi(t-1)}{\partial w_{ji}(t-1)}\cdot\dfrac{\partial\xi(t)}{\partial w_{ji}(t)}<0\\[2mm]
\Delta_{ji}(t-1), & \text{else}
\end{cases}
\qquad 0<\eta^{-}<1<\eta^{+}
$$
$$
\Delta w_{ji}(t)=
\begin{cases}
-\Delta w_{ji}(t-1), & \text{if } \dfrac{\partial\xi(t-1)}{\partial w_{ji}(t-1)}\cdot\dfrac{\partial\xi(t)}{\partial w_{ji}(t)}<0\\[2mm]
-\Delta_{ji}(t), & \text{if } \dfrac{\partial\xi(t)}{\partial w_{ji}(t)}>0\\[2mm]
+\Delta_{ji}(t), & \text{if } \dfrac{\partial\xi(t)}{\partial w_{ji}(t)}<0\\[2mm]
0, & \text{else}
\end{cases}
$$
Therefore, when two successive derivatives point in the same direction, convergence is accelerated by increasing the magnitude of the weight change. However, when they point in opposite directions, oscillation has begun, and the weight change should be reduced to allow a more precise adjustment.
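As a concrete illustration, here is a minimal NumPy sketch of one RPROP iteration following the equations above; the function name and the η defaults are illustrative, and the Δ_min/Δ_max clamps are a conventional safeguard not shown in the equations.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, delta, prev_dw,
               eta_plus=1.2, eta_minus=0.5,
               delta_min=1e-6, delta_max=50.0):
    """One RPROP iteration with weight backtracking.

    All array arguments share the shape of w. The delta_min/delta_max
    clamps are a conventional safeguard, not part of the source equations.
    """
    grad = grad.copy()                    # keep the sign reset below local
    delta = delta.copy()
    prod = prev_grad * grad               # sign of the derivative product
    dw = np.zeros_like(w)

    same = prod > 0                       # same direction: enlarge the step
    delta[same] = np.minimum(delta[same] * eta_plus, delta_max)

    flipped = prod < 0                    # oscillation: shrink and backtrack
    delta[flipped] = np.maximum(delta[flipped] * eta_minus, delta_min)
    dw[flipped] = -prev_dw[flipped]
    grad[flipped] = 0.0                   # force the 'else' branch next time

    rest = ~flipped                       # the two sign-based branches
    dw[rest] = -np.sign(grad[rest]) * delta[rest]

    return w + dw, grad, delta, dw
```

The caller feeds the returned gradient and weight change back in as `prev_grad` and `prev_dw` on the next iteration.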
In practice, only part of the algorithm is adopted. One eclectic method is:
$$
\Delta_{ji}(t)=
\begin{cases}
\eta^{+}\cdot\Delta_{ji}(t-1), & \text{if } \dfrac{\partial\xi(t-1)}{\partial w_{ji}(t-1)}\cdot\dfrac{\partial\xi(t)}{\partial w_{ji}(t)}>0\\[2mm]
\eta^{-}\cdot\Delta_{ji}(t-1), & \text{if } \dfrac{\partial\xi(t-1)}{\partial w_{ji}(t-1)}\cdot\dfrac{\partial\xi(t)}{\partial w_{ji}(t)}<0\\[2mm]
\Delta_{ji}(t-1), & \text{else}
\end{cases}
$$
$$
\Delta w_{ji}(t)=
\begin{cases}
-\Delta_{ji}(t), & \text{if } \dfrac{\partial\xi(t)}{\partial w_{ji}(t)}>0\\[2mm]
+\Delta_{ji}(t), & \text{if } \dfrac{\partial\xi(t)}{\partial w_{ji}(t)}<0
\end{cases}
$$
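A sketch of this simplified variant under the same illustrative conventions as above; it keeps the step-size adaptation but drops the backtracking branch, always updating by $-\operatorname{sign}(\partial\xi/\partial w)\cdot\Delta$:

```python
import numpy as np

def rprop_step_simplified(w, grad, prev_grad, delta,
                          eta_plus=1.2, eta_minus=0.5):
    """Simplified RPROP variant: same step-size adaptation, but the
    weight update is always -sign(grad) * delta, with no backtracking."""
    prod = prev_grad * grad
    delta = np.where(prod > 0, delta * eta_plus,
                     np.where(prod < 0, delta * eta_minus, delta))
    dw = -np.sign(grad) * delta
    return w + dw, delta
```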
Conjugate Gradient Algorithms (CGA)
Rather than always following the steepest descent direction as in the BP series of algorithms, CGA searches along conjugate directions. Given the quadratic function $F(x)=\frac{1}{2}x^{T}Hx+d^{T}x+c$, a set of vectors $\{p_k\}$ is mutually conjugate with respect to the positive definite Hessian matrix $H$ if and only if $p_k^{T}Hp_j=0$ for $k\neq j$ (Hagan et al., 1996). The first search iteration still uses the steepest descent direction, $p(0)=-g(0)$; after that, CGA follows the rule:
$$
w_{ji}(t+1)=w_{ji}(t)+\alpha(t)\,p(t)
$$
$$
p(t)=-g(t)+\beta(t)\,p(t-1)
$$
Different CGAs define $\beta$ differently, for example the Fletcher-Reeves method:
$$
\beta(t)=\frac{g^{T}(t)\,g(t)}{g^{T}(t-1)\,g(t-1)},
$$
the Polak-Ribiére method:
$$
\beta(t)=\frac{\Delta g^{T}(t-1)\,g(t)}{g^{T}(t-1)\,g(t-1)},
$$
and the Hestenes-Stiefel method:
$$
\beta(t)=\frac{\Delta g^{T}(t-1)\,g(t)}{\Delta g^{T}(t-1)\,p(t-1)},
$$
where $\Delta g(t-1)=g(t)-g(t-1)$.
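A minimal sketch of the CGA loop with the Fletcher-Reeves $\beta$; the crude backtracking line search and all names are illustrative stand-ins (a practical implementation would enforce Wolfe conditions):

```python
import numpy as np

def conjugate_gradient(f, grad_f, w, n_iter=100, tol=1e-8):
    """Minimize f by nonlinear CG with the Fletcher-Reeves beta."""
    g = grad_f(w)
    p = -g                                  # first direction: steepest descent
    for _ in range(n_iter):
        alpha = line_search(f, w, p)        # step length alpha(t)
        w = w + alpha * p
        g_new = grad_f(w)
        if np.linalg.norm(g_new) < tol:
            break
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves beta(t)
        p = -g_new + beta * p
        g = g_new
    return w

def line_search(f, w, p, alpha=1.0, shrink=0.5, max_tries=30):
    """Crude backtracking line search: halve alpha until f decreases."""
    f0 = f(w)
    for _ in range(max_tries):
        if f(w + alpha * p) < f0:
            return alpha
        alpha *= shrink
    return alpha
```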
Newton Algorithms
As stated in Hagan et al. (1996), the Newton method is actually based on the second-order Taylor series:
$$
F(w_{ji}(t+1))=F(w_{ji}(t)+\Delta w_{ji}(t))\approx F(w_{ji}(t))+g^{T}(t)\,\Delta w_{ji}(t)+\frac{1}{2}\Delta w_{ji}^{T}(t)\,H(t)\,\Delta w_{ji}(t)
$$
where $g(t)$ is the gradient vector and $H(t)$ is the Hessian matrix. Setting the derivative with respect to $\Delta w_{ji}(t)$ to zero gives $g(t)+H(t)\,\Delta w_{ji}(t)=0$, hence $\Delta w_{ji}(t)=-H^{-1}(t)\,g(t)$.
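A minimal sketch of the resulting Newton step, solving the linear system rather than forming $H^{-1}$ explicitly (names are illustrative):

```python
import numpy as np

def newton_step(w, g, H):
    """Solve H(t) * dw = -g(t) for the Newton update dw."""
    dw = np.linalg.solve(H, -g)   # cheaper and more stable than inverting H
    return w + dw
```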
Levenberg-Marquardt algorithm (LMA)
LMA is based on the Newton method. Rather than calculating the exact Hessian matrix at every iteration, LMA uses the approximation $G(t)=H(t)+\mu(t)I$ with $H(t)\approx J^{T}(w_{ji}(t))\,J(w_{ji}(t))$, so that
$$
\Delta w_{ji}(t)=-\left[J^{T}(w_{ji}(t))\,J(w_{ji}(t))+\mu(t)I\right]^{-1}J^{T}(w_{ji}(t))\,e,
$$
where $J$ is the Jacobian matrix and $e$ is the vector of network errors. A detailed description can be found in Hagan et al. (1996).
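A sketch of a single LMA update, assuming the Jacobian $J$ and error vector $e$ are already computed; the $\mu(t)$ adaptation schedule is omitted and all names are illustrative:

```python
import numpy as np

def lm_step(w, J, e, mu):
    """One LM update: dw = -(J^T J + mu*I)^{-1} J^T e."""
    A = J.T @ J + mu * np.eye(w.size)   # approximate Hessian G = H + mu*I
    dw = np.linalg.solve(A, -(J.T @ e))
    return w + dw
```

Large $\mu(t)$ makes the step behave like small-step gradient descent, while small $\mu(t)$ recovers the Gauss-Newton step, which is why $\mu(t)$ is typically decreased after successful iterations and increased otherwise.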