Chapter 6  The Multilayer Perceptron

6.1 Introduction

6.2 Feed-forward Network Mapping

6.3 Sigmoid Units
2. "tanh" Activation Function

6.4 Method of Training: BP
The minimization of the cost function by using the gradient method:
(1) The gradient for the hidden-to-output weights
(2) The gradient for the input-to-hidden weights
(3) Summary of the gradient computation
3. Implementing the BP Algorithm
Step 3: Accumulate gradients over the input patterns
4. Networks with More than 2 Layers
Backpropagation of Error: An Example
Demonstrations

6.5 Training Aspects: Local Minima
2. Training with Noise
Solutions:
3. Momentum and Learning Rate Adaptation
With momentum μ_m, the weight update at a given time t becomes

    Δw(t) = -μ ∇E(t) + μ_m Δw(t-1)

where 0 < μ_m < 1 is a new global parameter which must be determined by trial and error. Momentum simply adds a fraction μ_m of the previous weight update to the current one. When the gradient keeps pointing in the same direction, this increases the size of the steps taken towards the minimum. It is therefore often necessary to reduce the global learning rate μ when using a lot of momentum (μ_m close to 1). If you combine a high learning rate with a lot of momentum, you will rush past the minimum with huge steps!
(2) Learning Rate Adaptation

6.6 Training Aspects: Overtraining
2. Bias-Variance Trade-off
3. Generalization Ability
4. Preventing Overfitting/Overtraining
(1) Early stopping
(2) Weight decay
(3) Training with noise

6.7 Training Aspects: Optimizing Net Size
By way of an example, the nonlinear data which formed our first example can be fitted very well using 40 tanh functions. Learning with 40 hidden units is considerably harder than learning with 2, and takes significantly longer. The resulting fit is no better (as measured by the sum-squared error) than the 2-unit model.
The most usual answer is not necessarily the best: we guess an appropriate number (as we did above). Another common solution is to try out several network sizes and select the most promising. Neither of these methods is very principled. Two more rigorous classes of methods are available, however: we can either start with a network which we know to be too small and iteratively add units and weights, or we can
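The BP training procedure outlined in 6.4 can be sketched as follows for a 1-hidden-layer network with tanh hidden units, a linear output, and a sum-of-squares cost. The data, layer sizes, learning rate, and epoch count below are all illustrative assumptions, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                 # 20 input patterns, 3 features (toy data)
t = np.sin(X).sum(axis=1, keepdims=True)     # toy targets

W1 = rng.normal(scale=0.5, size=(3, 5))      # input-to-hidden weights
W2 = rng.normal(scale=0.5, size=(5, 1))      # hidden-to-output weights
mu = 0.001                                   # learning rate (assumed)

def cost(W1, W2):
    """Sum-of-squares cost E = 0.5 * sum (y - t)^2."""
    return 0.5 * np.sum((np.tanh(X @ W1) @ W2 - t) ** 2)

cost_before = cost(W1, W2)
for epoch in range(500):
    # Forward pass
    h = np.tanh(X @ W1)                      # hidden activations
    y = h @ W2                               # network output
    # Backward pass: accumulate gradients over the input patterns (Step 3)
    err = y - t                              # dE/dy
    grad_W2 = h.T @ err                      # (1) hidden-to-output gradient
    delta_h = (err @ W2.T) * (1 - h ** 2)    # backpropagated error; tanh' = 1 - tanh^2
    grad_W1 = X.T @ delta_h                  # (2) input-to-hidden gradient
    # Gradient-descent update
    W1 -= mu * grad_W1
    W2 -= mu * grad_W2
cost_after = cost(W1, W2)
```

The derivative of the tanh unit is available directly from the hidden activation itself (1 - h²), which is one reason sigmoid-type units are convenient for BP.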
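The momentum update Δw(t) = -μ ∇E(t) + μ_m Δw(t-1) from 6.5 can be sketched on a single weight. The quadratic toy cost, learning rate, and momentum value below are illustrative assumptions.

```python
# Minimal sketch of the momentum update: the weight change at time t is the
# usual gradient step plus a fraction mu_m of the previous weight change.
def grad(w):
    return 2.0 * w              # gradient of the toy cost E(w) = w^2

mu, mu_m = 0.1, 0.9             # learning rate and momentum, 0 < mu_m < 1
w, dw_prev = 5.0, 0.0
for _ in range(200):
    dw = -mu * grad(w) + mu_m * dw_prev   # momentum update rule
    w += dw
    dw_prev = dw
```

With μ_m close to 1 the successive steps compound when the gradient keeps pointing the same way, which is why the global learning rate μ usually has to be reduced, as the text warns.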
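Early stopping, listed in 6.6 as one way to prevent overtraining, can be sketched as follows: monitor the error on a held-out validation set while training, and keep the weights that achieved the lowest validation error. The linear toy model, data split, and patience value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 5))
t = X[:, :1] + 0.5 * rng.normal(size=(40, 1))        # linear signal + noise
X_tr, t_tr = X[:30], t[:30]                          # training set
X_va, t_va = X[30:], t[30:]                          # validation set

w = np.zeros((5, 1))
best_val, best_w, patience = np.inf, w.copy(), 0
for epoch in range(1000):
    w -= 0.05 * X_tr.T @ (X_tr @ w - t_tr) / len(X_tr)   # one gradient step
    val_err = np.mean((X_va @ w - t_va) ** 2)            # validation error
    if val_err < best_val:
        best_val, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience >= 20:       # validation error has not improved for 20 epochs
            break
w = best_w                       # restore the weights with lowest validation error
```

Training error keeps falling as long as we train, but the validation error tracks generalization; stopping (and restoring) at its minimum is the early-stopping compromise.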
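Weight decay, the second overfitting remedy listed in 6.6, adds a penalty 0.5·λ·Σw² to the cost, so every update also shrinks the weights toward zero. The linear toy model, λ value, and data below are illustrative assumptions used only to show the shrinking effect.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
t = X[:, :1] + 0.1 * rng.normal(size=(50, 1))    # only the first input matters

def train(lam):
    """Gradient descent on mean squared error plus a weight-decay term lam*w."""
    w = np.zeros((10, 1))
    for _ in range(500):
        grad = X.T @ (X @ w - t) / len(X) + lam * w   # data gradient + decay
        w -= 0.1 * grad
    return w

w_plain = train(lam=0.0)
w_decay = train(lam=0.1)
```

The decayed solution has a smaller weight norm: irrelevant weights are pulled toward zero, which limits the effective complexity of the network.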