Chapter 6  The Multilayer Perceptron

6.1 Introduction

6.2 Feed-forward Network Mapping

6.3 Sigmoid Units
2. The "tanh" activation function

6.4 Method of Training: Backpropagation (BP)
Minimization of the cost function by the gradient method:
(1) The gradient for the hidden-to-output weights
(2) The gradient for the input-to-hidden weights
(3) Summary of the gradient computation
3. Implementing the BP algorithm
Step 3: Accumulate the gradients over the input patterns (a runnable sketch of this procedure follows at the end of the chapter)
4. Networks with more than two layers
Backpropagation of error: a worked example
Demonstrations

6.5 Training Aspects: Local Minima
Solutions:
2. Training with noise
3. Momentum and learning rate adaptation
(1) Momentum
With momentum µm, the weight update at a given time t becomes

    Δw(t) = −µ ∇E(t) + µm Δw(t−1),

where 0 < µm < 1 is a new global parameter which must be determined by trial and error. Momentum simply adds a fraction µm of the previous weight update to the current one. When the gradient keeps pointing in the same direction, this will increase the size of the steps taken towards the minimum. It is therefore often necessary to reduce the global learning rate µ when using a lot of momentum (µm close to 1). If you combine a high learning rate with a lot of momentum, you will rush past the minimum with huge steps! (A short code sketch of this update rule is given at the end of the chapter.)
(2) Learning rate adaptation

6.6 Training Aspects: Overtraining
2. The bias-variance trade-off
3. Generalization ability
4. Preventing overfitting/overtraining (sketches of all three remedies follow at the end of the chapter):
(1) Early stopping
(2) Weight decay
(3) Training with noise

6.7 Training Aspects: Optimizing Net Size
By way of an example, the nonlinear data which formed our first example can be fitted very well using 40 tanh functions. Learning with 40 hidden units is considerably harder than learning with 2 and takes significantly longer, yet the resulting fit is no better (as measured by the sum-squared error) than the 2-unit model. How, then, should the network size be chosen? The most usual answer is not necessarily the best: we guess an appropriate number (as we did above). Another common solution is to try out several network sizes and select the most promising. Neither of these methods is very principled. Two more rigorous classes of methods are available, however: we can either start with a network which we know to be too small and iteratively add units and weights (growing), or we can start with a network which is clearly too large and iteratively remove superfluous units and weights (pruning).
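
To make the gradient computation of 6.4 concrete, here is a minimal NumPy sketch of batch BP for a network with one tanh hidden layer, a linear output, and a sum-squared-error cost, accumulating the gradients over all input patterns as in Step 3. This is an illustration under those assumptions, not code from the slides; all names (init, bp_epoch, W1, eta, ...) are my own.

```python
# Batch backpropagation for a 1-hidden-layer tanh MLP with linear
# output and sum-squared-error cost E = 0.5 * sum ||y - t||^2.
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_hid, n_out):
    W1 = rng.normal(0, 0.5, (n_hid, n_in))   # input-to-hidden weights
    b1 = np.zeros(n_hid)
    W2 = rng.normal(0, 0.5, (n_out, n_hid))  # hidden-to-output weights
    b2 = np.zeros(n_out)
    return W1, b1, W2, b2

def bp_epoch(params, X, T, eta=0.01):
    """One epoch of batch BP: accumulate the gradients over all input
    patterns (Step 3), then take a single gradient-descent step."""
    W1, b1, W2, b2 = params
    gW1 = np.zeros_like(W1); gb1 = np.zeros_like(b1)
    gW2 = np.zeros_like(W2); gb2 = np.zeros_like(b2)
    E = 0.0
    for x, t in zip(X, T):
        # forward pass
        a = W1 @ x + b1
        z = np.tanh(a)                   # hidden activations
        y = W2 @ z + b2                  # linear output
        e = y - t
        E += 0.5 * (e @ e)               # sum-squared error
        # backward pass
        delta_out = e                    # output-layer error
        delta_hid = (W2.T @ delta_out) * (1 - z**2)   # tanh' = 1 - tanh^2
        gW2 += np.outer(delta_out, z); gb2 += delta_out   # (1) hidden-to-output
        gW1 += np.outer(delta_hid, x); gb1 += delta_hid   # (2) input-to-hidden
    return (W1 - eta*gW1, b1 - eta*gb1, W2 - eta*gW2, b2 - eta*gb2), E

# usage: fit y = sin(x) with a handful of hidden units
X = np.linspace(-3, 3, 50).reshape(-1, 1)
T = np.sin(X)
params = init(1, 5, 1)
for epoch in range(2000):
    params, E = bp_epoch(params, X, T, eta=0.005)
```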
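The momentum rule of 6.5.3 translates directly into code. This is a minimal sketch under the notation above, where `velocity` stands for the previous update Δw(t−1); the function name and default values are my own.

```python
def momentum_step(w, grad, velocity, mu=0.01, mu_m=0.9):
    """Apply  Delta_w(t) = -mu * grad E(t) + mu_m * Delta_w(t-1)."""
    velocity = -mu * grad + mu_m * velocity   # add fraction of previous step
    return w + velocity, velocity             # new weights, new Delta_w
```

With a constant gradient the steps grow geometrically toward µ/(1−µm) times the plain gradient step, which is exactly why the slides advise lowering µ when µm is close to 1.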
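The three remedies listed under 6.6.4 can likewise be sketched in a few lines. The helper names and the lam, sigma, and patience defaults are my own assumptions, not the slides' notation; train_with_early_stopping expects caller-supplied functions for one training epoch and for the validation error.

```python
import copy
import numpy as np

def weight_decay_grad(grad, w, lam=1e-4):
    # (2) Weight decay: minimizing E + (lam/2)*||w||^2 simply adds
    # lam*w to the gradient, shrinking every weight toward zero.
    return grad + lam * w

def jitter(X, rng, sigma=0.05):
    # (3) Training with noise: perturb the inputs each epoch so the
    # network cannot fit any individual pattern exactly.
    return X + rng.normal(0.0, sigma, X.shape)

def train_with_early_stopping(params, step_fn, val_error_fn,
                              max_epochs=5000, patience=50):
    # (1) Early stopping: keep the weights that did best on a held-out
    # validation set; stop once validation error stops improving.
    best, best_err, since_best = copy.deepcopy(params), float("inf"), 0
    for _ in range(max_epochs):
        params = step_fn(params)        # one training epoch
        err = val_error_fn(params)      # error on the validation set
        if err < best_err:
            best, best_err, since_best = copy.deepcopy(params), err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best, best_err
```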