Nutch07_tutorial.pdf

Nutch version 0.7 tutorial

Table of contents

1 Requirements
2 Getting Started
3 Intranet Crawling
    3.1 Intranet: Configuration
    3.2 Intranet: Running the Crawl
4 Whole-web Crawling
    4.1 Whole-web: Concepts
    4.2 Whole-web: Bootstrapping the Web Database
    4.3 Whole-web: Fetching
    4.4 Whole-web: Indexing
    4.5 Searching

Copyright © 2006 The Apache Software Foundation. All rights reserved.

1. Requirements

1. Java 1.4.x, either from Sun or IBM; on Linux is preferred. Set NUTCH_JAVA_HOME to the root of your JVM installation.
2. Apache's Tomcat 4.x.
3. On Win32, cygwin, for shell support. (If you plan to use Subversion on Win32, be sure to select the subversion package when you install, in the "Devel" category.)
4. Up to a gigabyte of free disk space, a high-speed connection, and an hour or so.

2. Getting Started

First, you need to get a copy of the Nutch code. You can download a release from http://lucene.apache.org/nutch/release/. Unpack the release and connect to its top