Efficient Processing of Data Warehousing Queries in a Split Execution Environment

Kamil Bajda-Pawlikowski (1, 2), Daniel J. Abadi (1, 2), Avi Silberschatz (2), Erik Paulson (3)
(1) Hadapt Inc., (2) Yale University, (3) University of Wisconsin-Madison
{kbajda,dna}@hadapt.com; avi@cs.yale.edu; epaulson@cs.wisc.edu

ABSTRACT

Hadapt is a start-up company currently commercializing the Yale University research project called HadoopDB. The company focuses on building a platform for Big Data analytics in the cloud by introducing a storage layer optimized for structured data and by providing a framework for executing SQL queries efficiently.

This work considers processing data warehousing queries over very large datasets. Our goal is to maximize performance while, at the same time, not giving up fault tolerance and scalability. We analyze the complexity of this problem in the split execution environment of HadoopDB. Here, incoming queries are examined; parts of the query are pushed down and executed inside the higher performing database layer; and the rest of the query is processed in a more generic MapReduce framework.

In this paper, we discuss in detail performance-oriented query execution strategies for data warehouse queries in split execution environments, with particular focus on join and aggregation operations. The efficiency of our techniques

1. INTRODUCTION

MapReduce [19] is emerging as a leading framework for performing scalable parallel analytics and data mining. Some of the reasons for the popularity of MapReduce include the availability of a free and open source implementation (Hadoop) [2], impressive ease-of-use experience [30], as well as Google's, Yahoo!'s, and Facebook's wide usage [19, 25] and evangelization of this technology. Moreover, MapReduce has been shown to deliver stellar performance on extreme-scale benchmarks [17, 3]. All these factors have resulted in the rapid adoption of MapReduce for many different kinds of data analysis and processing [15, 18, 32, 29, 25, 11].

Historically, the main applications of the MapReduce framework included Web indexing, text analytics, and graph data mining. Now, however, as MapReduce is steadily developing into the de facto data analysis standard, it repeatedly becomes employed for querying structured data, an area traditionally dominat