[PDF] [PDF] Fine-grained Partitioning for Aggressive Data Skipping - Databricks

purpose, such as load balancing and roll-in/roll-out opera- tions Specifically For example, when a new partition dt='1994-11-03' is added to the table logs:



Previous PDF Next PDF





[PDF] Treaty Series Recueil- des Traites - UN Environment Document

A list of the technical and artistic personnel of the two countries; - A detailed cost I have the honour to inform you that the above-mentioned proposal by the Les projets dont la mise en ceuvre a td d6cidde peuvent comporter des aspects juridisen avun, maksujen perimisen, arvopapereiden sailyttamisen yms rahaston



[PDF] Treaty Series Recueil des Traite-s - United Nations Treaty Collection

L'Ambassadeur de Finlande dt Paris au Ministre des affaires gtrangres Agreement, I have the honour to propose that the following tax relief should be 05/SK/ 1/68 of 29 January 1968, the Indonesian official list of goods for which payment kimuksia, selostuksia, luonnoksia, malleja yms sisdltivien tieteellisteknisten



[PDF] IAC minutes Jan 2014 - Navy Japan Child & Youth Education Services

23 jan 2014 · Ms Lindsey Todd, Parent Representative, YMS k Ms Valerie (e) 10 February 2014: Sixth Grade Honor Roll, 1315, D T GLENISTER 14



[PDF] IAC Minutes Nov 2013 - Navy Japan Child & Youth Education Services

School (YMS) 1 Ms Lindsey Todd, Parent Representative, YMS The YMS Girls' Science Technology Engineering and Math (STEM) conference and (g) 14 November 2013: 6th Grade Honor Roll (h) 15 November D T GLENISTER 14



[PDF] assistant coach jamie corti - AWS Simple Storage Service (Amazon

26 juil 2017 · 10 – Kindt – [Kin-dt] Named to Big South Presidential Honor Roll a member of the National Honors Society Played for YMS Xplosion 



[PDF] 1964pdf - NET

Thi year th top honor _tud nt w r , bac/, row, left to rzght: Tom covill Third row: du ·ation g-yms and Honor Roll, C \ \, \h td ( horu , (,irJ ' lee Club, - hm n irl- '



[PDF] TABLE OF CONTENTS - York Public Schools

The criteria for All “A” Honor Roll, a YMS Student must have a GPA of 93 or higher 3 doses of DTaP, DTP, DT, or Td vaccine, one given on or after the 4th



[PDF] Fine-grained Partitioning for Aggressive Data Skipping - Databricks

purpose, such as load balancing and roll-in/roll-out opera- tions Specifically For example, when a new partition dt='1994-11-03' is added to the table logs:



[PDF] OFFSHORE HYDROMECHANICS First Edition

a bridge wing is mainly build up by heave, pitch and roll motions and the mass flow through plane dy dz during a unit of time dt out-of the block at This type of equation is often referred to as a ”Cummins Equation” in honor of ª0(µ)=_yMs



[PDF] 510123_RMpdf - Town Hall Town of Chapel Hill

your City in line for an Honor Roll Plaque from the National Safety Council for last year Yours truly, S D T Neville A J Thomas George Poe Harold Cheek

[PDF] Honor Roll: Second Honors

[PDF] Honorable Correspondant, A l`occasion de l`expansion de ses - Anciens Et Réunions

[PDF] Honoraires - Géomètre-expert - Immobilier

[PDF] Honoraires CEU Cardenal Herrera 2016/17 - Université

[PDF] Honoraires de location : plafonds et modalités d`application - France

[PDF] Honoraires de location et de gestion locative - France

[PDF] honoraires de résultat de l avocat dessaisi avant la fin

[PDF] Honoraires location et gestion locative - France

[PDF] HONORAIRES PRESTATIONS JURIDIQUES

[PDF] Honorarkraft für ein Sprach- und Leseförderungsprojekt in der

[PDF] Honorarordnung - Österreichische Ärztekammer

[PDF] Honorarordnung 2012 - Veterinärmedizinische Universität Wien

[PDF] Honoré de Balzac - Le Chef - Anciens Et Réunions

[PDF] Honoré de Balzac, Le Colonel Chabert (1832) Imprimez cette feuille

[PDF] Honoré TABUNA et Ingratia Kayitavu

Fine-grainedPar titioningforAggressiveDataSkipping LiwenSun,MichaelJ. Franklin, SanjayKr ishnan,ReynoldS.Xin

AMPLab,UCBerk eleyand

DatabricksInc.

{liwen,franklin,sanjay ,rxin}@cs.berkeley .edu

ABSTRACT

Modernquerye nginesareincreasi nglybeingrequiredt opro- cessenormousdat asetsinnearreal-t ime.Whilemuchcan bedoneto speedup thedataa ccess,apromisingt echni que istoreduc etheneed toaccess datathroug hdataskipping .

Bymain tainingsomemetadataforeachblock oftuples,a

querymaysk ipadatabloc kifthemetadata indi catesthat theblock doesnotcontain relevantda ta.Thee ectiveness ofdatas kipping,howev er,dependsonhowwellthe blocking schemematchestheq ueryfilters. Inthi spaper,weprop oseafine-grainedbl ocking tech- niquethatreo rganizestheda tatuplesintoblockswitha goalofenablingq ueriesto skipblocksaggre ssively.Wefirst extractrepresentative filtersinaworkloadasfeaturesusing frequentitemsetmini ng.Basedonthesefeatures,e achdata tuplecanbe representeda safeature vector.Wethenfor- mulatetheblockingpro blemasa optimizationproblemon thefeature vectors,calle dBalancedMaxSkipPartitioning, whichweproveisN P-hard.To findanapproxim atesol u- tione ciently,weadoptthebottom-upc lusteri ngframe- work.Weprototy pedourbl ockingtechniquesonShark,an open-sourcedatawarehousesystem .Ourexperi mentson

TPC-Handareal-worl dworklo ad showthatourblocking

techniqueleadsto2-5xim provementinque ryrespons etime overtraditiona lrange-basedblockingtechniques.

CategoriesandSubject Descriptors

H.2.2[DatabaseManagement]:Phys icalDesign

Keywords

Datawarehouse ;Partitioning;Queryprocessi ng;Algorithms

1.INTRODUCTION

Dataanalyt icshasbeenprovencriticalinm anyappli ca- tions,rangingfrombusines sdecisionmak ingtosc ientific discovery.Manyoftheseapplicat ionsrequireinteractively unlockinginsightsfrome normousdata.Tomeetthisneed, designsofmodernquerye ngines(e .g.,[8,37,3 2,2,35])a re Permissionto makedigital orhardcopiesofallor partofthis workfor personalor classroomuseis grantedwithoutfee providedthat copiesarenot madeor distributed forprofitor commercialadvantage andthatcopies bearthisnotice andthefullcitation onthefirst page.Copyrights forcomponentsof thiswork ownedbyothersthan the author(s)mustbe honored.Abstracting withcreditis permitted.Tocopyotherwise, or republish,topost onservers ortoredistrib utetolists, requirespriorspecificpermission and/orafee. Requestpermissionsfrom permissions@acm.org.

SIGMOD'14,June22-27,2014, Snowbird,UT ,USA.

Copyrightisheldbythe owner/author(s).Publication rightslicensedto ACM.

ACM978-1-4503-2376-5/14/06...$15.00.

par%%on!1 par%%on!2 data!source ETL range/hash par%%oning query! log par%%on7wise blocking storage+engine par%%on!3 blocked!tuples blocking par%%oned!tableunpar%%oned!table par%%on!1 par%%on!2 par%%on!3

Figure1:Blo ckingatDa taLoading

strivingtoidentifyopport unities toshortenqueryresponse time.Onedimension ofthise ortfocuse sonimprovingthe datascanthroughput,suc hasmemoryc aching,pa ralleliza- tion,anddataco mpression.Ano therdimensi onistoreduce thedataa ccess.For example,columnar-oriented datalayout preventsthequeriesfromacc essingi rrelevantcolumns,a nd samplingprovidesapproximat eanswersbyscanningo nlya smallsubsetofdata. Alongtheseli nes,there hasrecently beenanincreas inginte restinreducingdataaccessthro ugh dataskipping[1,6,35, 37].Int elligentl yskippingblocksof tuplescansignific antlyspee dupthequeryprocessing.

1.1Background

Traditionally,dataskippinghaslongbeenimple mented

viapartitio npruning.Inadatawarehouseenviro nment, manytablesare partitionedbytimea ndmost querieshave ati merangefilter.Aque rycanchec kthetimerangeso f thepartit ionsanddecidewhichpartitions toscan andwhich toskip. Whilethisisane ectivewaytoprunedata,t he remainingpartitionscans tillcontainalotoftuple s.

Recentsystems[28,1 ,34,7,6,37,35]supportsk ipping

datablocks.Abl ock inthesesystem sisaho rizontalparti- tionthatisfairl ysmall(e. g.,10 00'sor10,000 'softuples).

Eachblockisas sociatedwit hsomem etadatasuchasmin

andmaxv alues.Before scanningablock,dataski ppingfirst evaluatesthequeryfilteragainst thismet adataandthen decidesiftheblockca nbesk ipped,i. e.,theblockdoes not needtobea ccesse d.Thesal ientfeaturesofdataskipping includeavoidingrando mdiskaccessandincurringm inimal storageandmaintenanceo verhead[ 28,34,6,35].Block skippingspeedsuptable scansbyaccessing les sdata,which isbene ficialtobothdisk-andmemory -reside nttables.

1.2Goals

Thee ectivenessofblockskippingdependson howthe datatuplesare partitionedint oblocks .Werefertothis t6t5t4t3t2t1 publisher='google' revenue6<60

F1F2F3

weight502010 vector (F1,$F2,$F3) t6t5t4t3t2t1 blocking t1"(0,1,0) t4"(1,1,0) t2"(0,0,1) t5"(0,1,1) t3"(0,0,0) t6"(1,0,0) P1" (1,1,0) P2" (0,1,1) P3" (1,0,0) (a)tuples (b)features(c) vectors (d)blocks

Figure2:Ex ampleofBloc king

processasblocking.Curre ntsystemsado ptverysmallblock sizesfordataskipping .Forex ample,IBMDB 2BLU[35] uses1,000-tupleblocks;Google' sPowerDrill[6]suggest s

50,000tuples; Shark[37]skipsdataonHDF Sblocks ,each

ofwhich is128MBbydefault .These systemsrelyon range partitioningtogeneratesuchbloc ks.Whil erangepartition- inghasbe enusefulfo rmanypurposes, itmaynotbe ideal forgenera tingfine-grainedblocksforski pping.Specifically, rangepartiti oninglacksofaprincipledwayof:( 1)setting thefine-gra inedrangesoneachcolumntha tmatchesthe dataskewa ndworkloadskew ,(2)al locatingthenumber ofparti tionsfordi erentcolumnsand( 3)capturinginter- columndatacorrelati onandfilterc orrelation. Inthi spaper,wepro poseaworkload- drivenblo ckingtech- nique,withagoa lof(horizonta lly )partitioningtheda tainto fine-grained,balance-sizedblocksinawaythatq ueriesc an maximallyskiptheblocks.Thisis ano ineproce ssthatex- ecutesatdataloading timeand maybere-e xecutedlaterto accountforamorerecentw orkloa d.Fi gure1depicts theuse ofbloc kinginanETLprocess.No tethat ourblock ingtech- niquescanco-e xistwitht raditionalhorizontalpartitioning techniques,asthesetechniquesma ybeused foradi erent purpose,suchasload balancingandro ll-in/ro ll-outopera- tions.Specifically, ourtechniquescanbeappliedtofurther segmenteachindividualparti tion.Asshownin Figure1,we takeasinputthedat atuplesa ndaquery log,and writethe blockedtuplestothestora geengine;ifthe incoming tuples arepartit ioned,wecanblockeachpartitio nindividua lly(in parallel).Anewly-insertedparti tio ncanbeblockedonits ownandwil lnota ectexist ingpartitions. First,weextractso mefilter predicatesasfeaturesfroma pastquerywork loadusingfreq uentitemsetmining.Wethen generatefeaturevectorsb yprecomputingthese filterpredi- catesonthedataands olvean optimi zationproble mtoguide thedata blocking.Inm anyreal-worldworkloads,e spe cially thereport ingandscheduledworkloads, simil arqueriesare repeatedlyrunondi erenttime-range ddata.Wealsoan- alyzereal-worldwo rkloadsinSection2,whichshowt hat (1)asmal lse tofrepresentativefil tersarecom monlyuse d byman yqueriesand(2)m anyqueriesuserecurring filters. Thesefindingssugges tthatourwo rkload-drivenapproach canbee ectiveforrealquerywork loads. Somepreviousw orkalsoutilizeswo rkloadsforph ysical databasedesign,e.g.,[1 7,21,12,10].Spe cifically,ourwork isrelat edtomaterializedvi ewsele ction(MVS)[16,10].Like MVS,wee xploitpre computation.However,o urpartition- ingtechnique sareatthephysical-recordl eveland arecom- plementarytomaterializedvie ws.Infa ct,ourtechniques canbeapplie dtopart itionlargemateria lized views,e.g., datacubes. Aswewillshowshortl y,wem aintai nconcise feature-basedmetadataderivedfrompreco mputation.An- otherproposedda taskippingtechniqueinvol vestheuseo f smallmaterialize daggregates(SMAs)associatedwithpar- titions[28,34].These SMAshavebeensho wntoimprove queryperforma nceinrange-partitionedsyste ms.Inc on- trast,ourworkisfoc usedonc onstructing fine-graine dpar- titionsthatmoreclose lycapturethe accesspatternso fcom- plexanalyti csworkloads.Likemateriali zedviews,SMAsare alsocomplemen tarytoourapproachandinfactcouldbeim- plementedonourpartitionsaswe ll.We deferthede tailed discussionofrelatedworktoSec tion8.

1.3Example

Supposewearegive natable asshow ninFigure2(a),an

examplelogofonlineev ents.Wefirs tlooka tthelogof queriesthatwerepose donthista bleandextrac tasetof features,eachofwhichi sarepresentati vefilter withpos- siblymultiplec onjunctivepredicates.Suppos ethefeatures extractedareasshowninFigure2 (b).Giv enthese features, wethent ransformthedata tuplesintofeatureve ctors.T his processcanbedoneby scanningtheta bleoncea ndbatch- evaluatingthefeaturesoneachtupl e.Assho wninFig- ure2( c),eachfeaturevect oris(inthisc ase)a3-dimensi onal bitvect or,whosei-thbitindica teswhet herthistuplesat- isfiesfilterFi.Inpra cti ce,thenumberoffeaturescan be keptsmall, e.g.,<50.Wethen partitio nthetuplesac cord- ingtothe sevec tors.Intuitivel y,tuplesthatdonotsatisfy thesame featuresshouldbe placedinthesamebl ocksuch that,whenaqueryus esoneofthe sefe aturesasfilte r,this blockoftuplesc anbesk ippedaltogethe r.Anexampl eof theresult ingblockedtuplesisshow ninFigure2(d).For eachblock,we computeaunionvector bytaki ngabitwise ORofall thefeatureve ctorsin it.Ifthei-thbitofthe union vectoris0,thenwekno wthatno tupleint hisblocksatis- fiesfeature i.In this case,anyq uerywhosefilterisFican skipthisblock. Forexample ,aqueryonF3canskipthe blocksP1andP3.More generall y,aquerycanskipblocksif itsfilterissubsume dby(i.e.,isstrictertha n)somefe atures.

Forexample ,aquerywithfilterevent='buy' !product=

'jeans'issubsume dbybothfeaturesF1andF2,whi chlead tothesk ippingofP2andP3respectively.

1.4Contributions

Toreali zethisdesign,weaddressa fewtechnical chal- lengesasoutlinedbe low. FeatureSelection.Indeed,selec tingtherightfeatures toguidepa rtitioningis critical.Wedevelopaworkloadan- alyzertoident ifyrepresentativefilters asfeaturesfroma querylog.We considerafeaturere present ativeifitcouldbe usedtohelpm anyqueri es.Ifsomefil terpredicatesa refre- quentlyusedtogether,we shouldcombi nethesepredicates asonefea turetosk ipdatamoree ectively,e.g.,featureF3 inFig ure2.Tocapturebot hfrequencyandco-occurrenceof filters,wemodelthefea turesele ctionasafreque ntitem set miningproblem.Duet otheirsubsumptionrela ti ons,som e featurescanberedundant .Forexampl e,revenue>0co uld beredundant ifthereisalready afeature revenue>100.

Wedevel opaprincipledwaytoel imi nateredundancy.

OptimalPartitioning.Givenasetoffeat ures, wecom-

puteafeat urebit -vectorforea chtuple.Theproblemthen istofinda noptima lpart itioni ngoverthesevectors.Thi s isclea rlyahardproblem,asdi erentfeaturesmay becon- flicting,maybecorrelat ed,andmayhav edi erentselec- problem:givenadesire dnumberoftuplespe rblo ck,find apart itioningoveracollectionof tuples(represent eda s bitvec tors)thatmaximizesthenumb eroftuples thatcan beskipp ed.Thisobjectiveisfundam entallydi erentfrom otherwell-kno wnpartitioningobjectives,suchask-means anddista nce-basedclustering[29].WeprovethatBalanced

MaxSkipisNP-ha rd,byareductionfromthe hypergra ph

bisectionproblem[24].Weconjec turethatk-MaxSkip,a variantwithoutthebalance constraint,isals oNP-hard. To findan approxima tesolutione ciently,weadopttheclas- sicbottom -upclusteringframework,asitnatura llyincor- poratesourobjectivefunct ionand isawidely-understood frameworkwithscalablei mplementatio ns,e.g.,[38,5]. Scalability.Itis prohibiti velyexpensivetorunaclus- teringalgorithm onlargedatasets.Fortunate ly,weo bserve thattheinputsi zecanb ereduced fromthenumberoft uples tothenum berofdi stinctfeaturevecto rs.Thela ttermostly dependsonthenumb eroffea turesandc anbesmall(e.g. , <10k)in practic e.AsshowninSection7,alt houghwe runa sophisticatedpartitioningalgorithmonthev ectors,thecost bottleneckoftheentireblockingproc essiss tillt heactual datamoveme ntinsteadofthepartitioningalg orithm.

Weprotot ypeourblockingtechniqueson Shark[37 ],an

open-sourcedatawarehousesystem .Weconducte xperi- mentsonTPC-Hbe nchmarkand areal-worldad-hocw ork- loadfromavideos treaming com pany.Theresults showthat ourtechni quesreducethedataaccessbya factorof4- 7over existingskippingtechniqueso ntopofrangepartiti oning. Wealso demonstratethatt hisreductioncandirectlytrans- latetoareductio ninquery res ponsetime,onbothdisk and memoryresidentdata .Specifically,weimprov ethequeryre- sponsetimeby5-14 xoverfulltabl escansw ithoutskipping andby2 -5xove rexistingskippi ngtechniques.

Theremai nderofpaperisorganized asfollo ws.Sec-

tion2givesan ove rviewofourblocking approach.Se ction3 presentstheworkloadana lyzer.W ediscussthepartitioner inSecti on4.Weshowhowskippingw orksduring queryex- ecutioninSection5.Sec tion6 discussesthepracticalis sues. Wereport ourexperimenta lresult sinSection7.Section8 reviewstherelatedwork andSecti on9concludesthepaper.

2.OVER VIEW

Inth issection,wefirs tdiscusstheassumptionsforour techniquesandthengiveanove rviewo ftheworkflow.

2.1Workload Assumptions

Weextra ctrepresentativefilters asfeaturesfromthework- loadtoguidethebl ocking. Inorderforour approacht owork well,weexpecttw oproperti esoftheworkload: (a)FilterCommonalit y(b)Filter Stability

Figure3:FilterAnaly sison RealAd-hocWorklo ads

FilterCommonali ty.Weexpe ctthatthereisasmall

setoffilters thatare commonlyusedby manyquerie s.If eachqueryusesa distinctfilte r,thenitwo uldb edi cultto findre presentativefilters.

FilterStability .Sincewebaseo urblocki ngdecision

onapast wo rkload,itis importantthatmostofthefil ters infutureque rieshave occurredbefore.Inothe rwords,we expectthatmostoffilte rsarerecurringandonlya small portionareentirelynew overti me. Obviously,thecommonalityandsta bilityo ffilterscanbe observedinrecurringschedule dorrepo rtingqueries.Such queriesareusuallyge neratedfro mtemplatesandthes ame setoffilters wouldbe repeatedlyused onthedataofdi erent timeranges.Thee ectivenessofusingpastqueriestogui de databasedesignwasalso evidencedinmanypre viousworks, e.g.,[10,21, 17,12]. Weconduct edempiricalanalysiso nareal-worldproduc- tionSQLworkloa dfromavi deostreamingcompanycalle d Conviva.These8664ad-hocqueri es,spanningtheperi od from06/2 010to01/2012,wereusedfo rproblemdi agno- sisanddataa nalytic soverana ccesslogofvideostreams. Notethateachq ueryusesposs iblymultiplefilt erpredicates . Foreachpre dicate,e.g.,event='click',we counti tsfre- quency,i.e.,howman yqueriesuseit.InFi gure3(a) ,we sortthefilte rsbydes cendingfrequencyandplott hecumu- lativepercentageoft hequeriesthatusethefilters.Apoint (x,y)ind icatesthatthemostfrequentx%fil tersareused byy%of queries .Wecanseethatthefilter usagei shighly skewed,e.g.,10%oft heuniquefilterswereused by90%of thequerie s.Thisimpliesthatusing only10%of filtersas featurescanbenefit90 %ofthequeri es.

Wethene xaminetheque riesintheorderastheyarri ve.

Topreven touranalysisfrombei ngbiase dtowardsaparticu- larstarting point,wedividedthe8 664queriesinto5disjoin t timewindowsandpl ottedfivecurves.Fi gure3(b)s howsan averageoverthesefivecurv es.Apoint(x,y)me ansthat,as wehaves eenx%of thequeri es(orx%wo rkloadprefix),the filtersinthesex%a reusedby y%of alltheq ueriesinthe workload.Ifeveryqueryusedcom ple telynewfilters,i .e., thereisnorecurri ngfilter,t hisc urvewouldbeafunct ion ofy=x.T heplot,ho wever,showsthat manyqueriesuse recurringfilters.Inpart icular,80%oftheen tire workload usesthefilters thatalre adyoccurinthe3 0%prefix.Since thefilters arerecurring,wecan infertha tmostofthefuture filtersarepredictabl ebasedo napastworkload.

2.2TheBlocking Workflow

Havingdescribedourwork loadassumptions,wenowove r- viewtheblocki ngoverflow ,asdepictedinFigure4. This workflowconsistsof threestandarddatamarshalings teps (shadedarrows)andtw oimportantmodules namedthe work- loadanalyzera ndthepartitioner. (2) featurization (1) workload analyzer (3) reduction (vector, blk-id) map workload (vector, count) pairs (4) partitioner features (5) shu!e data tuples (vector, tuple) pairs blocked tuples block catalog (6) catalog update

Figure4:TheBl ockingW orkflow

Theinputis atable andawork loa drepresentedas acol- lectionofqueriesonthist able.The queryworkloadcanb e obtainedbycollectingq uerylog softhesystem.Wenow walkthroughtheindi vidualstepsint heworkflow : (1)Worklo adAnalyzer.Theworkl oadanalyzer(Sec- tion3)extract saseto ffeaturesfromthequeryworkload. Afea tureisarepresent ativefil ter withpossiblymultiple predicates. (2)Featur ization.Giventhefeatures,i. e.,filte rs,from

Step1,wes canthet ableonce andbatchevalua tethese

filtersforeachtupl e.Thiss teptransformsea chtupletoa (vector,tuple)-pair. (3)Reduct ion.Sinceourblo ckingonlydep endsonthe featurevectors,no ttheactualtuples,wegroup-bythe(vec - tor,tuple)- pairsinto(vector,count)-pairs .Thisisanim- portantsteptoreducethe inputsizeforthepa rtitione r. (4)Partit ioner.Thepartit ioner(Section4)runsapar- titioningalgorithmonthe(v ector,count)-pairs.Thisge n- eratesablockingmap(fromafeaturev ect ortoablockid) . (5)Shu e.Inth isstep,eachof theaugmentedt uples (outputofStep3)finds itsdest inationblo ckbyco nsult ing theblock ingmap(outputofStep4). (6)Catal ogUpdate.Weaddthe featuresa ndone unionvecto r(e.g.,Figure2(d))for eachblocktotheblock catalog(Section5).Afterthisste p,wecandropt hefeature vectors,andtheblockedtuple sarerea dytoserv equeries.

Theabov eworkflowcanbeexe cutedasano

inepro- cessatdatalo adingtim eandmaybere -executedlatert o accountforworkloadchang es.Int heeventthatthedata arrivalrateishighor thatthenewda taneedst obequeried immediately,theblockingprocesscanbepos tponed.A s statedinSection1,i nmanyda tawarehouseapplications , tablesarepartitioned bytim eranges,andnewtuplesare typicallybatch-insertedasanew partition.Wecancon- siderourblocking asa"se condary"partitioningschem eun- dereach suchcoarse-grai nedpartitio n.Forexample,whena newpartit iondt='1994-11-03'isaddedt othetablelogs:

ALTERTABLElogs ADDPARTITION (dt='1994-11-03') ...

wecani nvokeourblo ckingprocessonthisnew lyinsert ed partitionwithouta ectingexistingdata .Tore-blockahor- izontallypartitionedtable,weca nblockeachpartitionin- dividuallyinparallel. Havingoutlinedthew orkflow,inthefollowingse ctions , wewil ldiscussthedi erentcomponent sindetail.

3.WORKLO ADANALYSIS

Thegoa lofworkloadana lysisi stoextractfeaturesfrom thequeryt race.Wefoll owtwointuitions. First,w eshould userepres entativefiltersasfeaturestoguidetheblocking . Afea tureisrepresentat iveif itcanpotentiallyhelpalarge numberofqueriesski pdata.Se cond,thefilterpredicat es thatarefrequen tlyused togethershouldbeconsidere dasonequotesdbs_dbs7.pdfusesText_13