advertisement

s433 bu

60 %
40 %
advertisement
Information about s433 bu
Entertainment

Published on October 29, 2007

Author: Rafael

Source: authorstream.com

advertisement

MDL Summarization with Holes:  MDL Summarization with Holes Shaofeng Bu Laks V.S. Lakshmanan Raymond T. Ng University of British Columbia, Canada Introduction:  Introduction Multi-dimensional OLAP queries typically produce data intensive answers Often the question is: how to express the large answer set of cells that satisfy the OLAP query conditions: Simple enumeration: accurate but not necessarily the most intuitive; Summaries: not (necessarily) 100% accurate but can be more intuitive and informative. Summarized answers can be more easily understood OLAP Data Cube Example:  OLAP Data Cube Example clothes New York Vancouver Edmonton San Jose San Francisco Chicago Minneapolis Boston Summit Albany northwest midwest northeast location jackets tops women’s jeans blouses skirts formal wear men’s jeans dress pants ties dress skirts women’s men’s Each dimension is associated with a hierarchical tree OLAP Data Cube Example:  OLAP Data Cube Example clothes New York Vancouver Edmonton San Jose San Francisco Chicago Minneapolis Boston Summit Albany northwest midwest northeast location jackets tops women’s jeans blouses skirts formal wear men’s jeans dress pants ties dress skirts women’s men’s Data Cell: (c1,c2), c1,c2 are leaf-nodes in axis-trees, e.g. (Vancouver, ties) Data Region: describes all data cells covered by given nodes in the axis-trees, (x1, y1), e.g.: (Vancouver, ties) (Vancouver, women’s) (northwest, women’s) OLAP Data Cube Example:  OLAP Data Cube Example clothes New York Vancouver Edmonton San Jose San Francisco Chicago Minneapolis Boston Summit Albany northwest midwest northeast location jackets tops women’s jeans blouses skirts formal wear men’s jeans dress pants ties dress skirts women’s men’s Blue cells: the cells that satisfy the query conditions; How to find a summary of the blue cells in a data cube? MDL Summarization:  MDL Summarization MDL: Minimum Description Length Use regions to cover the blue cells; Length of an MDL description is the number of included regions and cells; MDL is to find the description with the minimum length. An Example of MDL Summarization:  An Example of MDL Summarization clothes New York Vancouver Edmonton San Jose San Francisco Chicago Minneapolis Boston Summit Albany midwest northeast location jackets tops women’s jeans blouses skirts formal wear men’s jeans dress pants ties dress skirts women’s men’s northwest A Motivating Example: A New Case:  A Motivating Example: A New Case clothes New York Vancouver Edmonton San Jose San Francisco Chicago Minneapolis Boston Summit Albany northwest midwest northeast location jackets tops women’s jeans blouses skirts formal wear men’s jeans dress pants ties dress skirts women’s men’s Can we do better? :  Can we do better? Yes! We present a new compression approach: MDL with Holes: Identify regions with blue cells, even if they contain non-blue cells; Express the included blue cells by using regions with the exception of the covered non-blue cells; Non-blue cells are called holes. A Motivating Example: MDL with Holes:  R1-(Vancouver,Skirts) R9-(Boston,ties) -(New York, dress skirts) R3-(Vancouver,Skirts) A Motivating Example: MDL with Holes clothes New York Vancouver Edmonton San Jose San Francisco Chicago Minneapolis Boston Summit Albany northwest midwest northeast location jackets tops women’s jeans blouses skirts formal wear men’s jeans dress pants ties dress skirts women’s men’s MDL with Holes: Length = 6+3+3=12 MDL Approach: Length is 18 Problem Statements:  Problem Statements MDL with Holes (MDLH) is to find a description with holes that has the minimum length and the maximum benefit. In practice, we can drill down on regions to get additional details. Definitions: Length & Benefit:  Definitions: Length & Benefit Given a set B of data cells (blue cells), an MDLH description for B: D=S – H , S is a set of data regions, H is a set of data cells, also called ‘holes’, D covers exactly the data cells in B. Length: total number of the included regions and cells in the description. |D|=|S|+|H| Benefit : how much shorter is the MDLH summary than the enumeration of B. Benefit (D) = |B| – | D| B1={a, b, c} D1= s – d |D1|=2 Benefit(D1) = |B1| - |D1| = 1 B2={e, g} D2= t – f – h |D2| = 3 Benefit(D2)= |B2| - |D2| = -1 Related Work:  Related Work The Generalized MDL Approach for Summarization, Laks V.S. Lakshmanan, Raymond T. Ng et al., VLDB 2002 Reduce description length by allowing non-blue cells to be covered in the regions The regions are not pure. Concise Descriptions of Subsets of Structured Sets, Alberto O. Mendelzon & Ken Q. Pu, PODS 2003 Allow Cartesian products to be formed; Not purely hierarchical: NP Completeness result is less surprising; What about the pure hierarchical? Intelligent Rollups in Multidimensional OLAP Data, Gayatri Sathe and Sunita Sarawagi, VLDB 2001 Only report consistent generalization: A tuple can be generalized along a set of dimensions only if it can be generalized along all subsets of dimensions. Outline :  Outline Introduction to MDL with Holes A motivating example 1-D Case: MDLH is Tractable 2-D Case: MDLH is NP-Complete Heuristics A Greedy Heuristic Dynamic Programming Quadratic Programming Experimental Results Summarization on Holes: An Extension Conclusions & Contributions 1-D Case: MDLH is Tractable:  ‘x’ D1= x – d – f – j Benefit(D1) = 7 – 4 = 3 D2=(s – d ) + e + ( u – j ) Beneift(D2) = 7 – 5 = 2 ‘y’ D3 = y – m – p – q – r Benefit(D3) = 4 – 5 = -1 D4 = ( v – m ) + o , Benefit(D4) = 4 – 3 = 1 ‘z’ D5 = z – d – f – j – m – p – q – r Benefit(D5) = 11 – 8 = 3 D6=(x – d – f – j)+( v – m + o ) Benefit(D6) = 11 – 7 = 4 1-D Case: MDLH is Tractable MDLH is Tractable: the Optimal MDLH description, which has the maximum benefit, can be generated in polynomial time in 1-D case. Outline :  Outline Introduction to MDL with Holes A motivating example 1-D Case: MDLH is Tractable 2-D Case: MDLH is NP-Hard Heuristics A Greedy Heuristic Dynamic Programming Quadratic Programming Experimental Results Summarization on Holes: An Extension Conclusions & Contributions 2-D Case: Optimality is not Preserved Any More:  1 2 3 4 5 6 7 8 a b c d e f g i (c,8),(d,8),(e,8) 4 0 rows length benefit (f,8),(g,8) 3 2 (a,8),(b,8) 5 -2 columns length benefit (i,1) 3 2 (i,5) 5 -2 2-D Case: Optimality is not Preserved Any More Optimal Solution: {(c,8)+(d,8)+(e,8)+(i,2)+(i,e)+(i,4)} -{(c,2)+(c,3)+(c,4)+(d,2)+(d,3)+(d,4) +(e,2)+(e,3)+(e,4)} +(f,1)+(g,1)+(f,6)+(g,7) Length = 19 Benefit = 28-19 = 9 MDLH is NP-Hard in 2-D Case:  MDLH is NP-Hard in 2-D Case It is NP-Hard to find the optimal MDLH description in 2-D data cube; Not a Trivial Proof: Details are in the paper; Reduction Strategy: Maximum Induced Subgraph in Complete Edge-Weighted(CEW) Bipartite Graph MDL with Holes Outline :  Outline Introduction to MDL with Holes A motivating example 1-D Case: MDLH is Tractable 2-D Case: MDLH is NP-Hard Heuristics A Greedy Heuristic Dynamic Programming Quadratic Programming Experimental Results Summarization on Holes: An Extension Conclusions & Contributions Heuristics for MDLH:  Heuristics for MDLH Greedy Each time, choose the row/column with the most benefit Dynamic Programming A bottom-up method to get the description of a region from the descriptions of its children regions Quadratic Programming Using a quadratic function to represent the benefit of a 2-d data cube Example for Comparison with Heuristics:  Example for Comparison with Heuristics The optimal description for this example: (e,1)-(a,1)+(e,2)-(b,2)+(e,3)-(b,3)+(d,4)+(b,5) +(e,6)+(e,8)+(a,11)-(a,8) Length = 12 Benefit = 8 Heuristics: A Greedy Heuristic:  Heuristics: A Greedy Heuristic region length benefit holes (e,6) 1 3 - (d,10) 2 2 (d,5) (e,1) 2 1 (a,1) (e,2) 2 1 (b,2) (e,3) 2 1 (b,3) (a,11) 2 1 (a,8) (e,8) 2 1 (a,8) (c,10) 3 0 (c,4)(c,5) Greedy: Why it is not optimal?:  Greedy: Why it is not optimal? Description from Greedy A selection of row/column may reduce more total benefit Heuristics: Dynamic Programming:  Heuristics: Dynamic Programming L: The Length of a Region S: Selection of Rows & Columns (a,10) : (a,2) + (a,3) L(a,10)=2, S(a,10)=‘t2’ (e,4) : (d,4) L(e,4)=1, S(e,4)=‘t1’ (d,10): (d,10) – (d,5) L(d,10)=2, S(d,10)=‘g’ t1 t2 Heuristics: Dynamic Programming(2):  Heuristics: Dynamic Programming(2) S (e,12)=‘t2’ S (e,11)=‘t2’ S (e,10)=‘t2’ Generated Description: (e,1)-(a,1)+(e,2)-(b,2)+(e,3)-(b,3)+(d,4)+(b,5) +(e,6)+(a,7)+(e,8)-(a,8)+(a,9) The length is 13 and the benefit is 20-13 = 7 D(x1,x2):description for region (x1,x2) t1 t2 Dynamic Programming: Why it is not optimal?:  Dynamic Programming: Why it is not optimal? Description by Dynamic Programming Optimal Description Misses the combination of rows and columns Heuristics: Quadratic Programming:  Use variables to represent rows/columns; for a variable v: v=1: the corresponding row/column is selected; v=0: the corresponding row/column is not selected; f = – Benefit( D) Maximizing the benefit is to minimize the value of f For the previous example, quadratic programming generates the optimal description; Optimality is not guaranteed. Heuristics: Quadratic Programming Outline :  Outline Introduction to MDL with Holes A motivating example 1-D Case: MDLH is Tractable 2-D Case: MDLH is NP-Hard Heuristics A Greedy Heuristic Dynamic Programming Quadratic Programming Experimental Results Summarization on Holes: An Extension Conclusions & Contributions Experiments:  Experiments We ran a set of experiments on the TPC-H benchmark data set; We compared the three MDLH heuristics with MDL and GMDL. Experimental Results: Comparison of All Methods:  Experimental Results: Comparison of All Methods Compression Ratio: MDLH-Quadratic generates the most concise descriptions: a yardstick of quality; MDLH-Dynamic is a very close second. Experimental Results: Compression Ratio:  Experimental Results: Compression Ratio The more children per parent node, the greater the benefit Experimental Results: Summary :  Experimental Results: Summary Running time & Scalability: MDLH-Greedy is the fastest; MDLH-Dynamic runs slower than MDLH-Greedy, but it is still scalable w.r.t. the number of cells; Outline :  Outline Introduction to MDL with Holes A motivating example 1-D Case: MDLH is Tractable 2-D Case: MDLH is NP-Hard Heuristics A Greedy Heuristic Dynamic Programming Quadratic Programming Experimental Results Summarization on Holes: An Extension Conclusions & Contributions Extension: Summarization on holes:  As the blue density becomes high, a large part of the MDLH description is made up of holes. Can we further reduce the total length by summarizing ‘Holes’? MDLH description is: (a,11)-{(a,6)+(a,8)+(a,9)} +(d,11)-{(d,6)+(d,7)+(d,8)} +(b,6)+(c,8) Total length is 10. Summarization on holes: (a,6)+(a,8)+(a,9) = (a,10)-(a,7) (d,6)+(d,7)+(d,8) = (d,10)-(d,9) After summarization on holes: (a,11) - { (a,10) - (a,7)} +(d,11) - { (d,10) - (d,9)} +(b,6) + (c,8) Total length is 8. Extension: Summarization on holes Conclusions & Contributions:  Conclusions & Contributions We present a new method, MDLH, to compress the answers of OLAP queries; We present a bottom-up algorithm for 1-d cube; We proved the NP-Hardness of the MDLH problem; We provided three heuristics for MDLH: greedy, dynamic programming, and quadratic programming; We extended the summarization on holes to further reduce the total length; We did a set of experiments on the TPC-H benchmark data to compare the heuristics. On going work:  On going work Based on the summarization on blue cells and summarization on holes, build a visualization tool with MDLH summarization: Return summarized answers to user’s queries; Provide drill down operation for users: Browse details on blue cells Browse details on holes Design k-approximation algorithm for MDLH: What is the best quality we can guarantee?

Add a comment

Related presentations

Related pages

LEGEND ONLİNE S433 BÜYÜCÜ OAAA- ws- YAYCI KİNGTÜRKE ...

server açıldığından beri çekişiriz kardeşimle 1 o kazanır bir ben bu ... S433 BÜYÜCÜ OAAA- ws ... YAYCI KİNGTÜRKE KARŞI adlı ...
Read more

PowerPoint Presentation - VLDB 2005 * Welcome

MDL Summarization with Holes Shaofeng Bu Laks V.S. Lakshmanan Raymond T. Ng University of British Columbia, Canada Introduction Multi-dimensional OLAP ...
Read more

becker.raumsysteme - S433 Staplerkabine - Onlineshop

S433 Hallenbüro auf Gabelstaplerpodest - becker.raumsysteme® 4 - seitig, Abmessung: L x T = 3.045 mm x 3.045 mm, lichte Raumhöhe 2.510 mm, Außenhöhe 2 ...
Read more

becker.raumsysteme - Hallenbüros-Staplerpodest - Onlineshop

S433 Staplerkabine, 4-seitig, 3.045 mm x 3.045 mm. EUR 3.960,00 . zzgl. 19 % USt zzgl. Versandkosten. S443 Staplerkabine, 4-seitig, 4.045 mm x 3.045 mm.
Read more

Bu®ering Carbon Nanotube Interconnects Considering ...

Bu®ering Carbon Nanotube Interconnects Considering Inductive E®ects¤ Jia Wang†, Lin Liu‡, Yuchen Zhou§ and Shiyan Hu¶ Department of Electrical and ...
Read more

Configure the Windows Firewall to Allow SQL Server Access

The Windows Firewall item in Control Panel allows you to configure basic options. These include the following: Turning the Windows Firewall item ...
Read more

Sommerkursprogramm ab 23.06.14 · MTV Ludwigsburg

S433 6x 18:30 – 19:30 Pilates GymH S434 6x 19:30 – 20:30 Pilates GymH S456 12x 17:00 – 18:00 Zumba® Kids GymR S454 12x 18:00 – 19:00 Zumba ...
Read more