DaWaK'07

50 %
50 %
Information about DaWaK'07

Published on August 23, 2007

Author: alvesrco

Source: slideshare.net

Mining Top-K Multidimensional Gradients Department of Informatics School of Engineering University of Minho PORTUGAL Ronnie Alves, Orlando Belo and Joel Ribeiro 9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2007) 3-7 September 2007, Regensburg, Germany

Outline Introduction Top-K Gradients Gradient Ascent Gradient-based Cubing Evaluation Study Conclusions Mining Top-K Multidimensional Gradients

Introduction

Top-K Gradients

Gradient Ascent

Gradient-based Cubing

Evaluation Study

Conclusions

Gradients Consider a telecom fact table T Phone (e.g., p1,p2, p3) DateTime (e.g., 10/02/2006 10am, 10/03/2006 14pm, 10/04/2006 16pm) Origin (e.g., Porto, Braga, Lisbon) Destination (e.g., Porto, Lisbon, Regensburg) TypeCall (e.g., Local, International) Cost Duration *Introduction Mining Top-K Multidimensional Gradients How is the average of duration call affected By age , origin, weekday in cubes with at least 1000 customers and where the average of duration calls is between 300s and 720s ? > It goes (75%) up for middle-age and people in Porto area on Monday. Typical Cubegrade “how” query Imielinski et al DMKD’02, vol.6

Consider a telecom fact table T

Phone (e.g., p1,p2, p3)

DateTime (e.g., 10/02/2006 10am, 10/03/2006 14pm, 10/04/2006 16pm)

Origin (e.g., Porto, Braga, Lisbon)

Destination (e.g., Porto, Lisbon, Regensburg)

TypeCall (e.g., Local, International)

Cost

Duration

Gradients (A=a1, B=b1, C=c1) (A=a1, B=b1, C=c1, D=d1) (A=a1, B=b1) (A=a1, B=b1, C=c2) roll-up(C) drill-down(D=d1) mutate(C=c2) cubegrade operations Even when considering only iceberg cells , It may still generate a very large number of pairs . > Mining gradients with constraints: a) significance , b) probe and c) gradient > LiveSet-Driven strategy Constrained Gradients Mining Top-K Multidimensional Gradients Dong et al TKDM’02, vol.16 *Introduction

Gradients Back to table T Phone (e.g., p1,p2, p3) DateTime (e.g., 10/02/2006 10am, 10/03/2006 14pm, 10/04/2006 16pm) Origin (e.g., Porto, Braga, Lisbon) Destination (e.g., Porto, Lisbon, Regensburg) TypeCall (e.g., Local, International) Cost Duration *Introduction Mining Top-K Multidimensional Gradients Find the Top-K highest changes situations related to average of duration call originated in the Porto area during the week . > Find maximum gradient regions (MGRs) in the cube that maximize the task of mining Top-K gradient cells . Top-K Gradient Query Alves et al DaWaK’07

Back to table T

Phone (e.g., p1,p2, p3)

DateTime (e.g., 10/02/2006 10am, 10/03/2006 14pm, 10/04/2006 16pm)

Origin (e.g., Porto, Braga, Lisbon)

Destination (e.g., Porto, Lisbon, Regensburg)

TypeCall (e.g., Local, International)

Cost

Duration

What’s New with Top-K Gradients An effort made to enrich multidimensional data analysis A new rank-aware materialization handling gradients Top-K query processing based on rank gradient cells Partitioning based on a gradient ascent approach DFS traversal order according to spreading factors of aggregating functions (GR tree ) High-dimensional cubing (TopK gr -Cube) *Introduction Mining Top-K Multidimensional Gradients

An effort made to enrich multidimensional data analysis

A new rank-aware materialization handling gradients

Top-K query processing based on rank gradient cells

Partitioning based on a gradient ascent approach

DFS traversal order according to spreading factors of aggregating functions (GR tree )

High-dimensional cubing (TopK gr -Cube)

Gradient Regions *Top-K Gradients Mining Top-K Multidimensional Gradients countXY( ) sumXY( ) avgXY() convex non-convex gradient region (GR) > Avg() is an algebraic function and It also has an unpredictable spreading factor regarding its distribution value > There are also sets of GRs to looking for Different shapes of aggregating functions

Gradient Regions How to select gradient regions GR1: Rectangles with all bins GR1 = {[4,7]:[6,5] GR2: Rectangles with minimum bin and maximum bin GR2 = {[3,5]:[4,4]} Mining Top-K Multidimensional Gradients *Top-K Gradients GR1 GR2 We expect that GRs with largest aggregating values will provide higher gradient cells

How to select gradient regions

GR1: Rectangles with all bins

GR1 = {[4,7]:[6,5]

GR2: Rectangles with minimum bin and maximum bin

GR2 = {[3,5]:[4,4]}

Definitions *Top-K Gradients Mining Top-K Multidimensional Gradients Base Table closed cell maximal cell maximal probe cell matchable cells A cell cg is said to be gradient cell of a probe cell cp , when they are matchable cells and their delta change, given by Δg(cg, cp)  (g(cg, cp) ≥  ) is true, where  is a constant value and g is a gradient function .

Gradient Ascent Approach Equation 1 Equation 2 *Top-K Gradients Mining Top-K Multidimensional Gradients When evaluating a GR we first search for the maximal probe cells , i.e. the highest aggregating values on it and then calculates its gradients from all possible matchable cells.

Equation 1

Equation 2

Gradient-based Cubing Intuition Given a ranking gradient function Quick locate the most promising gradient regions Evaluate dimensions’ spreading factors Effective retrieval of Top-K gradient cells Approach Cubing Step 1: Build inverted index and value-list indices from base table Step 2: Pruning non-valid iceberg cells Step 3: Calculate spreading factors, create GR tree following these factors Step 4: Pruning non-valid GRs Evaluates Sf(GR i )>= min_sf Top-K Step 5: Partitioning is due projecting GRs (High>>Low) Bin boundaries [min, max] are returned for each partition Step 6: Pruning non-valid Top-K regions Step 7: Select maximal probe cells Step 8: Calculate all Top-K gradients *Top-K Gradients Mining Top-K Multidimensional Gradients

Intuition

Given a ranking gradient function

Quick locate the most promising gradient regions

Evaluate dimensions’ spreading factors

Effective retrieval of Top-K gradient cells

Approach

Cubing

Step 1: Build inverted index and value-list indices from base table

Step 2: Pruning non-valid iceberg cells

Step 3: Calculate spreading factors, create GR tree following these factors

Step 4: Pruning non-valid GRs

Evaluates Sf(GR i )>= min_sf

Top-K

Step 5: Partitioning is due projecting GRs (High>>Low)

Bin boundaries [min, max] are returned for each partition

Step 6: Pruning non-valid Top-K regions

Step 7: Select maximal probe cells

Step 8: Calculate all Top-K gradients

Cubing *Top-K Gradients Mining Top-K Multidimensional Gradients X,Y,Z: Selecting dimensions Value list Inverted index Spreading factors C i ={x1,y3,*}={4} Cuboid cell {1,4} {4} U Count (Ci)=1 Intersect tids aggregating function > Assembling high-dimensional cubes from low-dimensional ones > Follows Frag-Cubing ideas Li et al VLDB’04

*Top-K Gradients Set Enumeration Tree Mining Top-K Multidimensional Gradients Gradient Region Top-K sets Min_sf>0.25, valid GR > Lattice is formed by projecting GR[x1] >> GR[y2] >> GR[z2] > Find local gradients Agg_value Probe cells 1

*Top-K Gradients Mining Top-K Multidimensional Gradients 2 Projecting probe cells GR[x1] >> GR[y3] Top-K sets Matchable links Bin x1 = [1,4] Min_avg>2.7, valid Top-KGR

*Top-K Gradients Mining Top-K Multidimensional Gradients 3 Projecting probe cells GR[x1] >> GR[z1] Top-3 = {i, L, j} {x1,y2,*} -> {x1,y3,*} {x1,*,z3} -> {x1,*,z1} {x1,*,*} -> {x1,y3,*} Top-K sets Matchable links That’s it!!

Mining Top-K Gradients Summarizing… *Top-K Gradients Mining Top-K Multidimensional Gradients Min_sf Min_avg

Summarizing…

Min_ sf pruning effects *Evaluation Study Mining Top-K Multidimensional Gradients Datasets Running time(s) Min_sf() D2 D1

Min_avg pruning effects *Evaluation Study Mining Top-K Multidimensional Gradients Datasets D2 D1 Running time(s) Min_avg()

K effects *Evaluation Study Mining Top-K Multidimensional Gradients Running time(s) K-cells D2

General pruning effects *Evaluation Study Mining Top-K Multidimensional Gradients D1 & D2 Valid cells Min_sf() K=5, avg>200 1.3M cells 420K cells 200s 170s 1Gb Ram 1M GRs

Conclusions Top-K Gradients Cubing based on partitioning gradient regions Raking based on gradient ascent + variance Current status Extended to transaction-based Top-K gradients More pruning, <>datasets -> <>densities Future work Raking gradients by semi-online computation Explore other partitioning strategies Support selection with several gradients -> the Top-K prioritization approach High-dimensional Top-K gradients

Top-K Gradients

Cubing based on partitioning gradient regions

Raking based on gradient ascent + variance

Current status

Extended to transaction-based Top-K gradients

More pruning, <>datasets -> <>densities

Future work

Raking gradients by semi-online computation

Explore other partitioning strategies

Support selection with several gradients -> the Top-K prioritization approach

High-dimensional Top-K gradients

Mining Top-K Multidimensional Gradients QUESTIONS??? Department of Informatics School of Engineering University of Minho PORTUGAL Ronnie Alves, Orlando Belo and Joel Ribeiro 9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2007) 3-7 September 2007, Regensburg, Germany Web : http://alfa.di.uminho.pt/~ronnie/

Frag-Cubing Curse of dimensionality ABCD ABC ABD ACD BCD AC BC AD BD CD A D B C AB Partition dimensions into several groups Materialize low dimensional cuboids offline Assembly high dimensional cuboids online Mining Cube Approach [Li et al, VLDB’04] *Top-K Gradients Mining Top-K Multidimensional Gradients

Curse of dimensionality

Add a comment

Related presentations

Related pages

9th International Conference on Data Warehousing and ...

9th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2007) Regensburg, Germany 3-7 September 2007
Read more

DEXA 2007

DEXA 2007 Regensburg, Germany 3-7 September 2007 . Conferences. ... DaWaK '07; 8th International Conference on Electronic Commerce and Web Technologies ...
Read more

The Special Issues of Data Warehouse and Knowledge ...

The Special Issues of Data Warehouse and Knowledge Discovery (DAWAK’07) Tho Manh Nguyen,, iennaienna niversity of Technology,niversity of Technology ...
Read more

An Annotation Management System for Multidimensional Databases

An Annotation Management System for Multidimensional Databases. In Song, I.-Y., Eder, J. and Nguyen, T.M., editors : DaWaK 07: ...
Read more

Computing JoinAggregates over Private Tables

Computing JoinAggregates over Private Tables Rong She1, Ke Wang1, Ada Waichee Fu2, Yabo Xu1 1 School of Computing Science, Simon Fraser University, Canada
Read more

A Novel Similarity-based Modularity Func- tion for Graph ...

A Novel Similarity-based Modularity Func-tion for Graph Partitioning Zhidan Feng 1, Xiaowei Xu , Nurcan Yuruk1, Thomas A.J. Schweiger 2 The Department of ...
Read more

A UML Profile for Representing Business Object States in ...

A UML Profile for Representing Business Object States in a Data Warehouse⋆ Veronika Stefanov and Beate List Women’s Postgraduate College for Internet ...
Read more

What-if Analysis for Data Warehouse Evolution

What-if Analysis for Data Warehouse Evolution George Papastefanatos1, Panos Vassiliadis2, Alkis Simitsis3, and Yannis Vassiliou1 1 National Technical ...
Read more