multidimensional databases .fr

Example. – Dimensional Tables. • Product, Location, Time. – Fact Table. • World Sales ... Fact Table is Normalized (No Redundancies) ... Constellation Schema.
165KB taille 6 téléchargements 408 vues
MULTIDIMENSIONAL DATABASES

Dominique Laurent Université de Cergy-Pontoise

OLTP vs. OLAP • On-Line Transaction Processing ● ● ● ●

Current, Up-to-date Day to day operations Application Oriented Flat Relational Tables of medium size

• On-Line Analytical Processing ● ● ● ●

Historical, Non Volatile Decision Support Subject Oriented Multidimensional processing

Multidimensional Model World Sales ($b) Paris Islamabad Bandung Kuala Lumpur

Cell

Bread Wine Milk Dimensions

2006

2005

2004

2003

Flour

Measure

Members

Conceptual Level • Two Kinds of Tables – Several Dimensional tables – One Fact Table

• Example

World Sales

– Dimensional Tables • Product, Location, Time

Location

– Fact Table • World Sales

Product Time

Conceptual Level • Additional Knowledge – Hierarchies on Dimensions

• Example – Product < Category – Day < Month < Year – City < Country < Continent

Star Schema • Specific Features – Sizes: Dimensional Tables 4 ISB KL

7 8

6

2003

2004

4

7

3

8 6

Wine Milk

8

6

1

5

Flour

6

9

Wine

8 6

Milk Flour

5 2005 2006

9

6

2006

1

2005

5

3

Bread

5

Bread

2003 2004

Paris ISB BDG KL 4 2

Projection Time, Product

4

7

3

8 6

Wine Milk

8

6

1

5

Flour

Wine Milk

38 47 76 98

Flour

85 46 55 63

75 84 72 49 2005 2006

6

24 48 52 25

2004

9

3

Bread

2003

Bread

2006

1

2005

5

2003 2004

Paris ISB BDG KL 4 2

Roll-Up Location: City < Continent EU Asia 56 28 32 40

Bread

47 28 64 42 38 56 50 65

Milk Flour

7

3

8 6

Wine Milk

8

6

1

5

Flour

2005 2006

Wine

4

2004

65 43 26 72

6

2003

9

3

2006

1

2005

5

Bread

2003 2004

Paris ISB BDG KL 4 2

Efficiency Issues • Joins with the Fact Table – Join indexes – Bit map indexes

• Computation of Aggregations – Pre-compute and store intermediate aggregations – Cube by operator: pre-computation of projections

Further Issues • « Sorting » a cube according to the measure values – Without changing its content – Problem: no standard algorithm

• Summarizing a data cube – Change the content – Identify sub-cubes whose measure values are « almost » the same

Sorting a Cube

3

4

Wine Milk

6

5

1

3

9

7

4 6

Flour

8

4

1

2

2005

2006

Increasing Bottom-Up Left-Right

Milk Bread Wine Flour

4

6

7

9

3

4

6

7

1

3

5 6

1 2

4

8 2003

6

2004

7

2005 2006

Bread

2003 2004

• Sort the Data Cube According to Criteria on the Measure Values

– Achieved based on Switch – Polynomial algorithm under restrictions

Optimal Representations • Sorting is not always possible • With null values, no polynomial algorithm to compute the « best » representation – According to the number of misplaced cells

• Use AI techniques to reach one solution – Hill climbing: at each step the best solution is chosen – Genetic algorithm

Data Cube Summarization • Compute areas of the data cube in which the sales have (almost) the same value If CITY ∈ [C1, C2] And PRODUCT ∈ [P1, P2] Then SALES is 6

• Apply Apriori over the set of dimensions to optimize the scans of the Data Cube