Relational Division


1 Relational Division
Databases and Data Warehouses (Bazy i hurtownie danych), TWO1 2009/2010 dbdw/students/dbdw-winter_ / Based on: V.M. Matos, R. Grasser, Assessing performance of the relational division operator. Data Base Management

2 Relational Division
The basic operators of the relational algebra:
- Union (UNION)
- Difference (MINUS)
- Cartesian product
- Projection and selection (SELECT ... FROM ...)
Additional operators added to the relational algebra:
- Join (most popular in practice)
- Rename (renaming fields)
- Intersection
- Division
7/11/2009 Databases and Data Warehouses

3 Relational Division
The division operator is less common than select-project-join queries, but it applies to many everyday queries:
- Find students who have taken all the core courses
- Find customers who have ordered all items from a given line of products
The division operator can also be employed in data mining algorithms (e.g., generation of association rules).

4 Informal Definition
The division operator verifies whether a candidate subject is related to each of the values held in a base set. The base set is called the divisor (or denominator, T2[B]), and the table holding the subjects' data is called the dividend (or numerator, T1[A, B]). The expression T1[A, B] / T2[B] selects those A values from the dividend table T1[A, B] whose associated B values form a superset of the B values held in the divisor table T2[B].
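The informal definition can be sketched with plain Python sets. This is only an illustration, not part of the original slides: the function name `divide` and the sample rows are assumptions chosen for the example.

```python
def divide(dividend, divisor):
    """Return the A values whose B values form a superset of the divisor."""
    b_values_by_a = {}
    for a, b in dividend:
        b_values_by_a.setdefault(a, set()).add(b)
    divisor = set(divisor)
    # An A value qualifies when its B values cover every B in the divisor.
    return {a for a, b_values in b_values_by_a.items() if b_values >= divisor}


# Hypothetical sample data: T1[A, B] is the dividend, T2[B] the divisor.
T1 = {("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
      ("a2", "b1"), ("a2", "b2"),
      ("a3", "b1"), ("a3", "b3")}
T2 = {"b1", "b2"}

result = divide(T1, T2)
print(sorted(result))
```

Here a3 is rejected because it is not related to b2, while a1 qualifies even though it has an extra value (b3) outside the divisor.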

5 Informal Definition
[Illustration of the informal definition; figure not preserved]

6 Formal Definition: Relational Algebra
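Let us assume that the numerator table T1 always consists of two columns, A and B, and that the denominator table T2 has a single column B. Then the expression T1[A, B] / T2[B] is semantically equivalent to:
T1[A, B] / T2[B] = T1[A] − ((T1[A] × T2[B]) − T1[A, B])[A]
Here T1[A] × T2[B] lists every pair that would have to be present, the inner difference keeps the pairs that are missing from T1, and projecting those on A gives the A values that must be excluded.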
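The algebraic expression can be followed step by step with Python sets. The sketch below is an illustration under assumed sample data; the names `divide_algebraic`, `required` and `missing` are not from the source.

```python
from itertools import product


def divide_algebraic(t1, t2):
    """Evaluate T1[A] - ((T1[A] x T2[B]) - T1[A, B])[A] with set operations."""
    t1 = set(t1)
    t1_a = {a for a, _ in t1}                # projection T1[A]
    required = set(product(t1_a, t2))        # Cartesian product T1[A] x T2[B]
    missing = required - t1                  # required pairs absent from T1
    disqualified = {a for a, _ in missing}   # projection of the missing pairs on A
    return t1_a - disqualified


# Hypothetical sample data.
T1 = {("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
      ("a2", "b1"), ("a2", "b2"),
      ("a3", "b1"), ("a3", "b3")}
T2 = {"b1", "b2"}

quotient = divide_algebraic(T1, T2)
print(sorted(quotient))
```

With this data the only missing pair is (a3, b2), so a3 is removed from T1[A].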

7 Formal Definition: Relational Algebra

8 Formal Definition: Tuple-calculus
Using the relational tuple-calculus language, the division operator can be rephrased as follows:
T1[A, B] / T2[B] = { t1[A] | t1 ∈ T1 and for-all t2 (t2 ∈ T2 ⇒ exists t3 (t3 ∈ T1 and (t1[A] = t3[A]) and (t2[B] = t3[B]))) }
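The quantifiers in the tuple-calculus form map naturally onto Python's `all` and `any`. The following is a sketch with assumed sample data; `divide_calculus` is a name introduced only for this illustration.

```python
def divide_calculus(t1, t2):
    """Keep t1[A] when, for all t2 in T2, some t3 in T1 matches on A and B."""
    return {
        a1
        for (a1, _) in t1
        if all(                                   # for-all t2 in T2
            any(a3 == a1 and b3 == b2             # exists t3 in T1
                for (a3, b3) in t1)
            for b2 in t2
        )
    }


# Hypothetical sample data.
T1 = {("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
      ("a2", "b1"), ("a2", "b2"),
      ("a3", "b1"), ("a3", "b3")}
T2 = {"b1", "b2"}

answer = divide_calculus(T1, T2)
print(sorted(answer))
```

Note that `all` over an empty T2 is true, so every A value of T1 is returned when the divisor is empty.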

9 SQL Implementation: Q0
    SELECT A
    FROM T1
    WHERE B IN (SELECT B FROM T2)
    GROUP BY A
    HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
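Q0 can be tried out without a database server by running it in SQLite through Python's standard `sqlite3` module. This is a sketch only: the slides target SQL Server, and the sample rows are assumptions. The counting approach also assumes that neither T1 nor T2 contains duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (A TEXT, B TEXT);
CREATE TABLE T2 (B TEXT);
INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a1','b3'),
                      ('a2','b1'), ('a2','b2'),
                      ('a3','b1'), ('a3','b3');
INSERT INTO T2 VALUES ('b1'), ('b2');
""")

# Q0: keep only rows whose B is in the divisor, then compare counts per A.
Q0 = """
SELECT A
FROM T1
WHERE B IN (SELECT B FROM T2)
GROUP BY A
HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
"""

q0_result = sorted(row[0] for row in conn.execute(Q0))
print(q0_result)
```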

10 SQL Implementation: Q1
The Byzantine approach, based on the formal predicate-calculus definition modified to fit SQL:
- The universal quantifier for-all x (f(x)) is replaced by not exists x (not f(x))
- The implication X ⇒ Y is replaced by (not(X) or Y)
T1[A, B] / T2[B] = { t1[A] | t1 ∈ T1 and not exists t2 (not(not(t2 ∈ T2) or (exists t3 (t3 ∈ T1 and (t1[A] = t3[A]) and (t2[B] = t3[B]))))) }

11 SQL Implementation: Q1
By De Morgan's law, the previous definition is equivalent to:
T1[A, B] / T2[B] = { t1[A] | t1 ∈ T1 and not exists t2 ((t2 ∈ T2) and (not exists t3 (t3 ∈ T1 and (t1[A] = t3[A]) and (t2[B] = t3[B])))) }
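The double-negation form can be checked against the for-all form on a small example. The sketch below uses assumed sample data and two illustrative function names; it is a spot check on one data set, not a proof of equivalence.

```python
def divide_for_all(t1, t2):
    """for-all t2 in T2: exists t3 in T1 matching on A and B."""
    return {a1 for (a1, _) in t1
            if all(any(a3 == a1 and b3 == b2 for (a3, b3) in t1) for b2 in t2)}


def divide_not_exists(t1, t2):
    """not exists t2 in T2 such that not exists a matching t3 in T1."""
    return {a1 for (a1, _) in t1
            if not any(
                not any(a3 == a1 and b3 == b2 for (a3, b3) in t1)
                for b2 in t2)}


# Hypothetical sample data.
T1 = {("a1", "b1"), ("a1", "b2"), ("a1", "b3"),
      ("a2", "b1"), ("a2", "b2"),
      ("a3", "b1"), ("a3", "b3")}
T2 = {"b1", "b2"}

same = divide_for_all(T1, T2) == divide_not_exists(T1, T2)
print(same, sorted(divide_not_exists(T1, T2)))
```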

12 SQL Implementation: Q1
    SELECT DISTINCT x.A
    FROM T1 AS x
    WHERE NOT EXISTS
        (SELECT * FROM T2 AS y
         WHERE NOT EXISTS
             (SELECT * FROM T1 AS z
              WHERE (z.A = x.A) AND (z.B = y.B)))
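A runnable version of Q1, written as two nested NOT EXISTS subqueries that follow the tuple-calculus form, can be tried in SQLite via Python's `sqlite3` module. This is a sketch with assumed sample rows; the slides themselves target SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (A TEXT, B TEXT);
CREATE TABLE T2 (B TEXT);
INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a1','b3'),
                      ('a2','b1'), ('a2','b2'),
                      ('a3','b1'), ('a3','b3');
INSERT INTO T2 VALUES ('b1'), ('b2');
""")

# Q1: there is no divisor row y for which x.A lacks a matching row z in T1.
Q1 = """
SELECT DISTINCT x.A
FROM T1 AS x
WHERE NOT EXISTS (
    SELECT * FROM T2 AS y
    WHERE NOT EXISTS (
        SELECT * FROM T1 AS z
        WHERE z.A = x.A AND z.B = y.B))
"""

q1_result = sorted(row[0] for row in conn.execute(Q1))
print(q1_result)
```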

13 SQL Implementation: Q2
Based on the algebraic definition of the division operator and broken into two steps. The first step stores in T3 the (A, B) pairs that are required but missing from T1; the second step returns the A values of T1 that do not appear in T3:
    SELECT DISTINCT y.A, z.B INTO T3
    FROM T1 AS y, T2 AS z
    WHERE NOT EXISTS
        (SELECT * FROM T1
         WHERE (T1.A = y.A) AND (T1.B = z.B))

    SELECT DISTINCT A
    FROM T1
    WHERE NOT EXISTS
        (SELECT * FROM T3
         WHERE (T3.A = T1.A))
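The two-step approach can be sketched in SQLite through Python's `sqlite3` module. Two assumptions are made here: SQLite has no `SELECT ... INTO`, so `CREATE TABLE T3 AS SELECT` is swapped in, and the second step is written as an anti-join of T1 against T3, matching the algebraic form T1[A] minus the projection of the missing pairs. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (A TEXT, B TEXT);
CREATE TABLE T2 (B TEXT);
INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a1','b3'),
                      ('a2','b1'), ('a2','b2'),
                      ('a3','b1'), ('a3','b3');
INSERT INTO T2 VALUES ('b1'), ('b2');
""")

# Step 1: T3 holds the (A, B) pairs that are required but missing from T1.
conn.execute("""
CREATE TABLE T3 AS
SELECT DISTINCT y.A, z.B
FROM T1 AS y, T2 AS z
WHERE NOT EXISTS (SELECT * FROM T1 WHERE T1.A = y.A AND T1.B = z.B)
""")
t3_rows = sorted(conn.execute("SELECT A, B FROM T3").fetchall())

# Step 2: the quotient is every A of T1 that does not occur in T3.
q2_result = sorted(row[0] for row in conn.execute("""
SELECT DISTINCT A
FROM T1
WHERE NOT EXISTS (SELECT * FROM T3 WHERE T3.A = T1.A)
"""))

print(t3_rows, q2_result)
```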

14 SQL Implementation: Q3
Similar to Q0, with GROUP BY and HAVING replaced by a join:
    SELECT DISTINCT x.A
    FROM T1 AS x
    WHERE (SELECT COUNT(*) FROM T2) =
          (SELECT COUNT(*)
           FROM T1, T2
           WHERE (T1.A = x.A) AND (T1.B = T2.B))
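Q3 can likewise be exercised in SQLite via Python's `sqlite3` module. This is a sketch with assumed sample rows; like Q0, the count comparison assumes T1 and T2 hold no duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (A TEXT, B TEXT);
CREATE TABLE T2 (B TEXT);
INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a1','b3'),
                      ('a2','b1'), ('a2','b2'),
                      ('a3','b1'), ('a3','b3');
INSERT INTO T2 VALUES ('b1'), ('b2');
""")

# Q3: for each candidate x.A, count its rows that join with the divisor
# and compare that count with the size of the divisor.
Q3 = """
SELECT DISTINCT x.A
FROM T1 AS x
WHERE (SELECT COUNT(*) FROM T2) =
      (SELECT COUNT(*)
       FROM T1, T2
       WHERE T1.A = x.A AND T1.B = T2.B)
"""

q3_result = sorted(row[0] for row in conn.execute(Q3))
print(q3_result)
```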

15 Zero Division
The division operator is defined in such a way that T1[A, B] / T2[B] returns all A values of T1 whenever T2[B] is empty, because the for-all condition then holds vacuously. When T2[B] is non-empty but has zero selectivity with respect to T1[A, B] (none of its B values occurs in T1), the definitions above yield an empty result instead. For an empty divisor, an empty set would arguably be a more appropriate answer; this is how Q0 behaves.
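The difference in behaviour for an empty divisor can be observed directly. The sketch below runs a Q0-style counting query and a Q1-style nested NOT EXISTS query against an empty T2 in SQLite; the sample rows are assumptions, and only the empty-divisor case is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (A TEXT, B TEXT);
CREATE TABLE T2 (B TEXT);   -- the divisor is deliberately left empty
INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'),
                      ('a2','b1'),
                      ('a3','b3');
""")

Q0 = """
SELECT A FROM T1
WHERE B IN (SELECT B FROM T2)
GROUP BY A
HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
"""

Q1 = """
SELECT DISTINCT x.A
FROM T1 AS x
WHERE NOT EXISTS (
    SELECT * FROM T2 AS y
    WHERE NOT EXISTS (
        SELECT * FROM T1 AS z
        WHERE z.A = x.A AND z.B = y.B))
"""

q0_empty = sorted(row[0] for row in conn.execute(Q0))
q1_empty = sorted(row[0] for row in conn.execute(Q1))
print("Q0:", q0_empty)
print("Q1:", q1_empty)
```

Q0 filters out every row before grouping, so it returns nothing, while the NOT EXISTS form finds no divisor row to violate the condition and returns every A value.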

16 Experiment
Conduct an experiment with the following settings:
- number of A-values in T1 = ,
- number of B-values in T1 = 100,
- number of B-values in T2 = 20, 40, 60, 80, 100.
Create a script that generates the data samples, implement the four queries (Q0…Q3) and measure their performance (execution time). Present the observations in tabular and graphical form and describe the results.
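One possible shape for such a timing script is sketched below in Python with SQLite. It is not the course's reference solution: the number of A-values (`N_A = 20`) is an arbitrary small assumption, the primary keys are added only to keep the run short, every A value is paired with every B value, and the two-step Q2 is left out for brevity. Timings from SQLite will not match those from SQL Server.

```python
import sqlite3
import time

N_A, N_B = 20, 100   # N_A is an assumed value; N_B follows the slide

QUERIES = {
    "Q0": """SELECT A FROM T1 WHERE B IN (SELECT B FROM T2)
             GROUP BY A HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)""",
    "Q1": """SELECT DISTINCT x.A FROM T1 AS x
             WHERE NOT EXISTS (SELECT * FROM T2 AS y
                 WHERE NOT EXISTS (SELECT * FROM T1 AS z
                     WHERE z.A = x.A AND z.B = y.B))""",
    "Q3": """SELECT DISTINCT x.A FROM T1 AS x
             WHERE (SELECT COUNT(*) FROM T2) =
                   (SELECT COUNT(*) FROM T1, T2
                    WHERE T1.A = x.A AND T1.B = T2.B)""",
}


def build(conn, t2_size):
    """Recreate T1 (every A paired with every B) and T2 (first t2_size B values)."""
    conn.executescript("""
        DROP TABLE IF EXISTS T1;
        DROP TABLE IF EXISTS T2;
        CREATE TABLE T1 (A TEXT, B TEXT, PRIMARY KEY (A, B));
        CREATE TABLE T2 (B TEXT PRIMARY KEY);
    """)
    conn.executemany("INSERT INTO T1 VALUES (?, ?)",
                     [(f"a{i}", f"b{j}")
                      for i in range(1, N_A + 1) for j in range(1, N_B + 1)])
    conn.executemany("INSERT INTO T2 VALUES (?)",
                     [(f"b{j}",) for j in range(1, t2_size + 1)])
    conn.commit()


def time_query(conn, sql):
    """Return (elapsed seconds, number of result rows) for one query run."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)


conn = sqlite3.connect(":memory:")
timings = {}
for t2_size in (20, 40, 60, 80, 100):
    build(conn, t2_size)
    for name, sql in QUERIES.items():
        timings[(name, t2_size)] = time_query(conn, sql)
        elapsed, n_rows = timings[(name, t2_size)]
        print(f"{name} |T2|={t2_size:3d} rows={n_rows:3d} time={elapsed:.4f}s")
```

For a real measurement each query would be run several times and averaged, as single runs are noisy.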

17 Checking Execution Time
Turn SET STATISTICS TIME on (Tools → Options)

18 Generating the data sets (1)
A procedure that fills tables T1 and T2 with data. It assumes that columns A and B in both tables are of type VARCHAR. Its only parameter is the number of records in T2. This version uses explicit transaction handling (BEGIN ... COMMIT ... ROLLBACK TRANSACTION) together with exception handling (TRY ... CATCH ...). This is not required for the procedure to work correctly, but it improves its performance.
    CREATE PROCEDURE [dbo].[FILL_TABLES] @B_COUNT int AS
    BEGIN
      BEGIN TRY
        BEGIN TRANSACTION
        DELETE FROM T1
        DELETE FROM T2
        -- @i and @j are reconstructed names: the original identifiers,
        -- keywords and CAST calls did not survive in the transcript
        DECLARE @i int
        DECLARE @j int
        SET @i = 1
        WHILE @i <= 10000
        BEGIN
          SET @j = 1
          WHILE @j <= 100
          BEGIN
            INSERT INTO T1 VALUES('a' + CAST(@i AS varchar(10)), 'b' + CAST(@j AS varchar(10)))
            SET @j = @j + 1
          END
          SET @i = @i + 1
        END

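19 Generating the data sets (2)
        -- continuation of FILL_TABLES; @j is a reconstructed name, and the
        -- @B_COUNT loop bound and the @@TRANCOUNT check are assumed
        SET @j = 1
        WHILE @j <= @B_COUNT
        BEGIN
          INSERT INTO T2 VALUES('b' + CAST(@j AS varchar(10)))
          SET @j = @j + 1
        END
        COMMIT TRANSACTION
      END TRY
      BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION
      END CATCH
      SET STATISTICS TIME ON
      SET STATISTICS IO ON
    END  -- closes the procedure body (absent from the transcript)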
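For experimenting outside SQL Server, the same fill logic can be sketched with Python's `sqlite3` module. This is an analogue, not a translation of the original procedure: the function name `fill_tables` and the `a_count` and `b_total` parameters are assumptions, and `with conn:` stands in for the explicit BEGIN/COMMIT/ROLLBACK handling by committing on success and rolling back if an exception is raised.

```python
import sqlite3


def fill_tables(conn, b_count, a_count=10000, b_total=100):
    """Refill T1 with every (A, B) pair and T2 with the first b_count B values."""
    with conn:  # a single transaction: commit on success, roll back on error
        conn.execute("DELETE FROM T1")
        conn.execute("DELETE FROM T2")
        conn.executemany(
            "INSERT INTO T1 VALUES (?, ?)",
            ((f"a{i}", f"b{j}")
             for i in range(1, a_count + 1)
             for j in range(1, b_total + 1)),
        )
        conn.executemany(
            "INSERT INTO T2 VALUES (?)",
            ((f"b{j}",) for j in range(1, b_count + 1)),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (A VARCHAR(10), B VARCHAR(10))")
conn.execute("CREATE TABLE T2 (B VARCHAR(10))")

# A reduced a_count keeps this demonstration quick.
fill_tables(conn, b_count=20, a_count=50)
t1_rows = conn.execute("SELECT COUNT(*) FROM T1").fetchone()[0]
t2_rows = conn.execute("SELECT COUNT(*) FROM T2").fetchone()[0]
print(t1_rows, t2_rows)
```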

20 Improving the procedure's performance
Following the discussion in the last class about whether using BEGIN ... COMMIT TRANSACTION makes sense, I measured its effect on the procedure's performance (execution time). I tested two variants of the procedure: with BEGIN ... COMMIT TRANSACTION (variant 1) and without it (variant 2). I ran each variant 10 times and recorded the execution times. The results (mean and standard deviation, in seconds) are given in the table below.

                Variant 1                   Variant 2
                CPU TIME    ELAPSED TIME    CPU TIME    ELAPSED TIME
    AVERAGE     41.1        43.5            107.4       446.3
    STDEV       11.9        12.0            1.6         20.8

Based on these results, using BEGIN ... COMMIT TRANSACTION appears worthwhile: the total execution time of the procedure (ELAPSED TIME) is roughly 10 times shorter.
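The tenfold figure can be checked from the reported averages. The short calculation below uses only the numbers given in the results table; the variable names are illustrative.

```python
# Mean times in seconds, as reported for the two variants.
cpu_time = {"variant 1": 41.1, "variant 2": 107.4}
elapsed_time = {"variant 1": 43.5, "variant 2": 446.3}

# How many times slower the variant without an explicit transaction is.
elapsed_ratio = elapsed_time["variant 2"] / elapsed_time["variant 1"]
cpu_ratio = cpu_time["variant 2"] / cpu_time["variant 1"]

print(f"elapsed time ratio: {elapsed_ratio:.1f}x")
print(f"CPU time ratio:     {cpu_ratio:.1f}x")
```

The elapsed-time ratio comes out at about 10.3 and the CPU-time ratio at about 2.6, so most of the gain is in elapsed time rather than CPU time.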

