Relational Division (Dzielenie relacyjne)
Bazy i hurtownie danych (Databases and Data Warehouses), TWO1, 2009/2010
https://ophelia.cs.put.poznan.pl/webdav/dbdw/students/dbdw-winter_2009-10/
Based on: V.M. Matos, R. Grasser, Assessing performance of the relational division operator. Data Base Management

Relational Division
The basic operators of the relational algebra: union (UNION), difference (MINUS), Cartesian product, projection and selection (SELECT ... FROM ...).
Additional operators added to the relational algebra: join (the most popular in practice), rename (renaming fields), intersection, division.
7/11/2009 Bazy i hurtownie danych

Relational Division
The division operator is less common than select-project-join queries; however, it applies to many common queries:
Find students who have taken all the core courses.
Find customers who have ordered all items from a given line of products.
The division operator can also be employed in data mining algorithms (e.g., generation of association rules).

Informal Definition
The division operator verifies whether a candidate subject is related to each of the values held in the base set. The base set is called the divisor (or denominator, T2[B]), and the table holding the subjects' data is called the dividend (or numerator, T1[A, B]).
The expression T1[A, B]/T2[B] selects the A values from the dividend table T1[A, B] whose B values are a superset of the B values held in the divisor table T2[B].

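The superset reading of the definition can be sketched directly with Python sets. This is only an illustration: the student and course names are hypothetical sample data, and `divide` is a helper name introduced here, not something from the slides.

```python
# Hypothetical sample data: (student, course) pairs play the role of T1[A, B].
T1 = {
    ("ann", "db"), ("ann", "os"), ("ann", "ai"),
    ("bob", "db"),
    ("eve", "db"), ("eve", "os"),
}
# The divisor T2[B]: the set of core courses (also hypothetical).
T2 = {"db", "os"}


def divide(dividend, divisor):
    """Return the A values whose set of B values is a superset of the divisor."""
    b_values = {}
    for a, b in dividend:
        b_values.setdefault(a, set()).add(b)
    # `bs >= divisor` is Python's superset test on sets.
    return {a for a, bs in b_values.items() if bs >= divisor}


result = divide(T1, T2)
print(sorted(result))  # ['ann', 'eve']
```

Here "bob" is dropped because his courses do not cover the whole divisor, while "ann" qualifies even though she has taken an extra course.
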
Formal Definition: Relational Algebra
Assume that the numerator table T1 always consists of two columns, A and B, and that the denominator T2 has only one attribute, B. Then the expression T1[A, B]/T2[B] is semantically equivalent to:
T1[A, B]/T2[B] = T1[A] - ((T1[A] × T2[B]) - T1[A, B])[A]

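The algebraic expression can be traced step by step with Python sets, one line per operator. This is a sketch under assumptions: relations are modelled as sets of tuples, and the function name and sample values are hypothetical.

```python
from itertools import product


def divide_algebra(t1, t2):
    """Evaluate T1[A] - ((T1[A] x T2[B]) - T1[A, B])[A] with sets."""
    t1_a = {a for a, _ in t1}               # projection T1[A]
    all_pairs = set(product(t1_a, t2))      # Cartesian product T1[A] x T2[B]
    missing = all_pairs - set(t1)           # required pairs that T1 lacks
    disqualified = {a for a, _ in missing}  # projection of those pairs on A
    return t1_a - disqualified              # A values with nothing missing


# Hypothetical sample relations.
T1 = {("a1", "b1"), ("a1", "b2"), ("a2", "b1"),
      ("a3", "b1"), ("a3", "b2"), ("a3", "b3")}
T2 = {"b1", "b2"}

quotient = divide_algebra(T1, T2)
print(sorted(quotient))  # ['a1', 'a3']
```

The only missing pair is ("a2", "b2"), so "a2" is the single value subtracted from T1[A].
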
Formal Definition: Tuple-calculus
Using the relational tuple-calculus language, the division operator can be rephrased as follows:
T1[A, B]/T2[B] = { t1[A] | t1 ∈ T1 and for-all t2 (t2 ∈ T2 ⇒ exists t3 (t3 ∈ T1 and (t1[A] = t3[A]) and (t2[B] = t3[B]))) }

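The quantifiers map naturally onto Python's `all` and `any`. The sketch below assumes T1 is a set of (A, B) pairs and T2 a set of plain B values; the function name and data are hypothetical.

```python
def divide_calculus(t1, t2):
    """for-all t2 in T2: exists t3 in T1 with t3[A] = t1[A] and t3[B] = t2[B]."""
    return {
        a1
        for (a1, _) in t1
        if all(                                  # for-all t2 in T2
            any(a3 == a1 and b3 == b2            # exists t3 in T1
                for (a3, b3) in t1)
            for b2 in t2
        )
    }


# Hypothetical sample relations.
T1 = {("a1", "b1"), ("a1", "b2"), ("a2", "b1"),
      ("a3", "b1"), ("a3", "b2"), ("a3", "b3")}
T2 = {"b1", "b2"}

answer = divide_calculus(T1, T2)
print(sorted(answer))  # ['a1', 'a3']
```

Note that `all` over an empty T2 is true for every candidate, so an empty divisor returns every A value in T1.
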
SQL Implementation: Q0
    SELECT A
    FROM T1
    WHERE B IN (SELECT B FROM T2)
    GROUP BY A
    HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)

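Q0 can be tried out on a toy data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for the course's SQL Server, and the table contents are hypothetical. The counting approach relies on T1 and T2 containing no duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; SQLite stands in for SQL Server here.
conn.executescript("""
    CREATE TABLE T1 (A TEXT, B TEXT);
    CREATE TABLE T2 (B TEXT);
    INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a2','b1'),
                          ('a3','b1'), ('a3','b2'), ('a3','b3');
    INSERT INTO T2 VALUES ('b1'), ('b2');
""")

Q0 = """
    SELECT A
    FROM T1
    WHERE B IN (SELECT B FROM T2)
    GROUP BY A
    HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
"""

# Keep only T1 rows that match the divisor, then count matches per A value.
q0_rows = sorted(row[0] for row in conn.execute(Q0))
print(q0_rows)  # ['a1', 'a3']
```
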
SQL Implementation: Q1
The Byzantine approach: based on the formal predicate-calculus definition, modified to fit SQL:
The universal quantifier for-all x (f(x)) is replaced by not exists x (not f(x)).
The implication X ⇒ Y is replaced by (not(X) or Y).
T1[A, B]/T2[B] = { t1[A] | t1 ∈ T1 and not exists t2 (not(not(t2 ∈ T2) or (exists t3 (t3 ∈ T1 and (t1[A] = t3[A]) and (t2[B] = t3[B]))))) }

SQL Implementation: Q1
The previous definition is equivalent (by De Morgan's law) to:
T1[A, B]/T2[B] = { t1[A] | t1 ∈ T1 and not exists t2 ((t2 ∈ T2) and (not exists t3 (t3 ∈ T1 and (t1[A] = t3[A]) and (t2[B] = t3[B])))) }

SQL Implementation: Q1
    SELECT DISTINCT x.A
    FROM T1 AS x
    WHERE NOT EXISTS
        (SELECT *
         FROM T2 AS y
         WHERE NOT EXISTS
             (SELECT *
              FROM T1 AS z
              WHERE (z.A = x.A) AND (z.B = y.B)))

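The double NOT EXISTS form can be checked on the same kind of toy data. As before, this is a sketch that assumes SQLite (via Python's sqlite3) as a stand-in engine and hypothetical table contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; SQLite stands in for SQL Server here.
conn.executescript("""
    CREATE TABLE T1 (A TEXT, B TEXT);
    CREATE TABLE T2 (B TEXT);
    INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a2','b1'),
                          ('a3','b1'), ('a3','b2'), ('a3','b3');
    INSERT INTO T2 VALUES ('b1'), ('b2');
""")

Q1 = """
    SELECT DISTINCT x.A
    FROM T1 AS x
    WHERE NOT EXISTS
        (SELECT * FROM T2 AS y
         WHERE NOT EXISTS
             (SELECT * FROM T1 AS z
              WHERE (z.A = x.A) AND (z.B = y.B)))
"""

# "There is no divisor value y that x.A fails to be paired with."
q1_rows = sorted(row[0] for row in conn.execute(Q1))
print(q1_rows)  # ['a1', 'a3']
```
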
SQL Implementation: Q2
Based on the algebraic definition of the division operator and broken into two steps:
    SELECT DISTINCT y.A, z.B
    INTO T3
    FROM T1 AS y, T2 AS z
    WHERE NOT EXISTS
        (SELECT *
         FROM T1
         WHERE (T1.A = y.A) AND (T1.B = z.B))

    SELECT DISTINCT A
    FROM T1
    WHERE NOT EXISTS
        (SELECT *
         FROM T3
         WHERE (T3.A = T1.A))

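The two steps of Q2 can be sketched as follows. One adaptation is assumed: SQLite has no SELECT ... INTO, so the intermediate table T3 is built with CREATE TEMP TABLE ... AS SELECT instead. The sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; SQLite stands in for SQL Server here.
conn.executescript("""
    CREATE TABLE T1 (A TEXT, B TEXT);
    CREATE TABLE T2 (B TEXT);
    INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a2','b1'),
                          ('a3','b1'), ('a3','b2'), ('a3','b3');
    INSERT INTO T2 VALUES ('b1'), ('b2');
""")

# Step 1: T3 holds the (A, B) pairs that division requires but T1 lacks.
# CREATE TEMP TABLE ... AS replaces T-SQL's SELECT ... INTO (an adaptation).
conn.execute("""
    CREATE TEMP TABLE T3 AS
    SELECT DISTINCT y.A AS A, z.B AS B
    FROM T1 AS y, T2 AS z
    WHERE NOT EXISTS
        (SELECT * FROM T1 WHERE (T1.A = y.A) AND (T1.B = z.B))
""")
t3_rows = sorted(conn.execute("SELECT A, B FROM T3").fetchall())

# Step 2: keep the A values that have no missing pair recorded in T3.
q2_rows = sorted(row[0] for row in conn.execute("""
    SELECT DISTINCT A
    FROM T1
    WHERE NOT EXISTS (SELECT * FROM T3 WHERE (T3.A = T1.A))
"""))
print(t3_rows)  # [('a2', 'b2')]
print(q2_rows)  # ['a1', 'a3']
```
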
SQL Implementation: Q3
Similar to Q0, with GROUP BY and HAVING replaced by a join:
    SELECT DISTINCT x.A
    FROM T1 AS x
    WHERE (SELECT COUNT(*) FROM T2) =
          (SELECT COUNT(*)
           FROM T1, T2
           WHERE (T1.A = x.A) AND (T1.B = T2.B))

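Q3 can be exercised the same way. The sketch again assumes SQLite through Python's sqlite3 and hypothetical data, and like Q0 it relies on the tables holding no duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; SQLite stands in for SQL Server here.
conn.executescript("""
    CREATE TABLE T1 (A TEXT, B TEXT);
    CREATE TABLE T2 (B TEXT);
    INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a2','b1'),
                          ('a3','b1'), ('a3','b2'), ('a3','b3');
    INSERT INTO T2 VALUES ('b1'), ('b2');
""")

Q3 = """
    SELECT DISTINCT x.A
    FROM T1 AS x
    WHERE (SELECT COUNT(*) FROM T2) =
          (SELECT COUNT(*)
           FROM T1, T2
           WHERE (T1.A = x.A) AND (T1.B = T2.B))
"""

# For each candidate x.A, count its rows that join with the divisor
# and compare that count with the size of the divisor.
q3_rows = sorted(row[0] for row in conn.execute(Q3))
print(q3_rows)  # ['a1', 'a3']
```
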
Zero Division
The division operator is defined in such a way that T1[A, B]/T2[B] produces exactly all A values in T1 whenever T2[B] is either empty or has zero selectivity with respect to T1[A, B].
An empty set would be a more appropriate answer; this is how Q0 works.

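The empty-divisor case can be observed by running Q0 and Q1 against an empty T2. This sketch covers only that case; it assumes SQLite via Python's sqlite3 and hypothetical rows in T1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; T2 is deliberately left empty.
conn.executescript("""
    CREATE TABLE T1 (A TEXT, B TEXT);
    CREATE TABLE T2 (B TEXT);
    INSERT INTO T1 VALUES ('a1','b1'), ('a1','b2'), ('a2','b1'), ('a3','b2');
""")

Q0 = """
    SELECT A FROM T1
    WHERE B IN (SELECT B FROM T2)
    GROUP BY A
    HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)
"""
Q1 = """
    SELECT DISTINCT x.A FROM T1 AS x
    WHERE NOT EXISTS
        (SELECT * FROM T2 AS y
         WHERE NOT EXISTS
             (SELECT * FROM T1 AS z
              WHERE (z.A = x.A) AND (z.B = y.B)))
"""

# Q0 filters out every T1 row, so no groups are formed at all.
q0_empty = sorted(row[0] for row in conn.execute(Q0))
# Q1 finds no divisor row that could disqualify a candidate.
q1_empty = sorted(row[0] for row in conn.execute(Q1))
print(q0_empty)  # []
print(q1_empty)  # ['a1', 'a2', 'a3']
```
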
Experiment
Conduct an experiment with the following settings: number of A values in T1 = 10 000; number of B values in T1 = 100; number of B values in T2 = 20, 40, 60, 80, 100.
Create an appropriate script that generates the data samples, implement the four queries (Q0…Q3), and measure their performance (execution time).
Collect the observations in tabular and graphical form and describe the results.

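A possible shape for such a measurement script is sketched below. Several things here are assumptions rather than part of the assignment: SQLite (Python's sqlite3) replaces SQL Server, the sizes are scaled down so the sketch runs quickly, primary keys are declared on both tables, and the two-step Q2 is left out for brevity. The names `fill_tables` and `run_experiment` are hypothetical.

```python
import sqlite3
import time

# Scaled-down stand-ins for 10 000 A values, 100 B values, |T2| = 20..100.
N_A, N_B = 50, 20
DIVISOR_SIZES = [4, 8, 12, 16, 20]

QUERIES = {
    "Q0": """SELECT A FROM T1 WHERE B IN (SELECT B FROM T2)
             GROUP BY A HAVING COUNT(*) = (SELECT COUNT(*) FROM T2)""",
    "Q1": """SELECT DISTINCT x.A FROM T1 AS x WHERE NOT EXISTS
             (SELECT * FROM T2 AS y WHERE NOT EXISTS
              (SELECT * FROM T1 AS z WHERE (z.A = x.A) AND (z.B = y.B)))""",
    "Q3": """SELECT DISTINCT x.A FROM T1 AS x
             WHERE (SELECT COUNT(*) FROM T2) =
                   (SELECT COUNT(*) FROM T1, T2
                    WHERE (T1.A = x.A) AND (T1.B = T2.B))""",
}


def fill_tables(conn, b_count):
    """Refill T1 with every (A, B) pair and T2 with the first b_count B values."""
    with conn:  # one transaction for the whole refill
        conn.execute("DELETE FROM T1")
        conn.execute("DELETE FROM T2")
        conn.executemany(
            "INSERT INTO T1 VALUES (?, ?)",
            ((f"a{a}", f"b{b}")
             for a in range(1, N_A + 1) for b in range(1, N_B + 1)),
        )
        conn.executemany(
            "INSERT INTO T2 VALUES (?)",
            ((f"b{b}",) for b in range(1, b_count + 1)),
        )


def run_experiment():
    """Return (divisor size, query name, row count, seconds) tuples."""
    conn = sqlite3.connect(":memory:")
    # The primary keys are an assumption; the slides do not specify any keys.
    conn.execute("CREATE TABLE T1 (A TEXT, B TEXT, PRIMARY KEY (A, B))")
    conn.execute("CREATE TABLE T2 (B TEXT PRIMARY KEY)")
    results = []
    for b_count in DIVISOR_SIZES:
        fill_tables(conn, b_count)
        for name, sql in QUERIES.items():
            start = time.perf_counter()
            rows = conn.execute(sql).fetchall()
            elapsed = time.perf_counter() - start
            results.append((b_count, name, len(rows), elapsed))
    conn.close()
    return results


results = run_experiment()
for b_count, name, n_rows, elapsed in results:
    print(f"|T2|={b_count:3d}  {name}: {n_rows} rows in {elapsed * 1000:.2f} ms")
```

Because every A value is paired with every B value, each query should return all A values regardless of the divisor size, which gives a simple sanity check before the timings are compared.
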
Checking Execution Time
Turn on SET STATISTICS TIME (Tools → Options).

Generating data sets (1)
A procedure that fills tables T1 and T2 with data. The procedure assumes that columns A and B in both tables are of type VARCHAR. The procedure's only parameter is the number of records in table T2.
This version uses explicit transaction handling (BEGIN ... COMMIT ... ROLLBACK TRANSACTION) together with exception handling (TRY ... CATCH ...). This is not required for the procedure to work correctly, but it improves its performance.
    CREATE PROCEDURE [dbo].[FILL_TABLES] @B_COUNT int
    AS
    BEGIN
      BEGIN TRY
        BEGIN TRANSACTION
        DELETE FROM T1
        DELETE FROM T2
        DECLARE @A int
        DECLARE @B int
        -- Fill T1 with every (A, B) combination: 10000 x 100 rows.
        SET @A = 1
        WHILE @A <= 10000
        BEGIN
          SET @B = 1
          WHILE @B <= 100
          BEGIN
            INSERT INTO T1 VALUES(
              'a' + CAST(@A AS varchar(10)),
              'b' + CAST(@B AS varchar(10)))
            SET @B = @B + 1
          END
          SET @A = @A + 1
        END

Generating data sets (2)
        -- Fill T2 with the first @B_COUNT B values.
        SET @B = 1
        WHILE @B <= @B_COUNT
        BEGIN
          INSERT INTO T2 VALUES('b' + CAST(@B AS varchar(10)))
          SET @B = @B + 1
        END
        COMMIT TRANSACTION
      END TRY
      BEGIN CATCH
        IF @@TRANCOUNT > 0
          ROLLBACK TRANSACTION
      END CATCH
    END

    SET STATISTICS TIME ON
    SET STATISTICS IO ON

Improving the procedure's performance
Following the discussion in the last class on whether using BEGIN ... COMMIT TRANSACTION makes sense, I checked its effect on the procedure's performance (its execution time). I tested two variants of the procedure: with BEGIN ... COMMIT TRANSACTION (variant 1) and without it (variant 2). I ran each variant 10 times and recorded the execution times. The results (mean and standard deviation, in seconds) are given in the table below.

                Variant 1                   Variant 2
                CPU time    Elapsed time    CPU time    Elapsed time
    Average     41.1        43.5            107.4       446.3
    Std. dev.   11.9        12.0            1.6         20.8

Based on these results, it appears worthwhile to use BEGIN ... COMMIT TRANSACTION: the total execution time of the procedure (elapsed time) becomes about 10 times shorter.