Citation preview

Distributed Database Management Systems Query Optimization Dr. Eng. Haiyan HOUSROUM

Query Optimization       

The number of relations in the query The number of operations to be performed The number of predicates applied The size of each relation in the query The order of operations to be performed, The existence of indexes The number of alternatives for performing each individual operation

Query Optimization In a distributed system, there are other factors The fragmentation details for the relations The location of these fragments/tables in the system The speed of communication links connecting the sites in the system.  The overhead associated with sending messages and the overhead associated with the local processing speed increase exponentially as the number of available alternatives increases.  It is generally acceptable to merely try to find a “good” alternative execution plan for a given query, rather than trying to find the “best” alternative.    

Query Optimization      

Bank Database CUSTOMER (CID, CNAME, STREET, CCITY); BRANCH (BNAME, ASSETS, BCITY); ACCOUNT (A#, CID, BNAME, BAL); LOAN (L#, CID, BNAME, AMT); TRANSACTION (TID, CID, A#, Date, AMOUNT);

Computing Selection  



No Index on R and Not Sorted (Sorted): Scan all disk blocks B+Tree index on R ◦ Simple predicates ◦ Complex predicates Hash Index on R

Computing Join   

Nested-Loop Join Nested-Loop Join with Indexes Sort-Merge Join

Query Processing in CS.    

Minimize the query response time. Maximize the parallelism in the system. Maximize the system throughput. Minimize the total resources used (amount of memory, disk space, cache, etc.)

Query Processing in CS.

Query Processing in CS. 



We want to retrieve the name of all customers who have one or more accounts in branches in the city of Edina. Select c.Cname From Customer c, Branch b, Account a Where c.CID = a.CID AND a.Bname = b.Bname AND b.Bcity = ‘Edina’;

Query Processing in CS.   

Query Optimization PJcname (Customer NJN Account)NJN (Account NJN (SLBcity = ‘Edina’(Branch))) PJcname (Customer NJN (Account NJN (SLBcity=‘Edina’ (Branch))))

Query Processing in CS. Unary operator (Uop) is commutative:  Uop1(Uop2(R)) ≡ Uop2(Uop1(R))  For example, SLBname = ‘Main’ (SLAssets > 12000000 (Branch) ≡ SLAssets > 12000000 (SLBname = ‘Main’ ((Branch)) 

Query Processing in CS. Unary operator is idempotent:  Uop((R)) ≡ Uop1(Uop2((R))  For example, SLBname = ‘Main’ AND Assets > 12000000 (Branch) ≡ SLBname = ‘Main’ (SLAssets > 12000000 (Branch) 

Query Processing in CS. Binary operator (Bop) is commutative except for set difference:  R Bop1 S ≡ S Bop1 R  For example, Customer NJN Account ≡ Account NJN Customer 

Query Processing in CS. Binary operator is associative:  R Bop1 (S Bop2 T) ≡ (R Bop1 S) Bop2 T  For example, Customer NJN (Account NJN Branch) ≡ (Customer NJN Account) NJN Branch 

Query Processing in CS. Unary operator is distributive with respect to some binary operations:  Uop(R Bop S) ≡ (Uop(R)) Bop (Uop(S))  For example, SLsal > 50000 (PJCname, sal (Customer) UN PJEname, sal (Employee)) ≡ (SLsal > 50000 (PJCname, sal (Customer)) UN ((SLsal > 50000 (PJEname, sal (Employee))) 

Query Processing in CS. Unary operator can be factored with respect to some binary operation:  (Uop(R)) Bop (Uop(S)) ≡ Uop(R Bop S)  For example, SLsal > 50000 (PJCname, sal (Customer)) UN (SLsal > 50000 (PJEname, sal (Employee))) ≡ SLsal > 50000 (PJCname, sal (Customer)) UN (PJEname, sal (Employee)) 

Query Processing in CS.        

Alt1: (SLBname = ‘Main’ (Account)) NJN Branch Alt2: Branch NJN (SLBname = ‘Main’ (Account)) Alt3: (SLBname = ‘Main’(Branch)) NJN Account Alt4: Account NJN (SLBname = ‘Main’(Branch)) Alt5: SLBname = ‘Main’ (Account NJN Branch) Alt6: SLBname = ‘Main’ (Branch NJN Account) Alt7: (SLBname = ‘Main’ (Account)) NJN (SLBname = ‘Main’(Branch)) Alt8: (SLBname = ‘Main’(Branch))NJN(SLBname = ‘Main’(Account))

Query Processing in CS.

Query Processing in CS.     





There are 500 customers in the bank. On average, each customer has two accounts. There are 100 branches in the bank. There are 10 branches in Edina city. Ten percent of customers have accounts in the branches in Edina. 1000 accounts in the bank, and 50 customers have accounts in the branches in Edina resulting in 100 accounts in that city. t: units of time to process each tuple of each relation in memory.

Query Processing in CS.

Query Processing in DS.

Query Processing in DS.

Query Processing in DS.   

EMP relation is horizontally fragmented based on the value of the LOC attribute. Each employee works at one of three possible locations (LA, NY, or MPLS). We will consider a query that needs to retrieve the name of all employees who make more than $50,000.

Query Processing in DS.    

Global Query: PJEname (SLsal > 50000 (EMP)) LA’s Query: PJEname (SLsal > 50000 (LA_EMP)) Ny’s Query: PJEname (SLsal > 50000 (NY_EMP)) MPLS’s Query: PJEname (SLsal > 50000 (MPLS_EMP))

Query Processing in DS. 

PJEname (SLsal > 50000 (LA_EMP)) UN PJEname (SLsal > 50000 (NY_EMP)) UN PJEname (SLsal > 50000 (MPLS_EMP))

Query Processing in DS. 

PJEname (SLsal > 50000 (LA_EMP)) UN ( PJEname (SLsal > 50000 (NY_EMP)) UN PJEname (SLsal > 50000 (MPLS_EMP)) )

Query Processing in DS. 

SLsal > 50000 ( PJEname (LA_EMP) UN ( PJEname (NY_EMP) UN PJEname (MPLS_EMP) ) )

Query Processing in DS.       

Three relations: A, B, C A JN B JN C Three-site distributed system. A is not fragmented, it is replicated, there are two copies. B is horizontally fragmented into B1, B2, and B3. C is vertically fragmented into C1 and C2. A JN B JN C ≡ A JN (B1 UN B2 UN B3) JN (C1 JN C2)

Query Processing in DS.  

Operation Shipping Data Shipping ◦ OODB



Hybrid Shipping

Query Processing in DS. 

Distributed Query Solution Space Reduction ◦ Apply Select and Project as Soon as Possible

EMP (EmpID, Name, LOC, SAL, DoB, Dept);  Create MPLS_EMPS 

AS SELECT * FROM EMP WHERE LOC = ‘Minneapolis’; 

Create LA_EMPS AS SELECT * FROM EMP WHERE LOC = ‘LA’;



Create NY_EMPS AS SELECT * FROM EMP WHERE LOC = ‘NY’;



Select Name From EMP Where LOC = ‘LA’ and Sal > 30,000.

Query Processing in DS.

Query Processing in DS.

Query Processing in DS. 

Distributed Query Solution Space Reduction ◦ Simplify Operations on Horizontal Fragments ◦ Perform Operations Where Most of the Data Is ◦ Simplify Join Operations ◦ Use Semi-join to Reduce Communication Cost