Query new

Distributed Database Management Systems Query Optimization Dr. Eng. Haiyan HOUSROUM Query Optimization

Views 76 Downloads 0 File size 642KB

Recommend stories

ASM Query

ASM Query ========= 1. Check if ASM is healthy? crsctl check cssd If ASM was never initiated, use the following: /u01/ap

72 2 18KB Read more

Encuesta Query

46 2 2MB Read more

Image Intent Query Labeling: Objective

Image Intent Query Labeling Objective The purpose of this task is to identify whether a given query will be satisfied by

1 0 310KB Read more

Apuntes Power Query

91 27 1MB Read more

Recomendaciones Power Query

0 0 789KB Read more

jBASE Query Language

143 4 729KB Read more

Sap Query - Sq01!10!03

9 0 254KB Read more

New

Table of Contents 1. Introduction 1.1. Purpose 1.2 . Audience, Definitions, Acronyms and Abbreviations 1.2.1. Audience D

124 1 384KB Read more

New

56 1 17MB Read more

New

89 0 129KB Read more

Author / Uploaded
Reham Hankol

Citation preview

Distributed Database Management Systems Query Optimization Dr. Eng. Haiyan HOUSROUM

Query Optimization

The number of relations in the query The number of operations to be performed The number of predicates applied The size of each relation in the query The order of operations to be performed, The existence of indexes The number of alternatives for performing each individual operation

Query Optimization In a distributed system, there are other factors The fragmentation details for the relations The location of these fragments/tables in the system The speed of communication links connecting the sites in the system. The overhead associated with sending messages and the overhead associated with the local processing speed increase exponentially as the number of available alternatives increases. It is generally acceptable to merely try to find a “good” alternative execution plan for a given query, rather than trying to find the “best” alternative.

Query Optimization

Bank Database CUSTOMER (CID, CNAME, STREET, CCITY); BRANCH (BNAME, ASSETS, BCITY); ACCOUNT (A#, CID, BNAME, BAL); LOAN (L#, CID, BNAME, AMT); TRANSACTION (TID, CID, A#, Date, AMOUNT);

Computing Selection

No Index on R and Not Sorted (Sorted): Scan all disk blocks B+Tree index on R ◦ Simple predicates ◦ Complex predicates Hash Index on R

Computing Join

Nested-Loop Join Nested-Loop Join with Indexes Sort-Merge Join

Query Processing in CS.

Minimize the query response time. Maximize the parallelism in the system. Maximize the system throughput. Minimize the total resources used (amount of memory, disk space, cache, etc.)

Query Processing in CS.

Query Processing in CS.

We want to retrieve the name of all customers who have one or more accounts in branches in the city of Edina. Select c.Cname From Customer c, Branch b, Account a Where c.CID = a.CID AND a.Bname = b.Bname AND b.Bcity = ‘Edina’;

Query Processing in CS.

Query Optimization PJcname (Customer NJN Account)NJN (Account NJN (SLBcity = ‘Edina’(Branch))) PJcname (Customer NJN (Account NJN (SLBcity=‘Edina’ (Branch))))

Query Processing in CS. Unary operator (Uop) is commutative: Uop1(Uop2(R)) ≡ Uop2(Uop1(R)) For example, SLBname = ‘Main’ (SLAssets > 12000000 (Branch) ≡ SLAssets > 12000000 (SLBname = ‘Main’ ((Branch))

Query Processing in CS. Unary operator is idempotent: Uop((R)) ≡ Uop1(Uop2((R)) For example, SLBname = ‘Main’ AND Assets > 12000000 (Branch) ≡ SLBname = ‘Main’ (SLAssets > 12000000 (Branch)

Query Processing in CS. Binary operator (Bop) is commutative except for set difference: R Bop1 S ≡ S Bop1 R For example, Customer NJN Account ≡ Account NJN Customer

Query Processing in CS. Binary operator is associative: R Bop1 (S Bop2 T) ≡ (R Bop1 S) Bop2 T For example, Customer NJN (Account NJN Branch) ≡ (Customer NJN Account) NJN Branch

Query Processing in CS. Unary operator is distributive with respect to some binary operations: Uop(R Bop S) ≡ (Uop(R)) Bop (Uop(S)) For example, SLsal > 50000 (PJCname, sal (Customer) UN PJEname, sal (Employee)) ≡ (SLsal > 50000 (PJCname, sal (Customer)) UN ((SLsal > 50000 (PJEname, sal (Employee)))

Query Processing in CS. Unary operator can be factored with respect to some binary operation: (Uop(R)) Bop (Uop(S)) ≡ Uop(R Bop S) For example, SLsal > 50000 (PJCname, sal (Customer)) UN (SLsal > 50000 (PJEname, sal (Employee))) ≡ SLsal > 50000 (PJCname, sal (Customer)) UN (PJEname, sal (Employee))

Query Processing in CS.

Alt1: (SLBname = ‘Main’ (Account)) NJN Branch Alt2: Branch NJN (SLBname = ‘Main’ (Account)) Alt3: (SLBname = ‘Main’(Branch)) NJN Account Alt4: Account NJN (SLBname = ‘Main’(Branch)) Alt5: SLBname = ‘Main’ (Account NJN Branch) Alt6: SLBname = ‘Main’ (Branch NJN Account) Alt7: (SLBname = ‘Main’ (Account)) NJN (SLBname = ‘Main’(Branch)) Alt8: (SLBname = ‘Main’(Branch))NJN(SLBname = ‘Main’(Account))

Query Processing in CS.

Query Processing in CS.

There are 500 customers in the bank. On average, each customer has two accounts. There are 100 branches in the bank. There are 10 branches in Edina city. Ten percent of customers have accounts in the branches in Edina. 1000 accounts in the bank, and 50 customers have accounts in the branches in Edina resulting in 100 accounts in that city. t: units of time to process each tuple of each relation in memory.

Query Processing in CS.

Query Processing in DS.

Query Processing in DS.

Query Processing in DS.

EMP relation is horizontally fragmented based on the value of the LOC attribute. Each employee works at one of three possible locations (LA, NY, or MPLS). We will consider a query that needs to retrieve the name of all employees who make more than $50,000.

Query Processing in DS.

Global Query: PJEname (SLsal > 50000 (EMP)) LA’s Query: PJEname (SLsal > 50000 (LA_EMP)) Ny’s Query: PJEname (SLsal > 50000 (NY_EMP)) MPLS’s Query: PJEname (SLsal > 50000 (MPLS_EMP))

Query Processing in DS.

PJEname (SLsal > 50000 (LA_EMP)) UN PJEname (SLsal > 50000 (NY_EMP)) UN PJEname (SLsal > 50000 (MPLS_EMP))

Query Processing in DS.

PJEname (SLsal > 50000 (LA_EMP)) UN ( PJEname (SLsal > 50000 (NY_EMP)) UN PJEname (SLsal > 50000 (MPLS_EMP)) )

Query Processing in DS.

SLsal > 50000 ( PJEname (LA_EMP) UN ( PJEname (NY_EMP) UN PJEname (MPLS_EMP) ) )

Query Processing in DS.

Three relations: A, B, C A JN B JN C Three-site distributed system. A is not fragmented, it is replicated, there are two copies. B is horizontally fragmented into B1, B2, and B3. C is vertically fragmented into C1 and C2. A JN B JN C ≡ A JN (B1 UN B2 UN B3) JN (C1 JN C2)

Query Processing in DS.

Operation Shipping Data Shipping ◦ OODB

Hybrid Shipping

Query Processing in DS.

Distributed Query Solution Space Reduction ◦ Apply Select and Project as Soon as Possible

EMP (EmpID, Name, LOC, SAL, DoB, Dept); Create MPLS_EMPS

AS SELECT * FROM EMP WHERE LOC = ‘Minneapolis’;

Create LA_EMPS AS SELECT * FROM EMP WHERE LOC = ‘LA’;

Create NY_EMPS AS SELECT * FROM EMP WHERE LOC = ‘NY’;

Select Name From EMP Where LOC = ‘LA’ and Sal > 30,000.

Query Processing in DS.

Query Processing in DS.

Query Processing in DS.

Distributed Query Solution Space Reduction ◦ Simplify Operations on Horizontal Fragments ◦ Perform Operations Where Most of the Data Is ◦ Simplify Join Operations ◦ Use Semi-join to Reduce Communication Cost