Merge Join Algorithm

The merge joins are used for performing natural joins and equi-joins for given relations r and s. We use an algorithm for performing the merge join, known as the Merge Join algorithm. It is also known as a sort-merge-join algorithm.

Merge Join Algorithm

The merge join algorithm is given below:

pr = address of first tuple of relation r;
ps = address of first of relation s;
while (ps!=null && pr!=null) do begin
	 ts = tuple to which ps points;
	 Ss = {ts};
	set ps to point the next tuple of relation s;
	done = false;
	while (!done && ps!=null) do begin
		ts? = tuple to which ps points;
		 if (ts?[JoinAttrs] = ts[JoinAttrs])
			Ss = Ss U {ts?};
			set ps to point the next tuple of relation s;
		    done = true;
   tr = tuple to which pr points;
   while (pr !=null && tr [JoinAttrs] < ts[JoinAttrs]) do begin
	for each ts in Ss do begin
	      add ts   ⋈    tr to result;
	set pr to point nest tuple of r;
          tr = tuple to which pr points;

In the algorithm, there are various terms used:

JoinAttrs: It denotes the attributes in the intersection of r ꓵ s.

r ꓵ s: The r ꓵ s refers to those attributes which are common in relations r and s.

ts ⋈ tr: A concatenated expression of the attributes of ts and tr tuples. It is further followed by projecting out repeated attributes.

ts and tr: These are two tuples having the same value of JoinAttrs.

Ss: It reads those join attributes of a group of tuples of a relation which are having the same values.

In the merge join algorithm, it associates each relation with a pointer. Initially, the pointer points to the first tuple of the relation and then moves towards the next one as soon the algorithm proceeds. Also, the algorithm needs that each set of tuples Ss fits in the memory even if the size of the relation s is large. However, if for some attribute values, Ss seem larger than the available memory size, we can perform block nested-loop join for it. Somehow if the given input relations r and s are not sorted on the join attributes or anyone is unsorted, we need to sort them before applying the merge join algorithm.

Cost Analysis of Merge Join Algorithm

If the relations are sorted and tuples having the same value on the join attributes are placed consecutively. Then we need to read each tuple only once, and thus the block will also be read for once. Thus,

Number of block transfers = br + bs

Also, in both files, the number of block transfers is equal.

Number of Disk seeks = [br/bb] + [bs/bb]

Here, br and bs are the numbers of blocks of the given relations r and s. The term bb means that we are assuming that bb buffer blocks are allocated to both relations. But, we know that data transfer is less expensive than disk seeks, so we should allocate multiple blocks to each given relation. Consequently, it will provide extra memory space too.

Hybrid Merge Join Algorithm

The Hybrid merge join is different from the merge join. In merge join operation, we saw that it is a must to sort the given relations before applying the merge join technique. However, both join attributes consist of secondary indices, then also we can perform a variation of the merge join operations on unsorted tuples too. For doing so, the applied merge join algorithm will scan the records through the indices, which will enable to retrieve the records in a sorted manner. Thus, such variation of the merge join operations leads to a significant drawback, i.e.:

  • It is possible that the records might be placed in different file blocks. It means they might be scattered In several blocks of files. So, for accessing each tuple, we also need to access the particular file block, and it is a costly step.

For preventing ourselves from such expensive access, we use a new technique which is known as 'Hybrid Merge Join' technique. The hybrid merge join operation combines the indices with merge join.

To understand the hybrid merge join operation, let's take an example:

Consider that we have two relations from which one of the relations is sorted, and the other one is unsorted. But the relation which is not sorted contains a secondary B+-tree index on the join attributes.

In such a case, initially, the hybrid merge join process will merge or combine the sorted relation with the leaf entries of the secondary B+-tree index. Then the result file will carry tuples from sorted relation and address for tuples of the unsorted relation. Further, the result file is sorted again on the addresses of unsorted relation's tuples to enable efficient retrieval of the corresponding tuples to complete the join in the sequence of physical storage. Such type of method is known as the Hybrid Merge Join method.

Contact US

Email:[email protected]

Merge Join Algorithm