ANIRUDDHA ADAK

Posted on May 27

Demystifying Relational Algebra in DBMS: Operators Explained (with SQL Examples)

#dbms #sql #database #datascience

Database Management Systems (DBMS) are at the heart of modern data storage and retrieval. At a fundamental level, behind the user-friendly interfaces and complex SQL queries, lies a powerful theoretical concept: Relational Algebra.

What is Relational Algebra?

Relational Algebra is a procedural query language that defines the set of operations on relations (tables). It forms the theoretical foundation for SQL (Structured Query Language). Essentially, it allows us to describe how to get a desired result from a database.

Think of it like this:

It indicates the algebraic relationships between tables.
It takes one or two existing relations (tables) as input.
It then produces a new relation (table) as an output, representing the result of the operation.

For instance, if you have two tables, Table 1 and Table 2, Relational Algebra operations can combine them into a single new Output Table containing combined data.

Relational Algebra operators are broadly categorized into two main types:

Basic Operators
Extended Operators

Let's dive into each of these with explanations and practical SQL demonstrations.

1. Basic Operators

These are the fundamental operations that form the bedrock of relational algebra.

A) Union (∪)

The Union operator combines all unique rows from two or more relations (tables) into a single new relation.

Key Characteristics:

Combination: It's the combination of rows from two relations.
Schema Compatibility: Both tables must have the same number of columns, and corresponding columns must have compatible data types.
Duplicate Elimination: By default, the UNION operator ignores duplicate records. If a record exists in both input tables, it appears only once in the result.

Example:
Consider two example tables:

Table 1:

ROLL  NAME
---   ---
101   akhil
102   ankush
103   vivek
104   vikash

Table 2:

ROLL  NAME
---   ---
103   vivek
105   vanshika
106   sourabh

If we perform Table1 UNION Table2, the result would be:

SQL Query:

SELECT * FROM t1
UNION
SELECT * FROM t2;

Output:

ROLL  NAME
---   ---
101   akhil
102   ankush
103   vivek
104   vikash
105   vanshika
106   sourabh

Notice that 103 vivek appears only once, even though it's present in both t1 and t2.

To include duplicate records, you can use UNION ALL:

SQL Query:

SELECT * FROM t1
UNION ALL
SELECT * FROM t2;

Output:

ROLL  NAME
---   ---
101   akhil
102   ankush
103   vivek
104   vikash
103   vivek
105   vanshika
106   sourabh

B) Intersection (∩)

The Intersection operator returns only those unique rows that are present in both relations.

Key Characteristics:

Commonality: It identifies records that are common to all input tables.
Schema Compatibility: Similar to Union, input tables must have compatible schemas.
Duplicate Elimination: It naturally eliminates duplicates.

Example:
Using the same example tables:

SQL Query:

SELECT * FROM t1
INTERSECT
SELECT * FROM t2;

Output:

ROLL  NAME
---   ---
103   vivek

This is because 103 vivek is the only record common to both t1 and t2.

C) Difference (-)

The Difference operator (also known as MINUS in Oracle SQL or EXCEPT in SQL Server/PostgreSQL) returns rows that are present in the first relation but not in the second relation.

Key Characteristics:

Exclusion: It effectively "subtracts" the rows of one table from another.
Order Matters: Table1 - Table2 will yield a different result than Table2 - Table1.
Schema Compatibility: Input tables must have compatible schemas.

Example:
Using the same example tables:

SQL Query (Table1 MINUS Table2):

SELECT * FROM t1
MINUS
SELECT * FROM t2;

Output:

ROLL  NAME
---   ---
101   akhil
102   ankush
104   vikash

These are the records present in t1 but not in t2.

SQL Query (Table2 MINUS Table1):

SELECT * FROM t2
MINUS
SELECT * FROM t1;

Output:

ROLL  NAME
---   ---
105   vanshika
106   sourabh

These are the records present in t2 but not in t1.

D) Cartesian Product (× or CROSS JOIN)

The Cartesian Product (also known as Cross Join) combines every row from the first relation with every row from the second relation.

Key Characteristics:

Multiplication of Rows: If Table 1 has m rows and Table 2 has n rows, the result will have m * n rows.
Combination of Columns: The resulting table will include all columns from both input tables.
No Common Attributes Required: Unlike joins, a common column is not necessary.

Example:
Let's consider Table 1 with 4 rows and Table 2 with 3 rows:

Table 1:

ROLL  NAME
---   ---
101   akhil
102   ankush
103   vivek
104   vikash

Table 2:

ROLL  NAME
---   ---
103   vivek
105   vanshika
106   sourabh

SQL Query:

SELECT * FROM t1
CROSS JOIN t2;

Output (12 rows - 4 * 3):

ROLL  NAME    ROLL  NAME
---   ---     ---   ---
101   akhil   103   vivek
101   akhil   105   vanshika
101   akhil   106   sourabh
102   ankush  103   vivek
102   ankush  105   vanshika
102   ankush  106   sourabh
103   vivek   103   vivek
103   vivek   105   vanshika
103   vivek   106   sourabh
104   vikash  103   vivek
104   vikash  105   vanshika
104   vikash  106   sourabh

This operation creates a comprehensive, but potentially very large, combination of all possible row pairings.

2. Extended Operators

These operators provide more specialized functionalities, often built upon the basic operations.

A) Selection (σ)

The Selection operator is used to filter tuples (rows) from a relation based on a specified condition.

Key Characteristics:

Row Filtering: It selects records (rows) based on conditions.
Horizontal Subset: It produces a horizontal subset of the original relation.

Example:
From our Table 1 example, if we want to select the NAME of the person with ROLL 103:

SQL Query:

SELECT name FROM t1 WHERE roll = 103;

Output:

NAME
---
vivek

This is a direct application of the WHERE clause in SQL.

B) Projection (π)

The Projection operator is used to select specific attributes (columns) from a relation.

Key Characteristics:

Column Filtering: It projects (selects) specific columns.
Vertical Subset: It produces a vertical subset of the original relation.
Duplicate Elimination: It generally removes duplicate rows from the projected result (though SQL's SELECT by default includes duplicates unless DISTINCT is used).

Example:
From our Table 2 example, if we only want to see the NAME column:

SQL Query:

SELECT name FROM t2;

Output:

NAME
---
vivek
vanshika
sourabh

This demonstrates how Projection focuses on particular columns.

C) Division (/)

The Division operator is conceptually one of the more complex relational algebra operations. It aims to identify tuples (rows) in the first relation that are associated with every tuple in the second relation. It's often used for "all of" type queries (e.g., "Find students who have taken all courses").

Key Characteristics:

"All Of" Queries: It helps answer questions where an entity must be related to every instance of another entity.
Indirect SQL Implementation: There isn't a direct DIVIDE BY operator in standard SQL. Implementing true relational algebra division in SQL typically involves more complex combinations of operators (e.g., NOT EXISTS with subqueries, GROUP BY and HAVING COUNT(DISTINCT...)).

Example:
For instance, if you were to look for records in Table 1 where the roll also exists in Table 2, the SQL might look like this:

Table 1:

ROLL  NAME
---   ---
101   akhil
102   ankush
103   vivek
104   vikash

Table 2:

ROLL  NAME
---   ---
103   vivek
105   vanshika
106   sourabh

SQL Query:

SELECT t1.roll, t1.name FROM t1 WHERE t1.roll IN (SELECT roll FROM t2);

Output:

ROLL  NAME
---   ---
103   vivek

This SQL effectively performs an intersection on the roll columns and returns the corresponding records from Table 1. While related to logical set operations, this is not the full theoretical relational algebra division.

D) Joining (⋈)

Joining allows us to combine rows from two or more relations based on a common attribute (column) between them. This is one of the most frequently used and vital operations in relational databases.

Key Characteristics:

Combination based on Commonality: It combines data from multiple tables where there's a logical connection.
Common Attribute Required: Joining always needs a common attribute (or set of attributes) to link two tables.
Various Types: There are different types of joins (Inner Join, Left Join, Right Join, Full Outer Join, Self Join) depending on how you want to handle matching and non-matching rows.

Example:
If Table 1 (students) has Roll and Name, and a Courses table has Roll and CourseName, you can join them on Roll to see which student took which course.

Joining is a fundamental and extensive topic in SQL, with various types and complexities. For an in-depth understanding, it's recommended to explore dedicated resources on SQL Joins.

Conclusion

Relational Algebra is an essential conceptual tool for understanding how databases operate and how data can be manipulated. While you might directly use SQL in your daily work, comprehending the underlying principles of these operators helps you write more efficient, logical, and powerful queries. Mastering these operations is a significant step towards becoming proficient in database management.

DEV Community

Demystifying Relational Algebra in DBMS: Operators Explained (with SQL Examples)

What is Relational Algebra?

1. Basic Operators

A) Union (∪)

B) Intersection (∩)

C) Difference (-)

D) Cartesian Product (× or CROSS JOIN)

2. Extended Operators

A) Selection (σ)

B) Projection (π)

C) Division (/)

D) Joining (⋈)

Conclusion

Top comments (0)