This will cover core SQL concepts at a high level that should be learned after getting a rundown of the basics from A Precis of SQL and perhaps PostgreSQL which both cover realistic use cases.

Table of Contents

  1. Relationships
    1. [[#Relationships#Cascading|Cascading]]
  2. Normal Forms
  3. Sublanguages
    1. [[#Sublanguages#Summary|Summary]]
    2. [[#Sublanguages#DDL - Data Definition Language|DDL - Data Definition Language]]
    3. [[#Sublanguages#DML - Data Manipulation Language|DML - Data Manipulation Language]]
    4. [[#Sublanguages#DQL - Data Query Language|DQL - Data Query Language]]
  4. Subqueries

Relationships

1:1 (one to one) - Like a person and a passport
1:M (one to many) - Like a person and their orders
M:N (many to many) - Like students enrolled in classes

A primary key uniquely identifies a row in the table. A foreign key references the primary key of another table. The table with the primary key being referenced is considered the parent table, whereas the table with a foreign key is the child table because it depends on the existence of an entry in the parent table.

Cascading

Understanding the parent-child relationship is crucial for understanding this. As we said, this relationship forms a dependency, so that if the parent does an action like deleting an entry, we’d like to respond to it (cascade) to avoid orphaning an entry in the child table.

In other words, the cascade happens in one direction only: parent action → child reaction. Never the reverse.

The two actions are ON DELETE and ON UPDATE. The reactions are CASCADE to apply the same action as the parent, SET NULL, SET DEFAULT, and RESTRICT/NO ACTION to block.

E.g. ON DELETE CASCADE - If parent is deleted, delete this entry. If parent is updated, update this entry.

Normal Forms

1NF - Eliminates duplicate records

  • All columns must contain atomic values—i.e. a Phone Numbers column should not contain multiple phone numbers in one row.
  • No duplicate rows.
  • No duplicate column names.
    2NF - Eliminates partial dependency
  • Every non-key column must fully depend on the entire primary key.
  • Emphasis on the entire key, not parts of it (when composite).
    3NF - Eliminates transitive dependency
  • Columns shouldn’t depend on other columns, only the key.
  • E.g. if you need to get advisor_room using advisor found in student_id, you should probably split into two separate tables.
    • Otherwise, updating advisor_room will require finding all entries and updating them accordingly.
      BCNF (3.5NF) - Every determinant must be a superkey

Sublanguages

Summary

DDL - Data Definition Language

  • Defines the data structure/schema
  • ALTER TABLE users ADD COLUMN nickname TEXT;
    DML - Data Manipulation Language
  • Insert, update, delete records
  • INSERT INTO users (email) VALUES ('a@b.com');
    DCL - Data Control Language
  • Grants or revokes permissions to certain roles
  • GRANT SELECT ON orders TO analytics_role;
    DQL - Data Query Language
  • For reading information, often with filtering or aggregation
  • SELECT date(created_at) FROM users;
    TCL - Transaction Control Language
  • Defines boundaries for concurrent operations; in other words bundling operations so they succeed/fail together, like a funds transfer
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

DDL - Data Definition Language

Assuming the caller has permissions…

  • CREATE
    • Can be used to create objects like a database, user, table, index, trigger, function, view
  • DROP
    • Destroy anything made with CREATE
  • ALTER
    • Add, drop, or modify characteristics of an object (data types, columns, constraints)
  • RENAME
  • TRUNCATE
    • Remove all data while preserving shape/structure of table
  • COMMENT

Examples

CREATE DATABASE IF NOT EXISTS IAM;
 
CREATE TABLE IF NOT EXISTS permissions (
        id BIGINT PRIMARY KEY,
    categoryId BIGINT NOT NULL,
    name VARCHAR(30) NOT NULL UNIQUE,
    INDEX(categoryId, name)
);
 
ALTER TABLE permissions ADD CONSTRAINT fk_permissions_category_id FOREIGN KEY (categoryId) REFERENCES permission_categories(id);

About constraints

  • NOT NULL
  • UNIQUE - no duplicates
  • PRIMARY KEY - implies UNIQUE, NOT NULL, and creates an index
  • FOREIGN KEY - links the columns to another tables primary key
  • CHECK - validates any condition, similar to WHERE
  • DEFAULT - default value when none is provided
  • ON UPDATE/ON DELETE - controls cascading behavior for foreign key changes

Constraints can be defined at the column-level AND table-level.

DML - Data Manipulation Language

For changing existing data or schemata…

  • INSERT
    • Format is INSERT INTO <table> (col1, col2...) VALUES (val1, val2...), ()...
    • e.g. INSERT INTO roles (id, name) VALUES (1, 'ADMIN'), (2, 'OWNER'), (3, 'EDITOR'), (4, 'VIEWER');
  • UPDATE
    • Format is UPDATE <table> SET col_val='new_val' [WHERE condition]
    • e.g. UPDATE permissions SET categoryId=1 WHERE name='PAYROLL';
  • DELETE
    • Format is DELETE FROM <table> [WHERE condition]
    • DELETE FROM roles WHERE name='VIEWER';

DCL - Data Control Language

Regarding rights and permissions of users on database objects.

  • GRANT
  • REVOKE
    The format for these commands is <GRANT|REVOKE> PRIVILEGES ON object FROM user, with the following privileges:
  • SELECT
  • INSERT
  • DELETE
  • UPDATE
  • INDEX
  • CREATE
  • ALTER
  • DROP
  • ALL - All permissions except grant
  • GRANT - Allows GRANT statements
    E.g.
REVOKE SELECT, INSERT, UPDATE, DELETE ON posts FROM 'community.manager';

TCL - Transaction Control Language

A sequence of one or more SQL operations treated as a single unit of work, the point is for them to complete in their entirety (commit) or have no effect whatsoever, not in between. For example, a bank transaction—debit account A, credit account B.

  • START TRANSACTION/BEGIN
  • COMMIT - Makes all pending changes in a transaction permanent by saving it
  • ROLLBACK - Restores data to the last save point, unless no save point exists then the last committed state
  • SAVEPOINT - An intermediate point within a transaction that can be rolled back

E.g.

START TRANSACTION;
UPDATE bankaccounts SET funds = funds - 50 WHERE account_no = ACC1 AND funds > 50;
UPDATE bankaccounts SET funds = funds + 50 WHERE account_no = ACC2;
COMMIT;

DQL - Data Query Language

For searching, aggregating, joining, filtering, and grouping existing data…

  • SELECT
    • That’s it.

The format for this is somewhat complicated at first, but generally follows this structure. Learn this in more detail at A Precis of SQL.

SELECT [ALL | DISTINCT]
    select_expr [, select_expr] ...
    [into_option]
    [FROM table_ref]
    [WHERE where_condition]
    [GROUP BY {col_name | expr | position}]
    [HAVING having_condition]
    [ORDER BY {col_name | expr | position}]
        [ASC | DESC]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}];

Subqueries

Convenient and clear, but almost always outperformed by a join. Can be placed in three main places…

  • WHERE - Nested Query
    • Placed in the conditional portion of the outer query.
    • e.g. SELECT a.id, a.name, b.evalName, b.mark FROM students a, evals b WHERE a.id = b.studentId AND b.evalName = 'quiz 1' AND b.mark > (SELECT mark FROM evals WHERE evalName = 'quiz 1' AND studentId = 2);
  • FROM - Inline Query
    • Great for essentially creating a temporary table.
    • e.g. SELECT a.name, b.evalName, b.mark FROM students a, (SELECT studentId, evalName, mark FROM evals WHERE mark > 90) b WHERE a.id = b.studentId;
  • SELECT - Inner Query
    • Great for creating a temporary column.
    • e.g. SELECT a.id, a.name, (SELECT AVG(mark) FROM evals WHERE studentId = a.id GROUP BY studentId) avg FROM students a;i

Joins

  • Inner Join — return rows where both tables match on the join condition.
  • Left Join — return all left-table rows + matching right-table rows; unmatched right becomes NULL.
  • Right Join — return all right-table rows + matching left-table rows; unmatched left becomes NULL.
  • Full Outer Join — return all rows from both tables; non-matching sides filled with NULLs.
  • Theta Join — join using any comparison operator ( <, >, <=, >=, <>, not just = ).
  • Cross Join / Cartesian Join — pair every row in table A with every row in table B (no condition).
  • Self Join — join a table to itself to relate rows within the same table (e.g., employees to managers).

Set Operations

Combines the result of two queries, provided that the order, data, and number of columns between them are the same.

  • UNION — Like a set union, combines the results (EXCLUDING duplicates)
  • UNION ALL — Like a set union, combines the results (INCLUDING duplicates)
  • INTERSECT — Like a set intersection, only includes COMMON rows between the two
  • MINUS/EXCLUDE — Like a set difference, removes B rows found in A.

Indexes

The go-to solution for optimization. Like using a bookmark to quickly find a page, it reduces the amount of data that needs to be scanned.

While it improves reading/query performance, there is an overhead on write operations. In your average database (e.g. a weather data db updated once a month), reading operations tend to dominate writing operations which is why this becomes a fairly important concept. Contrast this to a financial ticker or event logging database where constant writes on a database could potentially blow up.

Rule of thumb: Try indexing columns you often find in a WHERE, JOIN, ORDER BY, or GROUP BY clause.

Two types:

  • Clustered: Rows are stored in order of the index key, limit 1 clustered index.
  • Non-clustered: The index key contains a sorted order of the data, but the actual rows are stored somewhere else from the index.

Database Objects

SEQUENCE
Used to create custom sequences, possibly for use as a primary key.

CREATE SEQUENCE example_1
AS INT
START WITH 10
INCREMENT BY 10; 

TRIGGER
Runs in response to an event on the database.

CREATE TRIGGER [trigger_name] 
    [BEFORE | AFTER]  
    {INSERT | UPDATE | DELETE}  
    ON [table_name]  
    [FOR EACH ROW]  
    [trigger_body] 

PROCEDURE
A set of statements that are named and stored to be executed at a future point in time.

CREATE PROCEDURE UpdateEmployeeSalaries(IN PercentageIncrease DECIMAL(5,2))
BEGIN
    -- Updating the salaries
    UPDATE Employees
    SET Salary = Salary * (1 + PercentageIncrease / 100);
 
    -- Logging the update
    INSERT INTO SalaryUpdateLog (UpdateDate, PercentageIncrease)
    VALUES (NOW(), PercentageIncrease);
END

FUNCTION
The same as a procedure except that it returns a result.

CREATE FUNCTION get_balance(acc_no INT)
RETURNS INT
DETERMINISTIC
BEGIN
   DECLARE acc_bal INT;
   
   SELECT order_total 
   INTO acc_bal 
   FROM sample2.orders 
   WHERE customer_id = acc_no 
   LIMIT 1;
      
   RETURN acc_bal;
END