Introduction to SQL
Notes from Nick Carchedi's SQL Course
Most of the content (and all images if not specified differently) are taken from Introduction to SQL by Nick Carchedi.
SQL
- SQL stands for Structured Query Language
- It is a language for interacting with data stored in something called a relational database.
- A relational database can be thought of as a collection of tables.
- A table is just a set of rows (aka
records
) and columns (calledfields
), like a spreadsheet, which represents exactly one type of entity. - For example, a table might represent employees in a company or purchases made, but not both.
Selecting columns
Assume we have a table called people
id | name | birthdate | deathdate |
---|---|---|---|
1 | 50 Cent | 1975-07-06 | null |
2 | A. Michael Baldwin | 1963-04-04 | null |
… showing 2 out of 8397 rows
To select the name column we can use the following command:
SELECT name FROM people;
SELECT and FROM are called keywords and by convention they are written in uppercase (even though SQL is not case sensitive)
Select multiple columns
SELECT name, birthdate
FROM people;
Select all columns
SELECT *
FROM people;
Limit the number of returned results
SELECT *
FROM people
LIMIT 10;
Select all the unique values from a column
SELECT DISTINCT name
FROM people;
Count the number of rows
SELECT COUNT(*)
FROM people;
Count the number of non-missing values in a column
SELECT COUNT(birthdate)
FROM people;
Count the number of distinct values in a column
SELECT COUNT(DISTINCT birthdate)
FROM people;
Filtering rows
WHERE
allows you to filter based on both text and numeric values in a table by using the following operators:
=
equal<>
not equal<
less than>
greater than<=
less than or equal to>=
greater than or equal to
WHERE
always comes after the FROM
statement
Assume we have a tbale called films with the following columns:
| id | title| release_year |country | duration | language | certification | budget | |—|—|—|—|—|—|—|—|
…Showing 0 out of 4986 rows
Return all rows satisfying a specific condition
SELECT title
FROM films
WHERE title = 'Metropolis';
In PostgreSQL, you must use single quotes with WHERE
SELECT title
FROM films
WHERE release_year > 2000;
Return all rows satisfying a combination of multiple conditions
SELECT title
FROM films
WHERE release_year > 1990
AND release_year < 2010;
SELECT title
FROM films
WHERE release_year = 1990
OR release_year = 2010;
SELECT title
FROM films
WHERE (release_year = 1994 OR release_year = 1995)
AND (certification = 'PG' OR certification = 'R');
SELECT title
FROM films
WHERE release_year BETWEEN 1994 AND 2000;
is equivalent to
SELECT title
FROM films
WHERE release_year >= 1994
AND release_year <= 2000;
SELECT title
FROM kids
WHERE release_year IN (1956, 1990, 2000, 2010);
NULL
represents a missing or unknown value. Check for NULL values by using the expression IS NULL
.
Count the number of missing birth dates in the people table
SELECT COUNT(*)
FROM people
WHERE birthdate IS NULL;
Select all rows with no missing values in particular column
SELECT name
FROM people
WHERE birthdate IS NOT NULL;
LIKE
(andNOT LIKE
) can be used to search for patterns in a column%
is a wildcard which matches zero, one, or many characters in text_
is a wildcard which matches a single character in text
Select all rows where name in column starts with A
SELECT name
FROM people
WHERE name LIKE 'A%';
Select all rows where name in column do not start with A
SELECT name
FROM people
WHERE name NOT LIKE 'A%';
Aggregate functions
SELECT MIN(budget)
FROM films;
SELECT MAX(budget)
FROM films;
SELECT SUM(budget)
FROM films;
SELECT AVG(budget)
FROM films;
SELECT SUM(budget)
FROM films
WHERE release_year >= 2010;
+
, -
, /
, *
: symbols for basic arithmetic
SELECT (4 * 3);
12
SQL assumes that if you divide an integer by an integer, you want to get an integer back.
SELECT (4 / 3);
1
SELECT (4.0 / 3.0)
1.333
AS
: assign a temporary name to something (aka aliasing)
SELECT MAX(budget) AS max_budget,
MAX(duration) AS max_duration
FROM films;
Sorting
ORDER BY
: sort results in ascending or descending order according to the values of one or more columns.DESC
: sort in descending order (default is ascending order)
Put
ORDER BY
at the end of your query
Sort by a column in ascending order
SELECT title
FROM films
ORDER BY release_year;
Sort by a column in descending order
SELECT title
FROM films
ORDER BY release_year DESC;
Sort multiple columns (order of columns matters!)
SELECT birthdate, name
FROM people
ORDER BY birthdate, name;
Grouping
GROUP BY
: group a result by one or more columns (used with aggregate functions like COUNT()
or MAX()
)
GROUP BY
always goes after theFROM
clause
SELECT release_year, count(*)
FROM films
GROUP BY release_year
ORDER BY count;
Having
Aggregate functions can’t be used in
WHERE
clauses
SELECT release_year
FROM films
GROUP BY release_year
HAVING COUNT(title) > 10;