Introduction to Databases & SQL
Progress Tracking
Log in to save this lesson and continue from where you left off.
Before We Write Any Code
Let's get something out of the way: you don't need a computer science degree to be great at SQL. You could learn it on the job, asking ChatGPT whenever you get stuck until it clicks. But having a clear mental model of what's actually happening when you run a query? That makes everything easier and faster.
So let's spend a few minutes on the fundamentals before we start writing code.
Why Databases Exist
Here's the scenario: you're at a company with 2 million customers, 50 million orders, and a product catalog that changes daily. Marketing wants to know which customers haven't placed an order in the past 90 days. Finance needs last quarter's revenue by region. The product team wants to see which items get abandoned in carts.
You could try doing this in Excel. You'd probably crash your laptop.
Databases exist because real companies have too much data for spreadsheets. A database can store billions of rows, let dozens of people query it simultaneously, and return answers in milliseconds. It's not just storage — it's storage that's optimized for asking questions.
When someone says "pull the data," they almost always mean "write a SQL query against a database."
How Data is Organized
If you've used a spreadsheet, you already understand 80% of database structure. Data lives in tables. Tables have columns (the fields) and rows (the records).
The difference is that databases are strict about structure. Every row in a table has the same columns. Every column has a defined data type. You can't just throw random stuff in cell G47.
A Real Example
Let's say you work at a company tracking employees. Here's what a real employee table looks like — this is the techcorp_workforce table:
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Notice a few things:
idis unique for each employee. This is the primary key.- Each column stores one type of data (text, numbers, dates).
- The table only stores employee info. Order details would live in a separate table.
What SQL Actually Is
SQL (most people say "sequel") stands for Structured Query Language. It's how you talk to databases. When you write a SQL query, you're basically saying: "Hey database, give me this data, from this table, where these conditions are true."
The database figures out the most efficient way to get you the answer.
SQL has been around since the 1970s. It's not trendy or exciting. But it's everywhere, it's not going anywhere, and if you work with data, you'll use it constantly. Every data analyst job posting lists SQL as a requirement for a reason.
The Dialect Thing
Here's something that trips people up: there are different "flavors" of SQL depending on which database you're using.
- PostgreSQL: the favorite at startups and tech companies
- MySQL: popular for web apps
- SQL Server (MSSQL): big in enterprise and finance
- Oracle: legacy enterprise systems
The good news: 90% of SQL is identical across all of them. SELECT, FROM, WHERE, JOIN — all the core stuff works the same. The differences show up in date functions, string handling, and some syntax quirks. We'll flag dialect differences when they matter.
StrataScratch lets you run queries in all 4 dialects. If you're interviewing at a company that uses SQL Server, you can practice in that exact environment. You can change the dialect in the top-right corner.
Your First Query
Enough theory. Let's write some SQL.
Every query starts with two keywords: SELECT (what columns you want) and FROM (which table to get them from).
SELECT
column1,
column2
FROM table_name;That's it. That's a valid SQL query.
Getting Specific Columns
| id | first_name | last_name | department | salary | phone_number | joining_date |
|---|---|---|---|---|---|---|
| 1 | Sarah | Mitchell | HR | 95000 | 555-0101 | 2021-03-15 |
| 2 | Michael | Chen | HR | 88000 | 555-0102 | 2022-06-01 |
| 3 | Emily | Rodriguez | HR | 82500 | 2021-09-20 | |
| 4 | David | Park | HR | 80000 | 555-0104 | 2023-01-10 |
| 5 | Lisa | Thompson | HR | 65000 | 2021-04-05 |
Write a query to select just the first and last names from the `techcorp_workforce` table.
We asked for two columns; we got two columns. The database only returns what you ask for.
Getting Everything
Sometimes you want to see all the columns. The asterisk (*) is a wildcard that means "everything." Replace the column names with *:
Modify the query to return all columns from the `techcorp_workforce` table using the wildcard *.
This returns all columns in the table. It's useful for exploring data you're not familiar with.
In production queries, avoid SELECT *. It's slower and makes your code harder to understand. Always specify the columns you actually need.
Writing Comments in SQL
As your queries get longer, you'll want to leave notes for yourself and others. SQL supports two types of comments:
-- This is a single-line comment
SELECT
first_name,
last_name
FROM techcorp_workforce;Everything after -- on that line is ignored by the database. For longer explanations, use multi-line comments:
/* This query gets employee names
for the quarterly report */
SELECT
first_name,
last_name
FROM techcorp_workforce;Comments are also useful for temporarily disabling parts of a query while debugging:
SELECT
first_name,
last_name
-- , salary (commented out for now)
FROM techcorp_workforce;Get in the habit of commenting on complex queries. Your future self will thank you when you revisit a query months later.
Practice: Explore a New Dataset
Now let's try a different dataset. Below is data from Los Angeles restaurant health inspections:
| serial_number | activity_date | facility_name | score | grade | service_code | service_description | employee_id | facility_address | facility_city | facility_id | facility_state | facility_zip | owner_id | owner_name | pe_description | program_element_pe | program_name | program_status | record_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DAQHRSETQ | 2017-06-08 | MARGARITAS CAFE | 93 | A | 1 | ROUTINE INSPECTION | EE0000006 | 5026 S CRENSHAW BLVD | LOS ANGELES | FA0023656 | CA | 90043 | OW0004133 | BAZAN, ASCENCION | RESTAURANT (61-150) SEATS HIGH RISK | 1638 | MARGARITAS CAFE | ACTIVE | PR0011718 |
| DA2GQRJOS | 2017-03-07 | LAS MOLENDERAS | 97 | A | 1 | ROUTINE INSPECTION | EE0000997 | 2635 WHITTIER BLVD | LOS ANGELES | FA0160416 | CA | 90023 | OW0125379 | MARISOL FEREGRINO | RESTAURANT (0-30) SEATS HIGH RISK | 1632 | LAS MOLENDERAS | INACTIVE | PR0148504 |
| DAMQTA46T | 2016-03-22 | SANDRA'S TAMALES | 93 | A | 1 | ROUTINE INSPECTION | EE0001049 | 5390 WHITTIER BLVD | LOS ANGELES | FA0171769 | CA | 90022-4032 | OW0178828 | SANDRA'S TAMALES INC. | RESTAURANT (0-30) SEATS MODERATE RISK | 1631 | SANDRA'S TAMALES | ACTIVE | PR0164225 |
| DAXMBTIRZ | 2018-02-12 | CAFE GRATITUDE | 97 | A | 1 | ROUTINE INSPECTION | EE0000828 | 639 N LARCHMONT BLVD STE #102 | LOS ANGELES | FA0058921 | CA | 90004 | OW0005704 | CAFE GRATITUDE LARCHMONT LLC | RESTAURANT (61-150) SEATS HIGH RISK | 1638 | CAFE GRATITUDE | ACTIVE | PR0019854 |
| DAK8TBMS0 | 2015-09-10 | THE WAFFLE | 90 | A | 1 | ROUTINE INSPECTION | EE0000709 | 6255 W SUNSET BLVD STE #105 | LOS ANGELES | FA0051830 | CA | 90028 | OW0035796 | THE WAFFLE, LLC | RESTAURANT (61-150) SEATS HIGH RISK | 1638 | THE WAFFLE | ACTIVE | PR0010922 |
Write a query to select the facility name, score, and grade from the `los_angeles_restaurant_health_inspections` table.
Key Takeaways
- Databases store data in tables with defined columns and rows.
- SQL is the language for querying databases. It's universal and essential.
SELECT...FROMis the foundation of every query.- Different databases use slightly different SQL dialects, but the core is the same.
- Always specify columns instead of using
*in real work.
What's Next
In the next lesson, we'll go deeper into SELECT: aliases, DISTINCT, and working with different data types. Then we'll start filtering data with WHERE — which is when things get actually useful.