Module 1: SQL Foundations•20 min

Introduction to Databases & SQL

Progress Tracking

Before We Write Any Code

Let's get something out of the way: you don't need a computer science degree to be great at SQL. You could learn it on the job, asking ChatGPT whenever you get stuck until it clicks. But having a clear mental model of what's actually happening when you run a query? That makes everything easier and faster.

So let's spend a few minutes on the fundamentals before we start writing code.

Why Databases Exist

Here's the scenario: you're at a company with 2 million customers, 50 million orders, and a product catalog that changes daily. Marketing wants to know which customers haven't placed an order in the past 90 days. Finance needs last quarter's revenue by region. The product team wants to see which items get abandoned in carts.

You could try doing this in Excel. You'd probably crash your laptop.

Databases exist because real companies have too much data for spreadsheets. A database can store billions of rows, let dozens of people query it simultaneously, and return answers in milliseconds. It's not just storage — it's storage that's optimized for asking questions.

What 'Pull the Data' Actually Means

When someone says "pull the data," they almost always mean "write a SQL query against a database."

How Data is Organized

If you've used a spreadsheet, you already understand 80% of database structure. Data lives in tables. Tables have columns (the fields) and rows (the records).

The difference is that databases are strict about structure. Every row in a table has the same columns. Every column has a defined data type. You can't just throw random stuff in cell G47.

A Real Example

Let's say you work at a company tracking employees. Here's what a real employee table looks like — this is the techcorp_workforce table:

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05

Notice a few things:

id is unique for each employee. This is the primary key.
Each column stores one type of data (text, numbers, dates).
The table only stores employee info. Order details would live in a separate table.

What SQL Actually Is

SQL (most people say "sequel") stands for Structured Query Language. It's how you talk to databases. When you write a SQL query, you're basically saying: "Hey database, give me this data, from this table, where these conditions are true."

The database figures out the most efficient way to get you the answer.

SQL has been around since the 1970s. It's not trendy or exciting. But it's everywhere, it's not going anywhere, and if you work with data, you'll use it constantly. Every data analyst job posting lists SQL as a requirement for a reason.

The Dialect Thing

Here's something that trips people up: there are different "flavors" of SQL depending on which database you're using.

PostgreSQL: the favorite at startups and tech companies
MySQL: popular for web apps
SQL Server (MSSQL): big in enterprise and finance
Oracle: legacy enterprise systems

The good news: 90% of SQL is identical across all of them. SELECT, FROM, WHERE, JOIN — all the core stuff works the same. The differences show up in date functions, string handling, and some syntax quirks. We'll flag dialect differences when they matter.

Practice in Any SQL Dialect

StrataScratch lets you run queries in all 4 dialects. If you're interviewing at a company that uses SQL Server, you can practice in that exact environment. You can change the dialect in the top-right corner.

Your First Query

Enough theory. Let's write some SQL.

Every query starts with two keywords: SELECT (what columns you want) and FROM (which table to get them from).

SQL

SELECT
  column1,
  column2
FROM table_name;

That's it. That's a valid SQL query.

Getting Specific Columns

Table: techcorp_workforce

id	first_name	last_name	department	salary	phone_number	joining_date
1	Sarah	Mitchell	HR	95000	555-0101	2021-03-15
2	Michael	Chen	HR	88000	555-0102	2022-06-01
3	Emily	Rodriguez	HR	82500		2021-09-20
4	David	Park	HR	80000	555-0104	2023-01-10
5	Lisa	Thompson	HR	65000		2021-04-05

Tables: techcorp_workforce

We asked for two columns; we got two columns. The database only returns what you ask for.

Getting Everything

Sometimes you want to see all the columns. The asterisk (*) is a wildcard that means "everything." Replace the column names with *:

Tables: techcorp_workforce

This returns all columns in the table. It's useful for exploring data you're not familiar with.

Avoid SELECT * in Production

In production queries, avoid SELECT *. It's slower and makes your code harder to understand. Always specify the columns you actually need.

Writing Comments in SQL

As your queries get longer, you'll want to leave notes for yourself and others. SQL supports two types of comments:

SQL

-- This is a single-line comment
SELECT
  first_name,
  last_name
FROM techcorp_workforce;

Everything after -- on that line is ignored by the database. For longer explanations, use multi-line comments:

SQL

/* This query gets employee names
   for the quarterly report */
SELECT
  first_name,
  last_name
FROM techcorp_workforce;

Comments are also useful for temporarily disabling parts of a query while debugging:

SQL

SELECT
  first_name,
  last_name
  -- , salary   (commented out for now)
FROM techcorp_workforce;

Comment Your Queries

Get in the habit of commenting on complex queries. Your future self will thank you when you revisit a query months later.

Practice: Explore a New Dataset

Now let's try a different dataset. Below is data from Los Angeles restaurant health inspections:

Table: los_angeles_restaurant_health_inspections

serial_number	activity_date	facility_name	score	grade	service_code	service_description	employee_id	facility_address	facility_city	facility_id	facility_state	facility_zip	owner_id	owner_name	pe_description	program_element_pe	program_name	program_status	record_id
DAQHRSETQ	2017-06-08	MARGARITAS CAFE	93	A	1	ROUTINE INSPECTION	EE0000006	5026 S CRENSHAW BLVD	LOS ANGELES	FA0023656	CA	90043	OW0004133	BAZAN, ASCENCION	RESTAURANT (61-150) SEATS HIGH RISK	1638	MARGARITAS CAFE	ACTIVE	PR0011718
DA2GQRJOS	2017-03-07	LAS MOLENDERAS	97	A	1	ROUTINE INSPECTION	EE0000997	2635 WHITTIER BLVD	LOS ANGELES	FA0160416	CA	90023	OW0125379	MARISOL FEREGRINO	RESTAURANT (0-30) SEATS HIGH RISK	1632	LAS MOLENDERAS	INACTIVE	PR0148504
DAMQTA46T	2016-03-22	SANDRA'S TAMALES	93	A	1	ROUTINE INSPECTION	EE0001049	5390 WHITTIER BLVD	LOS ANGELES	FA0171769	CA	90022-4032	OW0178828	SANDRA'S TAMALES INC.	RESTAURANT (0-30) SEATS MODERATE RISK	1631	SANDRA'S TAMALES	ACTIVE	PR0164225
DAXMBTIRZ	2018-02-12	CAFE GRATITUDE	97	A	1	ROUTINE INSPECTION	EE0000828	639 N LARCHMONT BLVD STE #102	LOS ANGELES	FA0058921	CA	90004	OW0005704	CAFE GRATITUDE LARCHMONT LLC	RESTAURANT (61-150) SEATS HIGH RISK	1638	CAFE GRATITUDE	ACTIVE	PR0019854
DAK8TBMS0	2015-09-10	THE WAFFLE	90	A	1	ROUTINE INSPECTION	EE0000709	6255 W SUNSET BLVD STE #105	LOS ANGELES	FA0051830	CA	90028	OW0035796	THE WAFFLE, LLC	RESTAURANT (61-150) SEATS HIGH RISK	1638	THE WAFFLE	ACTIVE	PR0010922

Tables: los_angeles_restaurant_health_inspections

Key Takeaways

Databases store data in tables with defined columns and rows.
SQL is the language for querying databases. It's universal and essential.
SELECT...FROM is the foundation of every query.
Different databases use slightly different SQL dialects, but the core is the same.
Always specify columns instead of using * in real work.

What's Next

In the next lesson, we'll go deeper into SELECT: aliases, DISTINCT, and working with different data types. Then we'll start filtering data with WHERE — which is when things get actually useful.

Next upContinue →

SELECT and FROM Basics

35 min