Module 1: SQL Foundations20 min

Introduction to Databases & SQL

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Before We Write Any Code

Let's get something out of the way: you don't need a computer science degree to be great at SQL. You could learn it on the job, asking ChatGPT whenever you get stuck until it clicks. But having a clear mental model of what's actually happening when you run a query? That makes everything easier and faster.

So let's spend a few minutes on the fundamentals before we start writing code.

Why Databases Exist

Here's the scenario: you're at a company with 2 million customers, 50 million orders, and a product catalog that changes daily. Marketing wants to know which customers haven't placed an order in the past 90 days. Finance needs last quarter's revenue by region. The product team wants to see which items get abandoned in carts.

You could try doing this in Excel. You'd probably crash your laptop.

Databases exist because real companies have too much data for spreadsheets. A database can store billions of rows, let dozens of people query it simultaneously, and return answers in milliseconds. It's not just storage — it's storage that's optimized for asking questions.

What 'Pull the Data' Actually Means

When someone says "pull the data," they almost always mean "write a SQL query against a database."

How Data is Organized

If you've used a spreadsheet, you already understand 80% of database structure. Data lives in tables. Tables have columns (the fields) and rows (the records).

The difference is that databases are strict about structure. Every row in a table has the same columns. Every column has a defined data type. You can't just throw random stuff in cell G47.

A Real Example

Let's say you work at a company tracking employees. Here's what a real employee table looks like — this is the techcorp_workforce table:

Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05

Notice a few things:

  • id is unique for each employee. This is the primary key.
  • Each column stores one type of data (text, numbers, dates).
  • The table only stores employee info. Order details would live in a separate table.

What SQL Actually Is

SQL (most people say "sequel") stands for Structured Query Language. It's how you talk to databases. When you write a SQL query, you're basically saying: "Hey database, give me this data, from this table, where these conditions are true."

The database figures out the most efficient way to get you the answer.

SQL has been around since the 1970s. It's not trendy or exciting. But it's everywhere, it's not going anywhere, and if you work with data, you'll use it constantly. Every data analyst job posting lists SQL as a requirement for a reason.

The Dialect Thing

Here's something that trips people up: there are different "flavors" of SQL depending on which database you're using.

  • PostgreSQL: the favorite at startups and tech companies
  • MySQL: popular for web apps
  • SQL Server (MSSQL): big in enterprise and finance
  • Oracle: legacy enterprise systems

The good news: 90% of SQL is identical across all of them. SELECT, FROM, WHERE, JOIN — all the core stuff works the same. The differences show up in date functions, string handling, and some syntax quirks. We'll flag dialect differences when they matter.

Practice in Any SQL Dialect

StrataScratch lets you run queries in all 4 dialects. If you're interviewing at a company that uses SQL Server, you can practice in that exact environment. You can change the dialect in the top-right corner.

Your First Query

Enough theory. Let's write some SQL.

Every query starts with two keywords: SELECT (what columns you want) and FROM (which table to get them from).

SQL
SELECT
  column1,
  column2
FROM table_name;

That's it. That's a valid SQL query.

Getting Specific Columns

Table: techcorp_workforce
idfirst_namelast_namedepartmentsalaryphone_numberjoining_date
1SarahMitchellHR95000555-01012021-03-15
2MichaelChenHR88000555-01022022-06-01
3EmilyRodriguezHR825002021-09-20
4DavidParkHR80000555-01042023-01-10
5LisaThompsonHR650002021-04-05
1
Select Employee Names

Write a query to select just the first and last names from the `techcorp_workforce` table.

Tables: techcorp_workforce

We asked for two columns; we got two columns. The database only returns what you ask for.

Getting Everything

Sometimes you want to see all the columns. The asterisk (*) is a wildcard that means "everything." Replace the column names with *:

2
Select All Columns

Modify the query to return all columns from the `techcorp_workforce` table using the wildcard *.

Tables: techcorp_workforce

This returns all columns in the table. It's useful for exploring data you're not familiar with.

Avoid SELECT * in Production

In production queries, avoid SELECT *. It's slower and makes your code harder to understand. Always specify the columns you actually need.

Writing Comments in SQL

As your queries get longer, you'll want to leave notes for yourself and others. SQL supports two types of comments:

SQL
-- This is a single-line comment
SELECT
  first_name,
  last_name
FROM techcorp_workforce;

Everything after -- on that line is ignored by the database. For longer explanations, use multi-line comments:

SQL
/* This query gets employee names
   for the quarterly report */
SELECT
  first_name,
  last_name
FROM techcorp_workforce;

Comments are also useful for temporarily disabling parts of a query while debugging:

SQL
SELECT
  first_name,
  last_name
  -- , salary   (commented out for now)
FROM techcorp_workforce;
Comment Your Queries

Get in the habit of commenting on complex queries. Your future self will thank you when you revisit a query months later.

Practice: Explore a New Dataset

Now let's try a different dataset. Below is data from Los Angeles restaurant health inspections:

Table: los_angeles_restaurant_health_inspections
serial_numberactivity_datefacility_namescoregradeservice_codeservice_descriptionemployee_idfacility_addressfacility_cityfacility_idfacility_statefacility_zipowner_idowner_namepe_descriptionprogram_element_peprogram_nameprogram_statusrecord_id
DAQHRSETQ2017-06-08MARGARITAS CAFE93A1ROUTINE INSPECTIONEE00000065026 S CRENSHAW BLVDLOS ANGELESFA0023656CA90043OW0004133BAZAN, ASCENCIONRESTAURANT (61-150) SEATS HIGH RISK1638MARGARITAS CAFEACTIVEPR0011718
DA2GQRJOS2017-03-07LAS MOLENDERAS97A1ROUTINE INSPECTIONEE00009972635 WHITTIER BLVDLOS ANGELESFA0160416CA90023OW0125379MARISOL FEREGRINORESTAURANT (0-30) SEATS HIGH RISK1632LAS MOLENDERASINACTIVEPR0148504
DAMQTA46T2016-03-22SANDRA'S TAMALES93A1ROUTINE INSPECTIONEE00010495390 WHITTIER BLVDLOS ANGELESFA0171769CA90022-4032OW0178828SANDRA'S TAMALES INC.RESTAURANT (0-30) SEATS MODERATE RISK1631SANDRA'S TAMALESACTIVEPR0164225
DAXMBTIRZ2018-02-12CAFE GRATITUDE97A1ROUTINE INSPECTIONEE0000828639 N LARCHMONT BLVD STE #102LOS ANGELESFA0058921CA90004OW0005704CAFE GRATITUDE LARCHMONT LLCRESTAURANT (61-150) SEATS HIGH RISK1638CAFE GRATITUDEACTIVEPR0019854
DAK8TBMS02015-09-10THE WAFFLE90A1ROUTINE INSPECTIONEE00007096255 W SUNSET BLVD STE #105LOS ANGELESFA0051830CA90028OW0035796THE WAFFLE, LLCRESTAURANT (61-150) SEATS HIGH RISK1638THE WAFFLEACTIVEPR0010922
3
Explore Restaurant Health Inspections

Write a query to select the facility name, score, and grade from the `los_angeles_restaurant_health_inspections` table.

Tables: los_angeles_restaurant_health_inspections

Key Takeaways

  • Databases store data in tables with defined columns and rows.
  • SQL is the language for querying databases. It's universal and essential.
  • SELECT...FROM is the foundation of every query.
  • Different databases use slightly different SQL dialects, but the core is the same.
  • Always specify columns instead of using * in real work.

What's Next

In the next lesson, we'll go deeper into SELECT: aliases, DISTINCT, and working with different data types. Then we'll start filtering data with WHERE — which is when things get actually useful.