Amazon SQL Interview Questions

Written by: Sara Nobrega

Amazon SQL Interview Questions: The Real Patterns, Traps, and Query Skills Amazon Actually Tests.

What Amazon SQL Interviews Actually Test

Because knowledge of SQL is a key requirement for both data scientists and data analysts at Amazon, many interview questions require writing solutions in this language.

In fact, Amazon asks far fewer questions about theoretical concepts and its products than other tech companies. This is why it’s crucial to practice solving SQL interview questions before an Amazon interview.

Amazon SQL interview questions test a wide range of SQL concepts, but some come up more often than others. Nearly 70% of Amazon SQL interview questions involve data stored in multiple tables and require merging it using JOIN clauses, Common Table Expressions, or subqueries.

Another prevalent concept is data aggregation using SQL functions such as COUNT() or SUM() in combination with the GROUP BY clause, or using the more complex window functions. Many questions also require filtering and sorting data with the WHERE and ORDER BY clauses.

Amazon-Style Dataset Schema

Before writing SQL, lock in the grain of each table. Most Amazon-style questions are simple once you stop mixing row levels (transaction rows vs user rows vs order rows).

In Amazon-style datasets, similar archetypes show up again and again. Let’s explore some of them.

Customers / Users table (dimension)

This is your customers-type table.

  • Typical columns: user_id (or id), name fields, city, country, signup date (sometimes)
  • Grain: one row per user
  • What it’s for: segmentation (“by city”), cohort cuts (“new users”), deduping
Table: customers

Orders table (order header or order line)

This is your orders-type table.

  • Typical columns: order_id, user_id, order_date, maybe total_order_cost, maybe order_details
  • Grain warning: if there’s an order_details / item / product field and it varies per row, it’s often one row per order line, not one row per order.
Table: orders

Transactions / Purchases table (fact table)

This is the amazon_transactions / amazon_purchases vibe.

  • Typical columns: transaction_id (or id), user_id, created_at, revenue or purchase_amt
  • Grain: one row per purchase event (or per transaction / line item)
  • Common twist: money can be negative (returns/refunds)
Table: amazon_transactions

Since amazon_transactions is one row per transaction, “monthly revenue” means grouping created_at by month and summing revenue. If you switch to a per-user question (like repeat buyers), you’ll usually dedupe to one row per user per day or per order first.
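As a sketch of that idea against this archetype (using the column names listed above), monthly revenue is a one-step rollup:

```sql
-- Roll transaction rows up to one row per calendar month
SELECT TO_CHAR(created_at, 'YYYY-MM') AS month,
       SUM(revenue) AS monthly_revenue
FROM amazon_transactions
GROUP BY 1
ORDER BY 1;
```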

Amazon SQL Interview Questions (with Solutions)

Let’s start with the easier ones.

Easy Interview Questions

Amazon Easy SQL Interview Questions

Amazon SQL Interview Questions #1: Workers With The Highest Salaries

Last Updated: July 2021

Easy | ID 10353

Management wants to analyze only employees with official job titles. Find the job titles of the employees with the highest salary. If multiple employees have the same highest salary, include all their job titles.

Go to the Question

Data View

Table: worker
Table: title

A quick note on what matters here:

  • worker.salary holds the salary value we’re trying to maximize.
  • title.worker_title indicates whether someone has an “official” job title (we only consider workers who appear in this table with a non-null title).
  • We want job titles, not employee names, and we want all titles tied to the maximum salary.

Interview Framing (How It’s Asked + Typical Follow-Ups)

In interviews, this usually isn’t phrased as “find the max salary.” It’s framed as “Find the highest-paid employee(s) among the population that qualifies for analysis, then report a related attribute.” 

The follow-up is often about scope: "Does the maximum come from all workers or only titled workers?" and "What if multiple workers tie for the maximum?" If the interviewer wants to push, they'll ask about the affected_from column: "If titles change, which title should you use?"

A senior-sounding answer acknowledges the decision point: "I'll compute the max salary over only workers who have an official title, because the prompt explicitly restricts the analysis population."

Common Mistakes (And How To Avoid Them)

A common mistake is computing MAX(salary) from worker first, then joining with title afterward. That can silently drop the max-paid worker if they don’t have a title row, and you end up returning the wrong max for the restricted population.

Another mistake is using MAX(worker_title) instead of MAX(salary), or returning only one title by using LIMIT 1, which fails the "include all ties" requirement. If the data allowed multiple title rows per worker, failing to think about "current vs. historical title" could also create duplicates or mismatched titles.

Evaluation Criteria / Rubric (How Interviewers Score It)

A strong solution makes the population restriction explicit, preserves ties, and keeps the logic readable.

A great answer computes the maximum salary over the correctly filtered population (workers with titles), returns all titles tied at that salary, and doesn’t rely on DISTINCT to patch join mistakes.

An okay answer returns the right output on clean data, but doesn’t clearly justify whether the max is computed before or after applying the “official title” constraint. A weak answer computes the max salary from all workers, drops records in the join, or returns only one title when ties exist.

Solution

1) Find the maximum salary among workers who have an official title

This subquery is the key filter that enforces the “official title only” requirement. We join worker to title so untitled employees are excluded from consideration. Then we compute MAX(w.salary) over that restricted set.

PostgreSQL
Tables: worker, title
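A sketch of this subquery, assuming the worker table's key is worker_id and the title table references it as worker_ref_id (the join-key names aren't shown in the prompt):

```sql
-- Max salary over the titled population only
-- (assumed join keys: worker.worker_id, title.worker_ref_id)
SELECT MAX(w.salary) AS max_salary
FROM worker w
JOIN title t ON t.worker_ref_id = w.worker_id;
```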

2) Pull all job titles whose worker salary equals that maximum

Now we join worker and title again so we can return the actual titles. We keep only rows where the worker’s salary equals the maximum salary computed above. If multiple employees share the same highest salary, this filter naturally includes all of them (and therefore all of their titles).

PostgreSQL
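Putting the two steps together, the complete query might look like this (same assumed join keys as above):

```sql
-- All titles tied at the maximum salary among titled workers
SELECT t.worker_title AS best_paid_title
FROM worker w
JOIN title t ON t.worker_ref_id = w.worker_id
WHERE w.salary = (
    SELECT MAX(w2.salary)
    FROM worker w2
    JOIN title t2 ON t2.worker_ref_id = w2.worker_id
);
```

Because the filter is an equality against the max, every worker tied at that salary passes, and so do all of their titles.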

Output

best_paid_title
Asst. Manager
Manager

Amazon SQL Interview Questions #2: Total Cost Of Orders

Last Updated: July 2020

Easy | ID 10183

Find the total cost of each customer's orders. Output customer's id, first name, and the total order cost. Order records by customer's first name alphabetically.

Go to the Question

Data View

Table: customers
Table: orders

This is a "looks trivial" aggregation, but interviewers are usually checking whether you notice the grain shift. The orders table has multiple rows per customer, so the only correct way to get "total cost per customer" is to collapse to one row per customer with a GROUP BY.

Also, notice the prompt says “each customer’s orders,” which implies you’re summarizing at the customer level, not per order, not per day.

Grain (what one output row means): one row per customer who has at least one order in orders (with this solution).

Interview Framing (How It’s Asked + Typical Follow-Ups)

In interviews, this often comes as “roll up the fact table to the dimension grain,” and the follow-up is almost always about inclusivity: “Do you want customers with zero orders?”

If they do, you'll need a LEFT JOIN from customers and COALESCE(SUM(...), 0). Another common follow-up is about what you're allowed to group by: "Is first_name unique?" If not, you'd group by customers.id (and select the name as a dependent attribute).
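A sketch of that zero-order variant (this answers the follow-up, not the original prompt):

```sql
-- Keep every customer; customers with no orders get a 0 total
SELECT c.id, c.first_name,
       COALESCE(SUM(o.total_order_cost), 0) AS total_cost
FROM customers c
LEFT JOIN orders o ON c.id = o.cust_id
GROUP BY c.id, c.first_name
ORDER BY c.first_name;
```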

Common Mistakes (And How To Avoid Them)

The most common mistake is grouping by first_name only. That merges different customers who share a name and produces a total that looks reasonable but is wrong.

Another mistake is selecting columns that aren’t grouped or aggregated (or “fixing” it with DISTINCT, which hides the grain problem instead of solving it). People sometimes also order by the sum instead of by first_name, which does not follow the prompt.

Validation Checks

A quick check is that the output should have one row per customer (for customers with orders), so the number of rows should match the number of distinct cust_id values present in orders.

Another check is that the total of your per-customer sums should equal the total revenue in orders over the same rows; if those don't match, you likely duplicated rows via a bad join.

Finally, spot-check a customer you can compute mentally (like a customer with one order): their total should equal that single total_order_cost.

Solution

1) Start by joining orders to customers

We first need the customer details (like first_name) alongside each order. The join happens on customers.id = orders.cust_id. At this stage, we’re not aggregating yet, just creating the combined dataset.

PostgreSQL
Tables: customers, orders
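A sketch of this first step (column names taken from the prompt and the checks above):

```sql
-- One row per order, with customer details attached (no aggregation yet)
SELECT customers.id, customers.first_name, orders.total_order_cost
FROM customers
JOIN orders ON customers.id = orders.cust_id;
```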

2) Aggregate order costs per customer

Now that each order is linked to a customer, we can sum up spending per customer. We group by the customer columns we want to return (id and first_name). This produces one row per customer with their total cost.

PostgreSQL
Tables: customers, orders
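Adding the rollup, the query becomes:

```sql
-- Collapse order rows to one row per customer
SELECT customers.id, customers.first_name, SUM(orders.total_order_cost)
FROM customers
JOIN orders ON customers.id = orders.cust_id
GROUP BY customers.id, customers.first_name;
```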

3) Final query: add sorting by first name (complete solution)

The question asks us to order records alphabetically by the customer’s first name. So we keep the aggregation exactly the same and simply add ORDER BY customers.first_name ASC. This final query is the full solution.

PostgreSQL
Go to the question on the platform
Tables: customers, orders
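The complete query, with the sort added, might look like this:

```sql
-- Total order cost per customer, sorted alphabetically by first name
SELECT customers.id, customers.first_name, SUM(orders.total_order_cost)
FROM customers
JOIN orders ON customers.id = orders.cust_id
GROUP BY customers.id, customers.first_name
ORDER BY customers.first_name ASC;
```

Leaving the SUM unaliased is why the output column below is simply called "sum"; in an interview, giving it a descriptive alias reads better.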

Output

id | first_name | sum
12 | Eva | 205
3 | Farida | 440
5 | Henry | 80
7 | Jill | 535
1 | Mark | 275
15 | Mia | 540
4 | William | 140

Amazon SQL Interview Questions #3: Workers by Department Since April

Easy | ID 9847

Find the number of workers by department who joined on or after April 1, 2014.

Output the department name along with the corresponding number of workers.

Sort the results based on the number of workers in descending order.

Go to the Question

Data View

Table: worker

This is a single-table aggregation with a date filter.

The “scenario” part is simple: you’re counting headcount by department for a time-bounded cohort (“joined on or after”). The main thing to get right is the boundary condition and the output grain.

Grain (what one output row means): one row per department in the result set.

Interview Framing (How It’s Asked + Typical Follow-Ups)

In interviews, this is often framed as “give me a breakdown by category for a cohort,” and the follow-up is usually about the boundary: “Is April 1 inclusive?” (the prompt says “on or after,” so yes).

Another common follow-up is whether departments with zero workers should appear; in this question, they won’t, because we’re grouping only over workers who meet the filter.

Common Mistakes (What Candidates Do Wrong + How To Avoid)

The most common mistake is using > instead of >= and accidentally excluding workers who joined exactly on 2014-04-01.

Another is grouping by too much (like department, joining_date), which changes the grain into daily counts. People also sometimes sort alphabetically by department instead of sorting by the computed count, which fails the prompt.

Evaluation Criteria / Rubric (How Interviewers Score It)

A strong answer applies the correct inclusive date filter, groups at the correct grain (department), and sorts by count in descending order.

An okay answer gets the right departments but doesn’t follow the sorting requirement, or uses a CTE when it isn’t needed (not wrong, just extra).

A weak answer changes the grain by grouping on unnecessary columns, or misinterprets the date boundary, and returns the wrong counts.

Solution

1) Filter only workers who joined on or after April 1, 2014

We start by narrowing the dataset to only the workers we care about. This keeps the next steps clean and avoids mixing in earlier join dates. Using a CTE here makes the query easier to read.

PostgreSQL
Tables: worker
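A sketch of this filtering step (the CTE name recent_hires is illustrative; joining_date is assumed to be a date column):

```sql
-- Keep only workers who joined on or after April 1, 2014 (inclusive boundary)
WITH recent_hires AS (
    SELECT *
    FROM worker
    WHERE joining_date >= '2014-04-01'
)
SELECT * FROM recent_hires;
```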

2) Count workers per department

Now that we have only qualifying workers, we group them by department. COUNT(worker_id) gives us how many workers are in each department. At this stage, we’re producing one row per department.

PostgreSQL
Tables: worker
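Building on that filter, the aggregation step might look like this:

```sql
-- Headcount per department among the filtered cohort
WITH recent_hires AS (
    SELECT * FROM worker WHERE joining_date >= '2014-04-01'
)
SELECT department, COUNT(worker_id) AS num_workers
FROM recent_hires
GROUP BY department;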

3) Final query: add sorting by number of workers (complete solution)

The final requirement is to sort departments by worker count in descending order. We keep the same aggregation and simply add ORDER BY num_workers DESC. This gives the final ranked result.

PostgreSQL
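The complete query, with the descending sort added:

```sql
-- Department headcounts for the cohort, largest first
WITH recent_hires AS (
    SELECT * FROM worker WHERE joining_date >= '2014-04-01'
)
SELECT department, COUNT(worker_id) AS num_workers
FROM recent_hires
GROUP BY department
ORDER BY num_workers DESC;
```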

Output

department | num_workers
Admin | 4
Account | 1
HR | 1

Medium Interview Questions

Amazon Medium SQL Interview Questions

Amazon SQL Interview Questions #4: Finding User Purchases

Last Updated: December 2020

Medium | ID 10322

Identify returning active users by finding users who made a second purchase within 1 to 7 days after their first purchase. Ignore same-day purchases. Output a list of these user_ids.

Go to the Question

Data View

Table: amazon_transactions

This table is an event log at the transaction level. A single user can have multiple purchases on the same day, which matters because the prompt is about “first purchase date” and “second purchase date,” not “first two rows in the table.”

  • Grain (what one output row means): one row per qualifying user_id.
  • What the table implies: multiple rows per user per day are possible, so you need to decide whether “purchase” means “transaction” or “purchase day.” The solution treats it as distinct purchase dates, which matches “ignore same-day purchases.”

Interview Framing (How It’s Asked + Typical Follow-Ups)

This is typically framed as a retention metric: “Who comes back within a week?” 

The follow-ups are usually about definitions: “Are multiple purchases on day 1 considered returning?” (prompt says ignore same-day, so no), “Do we care about the second transaction or the second day with activity?”, and “What if the second purchase is 20 days later but there’s also one on day 3?” In other words, interviewers are testing whether you can lock the definition of “second purchase” before you start coding.

Trade-Offs / Decision Rules (Why This Approach vs Alternatives)

The key decision is to dedupe to one row per user-day first. That makes “ignore same-day purchases” automatic, because multiple transactions collapse into a single purchase_date.

From there, using ROW_NUMBER() to label the first and second purchase dates is straightforward and readable in an interview. 

An alternative is MIN() for the first date and then MIN(purchase_date) FILTER (WHERE purchase_date > first_date) for the second date (dialect-dependent), or a self join to find the earliest later purchase. 

The window approach is usually easiest to explain out loud: “I’m ranking purchase days and grabbing the first two.”

Edge Cases (Assumptions + What Breaks the Solution)

If a user has two purchases on the same calendar day only, they should not qualify: deduping to user-day enforces that. 

If a user has purchases on many days, you still only care about the second distinct day, not "any purchase within 7 days." This is subtle: if your query checks for any pair of purchases 1-7 days apart instead of the gap between the first and second purchase days, a user who buys on days 1, 10, and 12 would incorrectly qualify.

Finally, timezone can matter if created_at is a timestamp; casting to date assumes the business definition of “day” matches the database timezone.

Solution

1) Remove same-day duplicates by keeping unique purchase dates per user

First, we convert created_at into a date and use DISTINCT so multiple same-day purchases count as a single purchase day. This is important because the prompt explicitly says to ignore same-day purchases. The result is a clean set of (user_id, purchase_date) pairs.

PostgreSQL
Tables: amazon_transactions
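A sketch of this dedupe step (casting created_at to a date, per the timezone caveat above):

```sql
-- One row per (user, purchase day); same-day duplicates collapse
SELECT DISTINCT user_id, created_at::date AS purchase_date
FROM amazon_transactions;
```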

2) Rank purchase dates per user to identify first vs second purchase

Now we assign an order to each purchase date using ROW_NUMBER(). The earliest purchase date per user gets rn = 1, and the next one gets rn = 2. This makes it easy to pick out the first two purchase days for each user.

PostgreSQL
Tables: amazon_transactions
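Layering the window function on top of the deduped set:

```sql
-- Rank each user's distinct purchase days chronologically
SELECT user_id, purchase_date,
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY purchase_date) AS rn
FROM (
    SELECT DISTINCT user_id, created_at::date AS purchase_date
    FROM amazon_transactions
) AS days;
```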

3) Final query: extract first/second dates and filter to 1-7 day returning users (complete solution)

Next, we pivot the first two ranked rows into two columns: first_date and second_date. We filter out users who don’t have a second purchase day (second_date is NULL). Finally, we compute (second_date - first_date) and keep only those between 1 and 7 days.

PostgreSQL
Go to the question on the platform
Tables: amazon_transactions
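A complete query following those three steps (in PostgreSQL, subtracting two dates yields a day count):

```sql
WITH days AS (
    -- Step 1: one row per user per purchase day
    SELECT DISTINCT user_id, created_at::date AS purchase_date
    FROM amazon_transactions
),
ranked AS (
    -- Step 2: rank purchase days per user
    SELECT user_id, purchase_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY purchase_date) AS rn
    FROM days
)
-- Step 3: pivot the first two days into columns and apply the 1-7 day window
SELECT user_id
FROM (
    SELECT user_id,
           MAX(CASE WHEN rn = 1 THEN purchase_date END) AS first_date,
           MAX(CASE WHEN rn = 2 THEN purchase_date END) AS second_date
    FROM ranked
    WHERE rn <= 2
    GROUP BY user_id
) AS pivoted
WHERE second_date IS NOT NULL
  AND second_date - first_date BETWEEN 1 AND 7;
```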

Output

user_id
100
103
105
109
111
114
117
122
130
131
133
141
143

Amazon SQL Interview Questions #5: Top-Rated Support Employees

Last Updated: May 2023

Medium | ID 10554

You're analyzing employee performance at a customer support center. Management wants to identify which support agents are providing the best customer experience based on satisfaction scores.

Rank employees by their average customer satisfaction score for resolved tickets. Return the top 3 ranks, where ranks should be consecutive and should not skip numbers even if there are ties. For example, if the scores are [4.9, 4.7, 4.7, 4.5], the rankings would be [1, 2, 2, 3].

Return the employee ID, employee name, average satisfaction score, and employee rank.

Go to the Question

Data View

Table: amazon_support_tickets

This is a ticket-level table. Each row is a support ticket with a status and an optional satisfaction score. The prompt quietly enforces two filters: you only score employees on resolved tickets, and you must ignore null satisfaction scores. The output grain is employee-level.

  • Grain (what one output row means): one row per employee, with their average satisfaction over qualifying tickets, plus a rank.

Interview Framing (How It’s Asked + Typical Follow-Ups)

This usually shows up as “Who are our top performers?” but the follow-ups are where the question becomes interview-grade.

Common follow-ups include: “Do unresolved or escalated tickets count?” (here, no, only resolved), “What do you do with missing satisfaction scores?” (exclude them), and “How do you rank with ties without skipping rank numbers?”

That last line is the real requirement: they’re pushing you toward DENSE_RANK() rather than RANK().

Trade-Offs / Decision Rules (Why This Approach vs Alternatives)

The clean decision rule is to separate the problem into two stages: compute a stable employee-level metric first (average satisfaction), then rank those averages. That keeps the grain controlled and avoids ranking ticket rows by accident.

For ranking, DENSE_RANK() is the right fit because the prompt explicitly says ranks should not skip numbers under ties. RANK() would skip (e.g., 1, 2, 2, 4), which violates the requirement. ROW_NUMBER() would force a unique ordering even when averages tie, which is also not what they asked for.

Evaluation Criteria / Rubric (How Interviewers Score It)

A strong answer filters to resolved tickets, excludes null satisfaction values, aggregates to employee-level correctly, uses DENSE_RANK() (not RANK()), and returns all employees whose rank is within the top 3 ranks (including ties at rank 3). 

An okay answer gets the averages right but uses the wrong ranking function, or returns only three employees instead of “top 3 ranks.” A weak answer ranks ticket rows directly, includes unresolved/escalated tickets, or treats null satisfaction as zero, and drags averages down silently.

Solution

1) Compute average satisfaction per employee (resolved tickets only)

We first filter the table to include only tickets that were resolved. Then we exclude null satisfaction scores so they don’t distort the average. Finally, we group by employee and compute AVG(customer_satisfaction).

PostgreSQL
Tables: amazon_support_tickets
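A sketch of this step (the status literal 'Resolved' and the column names are assumed from the prompt wording and the output table):

```sql
-- Average satisfaction per employee, resolved tickets only,
-- excluding null satisfaction scores
SELECT employee_id, employee_name,
       AVG(customer_satisfaction) AS avg_satisfaction
FROM amazon_support_tickets
WHERE status = 'Resolved'
  AND customer_satisfaction IS NOT NULL
GROUP BY employee_id, employee_name;
```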

2) Rank employees by average satisfaction using DENSE_RANK()

Now that we have one row per employee, ranking becomes straightforward. We use DENSE_RANK() so ties share the same rank and the next rank doesn’t skip numbers. This matches the requirement’s example: [4.9, 4.7, 4.7, 4.5] -> [1, 2, 2, 3].

PostgreSQL
Tables: amazon_support_tickets
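Wrapping the employee-level averages in a CTE and ranking them:

```sql
WITH avg_scores AS (
    SELECT employee_id, employee_name,
           AVG(customer_satisfaction) AS avg_satisfaction
    FROM amazon_support_tickets
    WHERE status = 'Resolved'              -- assumed status literal
      AND customer_satisfaction IS NOT NULL
    GROUP BY employee_id, employee_name
)
-- DENSE_RANK keeps rank numbers consecutive under ties
SELECT employee_id, employee_name, avg_satisfaction,
       DENSE_RANK() OVER (ORDER BY avg_satisfaction DESC) AS employee_rank
FROM avg_scores;
```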

3) Final query: return only top 3 ranks (complete solution)

Finally, we filter to keep only employees whose dense rank is 1, 2, or 3. This automatically includes all employees tied within those ranks. The output includes exactly what the prompt asks for: id, name, average score, and rank.

PostgreSQL
Go to the question on the platform
Tables: amazon_support_tickets
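The complete query, filtering to the top 3 ranks:

```sql
WITH avg_scores AS (
    SELECT employee_id, employee_name,
           AVG(customer_satisfaction) AS avg_satisfaction
    FROM amazon_support_tickets
    WHERE status = 'Resolved'              -- assumed status literal
      AND customer_satisfaction IS NOT NULL
    GROUP BY employee_id, employee_name
),
ranked AS (
    SELECT employee_id, employee_name, avg_satisfaction,
           DENSE_RANK() OVER (ORDER BY avg_satisfaction DESC) AS employee_rank
    FROM avg_scores
)
-- "Top 3 ranks" means rank <= 3, which keeps all ties at rank 3
SELECT employee_id, employee_name, avg_satisfaction, employee_rank
FROM ranked
WHERE employee_rank <= 3;
```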

Output

employee_id | employee_name | avg_satisfaction | employee_rank
EMP-201 | Alice Johnson | 5 | 1
EMP-207 | Grace Hill | 4.5 | 2
EMP-203 | Carol Lee | 3 | 3
EMP-209 | Ivy Scott | 3 | 3

Hard Interview Questions

Amazon Hard SQL Interview Questions

Amazon SQL Interview Questions #6: Monthly Percentage Difference

Last Updated: December 2020

Hard | ID 10319

Given a table of purchases by date, calculate the month-over-month percentage change in revenue. The output should include the year-month date (YYYY-MM) and percentage change, rounded to the 2nd decimal point, and sorted from the beginning of the year to the end of the year. The percentage change column will be populated from the 2nd month forward and can be calculated as ((this month's revenue - last month's revenue) / last month's revenue)*100.

Go to the Question

Data View

Table: sf_transactions

This is transaction-level revenue. Each row is a purchase event with a created_at date and a value. The key move is that the metric is at the month level, so you need to roll up to one row per month first, then compare each month to the prior month.

Grain (what one output row means): one row per month.

Interview framing (how it’s asked + typical follow-ups)

In interviews, this is typically framed as “compute MoM growth” or “build a monthly KPI trend with deltas.” 

The follow-ups are usually about definitions: “Do we bucket by calendar month in UTC or local time?” and “What happens in the first month?” (no prior month, so NULL). If the interviewer wants to push, they’ll ask what you do when the prior month's revenue is zero, because the formula would divide by zero.

Trade-Offs / Decision Rules (Why This Approach vs Alternatives)

The decision rule is: aggregate monthly revenue first, then use a window function (LAG()) on the monthly totals. That keeps the logic readable and avoids self-joining the monthly table to itself.

You can compute the month key as a string (TO_CHAR) or as a real date (first day of month via date_trunc('month', created_at)), then format at the end. Using a date-month key is often safer because it preserves true chronological ordering, while strings can bite you if formatting is inconsistent. 

The provided solution is ordered by the same YYYY-MM string it groups by, which works because the format is fixed-width and lexicographically sortable.

Evaluation Criteria / Rubric (How Interviewers Score It)

A strong solution produces exactly one row per month, uses LAG() to access prior month revenue, applies the formula correctly, and returns NULL for the first month. It also sorts chronologically and rounds to 2 decimals. 

An okay solution gets the numbers right but has fragile ordering (e.g., ordering by month name or by a non-sortable string), or it repeats the same LAG(...) call multiple times without making the intent clear. 

A weak solution computes daily deltas instead of monthly, compares the wrong months, or calculates percent change incorrectly (common sign errors or dividing by the current month instead of the prior month).

Common Mistakes (What Candidates Do Wrong + How To Avoid)

A common mistake is applying LAG() directly on raw transaction rows. That gives you the previous transaction, not the previous month’s total, so the percent change becomes meaningless. 

Another mistake is grouping by MONTH(created_at) without including the year, which merges January 2019 and January 2020 into the same bucket. People also often forget to handle the first month (they try to force a value instead of leaving it NULL), or they write the percent formula as (this - last) / this, which produces a different metric.

Edge Cases (Assumptions + What Breaks the Solution)

If the prior month’s revenue is zero, the percent change is undefined (division by zero). In real pipelines, you’d usually return NULL, infinity, or define a business rule (e.g., treat as 100% if this month > 0). 

Missing months are another edge case: if a month has no transactions, it won’t appear unless you generate a calendar table. The window function will then compare non-adjacent months (e.g., March compared to January), which may or may not match the business definition of “month-over-month.”

Solution

1) Aggregate revenue to the monthly level (one row per month)

PostgreSQL
Tables: sf_transactions
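We first bucket every transaction into its calendar month and sum the revenue. A sketch (the amount column is assumed to be named value; the YYYY-MM string is safe to sort because it's fixed-width):

```sql
-- One row per month with that month's total revenue
SELECT TO_CHAR(created_at, 'YYYY-MM') AS year_month,
       SUM(value) AS monthly_revenue
FROM sf_transactions
GROUP BY TO_CHAR(created_at, 'YYYY-MM');
```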

2) Bring in last month’s revenue using LAG() (window function)

Now we add a window function so each month can “see” the revenue from the previous month. LAG(monthly_revenue) shifts the monthly revenue down by one row in chronological order. This step sets us up to calculate month-over-month change cleanly without a self-join.

PostgreSQL
Tables: sf_transactions
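Adding the window function on top of the aggregate (PostgreSQL allows window functions over aggregates in the same query level):

```sql
-- Each month now "sees" the previous month's total via LAG
SELECT TO_CHAR(created_at, 'YYYY-MM') AS year_month,
       SUM(value) AS monthly_revenue,
       LAG(SUM(value)) OVER (ORDER BY TO_CHAR(created_at, 'YYYY-MM')) AS prev_month_revenue
FROM sf_transactions
GROUP BY TO_CHAR(created_at, 'YYYY-MM');
```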

3) Final query: calculate the MoM % change and round (complete solution)

Finally, we compute the percentage change using the formula in the prompt. We round it to 2 decimals using ROUND(..., 2). The first month will naturally have a NULL percentage change because there’s no “previous month” to compare against.

PostgreSQL
Go to the question on the platform
Tables: sf_transactions
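The complete query, applying the prompt's formula and rounding (the 100.0 literal keeps the division in numeric rather than integer arithmetic):

```sql
-- MoM % change: ((this month - last month) / last month) * 100, rounded to 2 dp
SELECT TO_CHAR(created_at, 'YYYY-MM') AS year_month,
       ROUND(
           (SUM(value) - LAG(SUM(value)) OVER (ORDER BY TO_CHAR(created_at, 'YYYY-MM')))
           * 100.0
           / LAG(SUM(value)) OVER (ORDER BY TO_CHAR(created_at, 'YYYY-MM')),
           2
       ) AS revenue_diff_pct
FROM sf_transactions
GROUP BY TO_CHAR(created_at, 'YYYY-MM')
ORDER BY year_month;
```

The first month's LAG is NULL, so its revenue_diff_pct is NULL without any special handling.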

Output

year_month | revenue_diff_pct
2019-01 |
2019-02 | -28.56
2019-03 | 23.35
2019-04 | -13.84
2019-05 | 13.49
2019-06 | -2.78
2019-07 | -6
2019-08 | 28.36
2019-09 | -4.97
2019-10 | -12.68
2019-11 | 1.71
2019-12 | -2.11

Amazon SQL Interview Questions #7: Exclusive Amazon Products

Hard | ID 9608

Find products which are exclusive to only Amazon and therefore not sold at Top Shop and Macy's. Your output should include the product name, brand name, price, and rating.

Two products are considered equal if they have the same product name and same maximum retail price (mrp column).

Go to the Question

Data View

Table: innerwear_macys_com
Table: innerwear_topshop_com
Table: innerwear_amazon_com

The important modeling detail is the equality rule: "same product" is defined by the pair (product_name, mrp), not by URL, not by price, not by brand. That means you're doing a set-difference problem on a composite key.

  • Grain (what one output row means): one row per Amazon product that does not have an equivalent entry (by product_name + mrp) in TopShop or Macy’s.

Interview Framing (How It’s Asked + Typical Follow-Ups)

In interviews, this is usually framed as “find exclusives” or “dedupe across sources,” and the follow-ups are where people lose points: “What does ‘same product’ mean?”, “Do we compare by name only or name + MRP?”, and “If Macy’s lists the same product twice, does it matter?” 

Those questions are a hint that the core skill is choosing the right join key/comparison key before you write SQL.

A common follow-up is portability: "Would you solve this with NOT IN, NOT EXISTS, or an anti-join?", because correctness depends on null behavior and the key definition.

Trade-Offs / Decision Rules (Why This Approach vs Alternatives)

The provided solution uses a tuple NOT IN against a union of competitor keys. That’s concise and reads like the business requirement: “keep Amazon rows whose (name, mrp) pair is not present elsewhere.”

In practice, many people prefer NOT EXISTS (or a left anti-join) because NOT IN can behave unexpectedly if the subquery contains nulls. If product_name or mrp can be null in the competitor catalogs, a single null can make the NOT IN filter out everything, depending on dialect and query shape. 

So the decision rule is: NOT IN is fine if the key columns are guaranteed non-null; otherwise, NOT EXISTS is the safer interview choice.
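For reference, a sketch of the NOT EXISTS variant, which is null-safe on the key columns:

```sql
-- Amazon rows with no (product_name, mrp) match in either competitor catalog
SELECT a.product_name, a.brand_name, a.price, a.rating
FROM innerwear_amazon_com a
WHERE NOT EXISTS (
    SELECT 1 FROM innerwear_topshop_com t
    WHERE t.product_name = a.product_name AND t.mrp = a.mrp
)
AND NOT EXISTS (
    SELECT 1 FROM innerwear_macys_com m
    WHERE m.product_name = a.product_name AND m.mrp = a.mrp
);
```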

Edge Cases (Assumptions + What Breaks the Solution)

The biggest edge case is nulls in the comparison key. If any competitor row has product_name or mrp null, the tuple NOT IN can become unreliable. 

Another edge case is normalization: if two retailers format names differently (“Calvin Klein Women’s…” vs “Calvin Klein Womens…”), they won’t match even if they are conceptually the same product. 

This question explicitly avoids fuzzy matching by defining equality as exact product_name + mrp. Finally, duplicates don’t change the logic here because you’re comparing set membership, not counts, but you still want DISTINCT on the competitor key list to keep the subquery smaller.

Validation Checks (Fast Sanity Tests)

A quick correctness check is to take a couple of returned Amazon products and search for the same (product_name, mrp) pair in the TopShop and Macy’s tables; there should be zero matches. 

Then pick a known overlapping product from Macy’s or TopShop and confirm it does not appear in the Amazon-exclusive output. If the result set is unexpectedly empty, the first thing to test is whether competitor keys contain nulls: rewrite using NOT EXISTS and compare counts to diagnose null-related behavior.

Solution

Let’s code our logic.

1) Build the list of product keys sold at Macy’s or TopShop

We first extract only the comparison keys: product_name and mrp. UNION ALL combines both retailer lists into one dataset (duplicates don’t really hurt, but we can DISTINCT later). This gives us “everything that exists outside Amazon,” according to the equality definition.

PostgreSQL
Tables: innerwear_macys_com, innerwear_topshop_com, innerwear_amazon_com
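A sketch of this union step:

```sql
-- All (product_name, mrp) keys that exist outside Amazon
SELECT product_name, mrp FROM innerwear_macys_com
UNION ALL
SELECT product_name, mrp FROM innerwear_topshop_com;
```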

2) Deduplicate that competitor key list (so matching is clean)

Now we wrap the union and select DISTINCT to get a unique set of (product_name, mrp) pairs. This avoids repeated keys showing up multiple times if the same product appears in multiple rows. The result is the exact list we want to exclude from Amazon.

PostgreSQL
Tables: innerwear_macys_com, innerwear_topshop_com, innerwear_amazon_com
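Deduplicating the combined key list:

```sql
-- Unique competitor keys to exclude from Amazon
SELECT DISTINCT product_name, mrp
FROM (
    SELECT product_name, mrp FROM innerwear_macys_com
    UNION ALL
    SELECT product_name, mrp FROM innerwear_topshop_com
) AS competitor_products;
```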

3) Final query: return Amazon-only products (complete solution)

Finally, we select products from Amazon and exclude anything that appears in the competitor key set. The (product_name, mrp) NOT IN (...) check matches the question's equality rule exactly. Then we return the required Amazon columns: product_name, brand_name, price, and rating.

PostgreSQL
Go to the question on the platform
Tables: innerwear_macys_com, innerwear_topshop_com, innerwear_amazon_com
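The complete query with the tuple NOT IN filter (safe here only under the assumption that product_name and mrp are non-null in the competitor tables, per the trade-offs above):

```sql
SELECT product_name, brand_name, price, rating
FROM innerwear_amazon_com
WHERE (product_name, mrp) NOT IN (
    SELECT DISTINCT product_name, mrp
    FROM (
        SELECT product_name, mrp FROM innerwear_macys_com
        UNION ALL
        SELECT product_name, mrp FROM innerwear_topshop_com
    ) AS competitor_products
);
```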

Output

| product_name | brand_name | price | rating |
| --- | --- | --- | --- |
| Calvin Klein Women's Bottoms Up Hipster Panty | Calvin-Klein | $11.00 | 4.5 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $60.00 | 4.4 |
| Calvin Klein Women's Carousel 3 Pack Thong | Calvin-Klein | $19.99 | 4 |
| b.tempt'd by Wacoal Women's Lace Kiss Bralette | b-temptd | $11.65 | 4 |
| Wacoal Women's Front Close T-Back Bra | Wacoal | $46.00 | 4.2 |
| Calvin Klein Women's Modern Cotton Bralette and Bikini Set | Calvin-Klein | $44.00 | 4.6 |
| Calvin Klein Women's 3 Pack Invisibles Hipster Panty | Calvin-Klein | $29.75 | 3.9 |
| Wacoal Womens Basic Beauty Contour T-Shirt Bra | Wacoal | $11.95 | 4.2 |
| Wacoal Women's Underwire Sport Bra | Wacoal | $65.00 | 4.3 |
| Hanky Panky Women's Vikini Panty | Hanky-Panky | $30.00 | 4 |
| Wacoal Women's Halo Underwire Bra | Wacoal | $48.00 | 4.4 |
| Wacoal Women's Basic Beauty Contour Bra | Wacoal | $55.00 | 4 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $60.00 | 4.5 |
| Wacoal Women's How Perfect Soft Cup Bra | Wacoal | $51.52 | 4.2 |
| Wacoal Women's Body By Wacoal Underwire Bra | Wacoal | $43.00 | 4.4 |
| Wacoal Women's How Perfect Soft Cup Bra | Wacoal | $60.00 | 4.1 |
| Wacoal Women's Retro Chic Contour Bra | Wacoal | $65.00 | 4.2 |
| Wacoal Women's Halo Strapless Bra | Wacoal | $46.00 | 4.2 |
| Calvin Klein Women's Modern Cotton Boyshort Panty | Calvin-Klein | $19.49 | 4.4 |
| Wacoal Women's Underwire Sport Bra | Wacoal | $52.67 | 4.3 |
| b.tempt'd by Wacoal Womens Ciao Bella Tanga Panty | b-temptd | $19.00 | 4.1 |
| Wacoal Women's Embrace Lace Bra | Wacoal | $44.50 | 4.3 |
| Calvin Klein Women's Naked Glamour Strapless Push Up Bra | Calvin-Klein | $28.27 | 3.9 |
| Calvin Klein Women's Perfectly Fit Lightly Lined Memory Touch T-Shirt Bra | Calvin-Klein | $41.40 | 4.3 |
| Wacoal Women's Amazing Assets Contour Bra | Wacoal | $65.00 | 3.8 |
| Wacoal Women's Sport Zip Front Contour Bra | Wacoal | $69.84 | 3.8 |
| Wacoal Women's Awareness Underwire Bra | Wacoal | $65.00 | 4.4 |
| Wacoal Women's Bodysuede Underwire Bra | Wacoal | $60.00 | 4.4 |
| Wacoal Women's Underwire Sport Bra | Wacoal | $45.50 | 4.3 |
| Wacoal Women's How Perfect Soft Cup Bra | Wacoal | $60.00 | 4.1 |
| Wacoal Embrace Lace Bikini Panty | Wacoal | $27.00 | 4.3 |
| Wacoal Women's Halo Underwire Bra | Wacoal | $34.95 | 4.3 |
| Wacoal Women's Lace Affair Bikini Panty | Wacoal | $22.00 | 4.8 |
| Wacoal Women's Awareness Underwire Bra | Wacoal | $65.00 | 4.4 |
| Wacoal Women's Basic Beauty Contour Bra | Wacoal | $55.00 | 4 |
| Calvin Klein Women's Naked Glamour Strapless Push Up Bra | Calvin-Klein | $29.99 | 3.9 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $48.00 | 4.4 |
| Calvin Klein Women's Perfectly Fit Lightly Lined Memory Touch T-Shirt Bra | Calvin-Klein | $46.00 | 4.3 |
| Calvin Klein Women's 3 Pack Carousel Thong Panty | Calvin-Klein | $20.99 | 4.7 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $48.00 | 4.4 |
| Calvin Klein Women's Sheer Marquisette Demi Unlined Bra | Calvin-Klein | $36.00 | 4.6 |
| Wacoal Women's Basic Beauty Front Close Contour Bra | Wacoal | $55.00 | 4.2 |
| Wacoal Women's Retro Chic Contour Bra | Wacoal | $52.97 | 4.2 |
| Calvin Klein Women's Perfectly Fit Lightly Lined Memory Touch T-Shirt Bra | Calvin-Klein | $40.48 | 4.3 |
| Wacoal Women's Body By Wacoal Underwire Bra | Wacoal | $39.46 | 4.4 |
| Calvin Klein Women's Standard Radiant Cotton Bikini Panty | Calvin-Klein | $9.88 | 4.4 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $60.00 | 4.4 |
| b.tempt'd by Wacoal Women's Ciao Bella Balconette Bra | b-temptd | $38.00 | 4.3 |
| Wacoal Women's Bodysuede Underwire Bra | Wacoal | $60.00 | 4.4 |
| Wacoal Women's Halo Underwire Bra | Wacoal | $48.00 | 4.4 |
| Wacoal Women's Embrace Lace Bra | Wacoal | $35.00 | 4.3 |
| Calvin Klein Women's Naked Glamour Strapless Push Up Bra | Calvin-Klein | $29.99 | 3.8 |
| Wacoal Womens Basic Beauty Contour T-Shirt Bra | Wacoal | $54.77 | 4.2 |
| Wacoal Women's Embrace Lace Bikini Panty | Wacoal | $27.00 | 4.3 |
| Calvin Klein Women's Sheer Marquisette Demi Unlined Bra | Calvin-Klein | $26.19 | 4.5 |
| Wacoal Womens Basic Beauty Contour T-Shirt Bra | Wacoal | $55.00 | 4.2 |
| Calvin Klein Women's Everyday Lightly Lined Demi Bra | Calvin-Klein | $38.00 | 3.8 |
| Calvin Klein Women's Bottoms Up Bikini Panty | Calvin-Klein | $8.00 | 4.3 |
| Calvin Klein Women's ID Wide Waistband Unlined Triangle Cotton Bralette | Calvin-Klein | $19.01 | 3.8 |
| Wacoal Women's Body By Wacoal Underwire Bra | Wacoal | $46.00 | 4.4 |
| Calvin Klein Women's Sheer Marquisette Demi Unlined Bra | Calvin-Klein | $27.00 | 4.6 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $69.99 | 4.4 |
| b.tempt'd by Wacoal Women's Ciao Bella Balconette Bra | b-temptd | $38.00 | 4.3 |
| Wacoal Women's Red Carpet Strapless Bra | Wacoal | $65.00 | 4.4 |
| Calvin Klein Women's Seductive Comfort Lift Strapless Multiway Bra | Calvin-Klein | $39.60 | 4.1 |
| Wacoal Women's Slimline Seamless Minimizer Bra | Wacoal | $65.00 | 4.3 |
| Wacoal Women's Awareness Underwire Bra | Wacoal | $55.25 | 4.4 |
| Wacoal Women's Embrace Lace Bra | Wacoal | $50.00 | 4.3 |
| Hanky Panky Women's Bare Godiva Thong Panty | Hanky-Panky | $25.00 | 4.2 |
| Calvin Klein Women's Seductive Comfort Lift Strapless Multiway Bra | Calvin-Klein | $39.60 | 3.9 |
| Wacoal Women's Halo Underwire Bra | Wacoal | $48.00 | 4.4 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $48.00 | 4.4 |
| Calvin Klein Women's 4 Pack Stretch Lace Bikini Panty | Calvin-Klein | $28.01 | 4.1 |
| Calvin Klein Women's Perfectly Fit Lightly Lined Memory Touch T-Shirt Bra | Calvin-Klein | $34.50 | 4.3 |
| Wacoal Women's Red Carpet Strapless Bra | Wacoal | $65.00 | 4.4 |
| Calvin Klein Women's ID Tanga Wide Waistband Cotton Panty | Calvin-Klein | $16.97 | 4.3 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $60.00 | 4.4 |
| Calvin Klein Women's Ombre 5 Pack Thong | Calvin-Klein | $59.99 | 5 |
| Wacoal Women's How Perfect Soft Cup Bra | Wacoal | $60.00 | 4.2 |
| Wacoal Womens Basic Beauty Contour T-Shirt Bra | Wacoal | $55.00 | 4.2 |
| Wacoal Women's Underwire Sport Bra | Wacoal | $42.90 | 4.3 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $60.00 | 4.4 |
| Wacoal Women's Retro Chic Underwire Bra | Wacoal | $54.60 | 4.4 |
| Wacoal Womens Basic Beauty Contour T-Shirt Bra | Wacoal | $55.00 | 4.2 |
| Wacoal Womens Basic Beauty Contour T-Shirt Bra | Wacoal | $55.00 | 4.2 |
| Calvin Klein Women's Perfectly Fit Lightly Lined Memory Touch T-Shirt Bra | Calvin-Klein | $41.40 | 4.3 |
| Calvin Klein Women's Modern Cotton Bikini | Calvin-Klein | $11.36 | 4.7 |
| Wacoal Women's Sport Contour Bra | Wacoal | $47.40 | 4.2 |
| b.tempt'd by Wacoal Women's Ciao Bella Balconette Bra | b-temptd | $21.89 | 4.3 |
| Wacoal Women's Sport Contour Bra | Wacoal | $65.96 | 4.2 |
| Wacoal Women's Bodysuede Underwire Bra | Wacoal | $60.00 | 4.3 |
| Wacoal Women's Halo Strapless Bra | Wacoal | $46.00 | 4.2 |

Amazon SQL Interview Questions #8: First Day Retention Rate

Last Updated: February 2022

Difficulty: Hard (ID 2090)

Calculate the first-day retention rate of a group of video game players. First-day retention occurs when a player logs in 1 day after their first-ever log-in. Return the proportion of players who meet this definition, i.e., the number of retained players divided by the total number of players.


Data View

Table: players_logins

This dataset looks simple (just player_id and login_date), but the interview goal is whether you translate “first-ever login” into the correct baseline per player, and then avoid double-counting players who logged in multiple times.

Grain (what one output row means): one row total, a single retention_rate for the whole population.

What the table implies: multiple rows per player_id (a player can log in many times), so any join back to the raw logins can multiply rows and inflate counts if you don’t use COUNT DISTINCT.

Interview Framing (How It’s Asked + Typical Follow-Ups)

Interviewers usually phrase this as “day-1 retention” or “D1 retention,” then follow up with clarifications like “Is it exactly one day after, or within 24 hours?” and “Do multiple logins count multiple times?” 

In SQL interviews, the expected interpretation is usually a calendar-day difference (since the column is a date here), and retention is a per-player event (a player is either retained or not).

A common follow-up is: “What if someone logs in multiple times on day 1?”, which is basically a test of whether you count players, not sessions.

Trade-Offs / Decision Rules (Why This Approach vs Alternatives)

This CTE approach is clean because it separates the baseline computation (first login date) from the retention check (did they show up on first_day + 1). It’s also easy to audit step-by-step.

Alternatives that are also valid:

  • A correlated EXISTS (“does a day+1 login exist for this player?”) can be simpler to read, and it avoids join-multiplication issues naturally.
  • Pre-aggregating to distinct (player_id, login_date) first can protect you if the raw data has duplicates.

If you’re unsure about duplicates, default to COUNT(DISTINCT player_id) in numerator/denominator, or use EXISTS.
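A sketch of the EXISTS alternative (assuming the players_logins schema from the question): because the derived table has exactly one row per player, averaging a 1/0 retention flag gives the rate with no deduplication needed.

```sql
-- One row per player in f, so AVG of a 1/0 flag is the
-- retention rate -- no COUNT(DISTINCT) or join needed.
SELECT AVG(
         CASE WHEN EXISTS (
             SELECT 1
             FROM players_logins AS l
             WHERE l.player_id = f.player_id
               AND l.login_date = f.first_day + 1  -- date + integer = date in PostgreSQL
         ) THEN 1.0 ELSE 0 END
       ) AS retention_rate
FROM (
    SELECT player_id, MIN(login_date) AS first_day
    FROM players_logins
    GROUP BY player_id
) AS f;
```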

Evaluation Criteria / Rubric (How Interviewers Score It)

A strong answer usually hits these points: it correctly defines each player’s first-ever login date, checks for a login exactly one day after that baseline, counts retained players uniquely (not per row), keeps the denominator as all unique players, and returns a non-integer proportion (float/decimal), not truncated integer division.

Common Mistakes (What Candidates Do Wrong + How To Avoid)

A frequent mistake is counting rows instead of players after joining back to logins, which overstates retention for players with multiple logins. 

Another common slip is using BETWEEN 0 AND 1 (which accidentally includes same-day logins) even though the prompt explicitly says “1 day after.” People also sometimes compute “day-1 after each login” instead of after the first-ever login, which answers a different question.

Edge Cases (Assumptions + What Breaks the Solution)

If a player's first login falls near the end of the dataset window, they may have had no opportunity to log in the next day. This affects interpretation, but since the question doesn't mention censoring, we include such players in the denominator.

If timestamps (not dates) were used, you’d need to define whether “1 day” means “next calendar day” or “within 24 hours.” Also, if the table can contain duplicate (player_id, login_date) rows, you’ll want a distinct day-level table (or keep COUNT(DISTINCT ...)) to prevent inflation.

Solution

Let’s code our logic.

1) Find each player’s first-ever login date

We start by grouping by player_id and taking the minimum login_date. This gives us a single first_day anchor date per player. Once we have that, we can compare all other logins against it.

PostgreSQL
Tables: players_logins
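In PostgreSQL, this step might look like:

```sql
-- One row per player: their first-ever login date.
SELECT player_id,
       MIN(login_date) AS first_day
FROM players_logins
GROUP BY player_id;
```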

2) Identify players who logged in exactly 1 day after their first login

Now we join the first login dates back to the raw login table. This lets us compute login_date - first_day for each login event per player. We then filter to keep only events where the difference equals 1 day, which matches the retention definition.

PostgreSQL
Tables: players_logins
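One way to write this step, reusing the first-login CTE:

```sql
WITH first_logins AS (
    SELECT player_id, MIN(login_date) AS first_day
    FROM players_logins
    GROUP BY player_id
)
-- Players with at least one login exactly 1 day after first_day.
-- DISTINCT guards against counting a player twice if they had
-- multiple logins on that day.
SELECT DISTINCT l.player_id
FROM players_logins AS l
JOIN first_logins AS f
  ON l.player_id = f.player_id
WHERE l.login_date - f.first_day = 1;  -- date - date = integer days in PostgreSQL
```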

3) Final query: compute retention rate as retained / total (complete solution)

Finally, we count how many distinct players satisfy the first-day retention condition and divide by the total number of distinct players. We cast the numerator to FLOAT so we get a decimal result instead of integer division. The final output is a single value: the first-day retention rate.

PostgreSQL
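A sketch of the complete solution:

```sql
WITH first_logins AS (
    SELECT player_id, MIN(login_date) AS first_day
    FROM players_logins
    GROUP BY player_id
),
retained AS (
    SELECT DISTINCT l.player_id
    FROM players_logins AS l
    JOIN first_logins AS f
      ON l.player_id = f.player_id
    WHERE l.login_date - f.first_day = 1
)
-- Cast the numerator to FLOAT so the division doesn't
-- truncate to an integer (e.g., 1/2 = 0).
SELECT CAST((SELECT COUNT(*) FROM retained) AS FLOAT)
       / (SELECT COUNT(DISTINCT player_id) FROM players_logins)
       AS retention_rate;
```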

Output

retention_rate
0.5

How to Solve Amazon SQL Questions (Interview Framework)


Common Amazon SQL Mistakes (and How to Avoid Them)

Amazon-style SQL interviews are designed to test whether you can work with data the way it actually exists in production: messy, full of edge cases, and requiring careful thought. Here are the most common mistakes candidates make:


1) Losing track of table grain (then patching it with DISTINCT)

This happens when you join a one-row-per-user table to a many-rows-per-user fact table without thinking through the implications. Suddenly, one customer's revenue gets multiplied by however many transactions they have, and your metrics are wildly inflated.

The solution: Always state the grain out loud before writing your joins: "this table has one row per purchase, this one has one row per user." Aggregate to the grain you need before joining, or be very deliberate about when and how you group after the join.

2) Forgetting about returns, refunds, and cancellations

Negative values in your data can throw off revenue calculations significantly. If the prompt asks you to exclude returns but you forget to filter them out, your numbers will be off or even negative.

Handle this upfront: decide whether you're including or excluding these transactions, and filter explicitly (for example, WHERE purchase_amount >= 0). Amazon prompts typically call this out directly, so read the requirements carefully.

3) Treating dates as strings

This is a subtle but common error. When you group or sort dates that have been converted to strings too early, you end up with alphabetical sorting instead of chronological, 2024-02 appearing after 2024-10, for instance.

Always use proper date/timestamp functions for grouping and sorting, and only format dates for display purposes at the very end of your query. This is especially important when output requirements specify formats such as YYYY-MM.
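For instance, a monthly rollup over a hypothetical orders table (column names borrowed from the schema section earlier) might group and sort on the truncated date and format only in the output:

```sql
-- Group/sort on the real date value; format only for display.
SELECT TO_CHAR(DATE_TRUNC('month', order_date), 'YYYY-MM') AS month,
       SUM(total_order_cost) AS revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY DATE_TRUNC('month', order_date);  -- chronological, never alphabetical
```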

4) Rolling metrics without accounting for partial windows

When calculating rolling averages or other window metrics, your first few time periods often don't have complete data windows. If you're not careful, these partial windows get treated as if they're complete, skewing your results.

Follow the prompt's definition precisely. If your data starts midstream, acknowledge that early windows are incomplete and either keep them labeled as partial or filter them out, depending on what the question actually requires.

5) Window functions that appear correct but have logical errors

Forgetting to include PARTITION BY, or using a default window frame that doesn't match the question, is extremely common. You might think you're calculating a metric per user when you're actually calculating it across all users.

Always specify what you're partitioning by (user, product, time period, etc.), order by the appropriate column, and verify your logic by tracing through one example manually. Amazon interview guidance specifically highlights missing partitions as a frequent pitfall.

6) LEFT JOINs accidentally becoming INNER JOINs

This happens when you write a LEFT JOIN to preserve all rows from your left table, then add a WHERE clause that filters on the right table. That filter drops all the non-matching rows, and converts your LEFT JOIN into an INNER JOIN.

If you want to filter the right table while keeping unmatched left rows, put those conditions in the ON clause instead of WHERE. This is one of the classic interview mistakes.
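A before/after sketch, using the customers and orders archetypes from the schema section (the date filter is a hypothetical example):

```sql
-- BROKEN: the WHERE clause on the right table silently turns the
-- LEFT JOIN into an INNER JOIN -- customers with no 2024 orders
-- vanish instead of showing a count of 0.
SELECT c.id, COUNT(o.order_id) AS orders_2024
FROM customers AS c
LEFT JOIN orders AS o ON o.user_id = c.id
WHERE o.order_date >= '2024-01-01'
GROUP BY c.id;

-- FIXED: the right-table filter lives in ON, so unmatched
-- customers survive with NULL order rows and COUNT gives 0.
SELECT c.id, COUNT(o.order_id) AS orders_2024
FROM customers AS c
LEFT JOIN orders AS o
  ON o.user_id = c.id
 AND o.order_date >= '2024-01-01'
GROUP BY c.id;
```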

7) NULL handling errors

WHERE column = NULL doesn't work, it returns nothing. NULLs also behave unexpectedly in counts, comparisons, and aggregations, often leading to incorrect results.

Use IS NULL and IS NOT NULL for comparisons, and think through what NULL actually represents in your context: unknown, not applicable, or zero. Amazon interview materials specifically call out NULL behavior as a frequent trap.
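A quick illustration over the hypothetical orders table from earlier:

```sql
-- Wrong: = NULL is never TRUE, so this matches no rows.
SELECT * FROM orders WHERE total_order_cost = NULL;

-- Right: use IS NULL / IS NOT NULL.
SELECT * FROM orders WHERE total_order_cost IS NULL;

-- Also note: COUNT(column) skips NULLs, while COUNT(*) counts all rows.
SELECT COUNT(*)               AS all_rows,
       COUNT(total_order_cost) AS rows_with_cost
FROM orders;
```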

8) Confusion between WHERE and HAVING in grouped queries

Using WHERE to filter aggregate results generates an error, while filtering row-level data in HAVING can change your metrics in unexpected ways.

The rule is straightforward: row-level filters belong in WHERE, and aggregate filters belong in HAVING. This is one of those "easy to miss" mistakes that Amazon interview guides consistently highlight.
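Both clauses in one query, again using the hypothetical orders columns from the schema section (the date and threshold are illustrative):

```sql
-- WHERE filters individual rows BEFORE grouping;
-- HAVING filters the aggregated groups AFTER.
SELECT user_id,
       SUM(total_order_cost) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'      -- row-level filter
GROUP BY user_id
HAVING SUM(total_order_cost) > 100;   -- aggregate filter
```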

2-Week Amazon SQL Prep Plan

This plan assumes you’re practicing with Amazon-style, scenario-driven SQL questions where the real test is translating a question prompt into a clean definition (grain + constraints), then picking a pattern that survives duplicates, ties, and missing rows.

How to use this plan day-to-day

Aim for 60-90 minutes a day (or more, if you are able).

For each question, force yourself to write down (mentally or on paper) the grain and the “trap” before you touch SQL. After you solve it, do a 2-minute validation pass: row counts, spot checks, and “did I accidentally change the grain with joins?”

Week 1: Fundamentals + Amazon patterns (Easy to Medium)


Day 1: Grain discipline + basic aggregation
Do “Total Cost of Orders” (ID 10183). Focus on grouping at the correct grain (customer) and validating totals.

Day 2: Filtering cohorts by date + group and sort
Do “Workers by Department Since April” (ID 9847). Treat it as “cohort + breakdown,” and be strict about inclusive boundaries (>=).

Day 3: Max-with-constraints + ties
Do “Workers With The Highest Salaries” (ID 10353). The key is computing the max over the qualified population (titled workers) and returning all ties.

Day 4: Retention-style definition (second purchase within 1-7 days)
Do “Finding User Purchases” (ID 10322). Lock the definition of “second purchase” (second distinct day, not second row), then implement with a clear stepwise approach.

Day 5: Ranking done right (ties + top N ranks)
Do “Top-Rated Support Employees” (ID 10554). Practice choosing DENSE_RANK vs RANK vs ROW_NUMBER and explaining why.

Day 6: Review day (speed + explanation)
Re-do Day 1-5 questions under a timer. Your goal is not new SQL, it’s cleaner reasoning: state grain, state trap, then code. If you rely on DISTINCT to “fix” duplicates, stop and re-check joins/grain.

Day 7: Mock interview mini-loop
Pick any 2 questions from the week and “talk through” your approach out loud. If you can’t explain why you chose a pattern (CTE vs EXISTS vs window), you’re not interview-ready yet.

Week 2: Advanced patterns (Medium to Hard)


Day 8: Time-series KPI shaping (monthly rollup + MoM change)
Do “Monthly Percentage Difference” (ID 10319). The must-have habit: aggregate to the month first, then LAG the monthly totals.

Day 9: Set difference across sources (anti-join / NOT EXISTS / NOT IN)
Do “Exclusive Amazon Products” (ID 9608). Practice arguing the trade-off: NOT IN is concise but can be risky with NULLs; NOT EXISTS is safer.

Day 10: Retention rate as a ratio (dedupe + denominator correctness)
Do “First Day Retention Rate” (ID 2090). Your focus is counting players, not login rows, and preventing join multiplication.

Day 11: Mixed practice set (3 questions, back-to-back)
Do one easy + one medium + one hard from the list above. This simulates interview context switching. Keep your walkthrough structure consistent: state assumptions, pin the grain, name the trap, solve, then validate.

Day 12: Failure-mode day (break your own query)
Pick 2 hard-ish questions and actively try to break your solution: add a tie, add duplicates, add NULLs, remove a month, add multiple rows per key. If your query changes meaning, rewrite it in a more robust pattern.

Day 13: Full mock (45-60 minutes)
Do 2 questions under time pressure: one ranking/window question and one set logic question. Afterward, rewrite each solution to be more readable (fewer moving parts, clearer grain control).

Day 14: Final consolidation checklist
You’re ready if you can do all of this reliably: define grain first, pre-aggregate before ranking, preserve ties intentionally, choose the right anti-join pattern, and validate outputs with quick sanity checks (counts, totals, spot checks).

Practical rule of thumb for progression

Start with the Easy questions to lock in grain + aggregation discipline, move to Medium for window functions and definition traps, then finish with Hard for set logic and time-based analysis. If you’re getting stuck, it’s usually not “SQL syntax”; it's that the data definition isn’t pinned down yet.

More Big Tech SQL Interview Guides (Meta, DoorDash, Uber, etc.)

If you’re prepping for Amazon, it helps to practice on other Big Tech question styles too. The datasets change, but the patterns repeat: user activity logs, orders/transactions, time windows, and window functions.

FAQs:

How hard is the Amazon SQL interview?

Amazon’s SQL questions can range from easy to medium and hard, and they tend to be practical coding problems rather than product/theory trivia. 

For example, many multi-table questions sit at the medium level, while rolling-average problems are typically rated hard because they require a clear understanding of data manipulation and time-based calculations.

So, difficulty mostly depends on your comfort with multi-step business queries and careful requirements.

What SQL topics does Amazon ask most?

Nearly 70% of Amazon SQL questions involve data in multiple tables and require JOINs, CTEs, or subqueries. Aggregation (COUNT()/SUM() with GROUP BY) and the more complicated window functions are the other highly prevalent concepts.

For data engineering roles, EDA/validation, aggregation/metrics, joins, and text/datetime manipulation are also hot topics.

Do I need window functions for Amazon?

Yes! Plan to know window functions.

Amazon questions often use them, and interviewers generally expect candidates to reach for window functions when possible. A typical walkthrough, for instance, is built around DENSE_RANK() with PARTITION BY to rank sellers within each product category and filter to the top three per group.

What’s the best way to practice Amazon SQL?

Do targeted, repetitive practice on real interview-style questions.

It is crucial to practice solving SQL interview questions before an Amazon interview. You can build your own solution based on the framework explained in this article, try other approaches, and then deliberately cover edge cases. 

How long should I spend preparing?

If you already know SQL fundamentals, plan from 2 to 4 weeks of focused practice; if you’re still learning joins/CTEs/subqueries, budget closer to 2 to 3 months to reach an interview-ready intermediate level. 

That longer window matters for Amazon because its SQL questions often involve multi-table merging (JOIN/CTE/subquery) and frequently use window functions.
