Amazon Business Analyst Interview Questions

Amazon Business Analyst Interview Questions


Sharpen your skills of writing SQL queries to maximize your chances of getting a Business Analyst job at Amazon

Amazon moves more than a million packages every day. Even small improvements in the efficiency of logistical and business operations can save millions of dollars down the line. Business Analysts can analyze hard data, gain insights and communicate their findings to decision makers at the company.

In this article, we will discuss how business analysts fit into Amazon’s organizational structure. Knowing their future responsibilities can help aspiring business analysts get the job and advance their career at Amazon.

Responsibilities of Business Analyst at Amazon

Business Analysts at Amazon work with data to gain insights about business operations, logistics, customer behavior and habits, and use them to increase profits. There are two ways to do this: cut unnecessary costs and increase revenues.

Amazon, like other tech businesses today, collects large volumes of user data. Business Analysts increase revenues by finding ways to increase user spending. On the other hand, looking at data is one of the safest ways to find and eliminate inefficiency within an organization.

Interviews for this role reflect Business Analyst’s responsibilities at Amazon. Candidates have to demonstrate strong knowledge of business and the ability to work with data. Writing SQL queries is one of Business Analysts’ day to day tasks.

Prospective Business Analysts have to answer questions ranging from business cases to actual SQL challenges. SQL questions usually deal with business processes, eliminating inefficiencies and increasing revenues.

Analyzing data is only a part of a Business Analyst’s job. One of their main responsibilities is communicating insights with decision makers at the company.

What are interviewers at Amazon looking for?

Business Analysts do a lot of work in SQL, so interviewers expect a strong proficiency in this language. Unlike other data science specialties, business analysts typically don’t need more than a basic knowledge of Python. It helps to have a good understanding of Excel as well.

Having the ability to focus on the task and detect patterns in data can also help you succeed on the job. Such skills are hard to measure, so interviewers usually observe candidates in the process of answering the question.

Aspiring Business Analysts also need to show a deep knowledge of business. The ability to analyze and aggregate data is important, but not enough if you can’t gain insights and make the right conclusions in a business context. To get the sense of non-technical questions during these interviews, read a general article about Business Analyst Interview Questions.

The culture at Amazon is focused on improving customer experience and making their website the most convenient option for shopping online. Having the same focus on improving customer experience can help you land a job at Amazon.

Framework for Solving Amazon Business Analyst Interview Questions

1. Understand the question

The best way to approach an SQL challenge is to carefully read the description. Especially if the question has a lot of concepts and conditions. New questions can be confusing at first. Read them multiple times to stay on the right track.

Some questions are clear and specific. Others are broad and open to interpretation. When questions are unclear in any way, it’s best to directly ask the interviewer to specify.

Questions often introduce concepts like ‘unique user’ or operations like ‘calculate monthly difference in sales’. Some questions have specific definitions for these concepts and tasks, while others don’t.

Often a description of the task also specifies how to arrange data before returning it, how to format the output, or similar matters.

2. Analyze data

Looking at the available data is the second most important step towards solving an SQL question. If you’re lucky, the available data will be contained within one table, but that’s not always the case.

Some questions have multiple tables and each can have more than a dozen columns. It’s important to develop a methodical approach to easily understand the available data.

The first step is to look at values mentioned in the question description and identify which tables and columns contain them.

Looking at data types of each column can be helpful. It can help you decide whether you’ll need to cast or format values. This is often the case with date values, which need to be formatted or converted into text values.

Also, uniqueness and limitations of each value in the data are important. So looking at the actual data is worth it.

3. Plan your approach

Once you understand the question and available data, it’s time to lay out logical steps for finding a solution.

Writing down your plan can help you focus on solving the challenge. You can start with the most direct approach, and eventually you might come up with improvements to the original idea. Written down logical steps are easy to follow as you’re writing the query.

Making a habit of planning your approach can be very beneficial. It’s an opportunity to show off your thinking during an interview. Coming up with multiple ways to approach a problem can be a testament to your SQL skills and overall sharpness as a data scientist.

These steps should describe what operations (casting values, multiplication, division, formatting) you are going to perform to go from initial data to the final answer.

You don’t have to write specific code for each step, but you can write pseudo code, or reference the SQL features you’ll use when writing the actual code.

4. Write code

Transform your logical outline into a valid SQL query. Start with the most basic query and gradually add complexity to it.

For example, if you need to filter values in a certain column, first SELECT all values from the column. Incrementally add conditions for filtering, formatting rules, aggregation and so on.

Check the output of your query as you add these complex features. This way, you’ll know what’s working and if the additional code has the desired effect. You can also detect problems in the code and easily fix them.

Amazon Business Analyst Interview Question #1:
Find the number of customers without an order

Amazon business analyst interview question to find the customers without an order

One of the Business Analysts day to day tasks is to understand customer behavior. In this role, you might have to analyze the behavior of users who did not place an order. Perhaps you need to understand why they did not place an order. Or how Amazon employees can encourage users to order more.

Analyzing user behavior is one of the secrets to Amazon’s success. For this reason, we start the list of potential interview questions with this SQL challenge.

Amazon Business Analyst Interview Question to Find the customers

Link to the question: https://platform.stratascratch.com/coding/10089-find-the-number-of-customers-without-an-order

Understand the question

The question seems fairly simple and clear on what candidates need to do. It is marked as ‘Medium’ difficulty on the StrataScratch platform.

In this Amazon business analyst interview question, we have to work with two tables - one that contains information about orders, and the other that contains information about customers.

It seems logical that in order to find the answer, we will have to cross-check two tables to get to the answer.

Without further ado, let’s get started.

Analyze Data

As previously mentioned, we can extract all the necessary information from two tables. First, let’s take a look at the orders table:

id:int
cust_id:int
order_date:datetime
order_details:varchar
total_order_cost:int

Available data may be easier to understand if we look at the actual table:

Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80

Let’s go over each column:

  • The id column contains a unique integer value to identify each order.
  • The cust_id column contains an integer value to identify the customer placing the order. Since the question asks to find customers without orders, we will have to find users who are absent from this column.
    In real life, one customer can place multiple orders. cust_id values describe the relationship of many orders to one customer, so they are not necessarily unique.
  • The question mentions nothing about chronology of orders, so the values in the order_date column can be safely ignored.
  • Also, the question does not ask us to find what the customer ordered, so the order_details column can be ignored.
  • We do not have to calculate the volume of orders for each user. No need to track values in the total_order_cost column either.

The customers table is another essential piece of the puzzle:

id:int
first_name:varchar
last_name:varchar
city:varchar
address:varchar
phone_number:varchar

Now that we know about columns and types of values contained in them, let’s look at the table itself:

Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201

Let’s go over each column:

  • The id column contains a unique integer value to identify each customer. Each row describes only one customer, so the id values are unique.

Essentially, we have a list of all users, regardless of whether they placed an order. We have to cross-examine this list with the cust_id column in the orders table.

  • The question does not ask us to identify users by their full names. Therefore we can ignore values in first_name, last_name columns.
  • Customer’s location is not a point of interest, so we can ignore values in the city and address columns.
  • We also do not need the user's contact information contained in the phone_number column.

First, we need to identify important columns. In this case, that is going to be the id column in the customers table and cust_id column in the orders table.

Common mistake is to SELECT the id column from the orders table. We need the cust_id column because in this table, the id column identifies orders, not customers.

Looking at the data makes it obvious that every order is tied with a customer. Sometimes there are multiple orders for one customer, but there is no order without a cust_id value, which identifies who has placed the order.

Plan Your Approach

1. Combine data from two tables

In this case, the obvious choice is to use a JOIN to combine data from two tables. We only have to choose the right type of JOIN.

In order to perform a JOIN, we need to define the shared dimension - identity of the customer. In the orders table, customer identifiers are found in the cust_id column. In the customers table, these values are stored in the id column.

The question asks us to find the number of customers who meet a certain criteria. The first table is orders, and the second is customers. We use RIGHT OUTER JOIN to make sure that customer ids from the second table that are absent from the first table are not deleted.

The shared dimension between two tables is the customer's identity. In the customers table, these values are stored in the id column. In the orders table, they are stored in the cust_id column.

2. Filter records to find users without orders

Next, you need to find instances when there are no orders for a specific customer id. In other words, the id of the customer can not be found in the cust_id column of any order.

We perform a JOIN, but there is no guarantee that every value in the id column of the customers table will appear in the cust_id column of the orders table. In cases when there is no match, the cust_id column will be NULL.

At this stage, we have the list of all customers with corresponding order data. We can use the WHERE statement to only keep users without orders.

3. Get the number of rows in the filtered result

Once we have all the records that do not have a cust_id value, we’ve almost arrived at the answer.

The last step is to use the COUNT() aggregate function to get the number of filtered rows. This is the number of customers without orders.

4. Output the result and rename the column

It’s a good practice to use the AS keyword to give the output column a more descriptive label.

Write code

1. Combine data from two tables

SELECT *
FROM orders o
RIGHT OUTER JOIN customers c ON o.cust_id = c.id

2. Filter records to find users without orders

SELECT *
FROM orders o
RIGHT OUTER JOIN customers c ON o.cust_id = c.id
WHERE o.cust_id IS NULL

3. Get the number of rows in the filtered result

SELECT COUNT(*)
FROM orders o
RIGHT OUTER JOIN customers c ON o.cust_id = c.id
WHERE o.cust_id IS NULL

4. Output the result and rename the column.

SELECT COUNT(*) AS n_customers_without_orders
FROM orders o
RIGHT OUTER JOIN customers c ON o.cust_id = c.id
WHERE o.cust_id IS NULL

Experienced SQL developers sometimes change the order of tables to use LEFT JOINs instead of RIGHT OUTER JOINs.

Try to solve this Amazon business analyst interview question using a LEFT JOIN if you’re up for the challenge.

Amazon Business Analyst Interview Question #2: Find departments with less than 5 employees

Amazon business analyst interview question to find departments with less employees

Business Analysts sometimes need to detect structural inefficiencies in the organization. For this reason, Amazon interviewers might ask them to solve this question, where candidates need to find departments with a certain number of workers.

Amazon business analyst interview question to find departments

Link to the question: https://platform.stratascratch.com/coding/9860-find-departments-with-less-than-5-employees

Understand the question

To answer this amazon business analyst interview question, the candidate must find the number of workers for each department, and then filter the list to meet the criteria.

The final output contains only two columns - the department and corresponding number of workers.

Analyze Data

All the information necessary to solve this question is contained in one worker table. Here is the model of the table:

worker_id:int
first_name:varchar
last_name:varchar
salary:int
joining_date:datetime
department:varchar

We have six columns with different types of values. Besides paying attention to types of values, it helps to look at actual data in the table:

Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20 09:00:00HR
2NiharikaVerma800002014-06-11 09:00:00Admin
3VishalSinghal3000002014-02-20 09:00:00HR
4AmitahSingh5000002014-02-20 09:00:00Admin
5VivekBhati5000002014-06-11 09:00:00Admin

Let’s go over values in each column:

  • The worker_id column contains a unique integer value to identify each employee. We will use these values to calculate the number of employees for each department.
  • We should not use values in the first_name and last_name columns to identify each user. There may be employees with the same names, so it’s better to use unique worker_id values to identify them. For this reason, these values can be ignored.
  • The question does not ask us about compensation. Therefore the salary column can be ignored.
  • Candidates don’t have to chronologically track the joining date of each employee. We don’t need to work with values in the joining_date column.
  • Finally, the department column contains information about the working place of each employee. The question asks us to return department information along with the number of employees.

The most important columns in the workers table are worker_id and department. We have to get the number of employees for each department. The safest approach to get the number of employees is to count unique user_id values for each department.

Plan Your Approach

1. SELECT the data to work with

We need to write a statement to SELECT department and worker_id columns from the workers table. This will give us the list of employees and their place of work.

We don’t have the number of employees in a separate column. We will need to find the total number of workers some other way.

2. Aggregate the number of workers for each department

We can get the number of workers by using the count() aggregate function on the worker_id values. We need to use GROUP BY statement to output the total number of workers for each department.

By default, the count() aggregate function gives columns a generic count label. We can use the AS keyword to give the column a descriptive name.

At this stage, we should only have unique department values and the number of employees for each one of them.

3. Filter out departments with more than 5 workers

We can not use the WHERE statement to filter the result of aggregate functions. SQL developers use the HAVING() clause to discard groups (departments) that do not meet the criteria. In this case, the condition is that each department must have more than 5 employees.

Write code

1. SELECT the data to work with

SELECT 
    department,
    worker_id
FROM worker

2. Aggregate the number of workers for each department

SELECT department,
       count(worker_id) AS num_of_workers
FROM worker
GROUP BY department

3. Filter out departments with more than 5 workers

SELECT department,
       count(worker_id) AS num_of_workers
FROM worker
GROUP BY department
HAVING count(worker_id) < 5

Amazon Business Analyst Interview Question #3: Highest Salaried Employees

Amazon business analyst interview question to find highest salaried employees

To solve this ‘Medium’ difficulty question, candidates have to analyze the data in the worker table. It contains information like workers’ salary and their department.

Amazon business analyst interview question to find highest salaried employees

Link to the question: https://platform.stratascratch.com/coding/9865-highest-salaried-employees

Understand the question

The task is fairly simple - we have information about workers, their salary and the department they belong to. We need to find the highest earner for each department. Question description also specifies which values need to be returned.

You have to decide how to solve the edge case when multiple employees working at the same department share the highest salary. Most likely you should return information about all of them, but you might want to clarify this with the interviewer.

Analyze Data

Prospective Business Analysts need to write SQL query to work with data in the workers table. This is the table from question 2. However, in this case, we will work with different values to get the final answer:

worker_id:int
first_name:varchar
last_name:varchar
salary:int
joining_date:datetime
department:varchar

Let’s once again look at concrete example of available data:

Table: worker
worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20 09:00:00HR
2NiharikaVerma800002014-06-11 09:00:00Admin
3VishalSinghal3000002014-02-20 09:00:00HR
4AmitahSingh5000002014-02-20 09:00:00Admin
5VivekBhati5000002014-06-11 09:00:00Admin

Let’s go over columns in the table:

  • For this Amazon business analyst interview question, we don’t need to keep track of values in the worker_id column. We simply have to find the row with the highest salary and output value in its first_name column.
  • Our final output includes the first_name value of individuals we identify as having the highest salary in each department. So we will need to SELECT values from this column.
  • We don’t need to work with the last_name values at all. Therefore this column can be ignored.
  • To solve this question, candidates need to find employees with the highest salary for each department. Therefore we need to define separate groups of workers based on their place of employment, and find the highest salary for each group.
  • The timeline of when each employee joined is not important, therefore the joining_date column can be ignored.
  • We have to find the highest salary in each department. So, we’ll have to look for multiple workers that have the same value in the department column, and output the one with the highest salary.

In order to answer this Amazon business analyst interview question, we’ll need to work with salary and department columns. In the end, the output needs to include the respective first_name value of the worker as well.

Plan Your Approach

1. Find the highest salary among workers at each department

After looking at the available data, it becomes clear that there are many workers assigned to only a few departments. We need to separate workers based on the department where they work, and find the highest earner in each group.

To do this, most likely we are going to use the MAX() aggregate function as well as GROUP BY statement to put them in separate groups.

2. Combine data from main and temporary tables

In order to access values in all columns, we need to create a temporary table that contains the highest earners. Then we must JOIN it with the main table to gain access to all values from both tables.

We can perform an INNER JOIN to weed out the records of workers who are not highest earners. For that, we need to specify the shared dimension using the ON statement. INNER JOINs will do the rest.

3. Output the values in three columns

The main reason why we JOIN the temporary table with the main table is to have access to all columns of the main table.

The question asks us to find people with the highest salaries in each department, and output values in their salary, department and first_name columns.

Write code


1. Find the highest salary among workers at each department

SELECT 
      max(salary) AS TotalSalary,
      department 
FROM worker 
GROUP BY department

2. Combine the data from two tables

SELECT *
FROM
  (SELECT max(salary) AS TotalSalary,
          department
   FROM worker
   GROUP BY department) AS TempNew
INNER JOIN worker t ON TempNew.department = t.department
AND TempNew.TotalSalary = t.salary

3. Output the values in three columns

SELECT t.department,
       t.first_name,
       t.salary
FROM
  (SELECT max(salary) AS TotalSalary,
          department
   FROM worker
   GROUP BY department) AS TempNew
INNER JOIN worker t ON TempNew.department = t.department
AND TempNew.TotalSalary = t.salary

Summary

The Business Analyst role is unique, because it requires technical and business expertise, both being equally important. To get a job, candidates are often given SQL challenges. Their ability to write SQL queries will affect their chances of success.

In this article, we discussed three interview questions for a Business Analyst position at Amazon. You can explore other SQL questions on StrataScratch, a platform where data scientists of any level can practice and improve their SQL skills.

Amazon Business Analyst Interview Questions


Become a data expert. Subscribe to our newsletter.