Data Engineer SQL Interview Questions From Top Employers

Published: September 27, 2023

Categories:

Written by:
Irakli Tchigladze

In this article, we will list important SQL concepts and walk you through 10 data engineer SQL interview questions asked by employers like Google and Amazon.

Companies collect and maintain large volumes of data to better understand their customers and improve various facets of their business. However, raw data can be a bit messy and disorganized. Companies need data engineers to refine initially collected data so that it’s easier to work with.

Data can be used for various tasks - analysis, visualization, or fed into machine learning models, just to name a few. Data engineers prepare data according to the specifics of the task so that data scientists can utilize the data instead of fixing its inconsistencies.

To do this, data engineers need to have a specific set of skills. In this article, we will discuss SQL - the main programming language for working with relational databases. We’ll go over the most important SQL concepts tested in SQL interview questions and walk you through data engineer SQL interview questions asked at top organizations today.

Concepts tested in data engineer SQL interview questions

Data engineers perform a wide variety of tasks. Few examples: validating and ordering data, handling edge cases, implementing consistent formatting, fixing bad data, and changing data types.

Data engineers will most likely use a combination of programming languages and tools. SQL and Python are two of the most important skills. StrataScratch blog has published an article about Python Data Engineer Interview Questions as well.

We compiled a list of SQL concepts most often used in data engineers’ day-to-day jobs. If you’re preparing for an interview, make sure to master these concepts. Data engineer SQL Interview questions often require a strong knowledge of these features.

Concepts tested in data engineer SQL interview questions

We will start with easier concepts and cover difficult ones as well.

Filtering

As a data engineer, you must know how to use the WHERE clause to filter tables. It is usually paired with a condition and returns only rows that satisfy the condition.

It’s important to know how to set up a condition for the WHERE clause and how to couple it with SELECT, UPDATE and DELETE statements.

HAVING is another important filtering feature of SQL. Unlike WHERE, it is used to filter aggregated group data. It’s important to know the differences between WHERE and HAVING clauses and when to use them.

DDL and DML

Data Definition Language (DDL) and Data Manipulation Language (DML) allow data engineers to define and modify database structure - create new tables and objects, specify their name, define columns and specify types of values in them. This is a very short list of what data engineers can do using DDL and DML.

Data Type Conversion

Sometimes, data engineers have to convert values to another type. This may be necessary to perform certain SQL operations like comparison.

To be an effective data engineer, you need to know the difference between implicit and explicit type conversions. It’s important to know how to use CAST(), and CONVERT() functions for explicit conversion, and even better if you can explain the differences between these two functions. This will set you apart from other candidates interviewing for the same job.

You should also be prepared to explain implicit conversions and when they occur. This is important because one unexpected data conversion could lead to errors that are hard to identify and resolve.

ORDER BY

Arranging data using the ORDER BY statement is a basic but important SQL skill. Sometimes complex data engineering tasks can be solved using a simple ORDER BY statement.

The basic syntax of the ORDER BY statement is simple - you need to specify column values and the order (ascending or descending) to arrange rows. Advanced data engineer SQL interview questions might ask you to order rows based on values in multiple columns.

Subqueries

Any candidate interviewing for a data engineer job should know what subqueries are and how to write them.

Sometimes subqueries are described as a query in another query. They allow us to perform complex SQL operations.

Data engineers should also know how to use subqueries with various SQL features: WHERE, HAVING, FROM, SELECT.

Aggregate functions

An essential SQL feature to summarize large volumes of data. Aggregate functions are often used with GROUP BY statements to separate rows into groups and aggregate values for each group. Other times, you’ll need to aggregate data for the entire table.

There are five main aggregate functions in SQL:

SUM() finds the total sum of all numeric values in a column.
COUNT() finds the number of rows in the entire table or a group
MIN and MAX() are used to find the highest and lowest values in a table or a group.
AVG() calculates the average of all values in the column or a group

You can use these aggregate functions within a subquery to set conditions for WHERE and HAVING clauses.

You can find many examples of SQL Interview Questions involving aggregate functions on our blog.

In this article, we’ll solve interview questions in PostgreSQL, which has a wide variety of aggregate functions. Go to PostgreSQL documentation for a full list.

NULL values

Data engineers sometimes have to do what other data scientists don’t want to do - deal with NULL values.

Removing or replacing NULL values can improve the quality of data, which can make data more accessible and easy to work with.

Before dealing with them, you must understand what a NULL value is. It is not 0 or an empty text (‘ ‘) but an absence of any value. The absence of a value is typically a challenge to performing accurate data analysis.

You can’t use normal comparison operators like > < = to work with NULL values.

You can use IS NULL and IS NOT NULL operators to find values that are (and are not) NULL. Or use functions like ISNULL() to return a specific value instead of NULL.

Finally, it’s important to know how aggregate functions and NULL values work together. For example, how you can use the COUNT() aggregate function to find the number of NULL values.

COUNT(*) returns a number of values including NULL ones, whereas COUNT(column) returns the number of non-null values. The difference between them will be the number of NULL values in the table.

Text functions

In SQL, text values belong to data types such as VARCHAR, CHAR, or TEXT, and they are very common. It’s important for data engineers to be able to use built-in text functions in SQL.

CHAR_LENGTH() returns a number of symbols in the text. LOWER() and UPPER() convert text values to lowercase or uppercase. They help you ensure consistent capitalization of letters, which is absolutely necessary if the task involves case-sensitive functions.

Other important text functions are TRIM() and SUBSTRING(). The former allows you to remove unnecessary spaces before and after text values. The latter allows you to extract a specific portion of the string.

Finally, the REPLACE() function finds substring instances and replaces them with another text.

This was a short list of the most important text functions in PostgreSQL. Refer to the documentation for a full list. The more text functions you know, the better.

JOINs

Data engineers use JOINs to combine data from multiple tables into one. There are four main types of JOINs in SQL.

INNER JOINs return rows with overlapping values in the shared dimension.
LEFT JOINs return all rows from the first table but only overlapping values from the second table.
RIGHT JOINs are the opposite and return all rows from the second table and only matching values from the first table.
FULL JOINs return combined columns of both tables. If rows do not overlap, it fills absent columns with NULL.
CROSS JOINs return all possible combinations of columns from both tables.

Understanding various types of JOINs will help you succeed as a data engineer. You should understand their syntax and be able to recognize which type of JOIN is needed for a given task. Knowing how to use the ON statement to define a shared dimension is also important.

Window functions

You can use window functions to create new values. For example, they can be used to assign each record a numerical rank. Or create new aggregate values of the entire table or certain group of rows.

Unlike the GROUP BY statement, aggregate values are stored in a separate column, not collapsed into one row.

Before going into an interview, practice important window functions like ROW_NUMBER(), RANK(), and DENSE(). Learn how to use PARTITION BY to create groups and apply window functions to each group, not the entire table.

In later parts of this article, we’ll see practical examples of window functions and how the aforementioned functions can solve real business challenges.

Array functions

As a data engineer, you’ll have to work with complex data structures like arrays. So it’s important to have a good grasp of array functions.

You can start with learning the ARRAY() command, which takes values and adds them to an array.

ARRAY_AGG() function aggregates all items of the array. ARRAY_APPEND() function allows you to add specific value to an array. Or join two arrays together using the ARRAY_CONCAT() function.

CASE expression

Setting up conditions for filtering and joining tables is an integral part of a data engineer’s job. CASE expression can be used to set up complex conditions for WHERE and HAVING clauses.

You should know how to use CASE expressions to chain multiple conditions. That includes WHEN, THEN, and ELSE clauses, which make up the condition.

It’s important to understand how SQL determines the outcome of a CASE expression. A chain of CASE statements will return the value of whichever condition is met first. If none of the conditions are met, it will return the value of the ELSE clause. If there is no ELSE clause, CASE will return NULL.

DISTINCT

Data engineers use the DISTINCT clause for many purposes. When used with a SELECT statement, it returns only unique values. When used with aggregate functions, it ensures that they are applied to unique values only.

DISTINCT provides a very simple yet effective way to remove (or ignore) duplicate values. For that reason, it is a useful tool at any data engineer’s disposal.

Basic Data Engineer SQL Interview Questions

Now, let’s see these SQL concepts used in practice.

Question #1: Cast Values to Integer

This is a typical data engineer SQL interview question, where you are asked to cast values, remove inconsistencies, and clean up the table.

Cast stars column values to integer and return with all other column values

Yelp

EasyID 10056

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Cast stars column values to integer and return with all other column values. Be aware that certain rows contain non integer values. You need to remove such rows. You are allowed to examine and explore the dataset before making a solution.

Table: yelp_reviews

Link to the question: https://platform.stratascratch.com/coding/10056-cast-stars-column-values-to-integer-and-return-with-all-other-column-values

Understand the question

This data engineer SQL interview question is fairly straightforward as long as you take the time to understand what’s being asked.

There are two clear objectives: cast stars values to integer and clear the table of rows that don’t have numeric star values. Otherwise, leave the table as it is and return all other columns as well.

Analyze data

To answer this question, we have to work with a single yelp_reviews table. Let’s look at the types of values in all columns:

Table: yelp_reviews

As you can see, the stars column stores text values of the ‘varchar’ type.

Business_name, review_id, user_id, review_date, and review_text similarly contain text values.

Funny, useful, and cool columns are integers.

All columns will stay the same except for the stars column. Values in this column will be converted to an integer. We must also remove rows where the stars column contains values other than numbers.

We can see that the stars column contains whole numbers to represent the yelp rating of each business. However, there are exceptions when the value is a question sign (‘?’).

It might look like values in the stars column are already numbers and don’t need to be converted. However, it’s important to understand that the ‘4’ you see in the column is actually text. Not a number 4, but a piece of text, like ‘4 apples’.

Plan your approach

To solve this data engineer SQL interview question, we need to perform two tasks - keep rows where the value of the stars column is a number (not the ‘?’), and convert values of the stars column to the integer type.

Finally, we need to return all columns. Only values in the stars column need to be converted to integers, others are returned as they were before.

Write the code

Convert stars values

Let’s start by converting stars values from ‘varchar’ to ‘integer’ type.

Let’s select all other columns as well since our query needs to return them all.

In PostgreSQL, we can use the double-semicolon syntax to easily cast the value of the stars column.

SELECT business_name,
       review_id,
       user_id,
       stars :: INTEGER,
       review_date,
       review_text,
       funny,
       useful,
       cool
FROM yelp_reviews

In MySQL, you can use either CAST() or CONVERT() functions instead.

Remove rows with non-numeric star values

As we’ve seen in the previous section, some records have the stars value of ‘?’.

We can simply set a WHERE clause to only keep rows where the value of the stars column is not ‘?’.

SELECT 
    business_name,
    review_id,
    user_id,
    stars :: INTEGER,
    review_date,
    review_text,
    funny,
    useful,
    cool
FROM yelp_reviews
WHERE 
    stars <> '?'

That’s it. This query will convert the stars column to integer, and filter out rows where the value of stars column is ‘?’.

Output

Output table should look exactly like before, but the value of the stars column will be of integer type. Also, records with the ‘?’ value in the stars column will be filtered out.

Question 2: Arrange worker records based on values in two columns

In this question, aspiring data engineers are asked to arrange names in an alphabetic order.

Sort Workers in Ascending Order by First Name and Descending Order by Department

Oracle

Amazon

EasyID 9836

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Sort workers in ascending order by the first name and then in descending order by department name.

Table: worker

Link to the question: https://platform.stratascratch.com/coding/9836-sort-workers-in-ascending-order-by-the-first-name-and-in-descending-order-by-department-name

Understand the question

The premise of this data engineer SQL interview question is fairly simple. We need to arrange records based on names (text values) in two columns.

The question only tells us to sort records in ascending and descending orders. However, since values are text, it means we need to order rows alphabetically.

There’s a twist to this question - for one column (first names), we must order in ascending order (from A to Z). For the second column (department names), we must order them in descending order (from Z to A).

Analyze data

All the necessary information is contained in a single workers table. Let’s start by looking at the types of values in each column.

Table: worker

We have six columns.

worker_id contains integer values to identify workers. Most likely, we won’t have to work with this column.
The first_name column contains first name values. We will order records by looking at values in this column.
We don’t need to work with values in the last_name, salary, or joining_date columns.
Department names are stored in the department column. We will order records by looking at values in this column.

Looking at the table, all values look standard. It’s worth noting that first and last names are capitalized.

Plan your approach

This question asks us to order rows based on values in two columns - first_name and department. We don’t have to change rows or values in them.

We need to select all rows and use the ORDER BY statement to arrange them in a specific order.

It’s important that you know how to order rows according to values in two columns.

Write the code

Try to write the code yourself.

Output

The correct query should return a sorted table.

Question 3: Find the last five records of the dataset

In this data engineer SQL interview question, data engineers are asked to select a specific part of a large dataset.

Last Five Records of Dataset

Visa

Amazon

EasyID 9864

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find the last five records of the dataset.

Table: worker

Link to the question: https://platform.stratascratch.com/coding/9864-find-the-last-five-records-of-the-dataset

Understand the question

Description of this question is fairly simple.

Before attempting to answer the question, it’s a good idea to look at available data and examine values, their types, and the ordering of records.

Analyze data

There is one available table named worker. We already worked with this table in the previous question.

In this case, we are not concerned with any of the six columns or values contained in them. We simply have to find and return the last five records.

However, it’s important to look at the actual table (with data) to see if we can notice anything that will help us find the last five records.

Without further ado, let’s look at the worker table with actual data:

Table: worker

As you can see, every record has its own unique worker_id value.

It’s safe to assume that records are arranged in ascending order according to worker_id values. Double-check with the interviewer to confirm this pattern. Once you do, solving this data engineer SQL interview question becomes much easier.

Plan your approach

Before we can find and extract the last five records, we need to define the criteria for identifying them.

We’ve seen that each record has its own unique identifier of integer type. As long as these numbers are unique and increasing, counting the number of all rows will give us the id of the last record.

We can use that information to get the worker_id value of the last five records and set up the condition only to keep rows with those ids.

Or we can find the worker_id of the fifth row before the last and keep records where worker_id is higher than the threshold.

Write the code

Practice your SQL skills by answering the question yourself.

Output

Your final query should return the last five records.

Question 4: Primary Key Violation

In this data engineer SQL interview question, you have to find duplicate cust_id values and return the number of times they are repeated. You can find other Amazon Data Engineer Interview Questions on the StrataScratch blog.

Primary Key Violation

Last Updated: May 2022

Apple

Amazon

EasyID 2107

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Write a query to return all Customers (cust_id) who are violating primary key constraints in the Customer Dimension (dim_customer) i.e. those Customers who are present more than once in the Customer Dimension. For example if cust_id 'C123' is present thrice then the query should return two columns, value in first should be 'C123', while value in second should be 3

Table: dim_customer

Link to the question: https://platform.stratascratch.com/coding/2107-primary-key-violation

Understand the question

The question has detailed instructions, but it can be confusing. Make sure to read it multiple times to get to the essence of the question.

To solve this challenge, you must know that primary keys are unique identifiers for each record. They can not be repeated. The question refers to this as the ‘primary key constraint’.

There are also instructions for how to format the output. We need to return rows with the duplicate cust_id values and the number of times they appear.

Analyze data

Looking at the available data could help you better understand the question.

Let’s start by looking at the types of values in every column of the only available table named dim_customer:

Table: dim_customer

To solve this data engineer SQL interview question, we’ll have to find duplicate cust_id values and the number of times they are repeated. This column contains text values, specifically of the ‘varchar’ type.

cust_name, cust_city, cust_dob and cust_pin_code values are not important.

Plan your approach

In simple words, we need to find duplicate cust_id values and the number of times they are repeated.

The simplest way is to create a group of rows for each unique cust_id value, find the number of rows in each group, and return those with more than one row in the group.

Then return the number of records in each group. This will return the number of rows with the same cust_id value.

Write the code

Now that you have a logical outline of what to do, try to match it by writing the SQL query yourself.

Output

Query should return two columns - the key and the number of times it was repeated.

Advanced data engineer SQL interview questions

Let’s also look at some of the more difficult SQL questions asked during data engineer interviews.

Question 5: Verify the first 4 numbers of all phone numbers

In this question, we have to validate data by checking phone number values and confirm that they begin with specific 4 numbers.

Verify that the first 4 digits are equal to 1415 for all phone numbers

City of San Francisco

MediumID 9737

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Software Engineer

Verify that the first 4 digits are equal to 1415 for all phone numbers. Output the number of businesses with a phone number that does not start with 1415.

Table: sf_restaurant_health_violations

Link to the question: https://platform.stratascratch.com/coding/9737-verify-that-the-first-4-digits-are-equal-to-1415-for-all-phone-numbers

Understand the question

This data engineer SQL interview question asks us to verify that all phone numbers begin with 1415. It’s safe to assume that one of the columns contains phone number values. We will have to make sure that the first four symbols of values in this column are 1415.

It’s unclear what kind of answer we need to return. It could be binary - to show whether or not all phone numbers begin with 1415. Or we might have to return the number of phone numbers that begin with 1415 or do not begin with 1415.

In situations like this, you should ask the interviewer for specific instructions.

Analyze data

The sf_restaurant_health_violations table contains information about businesses, including their phone numbers.

Let’s take a closer look at the types of values contained in each column.

Table: sf_restaurant_health_violations

The table has a lot of columns.

The only column that contains a phone number is business_phone_number, so it’s safe to assume that we need to check values in this column.

We can confirm our assumption by looking at the actual table above.

It's important to note that not all rows have values in this column. Some are empty.

Plan your approach

To solve this question, we have to filter rows by two criteria - the value in the business_phone_number column is not empty. If it has a number, it must start with 1415.

Let’s assume that the question asks us to return the number of rows where the phone number does not start with 1415.

Finally, we should return the number of rows that satisfy both criteria.

Write the code

1. Get the number of all rows

We can use the COUNT() aggregate function to get the number of all records in the table.

SELECT COUNT (*)
FROM sf_restaurant_health_violations

COUNT() will return the total number of records in the table. Output will be collapsed into one row.

2. Set up two conditions

To get the answer, we need to find the number of records that satisfy the two aforementioned criteria.

We can add a WHERE clause to remove records that do not meet the criteria.

SELECT COUNT (*)
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL
  AND LEFT(business_phone_number :: TEXT, 4) <> '1415'

The COUNT(*) aggregate function will apply to filtered records.

We use the LEFT() function to take four symbols from the left side (beginning) of the value, and compare it with 1415.

Output

Our query will return the number of phone number values that exist but do not start with 1415.

Question 6: SMS Confirmations From Users

In this data engineer SQL interview question, data engineers must identify and get the number of invalid records. It is one of many Facebook Data Engineer Interview Questions.

SMS Confirmations From Users

Last Updated: November 2020

Question 7: Find the first 50% of the dataset

In this question, candidates have to return the first half of all rows in the table.

First 50% of Records From Dataset

Bosch

Amazon

MediumID 9859

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find the first 50% records of the dataset.

Table: worker

Link to the question: https://platform.stratascratch.com/coding/9859-find-the-first-50-records-of-the-dataset

Understand the question

The premise is fairly simple - we need to return the first half of all rows in the dataset.

Analyze data

We are given one table named worker.

Let’s look at the types of values in every column:

Table: worker

To answer this data engineer interview question, we don’t need to work with any of these columns. We just have to return the first half of all rows.

While looking at data, you might notice that every record seems to have an incremental worker_id value.

This is a limited example with only 8 rows. In reality, data engineers work with much larger datasets.

Plan your approach

To get the first 50% of all rows, first, we must find the total number of rows in the table.

Once we do, we can divide the total by 2. This will give us half of all rows, let’s call this number X. We can tell SQL to return the first X number of rows, but not more.

Rows already have worker_id values, which appear unique and incremental. We can set a condition to return records where the worker_id value is less than X (a number that represents half of all records). However, there are no guarantees that these values are unique and/or incremental.

Instead, we can use window functions to assign each row a unique incremental number and tell SQL to return records with numerical values of less than X (half of all records).

Write the code

If you like a challenge, try to write a query yourself.

Output

The query should output four records because the example data showed 8 records.

Question 8: Formatting Names

This question involves cleaning up data and ensuring its consistency.

First Name with Left White Space Removed

Amazon

EasyID 9831

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Software Engineer

Print the first name after removing white spaces from the left side.

Table: worker_ws

Link to the question: https://platform.stratascratch.com/coding/9831-formatting-names

Understand the question

The task is fairly simple - remove unnecessary white spaces from the left side of first name values.

The question asks us to print (return) trimmed values.

Analyze data

Let’s start by looking at the table and dataset available for this question:

Missing data

To solve this data engineer SQL interview question, we are only concerned with values in the first_name column.

Plan your approach

We need to take first name values and trim them.

The question tells us to remove white spaces only from the left side of first name values. So the standard TRIM() function is not going to work.

Write the code

Solving this challenge would be a great practice before going into a data engineering interview.

Try to write the query yourself:

Output

The query should return only trimmed first name values and nothing else.

Question 9: Rows With Missing Values

In this data engineer SQL interview question, we have to find and return records with multiple empty columns.

Rows With Missing Values

Last Updated: April 2022

Google

MediumID 2106

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

The data engineering team at YouTube want to clean the dataset user_flags. In particular, they want to examine rows that have missing values in more than one column. List these rows.

Table: user_flags

Link to the question: https://platform.stratascratch.com/coding/2106-rows-with-missing-values

Understand the question

The objective is clear - identify rows with more than one missing value and return them.

Analyze data

Let’s look at columns in the user_flags table:

Table: user_flags

As you can see, all columns contain text values.

We’ll have to check all columns to find out whether they are empty or have values in them.

The last record has two empty columns. It is a prime example of the kind of records we’re looking for.

Plan your approach

The question asks us to return rows with missing columns, so we need to:

Use the WHERE clause to remove rows that don’t have two or more missing values.
Set up a complex condition (using CASE statement) to identify and keep count of empty columns for each row.

Write the code

Try to write the query yourself.

Output

The final query should return all the records with more than one missing value.

Question 10: Find the combinations

As a data engineer, sometimes you’ll have to prepare data for analysis. For example, extract data that meets certain criteria.

Find The Combinations

Lyft

Uber

MediumID 10010

Data Engineer

Data Scientist

BI Analyst

Data Analyst

ML Engineer

Find all combinations of 3 numbers that sum up to 8. Output 3 numbers in the combination but avoid summing up a number with itself.

Table: transportation_numbers

Link to the question: https://platform.stratascratch.com/coding/10010-find-the-combinations

Understand the question

You might have to read this data engineer SQL interview question multiple times to understand it fully.

We are given a list of numbers and need to find and output various combinations of three numbers that add up to 8.

There is an extra condition - we can not add a number to itself. Every number in the equation must be unique.

Analyze data

The transportation_numbers table is essentially a list of numbers. It has two columns, index and number, and both of them contain integer values.

We need to go through the list and pick three that add up to 8.

Table: transportation_numbers

It looks like all values in the number column are below 8.

Plan your approach

The initial table has one column with numbers.

We need to combine the table with itself three times, so that every record contains three number values.

If you use JOIN to combine tables, you can set the ON condition to ensure that all three numbers are different from each other.

Finally, you need to set up a filter that checks if three number columns add up to 8.

Write the code

Try to solve this interesting challenge yourself.

Output

Summary

Hopefully, by now, you better understand the day-to-day responsibilities of a data engineer and what a data engineer does and the importance of their work.

To do the job successfully, data engineers need to have a very specific set of skills, one of them is SQL. In this article, we listed the most important SQL concepts and walked you through data engineer SQL interview questions asked by reputable employers.

Practice on these questions to maximize your chances of landing a data engineer job. Go to the Data Engineer Interview Questions post on the StrataScratch blog to find more questions to solve. The platform has hundreds of SQL interview questions of various levels of difficulty. You can write queries, check their correctness, get hints and guidance all within one platform. Soon enough, you’ll master SQL and be prepared to solve any interview questions with confidence.

Data Engineer SQL Interview Questions From Top Employers

Concepts tested in data engineer SQL interview questions

Filtering

DDL and DML

Data Type Conversion

ORDER BY

Subqueries

Aggregate functions

NULL values

Text functions

JOINs

Window functions

Array functions

CASE expression

DISTINCT

Basic Data Engineer SQL Interview Questions

Question #1: Cast Values to Integer

Cast stars column values to integer and return with all other column values

Question 2: Arrange worker records based on values in two columns

Sort Workers in Ascending Order by First Name and Descending Order by Department

Question 3: Find the last five records of the dataset

Last Five Records of Dataset

Question 4: Primary Key Violation

Primary Key Violation

Advanced data engineer SQL interview questions

Question 5: Verify the first 4 numbers of all phone numbers

Verify that the first 4 digits are equal to 1415 for all phone numbers

Question 6: SMS Confirmations From Users

SMS Confirmations From Users

Question 7: Find the first 50% of the dataset

First 50% of Records From Dataset

Question 8: Formatting Names

First Name with Left White Space Removed

Question 9: Rows With Missing Values

Rows With Missing Values

Question 10: Find the combinations

Find The Combinations

Summary

Latest Posts:

Unsupervised Clustering: Methods, Examples, and When to Use

A Guide to Master Machine Learning Modeling from Scratch

How to Create a Bubble Plot with Python and Matplotlib?