Data Engineer SQL Interview Questions From Top Employers

Data Engineer SQL Interview Questions

Categories:

  • Author Avatar
    Written by:

    Irakli Tchigladze

In this article, we will list important SQL concepts and walk you through 10 data engineer SQL interview questions asked by employers like Google and Amazon.

Companies collect and maintain large volumes of data to better understand their customers and improve various facets of their business. However, raw data can be a bit messy and disorganized. Companies need data engineers to refine initially collected data so that it’s easier to work with.

Data can be used for various tasks - analysis, visualization, or fed into machine learning models, just to name a few. Data engineers prepare data according to the specifics of the task so that data scientists can utilize the data instead of fixing its inconsistencies.

To do this, data engineers need to have a specific set of skills. In this article, we will discuss SQL -  the main programming language for working with relational databases. We’ll go over the most important SQL concepts tested in SQL interview questions and walk you through data engineer SQL interview questions asked at top organizations today.

Concepts tested in data engineer SQL interview questions

Data engineers perform a wide variety of tasks. Few examples: validating and ordering data, handling edge cases, implementing consistent formatting, fixing bad data, and changing data types.

Data engineers will most likely use a combination of programming languages and tools. SQL and Python are two of the most important skills. StrataScratch blog has published an article about Python Data Engineer Interview Questions as well.

We compiled a list of SQL concepts most often used in data engineers’ day-to-day jobs. If you’re preparing for an interview, make sure to master these concepts. Data engineer SQL Interview questions often require a strong knowledge of these features.

Concepts tested in data engineer SQL interview questions

We will start with easier concepts and cover difficult ones as well.

Filtering

As a data engineer, you must know how to use the WHERE clause to filter tables. It is usually paired with a condition and returns only rows that satisfy the condition.

It’s important to know how to set up a condition for the WHERE clause and how to couple it with SELECT, UPDATE and DELETE statements.

HAVING is another important filtering feature of SQL. Unlike WHERE, it is used to filter aggregated group data. It’s important to know the differences between WHERE and HAVING clauses and when to use them.

DDL and DML

Data Definition Language (DDL) and Data Manipulation Language (DML) allow data engineers to define and modify database structure - create new tables and objects, specify their name, define columns and specify types of values in them. This is a very short list of what data engineers can do using DDL and DML.

Data Type Conversion

Sometimes, data engineers have to convert values to another type. This may be necessary to perform certain SQL operations like comparison.

To be an effective data engineer, you need to know the difference between implicit and explicit type conversions. It’s important to know how to use CAST(), and CONVERT() functions for explicit conversion, and even better if you can explain the differences between these two functions. This will set you apart from other candidates interviewing for the same job.

You should also be prepared to explain implicit conversions and when they occur. This is important because one unexpected data conversion could lead to errors that are hard to identify and resolve.

ORDER BY

Arranging data using the ORDER BY statement is a basic but important SQL skill. Sometimes complex data engineering tasks can be solved using a simple ORDER BY statement.

The basic syntax of the ORDER BY statement is simple - you need to specify column values and the order (ascending or descending) to arrange rows. Advanced data engineer SQL interview questions might ask you to order rows based on values in multiple columns.

Subqueries

Any candidate interviewing for a data engineer job should know what subqueries are and how to write them.

Sometimes subqueries are described as a query in another query. They allow us to perform complex SQL operations.

Data engineers should also know how to use subqueries with various SQL features: WHERE, HAVING, FROM, SELECT.

Aggregate functions

An essential SQL feature to summarize large volumes of data. Aggregate functions are often used with GROUP BY statements to separate rows into groups and aggregate values for each group. Other times, you’ll need to aggregate data for the entire table.

There are five main aggregate functions in SQL:

  • SUM() finds the total sum of all numeric values in a column.
  • COUNT() finds the number of rows in the entire table or a group
  • MIN and MAX() are used to find the highest and lowest values in a table or a group.
  • AVG() calculates the average of all values in the column or a group

You can use these aggregate functions within a subquery to set conditions for WHERE and HAVING clauses.

You can find many examples of SQL Interview Questions involving aggregate functions on our blog.

In this article, we’ll solve interview questions in PostgreSQL, which has a wide variety of aggregate functions. Go to PostgreSQL documentation for a full list.

NULL values

Data engineers sometimes have to do what other data scientists don’t want to do - deal with NULL values.

Removing or replacing NULL values can improve the quality of data, which can make data more accessible and easy to work with.

Before dealing with them, you must understand what a NULL value is. It is not 0 or an empty text (‘ ‘) but an absence of any value. The absence of a value is typically a challenge to performing accurate data analysis.

You can’t use normal comparison operators like > < = to work with NULL values.

You can use IS NULL and IS NOT NULL operators to find values that are (and are not) NULL. Or use functions like ISNULL() to return a specific value instead of NULL.

Finally, it’s important to know how aggregate functions and NULL values work together. For example, how you can use the COUNT() aggregate function to find the number of NULL values.

COUNT(*) returns a number of values including NULL ones, whereas COUNT(column) returns the number of non-null values. The difference between them will be the number of NULL values in the table.

Text functions

In SQL, text values belong to data types such as VARCHAR, CHAR, or TEXT, and they are very common. It’s important for data engineers to be able to use built-in text functions in SQL.

CHAR_LENGTH() returns a number of symbols in the text. LOWER() and UPPER() convert text values to lowercase or uppercase. They help you ensure consistent capitalization of letters, which is absolutely necessary if the task involves case-sensitive functions.

Other important text functions are TRIM() and SUBSTRING(). The former allows you to remove unnecessary spaces before and after text values. The latter allows you to extract a specific portion of the string.

Finally, the REPLACE() function finds substring instances and replaces them with another text.

This was a short list of the most important text functions in PostgreSQL. Refer to the documentation for a full list. The more text functions you know, the better.

JOINs

Data engineers use JOINs to combine data from multiple tables into one. There are four main types of JOINs in SQL.

  • INNER JOINs return rows with overlapping values in the shared dimension.
  • LEFT JOINs return all rows from the first table but only overlapping values from the second table.
  • RIGHT JOINs are the opposite and return all rows from the second table and only matching values from the first table.
  • FULL JOINs return combined columns of both tables. If rows do not overlap, it fills absent columns with NULL.
  • CROSS JOINs return all possible combinations of columns from both tables.

Understanding various types of JOINs will help you succeed as a data engineer. You should understand their syntax and be able to recognize which type of JOIN is needed for a given task. Knowing how to use the ON statement to define a shared dimension is also important.

Window functions

You can use window functions to create new values. For example, they can be used to assign each record a numerical rank. Or create new aggregate values of the entire table or certain group of rows.

Unlike the GROUP BY statement, aggregate values are stored in a separate column, not collapsed into one row.

Before going into an interview, practice important window functions like ROW_NUMBER(), RANK(), and DENSE(). Learn how to use PARTITION BY to create groups and apply window functions to each group, not the entire table.

In later parts of this article, we’ll see practical examples of window functions and how the aforementioned functions can solve real business challenges.

Array functions

As a data engineer, you’ll have to work with complex data structures like arrays. So it’s important to have a good grasp of array functions.

You can start with learning the ARRAY() command, which takes values and adds them to an array.

ARRAY_AGG() function aggregates all items of the array. ARRAY_APPEND() function allows you to add specific value to an array. Or join two arrays together using the ARRAY_CONCAT() function.

CASE expression

Setting up conditions for filtering and joining tables is an integral part of a data engineer’s job. CASE expression can be used to set up complex conditions for WHERE and HAVING clauses.

You should know how to use CASE expressions to chain multiple conditions. That includes WHEN, THEN, and ELSE clauses, which make up the condition.

It’s important to understand how SQL determines the outcome of a CASE expression. A chain of CASE statements will return the value of whichever condition is met first. If none of the conditions are met, it will return the value of the ELSE clause. If there is no ELSE clause, CASE will return NULL.

DISTINCT

Data engineers use the DISTINCT clause for many purposes. When used with a SELECT statement, it returns only unique values. When used with aggregate functions, it ensures that they are applied to unique values only.

DISTINCT provides a very simple yet effective way to remove (or ignore) duplicate values. For that reason, it is a useful tool at any data engineer’s disposal.

Basic Data Engineer SQL Interview Questions

Basic Data Engineer SQL Interview Questions

Now, let’s see these SQL concepts used in practice.

Question #1: Cast Values to Integer

This is a typical data engineer SQL interview question, where you are asked to cast values, remove inconsistencies, and clean up the table.

EasyID 10056

Cast stars column values to integer and return with all other column values. Be aware that certain rows contain non integer values. You need to remove such rows. You are allowed to examine and explore the dataset before making a solution.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10056-cast-stars-column-values-to-integer-and-return-with-all-other-column-values

Understand the question

This data engineer SQL interview question is fairly straightforward as long as you take the time to understand what’s being asked.

There are two clear objectives: cast stars values to integer and clear the table of rows that don’t have numeric star values. Otherwise, leave the table as it is and return all other columns as well.

Analyze data

To answer this question, we have to work with a single yelp_reviews table. Let’s look at the types of values in all columns:

Table: yelp_reviews
Loading Dataset

As you can see, the stars column stores text values of the ‘varchar’ type.

Business_name, review_id, user_id, review_date, and review_text similarly contain text values.

Funny, useful, and cool columns are integers.

All columns will stay the same except for the stars column. Values in this column will be converted to an integer. We must also remove rows where the stars column contains values other than numbers.

We can see that the stars column contains whole numbers to represent the yelp rating of each business. However, there are exceptions when the value is a question sign (‘?’).

It might look like values in the stars column are already numbers and don’t need to be converted. However, it’s important to understand that the ‘4’ you see in the column is actually text. Not a number 4, but a piece of text, like ‘4 apples’.

Plan your approach

To solve this data engineer SQL interview question, we need to perform two tasks - keep rows where the value of the stars column is a number (not the ‘?’), and convert values of the stars column to the integer type.

Finally, we need to return all columns. Only values in the stars column need to be converted to integers, others are returned as they were before.

Write the code

  1. Convert stars values

Let’s start by converting stars values from ‘varchar’ to ‘integer’ type.

Let’s select all other columns as well since our query needs to return them all.

In PostgreSQL, we can use the double-semicolon syntax to easily cast the value of the stars column.

SELECT business_name,
       review_id,
       user_id,
       stars :: INTEGER,
       review_date,
       review_text,
       funny,
       useful,
       cool
FROM yelp_reviews

In MySQL, you can use either CAST() or CONVERT() functions instead.

  1. Remove rows with non-numeric star values

As we’ve seen in the previous section, some records have the stars value of ‘?’.

We can simply set a WHERE clause to only keep rows where the value of the stars column is not ‘?’.

SELECT 
    business_name,
    review_id,
    user_id,
    stars :: INTEGER,
    review_date,
    review_text,
    funny,
    useful,
    cool
FROM yelp_reviews
WHERE 
    stars <> '?'
Missing data

That’s it. This query will convert the stars column to integer, and filter out rows where the value of stars column is ‘?’.

Output

Output table should look exactly like before, but the value of the stars column will be of integer type. Also, records with the  ‘?’ value in the stars column will be filtered out.

All required columns and the first 5 rows of the solution are shown

business_namereview_iduser_idstarsreview_datereview_textfunnyusefulcool
AutohausAZC4TSIEcazRay0qIRPeMAFgjlbPUcCRiXlMtarzi9sW5w52011-06-27Autohaus is my main source for parts for an old Mercedes that has been in the family since new that I am restoring. The old beater is truly a labor of121
Citizen Public HouseZZ0paqUsSX-VJbfodTp1cQEeCWSGwMAPzwe_c1Aumd1w42013-03-18First time in PHX. Friend recommended. Friendly waitstaff. They were understaffed and extremely busy, but we never knew. The short ribs were tende000
Otto Pizza & PastrypF6W5JOPBK6kOXTB58cYrwJG1Gd2mN2Qk7UpCqAUI-BQ52013-03-14LOVE THIS PIZZA! This is now one of my favorite pizza places in phoenix area. My husband i walked into this cute family owned business and were greet000
Giant HamburgersQBddRcflAcXwE2qhsLVv7wT90ybanuLhAr0_s99GDeeg32009-03-27ok, so I tried this place out based on other reviews. I had the cheeseburger with everything, some fries, and a chocolate shake. The burger was okay, 011
Tammie Coe CakesY8UMm_Ng9oEpJbIygoGbZQMWt24-6bfv_OHLKhwMQ0Tw32008-08-25Overrated. The treats are tasty but certainly not the best I have ever had. I would have rated this a two star but the cakes and cookies are REALLY pr132
Marcellino RistoranteGTUOIBCEGGt_aGp-bRogfggopGuEb-ft6cHKMyCZEvJg12010-12-17This place sucks. Food was average and we had to wait an hour even though we had a reservation. My americano tasted like warm soy sauce and they blame230
Shanghai ClubwHfxd0Bq4JYLiUEO55xe4QeVQ_yDkMlF62oofUwc29Kw52013-09-19This is our favorite Chinese restaurant in the area! The service is always consistent and our favorite waitress - Becky - always makes time to spend w000
Freddys Frozen Custard & SteakburgersNfTR_B1yW1hPVEoXlSJV-wYnHYlN1m7jDhAH9XgR4Dlg42013-01-24Love the tiny fries.100
Chipotle Mexican Grillk-Oo0Gs4AC04GJAecu_iWgHjpzhIQFRQbFmc_7CtFDmg42011-05-11When you don't feel like a full restaurant and you want more than the normal Mexican fast food, Chipolte fits the bill. The food is always fresh, the000
Arizona Fire & Water RestorationuvVmBnYQf8Mnt-s64D8XOgzQP7cLujr-MJ207uuNFC1w52011-08-03This is a fantastic company. They have a very high rating with BBB and have won a ton of customer service awards. Hopefully you dont need them but i000
Arriba Mexican Grill0HOrc_RX87-01dbdFMSjJw8m8HQtZox4vS-N-AW3mzxw52013-12-31Open Christmas Day! Their food is delicious. Especially breakfast! Mmmm, the salsa mmmmm. Hatch chills, pork, chicken...pollo asada, carne asada...you110
Renegade Tap & KitchenZaCA3v9bWUpHuwZ6NO8C1QiVTzpbZ6qBdFllvcJLbmeg42011-02-26Dinner on a busy Friday night. Arrived on time for 7:30 reservation. Table not ready, restaurant and bar full. Host and hostess pleasant, but not o001
Chipotle_gcGIGfziNkhaIlkjhjKHg3uU_6L8GnFOHTsO4I3oedg32013-09-24I absolutely looove Chipotle. My problem is when you make me pay almost $2 extra for guacamole... please don't be stingy with it. Hook it up, Chipotl121
SanTan VillageLXiDBkXxcyL4IPnXbjw0VQBa-tIR3a8hhwIk-y_hVzFg42011-03-27Next month when the temps skyrocket to triple digits, i will prolly not think soo highly of this place, but for now, San Tan has gotten alot of my fre012
Love'sEr4Y-yj1JBW9cCbIf3ViKgbC3By-saT9ylKu-dwWgtcw42012-11-13Plenty of gas pumps and convenient to get some cooked food inside. If getting gas, caution for enter and exits at pumps.000
Dirty DrummertMYUWXoFuLdFecqqP60R3AX_kPh3nt0AJPNPHye2rTlA42011-06-18I was introduced to this place by my coworker and friend. She goes here every single day at lunch and its cute because everyone knows her there! Any132
Euro Pizza Cafex2atXyt-QwCTzHhglzxj3QJKp42Y520azWI_WBzUMxTw52012-04-07This is a nice cafe with a diverse menu. There's indoor or outdoor dining with a view of the famous fountain! I ate here with my friend Connie who als001
US Airwayso_YetnCcK_96ueIULO84fAVl4k0FiMNCRzEQwOpe4hXw22012-07-20Well, they have good flights and connections from time to time. Their fleet is very archaic though, probably most of their planes are older than me (a010
Wal-Mart Neighborhood Marketc7JHcdWo5pZ3rMIDbsDt_gIDHrwv_RCildFvmfWTkj5Q22011-10-02I occasionally stop in this store while driving past. I will try to keep my comments on the positive side. NOTE if you don't read anything else: if 011
Flancer's Cafe78XeKBmSE0reBjsmqg7HNgAYGHNy8gPxl2Q-etTT3hZw32012-12-01This place has decent food, cute atmosphere, but the service is problematic. I was stuck in Mesa for training and had lunch there on Halloween. My pal575
Pappadeaux Seafood Kitcheny6q-inMFFoEci-wRATp1-AS3bvMOL50vgS_8-TtlGi4w12010-12-02$20 for a double Maker's Mark? Me thinks not T.G.I. McPappadeaux. I was not impressed by anything that was there and will not only steer clear but m1366
Lone Star Steakhousefp73RBYM6NAnNWii9bxZ8w7Ot-v89x44U_VdIPgD3qKg42011-10-18Great food and a rare bird of a honest manager when it comes to whats in the food. They have very comfy booths. Did nor care for the country music. I000
EVO3D3Avu2d8Gj-HEbqqWhswgH982l-WK1p49z9jZFNMEfQ42013-12-29Great atmosphere and decor..a welcome change from the chains and anchor restaurants in the area that all feel the same. My wife could not stop raving000
Conocido Park0ESAQ8Ynk1nZPt5LayMwng3OelvbzNK3KSmMdL0O9nRQ42013-10-08I play this disc golf course weekly. The baskets are frequently moved to keep the course fresh. Trees provide difficulty and also shade! There are Dis010
Nate's Barber ShopKEMsCW33Y1ZQEGTiMKmcrAHm_7pViZyrp_Z62lBRopAg52012-12-12Nates barber shop is one of the best shops that I have encountered in my life. If your looking for a great inexpensive ($13) good looking haircut this000
S & S Tire and Automotive Service Center99_nV5h4JHomT7cgh0V6lghYKjQHu2fk4nMgCWO50f-w52011-02-25My partner and I needed two new tires and an alignment. We have a Chevy Cobalt and its only two years old. Went to the dealership and they wanted almo100
La Parrilla SuizaldvKeuzBSIesZEmFcr4ooQEXvhtd_05d1H9RlXa2CnIQ52012-07-27This food is just....fantastic. I don't know what else to say. I grew up eating at La Parrilla Suiza in Tucson. The tortilla soup and "Queso Suiza"010
Scratch Pastries & BistroznMtXO5hY5XPqAMj_7VLRgb2DKC4kC8-QeSeGZ_MF3XQ52012-03-16Yes, it is in a strip mall. Don't let that fool you. This is some downright good food with affordable prices. Even though service was very attentive 000
Joyride Taco HouseuyLuLYfjs3S_8u3OkrIdmwmY6zzvFbK0ENnQOdgtiT4Q22013-10-19Great food! But not worth the HORRIBLE SERVICE! Took about 15 mins for drinks that included a dr pepper. The server said the bartender is working hard232
In-N-Out BurgernxoxgQka8mTK-rLCh7sg3wOwVB3YzcYeTRV09tpNDBSA52011-07-08Nothing beats a #1 with grilled onions, no tomatoes, well done french fries and a pink lemonade. I've been known to eat here twice to three times a we000
Roka AkorfpjKqP8ONJ9rT82VoUhIQQ4ozupHULqGyO42s3zNUzOQ52011-07-18I hate to admit it, but it had been a long while since my last visit to Roka Akor. I deserve a hand slap. But last week, I had the perfect excuse to p5810
Ruth's Chris Steak Housez3pSiipCrQM3B6i9PrnoGwhJBOxmNREXmMGTfXgMcGug52010-03-30Best steak I have ever eaten is at Ruth's Chris steakhouse. Comes to your table sizzling hot. Sides are sold individually but are pretty good. The des000
Yupha's Thai Kitchenarf6Ne6h0UDXizsbMcOomQAkJFqLqHHAKY3H5R8p7cPQ52012-12-09Yupha Thai is definitely a "yuppy" in terms of being a great spot to go to. There are not too many places to eat near ASU Research Park, which is rig000
Hotel Indigo ScottsdaleVDtEMw1X397ViDlP7oErTw8Oy9-UwJQWffS0yOwPG6Ew42013-07-01I stayed here for one night while on a recent business trip! I wish I had discovered this place sooner.... I was pleasantly surprised! I never heard 000
Da VˆngxbVGTBSsXmvu56FTbXp7AwF6mQhKLdj_PEdxLvDYOm2Q52011-12-15I haven't had better pho anywhere in az. the large pho is enormous for the price. i usually go with my family and order a regular sized pho for myself000
Trader Joe'sHzI7nVlXJQJR3GO1KXYxlAcMmQsFyrYBv6hIE6NffqZQ52011-01-03Trader Joe's always goes above and beyond in all of their services. The food is fresh and delicious, the prices reasonable and the people are great. 000
Beaver Choice18fIpXUbcm9k6Pmtkbf0aA3gIfcQq5KxAegwCPXc83cQ42011-04-21So I went here tonight with a friend. I was really excited and nervous to try this place after reading the reviews. So you walk in and walk up to the110
Some BurrosOqogqje3RKspPwVcREfsXAGnqNc74So5Pc8C3hkA2hCg52009-07-10Came here w/ the hubster's. I've actually been craving this for some time now. The pollo fundido is great! So goooooood! Its a pretty big portion. The021
Superstition Ranch MarketB1xnRb2j_iW2Ws0u1B0FNwEOLRikjQxTIpXB4aV1hbPQ52012-01-05Great place to shop, buy what you will use within a few days.000
Fuego TacosJ6nrjjCjXc-hnRpZZPrLnQA99dyhEqcd_yXKPfBWeZHA32011-08-18I went to a late lunch on a Saturday and the Esplanade area was quite dead (I'm surprised because in the old days ('98-'99), I remember it to be prett000
The Woodshed#NAME?lsp7p2NuC5MX4_iuch3_OA12012-06-17We only went here because it has been a traditional Father's Day event with friends. We decided to join them for the 1st time......really...we will n010
Rosati's PizzaLd4Qg2Du0S3ulcdDCdm7JgSEDJTWEzMdqp7UsS1W3KXw32012-10-24First of all, let me just say their food is fantastic. I love their pizza. I love their salads. I love the cheesy garlic bread and their chicken parm.100
Sekai Sushi8TB8vM1H_SuEK2hS-5wu7gTDlgqAxf268QOw-OUk2Urw22012-05-02I am sorry to say people of Mesa you must have pretty low standards if this is the best sushi bar in Mesa. I went there last night and I really did no020
D'Vine Bistro & Wine BarJvHH1Z84UJ1P5T9uIxEnyQrT4ycOjlrKefSAcjoQga5g42012-04-12Love the atmosphere and fantastic happy hour! Favorite spot in Mesa:)010
Scottsdale StadiumeAYq_HT_gbD_ECgIWn3GoAMt3dPqOlnlGyVCftCcokmg42012-03-27Ignoring the fact that Scottsdale Stadium is a bit overpriced these days for Spring Training Giants tickets, its fun. Its basically one big party (in 030
Thai House#NAME?fczQCSmaWF78toLEmb0Zsw42008-07-21Damn... Helen Y beat me to the punch and got the FTR for this place! Oh well, I will say that she did a great job with her review - I think I was Tha387
Jalape–o Inferno Bistro MexicanoVRiSQiIfUnZdp0CxNMkLWg4nJ5ryQTcQKs8mCrgt8-BQ22011-04-05The dinner we had here was OK... nothing to rave about. The tortilla chips were very good as others have mentioned... a mix of corn and deep fried flo000
Caffe Boahi6a3fvAbtZq9jMIM8gkwQqa05pUVNapADHZXpHMPMeA32010-06-26Caffe Boa is an interesting jokester, so, it is interestingly hard to review it. I'll keep it short. I have gone here several times, and each time wit000
Golden ValleyZYEAmRpYHxJYcbIv-c7S2wgg_OKjOAl_vVmdh5ZETuiw32012-02-25Can I tell the difference between Uzbek cuisine and other Middle Eastern or Mediterranean cuisine? Nope. For all I know it's only a matter of where000
Canyon Cafegi9hLYOPk_fbvOr2mCHu7gnH9OZEGfgseWjC5_IPGCXw52010-08-09Everything about this place is wonderful!! I love the huge windows and outdoor patio! Gorgeous! Their food is amazing everything from the chips (there001
Panda Expressr52OE-CfRoJQyjBtn0vHIQfMyKbyYY9Poy9B_1QZPKcg12010-05-31My young children love Panda Express orange chicken. They eat it 2-4 times a month. We were at the mall on a Saturday afternoon and stopped into the f000
Gordon Biersch Brewery RestaurantpfPFWY5SXQEEnlVJbFaNqAnKaR5Z9Qmqc4RsakLLX_7w32010-03-29It's very hard for me to enjoy a house brewed schwarzbier that, to me, tastes more like Bud Light with a hint of acrid smoke flavor than what I consid243
Chili's Grill & Bar42EOZ0KMF4wU1Sz7oeLfpAAfyzIHPy5zds_mqf2Jdc9g32011-02-07I love Chili's. My husband and I always go for the 2 for $20 deal at whatever Chili's location we may be, but this location seemed to skimp out on the000
Beckett's TablelNLiQx1zi-ctta6v4LLhXwGJwbccjXgoRPbNuWcNKYXA52012-01-29Beckett's Table is a fantastic restaurant for people who want great food, great wine, and great service. My wife and I dined there with a couple of fr000
My Big Fat Greek RestaurantIuSys52QuyTxGv3HLFKBSw1gY1N3pkxTzh7kK4BxANyw42011-05-07I hated Greek food, until I tried this place. My "health nut" girlfriend had to drag me to this place, kicking and screaming. After all, my long ago233
Goldmans Deli6DggWM9rgzC_mIo4THFpMAPKZvqm3IeWiWBYoDDoEG4w52012-09-02Traveled in from the east coast so I got to Arizona very early in the day. I needed to kill some time before I could check into my hotel room so I en000
Macayo's Depot Cantina98nvcyGhtHlKO8pDlOcCsAbZFRqP7s0Vszxeu8_IwYow32008-03-21I finally ate lunch here after not being able to get decent parking for Quiznos on Mill Ave. My coworker and I were starving and needed a place to sh023
Rayner's Chocolate & Coffee ShopGCdNDjutQWsT-qaYwW0zxwM28A6JPQFBJnRBCfODe8IA42013-05-15Cute bakery/coffee shop hidden in a little plaza on 51st Ave off of Thunderbird Rd. Nice selection of unique baked goods, chocolates and coffee drinks010
Sushi Brokersup3ueFZ1xJh_ts6dVu3_0AhDlSSyDreM9xY4yQWPm54w22009-02-26--expensive for business lunch --servers very attentive, prompt --lacks nth-degree detail of a Japanese chef running things; rolls and standards are s433
The Vig Uptownrib7dXO863eL5VGUDsot8guQCk37gNl1bEmkjAv6_kAw42011-04-17My meal was great; the decor / layout is great also, with a lovely patio out back. The only downsides are parking, and the fact that the entrance is a010
Lo-Lo's Chicken & WafflesIlFoK4meMZ7Ws4enESzeTQEacK6XwZjsTD6QYSIRlJ7Q52008-07-14At first most of my noobie friends are very skeptical about the fusion of Chicken and Waffles. but after taking them to this place and experiencing f232
Shoe CarnivalYhlJA_CuoZlK4FIJUHlCnw_PzSNcfrCjeBxSLXRoMmgQ22010-05-17I had a $5 coupon in the mail so I was like what the heck. And is it next to Home Goods (one of my favie home decoration stores). I got to the store i210
My Big Fat Greek RestaurantQe0FO565tGfTxb7QtNCVwg2vl3MXKr8iQOWTNse5kgdw32008-04-03Nice menu selection; food was tasty. Good atmosphere. Wished they had a restaurant in the LA area.010
Casey Moore's Oyster HouseR7ZJPW4qEXuqI41aaWmO0ArLtl8ZkDX5vH5nAx9C3q5Q42009-04-02This is a fun place for appetizers and drinks if it is not too crowded and the temperature is just right outside. Otherwise the inside gets way too p121
Q to U BBQIJxqQwzJjAURPBAB_-iOAAMSgZpSWlf8T2H_46OWNgCQ52011-09-08Really enjoyed the Ribs and fries, I came with friends that really like BBQ, they agree, the ribs were terrific! I am sure we will be going back to Q111
Lightning LubekezCWAz6MO1wKwXB_DK-3QbwmXfjwrogAaGqV33kSVpQ32013-08-24$115 for an oil change and two air filters for my civic. I must've had a really long day or she had magical powers and made me forget I could simply c011
Athens GyrosaAgVzZU2b0YbYGi4byeI6wL8_GwFxxtGSYR2F_dglpSg42013-08-17Great food nice service. The girl that worked up front introduced herself to the other patrons that were there and asked them how the food was but no000
Metro Light RaillG8Swugg_DQxY3NgT_BEigLqgGgWi3FLHBViX9tmZ9sw32011-10-31I just wish that this stupid HUGE metropolis could have more LIGHT RAIL connections!! compared to the circus you have to stand in the buses, the LR is111
Oregano's Pizza Bistro7QvgM_LJi6SRp_GuOXPFZQfor16MiFS1M_8_cne6IbIw42013-02-04Big Delicious portions!000
Gallo BlancogULD5qz_CQI9clPWh2FNHAZ02XdD0muEz2FFQKPERMYQ52013-12-29Stopped in here one night right before Christmas. Short and sweet: Margaritas - very good (and huge by the way) Tacos - awesome Guacamole - exce222
Los Dos MolinosJcWhDcyNl3r_Tbeqiac15QGoymUzKqvET2QOZkIWZi9w42014-01-07We have a friend who said the salsa is way too hot. All I heard was "you must try Los Dos Molinos". We love spicy food and are always up to a challeng011
Nancy's Nail SalonZGo8c57MrzQrSN6R7zO1uQA9g7YnTtsSV-wEIo3HI1YQ42011-06-09Came here with my sister in law she had a coupon for a 27.99 mani/ spa pedi. The staff was friendly they have a tv and plenty of magazines to look thr010
Changing Hands Bookstore1xzMe1EEwhF23RNh3InKkQfPHLPrymsyb6WSFFKoMrTQ52010-10-26This not-so-little bookstore has it all... new and used books, a unique gift section, book signings and events, wonderful staff and a cool, organized,001
Tortilla FishOmSYYxZskG9BeRMwb5DltwMxO7EY766jVoFEZzkpwmOQ22013-10-06My experience wasn't bad, just not up to the hype of all the other reviews. I tried the shrimp, fish, campechana and machaca tacos. The shrimp tacos w000
Pet Club0LvO1yc-52fJ6vIHaFVdAwLXOhR4ZUULSbBNztxYZ2dQ22013-07-16That awkward moment when local competitors come write negative reviews about a store and then direct traffic to their own store....011
Taste of TopsJ2lGBvJOcuhmauWs3rgMSgaIAjAU-6NH583EkQ6E9KRw42009-10-09Okay, in interest of full disclosure, I literally live around the corner and across the street from Tops Liqour and have been waiting forever for this111
Carolina's Mexican FoodN6eg6Jc_mL_XHMGmw6GElwcbxUyCUMjkWAs1h4auYeAw42012-01-31If you're looking for real Mexican in a hurry this is your place. This is one of places ill always bring out of town guests who can't find good Mexica001
Super L Ranch MarketcK3J7FAqruLZM_Y5J29Q8Qz06IHGXI_ofBc2DkAbCgnA52011-05-09HOLY CRAP THEY HAVE FROZEN XIAOLONGBAO. :) These delicious little bites of porky, soupy dumpling heaven have eluded me since I first tasted them in 221
Salt CellarIXGX_Lk2NgCH-0OQNcGMpQE4HbTIHd9PVjUnEKpysaLw52011-10-04My experience here was absolutely amazing. My boyfriend and I had reservations for 7:30 pm on a Saturday night and the service was amazing and the fo021
Rosita's Placef_yQqlsim0S9YAIIYFvR5Q7nlZJW84Adt6oYn2shnn_g32013-05-30The food here is delicios, but it takes forever. From the time I ordered to when I was served 25minutes. Come only if you have time to spare to sit an000
Matador Restaurantx927gFqVNPSOPwNrKqPmmQnyHh14Vb9S269-kGKaUelg42011-08-09I've eaten at Matador almost once a month for about15 years. I won't get into the logistics of others reviews. I give the higher than average star rat001
Hanny'sriZp_RIN28ld-U2Q5dhKhAGRgBu4K7GOb3354esp_xkg42010-06-08Food = 4 stars Place = 5 stars Service = 4 stars I REALLY like this place. It looks super classy inside and the deco on both floors is a333
Crust Pizza and Wine CaferTc3d_GYXyHuf_tQoED80gAOmdmYYSeLUstcN084_wMA12012-10-16I told my friend that I'd rather eat out of a vending machine, and he said.. "Yelp THAT!" I had the calamari app, the caprese salad and the eggplant210
Ocean Air9gLTx4HjE-NeSa3KTfGJJQK0U0Hp6rgXHrYCG4jpPT8w52013-09-01Thank you! Ocean Air came highly recommended and now I know why!! Excellent, fast, friendly service!!! Reasonably price AC maintenance and FAST respon000
Sleepy Dog Brewpub9gtyU7vjWUjddmFrT97swwFm0EXFwIfDQoIm9RgcAOKQ32013-04-06As a number of others have said, the beer is good with a good number of choices. The food is pretty good too. The biggest issue has been service. The000
unPhogettableyVK0x3_-o16ufBbIyqGJRwsWh4Tjwa8ch_rziHtTN9LA52013-08-27Always amazing service! Always amazing pho! Add veggies to a meat dish! The spring rolls A1 & A6 are the best in town!!! We are now here on a weekly b111
Paradise Valley Burger CompanybkZ67PfRlKLKl4x5mIFYSgff00OcqImnNYy-OvSgUZyw52013-11-04Best burgers in town! They don't skimp on quality or ingenuity.000
Hanny's4n_3G2Xux0stcgOUsrzYawev7D2jo5OUDeHf0dWoWlsQ22012-06-25Super disappointing, they won't take reservations and I wanted to make a reservation for 20 persons. I've been here many many times and love the food 002
Golden PandaikNpO72tj7uI5VTatHpoAA80OFMLRA0yW3sE4ciYg_vA12009-10-10Why oh why is it so hard to find good Chinese food in this town? The two behind the counter at this place are Chinese - please don't tell me you eat000
Breakfast ClubU_oJEB166nCeBNY-wqadxwAMYi-53cxstrCR5wqyY1KA52011-01-11O-M-Goodness! What luck to have eaten here for breakfast!! Huge portions served with fresh fruit slices or mixed berries. Great service and very nice000
Ulta Salon Cosmetics & Fragrance9e3MOWg4zrq_NOqKP3fMcQF6QsMoJdvtohlbnST-fDyQ42011-03-14I really like this store and have been going here for as long as they have been open. 18 or 19 years. It is always exciting to be rewarded for the pro110
Essence Bakery CafŽNkekoPY-4txUxkyoN_Tu4wDrWLhrK8WMZf7Jb-Oqc7ww52012-09-14Ok, I'm not sure I ever had this pastry combination, but it was clearly a great item. It was the Chocolate almond croissant. Usually those two varieti010
zpizzaUJEPSoO6yNnR8kdneDy0rgfSi-yrKtBD58h2vPxjNE1A42010-12-01We ordered 4 rusticas for delivery using their buy-one-get-one promo online. They had a great deal on rustica pizzas, but somewhere in the fine print 211
Kona GrillosYRF4FQe4cziGIXz33eQQgITFg65GtRDUb-0n460vNg42011-03-17I've eaten at Kona Grill twice in two days while doing business in the area. I had the Kona Burger and the Pepperoni Pizza. The burger was fantastic000
White Housexn2LkVHBuRZ_jAg-LIiQ4QwFweIWhv2fREZV_dYkz_1g42011-07-25It's been a while since I took part in the nightlife in Scottsdale, it's not my scene but I was invited to White House for a party so you know how tha354
Crowne Plaza Resort Hotel San Marcos Golf ResortbMKW11Cf1Zeu1zWkzDbtrQKt9NwDONle_mc0QHTud9jw12011-07-18Rating the golf course, horrible!! Thank goodness there was no one in front of us and we zipped around the course. They clearly stopped maintaining 000
Hon Machimu8Gst6LkzG5ahmolCH55gKucBnMrhalzxnD9AWrxwYQ52011-06-21Great place for sushi and tepan - period. Not the "high-end" places but a rock solid place with lots of variety and good prices.010
Bourbon Steak a Michael Mina RestaurantAA6QQUFGWWkZlbpat46OfQcEIeuU0-4fX0Y4qCUW3PwQ22008-12-01My husband, some friends and I went to this place for restaurant week. We all sampled different dishes to experience a range of items - the multi-flav012
Tempe's Front PorchJ71o5dOSoxoOhcR8NEo4OgR4Ax3btoJ6qLXhqq6J50VQ42014-01-06This is the front porch of Monti's, my boyfriend and I were pretty confused looking for it. It's outdoors with a bunch of heating lamps, be sure to s121
Joyride Taco HousepKe_ORPqaW0vfGyFkbxdHwxv9nUSKR5RqnkgD0tufTfA42013-12-02Tried this place tonight with my boo and I am definitely a fan. He loved his carne asada burrito and my enchiladas were super tasty. They are a littl000
LunardisfpjKqP8ONJ9rT8209thIQQ9itypHULqGyO42s3zNUzOQ52018-06-11This is the nicest grocery store in the city. I actually met my wife at this grocery store while shopping for avocados.6710

Question 2: Arrange worker records based on values in two columns

In this question, aspiring data engineers are asked to arrange names in an alphabetic order.

EasyID 9836

Sort workers in ascending order by the first name and then in descending order by department name.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9836-sort-workers-in-ascending-order-by-the-first-name-and-in-descending-order-by-department-name

Understand the question

The premise of this data engineer SQL interview question is fairly simple. We need to arrange records based on names (text values) in two columns.

The question only tells us to sort records in ascending and descending orders. However, since values are text, it means we need to order rows alphabetically.

There’s a twist to this question - for one column (first names), we must order in ascending order (from A to Z). For the second column (department names), we must order them in descending order (from Z to A).

Analyze data

All the necessary information is contained in a single workers table. Let’s start by looking at the types of values in each column.

Table: worker
Loading Dataset

We have six columns.

  • worker_id contains integer values to identify workers. Most likely, we won’t have to work with this column.
  • The first_name column contains first name values. We will order records by looking at values in this column.
  • We don’t need to work with values in the last_name, salary, or joining_date columns.
  • Department names are stored in the department column. We will order records by looking at values in this column.

Looking at the table, all values look standard. It’s worth noting that first and last names are capitalized.

Plan your approach

This question asks us to order rows based on values in two columns - first_name and department. We don’t have to change rows or values in them.

We need to select all rows and use the ORDER BY statement to arrange them in a specific order.

It’s important that you know how to order rows according to values in two columns.

Write the code

Try to write the code yourself.

Missing data

Output

The correct query should return a sorted table.

All required columns and the first 5 rows of the solution are shown

worker_idfirst_namelast_namesalaryjoining_datedepartment
9AgepiArgon900002015-04-10Admin
4AmitahSingh5000002014-02-20Admin
8GeetikaChauhan900002014-04-11Admin
12JaiPatel850002014-03-21HR
13JuraJomun9800002013-05-20HR
10MoeAcharya650002015-04-11HR
1MonikaArora1000002014-02-20HR
11NayahLaghari750002014-03-20Account
2NiharikaVerma800002014-06-11Admin
7SatishKumar750002014-01-20Account
6VipulDiwan2000002014-06-11Account
3VishalSinghal3000002014-02-20HR
5VivekBhati5000002014-06-11Admin

Question 3: Find the last five records of the dataset

In this data engineer SQL interview question, data engineers are asked to select a specific part of a large dataset.

EasyID 9864

Find the last five records of the dataset.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9864-find-the-last-five-records-of-the-dataset

Understand the question

Description of this question is fairly simple.

Before attempting to answer the question, it’s a good idea to look at available data and examine values, their types, and the ordering of records.

Analyze data

There is one available table named worker. We already worked with this table in the previous question.

In this case, we are not concerned with any of the six columns or values contained in them. We simply have to find and return the last five records.

However, it’s important to look at the actual table (with data) to see if we can notice anything that will help us find the last five records.

Without further ado, let’s look at the worker table with actual data:

Table: worker
Loading Dataset

As you can see, every record has its own unique worker_id value.

It’s safe to assume that records are arranged in ascending order according to worker_id values. Double-check with the interviewer to confirm this pattern. Once you do, solving this data engineer SQL interview question becomes much easier.

Plan your approach

Before we can find and extract the last five records, we need to define the criteria for identifying them.

We’ve seen that each record has its own unique identifier of integer type. As long as these numbers are unique and increasing, counting the number of all rows will give us the id of the last record.

We can use that information to get the worker_id value of the last five records and set up the condition only to keep rows with those ids.

Or we can find the worker_id of the fifth row before the last and keep records where worker_id is higher than the threshold.

Write the code

Practice your SQL skills by answering the question yourself.

Missing data


Output

Your final query should return the last five records.

All required columns and the first 5 rows of the solution are shown

worker_idfirst_namelast_namesalaryjoining_datedepartment
9AgepiArgon900002015-04-10Admin
10MoeAcharya650002015-04-11HR
11NayahLaghari750002014-03-20Account
12JaiPatel850002014-03-21HR
13JuraJomun9800002013-05-20HR

Question 4: Primary Key Violation

In this data engineer SQL interview question, you have to find duplicate cust_id values and return the number of times they are repeated. You can find other Amazon Data Engineer Interview Questions on the StrataScratch blog.

Last Updated: May 2022

EasyID 2107

Write a query to return all Customers (cust_id) who are violating primary key constraints in the Customer Dimension (dim_customer) i.e. those Customers who are present more than once in the Customer Dimension. For example if cust_id 'C123' is present thrice then the query should return two columns, value in first should be 'C123', while value in second should be 3

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/2107-primary-key-violation

Understand the question

The question has detailed instructions, but it can be confusing. Make sure to read it multiple times to get to the essence of the question.

To solve this challenge, you must know that primary keys are unique identifiers for each record. They can not be repeated. The question refers to this as the ‘primary key constraint’.

There are also instructions for how to format the output. We need to return rows with the duplicate cust_id values and the number of times they appear.

Analyze data

Looking at the available data could help you better understand the question.

Let’s start by looking at the types of values in every column of the only available table named dim_customer:

Table: dim_customer
Loading Dataset

To solve this data engineer SQL interview question, we’ll have to find duplicate cust_id values and the number of times they are repeated. This column contains text values, specifically of the ‘varchar’ type.

cust_name, cust_city, cust_dob and cust_pin_code values are not important.

Plan your approach

In simple words, we need to find duplicate cust_id values and the number of times they are repeated.

The simplest way is to create a group of rows for each unique cust_id value, find the number of rows in each group, and return those with more than one row in the group.

Then return the number of records in each group. This will return the number of rows with the same cust_id value.

Write the code

Now that you have a logical outline of what to do, try to match it by writing the SQL query yourself.

Missing data

Output

Query should return two columns - the key and the number of times it was repeated.

All required columns and the first 5 rows of the solution are shown

cust_idn_occurences
C2762
C2812
C2743

Advanced data engineer SQL interview questions

Advanced data engineer SQL interview questions

Let’s also look at some of the more difficult SQL questions asked during data engineer interviews.

Question 5: Verify the first 4 numbers of all phone numbers

In this question, we have to validate data by checking phone number values and confirm that they begin with specific 4 numbers.

MediumID 9737

Verify that the first 4 digits are equal to 1415 for all phone numbers. Output the number of businesses with a phone number that does not start with 1415.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9737-verify-that-the-first-4-digits-are-equal-to-1415-for-all-phone-numbers

Understand the question

This data engineer SQL interview question asks us to verify that all phone numbers begin with 1415. It’s safe to assume that one of the columns contains phone number values. We will have to make sure that the first four symbols of values in this column are 1415.

It’s unclear what kind of answer we need to return. It could be binary - to show whether or not all phone numbers begin with 1415. Or we might have to return the number of phone numbers that begin with 1415 or do not begin with 1415.

In situations like this, you should ask the interviewer for specific instructions.

Analyze data

The sf_restaurant_health_violations table contains information about businesses, including their phone numbers.

Let’s take a closer look at the types of values contained in each column.

Table: sf_restaurant_health_violations
Loading Dataset

The table has a lot of columns.

The only column that contains a phone number is business_phone_number, so it’s safe to assume that we need to check values in this column.

We can confirm our assumption by looking at the actual table above.

It's important to note that not all rows have values in this column. Some are empty.

Plan your approach

To solve this question, we have to filter rows by two criteria - the value in the business_phone_number column is not empty. If it has a number, it must start with 1415.

Let’s assume that the question asks us to return the number of rows where the phone number does not start with 1415.

Finally, we should return the number of rows that satisfy both criteria.

Write the code

1. Get the number of all rows

We can use the COUNT() aggregate function to get the number of all records in the table.

SELECT COUNT (*)
FROM sf_restaurant_health_violations

COUNT() will return the total number of records in the table. Output will be collapsed into one row.


2. Set up two conditions

To get the answer, we need to find the number of records that satisfy the two aforementioned criteria.

We can add a WHERE clause to remove records that do not meet the criteria.

SELECT COUNT (*)
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL
  AND LEFT(business_phone_number :: TEXT, 4) <> '1415'
Missing data

The COUNT(*) aggregate function will apply to filtered records.

We use the LEFT() function to take four symbols from the left side (beginning) of the value, and compare it with 1415.

Output

Our query will return the number of phone number values that exist but do not start with 1415.

All required columns and the first 5 rows of the solution are shown

count
1

Question 6: SMS Confirmations From Users

In this data engineer SQL interview question, data engineers must identify and get the number of invalid records. It is one of many Facebook Data Engineer Interview Questions.

Last Updated: November 2020

MediumID 10291

Meta/Facebook sends SMS texts when users attempt 2FA (two-factor authentication) to log in. The fb_sms_sends table logs all SMS texts sent by the system.

However, due to an ETL issue, this table contains some invalid entries, specifically, rows where type = 'confirmation' or other unrelated message types (like friend requests). These records should be ignored.

Only rows with type = 'message' represent actual 2FA texts that were sent to users.

Use the fb_confirmers table to identify which of these messages were successfully confirmed by users.

Calculate the percentage of confirmed SMS 2FA messages (where type = 'message') sent on August 4, 2020.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10291-sms-confirmations-from-users

Understand the question

Read the description of this question multiple times because it’s a lot to take in at once.

In short, a table contains records that should not be there. Specifically, confirmation and friend request records. It’s our job to identify misplaced records.

Finally, we are asked to find what percentage of confirmation texts were valid on August 4.

Analyze data

You’ll have to work with two tables to find the answer.

One is the fb_sms_sends table. The question tells us that some of the records in this table are invalid.

The other fb_confirmers table contains correct confirmation records.

We need to compare these two tables to calculate the percentage of confirmed texts for a specific date.

Let’s look at columns in each of these tables:

Table: fb_sms_sends
Loading Dataset

the ds column contains datetime values, most likely the date of sending a message. We will need to check this column to find sms confirmations for the 4th of August.

The question does not ask us anything about the location or network of the user, so we can ignore the country and carrier columns.

Values in the type column might help us identify whether the record represents a valid message, or false records.

The phone_number contains a phone number that receives the confirmation message.

Let’s move on to the next table, which contains correct records of confirmation messages.

Table: fb_confirmers
Loading Dataset

Here we have just two columns, values in the date column indicate when the message was sent. Phone numbers are stored in the column of the same name.

Some of the confirmation records in fb_sms_sends table represent an unsuccessful attempt. We need to check them against information in the fb_confirmers table, which contains records of successful confirmations.

Plan your approach

We need some way to identify ‘confirmation’ and ‘friend request’ records that should not be in the fb_sms_sends table. We can do that by merging two tables on shared dimensions of the phone number and the date. If records from the ‘unverified’ table are not in the valid fb_confirmers table, it means they are invalid.

We will also have to set up a condition to only keep rows for messages that were sent on the 4th of August 2020 and remove messages sent on all other days.

Finally, we need to calculate what percentage of all messages were confirmed. Once we merge the table, we can get the number of records from the fb_confirmers table that are also present in the fb_sms_sends table. Then find the number of all records of the ‘message’ type and calculate the percentage.

Write the code

1. Perform a LEFT JOIN to filter out invalid records

To compare valid and invalid confirmation records, we need to combine data from two tables. The safest bet is to perform a LEFT JOIN, which will keep all records from the fb_sms_sends (including invalid records) and only match values from the fb_confirmers table.

SELECT *
FROM fb_sms_sends a
LEFT JOIN fb_confirmers b ON a.ds = b.date
AND a.phone_number = b.phone_number

We will use the ON clause to define two shared dimensions - the date when the message was sent and the phone number.

In the combined table, records from the fb_sms_sends table that don’t share the date and phone_number values will have empty columns. This will help us identify messages that were not confirmed.

All required columns and the first 5 rows of the solution are shown

dscountrycarrierphone_numbertypedatephone_number
2020-08-05PKat&t9812768962friend_request2020-08-059812768962
2020-08-06LTsprint9812768973friend_request2020-08-069812768973
2020-08-06VCat&t9812768976friend_request2020-08-069812768976
2020-08-04SZrogers9812768988friend_request2020-08-049812768988
2020-08-04ITat&t9812768991message2020-08-049812768991


2. Filter records by send date and their type

The question specifically tells us that we need to work with messages sent on the 4th of August in 2020. Looking at the table, we can see that dates are written in the ‘MM-DD-YYYY’ format.

We need to chain another condition to specifically select records of the ‘message’ type.

SELECT *
FROM fb_sms_sends a
LEFT JOIN fb_confirmers b ON a.ds = b.date
AND a.phone_number = b.phone_number
WHERE a.ds = '08-04-2020'
  AND a.type = 'message'

Running this query will only keep message records where the value of ds column is ‘08-04-2020’ and has a type of ‘message’.

All required columns and the first 5 rows of the solution are shown

dscountrycarrierphone_numbertypedatephone_number
2020-08-04SAat&t9812768913message
2020-08-04HNat&t9812768921message
2020-08-04TJrogers9812768934message
2020-08-04ITat&t9812768991message2020-08-049812768991
2020-08-04NRrogers9812768997message

3. Calculate the percentage of valid confirmations

We need to find the percentage of valid confirmation messages vs. all messages from the fb_sms_sends table.

Looking at the output of the previous step, we know that valid confirmation records from the fb_confirmers table have date and phone_number values.

We can use the COUNT() aggregate function to find the number of valid confirmation records and the number of all messages.

Provide phone_number or date columns as an argument to the COUNT() function to find the number of records with values that are not NULL.

Repeat the same to find the number of all message records. Only this time, the argument to the COUNT() function should be one of the five columns where all records have values.

Then divide two aggregates, and you’ll have a decimal ratio. Finally, multiply the ratio by 100 to get the percentage value.

SELECT COUNT(b.phone_number)::float / COUNT(a.phone_number) * 100 AS perc
FROM fb_sms_sends a
LEFT JOIN fb_confirmers b ON a.ds = b.date
AND a.phone_number = b.phone_number
WHERE a.ds = '08-04-2020'
  AND a.type = 'message'
Missing data

The result of the COUNT() function is always a whole number. Don’t forget to convert it to a float so it can be divided by another whole number.

Finally, use the AS keyword to give the final column a descriptive name.

Output

Query should return one row with a single percentage value.

All required columns and the first 5 rows of the solution are shown

perc
20

Question 7: Find the first 50% of the dataset

In this question, candidates have to return the first half of all rows in the table.

MediumID 9859

Find the first 50% records of the dataset.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9859-find-the-first-50-records-of-the-dataset

Understand the question

The premise is fairly simple - we need to return the first half of all rows in the dataset.

Analyze data

We are given one table named worker.

Let’s look at the types of values in every column:

Table: worker
Loading Dataset

To answer this data engineer interview question, we don’t need to work with any of these columns. We just have to return the first half of all rows.

While looking at data, you might notice that every record seems to have an incremental worker_id value.

This is a limited example with only 8 rows. In reality, data engineers work with much larger datasets.

Plan your approach

To get the first 50% of all rows, first, we must find the total number of rows in the table.

Once we do, we can divide the total by 2. This will give us half of all rows, let’s call this number X. We can tell SQL to return the first X number of rows, but not more.

Rows already have worker_id values, which appear unique and incremental. We can set a condition to return records where the worker_id value is less than X (a number that represents half of all records). However, there are no guarantees that these values are unique and/or incremental.

Instead, we can use window functions to assign each row a unique incremental number and tell SQL to return records with numerical values of less than X (half of all records).

Write the code

If you like a challenge, try to write a query yourself.

Missing data

Output

The query should output four records because the example data showed 8 records.

All required columns and the first 5 rows of the solution are shown

worker_idfirst_namelast_namesalaryjoining_datedepartment
1MonikaArora1000002014-02-20HR
2NiharikaVerma800002014-06-11Admin
3VishalSinghal3000002014-02-20HR
4AmitahSingh5000002014-02-20Admin
5VivekBhati5000002014-06-11Admin
6VipulDiwan2000002014-06-11Account

Question 8: Formatting Names

This question involves cleaning up data and ensuring its consistency.

EasyID 9831

Print the first name after removing white spaces from the left side.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/9831-formatting-names

Understand the question

The task is fairly simple - remove unnecessary white spaces from the left side of first name values.

The question asks us to print (return) trimmed values.

Analyze data

Let’s start by looking at the table and dataset available for this question:

Table: worker
Loading Dataset

To solve this data engineer SQL interview question, we are only concerned with values in the first_name column.

Plan your approach

We need to take first name values and trim them.

The question tells us to remove white spaces only from the left side of first name values. So the standard TRIM() function is not going to work.

Write the code

Solving this challenge would be a great practice before going into a data engineering interview.

Try to write the query yourself:

Missing data


Output

The query should return only trimmed first name values and nothing else.

All required columns and the first 5 rows of the solution are shown

first_name_trim
Monika
Niharika
Vishal
Amitah
Vivek
Vipul
Satish
Geetika
Agepi

Question 9: Rows With Missing Values

In this data engineer SQL interview question, we have to find and return records with multiple empty columns.

Last Updated: April 2022

MediumID 2106

The data engineering team at YouTube want to clean the dataset user_flags. In particular, they want to examine rows that have missing values in more than one column. List these rows.

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/2106-rows-with-missing-values

Understand the question

The objective is clear - identify rows with more than one missing value and return them.

Analyze data

Let’s look at columns in the user_flags table:

Table: user_flags
Loading Dataset

As you can see, all columns contain text values.

We’ll have to check all columns to find out whether they are empty or have values in them.

The last record has two empty columns. It is a prime example of the kind of records we’re looking for.

Plan your approach

The question asks us to return rows with missing columns, so we need to:

  • Use the WHERE clause to remove rows that don’t have two or more missing values.
  • Set up a complex condition (using CASE statement) to identify and keep count of empty columns for each row.

Write the code

Try to write the query yourself.

Missing data


Output

The final query should return all the records with more than one missing value.

All required columns and the first 5 rows of the solution are shown

user_firstnameuser_lastnamevideo_idflag_id
CourtneydQw4w9WgXcQ
GinaKorman
Greg5qap5aO4i9A
Ct6BUPvE2sM

Question 10: Find the combinations

As a data engineer, sometimes you’ll have to prepare data for analysis. For example, extract data that meets certain criteria.

MediumID 10010

Find all combinations of 3 numbers that sum up to 8. Output 3 numbers in the combination but each combination must use three different rows (do not reuse the same record).

Go to the Question

Link to the question: https://platform.stratascratch.com/coding/10010-find-the-combinations

Understand the question

You might have to read this data engineer SQL interview question multiple times to understand it fully.

We are given a list of numbers and need to find and output various combinations of three numbers that add up to 8.

There is an extra condition - we can not add a number to itself. Every number in the equation must be unique.

Analyze data

The transportation_numbers table is essentially a list of numbers. It has two columns, index and number, and both of them contain integer values.

We need to go through the list and pick three that add up to 8.

Table: transportation_numbers
Loading Dataset

It looks like all values in the number column are below 8.

Plan your approach

The initial table has one column with numbers.

We need to combine the table with itself three times, so that every record contains three number values.

If you use JOIN to combine tables, you can set the ON condition to ensure that all three numbers are different from each other.

Finally, you need to set up a filter that checks if three number columns add up to 8.

Write the code

Try to solve this interesting challenge yourself.

Missing data


Output

All required columns and the first 5 rows of the solution are shown

num_1num_2num_3
503
341
314
305
701
035
125
170
107
413
071
431
152
017
530
710
251
134
053
512
350
143
521
215

Summary

Hopefully, by now, you better understand the day-to-day responsibilities of a data engineer and what a data engineer does and the importance of their work.

To do the job successfully, data engineers need to have a very specific set of skills, one of them is SQL. In this article, we listed the most important SQL concepts and walked you through data engineer SQL interview questions asked by reputable employers.

Practice on these questions to maximize your chances of landing a data engineer job. Go to the Data Engineer Interview Questions post on the StrataScratch blog to find more questions to solve. The platform has hundreds of SQL interview questions of various levels of difficulty. You can write queries, check their correctness, get hints and guidance all within one platform. Soon enough, you’ll master SQL and be prepared to solve any interview questions with confidence.

Share