Solving LeetCode Two Sum Problem for Data Science Interviews

LeetCode Two Sum Problem

This is an introductory level question where you have to find two numbers from a list that add up to a specified value. Before writing complex algorithms to work with data, you should aim to understand the task at hand.

Link: https://leetcode.com/problems/two-sum/

Solving the LeetCode Two Sum Problem

Choosing the right solution comes down to factors like execution time and efficient use of storage memory. Data science companies are looking to hire data scientists who can follow instructions and write efficient algorithms.

Framework to Solve this LeetCode Two Sum Problem

To work as a data scientist, you need to have a proven way to approach problems in your day-to-day job. This is a mentally exhausting process, so you’ll need a system to help keep your focus on the question. This way, you can ensure that your algorithms are efficient and answer the question comprehensively.

When approaching a data science problem, you should always study the question first. It’s much easier to formulate an approach to the question if you understand it logically.

In order to fully digest the question, pay attention to all the important aspects of it: available data, conditions for the task, and possible limitations.

Here are few broad steps for approaching the problem:

1. Understand your data

Question descriptions on LeetCode include samples of data. Study these carefully. Specifically, pay attention to the type of input data, range of values, and expected output. Consider these factors to write an error-free solution to the question.
Look for edge cases in available data. If you find any abnormalities, adjust your algorithm to handle them logically.
Pay attention to what the question description mentions about the data. For this question, it is specified that the nums array (available data) contains exactly one pair of values that add up to the target number. Considering such details can help you write an algorithm that produces consistent results.
Consider constraints and limitations of available data. For instance, minimum and maximum range of values and number of items in the array.

2. Formulate your approach

Think of your algorithm as a combination of few operations. Create a step for implementing each operation. This way, you can easily approach even the most difficult questions.
Decide which features of Python you are going to use. For example, in this question, it’s obvious that we have to loop through an array. It’s safe to assume that we’ll have to set up a for loop.
Describe your solution to an interviewer. If you’re on the right path, it could be an opportunity to display your problem-solving skills. If you get stuck, you can still show that you can partially solve the problem. You might be missing something trivial, and the interviewer might help you proceed with your solution.
When thinking of an answer, pay close attention to the limitations of available data.

3. Code execution

Start with the simplest, brute force approach. Gradually, as you’re working on the problem, you’ll start to get ideas for fine-tuning your solution.
Try to think of a solution where you can reduce execution time by storing numbers in the object. When you’re looping through the array, check the object and find a suitable pair for the current number.
Don’t overcomplicate your algorithm. Try to keep your code simple.
Describe your code as you write it. Communication is a rare but valuable skill among data scientists.

Understand Your Data

Looking at actual data and expected output can help you understand what your algorithm needs to do. This way, you can digest the question faster than reading about the task.

For this LeetCode Two Sum question, we have three sample arrays, target values, and expected output. The first example also includes an explanation for the output.

Understanding data for solving leetcode two sum question

Looking at the data, we can notice a few patterns:

The third example shows us that an array can contain two instances of the same number. If two array items add up to the target number, we must return their indices, even if they are the same number.
However, we are prohibited from adding one array item to itself to get the target number. In short, the answer to this question must be two different indices of two array items. Not an index of one number that we summed with itself to get a target number.
The algorithm should output indices, not the numbers themselves. When coming up with a solution, we have to keep track of the indices of each value. Also, keep in mind that Python starts counting indexes from 0, not 1.
Data demonstrates that every nums array has just one pair of values that adds up to the target number.

During an interview, it’s important to be aware of the constraints. For this LeetCode Two Sum question, constraints describe minimum and maximum length of an array (2 to 10 000 array items), minimum and maximum values of the numbers in the array, minimum and maximum value of target, and the fact that every array contains just one pair of numbers.

Our solution should be aligned with these constraints, and return the correct answer for any variation of values within this range.

Before starting to write the code, it’s useful to think about what the target value represents. It is a sum of two numbers from the array. We have to write an algorithm that finds which pair of values can be added up to get the target value. For each number in the array, we’ll look for another number that can ‘fill the gap’ between the current number and target value.

Solution 1:

The obvious solution to this LeetCode Two Sum question is to check each number against every other number in the array to see if their sum is equal to target value.

Formulate your approach

This is a ‘brute force’ approach, where we’ll need two for loops to check each number against other numbers in the array. We will have three broad steps:

Define a function and its arguments
Create a loop to iterate over numbers in array
Add a nested loop to cross-check the current number with other numbers in the array.

To find the right answer, we have to abide by two conditions. One is that we must add up two distinct numbers from the array. It can be two instances of the same number, but not a single instance of a number added to itself to produce target value.

In the first iteration, we check the first number against all other items in the array. When we move on to the next number, we don’t have to go back to check it with the first number again, because we already checked that pair during the first iteration.

This approach allows us to do fewer calculations and execute our algorithm a little faster. It also ensures that we won’t add the number to itself to get the target value.

Code Execution

First we define a function that accepts two arguments: nums and target. Nums is the array of numbers, and target is the value we want to get by adding two different numbers from the array.

Python allows us to specify the types of values. nums value is a list, and target value is an integer. The function will return a list of two indices.

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        for i in range(len(nums)):
            for j in range(i + 1, len(nums)):
                if nums[j] == target - nums[i]:
                    return [i, j]

Note: When running the code on LeetCode, please select Python3 in the code editor for the code to work.

Inside the function, we have a loop that goes over every item in the array. Within this loop, we have a nested loop to check the current number against all others in the array. If we find a pair of numbers that add up to the target value, we return the indices of two values.

Note that in the nested loop, we go over every number that comes after the current iteration in the main loop. The numbers that come before the current iteration are already checked.

This solution isn’t very fast because we use nested loops to find a pair of numbers.

Each for loop takes O(n) time to execute for every item in the array. For every element, our algorithm executes many nested loops. This slows down our function. For this reason, time complexity for this algorithm is O(n2), where n is the number of items in the array.

In this case, space complexity is constant O(1), and does not change depending on the size of the array.

As you can see, the execution time for this algorithm grows astronomically with the number of items in the array. This is a good basic approach to start thinking about the problem, but it needs improvements.

Solution 2:

It’s possible to write an algorithm where the execution time does not dramatically increase with the volume of data.

Formulate your approach

To make our algorithm more efficient, we need to understand the target value in relation to numbers in the array. If we have an X number in the array, our goal is to find the value to ‘bridge the gap’ between this number and target value. It’s safe to say that we’ll be looking for the difference between target and X values.

Another key component of this approach is to create a hash table to store values (and their indices) as we iterate over them. While we loop over every item in the array, we will store these numbers in a hash table.

Once we create the hashtable object, we don’t need a nested for loop to check every number against other numbers in the list. Storing data in the object requires memory space, so this solution cuts down on runtime complexity at the expense of increasing space complexity.

For every iteration, we will check if the difference between the current number and target value is in the hashmap. If it is, we should return the indices of the pair. The question states that every array contains just one pair of values that add up to the target value. By the time we get to the last item in the array, we are guaranteed to find a pair.

Code Execution

We define the twoSum function that accepts the same arguments, with the same types, and output as in the first approach.

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        hashmap = {}
        for i in range(len(nums)):
            hashmap[nums[i]] = i
        for i in range(len(nums)):
            difference = target - nums[i]
            if difference in hashmap and hashmap[difference] != i:
                return [i, hashmap[difference]]

In the first line of the function body, we initiate an empty hashmap object. This is where we’ll store numbers and their index in the array.

In this algorithm, we loop through the array two times. In the first iteration, we go over every number in the array. For every iteration, we add the number and its index as key-value pairs to the hashmap object.

In the second iteration, we find the difference between a current number and target value. Then we check if the difference is in the hashmap. If the object contains this value, we return indices of the current number and its pair from the hashmap object.

Storing values in the hashmap object allows us to find the answer by iterating through the array twice. We don’t have to check every item against all other items, so the algorithm executes much faster. Time complexity for this function is O(n).

The reduction in runtime comes at the expense of increase in space complexity. The more elements we have in the array, the more memory space it will take to store their values and indexes. Space complexity for this algorithm will be O(n).

Solution 3:

In the third approach we update the hashmap and find a pair of values using just one for loop.

Formulate your approach

Similar to the previous approach, we create an object to store numbers and their respective indices as we iterate through the array. Similarities don’t end there. We also have to calculate the difference between a specific number and target value.

We must also check if the difference between the current number and target value is stored in the object. If it’s there, it means that we found the pair that adds up to the target value, and we can return indices of the current number and the complementary number.

If you can’t find a pair, then we add the current number and its index to the hashmap object and continue checking numbers in the array.

The main difference is that in this approach, we accomplish all of this in one for loop instead of two.

You must remember to return indices, not the numbers themselves. This adds an additional layer of complexity to the task.

Code Execution

This function takes the same arguments as the first and second algorithms discussed above.

Inside the function, we create an object to store values and their indices as key-property pairs.

class Solution:
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        hashmap = {}
        for i in range(len(nums)):
            difference = target - nums[i]
            if difference in hashmap:
                return [i, hashmap[difference]]
            hashmap[nums[i]] = i

In this solution, we have only one loop. In it, first, we create a difference variable to store the difference between current number and target value.

As we’re checking numbers in the array, we also check if the difference is in the hashmap object. If it isn’t, we store the array item and its index as key-value pairs in the object.

The hashmap object stores previously checked numbers with their indices. If any of the numbers in this object can be summed with the current number to get the target value, we can return the indices of the pair.

In this algorithm, our for loop executes once for every item in the array. Therefore the relationship between number of array items and execution time can be described as O(n).

Algorithms for second and third solutions have O(n) time complexity, which means that with additional data, their execution time increases at a linear pace.

If the three algorithms were applied on one list, the third would be executed the fastest, because it contains only one for loop.

As we loop through the array, we store every number and its index in the hashmap object. This algorithm has a space complexity of O(n).

Comparison of Approaches to Solve this LeetCode Two Sum Question

There are few important differences between the three solutions. The first, and most important is the execution time. Naturally, every algorithm takes longer to execute if it is applied to a higher volume of input data. Still, it’s important to measure the relationship between execution time and volume of data.

In simple words, time complexity describes how the volume of data is correlated with execution time of the algorithm. First algorithm has a time complexity of O(n2), which means that the execution time grows at an exponential pace with the size of input. This is not very efficient for checking large arrays.

However, it can help you wrap your head around the question, further refine your algorithm and improve its efficiency. Still, if the brute force approach should not be your final answer to the question.

Therefore, this LeetCode Two Sum question asks us to find an algorithm that is faster than O(n2) time complexity.

In the second algorithm, we introduce an object to store all the numbers as we iterate through the array. This means that for every item, we will have to execute two operations. One to store the current number (with its respective index) in hashmap object, and another to look for complementary values in hashmap.

This algorithm is designed to execute much faster, mainly because it doesn’t rely on a nested loop. Instead we create an object to store previously checked numbers. This requires memory space, which increases the algorithm's space complexity.

The third algorithm is simply a more efficient version of the second approach. Instead of twice looping through an array, we accomplish the same in one loop. This way our algorithm executes even faster than before. Space complexity remains the same, because we still use hashmap to store the numbers and their indices.

Questions on the LeetCode have a ‘Discuss’ section, where you can look at other people’s solutions and discussions around every approach. Looking at different solutions and learning about their unique features can help you become a better data scientist. After a while, you’ll have enough knowledge to write efficient solutions to any question.

Conclusion

In this article, we explored three different ways to approach one problem. Hopefully, by now it’s clear why it’s so important to digest the question and look at data before writing an algorithm.

Solving this LeetCode Two Sum question can teach you how to write basic algorithms to work with data. It can also introduce you to concepts like time and space complexity.

Explore the StrataScratch platform to further train your mind on real data science interview questions. The blog post “Top 30 Python Interview Questions” is an overview of 30 questions with answers. You can even look at other people’s solutions to understand multiple different ways to approach a problem.

You can also check our posts on concepts like “How To Use LeetCode For Data Science SQL Interviews”, “LeetCode Python Solutions for Data Science” and “LeetCode vs HackerRank vs StrataScratch for Data Science”.