Module 5: Date, Time & Text Functions35 min

String Functions

Progress Tracking

Log in to save this lesson and continue from where you left off.

Log in

Cleaning and Transforming Text

We’ll be honest: most of your time as a data analyst isn’t writing elegant analytical queries. It’s cleaning up someone else’s mess. Names with trailing spaces. Addresses stored as “city, state zip” in a single field. Phone numbers in twelve different formats.

String functions are your cleanup toolkit. Master these, and you’ll spend less time fighting data and more time analyzing it.

Dialect Comparison: String Concatenation

Concatenation syntax:

  • PostgreSQL — || or CONCAT()
  • MySQL — CONCAT()
  • MSSQL — + or CONCAT()
  • Oracle — || or CONCAT()

Behavior with NULL:

  • PostgreSQL — || returns NULL; CONCAT() ignores NULL
  • MySQL — CONCAT('a', NULL) returns NULL; use CONCAT_WS() to skip NULLs
  • MSSQL — 'a' + NULL returns NULL; CONCAT() ignores NULL
  • Oracle — || and CONCAT() both treat NULL as empty string, returning the non-NULL value
NULL handling differs across dialects

Operators (||, +) return NULL when any argument is NULL — except in Oracle. We’ve seen this silently break reports when one field is unexpectedly NULL. Use CONCAT() (which skips NULLs in PostgreSQL/MSSQL/Oracle) or COALESCE() for safer concatenation.

Safe Concatenation with COALESCE

In production, always assume data can be NULL. Here’s the defensive pattern:

SQL
-- Handle potential NULLs (no awkward double spaces)
SELECT
  COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') AS full_name
FROM users;

-- Even better: TRIM the result to handle missing first OR last name
SELECT
  TRIM(
    COALESCE(first_name, '') || ' ' || COALESCE(last_name, '')
  ) AS full_name
FROM users;

Flags per Video

Use CONCAT with COALESCE to safely combine user_firstname and user_lastname, handling NULL values.

Table: user_flags
user_firstnameuser_lastnamevideo_idflag_id
RichardHassony6120QOlsfU0cazx3
MarkMayCt6BUPvE2sM1cn76u
GinaKormandQw4w9WgXcQ1i43zk
MarkMayCt6BUPvE2sM1n0vef
MarkMayjNQXAC9IVRw1sv6ib
1
Flags per Video
View solution

For each video, find how many unique users flagged it. A unique user can be identified using the combination of their first name and last name. Do not consider rows in which there is no flag ID.

Tables: user_flags

SUBSTRING: Extracting Parts of Strings

Extract a portion of a string by position:

2
Extract Initials and Name Length

Extract the first 3 characters of each first_name and show the full name length. Remember: SQL positions are 1-indexed.

Tables: worker
1-indexed, not 0

Position is 1-indexed in SQL (not 0 like Python or JavaScript). The first character is position 1. This trips up many programmers switching to SQL.

Dialect Comparison: Substring

Substring:

  • PostgreSQL — SUBSTRING(s, start, len)
  • MySQL — SUBSTRING(s, start, len)
  • MSSQL — SUBSTRING(s, start, len)
  • Oracle — SUBSTR(s, start, len)

Left N chars:

  • PostgreSQL — LEFT(s, n)
  • MySQL — LEFT(s, n)
  • MSSQL — LEFT(s, n)
  • Oracle — SUBSTR(s, 1, n)

Right N chars:

  • PostgreSQL — RIGHT(s, n)
  • MySQL — RIGHT(s, n)
  • MSSQL — RIGHT(s, n)
  • Oracle — SUBSTR(s, -n)

Case Conversion: UPPER and LOWER

Case inconsistency is a common cause of “why doesn’t my JOIN work?” bugs. Standardize before comparing:

SQL
-- Convert to lowercase for case-insensitive comparison
SELECT *
FROM products
WHERE LOWER(category) = 'electronics';

-- Format for display
SELECT
  UPPER(first_name) || ' ' || UPPER(last_name) AS name_caps
FROM employees;

Good news: UPPER(s) and LOWER(s) work identically across all major dialects.

3
Find the Admin Department

Find all employees in the Admin department using a case-insensitive comparison.

Tables: worker

TRIM: Removing Whitespace

Invisible trailing spaces are insidious. ‘Admin’ and ‘Admin ’ look the same but won’t match. TRIM is your friend:

SQL
-- Remove leading/trailing spaces
SELECT TRIM('  hello world  ');
-- 'hello world'

-- Clean messy data before comparing
SELECT
  TRIM(product_name) AS clean_name
FROM products
WHERE TRIM(product_name) = 'Widget';

Variants: LTRIM and RTRIM

SQL
-- Left trim only
SELECT LTRIM('  hello');
-- 'hello'

-- Right trim only
SELECT RTRIM('hello  ');
-- 'hello'
Suspect whitespace when matches fail

When debugging “this should match but doesn’t” issues, always suspect whitespace. Adding TRIM() to both sides of JOINs on string columns is a good reflex.

Percentage of Shipable Orders

Use TRIM to clean address data before joining, then calculate the percentage.

Table: orders
idcust_idorder_dateorder_detailstotal_order_cost
132019-03-04Coat100
232019-03-01Shoes80
332019-03-07Skirt30
472019-02-01Coat25
572019-03-10Shoes80
Table: customers
idfirst_namelast_namecityaddressphone_number
8JohnJosephSan Francisco928-386-8164
7JillMichaelAustin813-297-0692
4WilliamDanielDenver813-368-1200
5HenryJacksonMiami808-601-7513
13EmmaIsaacMiami808-690-5201
4
Find the percentage of shipable orders
View solution

Find the percentage of shipable orders. Consider an order is shipable if the customer's address is known.

Tables: orders, customers

LENGTH: String Length

Count characters in a string:

SQL
SELECT LENGTH('Hello');
-- 5

-- Find long product names (maybe truncated in reports)
SELECT
  product_name,
  LENGTH(product_name) AS name_length
FROM products
WHERE LENGTH(product_name) > 50;

Dialect Note

String length:

  • PostgreSQL — LENGTH(s)
  • MySQL — LENGTH(s)
  • MSSQL — LEN(s)
  • Oracle — LENGTH(s)

Finding Substrings: POSITION

Find where a substring appears:

SQL
-- POSITION(substring IN string)
SELECT POSITION('@' IN 'user@email.com');
-- 5

-- Find emails with specific domain
SELECT email
FROM users
WHERE POSITION('@gmail.com' IN email) > 0;

Dialect Comparison: Position/Index

Find position:

  • PostgreSQL — POSITION(sub IN s)
  • MySQL — LOCATE(sub, s)
  • MSSQL — CHARINDEX(sub, s)
  • Oracle — INSTR(s, sub)

Extracting Parts with POSITION + SUBSTRING

Combine POSITION and SUBSTRING to extract dynamic portions:

SQL
-- Extract domain from email (everything after @)
SELECT
  email,
  SUBSTRING(email, POSITION('@' IN email) + 1) AS domain
FROM users;

Combining String Functions

Real data cleaning chains multiple functions. Here’s a pattern we use constantly:

5
Build Clean Name and Department

Create a clean_name column (uppercase trimmed first + last name), a clean_dept column (lowercase trimmed department), and sort by name_length descending.

Tables: worker

Key Takeaways

  • Use || or CONCAT() to combine strings (watch NULL behavior — it’s a silent killer)
  • SUBSTRING extracts parts of strings by position (1-indexed, not 0!)
  • UPPER/LOWER standardize case — use before comparing or joining on text
  • TRIM removes leading/trailing whitespace — suspect it whenever matches fail
  • LENGTH returns string length (LEN in MSSQL, and watch trailing spaces)
  • POSITION finds where substrings appear — combine with SUBSTRING for extraction

What’s Next

These core functions handle most string work. But sometimes you need heavier machinery: splitting comma-separated values into rows, pattern matching with regular expressions, and extracting the Nth element from delimited data. That’s next.