Data Engineer Things

Things learned in our data engineering journey and ideas on data and engineering.

Follow publication

3 Ways to Handle NULL Values in SQL

A practical guide to handle NULLs like a pro from the source and existing table

Balu Rama Chandra
Data Engineer Things
7 min readSep 16, 2023

Image by Author

About me

Hi there! I’m Balu. I have 7 years of experience working in data space in various roles within telecommunication, online streaming, and agri-tech industry. My area of interests are data & analytics engineering and data ops. I mostly write about technical skills for data, data modeling, data quality, and building robust ETL/ELT pipelines.

Introduction

Whether you are a data engineer, data analyst or data scientist, NULL values are always there to give you a headache. You can hardly avoid it and you need to handle them properly to maintain your data integrity, consistency, and avoid calculation errors.

Talking from experience, I can still remember it clearly when I just started my career in data. I missed handling the NULLs in my SQL query and the business reports that I produced were inaccurate and misled my stakeholders. I did not properly account the NULLs when aggregating the data. The results of aggregate functions like SUM, COUNT, and AVG can be skewed by NULLs. So be wary of that!

With that said, in this article I will share 3 most popular methods in handling NULL values in SQL, so that you can maintain your data integrity and accuracy. For illustration purpose, we will use the table below (named as orders). This table has 7 columns and has some NULLs in it. So how to handle them? Let’s discuss it the next section.

Our orders table

To follow along, you can use sqliteonline and copy & paste below query to create the above table.

CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_name VARCHAR(50),
order_date DATE,
product_name VARCHAR(100),
quantity INT,
discount DECIMAL(10, 2),
total_price DECIMAL(10, 2)
);

INSERT INTO orders (order_id, customer_name, order_date, product_name, quantity, discount, total_price)
VALUES
(1, 'John Doe', '2023-09-15', 'Product A', 3, 1.20, 150.75),
(2, 'Jane Smith', '2023-09-16', 'Product B', 2, 1.01, 75.50),
(3, 'Alice Johnson', '2023-09-17', 'Product C', 5, 0.5, 199.95),
(4, 'Bob Brown', '2023-09-18', 'Product A', 1, NULL, 112.9),
(5, 'Eva Davis', '2023-09-19', 'Product D', 4, 4.23, 299.80),
(6, 'Michael Wilson', '2023-09-20', 'Product B', 3, 0.61, NULL),
(7, NULL, '2023-09-21', 'Product E', 2, 1.41, 129.98),
(8, 'David White', '2023-09-22', 'Product C', NULL, 0.98, 79.98),
(9, 'Linda Johnson', '2023-09-23', 'Product A', 6, 3.87, 301.50),
(10, 'William Smith', '2023-09-23', 'Product D', 3, 0.43, 224.85);

Method 1: Using COALESCE function

This is the most popular method in handling the NULLs. This function accepts a number of expressions and return the first non-null expression. You can add more than 2 expressions in the function. You can think of this expression as the default value that the function will use to replace the NULLs in your data.

COALESCE(expression_1, expression_2, ..., expression_N)

An an example below, the customer_name column in orders table has a missing value (NULL). To handle this, we can use this expression: COALESCE(customer_name, 'John Doe') .

It checks if there is NULL value in each row of the customer_name column. If there is, then it will return or substitute the NULL with 'John Doe'. If there isn’t, it returns the actual customer name.

COALESCE function applied on customer_name column that has NULL

Here is the SQL query that you can try to produce the above result:

SELECT
order_id,
customer_name,
COALESCE(customer_name,'N/A') AS customer_name_coalesce
FROM orders

Method 2: using CASE statement

While CASE statements are usually used to group our data into more meaningful categories, we can also use it to conditionally handle NULL values and provide custom replacements.

CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
...
ELSE default_result
END

Under the CASE , you need to specify your condition in the WHEN statement and your desired action (value) in THEN statement (if the condition returns TRUE). The ELSE statement is optional and it only runs when none of the WHEN conditions return TRUE. Let me give you as example.

In our orders table there is a transaction that has NULL value for discount and suppose based on business knowledge we know that this transaction did not receive any discount, but the system somehow recorded it as NULL instead of 0. So if you want to display or convert the NULL to 0 for your business reporting or presentation you need to write this condition in the WHEN and THEN statement.

SELECT
order_id,
product_name,
discount,
CASE
WHEN discount IS NULL THEN 0
ELSE discount
END AS discount
FROM orders
Converting NULL to 0 with CASE statement

Method 3: using DEFAULT constraints

When designing or creating your tables, you can use the DEFAULT constraint to set a default value to a column. So in case there is a missing information during data ingestion, it will use the default value instead of NULL.

💡 This can be helpful in avoiding NULL values altogether.

Here is an example of how to use a DEFAULT constraint during table creation. In this example we define registration_date column with a DEFAULT constraint that sets the CURRENT DATE as the default value. So if there is a missing registration date during data ingestion, it will default to the current date or today’s date.

CREATE TABLE users (
user_id INT PRIMARY KEY,
user_name VARCHAR(50),
registration_date DATE DEFAULT CURRENT_DATE
);

But what if I already have an existing table? Do I need to drop and create my table from the beginning again? NO! If you already have an existing table you can use the ALTER TABLE statement and add the DEFAULT constraint within it.

ALTER TABLE users
ALTER COLUMN registration_date SET DEFAULT CURRENT_DATE;

But please take note that, this constraint will only affect the future ingested data, not the existing ones. With this method, the new data inserted into the table will automatically has default value even when the value is not explicitly provided by the source system.

Putting it all together

Now let’s use the 3 methods that we just discussed to handle the rest of the NULLs in our orders table.

For illustration purpose, suppose that the data ingested from our source system sometimes may not contain order_date information. Hence, the first thing that we need do is to add a DEFAULT constraint (method-3) on our order_date column. And since we already have an existing table, this can be done using the ALTER TABLE statement. We know that the ingestion is done on a daily basis so all the transactions that have missing order date will be defaulted to the current date of the ingestion. Here is the ALTER TABLE statement with DEFAULT CURRENT_DATE constraint on order_date column.

ALTER TABLE orders
ALTER COLUMN order_date SET DEFAULT CURRENT_DATE;

And now let’s insert a new data with missing order_date information.

INSERT INTO orders (order_id, customer_name, product_name, quantity, discount, total_price)
VALUES
(11, 'John Boll', 'Product A', 3, 1.20, 150.75);

And if you run SELECT * FROM orders you should see a new record with the order_date defaulted to your current date.

Here is how our orders table look like now after a new data inserted

As we can we see from our orders table above, we have 2 more NULLS in our table, each in quantity and total_price column. Let me show you how to handle them using CASE statement and COALESCE function.

SELECT 
order_id,
customer_name,
product_name,
quantity AS quantity_original,
-- Handle the NULL in quantity column with CASE statement
CASE product_name
WHEN 'Product A' THEN 1
WHEN 'Product B' THEN 2
WHEN 'Product C' THEN 3
WHEN 'Product D' THEN 4
ELSE 5
END AS quantity_new,
total_price AS total_price_original,
-- Handle the NULL in total_price column with COALESCE function
COALESCE(total_price, quantity*35 - discount) AS total_price_new
FROM orders
This table show before vs after NULLs are handled for quantity and total_price column

If you have followed along and applied the above methods to our orders table, you should have a table similar to this with zero NULLs in it.

Our final orders table

Conclusion

In summary, while COALESCE is the most popular and flexible function to handle NULL values, but I personally believe that we should handle NULL values during database and table design phase. The decisions about whether a column should allow for NULLs or not can impact how the database behaves and how the queries are structured.

So how do you handle NULL values? Is your approach different that mine? Or you have a better approach? Let me know your thoughts and let’s get connected. Thank you for reading!

Sign up to discover human stories that deepen your understanding of the world.

Responses (5)

Write a response

Great read!

--

very informative

--

Can also use IsNull.
IsNull([Field], ValueIfNull)

--