EXP-SUM-LOG and LOG-SUM-EXP

I recently had to find the product of all rows for a specific column in SQL. I expected to find an aggregate function similar to SUM, named PRODUCT, but most SQL dialects don't have one. Instead, there is a trick: take the logarithm of each value, sum those results, and then exponentiate the sum to obtain the product.

EXP-SUM-LOG

SQLFiddle

CREATE TABLE numbers (
    number int
);

INSERT INTO numbers VALUES (0), (2), (3), (5), (7), (11);

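-- LOG() with a single argument is the natural logarithm in MySQL and SQL Server;
-- in PostgreSQL use LN() instead, since LOG() there is base 10.
SELECT EXP(SUM(LOG(NULLIF(number, 0)))) FROM numbers;

-- Yields: 2310 (floating-point rounding may show up in the last digits)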

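Note that NULLIF turns 0 into NULL, because the logarithm of zero is undefined. SUM skips NULLs, so the query returns the product of the non-zero values (2310) rather than 0. Negative values have no real logarithm either, so the trick only works directly on positive numbers. It works because the logarithm and the exponential are inverse functions. You can use any base for the logarithm as long as you raise that same base to the sum; the example above uses the natural logarithm with base e (2.718...).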

In this case, we're making use of the product property (as long as the bases are the same): $$ \log_b(M) + \log_b(N) = \log_b(M \cdot N) $$
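Applying the property once per row extends it to any number of positive values. With the natural logarithm (b = e) and the non-zero rows of the sample table:

$$ \ln 2 + \ln 3 + \ln 5 + \ln 7 + \ln 11 = \ln(2 \cdot 3 \cdot 5 \cdot 7 \cdot 11) = \ln 2310 $$

$$ \exp\left(\sum_{i=1}^{n} \ln x_i\right) = \prod_{i=1}^{n} x_i, \qquad x_i > 0 $$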

Since taking the exponent reverses the logarithm, the result is the product of the original row values.
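Because NULLIF skips the zero row, the query above returns 2310 even though the column contains a 0, and it cannot handle negative numbers at all. A minimal sketch of a signed, zero-aware version follows, assuming PostgreSQL (where LN() is the natural logarithm) and the numbers table from above; the `product` alias is just an illustrative name. It multiplies the absolute values in log space and restores the sign by counting negative values.

```sql
-- Sketch assuming PostgreSQL: LN() is the natural log there (LOG() is base 10).
SELECT CASE
         -- no non-NULL rows: there is no product to report
         WHEN COUNT(number) = 0 THEN NULL
         -- a single zero makes the whole product zero
         WHEN SUM(CASE WHEN number = 0 THEN 1 ELSE 0 END) > 0 THEN 0
         -- multiply magnitudes in log space, then restore the sign:
         -- an odd number of negative values makes the product negative.
         -- NULLIF is still needed because PostgreSQL computes aggregates
         -- before CASE picks a branch, and LN(0) raises an error.
         ELSE EXP(SUM(LN(ABS(NULLIF(number, 0)))))
              * CASE WHEN MOD(SUM(CASE WHEN number < 0 THEN 1 ELSE 0 END), 2) = 1
                     THEN -1 ELSE 1 END
       END AS product
FROM numbers;
```

For the sample table this returns 0, because of the zero row; without that row it returns 2310 (up to rounding), and for values such as -2 and 3 it returns -6.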

LOG-SUM-EXP

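SQLFiddle

It seemed like the opposite sequence of operations would work as well, although exponentiating even moderately sized numbers can produce a value too large to represent (EXP(1000), for instance, already exceeds the range of a double-precision float).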

-- Same caveat as above: LOG() must be the natural logarithm (LN() in PostgreSQL).
SELECT LOG(SUM(EXP(number))) FROM numbers;

-- Yields: approximately 11.021

It looks like it doesn't work, but the output is very close to the largest number in the table. I tried some additional values, and the result was always approximately the maximum value. It turns out this is the log-sum-exp (LSE) function, a smooth approximation of the maximum. It is closely tied to the softmax function, which uses it as its normalizing term, and the softmax in turn has the same form as the Boltzmann distribution, which I first encountered in a thermodynamics class.
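Factoring the largest value out of the sum shows why the result lands just above the maximum. Here m is the largest value and n is the number of rows:

$$ \mathrm{LSE}(x_1, \dots, x_n) = \ln \sum_{i=1}^{n} e^{x_i} = m + \ln \sum_{i=1}^{n} e^{x_i - m}, \qquad m = \max_i x_i $$

$$ m \le \mathrm{LSE}(x) \le m + \ln n $$

$$ \mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} = e^{x_i - \mathrm{LSE}(x)} $$

The term for the maximum is e^0 = 1 and every other term lies between 0 and 1, so the sum is between 1 and n, which gives the bound. For the sample table, m = 11 and n = 6, so the result must lie between 11 and 11 + ln 6 ≈ 12.79; since every other term is e^{-4} or smaller, it ends up just above 11. The last line is the link to the softmax: exponentiating a value minus the log-sum-exp gives that value's share of the total.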

What makes this function useful is that it is the normalizing term of a probability distribution: the softmax divides each exponentiated value by the sum of all of them, and the largest value dominates that sum, which explains why the output is close to the maximum. Rewritten as the maximum plus the log of the sum of exp(x_i − max), it can be computed without overflow, and working with logarithms avoids underflow when the values are log-probabilities. This is useful in general programming, and especially in machine learning, where a vector can contain very small or very large numbers.
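A sketch of the overflow-safe form in SQL, assuming PostgreSQL (LN() for the natural log) and the numbers table above. It subtracts the maximum before exponentiating and reuses the shifted weights to compute each row's softmax. The CTE names (m, shifted) and the 700 cutoff are illustrative choices, not part of the original post.

```sql
-- Sketch assuming PostgreSQL, where LN() is the natural logarithm.
-- Stable log-sum-exp: LSE(x) = max(x) + LN(SUM(EXP(x - max(x)))).
-- After the shift the largest term is EXP(0) = 1, so nothing overflows.
WITH m AS (
  SELECT MAX(number) AS max_n FROM numbers
),
shifted AS (
  SELECT n.number,
         m.max_n,
         -- PostgreSQL raises an error when EXP() underflows, so terms more than
         -- 700 below the maximum (under ~1e-304, negligible next to 1) count as 0
         CASE WHEN n.number - m.max_n < -700 THEN 0
              ELSE EXP(n.number - m.max_n) END AS w
  FROM numbers AS n
  CROSS JOIN m
)
SELECT number,
       -- same value on every row: the log-sum-exp of the whole column
       max_n + LN(SUM(w) OVER ()) AS log_sum_exp,
       -- softmax: w / SUM(w) equals EXP(x - LSE(x)); the column sums to 1
       w / SUM(w) OVER () AS softmax
FROM shifted
ORDER BY number;
```

For the sample table every row reports the same log_sum_exp of about 11.02, the softmax column sums to 1, and the row holding 11 gets roughly 0.98 of the total. Unlike LOG(SUM(EXP(number))), it keeps working if the table contains a value such as 1000.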

Neat stuff...
