SQL and Programmers

Regardless what some programmers may think, SQL is still the best language to express operations with sets. And, yes, I want to stress exactly that. Sets!

Not so long ago... well, actually... aherm! Anyway... Once there was a DBA who reviewed our database for performance issues. He spent some days reviewing our schemas, tables, indexes, stored procedures and, obviously futile, query optimization attempts. Before the weeks's end, he called the development team in for a meeting.

After a few introductions and niceties, he started the session with this question: "How many of you are programmers?" Proudly, of course, we all raised our hands. His next words shocked us, hurt our pride and truly brought our feet back where they belong. By all counts, the DBA was roughly 5 or more years younger than all of us in the room. And yet, for some reason, we all just had to agree when he said, "All of you are programmers... and that's exactly the problem."

If you think about a programmer who writes sequences of instructions, creates variables, defines loops, and applies conditional or logical control constructs, then that programmer is not necessarily who you need to be your SQL developer. SQL is not a programming language. It is not a scripting language. It is not a programmer's language. If you have to be reminded, SQL is exactly what its acronym stands for: it's a query language.

SQL and the RDMS's that support it, are designed, implemented and optimized to work with sets. This is the main reason why most DBA and SQL developers would recommend that you avoid looping with cursors. Before anything else, SQL should be used first to operate on sets -- cursors and loops only as the last resort.

The first mistake that most programmers do when using SQL is that they start their tasks thinking like a programmer. If the problem needs to find records where this is this and that is that, programmers tend to translate that quickly in SQL as:

SELECT tableName.this, tableName.that
FROM tableName
WHERE this = @this 
    AND that = @that

Looks harmless enough. However, the problem starts when applying the same line of thinking in such a way that the query would incorporate sub-queries like this:

SELECT tableName1.this, tableName1.that 
FROM tableName1
WHERE this = (
        SELECT this
        FROM tableName2 
        WHERE tableName1.this = tableName2.this
    ) 
    AND that = (
        SELECT that 
        FROM tableName3 
        WHERE tableName1.that = tableName3.that
    )

Of course, the query would work. BUT, when we are talking about finding a record in thousands or millions of records in tableName1, tableName2 and tableName3, the performance suffers. Regardless how well your indexes are setup, and regardless how "smart" your RDMS's query optimizer might attempt to "fix" your query, this is often the worst way to do it.

Thinking like a programmer, this query loops through all records in tableName1 trying to match each record to tableName2 with this and tableName3 with that. Unfortunately, each check against tableName2 is itself an iteration to look for this and each check against tableName3 is also an iteration to look for that. Programmatically, that's like having a loop with two loops in it:

read tableName1
while not tableName1.EOF
    this = tableName1.this
    that = tableName2.this
    t2 = false
    read tableName2
    while not tableName2.EOF
        if this = tableName2.this
            t2 = true
            break
    loop
    t3 = false
    read tableName3
    while not tableName3.EOF
        if that = tableName3.that
            t3 = true
            break
    loop
    if t2 and t3 then
        collect (this, that)
loop

Seeing it this way can make a good programmer cringe. This is slow! And in fact, a big no-no! If this kind of code is something you wouldn't normally do in your programs, then it must also be what you shouldn't do in SQL.

In order to fix non-performant SQL statements with sub-queries as sampled above, the solution is to think of tableName1, tableName2 and tableName3 as sets. Given sets tableName1, tableName2 and tableName3, the requirement is really looking for the records that are in the intersection of tableName1, tableName2 and tableName3. In SQL, the solution can be easily expressed using INNER JOIN.

SELECT tableName1.this, tableName1.that 
FROM tableName1
INNER JOIN tableName2 ON tableName1.this = tableName2.this
INNER JOIN tableName3 ON tableName1.that = tableName3.that

There is no easy programming equivalent, unless perhaps you can load/enumerate through the table records using query-able collections with support for set operations. Modern programming languages may have such type of collections built-in. However, when thinking of thousands and millions of records, performance is usually best by running the query on the database itself.

Performance is key. If you as a programmer optimizes your codes the best way your favorite programming language allows you, know that you also need to understand how SQL allows you to optimize your queries. If you take time to learn your programming language, take the time to really learn SQL as well. It's worth the investment. SQL remains to be a popular "language" in the software development universe -- considering Stack Overflow's insight in 2017 which places SQL at #2. The point of the survey is really how active the technology is. IMHO, learning SQL is perhaps one of the best language every programmer should have in his/her arsenal.

Comments