BrandonSouthern.com

SQL Best Practices

Brandon Southern
21 May 2016

In this post I’m going to share some best practices for formatting SQL that I’ve learned and used over the last 20 years. Many of these tips are things that I’ve collected along the way out of frustration with inheriting code, needing to perform updates, chasing down bugs, and performing code reviews for others. Also, when it comes to training new employees, I have found that it is much more difficult for people to understand the domain and the code when dealing with poorly written SQL.

In each section of this article, I’ll focus on one area of practice. I’ll provide you with an example of a bad practice and example(s) of best practices, along with my reasons for writing code a specific way. As you’re reading you might think, “Well, that code isn’t as performant,” and sometimes this might be the case, because I’m trying to home in on a specific point and avoid excess potential for confusion. Also, sometimes I prefer slightly less performant code (depending on how often it is used, the purpose of the code, and the size of the performance hit) if it makes the code easier to read and maintain.

I’m guessing that some people reading this might say, “Well, my code executes just fine so I don’t care. I understand what it does.” To that (because I’ve heard these comments many times over the years) I would challenge the assertion. It takes significantly more time to find and read sloppy “bad practice” code than it does to work with clean code. These practices will also help to reduce bugs, and your code will be appreciated by everyone who reads it.

When we’re talking about programming languages, you can pretty much write statements as you see fit and the code will execute. Obviously there are some exceptions to this (such as indentation in Python) but that’s for another day. What this means is that when I’m talking about SQL best practices, one could make the argument that this is just a matter of opinion and writing style.

How you write it matters. Think about a book, for example. Sure, you could write an entire book without paragraphs, line returns, standard spacing between sentences, and more. Yes, it would still be a book and yes, the reader could probably understand it. However, how easy would it be for you to find a specific section if there weren’t paragraphs? How easy would it be to have confidence that you’re actually in the right section if you were asked to edit something? For these reasons and many more, I’m outlining some best practices.

Please note that while all of these practices could be considered “opinions”, they are practices that I’ve settled on after 20 years of making mistakes, struggling to efficiently read, edit, and understand code, and watching my team members face the same challenges. I’ll try my best to describe each practice and the logic behind using it. I’ll leave it up to you to consider these practices and see what works for you and your team.

Formatting

The first thing that I’d like to talk about is formatting. Code should be well formatted and visually appealing, which makes it very easy to read. Having well formatted code will pay off when it comes to debugging, troubleshooting, and modifying your code.

When it comes to formatting, there are a number of things that should be considered, such as indentation, alignment, comma position, and text case. If you’ve made it this far in the reading, you’ve probably noticed how well aligned my code is and that most items are placed one per line. This makes a tremendous difference in readability.

Below is an example of code that is quite unreadable. In this example you’ll notice a number of things that go against best practices, such as:

  • number of items per line
  • alignment issues
  • trailing commas
  • poor or lack of aliasing
  • lack of comments
  • grouping by number instead of name
  • position of aggregate functions in select statement
  • multiple hard to detect bugs

In this article I’ll discuss these practices and more to help you write clean and bug-free code that you can be proud of.

Bad Practices - Many Issues

 
select u.username user_login_name, max(u.last_updated_ts),u.user_state,u.zip_code, u.date_of_birth as dob
 ,cast('1900-01-01' as date format'YYYY-MM-DD')) as previous_invoice_dt, u.last_name, u.user_street , u.user_city,
  u.favorite color,u.user_country, last_updated_ts u.user_created_ts
  , c.configuration_id
  , default_screen
from users u join preferences p 
  on u.user_id = p.user_id and u.country = p.country
  join p on configurations c
  on p.configuration_id = c.configuration_id and (configuration_id > 100 or active=1 and country=1)
group by 1,3,4,5,6,7,8,9,10,11,12,13,14,15

Best Practices

Look at the code below and compare it to the code above. Which one is easier to read? Which one provides the most context about the intent of the code and what some of the conditional values mean? Which version provides a cleaner UI that will allow you to quickly spot bugs or avoid bugs altogether? I’m hoping that you’ll agree that the code below overcomes all of the challenges found in the code above.

The counter argument that I’ve heard people make is, “Well, you had to write more lines of code.” This is irrelevant. The computer doesn’t care, and it’s only a couple of keystrokes to add a line return and spaces or tabs.

In the following sections I will discuss each of these issues and provide examples of the good and bad practices.

 
/*
  Get user information along with the max last_updated date so we can see the latest
  default screen that a customer was using. 

  Limit the results to users that are in the US (country = 1) and scrub out
  configurations that are deemed to be test configurations. 

  Also, we only want active default screens as some screens have been deprecated.
*/
select
    u.username as user_login_name
  , u.user_state
  , u.zip_code
  , u.date_of_birth as dob
  , cast('1900-01-01' as date format 'YYYY-MM-DD') as previous_invoice_dt
  , u.last_name
  , u.user_street
  , u.user_city
  , u.favorite_color
  , u.user_country
  , u.last_updated_ts
  , u.user_created_ts
  , c.configuration_id
  , p.default_screen
  , max(p.last_updated_ts) as pref_last_updated_ts
from
  users u
    join preferences p 
       on u.user_id = p.user_id
      and u.country = p.country
    
    join configurations c
       on p.configuration_id = c.configuration_id
      and u.country = 1 -- US country id
      and (
             c.configuration_id > 100 -- all ids below 100 are known to be test/dev configurations
            or c.active = 1 -- value of 1 equals true, which means active and not deprecated
          )
group by
    u.username
  , u.user_state
  , u.zip_code
  , u.date_of_birth
  , previous_invoice_dt
  , u.last_name
  , u.user_street
  , u.user_city
  , u.favorite_color
  , u.user_country
  , u.last_updated_ts
  , u.user_created_ts
  , c.configuration_id
  , p.default_screen

Alignment

If you look at the above “best practice” code, notice how well everything is aligned to the left. All commas, spaces, and indentations make the code very easy to read.

One Item Per Line

My general rule is one item per line. This could be one element in your select statement, one condition in a join clause, or one case condition. Again, look at the code above to see the difference in readability when writing with one item per line. The key here is to be consistent. I’ve seen code that is written with one item per line, but then every so often there will be a join clause with an and and an or condition on the same line. This can be immensely frustrating when reading and debugging, because it is very easy to overlook the additional condition when it was written on the same line.

Bad Practice - Multiple Case Conditions on a Single Line

Here we see a case statement that is all on one line. This is a bad practice because it makes the code hard to read and makes it difficult to quickly pick up on all of the conditions that are being evaluated. Also, it is really challenging, if not impossible, to properly comment the code. I know that in the example, ‘main_reporting’ isn’t descriptive and doesn’t appear consistent with the other values, but hey, sometimes you’re told to output values this way and can’t change the logic.

 
select
    u.user_country_id
  , case when u.user_country_id = 1 then 'main_reporting' when u.user_country_id = 3 then 'europe_reporting' 
  when u.user_country_id = 'apac_reporting' then 'DE' end as user_reporting_country

Best Practice - Multiple Case Conditions on Multiple Lines

Here we see a case statement that is written on multiple lines with comments to help provide clarity.

 
select
    u.user_country_id
  , case when u.user_country_id = 1 then 'main_reporting' -- id 1 = US. considered "main reporting" per Finance team
         when u.user_country_id = 3 then 'europe_reporting' -- id 3 = UK
         when u.user_country_id = 4 then 'apac_reporting' -- id 4 = China
      end as user_reporting_country

Commenting Code

Please comment your code. I’ll write another post about code comments in the future, but comments are important. I feel like I see a post on LinkedIn or some other site on a daily basis where someone says something along the lines of, “You don’t need code comments. The code is the comment. What’s the matter? Don’t you know how to read code?” Seriously, I’ve heard this sort of thing for years. But here’s the reality. Code is a language, and a reader proficient in that language can understand what the code is doing. But the code never tells the reader why someone wanted the code to function that way. The possibilities are endless as to why someone wanted the code to work a certain way. Sometimes you could be coding around a bug in the back-end data, or maybe there is business logic that dictates how the code should function.

While it’s true that you could read the code and possibly look up the documentation on certain tables, that’s a lot more work than typing a few characters. Below are some examples of good and bad commenting practices.

Bad Commenting - No In-line Comments

Look at the code below. We can see that the code only wants to return results where u.id > 1000. That’s pretty obvious in this very simple example. But the more important question is why did someone do this?

Maybe they are test users prior to u.id = 1000. Or maybe the code is filtering out all users that are from Michigan, because for some reason someone thought that all users with an id less than 1000 are from the state of Michigan. That might sound like a horrible idea for code, but it executes all the same. The point here is that, as new readers of the code, we don’t know, and chances are that in six months the original author won’t know either.

select
    u.username
  , u.user_city
  , u.user_state
from
    users u
where
    u.id > 1000

Better Commenting - In-line Comments

Here we have an in-line comment that tells us a bit more about why we added the u.id > 1000 condition. We evidently have test users that should be scrubbed out of the result set.

select
    u.username
  , u.user_city
  , u.user_state
from
    users u
where
    u.id > 1000 -- scrub all users prior to user id 1000; they were created during initial development. 

Bad Commenting - No Block Comments

Look at the code below. We can see here that the query is going to return users that are considered to be non-test users. The in-line comment helps us to understand that the desire is to scrub these test users out of the result set. But we had to read a few lines of code to get there. You may be saying, “Well, it isn’t worth a comment block at the top. It’s just 8 lines of code and it’s obvious what is happening.” Simple or not, the reader doesn’t know the why behind this code. And what if the code weren’t as simple? You’d surely appreciate some comments. For these reasons, and to encourage better planning before you start writing code (more on that later), I recommend a block comment at the top.

 
select
    u.username
  , p.default_screen
  , p.start_date
  , p.end_date
from
    users u
      join preferences p
        on u.user_id = p.user_id
where
      1=1
  and u.id > 1000 -- scrub all users prior to user id 1000; they were created during initial development. 

Better Commenting - Block Comments

Below is the same code that we just looked at, but now we are telling the reader why we want to run this code and what to look out for.

 
/* 
  get all of our users along with their default screen preferences. users can
  change defaults over time and you can have many results for one user. later
  we'll use this info to run deeper analysis on frequency of changes and most 
  utilized screens in the company.  also, we need to scrub out test users
  from the result set.
*/ 
select
    u.username
  , p.default_screen
  , p.start_date
  , p.end_date
from
    users u
      join preferences p
        on u.user_id = p.user_id
where
      1=1
  and u.id > 1000 -- scrub all users prior to user id 1000; they were created during initial development. 

Common Table Expressions (CTE)

Common table expressions, or CTEs, are a way of creating a named, temporary result set from a query. This result set can then be referenced throughout the rest of your SQL statement. The benefit of using a CTE is that you can reduce code duplication, make your code more readable, and increase your ability to perform QA checks on your results.

Also note the really good block comment.

Example of Code Not Using CTEs

In the code below we can see that there are two subqueries that are returning results. These two subqueries are then joined together to produce the final result set. While this code will execute, there are a few concerns:

  1. It’s really hard to perform QA on the subqueries and inspect their results. For example, what if we wanted to run some counts on the number of users that have multiple records for default screens? We can’t easily execute SQL against the subquery. We’d have to copy/paste the subquery and then modify it to perform this QA. It would be much better if we could avoid changing code during our QA process.

  2. If we need to utilize this users subquery elsewhere in our code, we’d have to re-write or copy/paste that block of code to other places in our script. This would not be a DRY (don’t repeat yourself) process and exposes more potential for bugs. How so? Assume for a moment that you’ve used the users subquery in 5 locations in your script. Also assume that the code that you’re working with is not easy to read because it doesn’t follow best practices. If you are asked to update the code to add another condition to scrub out additional test users, there’s a good chance that you could miss adding this condition to at least one of the 5 uses of the subquery.

  3. More cycles on the database. Each time that the subquery is executed it performs table scans to return results. With our users subquery containing wildcard conditions, the database is going to have a fair amount of work to do. It’s much cheaper (in CPU cycles, and in dollars if you’re using cloud databases) to perform the subquery once, store the results, and then re-use the result set as needed in your code.

  4. It’s more complex to read the entire block of code and understand what is being performed and why. While it is possible to scroll through the code, it may be hard to easily comprehend what is happening. Generally speaking, if you have to vertically scroll your code on your monitor, your code is way too long and should be refactored into smaller components.

 
/* 
  get all of our users that have activated their account, where the user
  is from the US. scrub out any users that are thought to be sales demo 
  accounts or QA test accounts. the scrubbing of these users isn't perfect 
  and we may have some valid users that actually have a name that matches
  one of our conditions. But we don't think this is a big issue.

  join to the preferences table. however, don't return results prior to
  2018-03-05 because that is the date that we launched our new web app.
  we only care about the selections and performance since the new app.
  also, there are some bogus users that got into the data under ticket
  ENGR-1911 that we need to scrub out.
*/ 
select
    u.username
  , p.default_screen
  , p.start_date
  , p.end_date
from
    (
      select
          u.user_id
        , u.username
      from
        users u
      where
            1=1
        /*
           attempting to scrub out test accounts and demo accounts.
           has the potential to accidentally scrub actual users
           but we didn't find signs of this on 2019-05-05 under 
           ticket ENGR-1950
        */
        and u.username not like '%test%'
        and u.username not like '%demo%'
        and u.activated = TRUE -- only return users that activated their account
        and u.country = 'United States' -- only focus on US users
    ) u
      join ( 
             select
                 p.user_id
               , p.default_screen
               , p.start_date
               , p.end_date
             from
               preferences p
             where
                   1=1
               and p.start_date >= '2018-03-05' -- app UI overhaul release date
               -- had bug (ENGR-1911) where invalid users got into the data and show a null value
               -- scrub out those users
               and p.end_date is not null 
           ) p
        on u.user_id = p.user_id

Example of Using CTEs

Below we see an example of using CTEs. While CTEs can be great and help to overcome some of the challenges that we previously pointed out, CTEs typically don’t persist after the statement that defines them finishes executing.

For example, if you were to run this entire block of code, it would execute. But if you then wanted to select all of the results from the users CTE a few minutes later, this data wouldn’t be available to query. To get around this issue you can use volatile tables or temporary tables, which typically live as long as your session (database connection) remains active. More on this topic later.
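
The lifetime difference is easy to see with a quick experiment. Below is a minimal sketch using SQLite through Python’s sqlite3 module (the table and names are made up for illustration): the CTE is gone as soon as its statement finishes, while a temporary table survives for the rest of the session.

```python
import sqlite3

# Hypothetical schema for illustration: a tiny users table.
con = sqlite3.connect(":memory:")
con.execute("create table users (user_id integer, username text)")
con.execute("insert into users values (1, 'test_account'), (2, 'alice')")

# A CTE only exists for the single statement that defines it.
cte_rows = con.execute("""
    with real_users as (
        select user_id, username from users where user_id > 1
    )
    select username from real_users
""").fetchall()

# Querying the CTE name afterwards fails: the CTE is gone.
try:
    con.execute("select username from real_users")
    cte_persisted = True
except sqlite3.OperationalError:
    cte_persisted = False

# A temporary table, by contrast, lives for the rest of the session.
con.execute("""
    create temporary table real_users_tmp as
    select user_id, username from users where user_id > 1
""")
temp_rows = con.execute("select username from real_users_tmp").fetchall()
```

The specific persistence rules vary by database (Teradata volatile tables, Postgres temp tables, etc.), but the session-scoped behavior shown here is the common pattern.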

 
/* 
  get all of our users that have activated their account, where the user
  is from the US. scrub out any users that are thought to be sales demo 
  accounts or QA test accounts. the scrubbing of these users isn't perfect 
  and we may have some valid users that actually have a name that matches
  one of our conditions. But we don't think this is a big issue.

  note: the CTE gets its own descriptive name instead of re-using the
  base table name 'users', because shadowing a base table name is
  confusing for readers.
*/
with us_users as (
  select
      u.user_id
    , u.username
  from
    users u
  where
        1=1
    /*
       attempting to scrub out test accounts and demo accounts.
       has the potential to accidentally scrub actual users
       but we didn't find signs of this on 2019-05-05 under 
       ticket ENGR-1950
    */
    and u.username not like '%test%'
    and u.username not like '%demo%'
    and u.activated = TRUE -- only return users that activated their account
    and u.country = 'United States' -- only focus on US users
)

/*
  get user preferences but don't return results prior to
  2018-03-05 because that is the date that we launched our new web app.
  we only care about the selections and performance since the new app.
  also, there are some bogus users that got into the data under ticket
  ENGR-1911 that we need to scrub out.
*/ 
, recent_preferences as (
  select
      p.user_id
    , p.default_screen
    , p.start_date
    , p.end_date
  from
    preferences p
  where
        1=1
    and p.start_date >= '2018-03-05' -- app UI overhaul release date
    -- had a bug (ENGR-1911) where invalid users got into the data and show a null value
    -- scrub out those users
    and p.end_date is not null 
)

/*
  generate result set of non-test US users and their
  preferences since the new app UI overhaul.
*/
select
    u.username
  , p.default_screen
  , p.start_date
  , p.end_date
from
    us_users u
      join recent_preferences p
        on u.user_id = p.user_id

Using "select *"

You should never write queries with “select *”. I think the only exception to this rule is if you are trying to inspect a table, and in such a case you should always limit the number of results that are returned. Writing queries with “select *” is a bad idea for many reasons:

  1. Database performance. Returning columns that aren’t needed is more expensive than querying only the columns that you care about.

  2. Challenges debugging. Assuming that you’re using CTEs as described in the previous section, it can be very challenging to trace the origins of certain attributes.

  3. Tables change. Even if you actually need to select all columns, there’s no guarantee that your table won’t change over time. And as the table changes you’ll be querying new data that was never intended which could possibly break code elsewhere, cause confusion, or impact database performance and costs.
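
Point 3 is easy to demonstrate. Here’s a minimal sketch using SQLite through Python’s sqlite3 module (table and column names are hypothetical): after the table gains a column, the “select *” query silently starts returning extra data, while the explicit column list keeps working unchanged.

```python
import sqlite3

# Hypothetical table for illustration.
con = sqlite3.connect(":memory:")
con.execute("create table users (user_id integer, username text)")
con.execute("insert into users values (1, 'alice')")

# "select *" returns whatever columns exist at the moment it runs.
before = con.execute("select * from users").fetchone()

# When the table changes, "select *" silently returns the new
# column too, while an explicit column list is unaffected.
con.execute("alter table users add column ssn text")
star = con.execute("select * from users").fetchone()
explicit = con.execute("select user_id, username from users").fetchone()
```

If the new column were something sensitive (like the ssn column here), “select *” would begin leaking it to every downstream consumer without anyone changing a line of query code.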

Bad Practice - Using Select *

 
select
    *
from
    users u

Best Practice - Selecting Only Desired Elements

 
select
    u.username
from
    users u

Aliasing

Aliasing is very important to help readers understand where elements reside and what tables are being used. When aliases aren’t used, or poor naming conventions are used, complexity is increased and the reading and comprehension of code suffers.

Bad Practice - No Alias Used on Fields

Below you can see that the tables have an alias of ‘u’ and ‘p’ but the selected elements don’t utilize the alias. This can be very frustrating and can cause run-time errors if more than one of the tables contains a field with the same name, e.g. user_id is found in both the ‘users’ table and the ‘preferences’ table.

 
select
    username
  , default_screen
  , start_date
  , end_date
from
    users u
      join preferences p
        on u.user_id = p.user_id
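
The run-time error described above can be reproduced with a tiny SQLite session via Python’s sqlite3 module (the tables here are made-up stand-ins for the users/preferences example):

```python
import sqlite3

# Hypothetical tables sharing a user_id column, for illustration.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table users (user_id integer, username text);
    create table preferences (user_id integer, default_screen text);
    insert into users values (1, 'alice');
    insert into preferences values (1, 'home');
""")

# Selecting the shared column without a table alias is ambiguous
# and raises an error at run time.
try:
    con.execute(
        "select user_id "
        "from users u join preferences p on u.user_id = p.user_id"
    )
    ambiguous_ok = True
except sqlite3.OperationalError:
    ambiguous_ok = False

# Qualifying the column with its alias resolves the ambiguity.
row = con.execute(
    "select u.user_id "
    "from users u join preferences p on u.user_id = p.user_id"
).fetchone()
```

Worse than an immediate error: a query that runs fine today can start failing the day someone adds a same-named column to the other table, which is why qualifying every field up front is the safer habit.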

Best Practice - Alias Used on Fields

Below you can see that the tables and selected elements utilize the table alias name. This makes the code very readable for the end user. Even if you only have a single table, it is a good practice to use an alias on the table and field names. Good habits make for good code.

 
select
    u.username
  , p.default_screen
  , p.start_date
  , p.end_date
from
    users u
      join preferences p
        on u.user_id = p.user_id

Bad Practice - Generic Alias on CTE

Below we have a CTE whose assigned name is simply ‘cte’. This is a very generic name and tells the reader absolutely nothing about the data in the table. If you were reading the select statement that follows the CTE, you wouldn’t have any indication of what data was being used.

 
with cte as (
 select
    u.username
  , p.default_screen
  , p.start_date
  , p.end_date
 from
    users u
      join preferences p
        on u.user_id = p.user_id
)

select
    c.username
from
  cte c

Best Practice - Specific Alias on CTE

Below we have a CTE that has been created with a more descriptive name. The name gives the reader some indication of what data is contained within the table.

 
with user_default_screens as (
 select
    u.username
  , p.default_screen
  , p.start_date
  , p.end_date
 from
    users u
      join preferences p
        on u.user_id = p.user_id
)

select
    s.username
from
  user_default_screens s

Leading vs Lagging Commas

In select statements I prefer a leading comma as opposed to a trailing comma, and this is one of those cases where my personal opinion comes into play. I’ve seen a lot of people write their selected elements with trailing commas, and in other languages it is common practice to have a trailing comma. However, in other languages it isn’t common to have a large number of arguments passed into a function, whereas in SQL it’s quite common to have a large number of elements being selected (and declared in the code). While you may think that I’m being overly opinionated about this usage, here are a couple of reasons why I find leading commas to be beneficial.

  1. A clean looking UI. When you look at the best practice example, look at how nicely the commas are aligned. It is very easy to see that a comma is missing, and thus to avoid a run-time error, when compared to using trailing commas.

  2. No confusion when working with longer case statements that wrap lines. Looking at the example below, it’s difficult to tell if the end of the line is the end of an element or statement or if it is specifying the end of an argument that is being passed into a function.

To add some additional frustration around this bad practice of trailing commas, the BigQuery query formatter actually reformats your code to display everything with a trailing comma. :(

Bad Practice - Trailing Comma

In this example we can see that leading commas are not used, making it very hard to spot the missing comma.

 
  select
    u.username as user_login_name,
    u.first_name,
    u.last_name,
    u.user_street,
    u.user_city,
    u.user_state,
    u.zip_code,
    u.date_of_birth as dob,
    u.favorite color
    u.user_country,
    u.last_updated_ts,
    u.user_created_ts
from
    users u

Best Practice - Leading Comma

In this example we can see that all of the commas are aligned, making it easy to ensure that no comma is ever missing.

 
  select
     u.username as user_login_name
   , u.first_name
   , u.last_name
   , u.user_street
   , u.user_city
   , u.user_state
   , u.zip_code
   , u.date_of_birth as dob
   , u.favorite_color
   , u.user_country
   , u.last_updated_ts
   , u.user_created_ts
from
    users u

Bad Practice - Trailing Comma Confusion

In this example we can see that leading commas are not used. We have a line break that ends with a comma, so it is very hard to tell if this ‘max’ line is really one single statement or if it is part of a longer statement. One could argue that you shouldn’t use line breaks like this in your code, and while I support line breaks in the right places (because they make code easier to read), you would still encounter the same issue when looking at your word-wrapped code in your editor or in a Git diff comparison.

 
  select
    u.username as user_login_name,
    max(u.last_updated_ts) over(partition by u.username order by u.last_updated_ts desc asc), 
    cast('1900-01-01' as date format'YYYY-MM-DD')) as previous_invoice_dt,
    u.last_name
    u.user_street,
    u.user_city,
    u.user_state,
    u.zip_code,
    u.date_of_birth as dob,
    u.favorite color,
    u.user_country,
    u.last_updated_ts,
    u.user_created_ts
from
    users u

Best Practice - Leading Comma with Indentation on Wrapped Text

In the example below it’s easy to see that a comma is missing. You might immediately wonder if someone forgot the comma, but because it’s so easy to read code with leading commas, there’s a higher probability that the missing leading comma is by design and there isn’t actually a comma that should be in front of the ‘cast’ statement. Also, by adding indentation to the cast statement, the code gives a more obvious signal as to why a comma isn’t necessary.

 
  select
     u.username as user_login_name
   , max(u.last_updated_ts) over(partition by u.username order by u.last_updated_ts desc), 
      cast('1900-01-01' as date format 'YYYY-MM-DD') as previous_invoice_dt
   , u.last_name
   , u.user_street
   , u.user_city
   , u.user_state
   , u.zip_code
   , u.date_of_birth as dob
   , u.favorite_color
   , u.user_country
   , u.last_updated_ts
   , u.user_created_ts
from
    users u

Capitalization

Uppercasing reserved words was pretty common practice in the old days, which might have to do with the fact that SQL has been around for a very long time, probably longer than most text editors with syntax highlighting. Today, most people are using (or should be using) syntax highlighting in their editors, so capitalization shouldn’t be necessary to spot the reserved words.

While I have a personal opinion to not use uppercase you may disagree and have a different opinion. Here’s my justification:

  1. I don’t like my code yelling at me. In social contexts and written communication, using uppercase is synonymous with yelling, so I try to avoid writing this way.
  2. The code doesn’t read as fluidly. Psychological studies have shown that word recognition is easier with lower case words than upper case words, because there is more shape variation in lower case words. Reading speed can decrease by 13-20% when all upper case is used. Also, when you mix upper case and lower case words, as uppercase-keyword styles do, you lose that fluidity.
  3. It’s extra key strokes to hold down the shift key while I type or lock/unlock the caps key. Not a good argument, I know.

Bad Practice - Uppercase

In this example we can see that reserved words are uppercased and all other words are lowercased.

 
  SELECT
     u.username AS user_login_name
   , MAX(u.last_updated_ts) OVER(PARTITION BY u.username ORDER BY u.last_updated_ts DESC), 
      CAST('1900-01-01' AS DATE FORMAT 'YYYY-MM-DD') AS previous_invoice_dt
  FROM
    users u
  WHERE
        1=1
    AND u.user_id NOT IN (
                                100 -- CEO's account
                              , 200 -- CFO's account
                            ) 

Best Practice - Lowercase

In this example we can see that all of the reserved words are lowercased, keeping the code consistent and, in my opinion, easier to read.

 
  select
     u.username as user_login_name
   , max(u.last_updated_ts) over(partition by u.username order by u.last_updated_ts desc), 
      cast('1900-01-01' as date format 'yyyy-mm-dd') as previous_invoice_dt
  from
    users u
  where
        1=1
    and u.user_id not in (
                                100 -- ceo's account
                              , 200 -- cfo's account
                            ) 

Group By - Numbers vs Explicit Fields

I almost always write my group by with explicit field names as opposed to position numbers in the select statement. While this doesn’t have any bearing on the results, I have found that it saves me time when it comes to typing and debugging. Normally I’d prefer to just use the numbers because it’s less text on the page, but doing so has caused too many issues and too much time spent chasing down run-time bugs.

Bad Practice - Group by position number

Here you can see that we are performing a group-by with position numbers based on the selected items. What I don’t like about this practice is that:

  1. If someone puts an aggregate function anywhere but the first or last item in the select statement, then you have to skip a position number in the group by. This creates frustration if you later decide to re-order your selected elements.
  2. You have to count out how many elements you have, minus the aggregated items, and then manually type the position number.
 
select
    u.username as user_login_name
  , max(p.last_updated_ts) as pref_last_updated_ts
  , u.user_state
  , u.zip_code
  , u.date_of_birth as dob
  , u.last_name
  , u.user_street
  , u.user_city
  , u.favorite color
  , u.user_country
  , u.last_updated_ts
  , u.user_created_ts
  , p.default_screen
from
  users u
    join preferences p 
       on u.user_id = p.user_id
group by 
  1,3,4,5,6,7,8,9,10,11,12,13
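
The re-ordering hazard from point 1 can be demonstrated directly. Below is a small sketch using SQLite through Python’s sqlite3 module (the table and data are made up): the same “group by 1” silently changes meaning when the select list is reordered, while an explicit field name does not move.

```python
import sqlite3

# Hypothetical data for illustration.
con = sqlite3.connect(":memory:")
con.execute("create table users (user_state text, user_city text)")
con.execute("""
    insert into users values
    ('MI', 'Detroit'), ('MI', 'Lansing'), ('OH', 'Columbus')
""")

# "group by 1" groups on user_state here, because it is position 1...
by_position = con.execute(
    "select user_state, count(*) from users group by 1 order by 1"
).fetchall()

# ...but reorder the select list and the same "group by 1" silently
# groups on a different column: user_city.
reordered = con.execute(
    "select user_city, user_state, count(*) from users group by 1 order by 1"
).fetchall()

# Explicit field names keep the grouping stable no matter how the
# select list is arranged.
by_name = con.execute(
    "select user_state, count(*) from users group by user_state order by user_state"
).fetchall()
```

No error is raised in the reordered query, which is exactly what makes this class of bug so time-consuming to chase down.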

Best Practice - Group by field name

Here you can see that we are performing a group by with explicit field names. While it looks like a lot more typing and work versus using position numbers, it is actually faster to type. How so? Because all that you have to do is copy what is in the select statement (minus the aggregated field) and paste those values into the group by. When you use numbers you actually have to count out the elements and type the numbers.

 
select
    u.username as user_login_name
  , u.user_state
  , u.zip_code
  , u.date_of_birth as dob
  , u.last_name
  , u.user_street
  , u.user_city
  , u.favorite_color
  , u.user_country
  , u.last_updated_ts
  , u.user_created_ts
  , p.default_screen
  , max(p.last_updated_ts) as pref_last_updated_ts
from
  users u
    join preferences p 
       on u.user_id = p.user_id
group by
    u.username
  , u.user_state
  , u.zip_code
  , u.date_of_birth
  , u.last_name
  , u.user_street
  , u.user_city
  , u.favorite_color
  , u.user_country
  , u.last_updated_ts
  , u.user_created_ts
  , p.default_screen