For the last 14 years I’ve worked in the tech space, as both an individual contributor and a manager, at 6 different companies. I started out in software quality assurance engineering, wrote various applications, and later moved into analytics roles. At each of those companies, hardly any analytics teams used the standard tools and practices that Software Engineering relies on, such as Git (code repository), Jira (issue tracking and project management), text editors (making it easier to write code), or coding practices (ensuring that clean code is written and well documented). I’m writing this article and posting it to my website because I’ve written essentially the same content for each of the last 3 companies I’ve worked for over the past 8 years.
Because of my background and my desire to reduce errors and build stronger teams, I’ve developed an immense passion for creating well-run analytics teams that mirror what I learned leading software development teams. While analysts typically don’t work like software developers, I believe that they should. I set out to write a few documents explaining how and why analysts should use these standard tools, but I realized that I should first briefly explain how software development teams function so that we can compare them to analytics teams. Only once we understand the differences, and what’s possible, can we craft a plan to incorporate these tools and practices into our analytics teams.
In future articles I’ll discuss issue tracking tools (Jira), code repositories (Git), coding practices, and how all of these fit together to create a well-functioning team that is fast, efficient, accurate, and built to scale and stand the test of time. For now, let’s dive into a simplified view of how software development and analytics teams work, with a few touchpoints on tools and practices.
Before we dive into any of these tools, it’s important to understand how a software development team functions so that we can contrast it with the typical analytics team. I’d like to start by taking “software” out of the equation and using an example that almost everyone can relate to: writing a resume or a college paper. You’ve probably found yourself writing a few sentences and paragraphs, deleting some items and keeping others, and copying and pasting text to different places. At times you’ve probably gotten frustrated, saved the document, closed your laptop, and come back to it later. And if you’re like me, you’ve gone through this iterative process many times before reaching what you consider your final piece of work. At the end of that journey, your output is a single version of a document. With that in mind, think back to the different documents that you’ve written in your life.
During the writing process, have you ever deleted some text (on purpose or by accident) that you later wished you hadn’t, only to find that you couldn’t undo the change? What about a corrupted file or a hard drive crash? If you’re old enough, like me, maybe you’ve had a floppy disk become a worthless coaster after being smashed in your backpack, taking all of your work with it. Code is just characters on a computer that happen to be written in a language that computers (and some people) can understand. If you’re not a software developer, think of code as this article written in a language that you can’t read. Software developers run into exactly the same problems.
As a software developer writes code, she begins building a component of an application or a full application. Along the way she makes mistakes, finds flaws in her original design, and redesigns and rewrites her code many times until she has what she thinks is a working version. The process isn’t always that straightforward, though. During these rewrites she might grab a few lines of code, much like you might grab a few sentences from a paragraph, and move them to a different section. In the process, she accidentally breaks her code. Now what does she do?
With all of the changes she’s been making, it’s difficult to remember exactly what was moved or changed, and there’s been so much doing and undoing that the code is now a complete mess that doesn’t work at all. Worse, she has no way to go back to the original version and no way to compare her current version to the old working one. What was once a fairly complete piece of code is now a broken version that might be better off thrown in the trash so she can start over. But it doesn’t have to be this way.
What if every time she completed a “significant” block of code she could save that point in time as a version? She could save an unlimited number of versions and make any changes she wants without risk. If she goes down a rabbit hole and creates a bad design or badly broken code, she can simply revert to her last known working version and start again from there. And because every version is saved, she can compare any two versions against each other and see every single change made since she started.
Here’s an example. Have you ever listed something on your resume, such as an address, a phone number, or an accomplishment, and later removed it? Have you ever wished you had it back? I know that I have. I’ve lived at a lot of different addresses, and each time I’ve had to update my resume with my new contact information. You might be asking, “Why does this guy care about the address of a place he used to live?” Well, when going through background checks or applying for loans, I’ve been asked to provide a history of where I’ve lived, sometimes going back 7 years. I honestly can’t remember every place, and I have no easy way to find that information. Wouldn’t it be amazing if I could look at my resume’s version history and quickly see every time I updated my address? You bet it would!
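To make the idea of saved versions concrete, here is a minimal Python sketch that keeps a full snapshot of a document every time it is saved and uses the standard library’s `difflib` to compare any two snapshots. The `VersionHistory` class and the resume text are hypothetical illustrations, not how Git works internally; Git stores versions far more efficiently and does much more.

```python
import difflib


class VersionHistory:
    """Toy version store (hypothetical): every save() keeps a full snapshot of the text."""

    def __init__(self):
        self.versions = []  # (message, text) tuples, oldest first

    def save(self, text, message):
        """Store a snapshot and return its version number."""
        self.versions.append((message, text))
        return len(self.versions) - 1

    def get(self, version):
        return self.versions[version][1]

    def diff(self, old, new):
        """Return a unified diff showing every line changed between two versions."""
        old_lines = self.get(old).splitlines(keepends=True)
        new_lines = self.get(new).splitlines(keepends=True)
        return "".join(difflib.unified_diff(
            old_lines, new_lines, fromfile=f"v{old}", tofile=f"v{new}"))

    def versions_changing(self, prefix):
        """Return the version numbers in which lines starting with `prefix` changed."""
        changed, previous = [], []
        for number, (_, text) in enumerate(self.versions):
            current = [line for line in text.splitlines() if line.startswith(prefix)]
            if current != previous:
                changed.append(number)
            previous = current
        return changed


# Usage with a made-up resume header
history = VersionHistory()
history.save("Jane Doe\nAddress: 12 Oak St\nPhone: 555-0100\n", "first draft")
history.save("Jane Doe\nAddress: 12 Oak St\nPhone: 555-0199\n", "new phone number")
history.save("Jane Doe\nAddress: 48 Elm Ave\nPhone: 555-0199\n", "moved")
print(history.diff(0, 2))                     # every line that differs between v0 and v2
print(history.versions_changing("Address:"))  # [0, 2]
```

The address lookup is exactly the “every time I updated my address” question: because each version was kept, the answer is a scan of the history rather than a guess.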
By now you hopefully understand the importance of being able to save versions of your files or code and to compare those versions. However, we’ve only been talking about a single software developer, and that isn’t very realistic. In the real world, teams are usually bigger and more complex. Even if you’re one of those few developers who says, “We’re too small a team to need this” or “It’s too much overhead,” you’re kidding yourself. When code is written, someone will (or should) review it to ensure that it makes logical sense, and often a quality assurance engineer will test it. Wouldn’t it make their lives easier if they could see exactly what changed? Think about this example for a moment.
Let’s assume that you live with kids, roommates, a significant other, or dogs: basically anyone who could eat the food in your refrigerator. Now let’s assume that you do all of the grocery shopping and that you’re about to head to the store. Before you leave, you ask your household, “What did everyone eat last week? I need to know so that I can restock the refrigerator.” From their answers, you write a shopping list and you’re off to the store. But unbeknownst to you, you were out of milk and nobody mentioned it. Now you’re eating dry cereal in the morning and have to make another trip to buy milk. The same thing happens to a developer or quality assurance engineer. If you rely solely on someone’s own account of what they changed, inevitably they will forget to mention something, and that change could have a significant impact on what you test and how the code actually functions.
To hammer this point home, in the software development world changes happen constantly. Maybe a change was made 6 months ago and the developer thought everything was working; the change even went through a quality assurance test and passed. Now a customer has uncovered an issue and you’re the person in charge of fixing it. Wouldn’t it be nice to quickly look through the version history and say, “I know this worked 8 months ago, and the only change was 6 months ago, so the bug is probably right here”? That’s what a code repository gives you, and it’s one of the many reasons software development teams rely on code repositories so heavily.
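That “it worked 8 months ago” reasoning can be made systematic. Git includes `git bisect`, which binary-searches the version history for the first commit where a problem appears. The Python sketch below shows the same idea over a plain list of versions; the `first_bad_version` function and the sample revenue calculation are hypothetical, and the search assumes the oldest version is good, the newest is bad, and there is a single switch from good to bad.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad_version(versions: Sequence[T], is_good: Callable[[T], bool]) -> int:
    """Binary-search for the index of the first version where is_good() fails.

    Assumes versions[0] is good, versions[-1] is bad, and that every version
    after the first bad one is also bad.
    """
    low, high = 0, len(versions) - 1  # invariant: low is good, high is bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(versions[mid]):
            low = mid
        else:
            high = mid
    return high


# Hypothetical example: twelve monthly snapshots of a revenue calculation,
# where a bug (ignoring refunds) slipped in at month 6.
def make_version(month):
    if month < 6:
        return lambda sales, refunds: sales - refunds
    return lambda sales, refunds: sales  # bug: refunds ignored


versions = [make_version(m) for m in range(12)]
print(first_bad_version(versions, lambda calc: calc(100, 10) == 90))  # 6
```

Binary search means twelve versions need only a handful of checks, which is why a complete version history pays off most when the history is long.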
In summary, here’s how the development team functions:
- Susan writes some code and saves it to a code repository
- Susan deletes a number of old lines of working code and replaces them with new code, with every change logged in the code repository
- Tania performs a peer review of Susan's code changes based on what is displayed as changes in the code repository
- Brad performs quality assurance tests on Susan's code based on what is displayed as changes in the code repository
- Code is released to the customer
- Susan leaves the company
- Eric is told that a bug exists somewhere in Susan's code, so he reviews the code change history to quickly resolve the issue
With this high-level understanding of how a development team functions, we can now contrast it with how analytics teams typically work. While not every analytics team functions exactly as I’m about to describe, I’ve witnessed this situation at every company I’ve worked at, from start-ups to Fortune 500 companies.
Here’s how most analytics teams function:
- Jamie asks about the performance of some part of the business
- Brian writes some code and saves the file on his laptop, if at all
- Brian summarizes the result set and sends the information to Jamie
- Months later, Jamie asks for the report to be regenerated, but the code was never in a repository and Brian has left the company
- Erin has to rewrite the code from scratch and the results don't match what Brian originally produced
- Erin spends countless hours trying to determine if Brian's code was ever correct and/or if Erin has a bug in her code
- Analytics team loses credibility with the business
While analytics teams typically have multiple people, the individual team members don’t function the way software developers do. Within analytics, a request typically comes to an analyst, who writes a query, generates output, and hands it back to the requestor. Rarely during these ad-hoc requests does another analyst review the query or the output. Without code review, some of the strongest reasons for using a code repository seem less pressing. But that doesn’t mean code reviews shouldn’t be performed or that a repository shouldn’t be used. Code reviews are arguably more important for analytics teams than for software development teams. Why? Because software development teams often have test engineers and, depending on the organization, automated tests that offer additional assurance. When it comes to the output of an analyst’s code, there are no such automated tests or test engineers to double-check the accuracy of the work. Perhaps most dangerous of all, as long as the code runs without errors, its output is frequently taken as the absolute truth, and the downstream cost can be astronomical if business decisions are based on incorrect analysis.
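Analysts can borrow a little of that safety net. The sketch below runs a few assertion-style checks on a query’s result set before it is handed back, assuming the result arrives as a list of dictionaries with hypothetical `channel`, `users`, and `flagged_users` fields, and that an independent count of total users is available. The rules are examples only, not a complete test suite.

```python
def check_channel_summary(rows, expected_total_users):
    """Return a list of problems found in a per-channel summary (empty list means all checks passed)."""
    problems = []
    channels = [row["channel"] for row in rows]
    if len(channels) != len(set(channels)):
        problems.append("duplicate channel rows")
    for row in rows:
        if row["users"] < 0 or row["flagged_users"] < 0:
            problems.append(f"{row['channel']}: negative count")
        if row["flagged_users"] > row["users"]:
            problems.append(f"{row['channel']}: more flagged users than users")
    # The breakdown should reconcile with an independently computed total
    total = sum(row["users"] for row in rows)
    if total != expected_total_users:
        problems.append(f"users sum to {total}, expected {expected_total_users}")
    return problems


# Hypothetical result set; expected_total_users would come from a separate count query
summary = [
    {"channel": "search", "users": 1200, "flagged_users": 90},
    {"channel": "social", "users": 800, "flagged_users": 150},
]
problems = check_channel_summary(summary, expected_total_users=2000)
assert not problems, problems  # stop before the numbers reach the business
```

Checks like these live in the repository next to the query, so a reviewer, or the next analyst, can see both what was computed and how it was verified.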
At one company I recently worked at, an analyst had written a query and pasted the result set into a pivot table. Unfortunately, the data was never meant to be pivoted in a particular way, but that didn’t stop others in the organization from using the pivot table that way. As a result, many senior leaders were arguing that a specific customer acquisition channel should be shut down completely because 75% of the users it generated were fraudulent. In reality, while the channel did have quite a few problems, fewer than 25% of its users were fraudulent. That difference was worth hundreds of millions of dollars, and many reputations, had the wrong decision been made. With that much at stake, why don’t analytics teams use the standard software development tools and practices that help protect the business?
Why Analytics Teams Don't Work Like Engineering Teams
I’m not 100% sure why analytics teams don’t function like engineering teams, but I do have my opinions. I’ve long suspected that it’s because “analysts” typically don’t come from a software development environment. They weren’t required to use these tools in their day-to-day jobs, so they worked with whatever they had at hand. In contrast, pretty much every development shop on the planet uses a code repository (e.g. Git) to store and track code changes, and almost as many use an issue tracking system (e.g. Jira). Whenever code is written, whether in a 1-person development shop or one with thousands of developers, a smart developer knows that a code repository is a must.
Even if someone happens to be one of the extremely rare developers who doesn’t see the value of using a code repository, their peers will almost certainly force them to adopt one along with the other standard developer tools. It’s either conform or find yourself out of a job, and probably out of a career, because these tools are that important to the individual, the team, and the company. But on analytics teams, these tools are almost completely foreign. Analysts typically never had them at any place they’ve worked, so the practices were never forced on them. As a result, there is no team-driven accountability.
Also, analysts typically don’t come from an engineering background, so they didn’t grow up using these tools or learn them in college. Many analysts came from non-technical areas of the business, where they were good at working with pivot tables in Excel, could write some SQL statements, and could create visualizations in Tableau. Heck, at the last three companies I’ve worked at, I’ve been employed on analytics teams and paid on a non-technical track, even though the bulk of my work is a blend of data engineering and analytics. If analysts are considered non-technical employees while software developers are considered technical employees, it’s no wonder there’s a difference in tools and skills.
Without a deliberate effort to incorporate standard development tools, analytics teams will struggle to: organize work requests, manage projects effectively, transfer knowledge when employees depart, detect potential issues, debug code, reduce duplicate datasets, save on storage space, reduce database load, understand how the code works and what it’s for, achieve exceptional speed to market, and provide accurate information back to the business.
In my next articles, I’ll discuss some of these standard tools in-depth and how they can be used in your analytics environment.