Dave

Software professional with 25+ years in SaaS product development, coding, consulting, platform transformations, and data migrations.

Contact: hikingdave @ gmail.com

Data > Code

Please indulge me while I share a bit of my personal history with video games to illustrate a point.

I like video games. I always have, and I spent many hours on them in my youth. But as I got older, they just ate too much time. So I was delighted when a game called anti-Idle came out, with many features you could just leave running while it played by itself. My little kids and I coined a phrase for these in our family: the "play-by-itself games". The trend continued, people started making "idle" games, and I embraced them. All the satisfaction of progressing in a video game, with none of the time sink.

I enjoyed them so much that I'd follow discussions about them on Reddit, and that is where I noticed a problem many developers of these games shared. Most of them were beginners, and these games were the projects through which they were learning to code. Their focus was on the code, not the data. So there was a recurring theme of: "I upgraded some functions, so your save files no longer work."

Stop right there: those coders have fallen into a trap. They think that their code is the most important part of their project, when it is not. Code is just a tool. People care about their data. Good data is the foundation of useful software. And there is zero reason why a code change should invalidate old data. If your code has undergone massive changes, you may need new data. You may have to generate some default values for new features. But throwing away old data is, frankly, a lazy answer to a solvable problem.

When you make changes to software, you always have to keep in mind how they will impact your data. You need to plan a path forward from the current data structure to the new one, work out what kind of migration script will be needed to convert the old data, and decide when and how to run that script.
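As one illustration of such a path (a minimal sketch, not the only approach), a save file can carry a version number, and the loader can apply one small migration function per version step until the data is current. The JSON format, the `version` field, and the `gold`, `coins`, and `prestige` fields below are all hypothetical.

```python
import json

# Hypothetical: the version number of the save format this build writes.
CURRENT_VERSION = 3


def migrate_v1_to_v2(save):
    # Hypothetical v2 change: the "gold" field was renamed to "coins".
    save["coins"] = save.pop("gold", 0)
    save["version"] = 2
    return save


def migrate_v2_to_v3(save):
    # Hypothetical v3 change: a new "prestige" feature; old saves get a default.
    save.setdefault("prestige", 0)
    save["version"] = 3
    return save


# Each entry upgrades a save from version N to version N + 1.
MIGRATIONS = {1: migrate_v1_to_v2, 2: migrate_v2_to_v3}


def load_save(text):
    save = json.loads(text)
    # Saves written before versioning existed are treated as version 1.
    version = save.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"save version {version} is newer than this build")
    # Apply migrations one step at a time until the save is current.
    while version < CURRENT_VERSION:
        save = MIGRATIONS[version](save)
        version = save["version"]
    return save
```

Because each step only knows how to go from one version to the next, a save from any older release can still be loaded, and a new feature gets a default value instead of invalidating the file.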

Migration scripts that update data are a constant reality in the professional software world. Deploying new code is fairly easy these days, but updating all your data is harder, and it gets more complex as your data grows and your software scales. Get comfortable with data updates while you are practicing on your first projects, so that you can handle them when your work scales up to professional levels and you have millions or billions of records to update.
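One common habit for larger data sets is to migrate in batches rather than in one giant update. The sketch below assumes a SQLite table named `saves` that has just gained a `prestige` column (both hypothetical), and backfills a default value a fixed number of rows at a time, committing after each batch.

```python
import sqlite3


def migrate_in_batches(conn, batch_size=1000):
    """Backfill the hypothetical 'prestige' column, batch_size rows at a time.

    Returns the total number of rows updated.
    """
    cur = conn.cursor()
    total = 0
    while True:
        # Only touch rows that have not been migrated yet, so the script
        # can be stopped and re-run safely.
        cur.execute(
            "UPDATE saves SET prestige = 0 WHERE id IN "
            "(SELECT id FROM saves WHERE prestige IS NULL LIMIT ?)",
            (batch_size,),
        )
        # Commit each batch so no single transaction grows too large.
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Because the update only selects rows that are still unmigrated, an interrupted run can simply be started again and it will pick up where it left off.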

As usual, I'm not providing answers here; I'm just calling out the pieces of the puzzle. I encourage new coders to put as much thought into their data as into their code: how it is structured, and how to keep it usable.