Being the IT Guy

I work in a secure office with a small team of researchers. I am a software developer; they are mathematicians, analysts and researchers. We have a secure network with five servers in a secured “cupboard” inside the office. The university IT department hates managing this infrastructure as it isn’t in their own data centre, nor is it accessible remotely. That is probably a bit unfair, as they do help out of goodwill when really needed (read: when we are desperate), but nothing ever happens very quickly. So things naturally fall to the nearest guy who has a little bit of knowledge - me!

I’m a developer. I’m the server and network manager. I’m the guy who knows how to upgrade to the latest version of Java in IE. I’m the IT Guy!

I’ve learnt a lot about VMware ESX lately. Three of our servers run ESX, all on different versions. I didn’t care about that when I first started here (yes, I inherited the setup), but it became a problem when we had a hardware failure. A big problem. A whole RAID set, gone. It turns out that ESX doesn’t want to hook a datastore up to a volume that is in a degraded state. We didn’t even know that one of the disks had failed (closed network, no notifications), so we only found out when we turned the server back on after a power test in the building. After a bit of Googling and even some advice from VMware support, we parked the idea of hooking up the degraded RAID volume and replaced the failed disk with a new one to let the array rebuild. It looked good for the first hour or so. The next time I looked, another disk had failed! Argh! I thought it was all over at that point, but one of the university IT managers thought he could still recover the information. Two new disks, several VMware support calls and a couple of weeks later, I eventually found out I was right; it was over. All gone. Five VMs. With data. Gone.

One of the things I had tried to instil into my research colleagues was the importance of a good backup and restore solution. They understood and even had a process, but it wasn’t verified enough for my liking, as we had found times when the automated backup process was not running at all. This was all before the lost RAID set. So, everyone in my group had backed up all their data. Except for me. Yup, I had been preaching to these guys about making sure they had backups of their data, and I lost an entire VM (part of my research) that I had no backup for. The other four VMs could be rebuilt or restored relatively easily, but mine had data that I wanted. That was why we spent so long trying to recover it. Lesson learnt - follow your own advice!

As a developer, I don’t want to have to worry about this kind of stuff. It makes my hair fall out - more than usual, anyway. I’m happy to help out individuals with their Java version in IE, or getting a PDF under 5 MB so it is small enough for the email attachment restrictions, or anything that takes less than 15 minutes and allows both of us to actually move on with our lives. Maintaining a bunch of old servers feels like a full-time job, and I’ve spent the best part of this year so far dealing with them and their issues. I hate it. I don’t want to do it. I don’t want to be the IT Guy. It is not why I took this job. But I am, and I will do it, because the people I work with are awesome and good at what they do. They need to get on with their work too, so if I can help them achieve that, I will. I just need to tell that selfish side of me to get over it. And it works, most of the time.

Written on March 17, 2015