
Friday, February 16, 2007

Where's that Keyboard?

This is NOT a good reason for server downtime!

You would think that your network administrator would know enough to have all systems plugged into KVMs, or at least to ensure they have a BIOS that can be configured to ignore keyboard errors. It becomes troublesome when the responsibility is shifted to the people at the off-site datacentre, and that may have been the case here. Frankly, that's the risk of allowing the employees of the datacentre to touch your equipment: they don't have the same level of ownership that your own staff do.

Either way, a friend of mine expressed great frustration when a high-priority server rebooted due to a fault and didn't come back up because someone had left it detached from its keyboard.

Thursday, February 08, 2007

Production vs. Non-Production

A friend of mine had a discussion with his server guy this week about a virtual production server that is running on the development VMware ESX server. While he told me that the server generally performs well on that box, there are procedural, logical, and policy reasons for running a production VM on a production ESX host. It has been in production for over 4 months, and it's about time for this to be resolved.

In addition to this issue, the users in Arizona can't reach the staging server to test changes and base their approvals on them without being handed a convoluted internal address or IP address. My friend's desire to simplify this approval testing by introducing a convenient name was hindered by technical issues, and he eventually gave up the fight.

The following is an IM conversation regarding both situations.

Developer: 2 Questions
Developer: When is the AppOne Production server moving?

Server Support: it will move.
Server Support: no rush..

Developer: Is the AppOneStage URL available from Arizona yet?

Server Support: i think, we spoke on this 2nd issue as well. We have work around which is host file.
Server Support: current setup is very weird.
Server Support: they have local work group joined computers.

Developer: Ok... I'd like to be able to have them use stage, easily. Anyone there.

Server Support: they have issues resolving dns names, only work around we have is to use host file, which is not best solution but at this time. nothing we could offter them right away which will solve the issue.
Server Support: regards to moving production AppOne off of Dev LUN, this will happen but as I understand, it's not causing any issues where it's running it from.
Server Support: bigger issue regards to that move is, it can not happen during the day time. last I spoke call center is open till 11pm and I can't keep my self awake till that time.
Server Support: only day we have is Sunday, I will try to juggle some time on Sunday to do this task

Developer: okay I'll stop asking for either of these... but will say that the Production AppOne box SHOULD be on a production server and that's where I'd like to see it as my preference.

Server Support: I totally Understand, but as user and person supporting it, doesn't' really care where it's running it from. Just pretend you never heard it was running on the DEV LUN.

Developer: How does Arizona get it's DNS?

Server Support: unless someone else is keep asking you for status and you are following it with me, then it's fine.
Server Support: but let's face it, there is no real diff.

Wow... There is a difference, in my books.

Friday, February 02, 2007

Service Metrics Are Your IT Report Card

We all know that user surveys can be baked the way we want; picking and choosing the users you include (or invite) in the survey is not uncommon for smaller, less established helpdesks. What really shows management you're worthy of retention is service metrics on the services and resources your department provides.

While some technology groups that measure availability can realize over 99% availability for all services, smaller, less responsible groups would be happy to realize 70%, if they knew at all how they were doing. Perhaps they don't understand the benefits of high availability.

Availability is the amount of time all critical services are ready to use and functioning properly, excluding planned downtime (within the Change Window). This Change Window can be quite the advantage: while a full week is 7x24, and some services are truly 7x24 services, most applications in a business are NOT 7x24. Instead they are 5x12, or 6x10, or perhaps 5x13+1x10. Any of these scenarios provides windows of opportunity for changes and upgrades. If we consider a business running Monday through Friday from 7AM to 10PM, we can safely consider a Change Window of 6 hours a night and 54 hours over a typical 2-day weekend.
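To make the "days x hours" shorthand concrete, here is a minimal Python sketch that totals the weekly service hours for each pattern named above and shows how many hours fall outside them. The function names and the (days, hours per day) representation are my own assumptions for illustration, not part of any tool.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a full 7x24 week


def weekly_service_hours(blocks):
    """Total service hours for a list of (days, hours_per_day) blocks.

    "5x13+1x10" is written as [(5, 13), (1, 10)].
    """
    total_days = sum(days for days, _ in blocks)
    if total_days > 7:
        raise ValueError("a week has only 7 days")
    total = 0
    for days, hours_per_day in blocks:
        if days < 0 or not 0 <= hours_per_day <= 24:
            raise ValueError("invalid block: %r" % ((days, hours_per_day),))
        total += days * hours_per_day
    return total


def hours_outside_service(blocks):
    """Hours in the week that fall outside the service schedule."""
    return HOURS_PER_WEEK - weekly_service_hours(blocks)


# The service patterns mentioned in the paragraph above.
PATTERNS = {
    "7x24": [(7, 24)],
    "5x12": [(5, 12)],
    "6x10": [(6, 10)],
    "5x13+1x10": [(5, 13), (1, 10)],
}

if __name__ == "__main__":
    for name, blocks in PATTERNS.items():
        print(name, weekly_service_hours(blocks), hours_outside_service(blocks))
```

The hours outside service are only an upper bound on the Change Window; in practice you would likely claim a narrower window and leave a buffer on either side of business hours.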

Your measure of availability is against the time outside of those windows. In the above example, we have 84 hours of Production Availability for business systems; that shouldn't be too hard to realize on a regular basis, but things do happen. This does not include web servers and Internet-facing systems, and may not include other systems, because some systems demand 7x24 functionality. In these scenarios scheduled windows may still be permitted, and may justify the use or provisioning of alternate or redundant hardware.
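As a rough illustration of how that measure could be turned into a number, the sketch below computes an availability percentage from scheduled production hours and unplanned downtime, and the downtime a given target leaves you. It takes the 84 scheduled hours from the example as given; the function names and the sample 2-hour outage are hypothetical.

```python
def availability_percent(scheduled_hours, unplanned_downtime_hours):
    """Availability over the scheduled production hours, as a percentage.

    Planned downtime inside the Change Window is assumed to be excluded
    already, so only unplanned downtime counts against the score.
    """
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= unplanned_downtime_hours <= scheduled_hours:
        raise ValueError("downtime must be between 0 and scheduled_hours")
    uptime = scheduled_hours - unplanned_downtime_hours
    return 100.0 * uptime / scheduled_hours


def downtime_budget_hours(scheduled_hours, target_percent):
    """Unplanned downtime you can absorb and still meet the target."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return scheduled_hours * (100.0 - target_percent) / 100.0


if __name__ == "__main__":
    scheduled = 84  # weekly production hours from the example above
    # Hypothetical week with a single 2-hour unplanned outage.
    print(round(availability_percent(scheduled, 2), 2))
    # Downtime allowed per week at a 99% target and at a 70% target.
    print(round(downtime_budget_hours(scheduled, 99), 2))
    print(round(downtime_budget_hours(scheduled, 70), 2))
```

Under these assumptions, a 99% target on 84 scheduled hours leaves well under an hour of unplanned downtime per week, while a 70% target would tolerate more than a full day of it.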

Tracking your availability, the success of your changes, and the reliability of the services you provide may be a painful reminder of your challenges, but it also allows you to set goals for improving your services and becoming (and proving you are) a world-class technology group.