Each month we run Windows Update on a Windows Server 2008 cluster running SQL Server 2008 R2 and SQL Server 2012. This patches and then reboots the servers, and by default all the instances end up on one node, whichever came up first. We have configured autofailback for each group so that the 6 instances come back up on their 'preferred' node to optimise performance. We enable autofailback only for the date/time window in which Windows Update runs, then disable it again, as we don't want it to kick in during normal operation.
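To make the setup concrete, here is a rough Python model of how I understand the per-group failback settings to behave. It is only a sketch of my assumptions, not the cluster service's actual logic: the names `GroupFailbackSettings` and `failback_allowed` are made up for illustration, the fields loosely mirror the group properties `AutoFailbackType`, `FailbackWindowStart` and `FailbackWindowEnd`, and I am assuming hours 0-23, `-1` meaning "immediately", an inclusive end hour, and a window that may wrap past midnight.

```python
from dataclasses import dataclass


@dataclass
class GroupFailbackSettings:
    """Hypothetical model of one cluster group's failback settings."""
    auto_failback_type: int  # assumed: 0 = prevent failback, 1 = allow failback
    window_start: int        # assumed: hour 0-23, or -1 for "immediately"
    window_end: int          # assumed: hour 0-23, or -1 for "immediately"


def failback_allowed(settings: GroupFailbackSettings, hour: int) -> bool:
    """Return True if, under the assumptions above, failback may happen at `hour`."""
    if settings.auto_failback_type != 1:
        # Failback is disabled for this group (our state during normal operation).
        return False
    if settings.window_start == -1 or settings.window_end == -1:
        # No window configured: failback is allowed at any time.
        return True
    if settings.window_start <= settings.window_end:
        # Window within a single day, e.g. 02:00 to 05:00.
        return settings.window_start <= hour <= settings.window_end
    # Window wraps past midnight, e.g. 22:00 to 02:00.
    return hour >= settings.window_start or hour <= settings.window_end


# Example: failback enabled only for a patch window of 02:00 to 05:00.
patch_window = GroupFailbackSettings(auto_failback_type=1, window_start=2, window_end=5)
print(failback_allowed(patch_window, 3))
print(failback_allowed(patch_window, 12))
```

Note that this model only covers the time window. It says nothing about the state of the resources in the group, which is exactly the part I am unsure about.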
Last month this process failed and all the instances came up on one node. The only obvious difference was that a (backup) service failed to start. This happened in a similar way on 4 clusters.
Did autofailback fail to do its magic because it doesn't kick in when the group doesn't come fully online, i.e. when a service is in a 'failed' (or possibly 'failing') state?