Wednesday, November 27, 2019

The Cold Sweat of Lost Data



Slow Down!

I suppose this is a cautionary tale, but mostly to myself!


For reasons of simplicity, but also austerity, in the Marcus household I've been drastically reducing the number of computers and storage devices.

It should save money and, should we have to move continents, well, there will be less to move.

Black Tuesday (almost literally!)
In my new simple regime I have but 1 laptop and 1 large computer server.

And this Black Friday, due to new components, I've been simultaneously testing and upgrading both the only computer server (stores over 40 years of data) and the NAS system (stores 40 years of data and backups).

EXCEPT ... during the NAS rebuild the NAS temporarily stores no data, and the server is being rebuilt too. Can you spot the possible total and utter disaster scenario ...


In picture terms, upgrading this card caused the primary data disk to become totally unavailable. The ReFS-formatted disk now showed up as RAW.

And recall that the only in-country backup had been only partially copied to the NAS, as the NAS itself was midway through a testing and upgrade cycle.

Here are my Lessons Learnt to avoid a potential Heart Attack

Don't Panic Prematurely
This means trying to see the cup as half full, not completely empty.

Phone a Friend
Agata was asked a few technical questions. Whilst systems backup and recovery is not her expertise, explaining the 'I may have just lost 40 years' worth of data' situation to a calm person really helps. You think through what you have done, and which disaster signals are really there (or not).

Don't Rush to A Solution (slow down)
Really analyse the problem, and don't be too quick to run a few esoteric commands or actions that 'might fix everything' but might also 'make things a whole lot worse'. Make a proper recovery plan, and carry it out slowly!

Isolating the Problem
I had to isolate the problem to see which bit had actually failed:

- The Windows Server 2019 upgrade
- The NVMe Disk copy
- The Windows System Backup
- The Windows System Restore
- The Server NVMe hardware upgrade
- The Server backout i.e. downgrade ... failed
- The 10G Network Upgrade

Here are the Actual Problems

- The NVMe backout failed because the original (kept safe) boot disk was not securely re-inserted into the ASUS Hypercard. (Human error)

- The Windows Backup (conjecture) stored something from the 10TB data disk onto the system backup disk. On restore I instructed Windows Server to leave the 10TB disk alone. But it did not! It wrote to it.

- I booted from a Windows Server 2019 system which could not recognise the ReFS-formatted 10TB data disk. But taking this disk out and attaching it to my laptop proved the data is still there. Examining the disk from the laptop, it's clear Windows Restore has written data to it (bug?).




Reset
By reset I mean ... I have to undo all server upgrades until the NAS system is 100% back online and upgraded.

Once the NAS is upgraded, make a server-to-NAS backup so that any server data loss can be recovered from with ease.

Work on the server upgrade or the NAS upgrade, but not both at the same time, stupid!
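
For the server-to-NAS backup step, it seems worth checking that the copy really matches the original before touching the server again. Here is a minimal Python sketch of that idea, assuming both the server data and the NAS copy are reachable as ordinary folders. The function names are my own placeholders, not part of any real backup tool, and comparing SHA-256 checksums is just one reasonable way to do it.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """List files under source that are missing from, or differ in, backup.

    Hypothetical helper: it only reads files and never writes to either side.
    """
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(src_file) != file_digest(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Calling verify_backup with the data folder and the NAS folder (whatever those paths happen to be on your setup) returns an empty list only when every source file has an identical copy on the NAS. Anything else in the list is a reason to slow down before starting the next upgrade.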

All in all it was a close shave (Gromit).