Disk doesn’t mount
One recent Saturday evening, my external 4 TB SSD, which holds Time Machine backups and some other data, failed to mount when connected. Strange, but it had happened once before, and an OS restart had helped then. This time it didn’t… While I was wondering whether my SSD had suddenly started dying (it happens with hardware) and what to do about it, about 10 minutes passed, and then a message popped up saying something like “there is a problem with the drive, but you can still copy your data”, and the volume was mounted read-only!
I tried to repair the volume in Disk Utility, but the repair would fail after several minutes. It wasn’t clear to me whether it was really a disk failure or a filesystem error.
I could find almost nothing useful online about such a situation. Answers to a few similar posts on Apple Developer Forums don’t give any new information, and a bunch of articles just paraphrase Apple’s official documentation. StackExchange questions are much better by comparison because they are much more technical and the answers try to explore possible alternatives, such as:
- Is there a faster way to copy time machine files from one disk to another? is a good one because there are a few ideas, but ultimately no other solution than using Finder.
- Migrate a Time Machine backup in terminal has some information about using tar, but no answer.
- Copying Time Machine backup, destination takes more than original size, are hardlinks being expanded? doesn’t have a solution.
- Time Machine size explodes when copied to new drive doesn’t have a solution either.
And a few disk failure questions:
- How to repair a corrupted HFS+ partition from a damaged hard-disk?;
- How to recover HFS+ Partition Catalog (Possible failing drive).
I had an older external HDD with a bunch of files from the current one. So I calculated the hashes of those files on both drives and found that all of the hashes matched. So far so good. (Calculating 41k hashes over 1.1 TB of files in total was about an order of magnitude faster on the SSD than on the HDD: approx. 20 minutes vs 5 hours.)
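A comparison like the one described above could be sketched in Python as follows. This is only an illustration under assumptions: SHA-256 is an arbitrary choice of hash, and the function names are made up for this example.

```python
import hashlib
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each regular file's path (relative to root) to its digest."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only file contents are compared.
            if os.path.isfile(full) and not os.path.islink(full):
                hashes[os.path.relpath(full, root)] = hash_file(full)
    return hashes


def compare_trees(root_a, root_b):
    """Return the relative paths present in both trees whose contents differ."""
    a, b = hash_tree(root_a), hash_tree(root_b)
    return sorted(path for path in a.keys() & b.keys() if a[path] != b[path])
```

Calling compare_trees with the mount points of the two drives (for example, hypothetical paths such as /Volumes/OldHDD and /Volumes/SSD) returns an empty list when every file found on both drives has identical contents.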
I got a new 2 TB SSD because it was the biggest I could find. How do I copy the backups now?
The official way
https://support.apple.com/en-us/HT202380 is Apple’s official article on transferring Time Machine backups to another disk. It tells you to format the target disk with a GUID partition table and a journaled HFS+ partition, then use Finder to “drag” the entire Backups.backupdb directory to the new disk, “then wait for the copy to complete”. Obviously that’s for the case when the target disk is at least as big as the source one. What if it isn’t?
I could only estimate the size of the backups from the free space and the total size of all other files. It turned out to be about 1.7 TB, so the entire backup history should fit on the new 2 TB disk. I started copying with Finder as recommended. It took hours “to prepare” the copy, and after a few hours of “preparing” the target SSD was noticeably warm.
It copied a lot of files during the day. 24 hours later it was saying “Copying 0 objects to “target” 1,6 TB of 1,6 TB – Approximately 5 seconds left”; four hours later only the size had changed: “1,8 TB of 1,8 TB”, still “5 seconds left”. When I checked the progress in the morning, I saw an error complaining that the disk was full. Maybe the backups were slightly over 2 TB after all? But no: there was only about a year of backups (2016–2017), so the copy was nowhere near the end.
I ran ls -i on the same file in a few of the backups, and the inodes were indeed the same! I don’t know why it failed to copy all the data then. It seems that Finder is not able to copy Time Machine backups, at least on the latest macOS 10.14.6.
Ideally I’d keep all, or at least most, of my Time Machine backups, so they had to be copied to a new disk. But how?
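The inode check mentioned above can also be scripted. Here is a minimal Python sketch, assuming you pass it the paths of the same file inside several backup snapshots; the function names are hypothetical.

```python
import os


def inode_of(path):
    """Return the (device, inode) pair that identifies the file on disk."""
    info = os.lstat(path)
    return (info.st_dev, info.st_ino)


def share_inode(paths):
    """Return True if every path is a hard link to the same underlying file."""
    return len({inode_of(path) for path in paths}) == 1
```

If share_inode returns True for the same file in several snapshots, the snapshots share a single copy of the data, which is what ls -i shows when it prints identical inode numbers.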
I could think of these ideas:
- Block-copying, as I described in my earlier post, is preferable, but I couldn’t do it because the target disk was smaller. And I couldn’t shrink the volume because the filesystem was read-only and corrupt.
- Any other program (cp, etc.) won’t copy hardlinks to directories.
- Copying a subset of the backups? I started copying the latest three backups, and soon Finder estimated the total size to be at least 1 TB — which isn’t true, there should be about 600 GB plus some minor changes, so it doesn’t deal with hardlinks in that case.
  - When I created a directory named Backups.backupdb in the root, I couldn’t copy anything to it; it seemed to be protected by Finder just based on the name. Thus I had to use another name.
  - The permissions of the backup directories are insane: sudo rm -rf doesn’t work, even after chflags nouchg. Finder can delete those directories after asking for the admin password, duh!
- Write a program that copies hard-linked directories? That would require a lot of investigation and time.
- Create one logical volume from a 3 TB HDD and 2 TB SSD so that I could block-copy the old disk, but then what? The filesystem would be in exactly the same state, with the same options.
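To see how much hard links matter for size estimates like the ones in the list above, one can compare a naive total against a total that counts each inode only once. The following Python sketch assumes that files shared between backups appear as ordinary hard links (same device and inode number); tree_sizes is a name invented for this example.

```python
import os
import stat


def tree_sizes(root):
    """Return (naive_bytes, unique_bytes) for the regular files under root.

    naive_bytes counts every directory entry, as a copier that expands hard
    links would; unique_bytes counts each (device, inode) pair only once.
    """
    seen = set()
    naive = unique = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            info = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            naive += info.st_size
            key = (info.st_dev, info.st_ino)
            if key not in seen:
                seen.add(key)
                unique += info.st_size
    return naive, unique
```

A large gap between the two numbers means that a tool which does not preserve hard links will need far more space on the target than the source actually occupies.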
Later I found an old blog post about a hack that disables the journal on HFS+ by editing the bytes on disk directly, and a script that copies Time Machine backups on Linux, restoring the correct directory hierarchy. It turns out that hard links on HFS+ are a very dirty hack.
Is it a dead end? Dropping to the terminal very often gives you many more options than the GUI.
diskutil (which is behind Disk Utility) actually runs fsck_hfs when it verifies and repairs an HFS+ volume.
So I checked the man page and launched it manually with a few other switches, to rebuild the catalog btree and print more debug information.
What a miracle, it did fix the filesystem! I ran fsck again and it reported no errors. Then I calculated the same hashes as I’d done initially, and all of them matched again, so no data was lost. Finally, I ran fsck with a switch that scans every occupied block for I/O read errors, wrapped in a number of command “combinators”:
- noti to display a notification when the command finishes,
- time to report how long the command took to execute,
- caffeinate so that the Mac doesn’t go to sleep while the command is running.
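The same idea of wrapping a long-running command can be sketched in Python. This is an illustration rather than the command used above: run_awake_and_timed is a hypothetical helper, it only uses caffeinate on macOS when the tool is found, and it leaves out the notification step that noti provides.

```python
import shutil
import subprocess
import sys
import time


def run_awake_and_timed(argv):
    """Run argv and return (exit_code, elapsed_seconds).

    On macOS the command is prefixed with `caffeinate -i`, which prevents
    idle sleep for as long as the wrapped command is running.
    """
    caffeinate = shutil.which("caffeinate") if sys.platform == "darwin" else None
    full_argv = [caffeinate, "-i", *argv] if caffeinate else list(argv)
    started = time.monotonic()
    completed = subprocess.run(full_argv)
    return completed.returncode, time.monotonic() - started
```

The list form of argv avoids shell quoting issues, and time.monotonic is used so that clock adjustments during a multi-hour run don’t distort the measurement.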
Overall I’m glad that it was “just” filesystem corruption and not an SSD failure, and that it recovered fine. I learned some gruesome details about HFS+ and confirmed that it’s often annoying to work with Apple’s proprietary data formats, because they provide only one way to do something and that’s not enough: why can only the Finder copy hard-linked directories, and only in some cases? And can it at all?