Rsync Big Files (over 100gb)
Recently I had to transfer some rather huge files (way over 100gb) around, one by one, using regular residential internet. Hence, dropouts were to be expected.
Thankfully, the data storage allowed for rsync to be used. The perfect tool for the job.
I’ve synced all sorts of smaller directories and files using rsync, but never something that big.
Three painful lessons:
- Use `--partial`. Usually, if the connection drops out or you quit, rsync will stop the transfer and delete what was already downloaded. Enabling `--partial` tells it to keep the partial file.
- Even better: `--inplace`. Rsync usually downloads to a temporary file, but I wanted to resume from where it broke off last time. `--inplace` makes that happen: rsync writes to the destination file directly and resumes it on the next run. `--inplace` implies `--partial`, so it's enough to use `--inplace`.
- Without `--append-verify`, rsync will simply duplicate what it already downloaded and start appending to that duplicate. So if you sync a 200gb file, abort at 50gb, and resume, it duplicates the 50gb: you'd be using 100gb locally while it copies the 150gb that's left. With `--append-verify`, rsync checks the existing 50gb and simply appends to it, duplicating nothing (see the session sketch after this list).
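To make that last point concrete, here's a hypothetical session; the file name is the one from this post, and the random temp-file suffix is illustrative. Resuming with only `--partial` rebuilds the file into a hidden temporary next to the partial one, so the already-downloaded 50gb briefly exists twice; with `--inplace --append-verify` there is only ever the one file, and it just grows:

# Scenario A: resume with --partial only; rsync rebuilds into a hidden temp file
$ rsync --partial --progress remoteuser@remotestorage:thefile.tar.xz .
$ ls -a
.  ..  thefile.tar.xz  .thefile.tar.xz.Gk2a1B   # 50gb partial + growing duplicate

# Scenario B: resume with --inplace --append-verify; the partial file itself grows
$ rsync --inplace --append-verify --progress remoteuser@remotestorage:thefile.tar.xz .
$ ls -a
.  ..  thefile.tar.xz                           # single file, appended in place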
Putting it all together:
$ rsync --inplace --progress --append-verify remoteuser@remotestorage:thefile.tar.xz .
Another little related thing: I had to use a different SSH port for this.
$ rsync -e "ssh -p2222" --inplace --progress --append-verify remoteuser@remotestorage:thefile.tar.xz .
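And since dropouts were expected anyway, it's handy to let the shell retry automatically. This wrapper loop is my own convenience, not part of the original setup: rsync exits non-zero when the connection drops, so the loop simply re-runs it (resuming in place each time) after a short pause.

$ while ! rsync -e "ssh -p2222" --inplace --progress --append-verify \
      remoteuser@remotestorage:thefile.tar.xz .; do sleep 30; done

The 30-second pause is arbitrary; it just avoids hammering the remote host while the line is down.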