Why do we use .tar to make backups?

I just had a thought about the backup scripts out there. Why do we use .tar.gz or .tar.bz2? They work well enough, but I've run into two problems with them:

- if you want to retrieve a single file, it's very inefficient: tar has no index, so the only way to find the file is to scan the archive sequentially
- if you have a 2.2 Linux kernel and the standard ext2 filesystem, your backup file will get truncated once it hits the 2 GB file size limit

.tar makes sense when you're backing up to tape, where access is sequential anyway. But when you're backing up to disk, wouldn't it be better to make a copy of the entire directory tree, a la "cp -r", and then gzip the individual files that actually compress (i.e. skipping already-compressed formats like GIF, JPG, MP3, GZ, BZ2, ZIP, etc.)? That would make it a lot easier to retrieve files from the backup, and it also avoids the 2 GB limit, as long as no single file is that large to begin with.
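Something like this rough sketch is what I have in mind (SRC and DEST are just placeholder paths, and it assumes GNU cp, find, and gzip):

    #!/bin/sh
    # Sketch: mirror the tree, then compress individual files,
    # skipping formats that are already compressed.
    SRC=/home            # placeholder: directory to back up
    DEST=/backup/home    # placeholder: backup destination

    # -a rather than plain -r so permissions and timestamps survive
    cp -a "$SRC" "$DEST"

    # Compress everything that isn't already compressed
    # (-iname is GNU find; matches case-insensitively)
    find "$DEST" -type f \
        ! -iname '*.gif' ! -iname '*.jpg' ! -iname '*.mp3' \
        ! -iname '*.gz'  ! -iname '*.bz2' ! -iname '*.zip' \
        -exec gzip -9 {} \;

Restoring a single file is then just a matter of finding it in the tree and running gunzip on it, no scanning through a giant archive.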

Does anyone know of an incremental backup script that uses the technique I mentioned, or am I going to have to look into writing my own?
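If nothing like that exists, I figure the incremental part could be bolted on with a timestamp file and find -newer, along these lines (again just a sketch; GNU cp's --parents option is assumed, and $STAMP has to be created once after the initial full copy):

    #!/bin/sh
    # Hypothetical incremental pass: copy only files modified
    # since the last run, as recorded in a timestamp file.
    SRC=/home            # placeholder
    DEST=/backup/home    # placeholder
    STAMP=/backup/.last-run

    cd "$SRC" || exit 1
    # Use relative paths so --parents recreates the tree under $DEST
    find . -type f -newer "$STAMP" -exec cp -a --parents {} "$DEST" \;
    touch "$STAMP"
    # The same gzip pass as above could then run over the new files.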
