Rsync
The Art of Efficient File Synchronization
10 min read
#ostool, #rsync

Want to sync files like a pro? rsync is the Unix utility that's been the gold standard for file synchronization for 25+ years. It's not just "copy files faster" - it's intelligent delta transfers, atomic operations, and network efficiency that make it irreplaceable. By the end of this article, you'll understand why rsync is still the tool of choice for backups, deployments, and migrations, and how to wield it with confidence.

The Reality: rsync Is Everywhere

Every time you deploy code, sync photos, back up servers, or mirror websites - there's a good chance rsync is involved. Git's packfiles rely on a related idea (delta compression), and a number of backup and sync tools either wrap rsync or borrow its rolling-checksum approach. It's the Swiss Army knife of file operations that every sysadmin, DevOps engineer, and developer should master.

The magic: rsync only transfers the differences between source and destination. A 10GB file with a 1KB change? Over a network, rsync sends the changed block plus a modest amount of checksum data, not the whole 10GB.

Your Journey: From Basic Copies to Advanced Sync

We'll cover basic syntax, essential flags, local and remote operations, exclusions, bandwidth limiting, and real-world scenarios. By the end, you'll be syncing like a wizard.


Part 1: The Basics - Your First rsync

Installation:

# Most Linux/Unix systems have it pre-installed
rsync --version

# Install if needed:
sudo apt install rsync      # Debian/Ubuntu
sudo yum install rsync      # CentOS/RHEL
brew install rsync          # macOS

Basic syntax:

rsync [OPTIONS] SOURCE DESTINATION

Simplest example (local copy):

rsync file.txt /backup/
# Copies file.txt to /backup/file.txt

Copy directory:

rsync -r mydir/ /backup/
# -r = recursive (required for directories)

Key concept: Trailing slash matters!

rsync -r mydir/ /backup/
# Copies CONTENTS of mydir into /backup/
# Result: /backup/file1, /backup/file2, etc.

rsync -r mydir /backup/
# Copies mydir ITSELF into /backup/
# Result: /backup/mydir/file1, /backup/mydir/file2, etc.

This is the #1 rsync gotcha. Remember: a trailing slash on the SOURCE means "contents only". A trailing slash on the destination directory does not change this.
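A quick way to internalize the rule is to run both forms in a throwaway sandbox. The sketch below assumes rsync is installed; the directory names under the temporary directory are made up for the demo.

```shell
#!/bin/sh
# Throwaway sandbox; every name below is illustrative.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/mydir" "$sandbox/with_slash" "$sandbox/without_slash"
echo one > "$sandbox/mydir/file1"
echo two > "$sandbox/mydir/file2"

# Trailing slash on the source: the CONTENTS of mydir land in the destination.
rsync -r "$sandbox/mydir/" "$sandbox/with_slash/"

# No trailing slash: mydir ITSELF is created inside the destination.
rsync -r "$sandbox/mydir" "$sandbox/without_slash/"

# Print both layouts relative to the sandbox.
(cd "$sandbox" && find with_slash without_slash -type f | sort)
```

The listing should show file1 and file2 directly under with_slash/, and under without_slash/mydir/ for the second command.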


Part 2: Essential Flags (The -avz Trinity)

The most common rsync command:

rsync -avz source/ destination/

What each flag does:

-a (Archive mode - the all-in-one)

rsync -a source/ destination/

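Archive mode enables (it is shorthand for -rlptgoD):

-r    # Recurse into directories
-l    # Copy symlinks as symlinks
-p    # Preserve permissions
-t    # Preserve modification times
-g    # Preserve group
-o    # Preserve owner (needs root on the receiving side)
-D    # Preserve device and special files

It does not preserve hard links (-H), ACLs (-A), or extended attributes (-X); add those explicitly if you need them.

Use -a for almost everything. It's "copy as-is, metadata included."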

-v (Verbose)

rsync -av source/ destination/

Shows what's being transferred:

sending incremental file list
./
file1.txt
file2.txt
subdir/
subdir/file3.txt

sent 1,234 bytes  received 56 bytes  2,580.00 bytes/sec
total size is 10,000  speedup is 7.75
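The "speedup" figure in that summary is simply the total size of the files divided by the bytes that actually crossed the wire (sent plus received). A small sketch that recomputes it from the sample numbers above; the function name `speedup` is only for illustration.

```shell
#!/bin/sh
# speedup = total size of the files / (bytes sent + bytes received)
# $1 = bytes sent, $2 = bytes received, $3 = total size
speedup() {
    awk -v s="$1" -v r="$2" -v t="$3" 'BEGIN { printf "%.2f\n", t / (s + r) }'
}

speedup 1234 56 10000    # prints 7.75
```

A speedup close to 1 means nearly everything had to be sent; large values mean most of the data was already on the other side.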

-z (Compress)

rsync -avz source/ destination/

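Compresses data during transfer. This pays off for remote transfers over slow links; on a fast LAN or for local copies it mostly burns CPU.

Skip -z when the data is already compressed (videos, images, archives): recompressing it costs CPU and saves almost nothing.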

-P (Progress + Partial)

rsync -avzP source/ destination/

Shows progress bar and keeps partial transfers:

file1.txt
    1,234,567  45%   123.45MB/s    0:00:02

Equivalent to: --progress --partial


Part 3: The --delete Flag (Mirroring)

Problem: rsync by default only adds/updates files. Deleted files in source remain in destination.

Solution: --delete makes destination match source exactly.

rsync -avz --delete source/ destination/

Example:

Source:

source/
├── file1.txt
└── file2.txt

Destination (before):

destination/
├── file1.txt
├── file2.txt
└── old_file.txt  ← Will be deleted

After rsync -avz --delete source/ destination/:

destination/
├── file1.txt
└── file2.txt

Danger: --delete is destructive! Always test with --dry-run first!

Delete variants:

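--delete              # Delete extraneous destination files (rsync 3.x does this during the transfer by default)
--delete-before       # Delete before the transfer starts (frees space first, but delays the start)
--delete-after        # Delete after the transfer completes (nothing is removed if the run aborts early)
--delete-excluded     # Also delete destination files that match your exclude patterns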

Part 4: Remote Sync (The rsync Superpower)

rsync uses SSH by default for remote transfers.

Pull from remote:

rsync -avz user@remote:/path/to/source/ /local/destination/

Push to remote:

rsync -avz /local/source/ user@remote:/path/to/destination/

Specify SSH port:

rsync -avz -e "ssh -p 2222" source/ user@remote:/destination/

Use SSH key:

rsync -avz -e "ssh -i ~/.ssh/mykey" source/ user@remote:/destination/

Performance tip - use compression:

rsync -avz source/ user@remote:/destination/
# -z compresses during transfer (big bandwidth savings on compressible data such as text)

Part 5: Dry Run (Always Test First!)

ALWAYS use --dry-run before destructive operations!

rsync -avz --delete --dry-run source/ destination/

Shows what WOULD happen without actually doing it:

sending incremental file list
deleting old_file.txt
file1.txt
file2.txt

sent 456 bytes  received 23 bytes  958.00 bytes/sec
total size is 5,000  speedup is 10.44 (DRY RUN)

Workflow:

  1. Run with --dry-run
  2. Review output carefully
  3. Remove --dry-run and run for real
  4. Verify results
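Step 2 is easier when the deletions are pulled out of the noise. The sketch below wraps a dry run in a hypothetical helper, `preview_deletes`, that prints only the paths `--delete` would remove. It relies on `--itemize-changes` marking deletions with a `*deleting` prefix; the sandbox names are made up.

```shell
#!/bin/sh
# List what --delete would remove, without touching anything.
# Usage: preview_deletes SRC/ DEST/
preview_deletes() {
    rsync -a --delete --dry-run --itemize-changes "$1" "$2" |
        sed -n 's/^\*deleting  *//p'
}

# Demo in a throwaway sandbox.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source" "$sandbox/destination"
echo keep  > "$sandbox/source/file1.txt"
echo keep  > "$sandbox/destination/file1.txt"
echo stale > "$sandbox/destination/old_file.txt"

preview_deletes "$sandbox/source/" "$sandbox/destination/"    # prints old_file.txt
```

Because the helper always passes `--dry-run`, old_file.txt is still present afterwards; only the real run removes it.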

Alias for safety:

alias rsync='rsync --dry-run'
# Affects interactive shells only. To run for real, bypass the alias on purpose:
#   command rsync -avz --delete source/ destination/

Part 6: Exclusions and Inclusions

Exclude specific files/directories:

rsync -avz --exclude 'node_modules' source/ destination/

Multiple exclusions:

rsync -avz \
  --exclude 'node_modules' \
  --exclude '*.log' \
  --exclude '.git' \
  source/ destination/

Exclude from file:

# Create .rsync-exclude
cat > .rsync-exclude << EOF
node_modules/
*.log
.git/
.DS_Store
__pycache__/
*.pyc
.env
EOF

# Use it
rsync -avz --exclude-from='.rsync-exclude' source/ destination/

Include only specific patterns:

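# Include only .txt files (at any depth)
rsync -avz --include='*/' --include='*.txt' --exclude='*' source/ destination/
# --include='*/' lets rsync descend into subdirectories; add -m to skip empty ones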

Complex include/exclude (order matters!):

# Copy only .txt files, skip everything under temp/, and exclude the rest
# (rules are checked top to bottom; the first match wins)
rsync -avz \
  --include='*.txt' \
  --exclude='temp/' \
  --include='*/' \
  --exclude='*' \
  source/ destination/
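Filter rules are easiest to trust after watching them act on a small tree. This sketch runs the four rules above against a made-up sandbox and lists what actually arrived; all file names are illustrative.

```shell
#!/bin/sh
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source/sub" "$sandbox/source/temp" "$sandbox/destination"
echo x > "$sandbox/source/a.txt"
echo x > "$sandbox/source/b.log"
echo x > "$sandbox/source/sub/c.txt"
echo x > "$sandbox/source/sub/d.jpg"
echo x > "$sandbox/source/temp/e.txt"

# Rules are tested in order for every path; the first match decides.
rsync -a \
  --include='*.txt' \
  --exclude='temp/' \
  --include='*/' \
  --exclude='*' \
  "$sandbox/source/" "$sandbox/destination/"

# List the files that made it across.
copied=$(cd "$sandbox/destination" && find . -type f | sort)
echo "$copied"
```

Only ./a.txt and ./sub/c.txt are copied. temp/e.txt is left behind even though it ends in .txt, because the temp/ directory is excluded before rsync ever looks inside it.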

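Pattern matching:

*            # Matches any run of characters within one path component (stops at /)
**           # Matches anything, including /
?            # Matches any single character except /
[abc]        # Matches one character from the set
/ at start   # Anchors the pattern to the root of the transfer
/ at end     # Matches directories only

Examples: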

--exclude='*.log'           # All .log files
--exclude='temp/'           # Directory named temp
--exclude='/temp/'          # Only top-level temp directory
--exclude='**/*.tmp'        # All .tmp files in any subdirectory

Part 7: Bandwidth and Speed Control

Limit bandwidth:

rsync -avz --bwlimit=1000 source/ user@remote:/destination/
# Limit to 1000 KiB/s (roughly 1 MB/s); the default unit is 1024 bytes per second
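Before picking a limit, it helps to estimate how long the transfer will take at that rate. A rough sketch, assuming every byte has to be sent (no delta or compression savings); `estimate_transfer_time` is a hypothetical helper name.

```shell
#!/bin/sh
# Rough upper bound on transfer time under --bwlimit.
# $1 = data size in KiB, $2 = --bwlimit value (KiB per second)
estimate_transfer_time() {
    secs=$(( ($1 + $2 - 1) / $2 ))    # round up to whole seconds
    printf '%dh %dm %ds\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# 10 GiB at --bwlimit=1000
estimate_transfer_time $((10 * 1024 * 1024)) 1000    # prints 2h 54m 46s
```

If the estimate does not fit your backup window, raise the limit or schedule the job for off-peak hours.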

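Why limit bandwidth? So a large transfer doesn't saturate a link that other traffic (production services, other users, video calls) also depends on.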

Nice priority (CPU):

nice -n 19 rsync -avz source/ destination/
# Lowest CPU priority - yields to other processes

Ionice (IO priority):

ionice -c3 rsync -avz source/ destination/
# Idle IO priority - only uses IO when nothing else needs it

Combine for ultimate "background" operation:

nice -n 19 ionice -c3 rsync -avz --bwlimit=5000 source/ remote:/dest/

Part 8: Advanced Options

--link-dest (Incremental backups with hard links)

Create space-efficient incremental backups:

# First backup
rsync -av source/ /backup/2024-01-01/

# Incremental backup (links unchanged files)
rsync -av --link-dest=/backup/2024-01-01/ source/ /backup/2024-01-02/

Result: Only changed files consume space. Unchanged files are hard-linked.
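The hard-link claim is easy to verify in a sandbox. The sketch below takes two snapshots and uses the shell's `-ef` test (true when two paths point at the same inode; supported by bash, dash, and ksh) to check which files are shared. Directory names such as day1 and day2 are made up, and the filesystem is assumed to support hard links.

```shell
#!/bin/sh
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source" "$sandbox/backup"
echo "never changes" > "$sandbox/source/a.txt"
echo "version 1"     > "$sandbox/source/b.txt"

# Day 1: full copy.
rsync -a "$sandbox/source/" "$sandbox/backup/day1/"

# b.txt changes before day 2 (new content, different size).
echo "version 2, edited" > "$sandbox/source/b.txt"

# Day 2: unchanged files become hard links into day1.
rsync -a --link-dest="$sandbox/backup/day1/" \
  "$sandbox/source/" "$sandbox/backup/day2/"

for f in a.txt b.txt; do
    if [ "$sandbox/backup/day1/$f" -ef "$sandbox/backup/day2/$f" ]; then
        echo "$f: hard-linked (no extra space)"
    else
        echo "$f: separate copy"
    fi
done
```

a.txt is reported as hard-linked and b.txt as a separate copy. Note that `--link-dest` is given as an absolute path here; a relative path would be interpreted relative to the destination directory.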

Full backup script:

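#!/bin/bash
set -euo pipefail   # Stop on the first failed command

DATE=$(date +%Y-%m-%d)
BACKUP_DIR="/backup"
LATEST="$BACKUP_DIR/latest"
NEW="$BACKUP_DIR/$DATE"

if [ -d "$LATEST" ]; then
    # Hard-link unchanged files against the previous snapshot
    rsync -av --link-dest="$LATEST" /source/ "$NEW/"
else
    # No previous snapshot yet: plain full copy
    rsync -av /source/ "$NEW/"
fi

# Repoint "latest" only after rsync succeeded (set -e aborts before this otherwise)
ln -sfn "$NEW" "$LATEST"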

--checksum (Verify by content, not timestamp)

rsync -av --checksum source/ destination/
# -c is the short form of --checksum; one of the two is enough

Default: rsync compares file size and modification time. With --checksum: Compares actual file content (slower, but more accurate).

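Use when timestamps can't be trusted (for example, files that were restored or copied without preserving modification times) or when you need certainty that contents match.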

--ignore-existing (Don't overwrite existing files)

rsync -av --ignore-existing source/ destination/

Only copies files that don't exist in destination. Useful for continuing an interrupted run, with one caveat: files that already exist are skipped even if they are outdated or incomplete.

--update (Don't overwrite newer files)

rsync -avu source/ destination/

Skip files that are newer in destination than source.

--sparse (Handle sparse files efficiently)

rsync -avS source/ destination/

Efficiently handles files with large holes (like VM disk images).

--append (Resume interrupted transfers)

rsync -av --append source/largefile.iso destination/

Resumes transfer of partially copied file (assumes data already transferred is correct).
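Since `--append` trusts the bytes already on the destination, it is worth comparing the files after a resume. A sandbox sketch, with a truncated copy standing in for an interrupted transfer; the file name is illustrative and `head -c` is assumed to be available (GNU and BSD both provide it).

```shell
#!/bin/sh
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source" "$sandbox/destination"

# A 1 MiB source file and a destination copy cut off halfway.
head -c 1048576 /dev/urandom > "$sandbox/source/largefile.iso"
head -c 524288 "$sandbox/source/largefile.iso" > "$sandbox/destination/largefile.iso"

# --append continues the shorter destination file instead of starting over.
rsync -a --append "$sandbox/source/largefile.iso" "$sandbox/destination/"

# cmp -s is silent and returns 0 when the two files are byte-identical.
if cmp -s "$sandbox/source/largefile.iso" "$sandbox/destination/largefile.iso"; then
    echo "resume OK"
else
    echo "files differ"
fi
```

If the existing bytes on the destination might be corrupt, `--append-verify` checksums them as part of the resume, at the cost of reading the whole file.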


Part 9: Real-World Scenarios

1. Website Deployment

#!/bin/bash
# Deploy website to production

rsync -avz \
  --delete \
  --exclude '.git' \
  --exclude 'node_modules' \
  --exclude '.env' \
  --exclude '*.log' \
  --checksum \
  /local/website/ \
  user@webserver:/var/www/html/

# Reload web server after sync
ssh user@webserver 'sudo systemctl reload nginx'

2. Automated Backups

#!/bin/bash
# Daily backup script

DATE=$(date +%Y-%m-%d-%H%M)
SOURCE="/important/data"
DEST="/backup/$DATE"
LOG="/var/log/backup-$DATE.log"

rsync -avz \
  --delete \
  --exclude='temp/' \
  --exclude='cache/' \
  --log-file="$LOG" \
  "$SOURCE/" \
  "$DEST/"

# Keep only last 7 days of backups (-mindepth 1 keeps find from ever matching /backup itself)
find /backup -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} \;

# Email log if errors
if grep -q "error" "$LOG"; then
    mail -s "Backup errors on $(hostname)" admin@example.com < "$LOG"
fi

3. Photo Library Sync

#!/bin/bash
# Sync photos to external drive, preserving everything

rsync -avh \
  --progress \
  --exclude='.DS_Store' \
  --exclude='Thumbs.db' \
  --exclude='*.tmp' \
  ~/Pictures/ \
  /Volumes/BackupDrive/Pictures/

4. Two-Way Sync (Manual)

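# No --delete here: with --delete, the first pass would wipe remote-only files
# before the second pass could fetch them.

# Sync local → remote, keeping anything that is newer on the remote
rsync -avzu local/ user@remote:/path/

# Sync remote → local (in reverse), keeping anything that is newer locally
rsync -avzu user@remote:/path/ local/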

Warning: Two-way sync needs careful orchestration. Use unison for true bidirectional sync.
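To see what a non-destructive two-pass sync does, here is a sandbox sketch that uses `--update` (-u) in both directions and no `--delete`. The "remote" side is just a second local directory, and all file names are made up.

```shell
#!/bin/sh
sandbox=$(mktemp -d)
mkdir -p "$sandbox/local" "$sandbox/remote"   # "remote" is a second local dir here

echo "only on local"  > "$sandbox/local/a.txt"
echo "only on remote" > "$sandbox/remote/b.txt"

# Same file on both sides; the local copy is deliberately older.
echo "old" > "$sandbox/local/c.txt"
touch -t 202001010000 "$sandbox/local/c.txt"
echo "new" > "$sandbox/remote/c.txt"

# Pass 1: local -> remote. -u leaves remote/c.txt alone because it is newer.
rsync -au "$sandbox/local/" "$sandbox/remote/"
# Pass 2: remote -> local. b.txt and the newer c.txt flow back.
rsync -au "$sandbox/remote/" "$sandbox/local/"

cat "$sandbox/local/c.txt" "$sandbox/remote/c.txt"    # prints "new" twice
```

Both sides end up with a.txt, b.txt, and the newer c.txt. What this approach cannot do is propagate deletions or resolve files edited on both sides, which is where a dedicated tool earns its keep.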

5. Database Backup (with compression)

#!/bin/bash
# Backup database with rsync

# Dump database
mysqldump -u root -p mydatabase > /tmp/db-$(date +%F).sql

# Sync to backup server
rsync -avz \
  --remove-source-files \
  /tmp/db-*.sql \
  backup-server:/database-backups/

6. Sync Only Specific File Types

# Sync only videos (no -z: video files are already compressed)
rsync -av \
  --include='*/' \
  --include='*.mp4' \
  --include='*.mkv' \
  --include='*.avi' \
  --exclude='*' \
  source/ destination/

7. Mirror Production to Staging

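#!/bin/bash
# Safely copy production to staging
# rsync cannot copy between two remote hosts directly, so run it ON prod-server
# and push to staging (prod-server must be able to SSH to staging-server)

ssh prod-server "rsync -avz \
  --delete \
  --exclude='logs/' \
  --exclude='cache/' \
  --exclude='sessions/' \
  /var/www/app/ \
  staging-server:/var/www/app/"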

# Update staging config
ssh staging-server 'cd /var/www/app && cp .env.staging .env'

Part 10: Monitoring and Logging

Basic stats:

rsync -avz --stats source/ destination/

# Output:
Number of files: 1,234
Number of files transferred: 45
Total file size: 5.67G bytes
Total transferred file size: 234.56M bytes
Literal data: 234.56M bytes
Matched data: 5.44G bytes
File list size: 32.45K
Total bytes sent: 234.89M
Total bytes received: 1.23K

sent 234.89M bytes  received 1.23K bytes  156.59M bytes/sec
total size is 5.67G  speedup is 24.14

Detailed logging:

rsync -avz \
  --log-file=/var/log/rsync.log \
  --log-file-format='%t %f %b' \
  source/ destination/

Monitor progress of large transfer:

rsync -avzP source/ destination/
# Or, for a single overall progress line (needs rsync 3.1.0 or newer):
rsync -avz --info=progress2 source/ destination/

Email notification on completion:

rsync -avz source/ destination/ && \
  echo "Backup completed successfully" | \
  mail -s "Rsync Success" admin@example.com
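The `&&` above only separates zero from non-zero. rsync's exit code carries more detail, and some non-zero codes are closer to warnings than failures. A minimal sketch that maps a few documented codes to a coarse status; `classify_rsync_exit` is a hypothetical helper and the choice of which codes count as warnings is an assumption to adapt.

```shell
#!/bin/sh
# Map a few documented rsync exit codes to a coarse status.
# Usage: classify_rsync_exit CODE
classify_rsync_exit() {
    case "$1" in
        0)     echo "ok" ;;
        24)    echo "warning: some source files vanished during the transfer" ;;
        23)    echo "error: partial transfer" ;;
        30|35) echo "error: timeout" ;;
        *)     echo "error: rsync exited with code $1" ;;
    esac
}

# Typical use (paths and address are placeholders):
#   rsync -avz source/ destination/
#   status=$(classify_rsync_exit $?)
#   echo "$status" | mail -s "Rsync: $status" admin@example.com
classify_rsync_exit 24
```

Code 24 is common when backing up a live system, where files disappear between the file-list scan and the copy, so many scripts treat it as non-fatal.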

Part 11: Troubleshooting

"Permission denied" errors

# Solution 1: Run rsync as root when the local destination is not writable by your user
sudo rsync -avz source/ destination/

# Solution 2: Use --rsync-path for remote sudo
# (requires passwordless sudo for rsync on the remote host)
rsync -avz --rsync-path="sudo rsync" source/ user@remote:/root/destination/

Slow transfers

# Disable compression for pre-compressed files
rsync -av source/ destination/  # Remove -z

# Use lighter compression
rsync -avz --compress-level=1 source/ destination/  # Faster, less compression

# Rule out a bandwidth cap (0 = unlimited)
rsync -avz --bwlimit=0 source/ destination/

Files not syncing

# Check what rsync sees as different
rsync -avzni source/ destination/
# -n = dry run, -i = itemize changes

# Output like:
# >f+++++++++ file.txt    ← Will be transferred
# .f...p..... file2.txt   ← Only permissions differ

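Itemized output explained:

Each line starts with an 11-character code in the layout YXcstpoguax:

Y    # Update type: < sent, > received, c created/changed locally, h hard link, . not updated, * message (such as *deleting)
X    # File type: f file, d directory, L symlink, D device, S special file
c    # Checksum differs (regular files)
s    # Size differs
t    # Modification time differs
p    # Permissions differ
o    # Owner differs
g    # Group differs
u    # Reserved in older versions; access/create time in newer ones
a    # ACL differs
x    # Extended attributes differ

A letter means that attribute is being updated, a "." means no change, and a run of "+" means the item is new.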
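For long listings, a small decoder can turn those codes into words. The sketch below is a hypothetical helper, `explain_itemize`, that handles the common cases of the 11-character YXcstpoguax layout; it is not an exhaustive parser, and the label for the "u" slot is an assumption that depends on the rsync version.

```shell
#!/bin/sh
# Decode one itemize code such as ">f+++++++++".
explain_itemize() {
    code=$1
    case "$code" in
        '*'*) echo "message: $code"; return 0 ;;
        '<'*) action="sent to the remote side" ;;
        '>'*) action="received on this side" ;;
        c*)   action="created or changed locally" ;;
        h*)   action="hard link to another item" ;;
        .*)   action="not transferred" ;;
        *)    action="unknown update type" ;;
    esac
    case "$(printf '%s' "$code" | cut -c2)" in
        f) kind="file" ;;
        d) kind="directory" ;;
        L) kind="symlink" ;;
        D) kind="device" ;;
        S) kind="special file" ;;
        *) kind="unknown type" ;;
    esac
    attrs=$(printf '%s' "$code" | cut -c3-11)
    if [ "$attrs" = "+++++++++" ]; then
        changes="new item"
    else
        changes=""
        i=1
        # Slot names follow the c s t p o g u a x order.
        for name in checksum size mtime permissions owner group atime-or-crtime acl xattr; do
            case "$(printf '%s' "$attrs" | cut -c"$i")" in
                .|' '|'') ;;                      # unchanged slot
                *) changes="$changes $name" ;;
            esac
            i=$((i + 1))
        done
        changes=${changes# }
        [ -n "$changes" ] || changes="no attribute changes"
    fi
    echo "$kind, $action: $changes"
}

explain_itemize '>f+++++++++'    # file, received on this side: new item
explain_itemize '.f...p.....'    # file, not transferred: permissions
```

Feeding it the first column of `rsync -ni` output gives a readable summary of why each path is listed.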

Connection issues

# Test SSH connection first
ssh user@remote

# Use verbose SSH
rsync -avz -e "ssh -vv" source/ user@remote:/destination/

# Specify different shell
rsync -avz --rsh="ssh -p 2222" source/ user@remote:/destination/

Part 12: Performance Optimization

Parallel rsync with GNU parallel

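# Split large sync into parallel jobs, one per top-level subdirectory
# -mindepth 1 skips source/ itself; files directly under source/ need a separate pass
find source/ -mindepth 1 -maxdepth 1 -type d | \
  parallel -j 4 rsync -avz {} destination/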

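Use --whole-file on fast links

# rsync already skips the delta algorithm when both paths are local.
# --whole-file forces the same behavior elsewhere, e.g. on a fast network where CPU is the bottleneck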
rsync -av --whole-file source/ destination/
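The opposite switch, `--no-whole-file`, turns the delta algorithm back on for local paths, which makes it possible to watch delta transfer at work in a sandbox. The sketch below changes the tail of an 8 MiB file and reads the "Literal data" and "Matched data" lines from `--stats`; the helper `stat_bytes` and the file names are made up, and the exact byte counts depend on the block size rsync picks.

```shell
#!/bin/sh
sandbox=$(mktemp -d)
mkdir -p "$sandbox/source" "$sandbox/destination"

# 8 MiB of random data, copied once so both sides start identical.
head -c 8388608 /dev/urandom > "$sandbox/source/big.bin"
rsync -a "$sandbox/source/" "$sandbox/destination/"

# Change the source slightly: append a few bytes.
printf 'tail' >> "$sandbox/source/big.bin"

# Pull one "Name: N bytes" figure out of --stats output (strips thousands separators).
stat_bytes() {
    awk -F': ' -v key="$1" '$1 == key { gsub(/[^0-9]/, "", $2); print $2 }'
}

stats=$(rsync -a --no-whole-file --stats "$sandbox/source/" "$sandbox/destination/")
literal=$(printf '%s\n' "$stats" | stat_bytes "Literal data")
matched=$(printf '%s\n' "$stats" | stat_bytes "Matched data")
echo "literal (sent as new data): $literal bytes"
echo "matched (reused from the destination): $matched bytes"
```

Literal data should be a small fraction of the file, with nearly all of it reported as matched. Without `--no-whole-file`, the same local run would report the entire file as literal data.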

Tune the block size

# Smaller block size (more CPU, less data transfer)
rsync -avz --block-size=2048 source/ destination/

# Larger block size (less CPU, more data transfer)
rsync -avz --block-size=131072 source/ destination/

Skip modification time checks

# Only sync files that differ in size (faster, less accurate)
rsync -av --size-only source/ destination/

Part 13: Security Considerations

Restrict rsync with rrsync (restricted rsync)

# rrsync ships with the rsync package (its install location varies by distribution)
sudo apt install rsync

# In ~/.ssh/authorized_keys on destination:
command="rrsync /backup/allowed-path",no-agent-forwarding,no-port-forwarding,no-pty,no-user-rc,no-X11-forwarding ssh-rsa AAAA...

This restricts the key to running rsync inside that one directory.

Use rsync daemon with authentication

# /etc/rsyncd.conf
[backup]
    path = /backup
    read only = no
    auth users = backupuser
    secrets file = /etc/rsyncd.secrets

# /etc/rsyncd.secrets (chmod 600)
backupuser:secretpassword

# Connect (daemon traffic is not encrypted; use it only on trusted networks or through a tunnel)
rsync -avz source/ rsync://backupuser@remote/backup/

Encrypt with SSH keys only (no password)

# Generate key without passphrase for automation
ssh-keygen -t ed25519 -f ~/.ssh/rsync_key -N ""

# Copy to remote
ssh-copy-id -i ~/.ssh/rsync_key user@remote

# Use in rsync
rsync -avz -e "ssh -i ~/.ssh/rsync_key" source/ user@remote:/dest/

Your Path Forward: rsync Mastery

You've gone from basic file copying to advanced synchronization scenarios.

Essential commands to remember:

# Basic sync
rsync -avz source/ destination/

# Sync with delete (mirror)
rsync -avz --delete source/ destination/

# Remote sync
rsync -avz source/ user@remote:/destination/

# Dry run (always test first!)
rsync -avz --delete --dry-run source/ destination/

# Exclude patterns
rsync -avz --exclude='node_modules' --exclude='*.log' source/ dest/

# Incremental backup with hard links
rsync -av --link-dest=/backup/previous/ source/ /backup/current/

# Monitor progress
rsync -avzP source/ destination/

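Quick reference flags:

-a                  # Archive: recursive, preserves permissions, times, owner, group, symlinks
-v                  # Verbose
-z                  # Compress during transfer
-P                  # --partial --progress
-n, --dry-run       # Show what would happen without doing it
-i                  # Itemize changes
--delete            # Remove destination files that are missing from source
--exclude=PATTERN   # Skip matching files
--link-dest=DIR     # Hard-link unchanged files against DIR
--bwlimit=RATE      # Cap the transfer rate
-c, --checksum      # Compare by content instead of size and time
-u, --update        # Skip files that are newer on the destination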

Pro tips:

  1. Always test with --dry-run first
  2. Mind the trailing slash (source/ vs source)
  3. Use --exclude-from file for complex exclusions
  4. Log everything for audit trails
  5. Verify with --checksum for critical data
  6. Use --link-dest for space-efficient backups

rsync isn't just a copy tool - it's a synchronization engine. Master it, and you've mastered one of Unix's most powerful utilities.

Happy syncing!


P.S. - rsync's delta algorithm is brilliant: it checksums file blocks, identifies differences, and only transfers changes. A 10GB file with a 1MB change? rsync transfers ~1MB. This is why it's still unbeatable 25 years later. Read the original paper if you're into algorithms - it's computer science history.

Also: The trailing slash thing will trip you up at least once. We all learn it the hard way. Just remember: slash = contents only, no slash = directory itself.