2019-03-10

Photos Library backup using python and launchd

The problem at hand

We use a large, external drive to store the household Photos library (Photos as in Apple Photos.app). The external drive is already backed up using one of the popular cloud backup services. I want to have a more convenient backup in case of catastrophic damage to the drive, which would reduce the time it takes to recover all our photos.

Assess the options and devise a plan

Given that the drive is already backed up to the cloud, we have some freedom. We could just connect another drive and do some kind of backup; we could set up Time Machine to do something (maybe); we could try to move the library to the computer's drive and then use the external drive as a backup; we could occasionally copy the photos to another drive or computer. I happen to have a Synology NAS ready for this type of thing, so I will use that. Instead of relying on my memory or even an automated reminder to go and sit in front of my computer and do periodic backups, I will automate the backup.

Here's the plan: write a script that will do the backup of the Photos library to the NAS, and then schedule that script to run regularly. Easy.


Implementation


1. Enable SSH using keys on Synology

My Synology NAS is obviously underutilized, since I had never bothered to set up SSH keys before. It turns out to be easy, but only because others have already documented the process.

Follow the instructions posted here:
https://forum.synology.com/enu/viewtopic.php?t=126166

Note, at first I did all the steps except setting the home directory permissions. Turns out that post is correct that it is necessary. You need to change the permissions on the home directory or else SSH will still just prompt for the user password even though the keys are present.

Once this step is done, you can ssh into the NAS without entering a password.
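
Because the backup will eventually run unattended, it is worth confirming that the login really works without any prompt. Here is a minimal sketch, assuming the standard OpenSSH client on the Mac. The helper names and the user/host values are placeholders of mine, not part of the original setup.

```python
import subprocess

def build_ssh_check_cmd(user, host, timeout_s=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never fall back to a password prompt
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly if the NAS is unreachable
        f"{user}@{host}",
        "true",                               # trivial remote command
    ]

def key_login_works(user, host):
    """Return True if ssh can log in using keys alone (hypothetical helper)."""
    try:
        result = subprocess.run(build_ssh_check_cmd(user, host),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except FileNotFoundError:  # no ssh client on this machine
        return False
    return result.returncode == 0
```

Calling key_login_works("my_name", "nas.local") should return True once the keys and permissions are right. Since ssh reads keys from the home directory of whoever runs it, it is probably best to run the check as the same user that will run the scheduled job.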


2. Initial backup of the Photos library

Again, this road has been taken. For example:

https://kevingoedecke.me/2015/08/30/backup-mac-photos-library-with-rsync-over-ssh/

In that example they use rsync, and I don't see any reason that isn't a great way to go for my purposes. I'm taking most of that rsync command, but I'm removing the "--delete" just in case (I have room, so no need to worry about it).

On the NAS, make a location for the backup:
cd /volume1/some_place
mkdir photos_library_backup

Just as a note for those who haven't looked at this before, it is important to realize that the Photos library is called something like
/Volumes/external_drive/Photos\ Library.photoslibrary

I don't know any details about this, but I do know that it isn't a "file", but more like a directory. You can even just cd into it and look around.

The command that I'm using will be something like this:

rsync -Phca --stats --include="/Photos Library.photoslibrary/" --include="/Photos Library.photoslibrary/***" --exclude="*" -e "ssh" "/Volumes/external_drive/" my_name@external_drive.local:/photos_library_backup/
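
The filter rules in that command are read in order: include the library directory, include everything beneath it, then exclude everything else on the drive. As a rough sketch of the same idea in Python, the argument list can be built by a small function, with rsync's --dry-run flag as an option for a harmless first test. The function name and its parameters are my own illustration.

```python
def build_rsync_args(source, dest, library="Photos Library.photoslibrary", dry_run=False):
    """Return the rsync argument list for backing up one library from a drive."""
    args = ["rsync", "-Phca", "--stats"]
    if dry_run:
        args.append("--dry-run")  # report what would be transferred, change nothing
    args += [
        f"--include=/{library}/",     # the library directory itself
        f"--include=/{library}/***",  # everything inside it
        "--exclude=*",                # nothing else on the drive
        "-e", "ssh",
        source,
        dest,
    ]
    return args

args = build_rsync_args("/Volumes/external_drive/",
                        "my_name@external_drive.local:/photos_library_backup/",
                        dry_run=True)
```

Passing the arguments as a list means no shell is involved, so the spaces in the library name need no quoting or escaping.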

3. Set up automation with launchd

The post from Kevin Goedecke uses a shell (sh) script and crontab, but this is not the Apple way. We should use launchd.

I'm no expert at launchd, so I looked at a bunch of examples. Here are a few:

A pretty nice quick overview:
https://stackoverflow.com/questions/132955/how-do-i-set-a-task-to-run-every-so-often

See also:
https://killtheyak.com/schedule-jobs-launchd/

And:
http://www.launchd.info

The bottom line is that you have to choose whether you want the job to run only when you are logged in, or allow it to run as root. I want it to go ahead and run as root, since the photos library might be updated by other users when I am not around. To do that, we just need to put a proper "plist" file into /Library/LaunchDaemons. A plist file is just an XML file, but we have to follow some conventions, which all the above links describe. Since this is a super simple job, my plist file is 20 lines.


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
    <dict>
        <key>Label</key>
        <string>local.photo_backup</string>
        <key>ProgramArguments</key>
        <array>
            <string>/anaconda3/bin/python</string>
            <string>/HOME/Code/escape/photo_backup.py</string>
        </array>
        <key>StartCalendarInterval</key>
        <dict> 
            <key>Hour</key> 
            <integer>3</integer> 
            <key>Minute</key> 
            <integer>0</integer> 
        </dict>
    </dict>
</plist>

Essentially, the file says to run a python script on a calendar schedule. Arbitrarily, I'll just do it every morning at 3am, but it could have been less frequent.
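
Since a plist is just structured data, the same job can also be described as a Python dictionary and serialized with the standard library's plistlib. This is only a sketch of an alternative to writing the XML by hand. It reuses the label and paths from the file above, and the round trip is a quick way to check that the result is well-formed.

```python
import plistlib

job = {
    "Label": "local.photo_backup",
    "ProgramArguments": [
        "/anaconda3/bin/python",
        "/HOME/Code/escape/photo_backup.py",
    ],
    # Hour and Minute together mean "once a day at 03:00"
    "StartCalendarInterval": {"Hour": 3, "Minute": 0},
}

plist_bytes = plistlib.dumps(job)         # XML format by default
round_trip = plistlib.loads(plist_bytes)  # parse it back as a sanity check
```

Writing plist_bytes into /Library/LaunchDaemons needs root. Assuming the file is named after its label, something like `sudo launchctl load /Library/LaunchDaemons/local.photo_backup.plist` should then register the job.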

4. Write a script to do the backup


The plist just schedules a job. That job is actually to run a python script. There are so many things that you could do here, but I'm just interested in doing one thing without incident: run rsync to back up my whole Photos Library.

There's not going to be anything fancy, but I think this does the job correctly. I will use subprocess.run() to invoke rsync, just as I did for my initial backup in step 2. That could have been put into a shell script, or even a much shorter python script, using very few lines.

I was slightly worried about what would happen if the external drive were unmounted/removed when the script was run. This could definitely happen; I think all the external drives get unmounted when no one is logged in to the system. So I put in a check, just to see if the path to the external drive seems valid:


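from pathlib import Path

# assumed value: the library path from step 2
local_loc = Path("/Volumes/external_drive/Photos Library.photoslibrary")
if not (local_loc.exists() and local_loc.is_dir()):
    # stop before rsync runs if the drive or library is missing
    raise IOError("Something is wrong")
# otherwise, proceed with the backup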

I did notice that in python 3.7 there is a Path.is_mount() method, but the machine running the script is still on 3.6. After a minute of thinking about it, I decided that it does not really matter if the location is a mount; what matters is whether it is there, so I went with this version.
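
If the distinction between "mounted" and "merely present" ever did matter, the older os.path.ismount() function is available on Python 3.6 as well. Below is a small sketch of a check with an optional mount test. The function name and the require_mount parameter are hypothetical, not taken from the actual script.

```python
import os
from pathlib import Path

def check_source(path, require_mount=False):
    """Raise IOError unless path is an existing directory (optionally a mount point)."""
    p = Path(path)
    if not (p.exists() and p.is_dir()):
        raise IOError(f"Source {p} is missing or not a directory")
    if require_mount and not os.path.ismount(str(p)):
        raise IOError(f"Source {p} exists but is not a mount point")
    return p
```

For the drive itself, a call like check_source("/Volumes/external_drive", require_mount=True) would reject an empty leftover directory, while the default behaviour matches the simpler check above.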

The script also defines where to point the rsync command, which is just a string. So the work is basically accomplished by


import subprocess

# No inner quotes: without a shell, each list item reaches rsync exactly as written.
prc = subprocess.run(["rsync", "-Phca", "--stats", "--include=/Photos Library.photoslibrary/",
                      "--include=/Photos Library.photoslibrary/***", "--exclude=*", "-e", "ssh",
                      local_loc, remot_plc],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.STDOUT)

where local_loc is the path to the library on the external drive and remot_plc is the path on the NAS. The stdout and stderr keyword arguments merge the output and errors of rsync into a single attribute, prc.stdout.
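
One thing the call above does not do is look at rsync's exit status, which is zero on success and non-zero when something went wrong. Here is a hedged sketch of how that check might look. The run_and_capture helper is my own name, and a tiny Python command stands in for rsync so that the example runs anywhere.

```python
import subprocess
import sys

def run_and_capture(cmd):
    """Run cmd and return (returncode, output lines), with stderr merged into stdout."""
    prc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = prc.stdout.decode("utf-8", errors="replace").splitlines()
    return prc.returncode, lines

# Stand-in command; the real script would pass the rsync argument list instead.
code, lines = run_and_capture([sys.executable, "-c", "print('sent 0 bytes')"])
if code != 0:
    raise RuntimeError(f"command failed with exit code {code}")
```

Raising on a non-zero code makes a failed backup show up as a failed job rather than a quiet log entry, though simply logging the code may be enough for a household backup.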

I used that stdout/stderr in order to write a useful log file. I am just using the standard library logging module, and setting a log file for each daily run using a simple timestamp:


import logging
from datetime import datetime
# construct the log file name using today's date
logloc = Path("/HOME/logs/")
lognam = datetime.now().strftime('photo_backup_log_%Y%m%d_%H%M.log')
logfil = logloc/lognam
logging.basicConfig(level=logging.INFO, filename=logfil)

The subprocess.run() function uses stdout=subprocess.PIPE to put the stdout of rsync into an attribute of prc called stdout as a bytes object. I have a little function that parses that bytes object into a list of strings split on the newline character:

def log_subprocess_output(p):
    lines = p.stdout.decode("utf-8").split("\n")
    for line in lines:
        logging.info(line)

This is called right after the subprocess.run(); it just puts all the output from rsync into the log file.
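
To try the logging pieces together without touching the real library, a stand-in command and a temporary directory are enough. The sketch below is only an illustration under those assumptions. It also uses a named logger with its own file handler instead of logging.basicConfig, which is a design choice of mine rather than what the script does.

```python
import logging
import subprocess
import sys
import tempfile
from datetime import datetime
from pathlib import Path

def log_subprocess_output(p, logger):
    """Write each line of the captured output to the given logger."""
    for line in p.stdout.decode("utf-8").splitlines():
        logger.info(line)

# Assumption: a temporary directory stands in for the real log location.
logdir = Path(tempfile.mkdtemp())
logfil = logdir / datetime.now().strftime("photo_backup_log_%Y%m%d_%H%M.log")

logger = logging.getLogger("photo_backup_demo")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(str(logfil))
logger.addHandler(handler)

# Stand-in for rsync so the sketch runs anywhere.
prc = subprocess.run([sys.executable, "-c", "print('line one'); print('line two')"],
                     stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
log_subprocess_output(prc, logger)
handler.close()  # flush and release the log file

print(logfil.read_text())
```

Using splitlines() instead of split("\n") avoids logging an empty final line when the output ends with a newline.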

That's it. We now have a script that will try to run rsync to back up any changes we have made to the Photos library to a designated place on the NAS. If the external drive isn't available, it will raise an exception and exit. It logs the output of every run to a timestamped file. The script is run by a root process at 3am every day. Not bad for 50 lines of python with no dependencies and a 20 line plist file.