HOW TO: Backup Your Website to Amazon S3 (Automatically)
Some of you might not know this about me, but I am a successful web developer outside of this blog. I currently host and run 9 sites on my Media Temple (dv) server. One of my biggest concerns was how to keep safe, secure, up-to-date backups of my website files somewhere other than my home. After turning to my good friend Google, I came across a couple of great articles on this topic from two local Atlanta bloggers. My friend Paul Stamatiou and Christina Warren have both written in-depth articles on how they used Amazon S3 to securely back up their websites daily, which is where I learned to do the following.
While Paul and Christina’s guides are great, I wanted to explore S3sync further and share my experience using it to back up my Media Temple (dv) server to Amazon S3. The steps I am going to walk you through are based on S3sync, an open source Ruby application that lets you transfer files to Amazon S3 over an SSL-encrypted connection. Let me start by warning you that you need to know a bit about UNIX commands. You will need an application like Terminal for OS X or Linux, or PuTTY for Windows, to SSH into your web server. If you don’t know what SSH is, then this tutorial might not be for you. I will be using Terminal, which is built into Apple OS X.
**It is very important that you use absolute paths throughout this tutorial. If you are not sure what your absolute path is, enter “pwd” in the terminal window after you have logged into your server. For this tutorial, I am going to work directly from the root level.**
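As a quick illustration of the absolute-path rule, here is a minimal sketch for a POSIX shell. The helper name is_absolute is my own invention for this example and is not part of S3sync.

```shell
# Hypothetical helper (not part of S3sync): succeed only if the path starts with "/".
is_absolute() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Print the absolute path of the directory you are currently in.
pwd

# Check a path before pasting it into a script or a crontab entry.
if is_absolute "/s3sync/s3backup"; then
    echo "absolute"
else
    echo "relative"
fi
```

A path such as s3sync/s3backup (no leading slash) would be reported as relative, and relative paths are the ones that tend to break once cron runs the script from a different directory.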
Step 1: Install Ruby
I could walk you through a step by step guide on how to install Ruby, but truth be told, I would recommend you check with your web host for specific instructions. For Media Temple, I would use these guides: (dv) 2.0 server or (dv) 3.0 server. The Media Temple Grid server comes with Ruby pre-installed so you can skip this step.
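Before installing anything, it may be worth checking whether Ruby is already on the server. This is only a sketch; have_cmd is a helper name I made up for the example.

```shell
# Report whether a command is available on the PATH.
# `have_cmd` is a hypothetical helper name used only in this sketch.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd ruby; then
    ruby -v    # prints the installed Ruby version
else
    echo "Ruby not found; follow your host's install guide first."
fi
```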
Step 2: Install S3sync
To get started, we need to connect to S3sync’s Amazon S3 bucket and download the S3sync tar-file. Once we have done that, we are going to decompress it and download the SSL certificates for secure transfers.
Start by getting the S3sync tar:
wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
Decompress the tar file:
tar xvzf s3sync.tar.gz
Remove the S3sync tar file, get the SSL certificates, and unpack them:
rm s3sync.tar.gz
cd s3sync
mkdir certs
cd certs
wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar
sh ssl.certs.shar
Step 3: Setup S3sync
Next, we are going to make the directory that will store the backups before they are transferred to your S3 account. This folder will be inside the S3sync folder.
cd ..
mkdir s3backup
Edit the s3config.rb file. This step is only needed with newer versions of S3sync.
vi s3config.rb
You need to replace the confpath with this:
confpath = ["./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
Now, enter your Amazon S3 account information into the s3config.yml file which we will create in the S3sync directory:
vi s3config.yml
Now that you are in the VI editor, hit the “i” key to enter insert mode then enter the following:
aws_access_key_id: ***********************
aws_secret_access_key: ***************************
ssl_cert_dir: /s3sync/certs
If you are having problems with this step, look at the s3config.yml.sample file which comes with S3sync. After you have entered your Amazon S3 keys and the absolute path to your certs directory, you need to hit the “escape” key then “:wq” to save and quit the VI editor. If you are not sure how to use the VI editor, check out this resource.
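Because s3config.yml holds your secret key, it is probably wise to make it readable only by your own user. The sketch below writes a skeleton file with the three keys shown above and locks down its permissions. The helper name write_s3config and the placeholder values are assumptions of mine, not something S3sync provides.

```shell
# Sketch: write an s3config.yml skeleton and make it readable only by its owner.
# `write_s3config` is a hypothetical helper; the key names come from the tutorial.
write_s3config() {
    dir="$1"
    conf="$dir/s3config.yml"
    printf '%s\n' \
        "aws_access_key_id: YOUR_ACCESS_KEY_ID" \
        "aws_secret_access_key: YOUR_SECRET_ACCESS_KEY" \
        "ssl_cert_dir: $dir/certs" > "$conf"
    # Owner may read and write; group and others get no access.
    chmod 600 "$conf"
}

# Example with the tutorial's layout:
#   write_s3config /s3sync
```

You would still need to open the file and replace the two placeholder values with your real Amazon S3 keys.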
Step 4: Write the Shell Backup Script
Now, you need to change directories so you are in the S3sync directory, which is where we will create the backup shell script. I have said it before, but this is very important for the automation part at the end of this tutorial: use absolute paths in this script. For the script example below, I am assuming everything is at the root level.
vi backupscript
Hit the “i” key to enter insert mode.
#!/bin/bash
echo `date` ": Deleting Previous TAR Files..." > /s3sync/s3backup/backup.log
# Empty the backup folder of previous backups.
cd /s3sync/s3backup
rm -f *
cd /s3sync
echo `date` ": Beginning Backup Process..." >> /s3sync/s3backup/backup.log
# Get the date.
NOW=$(date +_%b_%d_%y_%H-%M)
echo `date` ": Generating File Backup TAR..." >> /s3sync/s3backup/backup.log
# Generate a tar-file of the server contents.
tar czvf websites_backup$NOW.tar.gz **********
mv websites_backup$NOW.tar.gz /s3sync/s3backup
cd /s3sync/s3backup
# Database Backup
DBNAME=**********
DBPWD=**********
DBUSER=**********
echo `date` ": Generating SQL Backup TAR..." >> /s3sync/s3backup/backup.log
# Generate a tar-file of the SQL database.
touch $DBNAME.backup$NOW.sql.gz
mysqldump -u $DBUSER -p$DBPWD $DBNAME | gzip -9 > $DBNAME.backup$NOW.sql.gz
echo `date` ": Compressing 2 TAR Files Into 1..." >> /s3sync/s3backup/backup.log
# Compress all tar-files in to 1
tar czvf server_backup$NOW.tar.gz $DBNAME.backup$NOW.sql.gz websites_backup$NOW.tar.gz
echo `date` ": Delete Individual TAR Files..." >> /s3sync/s3backup/backup.log
# Remove individual tar-files.
rm -f $DBNAME.backup$NOW.sql.gz websites_backup$NOW.tar.gz
echo `date` ": Running S3sync Ruby Script..." >> /s3sync/s3backup/backup.log
# Transfer tar-file to Amazon S3
BUCKET=**********
cd ~/
ruby /s3sync/s3sync.rb -r -v --ssl /s3sync/s3backup/ $BUCKET:
echo `date` ": Backup Complete..." >> /s3sync/s3backup/backup.log
This script is not very complex, but I will walk through it with you a little bit. All of the echo statements in the script output the status of the script for debugging purposes. These statements are not required, but they might help you troubleshoot any problems that arise. Make sure you replace the directory you want to back up, the SQL database information, and the Amazon S3 bucket information wherever you see: **********
The script starts by deleting the contents of the backup folder, which will contain the backups that were generated the last time the script was run. Next, we generate the date for labeling the tar-files. After that, it is time to compress the actual web server files into a tar-file. Make sure you replace the “**********” with the absolute path of the directory you would like to back up. Next, we generate a compressed dump of the SQL database. Again, make sure you replace the “**********” with your specific information. Finally, we combine the 2 files into one tar-file and transfer it to your Amazon S3 account. Make sure you replace the last occurrence of “**********” with the name of the Amazon S3 bucket you wish to save the backups in.
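If the repeated echo lines feel noisy, the logging and the date-stamped file names could be wrapped in two small functions. This is only a sketch of one way to do it, not how the script above is written. The names log and backup_name are hypothetical, and LOG defaults to the log path used in this tutorial.

```shell
# Sketch: a small logging helper and the timestamped file name used by the script.
# LOG defaults to the tutorial's log path; `log` and `backup_name` are hypothetical names.
LOG="${LOG:-/s3sync/s3backup/backup.log}"

# Append a dated status line to the log file.
log() {
    echo "$(date) : $1" >> "$LOG"
}

# Build a name in the same style as the script, for example
# websites_backup_Jan_05_09_06-00.tar.gz (month, day, year, hour-minute).
backup_name() {
    echo "$1$(date +_%b_%d_%y_%H-%M).tar.gz"
}
```

Inside the script you could then write log "Beginning Backup Process..." instead of the full echo line, and tar czvf "$(backup_name websites_backup)" followed by the directory to archive.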
Step 5: Test the Script
./backupscript
When you start the script, you should see a fast scrolling list of files as they are backed up. When it stops scrolling, the 2 files are being combined and transferred to your Amazon S3 account. This can take a few minutes, so be patient. If successful, the prompt will reappear in the terminal window. Now, you should see the tar-file in your Amazon S3 bucket.
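If the shell answers with “Permission denied” when you run ./backupscript, the file most likely does not have its execute bit set yet. The sketch below shows one way to set it, plus a way to peek inside a finished archive without extracting it. Both helper names are made up for this example.

```shell
# Sketch: make a script executable if it is not already.
# `ensure_executable` is a hypothetical helper name.
ensure_executable() {
    [ -x "$1" ] || chmod +x "$1"
}

# List the contents of a gzip-compressed tar archive without extracting it.
# `list_archive` is also a hypothetical helper name.
list_archive() {
    tar tzf "$1"
}

# Example with the tutorial's layout:
#   ensure_executable /s3sync/backupscript
#   list_archive /s3sync/s3backup/server_backup_<timestamp>.tar.gz
```

Listing the archive is a cheap sanity check that both the website tar-file and the SQL dump made it into the combined backup.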
Step 6: Automate the Script
Now, I am going to walk you through how to edit your crontab file to run the script daily at a time of your choosing. If you don’t know how a crontab works, check out this great crontab resource. The basic format of a crontab entry is:
* * * * * command to be executed
- - - - -
| | | | |
| | | | +----- day of week (0 - 6) (Sunday=0)
| | | +------- month (1 - 12)
| | +--------- day of month (1 - 31)
| +----------- hour (0 - 23)
+------------- min (0 - 59)
Start by entering your crontab file:
crontab -e
There may already be some crontab entries in this file, so make sure you do not edit any of the current entries. Scroll down to the last entry and insert a new line. On the new line, enter the new crontab entry:
0 6 * * * /s3sync/backupscript
This will run your backupscript at 0 minutes into the 6th hour of every day, that is, at 6:00 AM server time. To change the time, edit the 0 for minutes or the 6 for hours (use 24-hour military time). This works off of your server time, so if your host is in a different timezone, the backup might not occur when you expect it to. Finally, make sure the path to the file is the absolute path.
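Since cron runs the script without a terminal, it can help to capture whatever the script prints. Here is a sketch that builds a daily crontab line with the output appended to a log file. The helper name cron_line and the /s3sync/cron.log path are assumptions for the example.

```shell
# Sketch: build a daily crontab line that also captures the script's output.
# `cron_line` is a hypothetical helper; the log path is an assumption.
cron_line() {
    minute="$1"; hour="$2"; cmd="$3"; logfile="$4"
    echo "$minute $hour * * * $cmd >> $logfile 2>&1"
}

# Show the server's current time and zone before picking an hour.
date

# The tutorial's 6:00 entry, with output appended to a log file.
cron_line 0 6 /s3sync/backupscript /s3sync/cron.log
# prints: 0 6 * * * /s3sync/backupscript >> /s3sync/cron.log 2>&1
```

The printed line can be pasted into the crontab in place of the plain entry. The 2>&1 part sends error messages to the same log as the normal output.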
Final Words
There are many local factors that can affect this script. If the script is not working, I would walk back through the tutorial and make sure your file tree layout is the way it is supposed to be. Everything we created and edited in this tutorial belongs inside the /s3sync/ directory.
One of the problems I struggled with was that the script ran fine manually, but would not transfer the files to Amazon S3 when automated by the crontab. If this problem occurs, run the “set” command both inside your script and directly from the terminal, then compare the environment variables (more on “set”). Any differences you find might need to be manually adjusted in your script. For me it was the “PATH” variable.
I am not a pro with the terminal, but I can try to help you troubleshoot any problems you might have when you set up your script. Just drop me a comment below and I will definitely try my best to help you out.
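One way to do that comparison is to save a sorted snapshot of the environment from each context and diff the two files. This is a sketch only. It uses env rather than set so that only exported variables are listed, and dump_env and the /tmp file names are hypothetical.

```shell
# Sketch: save a sorted snapshot of the environment so two runs can be compared.
# `dump_env` is a hypothetical helper; the /tmp file names are assumptions.
dump_env() {
    env | sort > "$1"
}

# 1. From your terminal:            dump_env /tmp/env.terminal
# 2. Temporarily inside the script: dump_env /tmp/env.cron
# 3. After cron has run it once:    diff /tmp/env.terminal /tmp/env.cron

# If PATH turns out to be the difference, one common fix is to set it near the
# top of the backup script. The directories below are examples only; use the
# ones from your terminal's PATH.
# PATH=/usr/local/bin:/usr/bin:/bin
# export PATH
```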


Discussion In Progress (13)
Europe users need to add this line:
AWS_CALLING_FORMAT: SUBDOMAIN
to s3config.yml if they get this error:
Permanent redirect received. Try setting AWS_CALLING_FORMAT to SUBDOMAIN
S3 ERROR: #<Net::HTTPMovedPermanently:0xb774a588>
Thanks for adding that note!
I get the following error when running s3sync
/usr/lib/ruby/1.8/openssl/ssl.rb:123:in `post_connection_check': hostname was not match with the server certificate (OpenSSL::SSL::SSLError)
The CA certs are installed and the directory is correct (if I don't have it set correctly I get a different SSL error: not found).
I've been trawling the net for an answer, but as yet nothing. It works fine without SSL, just not with.
Steve
Do you know what version of Ruby you have installed on your server?
yes
ruby -v says
ruby 1.8.7 (2008-06-20 patchlevel 22) [i686-linux]
It might have something to do with your ruby settings, I built this using the default ruby settings on version 1.8.6
Does this process create new backups each time, or overwrite the old? Probably cannot do in shared anyway (cpanel v3)?
Recently a client had his site hacked (a local trojan nicked his FTP details) and the only backup the webhost had was already infected. I need to ensure that there are old copies on file still, as sometimes you only find out something is wrong a long time (2 weeks in my client's case) after the first attack.
Yes, it generates a new backup every time you run the script, which I recommend people do daily using a cron job.
Another simple way to back up your files to Amazon S3 is to use As3FileSync from http://www.as3soft.com/
also Windows users can use S3 Browser – http://s3browser.com – free Amazon S3 Client for Windows.
nice article – thanks!
I'm on Mediatemple DV as well and think I followed your steps correctly, but I had to add another path to the s3config.rb file, otherwise the environment vars weren't found.
confpath = ["/s3sync/", "./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
Also if anyone is interested in backing up all databases. try this
DBNAME=ALLDB
DBPWD=yourpwd
DBUSER=yourusr
DBHOST=localhost
echo `date` ": Generating SQL Backup TAR..." >> /s3sync/s3backup/backup.log
# Generate a tar-file of the SQL database.
touch $DBNAME.backup$NOW.sql.gz
mysqldump -h $DBHOST --all-databases --skip-lock-tables -u $DBUSER -p$DBPWD | gzip -9 > $DBNAME.backup$NOW.sql.gz
Now I'm stuck with a 10 GB file which won't copy to S3, so if anyone knows how I can break this up into smaller tar files and copy them across to dynamically created S3 buckets, please let me know.
Thanks
Hi Steve,
did you find a way to solve this problem? I faced the same problem.
Best regards,
Thomas
You really do make this sound very easy. I'll give it a try. I'm really hoping it will work because I've had problems with the backup in the past. Thanks for the tutorial.
_______________
Mathew Farney – Web Hosting
Hi there, I'm getting this error
on ./s3sync.rb --delete --verbose "/backup/www" bucketname:mahalamobilewww -v
S3 command failed:
list_bucket max-keys 200 prefix mahalamobilewww/www delimiter /
With result 403 Forbidden
S3 ERROR: #
./s3sync.rb:290:in `+': can't convert nil into Array (TypeError)
from ./s3sync.rb:290:in `s3TreeRecurse'
from ./s3sync.rb:346:in `main'
from ./thread_generator.rb:79:in `call'
from ./thread_generator.rb:79:in `initialize'
from ./thread_generator.rb:76:in `new'
from ./thread_generator.rb:76:in `initialize'
from ./s3sync.rb:267:in `new'
from ./s3sync.rb:267:in `main'
from ./s3sync.rb:735
Any ideas? The server is a CentOS server running on GoGrid's cloud infrastructure. Please assist.