HOW TO: Backup Your Website to Amazon S3 (Automatically)
Some of you might not know this about me, but I am a successful web developer outside of this blog. I currently host and run 9 sites on my Media Temple (dv) server. One of my biggest concerns was how to keep safe, secure, up-to-date backups of my website files outside of my home. After turning to my good friend Google, I came across a couple of great articles on this topic from two local Atlanta bloggers. My friend Paul Stamatiou and Christina Warren have both written in-depth articles on how they used Amazon S3 to securely back up their websites daily, which is where I learned to do the following.
While Paul and Christina’s guides are great, I wanted to further explore S3sync and share my experience using it to back up my Media Temple (dv) server to Amazon S3. The steps I am going to walk you through are based on S3sync, an open source Ruby application that will allow you to transfer files to Amazon S3 using secure SSL encryption. Let me start by warning you that you need to know a bit about UNIX commands. You will need an application like Terminal for OS X or Linux, or PuTTY for Windows, to SSH into your web server. If you don’t know what SSH is, then this tutorial might not be for you. I will be using Terminal, which is built into Apple OS X.
**It is very important that you use absolute paths throughout this tutorial. If you are not sure what your absolute path is, enter “pwd” in the terminal window after you have logged into your server. For this tutorial, I am going to work directly from the root level.**
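If you are unsure whether a path you typed is absolute, here is a minimal sketch of a check. The helper name is_absolute is my own invention for illustration, and the example paths assume the root-level layout used in this tutorial.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the given path starts with "/".
is_absolute() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

# Print the absolute path of the directory you are currently in.
pwd

# An absolute path starts at the root; a relative one does not.
is_absolute "/s3sync/s3backup" && echo "/s3sync/s3backup is absolute"
is_absolute "s3sync/s3backup" || echo "s3sync/s3backup is relative"
```

A relative path depends on the directory the command happens to run from, which is why the automated step later in this tutorial needs absolute paths.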
Step 1: Install Ruby
I could walk you through a step by step guide on how to install Ruby, but truth be told, I would recommend you check with your web host for specific instructions. For Media Temple, I would use these guides: (dv) 2.0 server or (dv) 3.0 server. The Media Temple Grid server comes with Ruby pre-installed so you can skip this step.
Step 2: Install S3sync
To get started, we need to download the S3sync tar-file, which is hosted on Amazon S3. Once we have done that, we are going to decompress it and download the SSL certificates for secure transfers.
Start by getting the S3sync tar:
wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
Decompress the tar file:
tar xvzf s3sync.tar.gz
Remove the S3sync tar file, get the SSL certificates, and decompress them:
rm s3sync.tar.gz
cd s3sync
mkdir certs
cd certs
wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar
sh ssl.certs.shar
Step 3: Setup S3sync
Next, we are going to make the directory that will store the backups before they are transferred to your S3 account. This folder will be inside the S3sync folder.
cd ..
mkdir s3backup
Edit the s3config.rb file. This step is only needed with newer versions of S3sync.
vi s3config.rb
You need to replace the confpath with this:
confpath = ["./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
Now, enter your Amazon S3 account information into the s3config.yml file which we will create in the S3sync directory:
vi s3config.yml
Now that you are in the VI editor, hit the “i” key to enter insert mode then enter the following:
aws_access_key_id: ***********************
aws_secret_access_key: ***************************
ssl_cert_dir: /s3sync/certs
If you are having problems with this step, look at the s3config.yml.sample file which comes with S3sync. After you have entered your Amazon S3 keys and the absolute path to your certs directory, you need to hit the “escape” key then “:wq” to save and quit the VI editor. If you are not sure how to use the VI editor, check out this resource.
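A malformed s3config.yml tends to produce confusing errors further down the line, so it can help to sanity-check the file before moving on. Below is a rough sketch that only checks for the three keys used in this tutorial, each written as "key: value" with a space after the colon. The function name check_s3config and the placeholder values are my own assumptions, and this is not a full YAML validator.

```shell
#!/bin/bash
# Hypothetical helper: verify that the three keys from this tutorial are
# present and written as "key: value" (space after the colon).
check_s3config() {
  local file="$1" key status=0
  for key in aws_access_key_id aws_secret_access_key ssl_cert_dir; do
    if ! grep -Eq "^${key}:[[:space:]]+[^[:space:]]" "$file"; then
      echo "missing or malformed: ${key}"
      status=1
    fi
  done
  return "$status"
}

# Demo against a throwaway file with placeholder values (not real keys).
sample=$(mktemp)
printf '%s\n' \
  'aws_access_key_id: YOUR_ACCESS_KEY' \
  'aws_secret_access_key: YOUR_SECRET_KEY' \
  'ssl_cert_dir: /s3sync/certs' > "$sample"
check_s3config "$sample" && echo "s3config.yml looks well-formed"
rm -f "$sample"
```

To check your real file, you would run check_s3config /s3sync/s3config.yml, assuming the root-level layout from this tutorial.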
Step 4: Write the Shell Backup Script
Now, you need to change directories so you are in the S3sync directory, which is where we will create the backup shell script. I have said it before, but this is very important for the automation part at the end of this tutorial: use absolute paths in this script. For the script example below, I am assuming everything is at the root level.
vi backupscript
Hit the “i” key to enter insert mode.
#!/bin/bash
echo `date` ": Deleting Previous TAR Files..." > /s3sync/s3backup/backup.log
# Empty the backup folder of previous backups.
cd /s3sync/s3backup
rm -f *
cd /s3sync
echo `date` ": Beginning Backup Process..." >> /s3sync/s3backup/backup.log
# Get the date.
NOW=$(date +_%b_%d_%y_%H-%M)
echo `date` ": Generating File Backup TAR..." >> /s3sync/s3backup/backup.log
# Generate a tar-file of the server contents.
tar czvf websites_backup$NOW.tar.gz **********
mv websites_backup$NOW.tar.gz /s3sync/s3backup
cd /s3sync/s3backup
# Database Backup
DBNAME=**********
DBPWD=**********
DBUSER=**********
echo `date` ": Generating SQL Backup TAR..." >> /s3sync/s3backup/backup.log
# Generate a gzipped dump of the SQL database.
touch $DBNAME.backup$NOW.sql.gz
mysqldump -u $DBUSER -p$DBPWD $DBNAME | gzip -9 > $DBNAME.backup$NOW.sql.gz
echo `date` ": Compressing 2 TAR Files Into 1..." >> /s3sync/s3backup/backup.log
# Combine both backup files into 1 tar-file.
tar czvf server_backup$NOW.tar.gz $DBNAME.backup$NOW.sql.gz websites_backup$NOW.tar.gz
echo `date` ": Delete Individual TAR Files..." >> /s3sync/s3backup/backup.log
# Remove individual tar-files.
rm -f $DBNAME.backup$NOW.sql.gz websites_backup$NOW.tar.gz
echo `date` ": Running S3sync Ruby Script..." >> /s3sync/s3backup/backup.log
# Transfer tar-file to Amazon S3
BUCKET=**********
cd ~/
ruby /s3sync/s3sync.rb -r -v --ssl /s3sync/s3backup/ $BUCKET:
echo `date` ": Backup Complete..." >> /s3sync/s3backup/backup.log
This script is not very complex, but I will walk through it with you a little bit. All of the echo statements in the script output the status of the script for debugging purposes. These statements are not required, but they might help you troubleshoot any problems that might arise. Make sure you replace the directory you want to back up, the SQL database information, and the Amazon S3 Bucket information where you see: **********
The script starts by deleting the contents of the backup folder, which will contain the backups that were generated the last time the script was run. Next, we generate the date for labeling the tar-files. After that, it is time to compress the actual web server files into a tar-file. Make sure you replace the “**********” with the absolute path of the directory you would like to back up. Next, we generate a gzipped dump of the contents of a SQL database. Again, make sure you replace the “**********” with your specific information. Finally, we combine the 2 files into one tar-file and transfer that tar-file to your Amazon S3 account. Make sure you replace the last occurrence of “**********” with the name of the Amazon S3 bucket you wish to save the backups in.
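If you want to try the date-stamping and archiving step on its own before touching your real sites, here is a small sketch that does it inside a throwaway directory. The function name make_backup and the demo file names are hypothetical. It uses the same date format as the script, but quotes its variables and stops if tar fails, which the script above does not do.

```shell
#!/bin/bash
# Hypothetical helper: archive one directory into a date-stamped tar-file
# and print the path of the archive that was created.
make_backup() {
  local src="$1" dest="$2" now archive
  now=$(date +_%b_%d_%y_%H-%M)                 # same format as NOW in the script
  archive="$dest/websites_backup$now.tar.gz"
  # -C changes into the parent directory so the archive holds relative paths.
  tar czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}

# Demo in a temporary directory so nothing on the server is touched.
work=$(mktemp -d)
mkdir -p "$work/site" "$work/s3backup"
echo "hello" > "$work/site/index.html"
archive=$(make_backup "$work/site" "$work/s3backup") && echo "created $archive"
tar tzf "$archive"   # list the archive contents to confirm it is readable
rm -rf "$work"
```

Listing the archive with tar tzf is a cheap way to confirm a backup is readable before you rely on it.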
Step 5: Test the Script
./backupscript
When you start the script, you should see a fast scrolling list of files as they are backed up. When it stops scrolling, the 2 backup files are being combined and transferred to your Amazon S3 account. This can take a few minutes, so be patient. If successful, the prompt will reappear in the terminal window. Now, you should see the tar-file in your Amazon S3 bucket.
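If ./backupscript answers with "Permission denied" (one reader reported this in the comments and fixed it with CHMOD 755), the file is probably missing its execute bit. Here is a minimal sketch of that fix. The helper name ensure_executable is my own, and the demo uses a temporary stand-in file rather than your real script.

```shell
#!/bin/bash
# Hypothetical helper: add execute permission only if it is missing.
ensure_executable() {
  if [ ! -x "$1" ]; then
    chmod 755 "$1"
  fi
}

# Demo on a temporary file standing in for /s3sync/backupscript.
script=$(mktemp)
printf '#!/bin/bash\necho "backup script ran"\n' > "$script"
chmod 644 "$script"                 # simulate a freshly saved, non-executable file
[ -x "$script" ] || echo "not executable yet"
ensure_executable "$script"
[ -x "$script" ] && echo "now executable"
rm -f "$script"
```

On the server, the equivalent one-off command would be chmod 755 /s3sync/backupscript, assuming the layout from this tutorial.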
Step 6: Automate the Script
Now, I am going to walk you through how to edit your crontab file to run the script daily at a time of your choosing. If you don’t know how a crontab works, check out this great crontab resource. The basic format of a crontab entry is:
* * * * * command to be executed
- - - - -
| | | | |
| | | | +----- day of week (0 - 6) (Sunday=0)
| | | +------- month (1 - 12)
| | +--------- day of month (1 - 31)
| +----------- hour (0 - 23)
+------------- min (0 - 59)
Start by entering your crontab file:
crontab -e
There may already be some crontab entries in this file, so make sure you do not edit any of the current entries. Scroll down to the last entry and insert a new line. On the new line, enter the new crontab entry:
0 6 * * * /s3sync/backupscript
This will run your backupscript at 0 minutes into the 6th hour of every day (6:00 AM). To change the time, edit the 0 for minutes or the 6 for hours (use 24-hour time). This works off of your server time, so if your host is in a different time zone, the backup might not occur when you expect it to. Finally, make sure the path to the file is the absolute path.
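To make the minute and hour fields a little more concrete, here is a small sketch that builds a daily crontab entry and refuses values outside the ranges listed above. The function name cron_line is hypothetical and is only meant for experimenting. It does not edit your crontab.

```shell
#!/bin/bash
# Hypothetical helper: print a daily crontab entry for the given
# minute (0-59), hour (0-23, 24-hour time) and absolute command path.
cron_line() {
  local minute="$1" hour="$2" cmd="$3" n
  for n in "$minute" "$hour"; do
    case "$n" in
      ''|*[!0-9]*) echo "minute and hour must be whole numbers" >&2; return 1 ;;
    esac
  done
  if [ "$minute" -gt 59 ] || [ "$hour" -gt 23 ]; then
    echo "minute must be 0-59 and hour must be 0-23" >&2
    return 1
  fi
  case "$cmd" in
    /*) ;;
    *) echo "command must be an absolute path" >&2; return 1 ;;
  esac
  printf '%s %s * * * %s\n' "$minute" "$hour" "$cmd"
}

cron_line 0 6 /s3sync/backupscript     # prints: 0 6 * * * /s3sync/backupscript
cron_line 30 23 /s3sync/backupscript   # prints: 30 23 * * * /s3sync/backupscript

# Show the server's current time and time zone, which is what cron goes by.
date
```

Running date on the server before picking a time is a quick way to see how far the server clock is from your local time.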
Final Words
There are many local factors that can affect this script. If the script is not working, I would walk back through the tutorial and make sure your file tree layout is the way it is supposed to be. Everything we created and edited in this tutorial belongs inside the /s3sync/ directory.
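One quick way to check the file tree is a short script that looks for each piece this tutorial created. This is only a sketch: the function name check_layout is my own, and the list of files assumes you followed the steps exactly and kept everything under /s3sync.

```shell
#!/bin/bash
# Hypothetical helper: report which of the tutorial's files and
# directories are missing under the given base directory.
check_layout() {
  local base="$1" item status=0
  for item in s3sync.rb s3config.rb s3config.yml backupscript; do
    [ -f "$base/$item" ] || { echo "missing file: $base/$item"; status=1; }
  done
  for item in certs s3backup; do
    [ -d "$base/$item" ] || { echo "missing directory: $base/$item"; status=1; }
  done
  return "$status"
}

if check_layout /s3sync; then
  echo "layout looks complete"
else
  echo "fix the items listed above, then try again"
fi
```

If you installed S3sync somewhere other than the root level, pass that directory to check_layout instead.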
One of the problems I struggled with was that the script ran fine manually, but would not transfer the files to Amazon S3 when automated by the crontab. If this problem occurs, run the “set” command inside your script and directly from the terminal, then compare the environment variables (more on “set”). Any differences you find might need to be manually adjusted in your script. For me it was the “PATH” variable.
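Here is a rough sketch of that comparison. I use env instead of set because its output is shorter and easier to diff. The function name dump_env, the file names, and the PATH values are all assumptions for illustration, and the second dump only simulates the sparse environment a cron job typically gets.

```shell
#!/bin/bash
# Hypothetical helper: write the current environment, sorted, to a file.
dump_env() {
  env | sort > "$1"
}

tmp=$(mktemp -d)

# 1. Environment as seen from your interactive terminal session.
dump_env "$tmp/env_shell.txt"

# 2. Stand-in for cron: an almost empty environment with an assumed PATH.
env -i PATH=/usr/bin:/bin env | sort > "$tmp/env_cron.txt"

# 3. Show the differences (diff exits non-zero when the files differ).
diff "$tmp/env_shell.txt" "$tmp/env_cron.txt" || true

rm -rf "$tmp"

# One possible fix is to set PATH explicitly near the top of backupscript.
# The value below is an assumption; use whatever your terminal's PATH shows:
# export PATH=/usr/local/bin:/usr/bin:/bin
```

To capture the real cron environment, you could temporarily add a line such as env | sort > /tmp/env_cron.txt to the top of backupscript, let cron run it once, and diff that file against the one from your terminal.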
I am not a pro with the terminal, but I can try to help you troubleshoot any problems you might have when you set up your script. Just drop me a comment below and I will definitely try my best to help you out.

Discussion In Progress (32)
Europe users need to add this line:
AWS_CALLING_FORMAT: SUBDOMAIN
to s3config.yml if they get this error:
Permanent redirect received. Try setting AWS_CALLING_FORMAT to SUBDOMAIN
S3 ERROR: #<Net::HTTPMovedPermanently:0xb774a588>
Thanks for adding that note!
I get the following error when running s3sync
/usr/lib/ruby/1.8/openssl/ssl.rb:123:in `post_connection_check': hostname was not match with the server certificate (OpenSSL::SSL::SSLError)
the ca certs are installed and the directory is correct (if i dont have it set correctly i get a different ssl error – not found).
Ive been trawling the net for an answer, but as yet nothing. It works fine without ssl, just not with.
Steve
Did you ever get to the bottom of this? I have the same issue with the same version of ruby.
Same problem here using Ruby 1.8.7, Ubuntu 9.04, certs installed, .yml ok.
Any pointers..?
If you’re running Debian / Ubuntu and so forth, chances are you already have the certs installed in /etc/ssl/certs (and if not, apt-get install ca-certificates should do the trick). You can safely skip the last four lines of Step 2.
If this is the case, in step 3, you can do:
SSL_CERT_DIR=/etc/ssl/certs
instead of:
ssl_cert_dir: /s3sync/certs
Do you know what version of Ruby you have installed on your server?
yes
ruby -v says
ruby 1.8.7 (2008-06-20 patchlevel 22) [i686-linux]
It might have something to do with your ruby settings, I built this using the default ruby settings on version 1.8.6
Does this process create new backups each time, or overwrite the old? Probably cannot do in shared anyway (cpanel v3)?
Recently a client had his site hacked (local trojan nicked his FTP details) and the only backup the webhost had was already infected. Need to ensure that there are old copies on file still, as sometimes you only find out something is wrong a long time (2 weeks in my clients case) after the first attack.
Yes, it generates a new backup every time you run the script. Which I recommend people do daily using a Cron Job.
Another simple way to backup your files at Amazon As3 is to use As3FileSync from http://www.as3soft.com/
also Windows users can use S3 Browser – http://s3browser.com – free Amazon S3 Client for Windows.
nice article – thanks!
I'm on Mediatemple DV as well and think I followed your steps correctly, but I had to add another path to the s3config.rb file, otherwise the environment vars weren't found.
confpath = ["/s3sync/", "./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
Also if anyone is interested in backing up all databases. try this
DBNAME=ALLDB
DBPWD=yourpwd
DBUSER=yourusr
DBHOST=localhost
echo `date` ": Generating SQL Backup TAR..." >> /s3sync/s3backup/backup.log
# Generate a tar-file of the SQL database.
touch $DBNAME.backup$NOW.sql.gz
mysqldump -h $DBHOST --all-databases --skip-lock-tables -u $DBUSER -p$DBPWD | gzip -9 > $DBNAME.backup$NOW.sql.gz
Now I'm stuck with a 10gig file which won't copy to S3, so if anyone knows how I can break this up into smaller tar files and copy them across to dynamically created s3 buckets, please let me know.
Thanks
Hey thanks for your comment about the path.. I looked all over google for this, and finally found it right here in the comments!
thanks thanks! this worked for me on mediatemple as well.
Hi Steve,
did you find a way to solve this problem? I faced the same problem.
Best regards,
Thomas
You really do make this sound very easy. I'll give it a try. I'm really hoping it will work because I've had problems with the backup in the past. Thanks for the tutorial.
_______________
Mathew Farney – Web Hosting
Hi there, im getting this error
on ./s3sync.rb --delete --verbose "/backup/www" bucketname:mahalamobilewww -v
S3 command failed:
list_bucket max-keys 200 prefix mahalamobilewww/www delimiter /
With result 403 Forbidden
S3 ERROR: #
./s3sync.rb:290:in `+': can't convert nil into Array (TypeError)
from ./s3sync.rb:290:in `s3TreeRecurse'
from ./s3sync.rb:346:in `main'
from ./thread_generator.rb:79:in `call'
from ./thread_generator.rb:79:in `initialize'
from ./thread_generator.rb:76:in `new'
from ./thread_generator.rb:76:in `initialize'
from ./s3sync.rb:267:in `new'
from ./s3sync.rb:267:in `main'
from ./s3sync.rb:735
any ideas, the server is a centos server running on gogrids cloud infrastructure! pls assist
Neil – I was getting the same error. I got it to work by adding $ to BUCKET in the following line…
ruby /s3sync/s3sync.rb -r -v --ssl /s3sync/s3backup/ $BUCKET:
Bah! I’m getting the same 403 error and I just cant fix it! Im sure its the last step as I previously had heaps of problems connecting but have ironed those out.
Thanks for the share. It gave me the design plans I needed to get a similar for Rackspace cloud.
I’m getting a “Permission denied” when I try to run my backupscript on an MT DV3.5 server.
[root@server s3sync]# ./backupscript
-bash: ./backupscript: Permission denied
Does anyone know why that might be?
Nevermind, I guess I needed to CHMOD it 755.
I’m getting this error when running the s3sync script:
./s3config.rb:20: undefined method `each_pair' for # (NoMethodError)
from ./s3config.rb:17:in `each'
from ./s3config.rb:17
from /s3sync/s3sync.rb:28:in `require'
from /s3sync/s3sync.rb:28
Anyone have similar issues, or any clues? The server is running Ruby 1.8.5.
I have the exact same error
./s3cmd.rb listbuckets
./s3config.rb:20: undefined method `each_pair' for # (NoMethodError)
from ./s3config.rb:17:in `each'
from ./s3config.rb:17
from ./s3cmd.rb:23:in `require'
from ./s3cmd.rb:23
Solution??? I am also using ruby 1.8.5 it says i need 1.8.4 or greater, are there issues with 1.8.5???
i just spent ages writing the solution for your wordpress install to tell me my comment was a bit spammy and lost it all.. i cant be bothered now.. check the syntax of your YAML files which is the error. also these scripts SHOULD tell you that there is a problem with the YAML syntax instead of throwing all sorts of unrelated errors causing the user to think the software is buggy
Another option to set your S3 Bucket as Static Website, Upload/Download your data in bulk using Bucket Explorer. Supporting for all region end points.
Hey
Im getting this error right at the end where Im attempting to transfer up to s3. All of the MT stuff is OK, but the transfer is throwing this error. Ruby ver 1.8.5 could be the problem?
/s3sync/s3sync.rb:28:in `require': /s3sync/s3config.rb:27: syntax error (SyntaxError)
from /s3sync/s3sync.rb:28
Thanks!
Scrub that – the problem now is this: undefined local variable or method `u' for main:Object (NameError)
(i think i fixed an earlier syntax error in the YML file).
Hi, When you say ‘For me it was the “PATH” varible.’ – what do you mean?
Im now at the problem described in your final words…
Im all zipped up and ready to go, and I get this error:
websites_backup_Feb_20_12_01-50.tar.gz (you can see the zip is ready to go)…
You didn’t set up your environment variables; see README.txt
s3sync.rb [options] version 1.2.6
I have done the YAML file exactly as you say, but the readme is very confusing. Please help!
Rick
Hi,
Brilliant article thank you for sharing! For those that are having problems with 403 forbidden message, you need to add a $ in front of BUCKET on the line that starts with ruby. Also check if your server time is synchronised properly. My server time was out by 30 minutes and I came across some docs on AmazonS3 that says that the server clock has to be within 15 minutes of the correct time. Resynced the server time and the script started working.
The only problem I’m getting now is “BROKEN PIPE” every now and again. I’ve not figured this one out yet, but if someone has a solution can you please share?
Thank you,
Chris.