Amazon S3 Backup

This is the third tier of a backup system, the last resort if everything else has been destroyed or corrupted. The script can run on a local machine or elsewhere. I chose to run it locally so that the credentials are never stored on a publicly accessible server. The local machine copies the data from the publicly accessible servers, stores it, then sends it to S3.

The first step is to sign up at Amazon for an S3 account, then create a bucket and a user. Limit the privileges for the user as much as possible. For this script, the user needs only the PutObject privilege.
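
As a rough illustration of "as limited as possible", the sketch below builds an IAM policy document that allows only uploads into a single bucket. The helper name put_only_policy is hypothetical, and the bucket name example.com is simply the one used in the configuration example in this article. Treat the output as a starting point to paste into the IAM console, not as a policy verified for your account.

```ruby
require 'json'

# Hypothetical helper: builds a policy that permits uploads to one bucket
# and nothing else. The bucket name passed in is an assumption taken from
# the example configuration.
def put_only_policy(bucket)
  {
    "Version" => "2012-10-17",
    "Statement" => [
      {
        "Effect" => "Allow",
        "Action" => ["s3:PutObject"],
        "Resource" => ["arn:aws:s3:::#{bucket}/*"]
      }
    ]
  }
end

puts JSON.pretty_generate(put_only_policy("example.com"))
```

With a policy like this the backup user can write new objects but cannot list, read, or delete them, which fits a last-resort archive.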

The script is written in Ruby. It reads a JSON configuration file that lists all the servers, files and databases to be backed up.

JSON file syntax:

{
  "email": "user@localhost",
  "servers": {
    "example.com": {
      "login": "username",
      "password": "password",
      "databases": [ { "name": "database_name", "dbuser": "user", "dbpass": "password" } ],
      "files": ["backup.tgz"]
    }
  },
  "s3": {
    "bucket": "example.com",
    "username": "user",
    "accesskeyid": "-- S3 Access Key Id --",
    "secretaccesskey": "-- S3 Secret Access Key --"
  }
}

Each server can include multiple databases and files. Be sure to limit the privileges of each database user to SELECT and LOCK TABLES, which makes the user effectively read-only. Also grant that user remote access to the database from the backup machine.
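
The script skips a database entry without comment when name, dbuser or dbpass is missing, so a typo in the configuration can go unnoticed. The following is a minimal sketch of a pre-flight check, not part of the original script. The helper name config_problems and the sample values are hypothetical.

```ruby
require 'json'

# Hypothetical helper: returns a list of human-readable problems found in
# the parsed configuration. An empty list means the structure looks usable.
def config_problems(parms)
  problems = []
  s3 = parms["s3"] || {}
  %w[bucket accesskeyid secretaccesskey].each do |key|
    problems << "s3.#{key} is missing" if s3[key].to_s.empty?
  end

  (parms["servers"] || {}).each_pair do |server_name, server|
    (server["databases"] || []).each do |db|
      %w[name dbuser dbpass].each do |key|
        problems << "#{server_name}: database entry lacks #{key}" if db[key].to_s.empty?
      end
    end
    files = server["files"]
    unless files.nil? || files.is_a?(Array)
      problems << "#{server_name}: files must be a list"
    end
  end
  problems
end

# Sample configuration with a deliberately missing dbpass.
sample = JSON.parse(<<~CONFIG)
  {
    "email": "user@localhost",
    "servers": {
      "example.com": {
        "databases": [ { "name": "database_name", "dbuser": "user" } ],
        "files": ["backup.tgz"]
      }
    },
    "s3": { "bucket": "example.com", "accesskeyid": "id", "secretaccesskey": "key" }
  }
CONFIG

puts config_problems(sample)
# prints: example.com: database entry lacks dbpass
```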

Place the files in a directory where they can be retrieved with wget - in the example above the URL would be http://example.com/backup.tgz. These files are meant to contain only content that is already publicly available. This is NOT a place to put the application configuration settings.
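
The script builds its wget call by interpolating the server and file names into a shell string. As a hedged alternative, the sketch below builds the same command as an argument list, which can be handed to system() without going through the shell. The helper name wget_command and the destination path are hypothetical.

```ruby
# Hypothetical helper: builds the wget argument list for one file.
# Passing the arguments as an array to system() bypasses the shell, so
# spaces or quotes in a file name cannot change the command.
def wget_command(server_name, file_name, dest)
  url = "http://#{server_name}/#{file_name}"
  ["wget", "-q", url, "-O", dest]
end

cmd = wget_command("example.com", "backup.tgz", "servers/example.com/201401010930/backup.tgz")
puts cmd.join(" ")
# To actually run it: ok = system(*cmd)   # true when wget exits with status 0
```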

Each server will have a hierarchy like this:

example.com
|-- initial.tgz
`-- 201401010930
    |-- backup.tgz
    `-- database_name.sql.tgz
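
To make the layout concrete, here is a small sketch of how a path under this hierarchy can be assembled. It uses the same %Y%m%d%H%M timestamp format as the date call in the script and the same "servers" prefix as its backup_dir variable. The helper name backup_path is hypothetical. The script uses the resulting local path as the S3 key as well.

```ruby
# Hypothetical helper: builds the local path (also used as the S3 key) for
# one backed-up item. "servers" mirrors backup_dir in the backup script.
def backup_path(server_name, file_name, now = Time.now, backup_dir = "servers")
  stamp = now.strftime("%Y%m%d%H%M")
  File.join(backup_dir, server_name, stamp, file_name)
end

puts backup_path("example.com", "backup.tgz", Time.new(2014, 1, 1, 9, 30, 22))
# prints: servers/example.com/201401010930/backup.tgz
```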

Create initial.tgz manually: run the tar command from the top-level directory of the account, download the archive to your local machine, then upload it to S3. If you want to send it to S3 directly from the server, that's fine, just be careful never to leave your S3 credentials on the source server.

This is the backup script. It fetches the files with wget, or with scp when the server entry has a login and password (which means keeping those credentials in the config file), and dumps the databases with mysqldump.


#!/usr/bin/env ruby

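require 'json'
require 'net/smtp'
require 'pathname'
require 'rubygems'
require 'aws-sdk'    # version 1 of the SDK, which provides AWS::S3
require 'net/ssh'
require 'net/scp'    # adds ssh.scp, used for servers with a login and password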

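# Records the outcome of one backup step for the status email.
class ItemStatus
	def initialize(item_name, exit_status, ls_file)
		@item_name, @exit_status, @ls_file = item_name, exit_status, ls_file
	end

	def name
		return @item_name
	end

	# True when the command that produced this item exited non-zero.
	def error
		return @exit_status != 0
	end

	# One line per item: source, exit status, file information.
	def to_s
		return "#{@item_name}\t#{@exit_status}\t#{@ls_file.to_s.chomp}\n"
	end
end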

json = File.read('config/.json')
parms = JSON.load(json)

if parms["email"].nil? || parms["email"].empty?
	to_email = "user@localhost"
else
	to_email = parms["email"]
end

s3 = AWS::S3.new(
  :access_key_id => parms['s3']['accesskeyid'],
  :secret_access_key => parms['s3']['secretaccesskey']
)

backup_dir = "servers"
bucket = s3.buckets[parms['s3']['bucket']]

backup = Array.new
parms["servers"].each_pair {|server_name, server|
	puts "Server: #{server_name}"
	if !server.empty?
		date = `date "+%Y%m%d%H%M"|tr -d "\n"`
		dir = backup_dir + "/" + server_name + "/" + date
		mkdir = `mkdir -p "#{dir}"`
		if $?.exitstatus === 0
			dir_created = true
			if !server["files"].nil? && !server["files"].empty?
				files = server["files"]
				if !server["login"].nil? && !server["password"].nil?
					# Credentials configured: copy each file over scp.
					files.each {|file_name|
						dir_file_name = "#{dir}/#{file_name}"
						Net::SSH.start(server_name, server["login"], :password => server["password"]) do |ssh|
							ssh.scp.download! file_name, dir_file_name
						end
						# Record the status of ls, which fails if the download left no file.
						ls_file = `ls -l "#{dir_file_name}"`
						backup.push(ItemStatus.new(file_name, $?.exitstatus, ls_file))
						bucket.objects[dir_file_name].write(Pathname.new(dir_file_name))
					}
				else
					# No credentials: fetch the publicly readable file with wget.
					files.each {|file_name|
						dir_file_name = "#{dir}/#{file_name}"
						`wget -q http://"#{server_name}"/"#{file_name}" -O "#{dir_file_name}"`
						backup.push(ItemStatus.new(file_name, $?.exitstatus, `ls -l "#{dir_file_name}"`))
						bucket.objects[dir_file_name].write(Pathname.new(dir_file_name))
					}
				end
			end
			if !server["databases"].nil? && !server["databases"].empty?
				databases = server["databases"]
				if (databases.length > 0)
					databases.each {|db|
						dbvalues = db.values_at("name", "dbuser", "dbpass").delete_if {|v| v.nil? || v.empty?}
						if dbvalues.length === 3
							dir_file_name = "#{dir}/#{db["name"]}.sql"
							# Dump over the network (-C compresses the client/server traffic).
							dump = `mysqldump -C "#{db["name"]}" -u"#{db["dbuser"]}" -p"#{db["dbpass"]}" -h"#{server_name}" > "#{dir_file_name}"`
							backup.push(ItemStatus.new(db["name"], $?.exitstatus, `ls -l "#{dir_file_name}"`))
							# Compress the dump and upload the archive to S3.
							tar_file_name = dir_file_name + ".tgz"
							tar = `tar czf "#{tar_file_name}" "#{dir_file_name}"`
							backup.push(ItemStatus.new(tar_file_name, $?.exitstatus, `ls -l "#{tar_file_name}"`))
							bucket.objects[tar_file_name].write(Pathname.new(tar_file_name))
						end
					}
				end
			end
		else
			dir_created = false
		end
	end
	error = backup.select{|item| item.error}
	if error.length == 0
		# Prune local copies older than 8 days, but only below the backup directory.
		`find "#{backup_dir}" -mindepth 1 -mtime +8 | xargs --no-run-if-empty rm -rf`
	end
	msg = <<END_OF_MESSAGE
To: Me <#{to_email}>
Subject: #{server_name} backup status

END_OF_MESSAGE

	if !server.empty?
		if dir_created
			msg = msg + "Created #{dir} okay\n\n"
			if backup.length > 0
				msg = msg + "Files\n"
				backup.each {|v|
					msg = msg + "\t" + v.to_s
				}
				msg = msg + "\nColumns\n\t1. Source\n\t2. Exit Status\n\t3. File Information\n"
			end
		else
			msg = msg + "mkdir #{dir} failed"
		end
	else
		msg = msg + "No backup configuration"
	end
	msg = msg + "\n\n\n"
	begin
		Net::SMTP.start('localhost', 25) do |smtp|
			smtp.send_message msg,'amazon@localhost', to_email
		end
	rescue
		puts "Mail send failed"
	end

}
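
The cleanup step in the script shells out to find and xargs. For comparison, here is a pure-Ruby sketch of a similar idea that only ever touches timestamped directories below an explicit root. It is a different technique from the find call, not a drop-in copy of it: it looks at directory modification times only, and the helper name prune_old_backups and the 8-day default are assumptions for illustration.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: removes timestamped backup directories under
# root/<server>/ whose modification time is more than max_age_days old.
# Returns the list of removed paths.
def prune_old_backups(root, max_age_days = 8, now = Time.now)
  cutoff = now - (max_age_days * 24 * 60 * 60)
  removed = []
  Dir.glob(File.join(root, "*", "*")).each do |path|
    next unless File.directory?(path)
    next unless File.mtime(path) < cutoff
    FileUtils.rm_rf(path)
    removed << path
  end
  removed
end

# Demonstration in a throwaway directory so nothing real is deleted.
Dir.mktmpdir do |root|
  old_dir = File.join(root, "example.com", "201401010930")
  new_dir = File.join(root, "example.com", "201401100930")
  FileUtils.mkdir_p([old_dir, new_dir])
  ten_days_ago = Time.now - (10 * 24 * 60 * 60)
  File.utime(ten_days_ago, ten_days_ago, old_dir)   # age the first directory

  removed = prune_old_backups(root)
  kept = Dir.glob(File.join(root, "example.com", "*"))
  puts "removed: #{removed.map { |p| File.basename(p) }.join(', ')}"
  puts "kept:    #{kept.map { |p| File.basename(p) }.join(', ')}"
end
```

Passing the root explicitly means the function cannot wander into the script's own directory or the config directory, whatever the current working directory happens to be.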

Finally, create a cron job to run the script as needed.

It is assumed that version control for the code is handled elsewhere. This backup is for data, with an emergency copy of the code. If the code is updated, that emergency copy must be updated manually.

A note about leaving the passwords in the config file: I understand it is a security issue. That's why this is running on a local machine. Is it completely secure? No. But it isn't on a publicly accessible server either. Could I spend more time making it secure? Absolutely. Am I going to? Probably not.