On 11/04/2011 09:46 AM, Richard W.M. Jones wrote:
On Fri, Nov 04, 2011 at 08:51:08AM -0400, Mo Morsi wrote:
On 11/04/2011 08:36 AM, Richard W.M. Jones wrote:
How large is the snap metadata, i.e. the stuff that you copy between the machines? How large would it be given, say, a typical database-backed webserver installation where you might have lots of static content and some database tables?
One of the nice things that I added to Snap was the ability to ignore static content managed by the package management system. For example when taking a snapshot of the filesystem, only the files modified post installation and the files not tracked by the package system will be backed up and restored.
It should be simple enough to expand upon this concept, adding additional hooks to call out to in order to determine what exactly should be backed up and restored (hooks to be invoked during the backup / restoration process are already on the project todo list / backlog).
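
As a rough illustration of the filtering described above (a sketch, not Snap's actual code): assume the package manager can supply a mapping of installed file paths to the checksums recorded at install time. The names `files_to_back_up` and `packaged` are hypothetical. A file is kept for backup if it is not in that mapping, or if its current checksum differs from the recorded one.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_back_up(root, packaged):
    """Return the sorted paths under `root` that need backing up.

    `packaged` is an assumed mapping of path -> digest recorded by the
    package system at install time. A file is selected when it is not
    tracked at all, or when it has been modified since installation.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            expected = packaged.get(path)
            if expected is None or sha256_of(path) != expected:
                selected.append(path)
    return sorted(selected)
```

On an RPM-based system the `packaged` mapping could in principle be built from the package database, but how Snap gathers it is not shown here.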
Is the metadata in an ad-hoc format and how hard would it be to turn it into a standard format (probably one that we would standardize ourselves)? Can it be useful in other contexts -- eg. could a system administrator look at the output in order to get a definitive list of the changes made to the machine? Could it be useful for auditing? Could the format be diffed?
Right now the snapshot is a simple tarball containing the actual contents of the snapshot and the metadata in XML files. So for example there is a packages.xml file which contains the packages which have been recorded, services.xml containing the services and associated metadata, etc. We can use this as the basis of the standard, easily encapsulating any required information there.
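
To make the "could the format be diffed?" question concrete, here is a minimal sketch that reads `packages.xml` out of two snapshot tarballs and reports what changed. The XML layout used (`<packages>` containing `<package name=... version=...>` elements) is an assumption for illustration, not the actual Snap schema, and the helper names are hypothetical.

```python
import io
import tarfile
import xml.etree.ElementTree as ET


def make_snapshot(packages):
    """Build an in-memory .tar.gz holding a packages.xml (assumed layout)."""
    root = ET.Element("packages")
    for name, version in sorted(packages.items()):
        ET.SubElement(root, "package", {"name": name, "version": version})
    payload = ET.tostring(root)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo("packages.xml")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_packages(snapshot_bytes):
    """Return {name: version} parsed from packages.xml inside the tarball."""
    with tarfile.open(fileobj=io.BytesIO(snapshot_bytes), mode="r:gz") as archive:
        tree = ET.parse(archive.extractfile("packages.xml"))
    return {p.get("name"): p.get("version") for p in tree.getroot().iter("package")}


def diff_packages(old, new):
    """Return (added, removed, changed) between two {name: version} dicts."""
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}
    return added, removed, changed


# Package names and versions below are made up for the example.
before = read_packages(make_snapshot({"httpd": "1.0", "mediawiki": "1.0"}))
after = read_packages(make_snapshot({"httpd": "1.0", "mediawiki": "1.1", "postgresql": "1.0"}))
added, removed, changed = diff_packages(before, after)
```

The same approach would extend to services.xml and the other metadata files once their structure is pinned down in a formal description.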
Can you give us some numbers -- how big was the tarball for the migration you did?
2.7MB
This included the mediawiki db dump which was close to 1MB, and inspecting the snapshot I realize that it can be optimized further to reduce a bit of unnecessary cruft.
If we had a more formal description, then it could be the basis for a useful collection of tools.
  snap                              puppet manifest
                 snap formal
  -------->      spec for      -------->  sysadmin
  libguestfs-    VM                       auditing
  based tool      |    ^
                  |    |
                  +------+
                  p2v, v2v
I think your demonstration only worked with a bit of luck. For v2v we rewrite a lot of configuration files, install virtio drivers etc. In terms of a formalized snap description, that process is a kind of transformation.
Sure thing, the demo was a proof of concept, though quite a bit of refactoring went into making the tool more modular and pluggable so that it can be easily extended and adapted to meet whatever snapshot and restoration needs arise.
Agreed that formal metadata and API definitions would be very useful to have; I've added that as a high-priority item on the TODO list.
How (if at all) does this apply to Windows?
So yes, I have been throwing around different ideas for how to accomplish different aspects of the backup/restoration process on Windows:
- The repositories bit obviously does not apply, so that snapshot target can be ignored on those systems.
- Packages would only apply in a very limited manner, e.g. we can record what software is installed and what versions, but even then it would require more end user interaction / intervention in the restoration process.
- Backing up files should be straightforward, albeit without the support of a package system to determine which files are redundant. Snap already supports user selection of which files to back up / restore, and we can combine this with fixed lists of files to ignore for different Windows versions.
- Services should be simple enough and will work in the same manner as on Linux, e.g. on Windows we know how to back up a Postgres DB, a running IIS webserver, or whatever other services.
So all in all, snap will be able to work in a consistent manner across Linux distros, Windows, and even Mac OS X!
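
The file-selection idea for Windows could look roughly like the sketch below: the user picks directories to include, and a fixed per-version ignore list filters out files that are not worth saving. The `IGNORE_PATTERNS` table, its keys, and its patterns are all hypothetical examples, not lists that Snap ships.

```python
import fnmatch

# Hypothetical ignore lists keyed by Windows version (examples only).
IGNORE_PATTERNS = {
    "windows7": ["c:/windows/*", "*/appdata/local/temp/*", "*.tmp"],
    "windowsxp": ["c:/windows/*", "*/local settings/temp/*", "*.tmp"],
}


def normalize(path):
    """Use forward slashes and lower case so matching is case-insensitive."""
    return path.replace("\\", "/").lower()


def select_files(candidates, user_dirs, windows_version):
    """Keep files under a user-selected directory that match no ignore pattern."""
    patterns = IGNORE_PATTERNS.get(windows_version, [])
    prefixes = [normalize(d).rstrip("/") + "/" for d in user_dirs]
    chosen = []
    for path in candidates:
        normalized = normalize(path)
        wanted = any(normalized.startswith(prefix) for prefix in prefixes)
        ignored = any(fnmatch.fnmatchcase(normalized, p) for p in patterns)
        if wanted and not ignored:
            chosen.append(path)
    return chosen
```

`fnmatch.fnmatchcase` is used instead of `fnmatch.fnmatch` so the matching behaves the same regardless of which platform the tool itself runs on.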
-Mo