|
BigPictureOverview
A summary of what this project can be used for and its goals.
IntroductionAs said on the Introduction page: "Ruckusing is a framework written in PHP5 for generating and managing a set of "database migrations". " But what does this really mean? DetailsThis project was born as a result of wanting to deal with some gripes I have had with the development process (for web applications). One way to do web app development is to use a shared web server and a shared DB. That is, 1 or more developers use the same DB and a shared webserver. This approach has some serious limitations. Namely, it does not allow for data and app segregation. If two developers, John and Luke were using a shared DB then imagine what would happen if during development John deletes all of the data in a given table that Luke was also working on? Yeah, Luke is now hosed and has to stop what he was doing. A shared DB means that there has to be a large amount of coordination between all developers on a project. This approach definitely does not scale. So it raises the question: "how can we move development to the local workstations of developers, but keep everybody in sync (to a degree)?" Two solutions: # Source control, e.g. Subversion, CVS # A mechanism to keep local databases in sync. Item 1 is a wheel that was invented awhile ago. Item 2 is trickier. Which is where "Ruckusing Migrations" comes into play. The goal is to find a way that multiple developers can have the same DB schema for their own development. Schema is the most important item we need to track. Data is important too, and certainly "Ruckusing Migrations" has support for populating a DB with data, but at the end of the day, its the schema was which needs to be consistent. Except for some core data (e.g. list of states in the US or user types, etc), most data will be different on every developer's machines, as they create their own users, add products here, add profile data there, etc. So Ruckusing Migrations is a way to keep the schema up-to-date. Lets take an example approach where we might be able to achieve this goal, without using a tool like Ruckusing Migrations. Whatever the solution is, likely it contains a lot of communication overhead, between all development parties involved. Scenario: We have two developers, Mary and Luke. They have devised a situation where during the source control check-in process, the developer dumps their DB schema (e.g. using the mysqldump utility), which generates a SQL file and then checks this into source control. Every time a developer does a source control update they are to find the highest time-stamped file in the DB dump directory and run it, unless there are no changes so there is nothing to do. Luke is working on a cool new feature and his new feature requires adding a photo_count column to the the user_profiles table. He is about to check this into source control so he quickly does a mysqldump (of just the user_profiles table, for instance) and checks this file in. A little while later Mary comes back from lunch, dutifully does a source control update, pulls down Luke's latest feature set and also notices that there is a new SQL file. Mary then fires up her DB utility and attempts to run the new file. She will immediately get an error: The `user_profiles` table already exists. (in whatever exact syntax the RDBMS spits out these kinds of errors). Which is expected because the SQL file just had Luke's "CREATE TABLE" definition. At this point, we have a number of problems. Lets go over them. How Migrations Solve This MessProblem #1: The mechanism that Mary and Luke uses requires the developer to manually dump whatever DB changes they have done to an SQL file, so they can include it in source control. How migrations solves this: Rather than record the DB changes at the end of the development cycle, move it to the beginning. With Ruckusing Migrations you create the migration file at the beginning of the development process. E.g. "ok I am working on this new feature, I need a photo_count attribute in the DB, so lets create a new migration, add the column so I can then write the supporting code." If I didn't create the migration in the first place, I certainly cannot write the code to do anything with it. This solves the problem of developers forgetting to check in the SQL file. Problem #2: The developer has look at the results of source control update process and see if any DB dump files are included. Some questions they might ask, "what version am I at now?", "Wait, did I already run that one? Those 2 beers at lunch made me kind of forgetful, did I run that SQL file before lunch? I guess I gotta go look at the dump file and then compare it to what's in the DB and see if those changes were executed." How migrations solves this: The DB knows what version its at. Using migrations a developer makes it part of her source control update process to also run the migration update command. The script will take care of comparing what version the DB is at compared to any migration files that might have been pulled down from the source control update. Also, using Ruckusing Migrations you can easily ask the DB what version it is using the command: php main.php db:version (just in case you wanted to know). Problem #3: The SQL files that Mary or Luke generate are full SQL files. That is, they contain the SQL needed to re-generate the schema from scratch. However, Mary and Luke are not developing from scratch after every source control update. They are developing features incrementally. So really we want a system that tracks things incrementally (or as in a diff approach versus the whole hog approach). Sure, when Luke added his photo_count column in the above example, he could have just bypassed the mysqldump utility and just written an ALTER TABLE ADD COLUMN photo_count ... statement into an SQL file and then check that in. And when Mary pulled it out of source control and executed it, she would have been OK. But this leads to other problems. Problem #4: Ideally, we want portability in mind. In a perfect world we should be able to pretty easily move to a different RDBMS. Its the whole point of abstracting your DB communications to a specific adapter. Its the whole purpose of using adapters like Perl's DBI or Java's JDBC or PHPs PDO/MDB2/ADO and the list goes on and on. The moment that Luke coded up that ALTER TABLE ... command in the above example he immediately threw portability out the window. (Sure, a lot of SQL is a standard and should follow the ANSI SQL-92 standard, but this is more fantasy than reality and if you start coding up raw SQL you're going to shoot yourself in the foot sooner than later.) With Ruckusing Migrations, you specify your schema in an abstract language, which is then converted to your DBs exact SQL at execution time. Thus, you could write a series of migrations and run them against MySQL and Oracle and SQL Server and Postgres without changing a single line of code. Problem #5: Barring the above problems, its now time for Mary and Luke to push their new features out to production. So what are they to do? Checkout the SQL dump files from source control and run them against the DB server node by hand? How migrations solves this: By default, Ruckusing Migrations is configured to understand 3 database environments, development, test and production. By specifying a different environment target, the toolkit will execute migrations against that target automatically. Combine this with whatever your existing deployment procedure is (you have an automated deployment procedure, right?) whether it be Apache Ant or Capistrano and deploying your app to production is a trivial process, where both the app code and database are all updated in one fell swoop. Migrations As Their Own "mini" Version ControlBecause every migration represents the change of the DB as it moves through the development process, and its kept in sync with version control, its possible to put your DB to a state back in time. For example, lets say you need to go back and fix a bug in a certain release that existed 2 months ago. You could pull out that specific release from source control and then using the migrations toolkit, revert your local DB to whatever version it was at the time of that release (go DOWN to that version). Now you can be assured that whatever code you are fixing/testing is being done exactly against the version of the DB the code thinks it is. A Developer's Process Using MigrationsThe development process is pretty simple using migrations. # You know you need a new table (and maybe some indexes). Generate a migration to create the table and the indexes. # Run it locally, ok it works. # Code up your features and check everything into source control. # At this point, another developer can come along, check out the latest code, run the migrations script and get their DB in sync. Thats it. |