Monday, September 12, 2011

ActiveJDBC cascades deep and shallow

Every ORM adds functionality not found in a standard DB access layer, and ActiveJDBC is no exception. Take delete with cascade, for example. The method:
model.deleteCascade();
has been in existence for a while, but had somewhat limited functionality. It deleted the model and its immediate children in One to Many and Polymorphic associations. In the case of Many to Many, it merely cleared links in the join table. The main reason for this was performance. To implement a true cascade delete, an ORM must follow all relationships until none are left, and unfortunately, in the process, it has to load every record instance into memory. This might allocate huge chunks of memory and generate an unexpectedly large number of DELETE statements against the database.

So, initially, performance considerations stopped me from implementing a true cascade delete. After all, deleting immediate children is very efficient: remove all of them with a single SQL statement, then delete the parent.

However, once people started using ActiveJDBC, many asked the question: "this deleteCascade() is not really cascading, what the heck?" (well, they are all nice people, but I need to add some drama here to keep you reading:)).
In any case, they pointed out the inconsistency between the name and the actual semantics. This prompted me to implement a delete cascade true to its name. So, the new version just published to Sonatype will cascade like there is no tomorrow. It will navigate all child and many to many relationships of the model being deleted, find their children, grandchildren, great-grandkids, etc. No one walks out alive, if you know what I mean:)

The implications might seem strange at first, but they are logical if you think about it. Imagine you have a relationship where doctors treat patients and patients visit doctors. In other words, this is a many to many relationship. If you delete a doctor, then all patients associated with that doctor are also deleted. But what if a patient also visits another doctor? Guess what, that doctor is also deleted (because it is a dependency of a patient being deleted), and so are his/her patients, and so on. So, "deleteCascade()" really knows how to cascade!
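To see how far such a cascade can reach, here is a small in-memory sketch. It is not ActiveJDBC code and does not reflect how the library is implemented internally: the class CascadeSketch, its methods, and the sample doctors and patients are all made up for illustration. It only models the idea of following many to many links until nothing new turns up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of a deep cascade; not part of ActiveJDBC.
class CascadeSketch {

    // Made-up sample data: each record maps to the records it is linked to.
    // The many to many relationship is listed in both directions.
    static Map<String, List<String>> sampleLinks() {
        return Map.of(
                "dr-a", List.of("patient-1", "patient-2"),
                "dr-b", List.of("patient-2", "patient-3"),
                "dr-c", List.of("patient-4"),
                "patient-1", List.of("dr-a"),
                "patient-2", List.of("dr-a", "dr-b"),
                "patient-3", List.of("dr-b"),
                "patient-4", List.of("dr-c"));
    }

    // Collects every record reachable from the starting record by following links.
    static Set<String> deepCascade(Map<String, List<String>> links, String start) {
        Set<String> doomed = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!doomed.add(current)) {
                continue; // already marked for deletion, do not follow it again
            }
            for (String next : links.getOrDefault(current, List.of())) {
                pending.push(next);
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        // Deleting dr-a drags in patient-2, who also visits dr-b, and so on.
        System.out.println(deepCascade(sampleLinks(), "dr-a"));
    }
}
```

In this made-up data set, deleting dr-a also takes out dr-b and patient-3, because patient-2 visits both doctors. Only dr-c and patient-4 survive, since nothing links them to the rest.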

But what about the fast and efficient delete, if all I want is to delete a model and its immediate children (assuming no grandkids)? For that, there is a new method:

model.deleteCascadeShallow();
which retains the same functionality deleteCascade() had before.

So, deleting models in ActiveJDBC is an easy business, with methods:
delete();
deleteCascade();
deleteCascadeShallow();

For more detailed info, see this Wiki page: http://code.google.com/p/activejdbc/wiki/DeleteCascade

cheers..

2 comments:

Ron Smith said...

Good stuff. It's nice seeing ActiveJDBC getting more advanced features added without adding unnecessary complexity.

Deletes are tricky. I tend to use logical deletes, marking records as deleted with a flag of some sort, then periodically archiving/purging old records that are absolutely no longer used.

Having said that, deletes are needed, and manually removing child records can be a pain, so this is a useful feature.

In your example with a doctor to be deleted (dr druker), a patient of dr druker's, and another doctor who shares the same patient, I'd want dr druker and his patient assignments (doctors_patients) deleted, but not the patient nor the other doctor.

So in other words, I'd want to cascade through some associations, but not others. How to achieve this? I guess I could first null out those associations that shouldn't be followed prior to doing the cascade delete, probably removing the doctors_patients records before calling cascade, but then I'm pretty much back to deleting the children myself.

What if you could define a boundary past which associations shouldn't be deleted? Taking a page from domain driven design, you could say cascade deletes do not traverse aggregate root boundaries. So if Patient and Doctor are defined as aggregate roots, deleting a doctor would cascade to all of its children (and grandchildren etc) until it encountered another aggregate root. So doctors_patients would get deleted, but not patients since that's a separate aggregate root. And the other doctor would definitely not get deleted. You could define what class is an aggregate root via an annotation on the class. It could be called something other than aggregate root of course, and be modeled differently, but it seems to fit well into DDD concepts.

Igor Polevoy said...

Ron, you are making a good point, but implementing something like that would be tricky on one hand, and confusing to users on the other. Currently the method:

deleteCascadeShallow()

will clear links from the join table, but will not go further. I'm trying to balance simplicity of API and useful functionality. In any case, good discussion!

cheers