Yesterday I received an often asked question from a close friend of mine and someone who I very much respect. In his words:

My new company has an existing system that I am rewriting. The system
has a database that follows the philosophy of never really deleting a
record - just set a flag marking it deleted. What do you think of this?

In general, I think it is a bad idea. I think it just creates work
for you in other areas. But I could be naive.

I'm using "ruby on rails", which has this guideline of DRY (Don't
Repeat Yourself). Rails puts things that are traditionally in the
database into the object-relational mapping layer, such as rules on
referential integrity. I guess referential integrity is normally in 2
places: application code (business logic) and database meta data.
Anyway, database constraints are rarely used in rails.

The system is a community site, with discussion boards and lots of
other shared content that is contributed by members. Members may
leave and can be deleted or disabled.

I'm tempted to delete members if they should be deleted. I'm tempted
to delete content if it should be deleted. I'm tempted to leave
content from former members unless the content itself is
inappropriate, though I guess content that is missing an associated
member record would be sort of an orphan. (I'm just thinking out loud
here).

Do you have any general recommendations?

I would hate to add to every single sql statement "when (deleted is
null or deleted=0)". And I think there could be other issues, like
unique key collisions with records that are marked 'deleted'.

On a project I created a schema that had sort of a "weak"
relationship in it. This was the goal planning system. The goal table
had a username column. This username column could be used to find the
user in the user table. But if the user was deleted, the goals still
displayed just fine. I guess if a user joined later, reusing a
previously used username they would inherit some existing goals.
Actually this feature was motivation for the schema — I started with
a set of goals from a big spread sheet and I wanted them to exist and
I wanted to be able to add users and them take ownership of goals
already in the system.

Finally, if I delete records, do you recommend saving the deleted
record in another table? Is that a common practice?


Here is my take on it:

Never deleting a record is a practice primarily followed in financial industry and some drug companies vying for FDA approval. Whenever you are dealing with financial or other sensitive data it is not a bad idea to archive it in this way. It is reasonably easy to manage such data. You can periodically move the records marked deleted to one or more secondary databases, which is only queried when you want such data, which is rare. So it is ok to have slower response times when you query deleted records.
This way your primary database remains lean and fast.

Today a third need is emerging in social networking and community sites (for never deleting a record). Often terrorist networks (as in Yahoo groups) or drug dealers (as in Orkut) use such sites to communicate with their geographically distributed members. Archiving their communication data may later help law enforcement authorities in tracking them down. The flipside is that it can be abused by the same authorities to pry into the private lives of normal citizens. But that is a story for another day :)

> I'm tempted to delete members if they should be deleted. I'm tempted
to delete content if it should be deleted. I'm tempted to leave
content from former members unless the content itself is
inappropriate, though I guess content that is missing an associated
member record would be sort of an orphan. (I'm just thinking out loud
here).

I can see additional value you can get from these records, like finding rate of attrition (people leaving the site or being deleted) over time, their content etc. Overall you reduce much of headache over orphan records or referential integrity problems (though I don't think RoR depend on that) down the road. As it is hardware is cheap these days.

So to summarize I think:
a. there are several benefits in keeping deleted records and simply marking them as deleted
b. there isn't much cost or performance overhead associated with such a decision even if your site becomes the next MySpace as you can always periodically move them over to secondary databases.

Most sql queries will simply look for records which doesn't have the deleted field set. For queries involving the deleted records, you will have to first look in the primary database and then look in the secondary deleted records database, if you do decide to use a secondary database.

> like unique key collisions with records that are marked 'deleted'.
That wouldn't be an issue. If you are expecting heavy load then you may want to use long integer for index. Normal unsigned integer will give you only 4GB.

> Finally, if I delete records, do you recommend saving the deleted record in another table? Is that a common practice?

If you are simply marking them as deleted then you can simply periodically move them to a different database. I wouldn't suggest moving them to a different table as that is likely to create more confusion and wouldn't help you as much wrt. performance. It is reasonably common practice to save deleted records. I have worked with such systems in past. I think you will find that they (marking records them as deleted) solve more problems than they create in the long run.

Let me know, my blog readers, what is your take on this issue?