My colleague U developed an application feature that would enable our users to upload documents. Users can also organize their uploaded documents as a tree of files and folders.

We then discussed how to store this information in the database.

Method 1: Create a folder for the user on the server-side file system, and create files and folders inside it exactly as the user arranges them. The tree the user sees is mirrored exactly on the server.

Method 2: Do not use the file system at all. Store everything described above in the database, including the file contents.

Method 3: Create a folder for the user, and put all of that user's uploaded files directly in it. The user's tree structure is maintained in the database. (Yes, we have to handle files with duplicate names, but this is not a big deal.)
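To make Method 3 concrete, here is a minimal sketch in Python with SQLite. It is not our actual implementation: the `nodes` table, its columns, and all function names are hypothetical, chosen only for illustration. The idea is that the tree lives in the database as parent pointers, while each file is written to the user's single flat folder under a generated storage key, so two files both called `notes.txt` in different folders never collide on disk.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


def init_db(conn):
    # Hypothetical schema: one row per file or folder, linked by parent_id.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS nodes (
            id          INTEGER PRIMARY KEY,
            user_id     INTEGER NOT NULL,
            parent_id   INTEGER REFERENCES nodes(id),  -- NULL for top-level entries
            name        TEXT NOT NULL,                 -- name shown to the user
            is_folder   INTEGER NOT NULL,
            storage_key TEXT                           -- on-disk file name; NULL for folders
        )
        """
    )


def add_folder(conn, user_id, parent_id, name):
    # Folders exist only in the database; nothing is created on disk.
    cur = conn.execute(
        "INSERT INTO nodes (user_id, parent_id, name, is_folder) VALUES (?, ?, ?, 1)",
        (user_id, parent_id, name),
    )
    return cur.lastrowid


def add_file(conn, storage_root, user_id, parent_id, name, data):
    # All of a user's files go into one flat directory under a random key,
    # so duplicate display names cannot clash on the file system.
    user_dir = Path(storage_root) / str(user_id)
    user_dir.mkdir(parents=True, exist_ok=True)
    storage_key = uuid.uuid4().hex
    (user_dir / storage_key).write_bytes(data)
    cur = conn.execute(
        "INSERT INTO nodes (user_id, parent_id, name, is_folder, storage_key) "
        "VALUES (?, ?, ?, 0, ?)",
        (user_id, parent_id, name, storage_key),
    )
    return cur.lastrowid


def read_file(conn, storage_root, node_id):
    # Look up the storage key in the database, then read the bytes from disk.
    row = conn.execute(
        "SELECT user_id, storage_key FROM nodes WHERE id = ? AND is_folder = 0",
        (node_id,),
    ).fetchone()
    if row is None:
        raise KeyError(node_id)
    user_id, storage_key = row
    return (Path(storage_root) / str(user_id) / storage_key).read_bytes()


def demo():
    # Two files with the same display name in two different folders.
    with tempfile.TemporaryDirectory() as root:
        conn = sqlite3.connect(":memory:")
        init_db(conn)
        folder_a = add_folder(conn, 1, None, "a")
        folder_b = add_folder(conn, 1, None, "b")
        f1 = add_file(conn, root, 1, folder_a, "notes.txt", b"first")
        f2 = add_file(conn, root, 1, folder_b, "notes.txt", b"second")
        return read_file(conn, root, f1), read_file(conn, root, f2)
```

With a layout like this, moving or renaming a file in the user's tree is just an UPDATE on `parent_id` or `name`; the file on disk is never touched.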

We implemented Method 3. Method 1 was rejected straightaway. Methods 2 and 3 differ very little: the only question is where to store the file contents.
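For comparison, the Method 2 variant of that single decision might look like the sketch below, again in Python with SQLite. The `file_contents` table and the function names are hypothetical. The tree metadata would stay the same in either method; only the bytes move from the file system into a BLOB column.

```python
import sqlite3


def init_blob_table(conn):
    # Hypothetical table: file bytes keyed by the id of the tree node they belong to.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_contents ("
        "node_id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )


def put_blob(conn, node_id, data):
    # Store (or overwrite) the contents of one file inside the database.
    conn.execute(
        "INSERT OR REPLACE INTO file_contents (node_id, data) VALUES (?, ?)",
        (node_id, data),
    )


def get_blob(conn, node_id):
    # Fetch the contents with a plain SELECT; returns None if nothing is stored.
    row = conn.execute(
        "SELECT data FROM file_contents WHERE node_id = ?", (node_id,)
    ).fetchone()
    return None if row is None else bytes(row[0])
```

Every byte stored this way ends up in the database file and in every database backup, which is exactly the size concern raised in this post.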

I would like to know which of Methods 2 and 3 is better. I've heard contradictory views on file I/O versus database SELECTs of files (BLOBs). Method 3 was quite simple to code (though that may be more a matter of perception).

The advantages of Method 3 are:
1. The files can be backed up easily.
2. The files can easily be moved to a different server for performance reasons.
3. It is easier to index the files with a search engine such as Lucene.

One might argue that all of these are possible with the database approach as well.
Here are my counterpoints:
1. Backup: the database can be backed up too, but its size bloats. That is certainly an issue for me.
2. Moving servers: certainly not as easy as moving files around. I would need to set up a database cluster, use some sort of sharding, or keep a separate database just for the file-storage tables. But then it IS more complicated than handling files, isn't it?
3. Indexing: full-text indexing in databases is, again, not as elegant. It tends to be OS-specific or DB-specific.

Method 3 wins hands down, right?
I am open to hearing others' opinions on this. Any (!null) pointers would be just great!
