13

I'm working on a project at the moment that has a rather unusual requirement and I'm hoping to get some advice on the best way to handle it or even some pointers to info that can help me build a solution.

Ok, so this is what I need to do. The application stores and manages various types of media files but each deployment of the application has completely different metadata requirements for the media files.

This metadata can contain an arbitrary number of fields of different types (single line text, multi-line text, checkboxes, selected values, etc.) and often requires validation, particularly presence and uniqueness validations.

The application needs to be able to easily retrieve values and most importantly has to be able to handle full searching capabilities on these fields.

One option I considered was a property list arrangement, where a database table simply contains a property name and value for each metadata field of each media file. However, prototyping quickly showed that it wasn't going to be efficient enough for searching and retrieving records, particularly when the database is reasonably large, e.g. a recent deployment had 3000 media files and over 20 metadata fields. The queries needed to search and retrieve the relevant records also quickly became very complex.
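
To illustrate why the queries get complex, here is a minimal plain-Ruby sketch of the property list layout. It uses an in-memory array as a stand-in for the database table, and the names `Property`, `search` and `load_record` are made up for illustration. Each search criterion needs its own pass over the rows (a self-join per field in SQL), and rebuilding a single record means gathering all of its rows.

```ruby
# In-memory stand-in for a property-list table: one row per metadata
# field of each media file. All names here are hypothetical.
Property = Struct.new(:media_id, :name, :value)

ROWS = [
  Property.new(1, "title",  "Sunset"),
  Property.new(1, "format", "jpeg"),
  Property.new(2, "title",  "Sunrise"),
  Property.new(2, "format", "png"),
  Property.new(3, "title",  "Sunset"),
  Property.new(3, "format", "png")
].freeze

# Find the ids of media files whose fields match ALL given criteria.
# Every criterion scans the rows separately and the id lists are then
# intersected, which is what makes the equivalent SQL grow so quickly.
def search(rows, criteria)
  id_lists = criteria.map do |name, value|
    rows.select { |r| r.name == name && r.value == value }.map(&:media_id)
  end
  id_lists.reduce { |a, b| a & b } || []
end

# Rebuilding one record means collecting every row that belongs to it.
def load_record(rows, media_id)
  rows.select { |r| r.media_id == media_id }
      .to_h { |r| [r.name, r.value] }
end
```

With 20 fields per file, `load_record` touches 20 rows for what would be a single row in a conventional table.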

Another option, which the system currently uses, is to define the metadata config upfront and run a migration during deployment that creates the table and model under a standard name, so that the media model can be associated with it. This generally works fine, but it does cause some significant deployment and testing issues.

For example, writing unit tests becomes much more challenging when you don't know the config until deployment. Although I could write a sample config and test the code that way, it won't allow me to test the specific requirements of a particular deployment.

Similarly, in development, I currently have to copy a migration from the config into the main folder, run it, do all of my testing and development, and then remember to roll back and remove that migration from the main folder so that the application is in a standard state. This becomes particularly challenging when I'm bug fixing and need the application in a specific configuration for testing and debugging purposes. Trying to switch between the various configurations becomes a real nightmare.

Ideally, what I would like is to be able to dynamically create the table and model including validations, etc. from a config file when the server is started. Even better would be if I could maintain multiple metadata setups in the one database with each one having its own table so that all I need to do to switch between them is change which config file the application is currently using.
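
As a rough sketch of the "one table per config" idea, the snippet below derives a table name and the text of a migration from a config. The JSON layout, the `_metadata` table suffix and the helper names are all assumptions made for illustration, and the config is inlined here where a real setup would read it from a file. It only builds a string, so it runs without Rails.

```ruby
require "json"

# Hypothetical config format: a name plus a list of typed fields.
CONFIG_JSON = <<~JSON
  {
    "name": "museum",
    "fields": [
      { "name": "title",       "type": "string",  "required": true },
      { "name": "description", "type": "text" },
      { "name": "published",   "type": "boolean" }
    ]
  }
JSON

# Each configuration gets its own table, so several setups can live
# side by side in one database.
def table_name_for(config)
  "#{config.fetch('name')}_metadata"
end

# "museum_metadata" -> "MuseumMetadata"
def camelize(snake)
  snake.split("_").map(&:capitalize).join
end

# Build the source text of a migration for the given config.
# Note: newer Rails versions require a version suffix on the
# superclass, e.g. ActiveRecord::Migration[7.0].
def migration_source(config)
  table = table_name_for(config)
  columns = config.fetch("fields").map do |f|
    null = f["required"] ? ", null: false" : ""
    "      t.#{f.fetch('type')} :#{f.fetch('name')}#{null}"
  end
  [
    "class Create#{camelize(table)} < ActiveRecord::Migration",
    "  def change",
    "    create_table :#{table} do |t|",
    "      t.references :media",
    *columns,
    "      t.timestamps",
    "    end",
    "  end",
    "end"
  ].join("\n")
end

puts migration_source(JSON.parse(CONFIG_JSON))
```

Switching setups would then come down to which config is parsed, since the table name follows from the config's own name.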

I'm sure this can be done with Rails, but after several days of research I've found very little information that points me in the right direction, so any help or suggestions would be much appreciated!

5 Answers

4

If I understand you correctly, Rails has some nifty tricks to help you solve these problems.

In the ActiveRecord ORM it's possible to model what you're trying to do in a relational database, either using the single table inheritance pattern, or with polymorphic associations (...a bit more complex but more flexible too). A polymorphic association allows a model to belong_to different types of other models. There's a recent railscast on this topic but I won't link to it since it requires a paid subscription.

On the deployment side, it sounds like you're doing a lot of things manually, which is the right way to start until a pattern emerges. Once you start seeing the pattern, there are excellent programs available for configuration, build, and deployment automation such as Capistrano, OpsCode Chef, and Puppet, to name just a few. You might also benefit from integrating your configuration and deployment with your source code repository to achieve a better workflow. For example, with Git you could define topic branches for the various media file types and keep the matching configuration in each branch.

You may want to check out Martin Fowler's excellent book 'PoEAA' and some of the topics on his website. I hope this answer helps even though the answer is pretty generic. Your question is very broad and does not have one simple answer.


2

'each deployment of the application has completely different metadata requirements for the media files.'

I recommend using MongoDB for your database and Mongoid for your ORM. This will give you the flexibility to change the schema as needed without horrific schema manipulations, dynamic models/tables, and all that horror.

'The application needs to be able to easily retrieve values and most importantly has to be able to handle full searching capabilities on these fields.'

This is a search problem rather than a database problem. I recommend trying out the full-text search capabilities in the latest version of MongoDB. If that doesn't meet your needs, try Elasticsearch in conjunction with the Tire gem (an Elasticsearch client that integrates nicely with Rails).


1

What you've described sounds exactly like the defining requirements for a non-traditional storage mechanism that uses key-value storage.

I sense this from:

  • 'completely different metadata requirements' and 'arbitrary number of fields of different types': key-value data stores often have no schema and are very flexible about record layouts that change on the fly.

  • 'The application needs to be able to easily retrieve values and most importantly has to be able to handle full searching capabilities on these fields.' Key-value stores are made to be extremely efficient at retrieving and filtering rows for queries.

'A property list arrangement where the database table simply contained a property name and value for each metadata' field is basically a key-value store.

Some options are:

1 Comment

Yeah, that was my first thought too and that was what I intended to do originally. But when I implemented a prototype solution a basic search just retrieving fields was about 10 times slower than a normal table-based solution. And when you add the cost of running searches to filter records it quickly becomes prohibitively slow, especially given that the searching functionality is fairly fundamental to the operation of the overall application.
1

I can see a few reasonable ways to approach this, depending on your exact requirements:

  1. If it's not prohibitive to write models and migrations for each of your metadata sets, then go ahead and generate a model and migration for each one. Then, in your per-environment config file -- e.g. config/environments/development.rb -- load the desired model for that environment into a global constant (maybe ModelConfiguration::MetadataModel inside lib/model_configuration.rb). Write the rest of your app to only interact with your metadata model via this constant.

    This approach is quite efficient; its only real downside is that you're making an extra table in your database for each model. At runtime, the unused models are never even loaded, so they don't affect your performance at all.

  2. On the other hand, it's possible you have so many metadata models that this approach is too painful to consider, or it's possible that you don't know the metadata model ahead of time. In this case, I would do the following:

    • Have your config load the currently desired model configuration into a global constant in some easily-digestible form (maybe ModelConfiguration::ModelJSON).

    • Write a single model class which, when loaded, looks in ModelConfiguration::ModelJSON and calls a class method to install the appropriate fields and validations on itself from this configuration.

    • Write a Rake task to build a table to match your configuration. See http://edgeguides.rubyonrails.org/command_line.html#custom-rake-tasks for a quick run-down on how to write a Rake task. The simplest approach is probably to generate a one-off migration from your config and then run that migration (by calling .up on it). The downside here is that the one-off migration will disappear after the task is run, so you'll lose access to rake db:rollback.

    This approach is very general, and its biggest advantage is that you don't need a code change in order to get a config change, which gives you a lot of freedom in how you store and deploy your configurations.
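
The second approach can be sketched in plain Ruby as follows. This is only a stand-in under stated assumptions: the hash layout of `ModelConfiguration::ModelJSON`, the `install_fields` hook and the hand-rolled `errors`/`valid?` methods are invented for illustration. In a Rails app the class would inherit from ActiveRecord::Base and the hook would call `validates` for each field instead. Uniqueness checks are left out because they need the database.

```ruby
module ModelConfiguration
  # Hypothetical, already-parsed configuration.
  ModelJSON = {
    "fields" => [
      { "name" => "title",   "presence" => true },
      { "name" => "caption", "presence" => false }
    ]
  }.freeze
end

class Metadata
  # Names of fields that must be present; filled in by install_fields.
  @required = []

  class << self
    attr_reader :required

    # Install accessors and presence rules from the configuration.
    def install_fields(config)
      config.fetch("fields").each do |field|
        name = field.fetch("name")
        attr_accessor name
        @required << name if field["presence"]
      end
    end
  end

  def initialize(attrs = {})
    attrs.each { |name, value| public_send("#{name}=", value) }
  end

  # Names of required fields that are nil or blank.
  def errors
    self.class.required.select { |name| public_send(name).to_s.strip.empty? }
  end

  def valid?
    errors.empty?
  end

  # Runs once, when the class body is evaluated at load time.
  install_fields(ModelConfiguration::ModelJSON)
end

record = Metadata.new("caption" => "Taken at dusk")
puts record.valid?         # false, because title is required
puts record.errors.inspect # ["title"]
```

Because the fields are installed when the class is loaded, changing the configuration changes the model without any code change, which is the property the approach above relies on.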


0

Maybe store the metadata in jsonb fields (PostgreSQL) and create a dynamic view layer for CRUD operations on them.
