One idea in test-driven development (TDD) is to run all your unit tests when building a project, which lets you catch functional regressions quickly, before you commit the changes that caused them. For this to work, though, all your tests need to complete quickly.
If your benchmarks all run quickly, they can be included in your test framework (I don't know if that is what Caliper is supposed to be for, but Google is heavily into TDD. Also...). In any case, the whole point of adding benchmarks to your test-driven infrastructure would be to catch major performance regressions quickly, so for that kind of usage you'd want to flag taking too long as an error.
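To make "flag taking too long as an error" concrete, here is a minimal sketch in plain Java. It does not use Caliper's API or any test library. The class `PerformanceBudgetCheck`, the method `assertWithinBudget`, and the error type `BudgetExceededError` are hypothetical names for illustration. The idea is to time a workload inside a test-style check and throw an `AssertionError` subclass when it exceeds a wall-clock budget, so a test runner would report it as a failure. A single timed run is noisy, so the budget should be generous enough to catch only major regressions rather than small fluctuations.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: run a workload in a unit-test-style check and fail
 * if it takes longer than a wall-clock budget. Not Caliper's API.
 */
public final class PerformanceBudgetCheck {

    /** Thrown when a workload exceeds its budget; extends AssertionError so test runners report it as a failure. */
    public static final class BudgetExceededError extends AssertionError {
        BudgetExceededError(String message) {
            super(message);
        }
    }

    /**
     * Runs the workload once as a warm-up, then times one more run.
     * Throws BudgetExceededError if the timed run took longer than budgetMillis;
     * otherwise returns the measured time in nanoseconds.
     */
    public static long assertWithinBudget(String name, Runnable workload, long budgetMillis) {
        workload.run(); // warm-up run, so JIT compilation is less likely to dominate the timing
        long start = System.nanoTime();
        workload.run();
        long elapsedNanos = System.nanoTime() - start;
        long budgetNanos = TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        if (elapsedNanos > budgetNanos) {
            throw new BudgetExceededError(String.format(
                    "%s took %d ms, budget was %d ms",
                    name, TimeUnit.NANOSECONDS.toMillis(elapsedNanos), budgetMillis));
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        // A cheap workload that should fit comfortably inside a generous 1-second budget.
        long elapsed = assertWithinBudget("sum-1e6", () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("overflow"); // uses sum so the loop is not dead code
            }
        }, 1_000);
        System.out.println("sum-1e6 passed in " + TimeUnit.NANOSECONDS.toMicros(elapsed) + " us");
    }
}
```

In a real suite you would more likely reach for your test framework's own support, for example JUnit 4's `@Test(timeout = ...)`, which fails a test that runs longer than the given number of milliseconds. The hand-rolled version above just shows the mechanism: measure, compare against a budget, and fail loudly.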