> what is a good strategy for generating the underlying data for the tests?
I would use a modified version of the second approach:
Generate fake data such that for each request X there is a well-defined, intentionally constructed set of results Y that will be returned.
But instead of querying the database directly, your search engine should be implemented against data-source-specific repository interfaces.
Each repository interface has one implementation that uses the database and one fake implementation that reads its results from a human-readable text file. This way your test data is less dependent on database schema changes.
- While testing, the search engine uses the fake repository.
- For each test there is a test-specific answer file that can be maintained with a text editor.
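The pattern above can be sketched roughly as follows. This is a minimal illustration, not a prescribed design: the interface name, the one-query-per-line answer-file format, and the `SearchEngine` wrapper are all hypothetical choices made for the example.

```python
from abc import ABC, abstractmethod


class DocumentRepository(ABC):
    """Data-source-specific repository interface the search engine depends on."""

    @abstractmethod
    def find(self, query: str) -> list[str]:
        ...


class FakeDocumentRepository(DocumentRepository):
    """Test double that reads intentionally constructed answers from a
    human-readable text file. Assumed format, one entry per line:

        query|result1,result2,...

    Blank lines and lines starting with '#' are ignored, so the file can
    carry comments and stay easy to maintain in a text editor."""

    def __init__(self, answer_file: str):
        self._answers: dict[str, list[str]] = {}
        with open(answer_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                query, _, results = line.partition("|")
                self._answers[query] = results.split(",") if results else []

    def find(self, query: str) -> list[str]:
        return self._answers.get(query, [])


class SearchEngine:
    """The code under test; it only ever talks to the repository interface,
    so it cannot tell a fake repository from the real database-backed one."""

    def __init__(self, repository: DocumentRepository):
        self._repository = repository

    def search(self, query: str) -> list[str]:
        return self._repository.find(query)
```

In production you would register a database-backed `DocumentRepository` implementation instead; the tests swap in `FakeDocumentRepository` pointed at a test-specific answer file, so each test controls exactly which results Y come back for each request X.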