I have developed a web application that is used mainly for archiving all sorts of textual material (documents, references to articles, books, magazines, etc.). There can be any number of archive tables in my system, each with its own schema. The schema can be changed by a moderator through the application (imagine something similar to a really dumbed-down version of phpMyAdmin).

Users can search for anything across all of the tables. The script uses FULLTEXT indexes together with substring searching (for fields which do not support FULLTEXT indexing), inserts the results of a search into a single results table, and orders them by the similarity measure, so I can fairly easily return paginated results.
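
Roughly, one per-table pass of the current approach looks something like the sketch below. The articles table, its FULLTEXT index over (title, body), the reference_code column and the search_results table are only illustrative here, not my real schema:

    -- Collect matches from one archive table into the shared results table
    -- (assumes a FULLTEXT index over (title, body) exists on articles):
    INSERT INTO search_results (table_id, row_id, score)
    SELECT 3, a.id,
           MATCH(a.title, a.body) AGAINST ('printing press') AS score
    FROM articles AS a
    WHERE MATCH(a.title, a.body) AGAINST ('printing press')
       OR a.reference_code LIKE '%printing press%';  -- substring search for a non-FULLTEXT field

    -- Paginate over the combined results, best matches first:
    SELECT table_id, row_id FROM search_results ORDER BY score DESC LIMIT 20 OFFSET 0;

This pass has to be repeated for every archive table.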

However, this approach has a few problems:

  • substring searching can only match exact strings;
  • the 50% rule applies to each table separately, so MySQL may fail to return important matches or discard common words too naively;
  • the approach is quite expensive in terms of query count and execution time (not an issue right now, as there isn't a lot of data in the tables yet);
  • normalized data is not searched at all (I have separate tables for categories, languages and file attachments).

My planned solution: create a single table having columns similar to

id, table_id, row_id, data
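
For instance, a minimal sketch of such a table (the name and column types are just illustrative):

    CREATE TABLE search_entries (
        id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        table_id INT UNSIGNED NOT NULL,            -- which archive table the row belongs to
        row_id   INT UNSIGNED NOT NULL,            -- primary key of the row in that table
        data     TEXT NOT NULL,                    -- concatenation of all searchable fields
        UNIQUE KEY uniq_source (table_id, row_id)  -- one entry per source row
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;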

Every time a row is created, modified or deleted in any of the data tables, this central table gets updated as well, with the data column containing a concatenation of all the fields in the row. I could then create a single index for Sphinx and use it for doing searches instead.
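
The application could refresh an entry right after saving a row, roughly like this (the archive table articles and its columns are again only illustrative; the REPLACE relies on the UNIQUE KEY over (table_id, row_id) above):

    -- Refresh the search entry for row 42 of archive table #3 after it is saved:
    REPLACE INTO search_entries (table_id, row_id, data)
    SELECT 3, a.id, CONCAT_WS(' ', a.title, a.author, a.summary)
    FROM articles AS a
    WHERE a.id = 42;

    -- Remove the entry when the source row is deleted:
    DELETE FROM search_entries WHERE table_id = 3 AND row_id = 42;

Sphinx would then index only the data column of this one table, with table_id and row_id carried along as attributes so matches can be mapped back to their source rows.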

Are there any more efficient solutions or best practices for approaching this? Thanks.
