I have developed a web application that is used mainly for archiving all sorts of textual material (documents, references to articles, books, magazines, etc.). There can be any number of archive tables in my system, each with its own schema. The schema can be changed by a moderator through the application (imagine something similar to a really dumbed-down version of phpMyAdmin).

Users can search for anything across all of the tables. The script uses FULLTEXT indexes together with substring searching (for fields which do not support FULLTEXT indexing), inserts the results of a search into a single results table, and orders them by the similarity measure, so I can fairly easily return paginated results.
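
Roughly, one per-table pass of the current approach looks something like the sketch below. The articles table, its FULLTEXT index over (title, body), the reference_code column and the search_results table are only illustrative here, not my real schema:

    -- Collect matches from one archive table into the shared results table
    -- (assumes a FULLTEXT index over (title, body) exists on articles):
    INSERT INTO search_results (table_id, row_id, score)
    SELECT 3, a.id,
           MATCH(a.title, a.body) AGAINST ('printing press') AS score
    FROM articles AS a
    WHERE MATCH(a.title, a.body) AGAINST ('printing press')
       OR a.reference_code LIKE '%printing press%';  -- substring search for a non-FULLTEXT field

    -- Paginate over the combined results, best matches first:
    SELECT table_id, row_id FROM search_results ORDER BY score DESC LIMIT 20 OFFSET 0;

This pass has to be repeated for every archive table.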

However, this approach has a few problems:

  • substring searching can only match exact strings;
  • the 50% rule applies to each table separately, so MySQL may fail to return important matches or discard common words too naively;
  • the approach is quite expensive in terms of query count and execution time (not an issue right now, as there isn't a lot of data in the tables yet);
  • normalized data is not searched at all (I have separate tables for categories, languages and file attachments).

My planned solution: create a single table having columns similar to

id, table_id, row_id, data
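
For instance, a minimal sketch of such a table (the name and column types are just illustrative):

    CREATE TABLE search_entries (
        id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        table_id INT UNSIGNED NOT NULL,            -- which archive table the row belongs to
        row_id   INT UNSIGNED NOT NULL,            -- primary key of the row in that table
        data     TEXT NOT NULL,                    -- concatenation of all searchable fields
        UNIQUE KEY uniq_source (table_id, row_id)  -- one entry per source row
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;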

Every time a row is created, modified or deleted in any of the data tables, this central table gets updated as well, with the data column containing a concatenation of all the fields in the row. I could then create a single index for Sphinx and use it for doing searches instead.
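
The application could refresh an entry right after saving a row, roughly like this (the archive table articles and its columns are again only illustrative; the REPLACE relies on the UNIQUE KEY over (table_id, row_id) above):

    -- Refresh the search entry for row 42 of archive table #3 after it is saved:
    REPLACE INTO search_entries (table_id, row_id, data)
    SELECT 3, a.id, CONCAT_WS(' ', a.title, a.author, a.summary)
    FROM articles AS a
    WHERE a.id = 42;

    -- Remove the entry when the source row is deleted:
    DELETE FROM search_entries WHERE table_id = 3 AND row_id = 42;

Sphinx would then index only the data column of this one table, with table_id and row_id carried along as attributes so matches can be mapped back to their source rows.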

Are there any more efficient solutions or best practices for approaching this? Thanks.
