13

Say I have at database table containing information about a news article in each row. The table has an integer "sort" column to dictate the order in which the articles are to be presented on a web site. How do I best implement and maintain this sort order.

The problem I want to avoid is having the the articles numbered 1,2,3,4,..,100 and when article number 50 suddenly becomes interesting it gets its sort number set to 1 and then all articles between them must have their sort number increased by one.

Sure, setting initial sort numbers to 100,200,300,400 etc. leaves some space for moving around but at some point it will break.

Is there a correct way to do this, maybe a completely different approach?


Added-1:

All article titles are shown in a list linking to the contents, so yes all sorted items are show at once.

Added-2:

An item is not necessarily moved to the top of the list; any item can be placed anywhere in the ordered list.

1
  • Incrementing by 100 at a time will give you a lot of leeway for reordering, and you won't run out of numbers for a long time (even if they were 16-bit signed numbers - let alone 32-bit ones). And you can do a renumber periodically if the gaps are getting small. Commented Dec 29, 2008 at 20:27

10 Answers 10

10

Forget correct -- the problem you want to avoid my not be a big deal but, rather, just a couple UPDATE statements depending on your RDBMS (I’m assuming Oracle and that you’re moving the article “up” the list):

UPDATE Articles
SET sort_number = sort_number + 1
WHERE sort_number BETWEEN :new_sort_number and :current_sort_number - 1;

UPDATE Articles
SET sort_number = :new_sort_number
WHERE article_id = :article_id;

The biggest caveat is that the SET functionality doesn’t work the same in all RDBMS.

If you really want to consider correct, consider rethinking the question in terms of data structures – what you might be asking is how to implement a linked list in a database.

An alternative to maintaining your sort number column would be to maintain a parent ID column. This is arguably more correct since it preserves the serialization of your data but the drawback is that querying the data isn’t pretty or efficient (think CONNECT BY in Oracle, for example).

If the question is, instead, what’s best perhaps you want to consider both the parent ID column for correctness and denormalizing the data by deriving a sort number column’s value, possibly from the parent ID data or from the first solution.

Sign up to request clarification or add additional context in comments.

Comments

9

It's quite simple. You need have an "order column".

To use this method you'd need to have a minimum of 2 columns:

  1. id - 32bit int
  2. orderValue - 64bit bigint (bigint, not double!!!)

Insert/Update

  1. When you insert the first new record you must
    set orderValue = round(max_bigint / 2).

  2. If you insert a row that belongs at the beginning of the list you must
    set orderValue = round("orderValue of previously inserted record" / 2)

  3. If you insert a row that belongs at the end of the list you must
    set orderValue = round("max_bigint + orderValue of final record" / 2)

  4. If you insert anywhere else you must
    set orderValue = round("orderValue of record before + orderValue of record after" / 2)

This method has a very large maximum list size. If you run into constraint errors or if you think you'll run out of resolution you can always rebuild the orderValue column (normalize).

If you use normalization to continually organize the orderValue column you may be able to use a 32-bit int.

It's very simple and fast!

Remember - don't use doubles!!! Only integers - the orderValue column needs very high precision!

3 Comments

can you please add more info about normalization?
A little note, you'll find yourself having overflow isssue if doing this (max_bigint + orderValue of final record) / 2, write it like this instead max_bigint / 2 + orderValue / 2
8

If I understand your problem correctly; I would use a column containing some type of ranking index number. This number doesn't have to be unique. You will need to come up with some type of algorithm to calculate the rank.

Then just sort on the ranking column, with a secondary column to handle ranking ties (maybe the create date or something).

1 Comment

Or order modification date-time, so you can reorder them.
5

Model your table after the linked list data structure. Get rid of the "sort" column, instead have a next_article_id column which points to the next article after it. Then when you want to add a new article at position 50 you only have to update article #49 to point to your new article, then point your new article to the one article #49 was previously pointed to.

1 Comment

please give me a SELECT query which I can use to get all the articles, sorted correctly
3

To build on Alkini's answer (I would reply to that directly but I lack the points/privileges for the moment). That method works only if the new display order is less than the new. If it's the other way around then you need to shift the articles in between in the other direction.

if (new_sort_number == current_sort_number)

Do nothing

if (new_sort_number < current_sort_number)

UPDATE Articles
SET sort_number = sort_number + 1
WHERE sort_number BETWEEN :new_sort_number and :current_sort_number - 1;

if (new_sort_number > current_sort_number)

UPDATE Articles
SET sort_number = sort_number - 1
WHERE sort_number BETWEEN :current_sort_number + 1 and :new_sort_number;

Then in either case, update the article that got moved.

UPDATE Articles
SET sort_number = :new_sort_number
WHERE article_id = :article_id;

Hope that helps someone else looking for a generalized display order update algorithm.

Comments

1

Using a real number would seem to be the most computationally efficient choice. You could move a row between two others by adding half the distance between the the two rows. To move a newrow between row1 and row2, calculate newrow as newrow = row1 + (row2 - row1)/2.0. However, due to floating point limits, this series (using double precision) will converge in 53 iterations. So worst case you can only do 53 inserts (more if you are not always inserting between the last newrow and row2). This is similar to the idea of leaving gaps and filling them in. The idea of doing the update is less efficient, but does not break after a certain number of inserts. Keeping the column indexed will probably help.

1 Comment

Wouldn't it be easier to just to add 100 to the rank of each new row and then fit the new "in-between" rows in one of those 100 slots. It still converges but it's less complicated.
0

Use descending sort order, i.e. the highest sort number is shown on top.

If an older article should be placed on top of the list, simply set its Sort column to MAX(Sort)+1. No need to renumber the whole table.

Update after your comments:

Change sort order to floating point. If the sort order of an article is rearranged between two other articles, the new Sort value is set to ((Sort of previous article) + (Sort of next article)) / 2. Otherwise, set Sort to MAX+1.

Comments

0

The problem I want to avoid is having the the articles numbered 1,2,3,4,..,100

I would suggest you not try and having your sort key be your primary key, or be unique. If they key is not unique you can have multiple items with the same value. You probably would want to simply make your sort key be a simple 1-10 scale.

So perhaps things at level 10 would be extremely interesting, 5 would be normal, and 1 would be extremely boring. When something becomes boring or more interesting, just adjust the value of your sort key down or up.

Comments

0

In order to do this, we set the rows to be incrementing integers, then use a stored proc to move items. The proc takes two args, the current index of the item, and the new index you want to move it to. It does an update items set index = index + sign(@newidx - @oldidx) where index between itemMax(@newidx, @oldidx) and itemMin(@newidx, @oldidx) to update the items in between and then an update that changes the actual item to the new index. This is not the actual code, just from memory. In our table, we have two columns, the id and the index so that even though at one point you end up with two items at the same index, we can differentiate them by ID. This could be circumvented by doing an update on the item to be moved to and index of -1 maybe, then updating the rest and finally moving the new item at -1 to the new index.

Comments

0

I'd go with the answer most people are saying, namely that you should have a separate column in your table with the record's sort/interest value. If you want to factor in some sort of time argument while still remaining robust, you might want to try and incorporate a data "view" for a given time period.

I.e. every night at midnight you create a table view of your main table but only create it using articles from the past 2 or 3 days, then maintain the sort/interest counts strictly on that table. If someone is selecting an older article not in that table, (say from a week ago) you can always insert it into the current view. This way you can get an effect of age-off.

The whole point would be to maintain small indices on the table where inserts and updates of sorting/interest values should remain relatively fast. You should only have to pay for complicated joins once up front during the view creation, then you could keep it flat to optimize for excessive reads. What's the max you would have in the view, a couple hundred, couple thousand? Beats trying to sort a hundred thousand every time someone makes a request.

Comments