
I have a table with some 30 columns that is already used extensively in the application, i.e. SELECT, INSERT and UPDATE operations for this table are written in many different ways (whatever ways the developers were comfortable with) across a number of stored procedures and UDFs. I've now been handed a task to extend the functionality the table serves, and I need to add an additional detail to the table (which can generally be assumed to mean an additional column). Adding a column to the table is a massive and inefficient task I don't want to undertake, considering the impact it will cause elsewhere.

Another way I can think of is creating a new table with a foreign key to the main table and maintaining the records in the new table. I'm skeptical of this approach too. What is the most effective way to handle this sort of modification to the table's schema?

Using SQL Server 2000, in case it's needed.

Edit:

Unfortunately, the column should not accept NULL values. I missed this crucial piece of information.

The impacts I think could occur, due to poor practices already implemented, are:

1) SELECT * bound directly to some datagrid in the front end (very low probability).

2) Using column indexes instead of column names to fetch values from a DataSet or DataTable in the front end when using SELECT *.

3) INSERT INTO with values supplied positionally instead of with column names.

If I can somehow make the column accept NULL values (by tweaking the requirements a bit), would there be any impact from the above points?

I'm doubtful about analysing the existing code, because the number of SPs and functions using this table could run into the hundreds.

  • +1 for a good topic of discussion of an issue that seems to be fairly common. Commented Jun 17, 2009 at 13:41

12 Answers

  1. Build a new table with all the columns you need, call it whatever you want.
  2. Create a view, name it the same as the old table, and have it return all the columns the old table used to.
  3. ???
  4. $

(Yes, I know this might be confusing for maintenance, because a lot of DBAs use a naming convention for views: V_ViewName. I never got into naming a SQL object after its object type, and I don't see the benefit of such a convention.)
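A minimal sketch of the idea, assuming the old table is first renamed out of the way so its name is free for the view (all object and column names here are made up for illustration):

```sql
-- Hypothetical example: OldTable has columns (Id, Col1 ... Col30).
-- 1. Move the existing table aside.
EXEC sp_rename 'OldTable', 'OldTable_Base'
GO

-- 2. Extend the base table (or build a wider replacement table here instead).
ALTER TABLE OldTable_Base ADD NewDetail int NULL
GO

-- 3. Re-create the old name as a view returning only the original columns,
--    so legacy SELECT * / positional INSERT code keeps seeing the old shape.
CREATE VIEW OldTable
AS
SELECT Id, Col1, Col2 /* ... the rest of the original 30 columns ... */
FROM OldTable_Base
GO
```

On SQL Server 2000, a view over a single base table like this is updatable for the columns it exposes, so existing INSERT/UPDATE statements against the old name can continue to work, provided the new column is nullable or has a default.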


4 Comments

One stupid question though: if the table name and the view name are the same, how do the existing queries using the table name work? Does a view always take precedence over a table with the same name?
You can't have two schema-scoped objects (tables, views, stored procedures) with the same name in the database.
Not an entirely stupid question, though. I had to verify by trying to do this myself.
@Chris I went with creating a new extension table, but your answer is interesting. Can you clarify your 2nd point? Can I create a new view with the same name as my old table? (See my question in the comments.) I would have marked your answer as accepted if I'd been okay with that part.

Ask yourself why adding a column would have a massive impact. Perhaps you have queries that use SELECT *? Find out why the impact would be significant - then consider those to be bugs, and fix them.

Most of the time, adding a column should not break anything. Adding a NOT NULL column will affect anything that does an INSERT, but otherwise, there should be little impact if your database is properly designed.


EDIT after NOT NULL update

The solution is obvious: add the column as NULL, update the data to include non NULL values for every row, then alter the column to be NOT NULL.
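As a sketch of those three steps (the table name, column name, and backfill value are placeholders):

```sql
-- 1. Add the column as nullable, so existing rows and INSERTs are unaffected.
ALTER TABLE MyTable ADD NewColumn int NULL

-- 2. Backfill every existing row with a real (non-NULL) value,
--    using whatever rule derives the correct value.
UPDATE MyTable SET NewColumn = 0 WHERE NewColumn IS NULL

-- 3. Only then tighten the constraint.
ALTER TABLE MyTable ALTER COLUMN NewColumn int NOT NULL
```

Step 3 will fail if any NULLs remain, which is exactly the safety check you want before making the column mandatory.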

3 Comments

Actually, it will affect the existing data. As a matter of fact, if you try to add a NOT NULL column (with no default) to a table that already contains data, you will get an error and won't be able to proceed.
@Eric: I specifically mentioned the case of a NOT NULL column, above. The solution is obvious: add the column as NULL, update the data to include non NULL values for every row, then alter the column to be NOT NULL.
@John: I misunderstood I guess. I didn't realize you meant that it will affect the table from the get-go.

The suggestion of adding a new table to accommodate the new column is what is technically known as vertical partitioning, and although there is a place for it in database design, the concerns it addresses have to do with performance.

Ideally you should be able to simply add the new column to the existing table. If you have to add a new table to your database every time you want to add a new column, your system will become unmanageable very quickly. I assume that you don't have a dev/test environment separate from production. This might be the perfect opportunity to convince your boss that you need one.

Comments


Adding a column to the table is a massive and inefficient task I don't want to undertake, considering the impact it will cause elsewhere.

Can you elaborate on this?

Adding the columns as nullable, or with default values, means that nobody will actually have to supply values: no impact.

If you're concerned about the lock time as a column is added to the table, add the columns at the end of the table with ALTER TABLE ... ADD (that way SQL Server doesn't have to create a new table, copy the data to it, drop the old table, and rename the new one back): almost no runtime impact.

Adding 50 million rows of data would have almost no runtime impact?

User @BrianWhite seems to be confused about how adding a column to a table that contains 50 million rows can have almost no runtime impact. He seems to think that adding a column to a large table is an expensive operation that would cause problems for other users while the extended operation blocks them. He seems to think that adding a column causes the server to rewrite 50 million rows:

it will hold a table lock for the amount of time that it takes to write 50 million entries of data

The important point is that it will not write 50 million entries of data. To demonstrate this, I happen to have a table with 28,176,266 rows (4,557 MB):

--How many rows in the table
SELECT COUNT(*) FROM BigTable

28176266
(1 row(s) affected)

--How big is the table
EXECUTE sp_spaceused 'BigTable'

name      rows      reserved    data        index_size  unused
--------  --------  ----------  ----------  ----------  ------
BigTable  28176266  4681560 KB  4666984 KB  14536 KB    40 KB

Now that we've established that I have a 28 million row, 4.6 GB table, let's add a column to it:

ALTER TABLE BigTable ADD NewColumn int NULL

Wait! The question is: how long will it take? Isn't this a long operation that will hold a table lock while it creates 28 million entries?

No! Let's time how long it takes:

PRINT 'Time before adding the column: '+CONVERT(varchar(50), getdate(), 126)
ALTER TABLE BigTable ADD NewColumn int NULL
PRINT 'Time after adding the column: '+CONVERT(varchar(50), getdate(), 126)

And how long did it take to add a column to a 28 million row, 4.6 GB table?

Time before adding the column: 2012-11-06T14:14:33.493
Time after adding the column: 2012-11-06T14:14:33.503

The answer: about 10ms

Ten milliseconds.

9 Comments

Adding 50 million rows of data would have almost no runtime impact?
@BrianWhite Not adding 50 million rows, adding a new column to a table that contains 50 million rows.
Yes, and it will hold a table lock for the amount of time that it takes to write 50 million entries of data. It is quite a big impact on a running site; transactions start piling up very quickly. In contrast, adding it as nullable, updating a few thousand rows at a time (less than the ~5k that escalates to a table lock), and then flipping it to non-null at the end has almost no impact.
@BrianWhite Well, there's his problem: he made the column NOT NULL! My answer deals with "Adding the columns as nullable, or with default values, means that nobody will actually have to supply values: no impact", as the original question was asking at the time I answered it.
Ah. When I came here the question stated that it meant non-null columns. That is the expensive part, of course, and why it will take a long time and why I asked about your response. It was not because I don't understand how databases work. The timestamps on the 'asked' and 'edited' on the question are only 30 minutes apart, it looks like. Your large update above addresses only the nullable-column situation. And yes, adding a non-nullable column is exactly his problem and why he was asking about it :)

Either approach will work, with the following caveats:

  • If you have a SELECT * ... somewhere, your new columns will show up in the result set, which may be undesirable. E.g.

    insert into #tmpTable select * from sometable where blah-blah-blah

will cause an error unless the new columns are defined in the temp table.

  • Using an 'extension' table is lower impact but less efficient; however, it is the only method guaranteed not to disturb existing stored procedures, views, et al.
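The SELECT * caveat above can be sidestepped by naming columns explicitly on both sides; a sketch with made-up table and column names:

```sql
-- Fragile: breaks (or silently shifts values) when a column is added to sometable.
-- insert into #tmpTable select * from sometable where blah-blah-blah

-- Robust: unaffected by new columns appearing in sometable.
INSERT INTO #tmpTable (Id, Name, Amount)
SELECT Id, Name, Amount
FROM sometable
WHERE Amount > 0
```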

Comments


If adding a new column to the existing table is not acceptable, add a new table in a one-to-one relationship with the old table. It should contain the same primary key field as the old table, plus the new column(s). That key field is the primary key of the new table as well (to enforce one-to-(zero-or-one) cardinality).

The disadvantages are that:

  • in order to read the new data, you need a join (an outer join, actually).
  • when inserting/updating/deleting records, you have to do it in two tables.
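A sketch of such an extension table (all names invented; assumes the old table's primary key is an int called Id):

```sql
-- One row per main-table row that actually carries the new detail.
CREATE TABLE MainTableExt (
    Id        int NOT NULL,  -- same value as MainTable.Id
    NewDetail int NOT NULL,  -- the new, non-nullable attribute
    CONSTRAINT PK_MainTableExt PRIMARY KEY (Id),  -- enforces 1:(0..1)
    CONSTRAINT FK_MainTableExt FOREIGN KEY (Id)
        REFERENCES MainTable (Id) ON DELETE CASCADE
)

-- Reading requires the outer join mentioned above:
SELECT m.*, e.NewDetail
FROM MainTable m
LEFT JOIN MainTableExt e ON e.Id = m.Id
```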

Comments


I would add the tables needed and add triggers to the original one while I refactor the code and the db.

Comments


You would have to evaluate the impact on the existing codebase, and that would be your answer. If it fits within the timelines, then I usually suggest doing it right. If it falls outside the timelines, then obviously you just hack it and fix it another time.

Sometimes we can't fix everything and the only solution is to just band-aid things.

Comments


I would first investigate the issue you have with just altering the original table. If you are only adding nullable columns, you may find there is no issue at all.

The possible problem from an existing-code perspective is that developers may have used SELECT * FROM Table, which could break if more columns are added. However, it is a fairly widespread best practice never to use SELECT *.

If you go down the second-table route, you could add a VIEW over the two tables so that any new development can be based on this view.

In my opinion, though, I would probably just modify the existing table and deal with any problems you come across. That of course depends on the real-life "cost" of getting it wrong: will people die?

Comments


I like the create-a-new-table idea; I think it is the safest way to do it. But if the new column you want to add can allow NULLs, you shouldn't have any problem. Just make sure you make the column allow NULLs.

If it cannot allow NULLs, add the column allowing NULLs, insert the values you need into the column for the existing data, then be sure to alter the column to NOT NULL.

Comments


I think the extension table is your best bet. When you get your list of where the table is used from the sys tables and go about making your changes, I'd recommend that you create a new view of your table joined to the new extension table and use that in your SELECT statements instead. This should buy you some flexibility in the future.

EDIT: I wouldn't try to keep a one-to-one relationship in this extension table. I'd enter a row in the extension table only when necessary, and LEFT JOIN in the view. That way you don't have to worry about triggers or tons of data validation to keep the tables in sync.
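The view this answer describes might look like the following sketch (all names are placeholders; ISNULL supplies a default where no extension row exists):

```sql
CREATE VIEW MainTableWithDetail
AS
SELECT m.Id,
       m.Col1,                               -- ... other main-table columns ...
       ISNULL(e.NewDetail, 0) AS NewDetail   -- default when no extension row
FROM MainTable m
LEFT JOIN MainTableExt e ON e.Id = m.Id
GO
```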

Comments


If you use ALTER TABLE and add a default value so that all the records get a value, then it should not be too bad unless you have millions of records. Do not do this through Enterprise Manager (you should never alter tables using Enterprise Manager, as it completely recreates the table, which ALTER TABLE does not). If you have too many records to have the default populated automatically, first ALTER TABLE to add the column allowing NULL values, then update the column to the proper values (if you have a lot of records, you might want to do this in batches rather than locking up the whole table), based on whatever rules you have for determining the proper value for existing records. Then ALTER TABLE to make the column NOT NULL once you know there are no records without a value. At this point you may want to add a default value for any new records that don't supply one.
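The batched backfill described above can be sketched like this on SQL Server 2000, where SET ROWCOUNT (rather than UPDATE TOP, which arrived in later versions) limits each batch; the table, column, and batch size are illustrative:

```sql
-- Update a few thousand rows per statement to avoid long table locks.
SET ROWCOUNT 4000            -- stay under lock-escalation thresholds

WHILE 1 = 1
BEGIN
    UPDATE MyTable
    SET NewColumn = 0        -- or whatever rule derives the proper value
    WHERE NewColumn IS NULL

    IF @@ROWCOUNT = 0 BREAK  -- nothing left to backfill
END

SET ROWCOUNT 0               -- restore normal behavior
```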

Will adding a column have an impact on existing code? If the developers did not use SELECT * (which should never be used in production code), it will not have much of an impact, except that you must be adding the new column for a purpose, and whatever code relates to that purpose will need to be updated to include the new column. Since this is a non-nullable column, at a minimum your code to insert records will need to change, and possibly your code to update them (depending on whether this is a value that would ever be updated once in place). There are probably also some SELECTs that might be affected. The insert code must be in place at roughly the same time as the change that makes the column not nullable; otherwise all inserts will fail until you put it in place. You do this by making it all one big script.

If you think a lot of code will be affected and it will take some time to sort it all out, create a new table including the new column, populate it from the old table, and change the inserts/updates/deletes to go to the new table. Then drop the old table and create a view with the name of the old table that has only the old columns. Do all this in scripts so that it can run on prod all together. Do not run it during the main part of the day; schedule it for the hours of lightest database use.
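That migration might look roughly like the following single script (all object names invented for illustration; a real script would list all 30 columns):

```sql
-- 1. New table: the old columns plus the new NOT NULL column.
CREATE TABLE MyTable_New (
    Id        int NOT NULL PRIMARY KEY,
    Col1      varchar(50) NULL,       -- ... remaining original columns ...
    NewColumn int NOT NULL DEFAULT 0
)
GO

-- 2. Populate from the old table, deriving the new column's value.
INSERT INTO MyTable_New (Id, Col1, NewColumn)
SELECT Id, Col1, 0
FROM MyTable
GO

-- 3. Swap: drop the old table and keep its name alive as a view of the old shape.
DROP TABLE MyTable
GO

CREATE VIEW MyTable
AS
SELECT Id, Col1                       -- only the original columns
FROM MyTable_New
GO
```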

Comments
