7

It seems like a very arbitrary decision. Both can accomplish the same thing in most cases. Limiting the varchar length seems to me like shooting yourself in the foot, because you never know how long a field you will need.

Is there any specific guideline for choosing VARCHAR or TEXT for your string fields?

I will be using PostgreSQL with the SQLAlchemy ORM for Python.

3 Comments
  • duplicate of stackoverflow.com/questions/564755/… Commented Jan 16, 2011 at 13:08
  • As Quassnoi is to SQL what Jon Skeet is to... well, everything else, you can't top the answer he gave. Commented Jan 16, 2011 at 13:14
  • I'm referring to the duplicate posted in the comments, but that was before you mentioned PostgreSQL. Commented Jan 16, 2011 at 15:27

6 Answers 6

10

In PostgreSQL there is no technical difference between varchar and text.

You can see a varchar(nnn) as a text column with a check constraint that prohibits storing larger values.

So each time you want to have a length constraint, use varchar(nnn).

If you don't want to restrict the length of the data, use text.
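
To make the "text column with a check constraint" view concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in so the example runs without a server (SQLite does not enforce a declared varchar length by itself), and the table name, column name and helper function are hypothetical. A varchar(5) column in PostgreSQL would be expected to reject the long value in a comparable way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A text column plus an explicit length check, mimicking varchar(5).
# Table and column names are made up for this illustration.
conn.execute("CREATE TABLE tags (name TEXT CHECK (length(name) <= 5))")


def try_insert(value):
    """Return True if the value passes the length check, False otherwise."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO tags (name) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        # Raised when the CHECK constraint rejects the row.
        return False


short_ok = try_insert("sql")         # 3 characters, within the limit
long_ok = try_insert("postgresql")   # 10 characters, rejected by the check
```

Dropping the CHECK clause gives the unrestricted behaviour of a plain text column.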

2 Comments

You can also use VARCHAR when you have no restriction. VARCHAR and VARCHAR(n) are two different things.
Ah, right. I always forget that varchar can be used without a length restriction as well. I usually use TEXT then.
2

This sentence is wrong:

Limiting the varchar length seems to me like shooting yourself in the foot, because you never know how long a field you will need.

If you are saving, for example, MD5 hashes, you do know how large the field you're storing is, and your storage becomes more efficient. Other examples are:

  • Usernames (64 max)
  • Passwords (128 max)
  • Zip codes
  • Addresses
  • Tags
  • Many more!
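
The MD5 case can be checked quickly with Python's standard hashlib module. This is only a sketch of the "you know the length up front" point; the helper names are hypothetical, and it says nothing about how a given database stores the column.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex-encoded MD5 digest of the input."""
    return hashlib.md5(data).hexdigest()


def sha1_hex(data: bytes) -> str:
    """Hex-encoded SHA-1 digest of the input."""
    return hashlib.sha1(data).hexdigest()


# The digest length does not depend on the input length.
short_digest = md5_hex(b"a")
long_digest = md5_hex(b"a" * 1000)
sha1_digest = sha1_hex(b"a" * 1000)

print(len(short_digest), len(long_digest), len(sha1_digest))
```

Because the hex digest always has the same length, a fixed limit on such a column cannot turn out to be too small later.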

13 Comments

MD5 hashes would even be more efficiently stored in CHAR(32) columns.
True, but I'm giving an example where limiting lengths is not shooting yourself in the foot. But once again, true.
who said passwords should be 64 max?
@Absolute0: Passwords with 1000 chars will still be hashed to 40-char SHA1 checksums.
@SolomonUcko Zip codes in my country are most certainly not integer.
1

In brief:

  • Variable length fields save space, but because each field can have a different length, table operations are slower
  • Fixed length fields make table operations fast, but they must be large enough for the maximum expected input, so they can use more space

Think of an analogy to arrays and linked lists, where arrays are fixed length fields, and linked lists are like varchars. Which is better, arrays or linked lists? Luckily we have both, because each is useful in different situations, and the same applies here.
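
The trade-off in the analogy can be sketched in a few lines of Python. This is a toy layout invented for illustration (an assumed 8-byte field width and a 1-byte length prefix), showing the general principle only; it is not how any particular database stores its rows.

```python
WIDTH = 8  # assumed fixed field width in bytes; values must not exceed it


def pack_fixed(values):
    """Pad every value with spaces to WIDTH bytes and concatenate."""
    return b"".join(v.encode("ascii").ljust(WIDTH) for v in values)


def get_fixed(buf, i):
    """Jump straight to field i: its offset is simply i * WIDTH."""
    start = i * WIDTH
    return buf[start:start + WIDTH].rstrip().decode("ascii")


def pack_variable(values):
    """Store each value as a 1-byte length followed by its bytes."""
    out = bytearray()
    for v in values:
        raw = v.encode("ascii")
        out.append(len(raw))
        out += raw
    return bytes(out)


def get_variable(buf, i):
    """Walk over the first i fields to find where field i starts."""
    pos = 0
    for _ in range(i):
        pos += 1 + buf[pos]  # skip the length byte and the payload
    length = buf[pos]
    return buf[pos + 1:pos + 1 + length].decode("ascii")


names = ["ann", "bob", "charlie"]
fixed = pack_fixed(names)        # 3 fields of 8 bytes each
variable = pack_variable(names)  # only as many bytes as needed, plus prefixes
```

Here `get_fixed` computes the offset directly while `get_variable` has to step over the earlier fields, and the fixed layout takes more bytes than the variable one for the same three names.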

4 Comments

Most of the times we use some variation of a vector :) but I get your point.
@Absolute0, and how do you think vector is typically implemented internally? Arrays. It's the exact same principle: if you want random access to any element, you need to know the size of every element (fixed size). Otherwise, you can save space, although access requires you to move element by element, like in a linked list.
This advice is not true in PostgreSQL, char/varchar/text types have exactly the same representation on disk. There's no efficiency to be gained by using char.
@intgr, good point, i wonder why that's the case. will have to dig up some postgresql code one day...
0

In most cases you do know what the max length of a string in a field is. For a first or last name, for example, you don't need more than 255 characters. So you choose which type to use by design; if you always use text, you're wasting resources.

1 Comment

That's a blanket statement that isn't quite accurate. Each choice wastes resources, because variable length fields make searching slower, which results in wasted CPU resources.
0

Check this article on PostgresOnline; it also links to two other useful articles.

Most problems with TEXT in PostgreSQL occur when you're using tools, applications and drivers that treat TEXT very differently from VARCHAR, because other databases behave very differently with these two datatypes.

-1

Database designers almost always know how many characters a column needs to hold. US delivery addresses need to hold up to 64 characters. (The US Postal Service publishes addressing guidelines that say so.) US ZIP codes are 5 characters long.

A database designer will look at representative sample data from her clients when she's specifying columns. She'll ask herself questions like "What's the longest product name?" And when the answer is "70 characters", she won't make the column 3000 characters wide.

VARCHAR has a limit of 8k in SQL Server (I think). Most applications don't require nearly that much storage for a single column.

3 Comments

You shouldn't be limiting US zip codes to five characters, they expanded them to ZIP+4 (ten characters) about thirty years ago.
At the risk of stating the obvious, ZIP codes are still five characters. ZIP+4 codes are five characters plus four more characters. Two columns simplifies the constraints, and it simplifies grouping addresses by ZIP code. Grouping by ZIP code is required for price breaks on bulk mail.
Only if you expect Canadian addresses to conform to USPS specifications. Professionals know better. (Canada? Preferably less than 30 characters, but no more than 40 characters per address line for machineable mail.)
