144

I do this:

a = 'hello'

And now I just want an independent copy of a:

import copy

b = str(a)
c = a[:]
d = a + ''
e = copy.copy(a)

map( id, [ a,b,c,d,e ] )

Out[3]:

[4365576160, 4365576160, 4365576160, 4365576160, 4365576160]

Why do they all have the same memory address and how can I get a copy of a?

5
  • 5
    To get an answer different from Martijn's (which is entirely correct, though it doesn't necessarily answer the question as stated), you might want to provide more detail or a use case showing why you want it copied. Commented Jul 17, 2014 at 14:10
  • 4
    As @elemo implies, this might be an XY Problem. Commented Jul 17, 2014 at 14:39
  • 2
    I was interested in estimating the memory usage of a nested dictionary of the form d[ 'hello' ] = e, where e[ 'hi' ] = 'again'. To generate such a nested dictionary, I generated a single e dictionary and copied it multiple times. I noticed that the memory consumption was very low, which led to my question here. Now I understand that no string copies were created, hence the low memory consumption. Commented Jul 17, 2014 at 15:41
  • 1
    If you want b to be a modified version of a without modifying a, just let b be the result of whatever operation. e.g. b = a[2:-1] sets b to 'll' and a remains 'hello'. Commented Jul 17, 2014 at 22:30
  • Ollie is correct. This is because str is an immutable type. Due to Python's use of singletons (and probably other internal optimizations), you won't see the memory grow as you might expect when copying the e dictionary. Commented Aug 27, 2016 at 1:12

8 Answers 8

203

You don't need to copy a Python string. They are immutable, and the copy module always returns the original in such cases, as do str(), the whole string slice, and concatenating with an empty string.

Moreover, your 'hello' string is interned (certain strings are). Python deliberately tries to keep just the one copy, as that makes dictionary lookups faster.
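As a rough illustration of interning, here is a minimal sketch that assumes Python 3, where the function is sys.intern() (in Python 2 it was the builtin intern()). The variable names are made up for the example. Whether two equal, non-interned strings happen to share an object is an implementation detail, so the sketch only relies on the behaviour of explicitly interned strings:

```python
import sys

# Build two equal strings at run time so they are not merged as compile-time constants.
parts = ['hello', 'world']
x = ' '.join(parts)
y = ' '.join(parts)

same_value = (x == y)  # equal contents

# sys.intern() returns the single canonical object for a given string value,
# so interning two equal strings yields the very same object.
ix = sys.intern(x)
iy = sys.intern(y)
same_object = ix is iy
```

Dictionary lookups benefit because comparing two interned keys can start with a cheap identity check instead of a character-by-character comparison.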

One way you could work around this is to actually create a new string, then slice that string back to the original content:

>>> a = 'hello'
>>> b = (a + '.')[:-1]
>>> id(a), id(b)
(4435312528, 4435312432)

But all you are doing now is wasting memory. It is not as if you can mutate these string objects in any way, after all.
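To make the immutability point concrete, here is a small sketch (the names mutated and b are only illustrative). Item assignment on a str raises TypeError, and any "change" actually builds a new string while the original stays as it was:

```python
a = 'hello'

try:
    a[0] = 'j'          # strings do not support item assignment
    mutated = True
except TypeError:
    mutated = False

# A "modified" string is always a new object; a itself is untouched.
b = 'j' + a[1:]
```

Since nothing can alter a through another name, sharing one object between several names is safe.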

If all you wanted to know is how much memory a Python object requires, use sys.getsizeof(); it gives you the memory footprint of any Python object.

For containers this does not include the contents; you'd have to recurse into each container to calculate a total memory size:

>>> import sys
>>> a = 'hello'
>>> sys.getsizeof(a)
42
>>> b = {'foo': 'bar'}
>>> sys.getsizeof(b)
280
>>> sys.getsizeof(b) + sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in b.items())
360

You can then choose to use id() tracking to take an actual memory footprint or to estimate a maximum footprint if objects were not cached and reused.
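The id() tracking mentioned above could be sketched roughly as follows. total_size is a hypothetical helper, not a standard library function; it only descends into dicts, lists, tuples and sets, and the byte counts it returns depend on the Python version and platform:

```python
import sys

def total_size(obj, seen=None):
    """Approximate deep size of obj in bytes, counting each object only once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0                      # already counted: a shared object costs nothing extra
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size

inner = {'hi': 'again'}
shared = {'a': inner, 'b': inner}          # the same inner dict referenced twice
separate = {'a': inner, 'b': dict(inner)}  # a second, distinct inner dict

shared_bytes = total_size(shared)
separate_bytes = total_size(separate)
```

Here shared_bytes comes out smaller than separate_bytes because the reused inner dict is counted once. To estimate the maximum footprint instead, drop the seen check so that every reference is counted.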

7 Comments

There's more than only one way to create a new string object, such as b = ''.join(a).
@martineau: sure, I really meant to say 'one way'.
Emphasis on "You don't need to copy a Python string". There's a reason why those operations simply return the same string.
In this case, though, the OP is attempting to waste memory. Since he wants to know how much memory will be used by a certain quantity of strings, that is the actual goal. Obviously he could generate unique strings, but that's just unnecessary work as a workaround.
+1 for "casually" using an example that would output 42.
24

I'm just starting some string manipulations and found this question. I was probably trying to do something like the OP, "usual me". The previous answers did not clear up my confusion, but after thinking a little about it I finally "got it".

As long as a, b, c, d, and e have the same value, they refer to the same object, which saves memory. As soon as the variables start to have different values, they start to refer to different objects. My learning experience came from this code:

import copy
a = 'hello'
b = str(a)
c = a[:]
d = a + ''
e = copy.copy(a)

print map( id, [ a,b,c,d,e ] )

print a, b, c, d, e

e = a + 'something'
a = 'goodbye'
print map( id, [ a,b,c,d,e ] )
print a, b, c, d, e

The printed output is:

[4538504992, 4538504992, 4538504992, 4538504992, 4538504992]

hello hello hello hello hello

[6113502048, 4538504992, 4538504992, 4538504992, 5570935808]

goodbye hello hello hello hellosomething

2 Comments

More details for the behavior are described in this post stackoverflow.com/questions/2123925/…
Underrated answer
20

You can copy a string in Python via string formatting:

>>> a = 'foo'
>>> b = '%s' % a
>>> id(a), id(b)
(140595444686784, 140595444726400)

1 Comment

Not true in Python 3.6.5. id(a) and id(b) are identical. The results are no different even when I used the modern version of format, viz., b = '{:s}'.format(a)
8

To put it a different way, id() is not what you care about. What you want to know is whether a variable can be modified without affecting the variable it was assigned from.

>>> a = 'hello'
>>> b = a[:]
>>> c = a
>>> b += ' world'
>>> c += ', bye'
>>> a
'hello'
>>> b
'hello world'
>>> c
'hello, bye'

If you're used to C, then these are like pointer variables except you can't de-reference them to modify what they point at, but id() will tell you where they currently point.

The problem for Python programmers comes when you consider deeper structures like lists or dicts:

>>> o = {'a': 10}
>>> x = o
>>> y = o.copy()
>>> x['a'] = 20
>>> y['a'] = 30
>>> o
{'a': 20}
>>> x
{'a': 20}
>>> y
{'a': 30}

Here o and x refer to the same dict, so o['a'] and x['a'] are the same entry, and that dict is "mutable" in the sense that you can change the value for key 'a'. That's why y needs to be a copy, so that y['a'] can refer to something else.
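For nested structures the same reasoning applies one level down. The following sketch (variable names are illustrative) contrasts copy.copy(), which copies only the outer dict, with copy.deepcopy(), which also copies the mutable containers inside it:

```python
import copy

original = {'outer': {'hi': 'again'}}

shallow = copy.copy(original)    # new outer dict, same inner dict
deep = copy.deepcopy(original)   # new outer dict and new inner dict

# Changing the inner dict through the shallow copy is visible via original too.
shallow['outer']['hi'] = 'changed'

shared_inner = shallow['outer'] is original['outer']
independent_inner = deep['outer'] is not original['outer']
```

After this runs, original['outer']['hi'] is 'changed' while deep['outer']['hi'] is still 'again'.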

6

It is possible, using this simple trick:

a = "Python"
b = a[ : : -1 ][ : : -1 ]
print( "a =" , a )
print( "b =" , b )
a == b  # True
id( a ) == id( b ) # False

4

As others have already explained, there's rarely an actual need for this, but nevertheless, here you go:
(works on Python 3, but there's probably something similar for Python 2)

import ctypes

copy          = ctypes.pythonapi._PyUnicode_Copy
copy.argtypes = [ctypes.py_object]
copy.restype  = ctypes.py_object

s1 = 'xxxxxxxxxxxxx'
s2 = copy(s1)

id(s1) == id(s2) # False

1 Comment

This is the best answer. Believe it or not, this was actually useful in debugging a C extension for Python to see whether it was corrupting strings that it had previously returned to me.
1

I think I have just solved this with string slicing.

a="words"
b=a[:int(len(a)/2)]+a[int(len(a)/2):]
b
'words'
id(a),id(b)
(1492873997808, 1492902431216)

0

Copying a string can be done in two ways. You can copy the reference, with a = "a" followed by b = a, or you can clone the string, which means b won't be affected when a is changed; this is done with a = 'a' followed by b = a[:].
