Using Python to break a continuous string into components?

Question

This is similar to what I want to do: breaking a 32-bit number into individual fields

This is my typical "string" 00000000110000000000011000000000

I need to break it up into four equal parts:

00000000

11000000

00000110

00000000

I need to append the list to a new text file with the original string as a header.

I know how to split the string if there were separators such as spaces but my string is continuous.

These could be thought of as 32bit and 8bit binary numbers but they are just text in a text file (for now)!

I am brand new to programming in Python, so please, I need patient details, no generalizations.

Do not assume I know anything.

Thank you,

Ralph

robert · Accepted Answer · 2011-09-02 16:08:00Z

10

This should do what you want. See comprehensions for more details.

>>> s = "00000000110000000000011000000000"
>>> [s[i:i+8] for i in range(0, len(s), 8)]
['00000000', '11000000', '00000110', '00000000']
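The same slicing can be wrapped in a small reusable function. This is only a minimal sketch for Python 3; the name split_fixed and its width parameter are hypothetical, chosen for illustration.

```python
def split_fixed(text, width=8):
    """Return consecutive slices of `text`, each `width` characters long."""
    # range(0, len(text), width) yields the start index of every chunk,
    # e.g. 0, 8, 16, 24 for a 32-character string and width 8.
    return [text[i:i + width] for i in range(0, len(text), width)]


s = "00000000110000000000011000000000"
parts = split_fixed(s)
print(parts)     # ['00000000', '11000000', '00000110', '00000000']
print(parts[0])  # the first element of the list: 00000000
```

The square brackets in the printed result are simply how Python displays a list; each quoted item is one 8-character piece of the original string.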


2 Comments

user925567 Over a year ago

Thank you, but what is ['00000000', '11000000', '00000110', '00000000']?

Remi Over a year ago

This is a list containing your strings at fixed positions. Try, e.g., `mylist = ['00000000', '11000000', '00000110', '00000000']`; `mylist[0]` will then give you the first element. See also here and here

Andrew Clark · Accepted Answer · 2011-09-02 16:44:22Z

3

For reference, here are a few alternatives for splitting strings into equal length parts:

>>> import re
>>> re.findall(r'.{1,8}', s, re.S)
['00000000', '11000000', '00000110', '00000000']

>>> list(map(''.join, zip(*[iter(s)]*8)))
['00000000', '11000000', '00000110', '00000000']

The zip method for splitting a sequence into n-length groups is documented here, but it only works for strings whose length is evenly divisible by n (which won't be an issue for this particular question); any leftover characters are silently dropped. If the string length is not evenly divisible by n, you could use itertools.izip_longest(*[iter(s)]*8, fillvalue='') (named itertools.zip_longest in Python 3).
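For a string whose length is not a multiple of the group size, the padded variant might look like the sketch below. It assumes Python 3, where the function is called itertools.zip_longest; the helper name grouper is hypothetical.

```python
from itertools import zip_longest


def grouper(text, n):
    """Split `text` into n-character groups; the last group may be shorter."""
    # The same iterator object is repeated n times, so each output tuple
    # pulls n consecutive characters from the string.
    args = [iter(text)] * n
    # fillvalue="" pads the final tuple with empty strings instead of None.
    return ["".join(group) for group in zip_longest(*args, fillvalue="")]


print(grouper("00000000110000000000011000000000", 8))
print(grouper("0000000011", 8))  # ['00000000', '11']
```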


1 Comment

user925567 Over a year ago

If I already have ['00000000', '11000000', '00000110', '00000000'], why would I need to ask this question? I do not understand what is being said when you use ['00000000', '11000000', '00000110', '00000000']. The character makeup will be unknown until the line from the file is parsed. Or is ['00000000', '11000000', '00000110', '00000000'] the expected output? Thanks, Ralph

Remi · Accepted Answer · 2011-09-02 17:18:57Z

+1 for Robert's answer. As for 'I need to append the list to a new text file with the original string as a header':

s = "00000000110000000000011000000000"
s += '\n' + '\n'.join(s[i:i+8] for i in range(0, len(s), 8))

will give

'00000000110000000000011000000000\n00000000\n11000000\n00000110\n00000000'

thus putting each 'byte' on a separate line as I understood from your question...

Edit: some notes to help you understand: A list [] (see here) contains your data, in this case, strings, between its brackets. The first item in a list is retrieved as in:

mylist[0]

In Python, a string is itself also an object, with specific methods that you can call. So '\n' (representing a newline) is an object of type 'string', and you can call its method join() with your list as argument:

'\n'.join(mylist)

The elements in the list are then 'joined' together with the string '\n' in between each element. The result is no longer a list, but a string. Two strings can be added together, thus

s += '\n' + '\n'.join(mylist)

adds to s (which was already a string), the right part which is itself a 'sum' of strings. (I hope that clears some things up?)
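Putting the pieces together, appending the result to a text file could look roughly like this. It is a sketch only: the function name append_with_header and the file name output.txt are made up for illustration, and it assumes Python 3.

```python
def append_with_header(path, bits, width=8):
    """Append `bits` as a header line, then its fixed-width parts, one per line."""
    parts = [bits[i:i + width] for i in range(0, len(bits), width)]
    # Mode "a" appends to the end of the file and creates it if it is missing.
    with open(path, "a") as out:
        out.write(bits + "\n")              # the original string as the header
        out.write("\n".join(parts) + "\n")  # each 8-character part on its own line


if __name__ == "__main__":
    append_with_header("output.txt", "00000000110000000000011000000000")
```

Opening the file with "a" rather than "w" means repeated calls keep adding blocks to the end instead of overwriting what is already there.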

Thanks Remi, the "00000000110000000000011000000000" was given as an example; the string will need to be read from a text file.

So I imagine for a long file you can read the 32-bit strings one at a time:

with open('data.txt') as f:
    A2 = f.read(3)
    bits = f.read(33).strip()

(the .strip() takes away the trailing space)

immortal · Accepted Answer · 2011-09-02 16:09:03Z

1

Strings, lists and tuples can be broken up using the indexing operator []. Using the : operator inside the indexing operator, you can take a slice, i.e. a range of positions, which gives you the fields. Try something like:

x = "00000000110000000000011000000000"
part1, part2, part3, part4 = x[:8], x[8:16], x[16:24], x[24:]


2 Comments

user925567 Over a year ago

Thanks everybody, the indexing operator [] with the : operator appears to be the key!! I'll need to parse a text file with patterns, a typical pattern being: A1 00000000000000111000000000000000 00000000000001111100000000000000 00000000000011000110000000000000 00000000000110000011000000000000 00000000001100000001100000000000 00000000001111111111100000000000 00000000011111111111110000000000 00000000110000000000011000000000 00000001100000000000001100000000 00000011000000000000000110000000 00000110000000000000000011000000 00001100000000000000000001100000

Remi Over a year ago

OK, make sure to check the available string methods; knowing them is power... E.g. s.split() will split your pattern over the spaces, resulting in a list of the 32bit-strings!
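As a rough illustration of combining s.split() with slicing, the sketch below assumes the first whitespace-separated token is a label such as A1 and every later token is one row of bits. The function name parse_pattern is hypothetical, and only the first two rows of the pattern above are used.

```python
def parse_pattern(pattern, width=8):
    """Return (label, rows), where each row is a list of `width`-character fields."""
    # split() with no arguments splits on any run of whitespace,
    # so spaces and line breaks between rows are both handled.
    tokens = pattern.split()
    label, bit_strings = tokens[0], tokens[1:]
    rows = [
        [bits[i:i + width] for i in range(0, len(bits), width)]
        for bits in bit_strings
    ]
    return label, rows


label, rows = parse_pattern(
    "A1 00000000000000111000000000000000 00000000000001111100000000000000"
)
print(label)
for fields in rows:
    print(fields)
```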

KevinDTimm · Accepted Answer · 2011-09-02 16:13:27Z

0

you need a substring

x = "01234567"
x0 = x[0:2]
x1 = x[2:4]
x2 = x[4:6]
x3 = x[6:8]

So, x0 will hold '01', x1 will hold '23', etc.

edited Sep 2, 2011 at 16:13

answered Sep 2, 2011 at 16:06

