| Portability | ghc only |
|---|---|
| Stability | stable |
| Maintainer | [email protected] |
| Safe Haskell | Trustworthy |
Data.ByteString.Short
Description
A compact representation suitable for storing short byte strings in memory.
In typical use cases it can be imported alongside Data.ByteString, e.g.
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as B
         (ShortByteString, toShort, fromShort)
Other ShortByteString operations clash with Data.ByteString or Prelude
functions, however, so they should be imported qualified with a different
alias, e.g.
import qualified Data.ByteString.Short as B.Short
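The two import styles above can be combined in a single module. The following is a minimal sketch rather than a prescribed layout; `compactLength` is a hypothetical helper introduced only for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as B
         (ShortByteString, toShort, fromShort)
import qualified Data.ByteString.Short as B.Short

-- | Hypothetical helper (not part of the library): convert a ByteString
-- to its compact form and report the stored length.
compactLength :: B.ByteString -> Int
compactLength = B.Short.length . B.toShort

main :: IO ()
main = do
  let bytes = B.pack [104, 101, 108, 108, 111]
      short = B.toShort bytes :: B.ShortByteString
  -- The clashing name 'length' is reached through the B.Short alias.
  print (B.Short.length short)        -- 5
  -- The non-clashing conversions are reached through the shared B alias.
  print (B.fromShort short == bytes)  -- True
```

Only the three names listed in the import list share the `B` alias, so `B.pack` and `B.length` still refer unambiguously to Data.ByteString.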
- data ShortByteString
- toShort :: ByteString -> ShortByteString
- fromShort :: ShortByteString -> ByteString
- pack :: [Word8] -> ShortByteString
- unpack :: ShortByteString -> [Word8]
- empty :: ShortByteString
- null :: ShortByteString -> Bool
- length :: ShortByteString -> Int
- index :: ShortByteString -> Int -> Word8
The ShortByteString type
data ShortByteString
A compact representation of a Word8 vector.
It has a lower memory overhead than a ByteString and does not
contribute to heap fragmentation. It can be converted to or from a
ByteString (at the cost of copying the string data). It supports very few
other operations.
It is suitable for use as an internal representation for code that needs
to keep many short strings in memory, but it should not be used as an
interchange type. That is, it should not generally be used in public APIs.
The ByteString type is usually more suitable for use in interfaces; it is
more flexible and it supports a wide range of operations.
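One way to follow this advice is to keep ShortByteString behind a module boundary and convert at the edges. The sketch below is only an illustration: `NameTable`, `insertName`, `lookupName` and `allNames` are hypothetical names, and it assumes the containers package is available for Data.Map.Strict.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as SBS
import qualified Data.Map.Strict as Map

-- | Hypothetical table of many short names. ShortByteString appears only
-- in the internal representation, never in the exported signatures.
newtype NameTable = NameTable (Map.Map SBS.ShortByteString Int)

emptyTable :: NameTable
emptyTable = NameTable Map.empty

-- | Public interface takes a ByteString; toShort copies the bytes.
insertName :: B.ByteString -> Int -> NameTable -> NameTable
insertName name ident (NameTable m) =
  NameTable (Map.insert (SBS.toShort name) ident m)

lookupName :: B.ByteString -> NameTable -> Maybe Int
lookupName name (NameTable m) = Map.lookup (SBS.toShort name) m

-- | Results are handed back as ByteString; fromShort copies the bytes.
allNames :: NameTable -> [B.ByteString]
allNames (NameTable m) = map SBS.fromShort (Map.keys m)

main :: IO ()
main = do
  let table = insertName (B.pack [97, 98]) 1 emptyTable
  print (lookupName (B.pack [97, 98]) table)  -- Just 1
  print (lookupName (B.pack [99]) table)      -- Nothing
```

Each call copies the key, so this design trades some conversion cost for the lower per-string overhead of the stored keys.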
Memory overhead
With GHC, the memory overheads are as follows, expressed in words and in bytes (a word is 4 bytes on 32-bit machines and 8 bytes on 64-bit machines).
- ByteString unshared: 9 words; 36 or 72 bytes.
- ByteString shared substring: 5 words; 20 or 40 bytes.
- ShortByteString: 4 words; 16 or 32 bytes.
For the string data itself, both ShortByteString and ByteString use
one byte per element, rounded up to a whole number of words. For example,
including the overheads, a ShortByteString of length 10 would take
16 + 12 = 28 bytes on a 32-bit platform and 32 + 16 = 48 bytes on a
64-bit platform.
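The arithmetic above can be written out as a small calculation. This is a sketch of the estimate described in this section, not a measurement of actual heap usage; the function and constant names are hypothetical, and the overhead figures are the ones quoted in the list above.

```haskell
-- | Round a byte count up to a whole number of words.
roundUpToWord :: Int -> Int -> Int
roundUpToWord wordSize n = ((n + wordSize - 1) `div` wordSize) * wordSize

-- | Estimated bytes: fixed overhead in words plus the padded string data.
estimatedBytes :: Int -> Int -> Int -> Int
estimatedBytes overheadWords wordSize len =
  overheadWords * wordSize + roundUpToWord wordSize len

-- Overheads in words, taken from the list in this section.
shortOverheadWords, unsharedByteStringOverheadWords :: Int
shortOverheadWords = 4
unsharedByteStringOverheadWords = 9

main :: IO ()
main = do
  print (estimatedBytes shortOverheadWords 4 10)               -- 16 + 12 = 28
  print (estimatedBytes shortOverheadWords 8 10)               -- 32 + 16 = 48
  print (estimatedBytes unsharedByteStringOverheadWords 8 10)  -- 72 + 16 = 88
```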
These overheads can all be reduced by 1 word (4 or 8 bytes) when the
ShortByteString or ByteString is unpacked into another constructor.
For example:
data ThingId = ThingId {-# UNPACK #-} !Int
                       {-# UNPACK #-} !ShortByteString
This will take 1 + 1 + 3 words (the ThingId constructor +
unpacked Int + unpacked ShortByteString), plus the words for the
string data.
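As a rough illustration of that count, the sketch below defines the same record and adds a hypothetical `thingIdWords` function that applies the 1 + 1 + 3 formula and adds the words needed for the string data. It assumes the fields really are unpacked, which with GHC generally requires compiling with optimisation.

```haskell
import qualified Data.ByteString.Short as SBS

data ThingId = ThingId {-# UNPACK #-} !Int
                       {-# UNPACK #-} !SBS.ShortByteString

-- | Hypothetical estimate in words, following the formula in the text:
-- constructor + unpacked Int + unpacked ShortByteString + string data.
thingIdWords :: Int -> ThingId -> Int
thingIdWords wordSize (ThingId _ name) = 1 + 1 + 3 + payloadWords
  where
    -- One byte per element, rounded up to whole words.
    payloadWords = (SBS.length name + wordSize - 1) `div` wordSize

main :: IO ()
main = do
  let thing = ThingId 42 (SBS.pack [1 .. 10])
  print (thingIdWords 8 thing)  -- 5 + 2 = 7 words on a 64-bit platform
  print (thingIdWords 4 thing)  -- 5 + 3 = 8 words on a 32-bit platform
```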
Heap fragmentation
With GHC, the ByteString representation uses pinned memory,
meaning it cannot be moved by the GC. This is usually the right thing to
do for larger strings, but for small strings, using pinned memory can
lead to heap fragmentation, which wastes space. The ShortByteString
type (like the Text type from the text package) uses unpinned memory,
so it does not contribute to heap fragmentation. In addition, with GHC,
small unpinned strings are allocated in the same way as normal heap
allocations, rather than in a separate pinned area.
Conversions
toShort :: ByteString -> ShortByteString
O(n). Convert a ByteString into a ShortByteString.
This makes a copy, so does not retain the input string.
fromShort :: ShortByteString -> ByteString
O(n). Convert a ShortByteString into a ByteString.
pack :: [Word8] -> ShortByteString
O(n). Convert a list into a ShortByteString.
unpack :: ShortByteString -> [Word8]
O(n). Convert a ShortByteString into a list.
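The four conversions form two round trips, which the following sketch exercises. `roundTripBytes` and `roundTripList` are hypothetical names used only for this illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- | ByteString -> ShortByteString -> ByteString; each step copies the data.
roundTripBytes :: B.ByteString -> B.ByteString
roundTripBytes = SBS.fromShort . SBS.toShort

-- | [Word8] -> ShortByteString -> [Word8].
roundTripList :: [Word8] -> [Word8]
roundTripList = SBS.unpack . SBS.pack

main :: IO ()
main = do
  let ws = [0, 127, 255] :: [Word8]
  print (roundTripList ws == ws)                   -- True
  print (roundTripBytes (B.pack ws) == B.pack ws)  -- True
  -- The two routes meet: packing via ByteString yields the same bytes.
  print (SBS.unpack (SBS.toShort (B.pack ws)))     -- [0,127,255]
```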
Other operations
empty :: ShortByteString
O(1). The empty ShortByteString.
null :: ShortByteString -> Bool
O(1). Test whether a ShortByteString is empty.
length :: ShortByteString -> Int
O(1). The length of a ShortByteString.
index :: ShortByteString -> Int -> Word8
O(1). ShortByteString index (subscript) operator, starting from 0.
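A short sketch of these operations in use follows. This page does not state what index does for an out-of-range position, so the example assumes such calls should be avoided and wraps index in a hypothetical bounds-checked helper, `safeIndex`, which is not part of the API listed here.

```haskell
import qualified Data.ByteString.Short as SBS
import Data.Word (Word8)

-- | Hypothetical wrapper: only call 'index' for positions in [0, length).
safeIndex :: SBS.ShortByteString -> Int -> Maybe Word8
safeIndex s i
  | i >= 0 && i < SBS.length s = Just (SBS.index s i)
  | otherwise                  = Nothing

main :: IO ()
main = do
  let s = SBS.pack [10, 20, 30]
  print (SBS.null SBS.empty)  -- True
  print (SBS.null s)          -- False
  print (SBS.length s)        -- 3
  print (SBS.index s 0)       -- 10 (indexing starts from 0)
  print (safeIndex s 2)       -- Just 30
  print (safeIndex s 3)       -- Nothing
```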