
I'm receiving a string from the server in the following format:

118|...message...215|...message2...

Basically, it's the message length followed by a pipe and the message itself, repeated for multiple messages. The messages are encoded as UTF-16.

I'm looking for a way to parse this in Swift. I know I could cast this as NSString and use standard indexes/ranges on that because UTF16 is what NSString uses, but I'm wondering what is the Swift way to handle this? I can't seem to find a way to pull a substring out of a String based on a UTF16 encoding.

Update

I'm not trying to initialize a String with raw UTF-16 data (there are plenty of ways to do that). I already have the string, so I'm trying to take a String in the above format and parse it. The issue is that the message length given to me by the server is counted in UTF-16 code units. I can't simply extract the length and call String.advance(messageLength) on the Index, because the length I've been given doesn't match the grapheme clusters that Swift advances by. As a result, I can't extract the message from the string in Swift; I have to cast it to NSString and use a "normal" NSRange on it. My question is: how do I pull the substring out by searching for the first pipe and then taking a range whose length is the UTF-16 length that was parsed from the prefix?
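To make the mismatch concrete, here is a minimal sketch. It assumes Swift 5 syntax, which is newer than the code in this thread, and the sample string is made up for illustration. The same string reports different lengths depending on whether it is counted in grapheme clusters or in UTF-16 code units:

```swift
// Made-up sample: "a", the emoji U+1F649, "b".
let sample = "a\u{1F649}b"

// Character view: user-perceived characters (grapheme clusters).
let characterCount = sample.count

// UTF-16 view: 16-bit code units, which is what NSString's length counts.
// The emoji lies outside the Basic Multilingual Plane, so it needs two code units.
let utf16Count = sample.utf16.count

print(characterCount, utf16Count)   // prints "3 4"
```

A server that reports 4 for this message is counting the second number, so advancing a Character index by 4 would overshoot.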

This is all extremely simple to do with NSString. Not sure how it can be done in pure Swift (or if it can be done).
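For reference, the NSString-based approach mentioned above might look roughly like the following. This is only a sketch in Swift 5 syntax: `parseWithNSString` is a hypothetical name, and the bounds checks are an assumption about what a robust version would want rather than something shown in the question.

```swift
import Foundation

// Hypothetical helper: splits "<length>|<message>..." using NSString's UTF-16 offsets.
func parseWithNSString(_ input: String) -> [String] {
    let ns = NSString(string: input)
    var messages: [String] = []
    var location = 0

    while location < ns.length {
        // Look for the next pipe at or after the current location.
        let searchRange = NSRange(location: location, length: ns.length - location)
        let pipe = ns.range(of: "|", options: [], range: searchRange)
        guard pipe.location != NSNotFound else { break }

        // The digits between `location` and the pipe are the length in UTF-16 code units.
        let digits = ns.substring(with: NSRange(location: location, length: pipe.location - location))
        guard let length = Int(digits), length >= 0,
              pipe.location + 1 + length <= ns.length else { break }

        // NSRange is measured in UTF-16 code units, so the server's length applies directly.
        messages.append(ns.substring(with: NSRange(location: pipe.location + 1, length: length)))
        location = pipe.location + 1 + length
    }
    return messages
}

let parsed = parseWithNSString("3|abc2|\u{1F649}1|x")
print(parsed)
```

The loop stops at the first malformed prefix or overlong length instead of raising a range exception.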

6
  • Since the received information is really data bytes, convert that to an NSString with init?(bytes bytes: UnsafePointer<Void>, length len: Int, encoding encoding: UInt), which will be bridged to a Swift String. Commented Dec 7, 2015 at 19:55
  • Yes, I can bridge to NSString. I've already implemented that. But I want to see if it's possible to do it without relying on Objective-C types, i.e. in pure Swift. So far, it doesn't seem possible. I get that Swift is a somewhat new language, but something as basic as this parsing should be possible. Commented Dec 7, 2015 at 20:00
  • Here is some code to create a Swift string from UTF-16 bytes in "pure Swift": stackoverflow.com/questions/24542170/…. Commented Dec 7, 2015 at 20:10
  • @MartinR I wasn't asking how to init a String with UTF-16 data. Commented Dec 7, 2015 at 20:58
  • Is the input a Swift string or a C string or a byte sequence? Perhaps you can use the methods from stackoverflow.com/a/30404532/1187415 to convert a UTF-16 based index to a Swift String index? Commented Dec 7, 2015 at 21:15
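As a rough illustration of converting between UTF-16 offsets and String indices, later Swift versions (5.0 and up, so newer than this thread) expose this directly in the standard library. The sample text is made up, and the sketch assumes the offset is within bounds:

```swift
// Made-up sample: "a", the emoji U+1F649 (two UTF-16 code units), "b", "|", "c".
let text = "a\u{1F649}b|c"

// A character-based search still works for finding the pipe.
// Force-unwrapped only because the literal above is known to contain one.
let pipeIndex = text.firstIndex(of: "|")!

// Offset of the pipe in UTF-16 code units, as NSString would report it.
let offset = pipeIndex.utf16Offset(in: text)

// The reverse direction: a UTF-16 offset back to a String.Index.
let sameIndex = String.Index(utf16Offset: offset, in: text)

print(offset, text[sameIndex])   // prints "4 |"
```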

2 Answers 2

3

Here is my take on parsing the messages out of the string. I had to change the lengths in your example so that they match the sample messages.

let message = "13|...message...14|...message2..."
let utf16 = message.utf16
var startingIndex = message.utf16.startIndex
var travellingIndex = message.utf16.startIndex
var messages = [String]()
var messageLength: Int

while travellingIndex != message.utf16.endIndex {

    // Start walking through each character
    if let char = String(utf16[travellingIndex..<travellingIndex.successor()]) {

        // When we find the pipe symbol try to parse out the message length
        if char == "|" {
            if let stringNumber = Int(String(utf16[startingIndex..<travellingIndex])) {
                messageLength = stringNumber

                // We found the length, now skip the pipe character
                startingIndex = travellingIndex.successor()

                // Move travellingIndex to the last code unit of the message
                travellingIndex = travellingIndex.advancedBy(messageLength)

                // get the message and put it into an array
                if let message = String(utf16[startingIndex...travellingIndex]) {
                    messages.append(message)
                    startingIndex = travellingIndex.successor()
                }
            }
        }
    }

    travellingIndex = travellingIndex.successor()
}

print(messages)

The output I get at the end is:

["...message...", "...message2..."]
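The same index-walking idea can be sketched in current Swift. This is an assumption-laden rewrite, not the code above: it uses Swift 5 syntax, the name `parseMessages` is hypothetical, and it stops quietly on malformed input. The main difference is `index(_:offsetBy:limitedBy:)`, which returns nil instead of trapping when the declared length runs past the end of the string.

```swift
// Hypothetical helper, assuming Swift 5: stays inside the UTF-16 view the whole time.
func parseMessages(_ input: String) -> [String] {
    let utf16 = input.utf16
    let pipe = UInt16(UInt8(ascii: "|"))   // 124
    var messages: [String] = []
    var start = utf16.startIndex

    // Compare code units directly instead of building a String per position.
    while let pipeIndex = utf16[start...].firstIndex(of: pipe) {
        // The digits before the pipe give the message length in UTF-16 code units.
        guard let digits = String(utf16[start..<pipeIndex]),
              let length = Int(digits), length >= 0 else { break }

        let bodyStart = utf16.index(after: pipeIndex)
        // nil here means the declared length overruns the input.
        guard let bodyEnd = utf16.index(bodyStart, offsetBy: length, limitedBy: utf16.endIndex),
              let body = String(utf16[bodyStart..<bodyEnd]) else { break }

        messages.append(body)
        start = bodyEnd
    }
    return messages
}

print(parseMessages("13|...message...14|...message2..."))
// prints ["...message...", "...message2..."]
```

The failable `String` initializer also returns nil if a range would split a surrogate pair, so a wrong length ends the loop rather than producing a broken string.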

5 Comments

Hmm... ok I see, this is a pretty good approach. I simply need to remain within the UTF16 view. I'll give this a shot.
Nice! Although it does crash when the given length is longer than the string, which I tried to avoid. Shouldn't happen though, so +1
Granted there is no error handling in that code, but it gives the original questioner an idea of how to sub-string the UTF16 view of a Swift string.
+1 - simply adding travellingIndex = travellingIndex.advancedBy(messageLength, limit: endIndex) where let endIndex = utf16.endIndex.advance(-1) worked to check against overflow.
Also, this is more efficient: view[currentIndex] == 124 instead of creating a new string on each iteration. Note: 124 is the decimal Unicode code point of |, which is a single UTF-16 code unit.
0

The Foundation framework extends String to be initialisable from data:

import Foundation

let string = String(data: data, encoding: NSUTF16StringEncoding)

Getting rid of Foundation is not possible unless you implement the decoding yourself. Note that with Swift going open source, Foundation is being reimplemented without an Objective-C dependency.

EDIT: Thanks, Martin R, the link you provided is indeed working in pure Swift :D
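For readers on a newer toolchain, decoding UTF-16 without Foundation can be sketched with the standard library's `String(decoding:as:)`. The byte array below is a made-up example, and big-endian order without a byte-order mark is an assumption; real server data may differ.

```swift
// Hypothetical input: big-endian UTF-16 bytes for "a|" followed by U+1F649, no byte-order mark.
let bytes: [UInt8] = [0x00, 0x61, 0x00, 0x7C, 0xD8, 0x3D, 0xDE, 0x49]

// Combine each pair of bytes into one 16-bit code unit (big-endian assumed).
var codeUnits: [UInt16] = []
for i in stride(from: 0, to: bytes.count - 1, by: 2) {
    codeUnits.append(UInt16(bytes[i]) << 8 | UInt16(bytes[i + 1]))
}

// Standard-library decoding; invalid sequences are replaced with U+FFFD.
let decoded = String(decoding: codeUnits, as: UTF16.self)
print(decoded.utf16.count)   // prints "4"
```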

EDIT:

A String has a utf16 property whose count is the string's length in UTF-16 code units. Here is a simple parser for your purpose. It isn't very efficient, but it gets the job done:

func getMessages(var string: String) -> [String]? {

    func getMessage(string: String) -> (message: String, rest: String)? {
        guard let
            index = string.characters.indexOf("|"),
            length = Int(String(string.characters.prefixUpTo(index)))
        else { return nil }

        let msgRest = String(string.characters.suffixFrom(index.successor()))
        return (String(msgRest.utf16.prefix(length)), String(msgRest.utf16.dropFirst(length)))
    }

    var messages : [String] = []
    while let (message, rest) = getMessage(string) {
        string = rest
        messages.append(message)
    }
    return messages
}

func stringForMessages(messages: [String]) -> String {
    return messages.map{ "\($0.utf16.count)|\($0)" }.joinWithSeparator("")
}

let messages = [
    "123",
    "💆🏽💆🏽💆🏽",
    "🙉😇🎅🏿",
    "6🕴⚽️"
]

let string = stringForMessages(messages)

let received = getMessages(string)

received // ["123", "💆🏽💆🏽💆🏽", "🙉😇🎅🏿", "6🕴⚽️"]

I actually tried making it more efficient, but Swift's String mechanics pushed back against it. I challenge anyone to create a beautiful, efficient, crash-safe parser for this.
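One possible direction, sketched in Swift 5 syntax (newer than the code above), is to keep the same "peel off one message, return the rest" structure but pass slices of the UTF-16 view around instead of building a new String for the remainder. The names `encode`, `splitFirstMessage` and `decodeMessages` are hypothetical, and returning nil for any malformed input is a design assumption.

```swift
// Hypothetical encoder mirroring stringForMessages above.
func encode(_ messages: [String]) -> String {
    return messages.map { "\($0.utf16.count)|\($0)" }.joined()
}

// Peels one "<length>|<message>" off the front of a UTF-16 slice.
func splitFirstMessage(_ slice: Substring.UTF16View) -> (message: String, rest: Substring.UTF16View)? {
    let pipe = UInt16(UInt8(ascii: "|"))
    guard let pipeIndex = slice.firstIndex(of: pipe),
          let digits = String(slice[..<pipeIndex]),
          let length = Int(digits), length >= 0 else { return nil }

    let body = slice[slice.index(after: pipeIndex)...]
    let head = body.prefix(length)
    // A short prefix means the declared length overruns the input.
    guard head.count == length, let message = String(head) else { return nil }
    return (message, body.dropFirst(length))
}

// Returns nil if any part of the input is malformed.
func decodeMessages(_ input: String) -> [String]? {
    var rest = input.utf16[...]
    var messages: [String] = []
    while !rest.isEmpty {
        guard let next = splitFirstMessage(rest) else { return nil }
        messages.append(next.message)
        rest = next.rest
    }
    return messages
}

let original = ["123", "\u{1F649}\u{1F607}", "6x"]
let roundTripped = decodeMessages(encode(original))
print(roundTripped == original)   // prints "true"
```

The slices share storage with the original string, so only the extracted messages are copied.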

4 Comments

Please re-look at the question. I'm not asking how to init a String with UTF16 data. I already have the String and I need to parse it.
Thanks for the answer. This does look like it would work. Unfortunately there is another answer that may be more efficient, since it only relies on tracking indexes and pulling out the string directly from a single UTF16view. I'm going to give both approaches a try and select the best one.
@AaronHayman I was going for the safe approach (it doesn't crash, even when the given length is longer than the string). Also, consider doing this yourself in the future; Stack Overflow isn't there for requesting parsers from others.
I actually wasn't asking for a parser, although several people seem to want to provide it (definitely not complaining though). I'll be writing my own regardless of what people put here (I have Unit Tests to satisfy). I wasn't sure if there was a way to do what I needed. A simple answer stating: "You can manually iterate through the UTF16 view as an Array and extract the bytes using subscripting: view[startIndex...endIndex]" would have been a sufficient answer.
