
I'm trying to parse a CRG file in C#. The file is a mix of plain text and binary data: the first section contains plain text, while the rest of the file is binary (lots of floats). Here's an example:

$
$ROAD_CRG
reference_line_start_u   =  100
reference_line_end_u     =  120
$
$KD_DEFINITION
#:KRBI
U:reference line u,m,730.000,0.010
D:reference line phi,rad
D:long section 1,m
D:long section 2,m
D:long section 3,m
...
$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
�@z����RA����\�l
...

I know I can read bytes starting at a specific offset but how do I find out which byte to start from? The last row before the binary section will always contain at least four dollar signs "$$$$". Here's what I've got so far:

using var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);

var startByte = ??; // How to find out where to start?

using (BinaryReader reader = new BinaryReader(fs))
{
    reader.BaseStream.Seek(startByte, SeekOrigin.Begin);
    var f = reader.ReadSingle();
    Debug.WriteLine(f);
}
  • Hook up a StreamReader with the leaveOpen constructor parameter set to true. Then you can simply read lines until you've seen the separator line and start using the BinaryReader, as the stream will be positioned correctly. Alternatively, of course, you can use the BinaryReader to hunt for four consecutive 0x24 bytes ($), then read to the next newline, then start reading floats (i.e. implement your own little state machine), but that's more complicated. Commented Jan 20, 2020 at 15:28
  • @JeroenMostert discussed more on Troy's answer, but: I don't think that works here Commented Jan 20, 2020 at 15:46
  • @MarcGravell: Yeah, obvious in hindsight when you take buffering into account, but also lame because it would not be particularly complicated for StreamReader to support this scenario anyway (given that we have a seekable stream). In the general case with buffered forward-only access you do have to get more complicated, of course. Commented Jan 20, 2020 at 15:49
  • OpenCRG doesn't seem to be very open or widespread - just 5 GitHub repos and only Mercedes is actually using this. Have you checked the C code? You may be able to replicate what it does using byte buffers or Span<byte>. Perhaps even map bytes to structs directly. Or you could use System.IO.Pipelines for an API that allows you to move back and forth in the file. You have to treat the file as binary though Commented Jan 20, 2020 at 16:02
  • You'll have to handle the file differently before and after the $$$$$ lines. Is that separator well defined? If so, you can use one parser for everything up to it and a completely different one for the rest of the file. Commented Jan 20, 2020 at 16:03

3 Answers


When you have a mixture of text data and binary data, you need to treat everything as binary. This means you should be using raw Stream access, or something similar, and using binary APIs to look through the text data (often looking for cr/lf/crlf bytes as sentinels, although it sounds like in your case you could just look for the $$$$ using binary APIs, then decode the entire block before it, and scan forwards). When you think you have an entire line, then you can use Encoding to parse each line - the most convenient API being encoding.GetString(). When you've finished looking through the text data as binary, then you can continue parsing the binary data, again using the binary API.

I would usually recommend against BinaryReader here too, because frankly it doesn't gain you much over the more direct API. The other problem you might want to think about is CPU endianness, but assuming that isn't a problem: BitConverter.ToSingle() may be your friend.
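For illustration, a minimal sketch of that marker scan over a byte buffer (the FindBinaryStart name is made up for this example, not part of any CRG library):

// Hypothetical helper: scans a buffer for the first run of four '$' bytes and
// returns the offset just past the end of that line, i.e. the first byte of the
// binary section. Returns -1 if no marker (or no terminating newline) is found.
static int FindBinaryStart(byte[] data)
{
    const byte Dollar = (byte)'$';
    const byte NewLine = (byte)'\n';

    for (int i = 0; i + 3 < data.Length; i++)
    {
        if (data[i] == Dollar && data[i + 1] == Dollar &&
            data[i + 2] == Dollar && data[i + 3] == Dollar)
        {
            // skip to the end of the marker line
            int j = i + 4;
            while (j < data.Length && data[j] != NewLine) j++;
            return j < data.Length ? j + 1 : -1;
        }
    }
    return -1;
}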

If the data is modest in size, you may find it easiest to use byte[] for the data; either via File.ReadAllBytes, or by renting an oversized byte[] from the array-pool, and loading it from a FileStream. The Stream API is awkward for this kind of scenario, because once you've looked at data: it has gone - so you need to maintain your own back-buffers. The pipelines API is ideal for this, when dealing with large data, but is an advanced topic.
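A small usage sketch of the byte[] route, tying the pieces together (the file name and the FindBinaryStart helper above are just for illustration, and little-endian 4-byte floats are assumed, matching the ReadSingle call in the question):

using System;
using System.IO;
using System.Text;

byte[] data = File.ReadAllBytes(@"crg_sample.crg");

int start = FindBinaryStart(data); // helper sketched above
if (start < 0) throw new InvalidDataException("No $$$$ marker found");

// decode the text header for separate parsing, line by line if needed
string header = Encoding.ASCII.GetString(data, 0, start);

// read the first float of the binary section
float first = BitConverter.ToSingle(data, start);
Console.WriteLine($"Header is {start} bytes, first float = {first}");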


7 Comments

Thanks for the info and clues on how to move forward; it seems this seemingly simple task isn't that simple after all. A question: by pipelines API, do you mean the built-in System.IO.Pipelines?
@Johan yes; I have a multi-part blog on it - starts here - the later sections have more code to look at
@Johan emphasis: there's no need to look at pipelines here unless the data is big; for modest data - just use byte[], it'll be tons easier
Alright, what I know is that the plain text part will never be very large, but the binary that follows could reach a pretty big size (up to GBs)
@Johan hmm; in that scenario, my next step would probably be to try using a memory mapped file with the block size at least as big as the largest expected first frame; perhaps using this, but: I'm familiar (some would say too familiar) with the pipelines API, which can be daunting if you haven't touched it before. You could do some kind of "read from the stream byte by byte until you have all the text block, then decode the text block and continue the rest..."
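For the multi-GB case discussed above, a memory-mapped view is one way to avoid loading the whole file; a minimal sketch under that assumption (the offset value and file name are placeholders, and the marker still has to be located first, e.g. by scanning the small text header):

using System.IO;
using System.IO.MemoryMappedFiles;

// map the file read-only and read directly at an offset without buffering it all
using var mmf = MemoryMappedFile.CreateFromFile(
    @"crg_sample.crg", FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
using var view = mmf.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read); // length 0 = whole file

long dataStart = 4096; // placeholder: offset of the binary section, found by scanning the header
float first = view.ReadSingle(dataStart);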

UPDATE: This code may not work as expected. Please review the valuable information in the comments.

using (var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read))
{
    using (StreamReader sr = new StreamReader(fs, Encoding.ASCII, true, 1, true))
    {
        var line = sr.ReadLine();
        while (!string.IsNullOrWhiteSpace(line) && !line.Contains("$$$$"))
        {
            line = sr.ReadLine();
        }
    }
    using (BinaryReader reader = new BinaryReader(fs))
    {
        // TODO: Start reading the binary data
    }
}

9 Comments

this code is probably broken; StreamReader over-reads, so you will have dropped data on the floor
@MarcGravell: that's a bummer. It's a shame it doesn't at least have a property for the virtual position of the stream so you could record/reset it manually.
This is not working since StreamReader jumps "bufferSize" amount of bytes (which is independent of the line length) and will position the reader too far ahead.
@JeroenMostert FWIW, that's one of the things I love about the "pipelines" API (which I've now added into my answer); it would be pretty trivial to parse this as a ReadOnlySequence<byte>; but: not many people are familiar with that API
@MarcGravell would this code work if the buffer of the StreamReader were set to 1? If so would there be a significant negative impact on performance?

Solution

I know this is far from the most optimized solution, but in my case it did the trick, and since the plain-text section of the file was known to be fairly small it didn't cause any noticeable performance issues. Here's the code:

using var fileStream = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);
using var reader = new BinaryReader(fileStream);

var newLine = '\n';
var markerString = "$$$$";
var currentString = "";

var foundMarker = false;
var foundNewLine = false;

// Scan character by character: first find the "$$$$" marker,
// then find the newline that ends the marker line.
while (!foundNewLine)
{
    var c = reader.ReadChar();

    if (!foundMarker)
    {
        // keep only the last few characters read and compare them to the marker
        currentString += c;

        if (currentString.Length > markerString.Length)
            currentString = currentString.Substring(1);

        if (currentString == markerString)
            foundMarker = true;
    }
    else
    {
        if (c == newLine)
            foundNewLine = true;
    }
}

if (foundNewLine)
{
    // Read binary
}

Note: If you're dealing with larger or more complex files, you should probably take a look at Marc Gravell's answer and the comment sections.
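For completeness, a sketch of what the "// Read binary" part could look like with the fileStream and reader from above. This assumes the binary section is a plain run of 4-byte little-endian floats (real CRG files may store doubles or record-structured data, so adjust accordingly):

if (foundNewLine)
{
    // The stream is now positioned at the first byte after the marker line,
    // so each ReadSingle() returns the next 4-byte float.
    while (fileStream.Length - fileStream.Position >= sizeof(float))
    {
        var f = reader.ReadSingle();
        Debug.WriteLine(f);
    }
}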
