
I needed to create a custom file format with embedded meta information. Instead of whipping up my own format, I decided to just use Lua.

texture
{
   format=GL_LUMINANCE_ALPHA;
   type=GL_UNSIGNED_BYTE;
   width=256;
   height=128;
   pixels=[[
<binary-data-here>]];
}

texture is a function that takes a table as its sole argument. It then looks up the various parameters by name in the table and forwards the call on to a C++ routine. Nothing out of the ordinary, I hope.

Occasionally the files fail to parse with the following error:

my_file.lua:8: unexpected symbol near ']'

What's going on here?
Is there a better way to store binary data in Lua?


Update

It turns out that storing binary data in a Lua string literal is non-trivial, but it is possible if you take care with three kinds of byte sequences.

  • Long-format string literals cannot contain an embedded closing long bracket (]], ]=], etc.).
    This one is pretty obvious.

  • Long-format string literals cannot end with something like ]==, which would combine with the chosen closing long bracket.
    This one is more subtle. Luckily the script will fail to compile if done wrong.

  • The data cannot contain \n or \r.
    Lua's built-in line-ending processing mangles these. This problem is much more subtle: the script compiles fine but yields the wrong data. \r (13, 0x0D) becomes \n (10, 0x0A), \r\n (13 10) also becomes a single \n (10), etc.
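The three rules can be checked mechanically before writing a chunk out. A minimal Python sketch (the function name and messages are my own, hypothetical choices) that reports which rule a chunk would break at a given long-bracket level, where level n means the delimiters [={n}[ and ]={n}]:

```python
def long_string_problems(data: bytes, level: int) -> list:
    """Return the rules `data` breaks inside a level-`level` long string."""
    close = b"]" + b"=" * level + b"]"   # e.g. level 1 -> ]=]
    problems = []
    # Rule 1: the closing bracket must not appear inside the data.
    if close in data:
        problems.append("contains closing bracket")
    # Rule 2: the data must not end with ] plus the same number of '='s,
    # because the real closing bracket would then complete it early.
    if data.endswith(close[:-1]):
        problems.append("ends with start of closing bracket")
    # Rule 3: CR/LF bytes are silently rewritten by Lua's lexer.
    if b"\r" in data or b"\n" in data:
        problems.append("contains CR or LF")
    return problems
```

For example, XX]]XX]= is rejected at level 1 (it ends with ]=) but accepted at level 2.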

To get around these limitations, I split the binary data on \r and \n, pick a long bracket that works for each remaining piece, and finally emit Lua that concatenates the parts back together. I wrote a script that does this for me.
Input: XXXX\nXX]]XX\r\nXX]]XX]=
Output:

texture
{
  --other fields omitted      
  pixels= '' ..
     [[XXXX]] ..
     '\n' ..
     [=[XX]]XX]=] ..
     '\r\n' ..
     [==[XX]]XX]=]==];
}
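A minimal Python sketch of that splitting step, assuming the generator is a Python script as in this thread (this is my reconstruction, not the actual script; all function names are hypothetical). It writes each run of \r/\n bytes as a quoted escape string and wraps every other piece in the lowest long-bracket level whose closing bracket neither occurs in the piece nor could be completed by the piece's last bytes. It does not deal with byte 26 (Ctrl-Z) on Windows.

```python
import re

# Runs of CR/LF bytes are emitted as ordinary quoted strings with escapes.
_NEWLINE_RUN = re.compile(rb"[\r\n]+")

def pick_level(chunk: bytes) -> int:
    """Lowest n such that ]={n}] is absent and chunk doesn't end in ]={n}."""
    level = 0
    while True:
        close = b"]" + b"=" * level + b"]"
        if close not in chunk and not chunk.endswith(close[:-1]):
            return level
        level += 1

def long_bracket(chunk: bytes) -> bytes:
    eq = b"=" * pick_level(chunk)
    return b"[" + eq + b"[" + chunk + b"]" + eq + b"]"

def quoted_newlines(run: bytes) -> bytes:
    # Replace raw CR/LF with the two-character escapes \r and \n.
    return b"'" + run.replace(b"\r", b"\\r").replace(b"\n", b"\\n") + b"'"

def lua_concat_expr(data: bytes) -> bytes:
    """Build a Lua expression (bytes) that evaluates to exactly `data`."""
    parts = [b"''"]  # keeps the expression valid even for empty data
    pos = 0
    for m in _NEWLINE_RUN.finditer(data):
        if m.start() > pos:
            parts.append(long_bracket(data[pos:m.start()]))
        parts.append(quoted_newlines(m.group()))
        pos = m.end()
    if pos < len(data):
        parts.append(long_bracket(data[pos:]))
    return b" ..\n     ".join(parts)
```

For the input above this produces the same pieces as the Lua shown. The result is bytes, so the .lua file should be written in binary mode, e.g. b"texture\n{\n  pixels= " + lua_concat_expr(data) + b";\n}\n".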
  • are you sure your binary data does not contain anything which could be read as ']]'? Commented Sep 28, 2010 at 17:18
  • I checked, the pixel data doesn't have a ']]' in it. Commented Sep 28, 2010 at 17:23

2 Answers


Lua can store most byte values in long-bracket strings, including nulls. However, Lua opens the script file in text mode, and this causes some problems. On my Windows system the following characters have problems:

Char code(s)      Problem
--------------    -----------------------------------------------------------
13 (CR)           Is translated to 10 (LF)
13 10 (CR LF)     Is translated to 10 (LF)
26 (Ctrl-Z)       Treated as end of file: "unfinished long string near '<eof>'"

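If you are not using Windows, the Ctrl-Z problem may not occur, but there may be different text-mode problems. The CR rows apply on every platform, though: Lua's lexer itself converts \r, \r\n and \n\r inside a long string into a single \n.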
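The CR/LF rows can be reproduced with a rough Python model of how Lua's lexer stores line breaks inside a long string (my reading of read_long_string and inclinenumber in the 5.1 llex.c; the function name is hypothetical). Each \n or \r is saved as \n, and an immediately following byte of the other kind (\r\n or \n\r) is swallowed. This is an illustration only: it ignores the newline skipped right after the opening bracket and anything the C runtime's text mode does first.

```python
def lua_long_string_newlines(body: bytes) -> bytes:
    """Model of Lua's newline normalization inside a long string."""
    out = bytearray()
    i = 0
    while i < len(body):
        c = body[i]
        i += 1
        if c in (0x0A, 0x0D):          # '\n' or '\r'
            out.append(0x0A)           # always stored as '\n'
            # A following partner of the other kind (\r\n or \n\r) is dropped.
            if i < len(body) and body[i] in (0x0A, 0x0D) and body[i] != c:
                i += 1
        else:
            out.append(c)
    return bytes(out)
```

Two identical breaks in a row (\n\n) survive as two \n bytes; only mixed pairs collapse.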


I was only able to produce the error you received by encoding multiple close brackets:

a=[[
]]] --> a.lua:2: unexpected symbol near ']'

But this was easily fixed with the following:

a=[==[
]]==]

2 Comments

That did it, thanks! Besides checking that there isn't a ]] in the binary data, I also need to check that the data doesn't end with ]. When I find either of those, I add some = signs to the string delimiter.
Well, that almost did it. [==[ still produces an error when the data ends with ]==. So now I check that the first part of the closing delimiter (] followed by the = signs) doesn't appear at the end of the binary data. When it does, I keep adding = signs until ]={n}] isn't found in the string and ]={n} doesn't appear at the end.

The binary data needs to be encoded into printable characters. The simplest method for decoding purposes would be to use C-like escape sequences for all bytes. Lua's \ddd escapes are decimal, so hex bytes 13 41 42 1E would be encoded as '\19\65\66\30'. Of course, the encoded data is then two to four times larger than the source binary.
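A minimal Python sketch of that escape encoding (the function name is hypothetical). Because every byte is escaped, the character after each escape is always another backslash or the closing quote, so the short forms such as \19 can never run into a following digit:

```python
def lua_escaped_literal(data: bytes) -> str:
    """Encode every byte as a Lua decimal escape inside a quoted string."""
    # Each byte becomes a backslash plus its decimal value (1 to 3 digits),
    # so the output is plain ASCII and safe to read in text mode.
    return "'" + "".join("\\" + str(b) for b in data) + "'"
```

The result can be dropped straight into the file, e.g. "pixels=" + lua_escaped_literal(data) + ";".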

Alternatively, you could use something like Base64, but that would have to be decoded at runtime instead of relying on the Lua interpreter. Personally, I'd probably go the Base64 route. There are Lua examples of Base64 encoding and decoding.
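On the generator side, Base64 only needs the standard library. A minimal Python sketch that emits a Lua table field; decode_base64 is a hypothetical name for whatever decoder the loading side provides (a Lua helper or the C++ texture routine), not an existing Lua function:

```python
import base64

def lua_base64_field(name: str, data: bytes) -> str:
    # Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it fits in
    # an ordinary quoted Lua string and survives text-mode reading.
    encoded = base64.b64encode(data).decode("ascii")
    # decode_base64 is hypothetical: the loader must supply it.
    return '{}=decode_base64("{}");'.format(name, encoded)
```

The size cost is about 4/3 of the original, compared with two to four times for decimal escapes.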

Another alternative would be to have two files: a file in a well-defined image format (e.g. TGA), pointed to by a separate Lua script with the additional metadata. If you don't want to move two files around, they could be combined in an archive.

6 Comments

I chose to use Lua long-bracket strings after reading that they allow embedded NULs. NUL isn't printable, so I assumed the other non-printable bytes were allowed as well. Usually (>90% of the time) it works. Quoting §2.1 of the manual: "Strings in Lua can contain any 8-bit value, including embedded zeros", and, speaking of long-bracket strings, "They can contain anything except a closing bracket of the proper level."
The former quote specifically refers to escaped values (i.e. \ddd), while the latter probably assumes the use of a text editor (and thus exclusive use of printable characters) to generate the source code.
How are you putting the binary data into the script file?
With Python, though I could do it in any programming language. My script accepts the image file on the command line, grabs and filters the image data, writes the Lua code up to the [[\n, then dumps the image data. Finally it writes the closing ]];\n}
I don't know this for sure, but I imagine that the Lua interpreter chokes on non-printable characters. Yes, Lua strings can contain non-printable characters, but that doesn't mean the Lua interpreter can. I would still advise using a text encoding and then decode into a binary at runtime.