A lot of my work involves storing and processing large amounts of signal data from sensor systems. This usually means getting fixed point samples from some data acquisition gear, converting them to floating point, and storing them in files. I have two options - store the data in text files or binary files. Text files are human readable, but they’re too big and slow to work with. Binary files are my only practical option. As a result, any programming language I use must provide a simple way to serialize and deserialize data structures to binary files. I’ll show in this article why Zig is one of those languages.
My solution is in this GitHub Gist. To be clear, this is not rocket science. My only goal is to determine whether Zig is something I could use day-to-day.
Handling Binary Files in C
Call me old fashioned, but I really like the way C handles binary serialization. Let’s say you want to write an array to a binary file. No problem.
fwrite(array, sizeof(array[0]), sizeof(array)/sizeof(array[0]), fid);
What about a struct? Piece of cake.
fwrite(&struct_instance, sizeof(MyStruct_t), 1, fid);
In C, it’s extremely easy to convert between pointer types. And if you don’t want to do the cast yourself, the compiler can implicitly do it for you. It assumes you know what you’re doing and gets out of the way.
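To make the two one-liners above concrete, here is a minimal round-trip sketch. The names Sample_t, write_blob, and read_blob are hypothetical and exist only for this illustration. It assumes the file is read back on the same machine and compiler that wrote it, since raw fwrite output depends on struct layout and byte order.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    float gain;
    int   channel;
} Sample_t;

/* Write `count` floats followed by one Sample_t to `path`.
 * Returns 0 on success, -1 on any I/O failure. */
int write_blob(const char *path, const float *data, size_t count,
               const Sample_t *meta)
{
    assert(path && data && meta);
    FILE *fid = fopen(path, "wb");
    if (!fid) return -1;
    /* Both pointers convert to const void * implicitly; no cast needed. */
    int ok = fwrite(data, sizeof data[0], count, fid) == count
          && fwrite(meta, sizeof *meta, 1, fid) == 1;
    if (fclose(fid) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Read the same layout back. Returns 0 on success, -1 on failure. */
int read_blob(const char *path, float *data, size_t count, Sample_t *meta)
{
    assert(path && data && meta);
    FILE *fid = fopen(path, "rb");
    if (!fid) return -1;
    int ok = fread(data, sizeof data[0], count, fid) == count
          && fread(meta, sizeof *meta, 1, fid) == 1;
    fclose(fid);
    return ok ? 0 : -1;
}
```

Calling write_blob and then read_blob with the same count should give back the original values. Note that nothing stops a caller from passing a count larger than the array actually holds.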
Pointers in Zig
Zig, like many other modern systems languages, discourages you from using raw pointers. At least that’s the impression I get when reading over the documentation. Don’t get me wrong, it still has raw pointers. It just tries really hard to get you to use other data structures like arrays, many-item pointers, and slices.
Arrays and pointers are not interchangeable like they are in C. Case in point, the Zig compiler won’t let you pass an array to a function that expects a pointer. Implicit type conversion (aka coercion) is a thing, but the rules are stricter than they are in C. It’s worth looking over the coercion rules in the documentation.
A many-item pointer points to an undisclosed number of things - like a pointer to a C array element. If you want an address and a length, use a slice. Zig really likes slices. And for good reason. It gives the compiler a way to enforce bounds checking.
C doesn’t have any built-in bounds checking. If you want it, you’ve got to implement it yourself. Take a look at those C code snippets again. Together, the second and third function arguments determine how many bytes of data past the pointer you want to write. You can put any non-negative numbers in there. The compiler doesn’t check whether it makes sense or not. It’s up to you. Depending on your application, this can lead to some seriously buggy code.
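As a rough sketch of what "implement it yourself" can look like, the hypothetical helper below (write_range is my own name, not a standard function) takes the buffer's capacity alongside the requested range and refuses to call fwrite if the range would run past the end. It only works if the caller passes the true capacity, which is exactly the bookkeeping a Zig slice carries for you.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical checked wrapper around fwrite.
 * `base` points to a buffer of `capacity` elements, each `elem_size` bytes.
 * Writes elements [start, start + count) to `fid`.
 * Returns 0 on success, -1 if the range is out of bounds or the write
 * comes up short. */
int write_range(const void *base, size_t elem_size, size_t capacity,
                size_t start, size_t count, FILE *fid)
{
    assert(base && fid && elem_size > 0);
    /* Written this way to avoid overflow in start + count. */
    if (start > capacity || count > capacity - start) return -1;
    const unsigned char *p = (const unsigned char *)base + start * elem_size;
    return fwrite(p, elem_size, count, fid) == count ? 0 : -1;
}
```

With an 8-element array, asking for 4 elements starting at index 4 succeeds, while asking for 5 elements from the same index is rejected before any bytes are written.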
Array Serialization Example
Ok, let’s get down to business. Like I said earlier, one of the things I do all the time is write arrays of floating point values to binary files. Zig has a function in its standard library for this, write. The problem is that it only writes byte slices. So the first thing I had to do was figure out how to convert slices of floats to slices of bytes. After working through the pointer documentation for a while, I decided to use Zig’s comptime feature to generalize it to any slice. Here’s what I settled on:
fn transmuteSlice(comptime T: type, x: []T) []u8 {
    const num_bytes = @sizeOf(T) * x.len;
    const x0: [*]u8 = @ptrCast(x);
    const x1: [ ]u8 = x0[0..num_bytes];
    return x1;
}
- comptime T: type - The first function argument must be a type known at compile time.
- @sizeOf(T) - compiler builtin that returns the size of the compile time type T, in bytes.
- @ptrCast(x) - compiler builtin that converts x’s internal many-item pointer [*]T to a [*]u8. It can handle many other casts too.
- x0[0..num_bytes] - turns the many-item pointer into a slice.
I stole the name “transmute” from Odin (https://odin-lang.org/), another powerful systems programming language. It captures exactly what the function is doing:
transmute: to change or alter in form, appearance, or nature.
transmuteSlice isn’t creating a new slice; it’s reinterpreting the memory as a []u8. The underlying memory is the same.
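For comparison with the C snippets earlier, here is a loose C analogue of the same idea. C has no slice type, so this sketch invents a small ByteView struct (hypothetical, not part of any library) to carry the pointer and the byte count together. The point it illustrates is only that the view aliases the original floats rather than copying them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-plus-length pair, loosely mimicking a Zig []u8. */
typedef struct {
    unsigned char *ptr;
    size_t         len;   /* length in bytes */
} ByteView;

/* View `count` floats as raw bytes. No data is copied: the returned
 * view points at the same memory as `x`. */
ByteView bytes_of_floats(float *x, size_t count)
{
    assert(x != NULL || count == 0);
    ByteView v = { (unsigned char *)x, count * sizeof x[0] };
    return v;
}
```

Because the view shares storage with the array, zeroing the bytes of one element through v.ptr changes the corresponding float in the original array.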
Let’s see it in action. Here’s a program that writes chunks of floating point values to a binary file. To verify that the file was written correctly, I read the data back in from the file and compared it to the original “parent” array I used for the writes. Zig’s read standard library implementation is like write: it only accepts byte slices.
[Program listing not reproduced here; the full source is in the GitHub Gist mentioned above.]
This is not a Zig tutorial, so I’m not going to explain this program in gory detail. To be honest, I bet you can figure it out yourself - Zig is very readable. The loop syntax and error handling mechanisms (try and !) are a bit unusual, but you get used to them. Hopefully the comments I put in the code help out too.
I did want to point out a few things.
- Line 22 - The second argument to createFile is a struct with default values. Even though the default values are correct, something needs to be passed in. Unlike Python, Zig doesn’t have optional function arguments.
- Lines 27 and 29 - This is how you invoke transmuteSlice.
- Line 46 - The second and third arguments to mem.eql should be slices, but I’m passing in pointers to arrays: *[80]f32. Pointers to arrays coerce to slices!
Struct Serialization
What about serializing structs? I could do some pointer casting and use transmuteSlice:
const MyStruct = struct {a: u8, b: u8};
const s0 = MyStruct {.a = 1, .b = 2};
const s1: *MyStruct = @constCast(&s0);
const s2: [*] MyStruct = @ptrCast(s1);
const s3: [ ] MyStruct = s2[0..1];
_ = transmuteSlice(MyStruct, s3);
Or I could implement a function that transmutes generic pointers:
fn transmutePtr(comptime T: type, x: *T) []u8 {
    const num_bytes = @sizeOf(T);
    const x0: [*]u8 = @ptrCast(x);
    const x1: [ ]u8 = x0[0..num_bytes];
    return x1;
}
Note that transmutePtr and transmuteSlice look nearly identical. Besides the function signature, the only difference is how num_bytes is calculated. I can probably use Zig’s metaprogramming features (with anytype) to get this down to a single function. I’ll save that for another day.
Conclusion
The purpose of this post was to figure out a simple way to serialize data structures in Zig. I did, and the solution is similar in spirit to what you’d do in C. But is the added complexity worth it? I don’t think the code I wrote can be as fast as an equivalent C version. You’d think the extra pointer conversions would produce more CPU instructions - and more instructions mean a longer runtime. This is just speculation - I haven’t checked the assembly.
The question is: how many instructions am I willing to pay for a better type checker and a few modern conveniences? The only way to make a fair assessment is to continue dabbling.
All that being said, I really like Zig so far. It feels a little bit like Python, just a heck of a lot faster and more powerful.
Here are a few other attractive language features:
- No header files
- SIMD vector programming
- Flexible cross-compilation
- Errors as values
- Compile-time reflection
- Powerful C-interop
- Integrated build system
- Custom allocators
Thanks for reading.