kojix2

Writing Inline Assembly in the Crystal Programming Language

Introduction

When you want to make your code run significantly faster, or just want to explore how computers work at a lower level, you might find yourself curious about writing instructions directly for the CPU. In Crystal, you can do this using inline assembly.

Crystal is a programming language built on top of the LLVM compiler infrastructure, so it can use many of LLVM's features. For low-level programming, Crystal provides both intrinsic functions and the asm syntax.

The asm Syntax

Crystal supports writing inline assembly using the asm keyword.

The official documentation is in the asm section of the Crystal language reference.

The basic syntax is:

asm("template" : outputs : inputs : clobbers : flags)
  • template — Assembly code using LLVM’s integrated assembler syntax
  • outputs — Output operands
  • inputs — Input operands
  • clobbers — Registers that may be modified
  • flags — Optional flags (e.g., "intel")

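This colon-separated syntax is unusual for Crystal; it is borrowed from GCC's inline assembly syntax. Sections you do not need can be left empty (as in ::), and trailing empty sections can be omitted.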

Let’s look at some examples.

NOP Instruction

asm("nop")

Setting a Value Using an Output Operand

dst = uninitialized Int32

asm("mov $$10, $0" : "=r"(dst))

puts dst  # => 10

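Note that $$10 is an immediate value: $$ is how a literal $ is written in the template, so the assembler sees the AT&T-style immediate $10. $0 is a placeholder for the first operand, which here is the output operand dst.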

Using uninitialized Int32 is optional; initializing with dst = 0 works as well.

Using an Input Operand

src = 10
dst = 0

asm("mov $1, $0" : "=r"(dst) : "r"(src))

puts dst  # => 10

Using Multiple Input Operands

a = 10
b = 20
c = uninitialized Int32

# "0" ties a to the register of output operand 0, so add leaves a + b in c
asm("add $2, $0" : "=r"(c) : "0"(a), "r"(b))

puts c  # => 30
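The examples in this article use x86 instructions in AT&T operand order, so they will not assemble on other targets such as AArch64. One way to keep such code portable, sketched here on the assumption that a plain Crystal fallback is acceptable, is to guard the asm with a compile-time flag check. The method name add_asm is hypothetical and not part of any library.

```crystal
# Hypothetical helper: adds two Int32 values with inline assembly on x86-64
# and falls back to ordinary wrapping addition on every other target.
def add_asm(a : Int32, b : Int32) : Int32
  {% if flag?(:x86_64) %}
    result = uninitialized Int32
    # "0" places a in the register of output operand 0; add then leaves a + b there.
    asm("add $2, $0" : "=r"(result) : "0"(a), "r"(b))
    result
  {% else %}
    a &+ b
  {% end %}
end

puts add_asm(10, 20) # => 30
```

The fallback uses &+ because the add instruction wraps on overflow instead of raising, so both branches behave the same way.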

Using Multiple Output Operands

dst1 = uninitialized Int32
dst2 = uninitialized Int32

asm("
  mov $$10, $0
  mov $$20, $1" : "=r"(dst1), "=r"(dst2))

puts dst1  # => 10
puts dst2  # => 20

Using Intel Syntax

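You can also use Intel-style syntax by passing the "intel" flag. Operands are then written destination first, and immediates need no $$ prefix. The example below has no output operand; it stores through a pointer passed as an input. Because the compiler cannot see that store, in optimized builds it may be safer to also add a "memory" clobber and the "volatile" flag.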

dst = uninitialized Int32

asm("mov dword ptr [$0], 10" :: "r"(pointerof(dst)) :: "intel")

puts dst

Intrinsics

For relatively simple operations, LLVM provides intrinsics: built-in functions that the compiler lowers to an efficient instruction sequence for the target. They are platform-independent and often work in Crystal's interpreter as well. For most basic operations, however, Crystal's standard library already provides efficient implementations, so calling intrinsics directly does not always improve performance.

Available intrinsics are defined in the Intrinsics module.
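To illustrate the point about the standard library, here is a small sketch that uses ordinary methods instead of intrinsics. The methods Int#popcount, Int#leading_zeros_count, Int#trailing_zeros_count, Slice#copy_to and Slice#fill are not mentioned in the original article; they are shown as likely everyday alternatives, not as a statement about how the intrinsics are implemented.

```crystal
value = 0b11010110_i32

puts value.popcount             # => 5
puts value.leading_zeros_count  # => 24 (Int32 is 32 bits wide)
puts value.trailing_zeros_count # => 1

src = Slice(UInt8).new(10) { |i| i.to_u8 }
dest = Slice(UInt8).new(10, 0_u8)

src.copy_to(dest) # copies all 10 bytes, like a memcpy of the whole slice
puts dest == src  # => true

dest.fill(0xFF_u8) # sets every byte, like a memset with 0xFF
puts dest.all? { |byte| byte == 0xFF_u8 } # => true
```

These methods are bounds-checked and type-safe, which is usually preferable unless you specifically need the raw intrinsic.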

Common Intrinsic Functions

memcpy — Copy memory

src = Slice(UInt8).new(10) { |i| i.to_u8 }
dest = Slice(UInt8).new(10, 0_u8)

# Arguments: destination, source, number of bytes, volatile flag
Intrinsics.memcpy(dest, src, 10, is_volatile: false)

puts "Copied: #{dest}"

memmove — Move memory with overlap support

buffer = Slice(UInt8).new(10) { |i| i.to_u8 }

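# Copy 5 bytes from offset 0 to offset 3; the two ranges overlap
Intrinsics.memmove(buffer.to_unsafe + 3, buffer.to_unsafe, 5, is_volatile: false)

# buffer now holds 0, 1, 2, 0, 1, 2, 3, 4, 8, 9
puts "Moved: #{buffer}"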

memset — Initialize memory

buffer = Slice(UInt8).new(10, 0_u8)

Intrinsics.memset(buffer, 0xFF_u8, 10, is_volatile: false)

puts "Set: #{buffer}"

debugtrap — Trigger debugger trap

Intrinsics.debugtrap

pause — CPU pause (works on x86/x64 and AArch64)

Intrinsics.pause

It hints to the CPU that the code is in a spin-wait loop. Crystal uses it internally in its Mutex and SpinLock implementations.
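To show where a pause hint typically sits, here is a minimal spin-wait sketch built on Atomic(Int32). It is an illustration only and is not how Crystal's own Mutex or SpinLock is written; the variable name lock and the 0/1 encoding are assumptions made for the example.

```crystal
# 0 = unlocked, 1 = locked
lock = Atomic(Int32).new(0)

# swap returns the previous value: 0 means we took the lock,
# 1 means it is held elsewhere, so hint the CPU and retry.
until lock.swap(1) == 0
  Intrinsics.pause
end

puts "lock acquired"
# ... critical section ...

lock.set(0)   # release the lock
puts lock.get # => 0
```

In this single-threaded run the lock is free, so the loop body never executes; the hint only matters when another thread holds the lock.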

read_cycle_counter — Read the CPU cycle counter

cycles = Intrinsics.read_cycle_counter

puts "Cycles: #{cycles}"

To observe it in action:

loop do
  cycles = Intrinsics.read_cycle_counter
  puts "Cycles: #{cycles}"
  sleep 1.second
end
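A common use of the counter is to take a rough measurement of a piece of code by reading it before and after. The following is a sketch only: the workload is arbitrary, the result varies from run to run, and on targets without a cycle counter the intrinsic may simply return 0.

```crystal
start = Intrinsics.read_cycle_counter

# Arbitrary workload: sum the integers 0...1_000_000
sum = 0_u64
1_000_000.times { |i| sum &+= i.to_u64 }

# Wrapping subtraction avoids an overflow error if the second reading is
# smaller than the first (for example after moving to another core).
elapsed = Intrinsics.read_cycle_counter &- start

puts "sum = #{sum}" # => sum = 499999500000
puts "elapsed cycles (approximate): #{elapsed}"
```

For serious benchmarking the standard Benchmark module is usually a better fit, since raw cycle counts are affected by frequency scaling and scheduling.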

Bit Manipulation Intrinsics

Bit Reversal

  • bitreverse8, bitreverse16, bitreverse32, bitreverse64, bitreverse128
value = 0b01101001_u8
result = Intrinsics.bitreverse8(value)

puts "Reversed: #{result.to_s(2)}"  # => 10010110

Byte Swap

  • bswap16, bswap32, bswap64, bswap128
value = 0x12345678_u32
result = Intrinsics.bswap32(value)

puts "Swapped: 0x#{result.to_s(16)}"  # => 0x78563412

Population Count

  • popcount8, popcount16, popcount32, popcount64, popcount128
value = 0b11010110_i32
count = Intrinsics.popcount32(value)

puts "Bit count: #{count}"  # => 5

Count Leading Zeros

  • countleading8, countleading16, countleading32, countleading64, countleading128
value = 0b00001111_i32
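# The second argument says whether a zero input may be treated as undefined
count = Intrinsics.countleading32(value, false)

puts "Leading zeros: #{count}"  # => 28 (32 bits minus the 4 significant bits)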
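The count depends on the integer width, not only on the value. A small sketch, assuming that the 8-bit variant takes an Int8 in the same way the 32-bit variant takes an Int32, and that passing false as the second argument makes a zero input well defined:

```crystal
# Same bit pattern, different widths
puts Intrinsics.countleading8(0b00001111_i8, false)   # => 4
puts Intrinsics.countleading32(0b00001111_i32, false) # => 28

# With false as the second argument, zero yields the full bit width
puts Intrinsics.countleading32(0_i32, false) # => 32
```

If you want the 4 that the bit pattern suggests, pick the intrinsic whose width matches the data you are working with.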

Count Trailing Zeros

  • counttrailing8, counttrailing16, counttrailing32, counttrailing64, counttrailing128
value = 0b11110000_i32
count = Intrinsics.counttrailing32(value, false)

puts "Trailing zeros: #{count}"  # => 4

Conclusion

Crystal still lacks extensive documentation in many languages, but DeepWiki is a reliable source for answers to most questions. This article is based on what I’ve learned from DeepWiki, and all code examples have been tested to ensure they work correctly. I highly recommend it.

That’s all for now — happy hacking with Crystal!


This post was translated from Japanese to English by ChatGPT.
Click here to see the original post.
