Introduction
When you want to make your code run significantly faster, or just want to explore how computers work at a lower level, you might find yourself curious about writing instructions directly for the CPU. In Crystal, you can do this using inline assembly.
Crystal is a programming language built on top of the LLVM compiler infrastructure, which gives it access to many of LLVM's powerful features. For low-level programming, Crystal provides both intrinsic functions (the Intrinsics module) and the asm syntax.
The asm Syntax
Crystal supports writing inline assembly using the asm keyword.
The official documentation is the asm page of the Crystal language reference.
The basic syntax is:
asm("template" : outputs : inputs : clobbers : flags)
- template: Assembly code using LLVM's integrated assembler syntax (AT&T style by default on x86)
- outputs: Output operands
- inputs: Input operands
- clobbers: Registers that may be modified (or "memory" when the code writes to memory)
- flags: Optional flags ("volatile", "alignstack", "intel", "unwind")
This colon-separated syntax is quite unusual in Crystal and comes from GCC's inline assembly syntax.
Let’s look at some examples.
NOP Instruction
asm("nop")
Setting a Value Using an Output Operand
dst = uninitialized Int32
asm("mov $$10, $0" : "=r"(dst))
puts dst # => 10
Note that $$10 is an immediate literal value ($$ emits a single literal $, so the assembler sees $10), and $0 is a placeholder for the output operand.
Using uninitialized Int32 is optional; initializing with dst = 0 works as well.
Using an Input Operand
src = 10
dst = 0
asm("mov $1, $0" : "=r"(dst) : "r"(src))
puts dst # => 10
Using Multiple Input Operands
a = 10
b = 20
c = uninitialized Int32
# "0"(a) places the input a in the same register as output $0, so add computes a + b in place
asm("add $2, $0" : "=r"(c) : "0"(a), "r"(b))
puts c # => 30
Using Multiple Output Operands
dst1 = uninitialized Int32
dst2 = uninitialized Int32
asm("
mov $$10, $0
mov $$20, $1" : "=r"(dst1), "=r"(dst2))
puts dst1 # => 10
puts dst2 # => 20
Using Intel Syntax
You can also use Intel-style syntax:
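dst = uninitialized Int32
# The store goes through a pointer rather than an output operand, so declare a "memory" clobber
# and mark the block "volatile" to keep the optimizer from dropping it in release builds
asm("mov dword ptr [$0], 10" :: "r"(pointerof(dst)) : "memory" : "intel", "volatile")
puts dst # => 10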
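The clobbers field matters whenever the template touches a register that is not listed as an operand. The following is a minimal sketch, assuming an x86-64 target and the default AT&T syntax; the choice of eax as a scratch register and the names value and doubled are illustrative assumptions, not taken from the official documentation.

```crystal
# Sketch only: assumes an x86-64 target (AT&T syntax, the default).
# `value` and `doubled` are illustrative names.
{% if flag?(:x86_64) %}
  value = 21
  doubled = uninitialized Int32

  # The template uses eax as a scratch register, so eax is listed as a
  # clobber; "cc" declares that `add` modifies the flags register.
  # "volatile" asks LLVM not to optimize the block away.
  asm("mov $1, %eax
       add %eax, %eax
       mov %eax, $0" : "=r"(doubled) : "r"(value) : "eax", "cc" : "volatile")

  puts doubled # => 42
{% else %}
  puts "skipped: this sketch targets x86-64 only"
{% end %}
```

Listing eax as a clobber keeps the register allocator from assigning it to either operand, which is why the template can overwrite it safely.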
Intrinsics
For relatively simple operations, LLVM provides intrinsics. These functions are highly optimized, platform-independent, and often compatible with Crystal’s interpreter. However, for most basic operations, Crystal's standard library already provides efficient implementations, so using intrinsics does not always yield performance benefits.
Available intrinsics are defined in the Intrinsics module.
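To illustrate the point about the standard library, here is a short sketch of everyday methods that cover the same ground as several intrinsics. It assumes a recent Crystal version; the variable names are only for illustration.

```crystal
value = 0b11010110_i32

# Standard-library counterparts of the bit-counting intrinsics.
puts value.popcount                  # => 5
puts 15.leading_zeros_count          # => 28 (Int32 is 32 bits wide)
puts 0b11110000.trailing_zeros_count # => 4

# Slice methods cover the common memcpy/memset use cases with bounds checks.
src = Slice(UInt8).new(4) { |i| i.to_u8 }
dest = Slice(UInt8).new(4, 0_u8)
src.copy_to(dest)
puts dest.to_a # => [0, 1, 2, 3]
dest.fill(0xFF_u8)
puts dest.to_a # => [255, 255, 255, 255]
```

These methods are usually the better default; the raw intrinsics are mostly useful when you need exact control, for example over the volatile flag.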
Common Intrinsic Functions
memcpy: Copy memory
src = Slice(UInt8).new(10) { |i| i.to_u8 }
dest = Slice(UInt8).new(10, 0_u8)
Intrinsics.memcpy(dest, src, 10, is_volatile: false)
puts "Copied: #{dest}"
memmove: Move memory with overlap support
buffer = Slice(UInt8).new(10) { |i| i.to_u8 }
Intrinsics.memmove(buffer.to_unsafe + 3, buffer.to_unsafe, 5, is_volatile: false)
puts "Moved: #{buffer}"
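To make the overlap behaviour concrete, the sketch below repeats the same call and compares the buffer against the result worked out by hand. The expected array is my own calculation, not output quoted from the original post.

```crystal
buffer = Slice(UInt8).new(10) { |i| i.to_u8 }

# Source is bytes 0..4 and destination is bytes 3..7, so the ranges overlap.
Intrinsics.memmove(buffer.to_unsafe + 3, buffer.to_unsafe, 5, is_volatile: false)

# Bytes 3..7 now hold the original bytes 0..4; the rest is untouched.
expected = [0, 1, 2, 0, 1, 2, 3, 4, 8, 9]
puts buffer.to_a.map(&.to_i) == expected # => true
```

With memcpy the result for overlapping ranges is not guaranteed, which is the reason memmove exists.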
memset: Initialize memory
buffer = Slice(UInt8).new(10, 0_u8)
Intrinsics.memset(buffer, 0xFF_u8, 10, is_volatile: false)
puts "Set: #{buffer}"
debugtrap: Trigger debugger trap
Intrinsics.debugtrap
pause: CPU pause (works on x86/x64 and AArch64)
Intrinsics.pause
This is often used internally in Crystal's Mutex or SpinLock implementations.
read_cycle_counter: Read the CPU cycle counter
cycles = Intrinsics.read_cycle_counter
puts "Cycles: #{cycles}"
To observe it in action:
loop do
cycles = Intrinsics.read_cycle_counter
puts "Cycles: #{cycles}"
sleep 1.second
end
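A more typical use is to read the counter before and after a piece of work and take the difference. This is a rough sketch, not a proper benchmark: the loop is an arbitrary workload chosen for illustration, and on some targets the counter may be unavailable or always read as 0.

```crystal
start = Intrinsics.read_cycle_counter

# Arbitrary workload to measure; &+= is wrapping addition.
sum = 0_u64
1_000_000.times { |i| sum &+= i.to_u64 }

# &- wraps instead of raising if the counter ever rolled over.
elapsed = Intrinsics.read_cycle_counter &- start
puts "sum = #{sum}, elapsed cycles: #{elapsed}"
```

The number varies from run to run and between CPUs, so it is best used for relative comparisons on the same machine.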
Bit Manipulation Intrinsics
Bit Reversal
- bitreverse8, bitreverse16, bitreverse32, bitreverse64, bitreverse128
value = 0b01101001_u8 # written with all 8 bits so the reversal is easy to follow
result = Intrinsics.bitreverse8(value)
puts "Reversed: #{result.to_s(2)}" # => 10010110
Byte Swap
- bswap16, bswap32, bswap64, bswap128
value = 0x12345678_u32
result = Intrinsics.bswap32(value)
puts "Swapped: 0x#{result.to_s(16)}" # => 0x78563412
Population Count
- popcount8, popcount16, popcount32, popcount64, popcount128
value = 0b11010110_i32
count = Intrinsics.popcount32(value)
puts "Bit count: #{count}" # => 5
Count Leading Zeros
- countleading8, countleading16, countleading32, countleading64, countleading128
value = 0b00001111_i32
count = Intrinsics.countleading32(value, false)
puts "Leading zeros: #{count}" # => 28 (counted across all 32 bits of the Int32)
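The second argument corresponds to the flag on LLVM's llvm.ctlz intrinsic that says whether a zero input produces an undefined result; passing false makes a zero input return the full bit width. As a small sketch of a practical use, the hypothetical helper floor_log2 below (my own name, not part of Crystal) derives the position of the highest set bit from the leading-zero count.

```crystal
# Hypothetical helper: floor(log2(x)) for a positive Int32.
def floor_log2(x : Int32) : Int32
  raise ArgumentError.new("x must be positive") unless x > 0
  # The highest set bit sits at index 31 minus the number of leading zeros.
  31 - Intrinsics.countleading32(x, false)
end

puts floor_log2(1)    # => 0
puts floor_log2(1000) # => 9

# With the flag set to false, a zero input yields the bit width.
puts Intrinsics.countleading32(0, false) # => 32
```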
Count Trailing Zeros
- counttrailing8, counttrailing16, counttrailing32, counttrailing64, counttrailing128
value = 0b11110000_i32
count = Intrinsics.counttrailing32(value, false)
puts "Trailing zeros: #{count}" # => 4
Conclusion
Crystal still lacks extensive documentation in many languages, but DeepWiki is a reliable source for answers to most questions. This article is based on what I’ve learned from DeepWiki, and all code examples have been tested to ensure they work correctly. I highly recommend it.
That’s all for now — happy hacking with Crystal!
This post was translated from Japanese to English by ChatGPT.
Click here to see the original post.