This document covers some of the design decisions and internals of the tool. To get started with the bindings, see the usage documentation.
Rz-pipe, the currently recommended way to script Rizin, only works with commands exposed to the Rizin shell. Although it can do everything the Rizin shell can, it cannot match the full Rizin C API in performance, feature-completeness, or type guarantees. The C API, on the other hand, is more difficult to work with, especially for one-off scripts. Rz-bindgen seeks to be a middle ground, making the C API accessible from other programming languages. Python is the primary target for rz-bindgen, as it is usable for both scripts and plugins, and has been incorporated successfully in other reverse-engineering tools.
Many languages already have tools for creating bindings to C/C++, such as rust-bindgen for Rust or CLIF for Python. However, these tools often rely on mapping C++ constructs to their own, and require extra work to create idiomatic bindings for plain C code. Like many of these tools, rz-bindgen parses C headers and generates bindings as output. However, rz-bindgen targets one project and multiple languages, rather than one language and multiple projects. This allows rz-bindgen to make use of Rizin-specific annotations, such as the RZ_DEPRECATE C macros.
See this post on the Rizin blog for more details on the thought process behind my proposal and my implementation ideas from before I started the task.
I considered my primary options for parsing the C headers to be tree-sitter and libclang. Even though I wrote about tree-sitter in the Rizin GSoC announcement blogpost, the integrated preprocessor and semantic analysis led me to choose libclang’s Python bindings.
C Structs and Functions
Once a header is parsed, C data structures are grouped with functions that operate on them. In this snippet from rz-bindgen, the RzAnalysis struct from the rz_analysis.h header is grouped with functions that have the rz_analysis_ prefix. In the generated Python bindings, these groupings are mapped to object-oriented classes, with the RzAnalysis class containing the grouped functions as its methods. The RzAnalysis class also makes all the fields of the C struct accessible except for leaddrs (which is ignored as per the ignore_fields argument) and type_links (which is renamed as per the
rz_analysis = Class(
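The grouping described above can be imitated in a few lines of plain Python. This is only a sketch under stated assumptions: ClassSketch and build_class are hypothetical names, not rz-bindgen's real Class API; the rename_fields parameter and the new name typelinks are made up for illustration; and the field and function lists are hand-written sample input rather than data parsed from rz_analysis.h.

```python
from dataclasses import dataclass, field


@dataclass
class ClassSketch:
    """Hypothetical stand-in for a generated class; not rz-bindgen's real Class."""
    struct: str
    fields: dict = field(default_factory=dict)   # exposed name -> C field name
    methods: dict = field(default_factory=dict)  # method name -> C function name


def build_class(struct, prefix, struct_fields, functions,
                ignore_fields=(), rename_fields=None):
    """Group a struct with every function that shares its prefix."""
    rename_fields = rename_fields or {}
    cls = ClassSketch(struct)
    for c_field in struct_fields:
        if c_field in ignore_fields:
            continue  # ignored fields are not exposed at all
        # Renamed fields are exposed under the new name, others keep their own.
        cls.fields[rename_fields.get(c_field, c_field)] = c_field
    for c_func in functions:
        if c_func.startswith(prefix):
            # Strip the prefix so rz_analysis_free becomes the method "free".
            cls.methods[c_func[len(prefix):]] = c_func
    return cls


# Sample input written by hand for illustration (assumption, not parsed output).
analysis = build_class(
    "RzAnalysis", "rz_analysis_",
    struct_fields=["bits", "leaddrs", "type_links"],
    functions=["rz_analysis_new", "rz_analysis_free", "rz_core_new"],
    ignore_fields={"leaddrs"},
    rename_fields={"type_links": "typelinks"},  # hypothetical new name
)
print(sorted(analysis.methods))
print(sorted(analysis.fields))
```

Functions without the prefix, such as rz_core_new in the sample input, are simply left for another class to claim.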
Rz-bindgen is designed to support multiple backends to generate bindings for a variety of languages. A backend takes the
Class objects created in the transformation step and generates output. There are, at the time of writing, a SWIG backend and a Sphinx backend.
The SWIG backend is currently only used for Python bindings, but SWIG targets other languages too, such as Java and OCaml. Supporting them in rz-bindgen should be relatively simple. The Sphinx backend generates documentation for the Python bindings and can be viewed here.
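To make the backend idea more concrete, here is a minimal sketch of what a pluggable backend interface could look like. The Backend, SwigSketchBackend, and SphinxSketchBackend names are hypothetical and do not mirror rz-bindgen's actual classes, and the emitted text is a skeleton for illustration rather than a usable SWIG interface file or complete Sphinx page.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface: takes class descriptions, returns generated text."""

    @abstractmethod
    def generate(self, classes: dict) -> str:
        ...


class SwigSketchBackend(Backend):
    """Emits a %extend skeleton per class (illustrative, not valid bindings)."""

    def generate(self, classes):
        out = []
        for cls_name, methods in classes.items():
            out.append(f"%extend {cls_name} {{")
            for method, c_func in methods.items():
                out.append(f"    /* {method} -> {c_func} */")
            out.append("}")
        return "\n".join(out)


class SphinxSketchBackend(Backend):
    """Emits Sphinx Python-domain directives for the same class descriptions."""

    def generate(self, classes):
        out = []
        for cls_name, methods in classes.items():
            out.append(f".. py:class:: {cls_name}")
            out.append("")
            for method in methods:
                out.append(f"   .. py:method:: {method}()")
        return "\n".join(out)


# The same input drives both backends (sample data, assumed for illustration).
classes = {"RzAnalysis": {"free": "rz_analysis_free"}}
swig_text = SwigSketchBackend().generate(classes)
sphinx_text = SphinxSketchBackend().generate(classes)
print(swig_text)
print(sphinx_text)
```

The point of the split is that the transformation step runs once, and each backend only decides how to render its result.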
One of the main challenges in translating the C headers was the existence of generic container types. Rizin uses types like RzList and RzVector to represent a linked-list and dynamic array respectively and, being written in C, uses void* for the type of the data contained within. This means that trying to use these types from Python would be difficult, as their elements lack the type information to generate methods. Fortunately, Rizin developers were already annotating the types of these functions for developer ergonomics using comments such as RzList /*<RzAnalysisBlock *>*/ *bbs.
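As a rough illustration of how such a comment carries the element type, the sketch below pulls the container, element type, and variable name out of an annotated declaration with a regular expression. This is an assumption-laden simplification, not how rz-bindgen does it: the tool works on libclang's output, and parse_annotated and the pattern are hypothetical helpers written for this example.

```python
import re

# Matches declarations shaped like: RzList /*<RzAnalysisBlock *>*/ *bbs
ANNOTATED = re.compile(
    r"(?P<container>Rz\w+)\s*/\*<\s*(?P<elem>[^>]+?)\s*>\*/\s*\*\s*(?P<name>\w+)"
)


def parse_annotated(decl):
    """Return (container, element type, variable name), or None if unannotated."""
    match = ANNOTATED.search(decl)
    if match is None:
        return None
    # Drop the trailing pointer star so "RzAnalysisBlock *" -> "RzAnalysisBlock".
    elem = match.group("elem").rstrip("* ")
    return match.group("container"), elem, match.group("name")


def specialized_name(container, elem):
    """Build a name in the style of RzList_RzBinSymbol."""
    return f"{container}_{elem}"


parsed = parse_annotated("RzList /*<RzAnalysisBlock *>*/ *bbs")
print(parsed)
print(specialized_name(parsed[0], parsed[1]))
```

A declaration without the comment, such as a bare RzList pointer, yields None, which is exactly the case where no typed specialization can be generated.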
This allows bindings to use container types in a type-safe manner. In this Python example from rz-bindgen, a specialized RzList_RzBinSymbol is created, and RzBinSymbols are appended to it. Appending any other type will result in an error.
syms = rizin.RzList_RzBinSymbol()
for sym in self.loader.main_object.symbols:
    binsym = rizin.RzBinSymbol()
    binsym.thisown = False
    binsym.name = sym.name
    binsym.type = rizin.RZ_BIN_TYPE_FUNC_STR
    binsym.paddr = sym.linked_addr
    binsym.vaddr = sym.rebased_addr
    binsym.size = sym.size
The snippet above is from an example of implementing an RzBinPlugin in Python. See the bin_plugin documentation for more details.
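The type-safety behavior of a specialized container can be mimicked in pure Python to show the idea. This is only an analogy: the real bindings are generated through SWIG and wrap the C RzList, whereas specialize_list and the BinSymbol and Section classes below are hypothetical stand-ins invented for this sketch.

```python
def specialize_list(elem_type):
    """Create a list class that only accepts instances of elem_type."""

    class TypedList:
        def __init__(self):
            self._items = []

        def append(self, item):
            # Reject anything that is not the declared element type.
            if not isinstance(item, elem_type):
                raise TypeError(
                    f"expected {elem_type.__name__}, got {type(item).__name__}"
                )
            self._items.append(item)

        def __len__(self):
            return len(self._items)

        def __iter__(self):
            return iter(self._items)

    TypedList.__name__ = f"RzList_{elem_type.__name__}"
    return TypedList


class BinSymbol:  # hypothetical stand-in for a wrapped C struct
    pass


class Section:  # a different, incompatible element type
    pass


SymbolList = specialize_list(BinSymbol)
syms = SymbolList()
syms.append(BinSymbol())  # accepted

try:
    syms.append(Section())  # wrong element type
except TypeError as err:
    print("rejected:", err)
```

Generating one class per element type keeps the check local to the container, so scripts get an error at the point of insertion rather than a crash later in C code.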
The Python bindings also make it easier to access Rizin internals when writing scripts, as can be seen in the rz_cmd example (see the cmd documentation for more details). One key feature is the ability to register a Rizin command backed by a Python function, like so:
def print_function_info(fn: rizin.RzAnalysisFunction):
    print("number of xrefs from:", len(fn.get_xrefs_from()))
    print("number of xrefs to:", len(fn.get_xrefs_to()))
core.register_group("u", "A custom group for user-defined commands")
The Rizin plugin registers Python as an RzLang, allowing users to load Python scripts on the fly. It also adds a core variable to the rizin Python module, allowing scripts that import it to access Rizin’s own
The coverage of the bindings is currently incomplete: it is not yet possible to use every part of the C API. I hope this will change as I get more eyes on the project. I also hope to improve the Rizin plugin and finalize the Cutter plugin.
In the long term, I hope to add bindings for extensions such as rz-ghidra which expose their functions. This could allow access to Ghidra’s P-Code and decompiler once implemented.
If you need help with rz-bindgen or wish to build a project using the generated bindings, feel free to reach me on the Rizin Mattermost at @wingdeans (we have an IRC bridge too).