Hello. I’m wingdeans, a participant in GSoC 2022 with Rizin. For the past few months, I’ve been working on rz-bindgen, a framework for making Rizin scriptable from other languages.

This document covers some of the design decisions and internals of the tool. To get started with the bindings, see the usage documentation.

Rationale

Rz-pipe, the currently recommended way to script Rizin, only works with commands exposed to the Rizin shell. Although it can do everything the Rizin shell can, it cannot match the full Rizin C API in performance, feature-completeness, or type guarantees. The C API, on the other hand, is more difficult to work with, especially for one-off scripts. Rz-bindgen seeks a middle ground by making the C API accessible from other programming languages. Python is the primary target for rz-bindgen, as it is usable for both scripts and plugins and has been incorporated successfully into other reverse-engineering tools.

Design

Many languages already have tools for creating bindings to C/C++, such as rust-bindgen for Rust or CLIF for Python. However, these tools often rely on mapping C++ constructs to their own, and require extra work to create idiomatic bindings for plain C code. Like many of these tools, rz-bindgen parses C headers and generates bindings as output. Unlike them, rz-bindgen targets one project and multiple languages, rather than one language and multiple projects. This allows rz-bindgen to make use of Rizin-specific annotations, such as the RZ_NULLABLE and RZ_DEPRECATE C macros.

See this post on the Rizin blog for more details on the thought process behind my proposal and my implementation ideas from before I started the task.

Implementation

I considered my primary options for parsing the C headers to be tree-sitter and libclang. Even though I wrote about tree-sitter in the Rizin GSoC announcement blogpost, libclang’s integrated preprocessor and semantic analysis led me to choose its Python bindings.

C Structs and Functions

Once a header is parsed, C data structures are grouped with functions that operate on them. In this snippet from rz-bindgen, the RzAnalysis struct from the rz_analysis.h header is grouped with functions that have the rz_analysis_ prefix. In the generated Python bindings, these groupings are mapped to object-oriented classes, with the RzAnalysis class containing the grouped functions as its methods. The RzAnalysis class also makes all the fields of the C struct accessible except for leaddrs (which is ignored as per the ignore_fields argument) and type_links (which is renamed as per the rename_fields argument).

rz_analysis = Class(
    analysis_h,
    typedef="RzAnalysis",
    ignore_fields={"leaddrs"},
    rename_fields={"type_links": "_type_links"},
)

rz_analysis.add_method("rz_analysis_reflines_get", rename="get_reflines")
rz_analysis.add_prefixed_methods("rz_analysis_")
rz_analysis.add_prefixed_funcs("rz_analysis_")
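To make the grouping idea concrete, here is a minimal sketch of how prefix-based grouping could work. It is not rz-bindgen's actual code: ClassSketch is a hypothetical stand-in for Class, it receives the list of available function names explicitly instead of reading them from a parsed header, and the function list is an illustrative subset.

```python
from dataclasses import dataclass, field


@dataclass
class ClassSketch:
    """Hypothetical stand-in for rz-bindgen's Class; not the real implementation."""

    typedef: str
    # Maps Python method name -> C function name.
    methods: dict = field(default_factory=dict)

    def add_method(self, c_name: str, rename: str) -> None:
        """Bind one C function under an explicitly chosen method name."""
        self.methods[rename] = c_name

    def add_prefixed_methods(self, prefix: str, available: list) -> None:
        """Bind every not-yet-bound C function whose name starts with `prefix`."""
        already_bound = set(self.methods.values())
        for c_name in available:
            if c_name.startswith(prefix) and c_name not in already_bound:
                # Strip the prefix to get the method name.
                self.methods[c_name[len(prefix):]] = c_name


# Function names as they might appear in a parsed header (illustrative subset).
header_functions = [
    "rz_analysis_new",
    "rz_analysis_free",
    "rz_analysis_reflines_get",
    "rz_core_new",
]

analysis = ClassSketch(typedef="RzAnalysis")
analysis.add_method("rz_analysis_reflines_get", rename="get_reflines")
analysis.add_prefixed_methods("rz_analysis_", header_functions)

for py_name, c_name in sorted(analysis.methods.items()):
    print(f"{analysis.typedef}.{py_name} -> {c_name}")
```

In this sketch, explicit renames are registered first so that the prefix pass does not bind the same C function a second time under its stripped name.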

Generation

Rz-bindgen is designed to support multiple backends to generate bindings for a variety of languages. A backend takes the Class objects created in the transformation step and generates output. There are, at the time of writing, a SWIG backend and a Sphinx backend.

The SWIG backend is currently only used for Python bindings, but SWIG also targets other languages, such as Java and OCaml, so supporting them in rz-bindgen should be relatively simple. The Sphinx backend generates documentation for the Python bindings, which can be viewed here.
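The backend split can be pictured with a small sketch. Everything here is an assumption made for illustration: ClassInfo, Backend, SwigLikeBackend and RstLikeBackend are hypothetical names, and the emitted text only imitates the general shape of SWIG's %extend directive and Sphinx's py:class and py:method directives, not what rz-bindgen really generates.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ClassInfo:
    """Hypothetical, simplified description of one generated class."""

    name: str
    methods: list


class Backend(ABC):
    """Hypothetical backend interface: class descriptions in, text out."""

    @abstractmethod
    def generate(self, classes: list) -> str:
        ...


class SwigLikeBackend(Backend):
    """Emits SWIG-style %extend blocks with placeholder comments inside."""

    def generate(self, classes: list) -> str:
        lines = []
        for cls in classes:
            lines.append(f"%extend {cls.name} {{")
            lines.extend(f"    /* method: {m} */" for m in cls.methods)
            lines.append("}")
        return "\n".join(lines)


class RstLikeBackend(Backend):
    """Emits Sphinx Python-domain directives for the same classes."""

    def generate(self, classes: list) -> str:
        lines = []
        for cls in classes:
            lines.append(f".. py:class:: {cls.name}")
            lines.append("")
            for m in cls.methods:
                lines.append(f"   .. py:method:: {m}()")
                lines.append("")
        return "\n".join(lines).rstrip()


classes = [ClassInfo("RzAnalysis", ["get_reflines"])]
for backend in (SwigLikeBackend(), RstLikeBackend()):
    print(backend.generate(classes))
    print()
```

Both backends consume the same class descriptions, which is the property that would make adding another target language mostly a matter of writing one more generate method.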

Generics

One of the main challenges in translating the C headers was the existence of generic container types. Rizin uses types like RzList and RzVector to represent a linked list and a dynamic array respectively and, being written in C, uses void* for the type of the data contained within. This means that trying to use these types from Python would be difficult, as their elements lack the type information needed to generate methods. Fortunately, Rizin developers were already annotating the element types of these containers for developer ergonomics, using comments such as RzList /*<RzAnalysisBlock *>*/ *bbs.
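As a rough illustration of how such a comment could be turned into a specialized type name, the sketch below uses a regular expression on the declaration text. This is an assumption made for the example: rz-bindgen works on libclang's parse results, and parse_annotation and specialized_name are hypothetical helpers, not part of the tool.

```python
import re
from typing import Optional, Tuple

# Matches "<Container> /*<ElementType>*/", e.g. "RzList /*<RzAnalysisBlock *>*/".
ANNOTATION = re.compile(
    r"(?P<container>\w+)\s*/\*<\s*(?P<element>[^>]*?)\s*>\*/"
)


def parse_annotation(decl: str) -> Optional[Tuple[str, str]]:
    """Return (container, element type) if the declaration is annotated."""
    match = ANNOTATION.search(decl)
    if match is None:
        return None
    return match.group("container"), match.group("element")


def specialized_name(container: str, element: str) -> str:
    """Build a name such as RzList_RzAnalysisBlock from the two parts."""
    words = element.replace("*", " ").split()
    # Drop qualifiers that do not belong in an identifier.
    words = [w for w in words if w not in ("const", "struct")]
    return f"{container}_{'_'.join(words)}"


decl = "RzList /*<RzAnalysisBlock *>*/ *bbs"
parsed = parse_annotation(decl)
if parsed is not None:
    print(specialized_name(*parsed))
```

A declaration without the comment, such as RzList *bbs, yields None here, so an unannotated container would have to fall back to an untyped wrapper.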

These annotations allow bindings to use container types in a type-safe manner. In this Python example from rz-bindgen, a specialized RzList_RzBinSymbol is created, and RzBinSymbols are appended to it. Appending any other type will result in an error.

syms = rizin.RzList_RzBinSymbol()

for sym in self.loader.main_object.symbols:
    binsym = rizin.RzBinSymbol()
    binsym.thisown = False
    binsym.name = sym.name
    binsym.type = rizin.RZ_BIN_TYPE_FUNC_STR
    binsym.paddr = sym.linked_addr
    binsym.vaddr = sym.rebased_addr
    binsym.size = sym.size
    syms.append(binsym)
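The type check on append can be imitated in pure Python. The sketch below is only an analogy under stated assumptions: TypedList and Symbol are hypothetical classes, and the real check in the bindings happens in the generated wrapper code rather than in a Python class like this.

```python
class TypedList:
    """Hypothetical pure-Python analogue of a specialized container."""

    def __init__(self, element_type):
        self._element_type = element_type
        self._items = []

    def append(self, item):
        # Reject anything that is not an instance of the declared element type.
        if not isinstance(item, self._element_type):
            raise TypeError(
                f"expected {self._element_type.__name__}, "
                f"got {type(item).__name__}"
            )
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


class Symbol:
    """Stand-in element type for the example."""

    def __init__(self, name):
        self.name = name


symbols = TypedList(Symbol)
symbols.append(Symbol("main"))

try:
    symbols.append("not a symbol")
except TypeError as err:
    print("rejected:", err)
```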

Additional Features

The RzList_RzBinSymbol snippet above is from an example of implementing an RzBinPlugin in Python. See the bin_plugin documentation for more details.

The Python bindings also make it easier to access Rizin internals when writing scripts, as can be seen in the rz_cmd example (see the cmd documentation for more details). One key feature is the ability to register a Rizin command backed by a Python function, like so:

def print_function_info(fn: rizin.RzAnalysisFunction):
    print("name:", fn.name)
    print("number of xrefs from:", len(fn.get_xrefs_from()))
    print("number of xrefs to:", len(fn.get_xrefs_to()))
    return True

core.register_group("u", "A custom group for user-defined commands")
core.register_command("uf", print_function_info)

The Rizin plugin registers Python as an RzLang, allowing users to load Python scripts on the fly. It also adds a core variable to the rizin Python module, allowing scripts that import it to access Rizin’s own RzCore.

Reflections

The coverage of the bindings is currently incomplete: it is not yet possible to use every part of the C API. I hope this will change as I get more eyes on the project. I also hope to improve the Rizin plugin and finalize the Cutter plugin.

In the long term, I hope to add bindings for extensions that expose their functions, such as rz-ghidra. This could allow access to Ghidra’s P-Code and decompiler once implemented.

I would like to thank my GSoC mentors XVilka and megabeets, as well as Rizin core contributors ret2libc and deroad.

If you need help with rz-bindgen or wish to build a project using the generated bindings, feel free to reach me on the Rizin Mattermost at @wingdeans (we have an IRC bridge too).