TL;DR Jump to the Ideas list.

Introduction

This year is the second time we participate as the Rizin fork, continuing a tradition that goes back to 2015 (as the radare2 project).

Mentors

Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’22. They have already guided participants in GSoC and RSoC in past years. Please feel free to reach out to any of them if you need help selecting a project.

  • Anton Kochkov Mattermost: xvilka – @akochkov
  • Riccardo Schirone Mattermost: ret2libc
  • Florian Märkl Mattermost/Telegram: @thestr4ng3r – @thestr4ng3r
  • Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_
  • Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad
  • Yossi Zapesochini Mattermost/Telegram: @yossizap
  • And many others

Development methodology

Currently, all repositories are hosted under our main GitHub organization account, and bugs are tracked in GitHub issues too. We mostly use our own Mattermost instance, IRC, and Telegram for communication. We have a test suite (running on GitHub Actions, Travis CI, AppVeyor, and SourceHut) to verify that all features still work and that pull requests or commits don’t break anything, to ensure support for different operating systems (Linux, macOS, Windows, FreeBSD, OpenBSD) and different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples, we use ASCIInema to record the sessions.

See also our guides for corresponding projects:

For those who want to get introduced to the Rizin codebase and practices, we recommend picking one of the easy issues for Rizin or Cutter to start with.

License

Rizin is modular: it aims to make all its elements and features easily reusable from other projects. LGPL3 is the minimum license requirement for getting code merged into Rizin. Contributors can also choose Apache, BSD, MIT, Public Domain, or other similar licenses. We exclude GPL as a valid license for the project because we aim to support proprietary software that uses Rizin, while protecting our free codebase.

Instructions for participants

Participants who want to apply to the Rizin project for the Google Summer of Code 2022 must submit a small pull request accomplishing one of the microtasks (see below) as part of their application. You can also choose any of the GitHub issues for Rizin, as long as it is big enough to be a qualification task and still small enough to be finished within a couple of weeks. To help participants understand how to contribute to the project, there are issues marked as “good first issue” for both Rizin and Cutter.

Programming languages

Most of Rizin is written in C (conforming to the C99 standard), and hence we expect participants to be familiar with the C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and the basics of the Qt framework.

  1. Read Google’s instructions for participating
  2. Grab any project from the list of ideas that you’re interested in (or propose your own).
  3. Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you.
  4. Submit it using Google’s web interface.

Participant proposal guidelines

  1. Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing.
  2. Try to split the entire GSoC period into tasks, and each task into subtasks. It helps us understand how you plan to accomplish your goals, but more importantly, it’ll help you understand the task deeply enough before starting and prioritize the important things to do first.
  3. Please note how much time a day/week you are able to spend on this project.
  4. Please specify which category you are applying for: a medium task or an extended-deadline one.
  5. Specify your timezone so that we can assign you a mentor in the same one, to ease communication.
  6. Submit your proposal early, not at the last minute!
  7. Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two participants for one task) can be resolved.

Project Ideas

Rizin

RzIL uplifting migration (350 hour project)

Rizin has had an intermediate language for over a decade. Major architectures support uplifting to ESIL. During RSoC 2021, the initial version of the new intermediate language, which is based on BAP’s Core Theory, was implemented. In the following months it was improved, and some of the architectures were ported to use RzIL instead of ESIL. The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL, or to add RzIL support for architectures that have no uplifting at all.

Tasks

  • Implement RzIL uplifting for any non-trivial architecture, preferably one that is already supported by ESIL
  • Improve the integration with analysis (variables and types differences) for the chosen architecture
  • Write the test cases for Rizin regression tests and improve the results.
  • Update and use rz-tracetest for the chosen architectures
  • Implement necessary commands and APIs in Rizin for visual representation of the IL tree
  • Implement standard and graph views in Cutter for the IL output (optional)

Because uplifting is sensitive to precision, it’s important to follow these steps:

  • For every single lifted opcode, have at the very least one asm test in test/db/asm/... containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g. malloc() error handling).
  • Run rz-tracetest on real traces. It’s also possible to write custom assembly programs that execute specific obscure instructions where it’s hard to be sure that they were implemented correctly on many random inputs and then feed these executions into rz-tracetest.
  • Add a few rz-test command tests that emulate some code snippets in Rizin. For example, a simple decryption loop to check the overall integration in Rizin, or tests for specific edge cases (like running a division by zero).

Skills

The participant should know C and some C++, as well as be familiar with the basics of program analysis. Experience with other intermediate languages, SAT/SMT, and mathematical logic is a plus.

Difficulty

Medium

Benefits for the participant

The participant will understand the state of the art of intermediate language research and its relation to mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with both symbolic and concrete emulation during the implementation process.

Benefits for the project

Migrating most architectures will help to deprecate and remove the outdated ESIL and will help improve the analysis precision. Adding uplifting for new architectures that weren’t even supported by ESIL will improve the analysis to an even greater degree.

Assess requirements for midterm/final evaluation

  • 1st term: finish the RzIL uplifting for the chosen architecture with basic instruction-level tests
  • Final term: implement all changes in the analysis code, add more complex integration tests with types analysis

Mentors

  • xvilka
  • thestr4ng3r

Links/Resources

Debug information handling improvements (175 hour project)

Rizin already supports most of the DWARF and PDB features, including cross-platform parsing of both. However, this information is usually just printed to aid the reverse engineering process and is not actually put to its best use. For example, you can’t use it to configure a breakpoint, nor can it be used to access variables within a function during debugging. Moreover, it is becoming more and more common to store DWARF information in separate files, either shipped as a separate file or downloaded on the fly with debuginfod. Rizin does not support this kind of DWARF file yet.

Your task would be to improve the parsing support of both by fixing smaller bugs, to add support for separate DWARF files and debuginfod, and to enhance breakpoint integration and variable/structure printing in debugging mode with the source information gathered from DWARF/PDB.

Task

  1. Support loading DWARF information from separate files and debuginfod
  2. Unify source lines/types information access for DWARF, PDB, dSYM and refactor/fix parsing code as necessary
  3. Integrate source line and types/variables information with the analysis (optional)
  4. Integrate source line and types/variables with printing with p commands in the debug mode
  5. Integrate source line and types/variables with breakpoint commands and APIs
  6. Parsing performance improvements

Skills

  • Good knowledge of the C language
  • Some experience in debugging with GDB or LLDB
  • Basic knowledge of at least one of the following formats: ELF, DWARF, PDB, PE

Difficulty

Hard

Benefits for the participant

The participant will understand how high-level features of debuggers work, as well as gain skills in the software architecture of a large, modular C project.

Assess requirements for midterm/final evaluation

  • 1st term: debuginfod and source line information refactoring are implemented
  • Final term: Integration of variable information with the debug and printing commands is implemented

Mentors

  • xvilka
  • thestr4ng3r
  • ret2libc

Links/Resources

Thread-safety and multithreading (175 hour project)

Currently, Rizin is not completely thread-safe, either internally or when used as a library in a multithreaded application. The goal of this project is to eliminate global state in favor of contexts, eliminate singletons (e.g. RzCons), and use thread-safe external functions and dependencies.
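The idea of replacing a global singleton with an explicit context can be sketched as follows. This is only an illustration in Python rather than Rizin’s C; `ConsContext`, `analyze`, and `run_isolated` are hypothetical names and are not part of the Rizin API.

```python
import threading


class ConsContext:
    """Hypothetical console context: all state lives in the instance, none in globals."""

    def __init__(self):
        self._lock = threading.Lock()
        self._lines = []

    def println(self, text):
        # The lock guards this context only; other contexts are never touched.
        with self._lock:
            self._lines.append(text)

    def flush(self):
        # Return and clear everything printed through this context.
        with self._lock:
            lines, self._lines = self._lines, []
        return lines


def analyze(ctx, name, count):
    """Stand-in for a worker that reports progress through its own context."""
    for i in range(count):
        ctx.println(f"{name}: step {i}")


def run_isolated(count=100):
    """Run two workers in parallel, each with a separate context."""
    ctx_a, ctx_b = ConsContext(), ConsContext()
    threads = [
        threading.Thread(target=analyze, args=(ctx_a, "a", count)),
        threading.Thread(target=analyze, args=(ctx_b, "b", count)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ctx_a.flush(), ctx_b.flush()
```

Because each worker receives its context as an argument, the output of one instance can never leak into the other, which is the property that tests with multiple instances would check.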

Task

  1. Migrate from thread-unsafe system and external dependencies
  2. Eliminate global state inside RzCons and use of the singleton
  3. Make RzBin thread-safe
  4. Make RzAnalysis thread-safe
  5. Make RzCore thread-safe
  6. Add tests for using multiple RzCore and RzAnalysis instances
  7. Parallelize some of the RzAnalysis functions using the threading API

Skills

The participant should know C and have experience developing multithreaded applications.

Difficulty

Hard

Benefits for the participant

The participant will understand the hurdles of multithreaded programming, data synchronization, locks, and debugging of such code.

Assess requirements for midterm/final evaluation

  • 1st term: Eliminate thread-unsafe dependencies and remove global state from RzCons and RzBin
  • Final term: Make RzAnalysis and RzCore (optionally) thread-safe

Mentors

  • xvilka
  • thestr4ng3r
  • wargio

Links/Resources

Rewriting GPL-only code (175 hour project)

Currently, some of Rizin’s main features rely on GPL-only code copied from binutils or GCC. The goal is to rewrite all this code under LGPL or any other less restrictive license. This is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.

Tasks

  1. Rewrite the C++ demangler and remove the GPL code
  2. Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code

Good examples of such architectures are:

  • SPARC (there is already capstone-based RzAsm and RzAnalysis plugin but it’s less complete than binutils-based one)
  • Xtensa
  • Tricore
  • SH
  • HPPA (PA-RISC)

Skills

The participant should know C and the basics of C++ to understand the mangling scheme.

Difficulty

Medium

Benefits for the participant

The participant will understand how C++ type information is stored in the names of methods and classes.

Assess requirements for midterm/final evaluation

  • 1st term: Basic demangling for C++ is rewritten under a less restrictive license.
  • Final term: At least one binutils-based architecture is reimplemented under a more permissive license.

Mentors

  • xvilka
  • thestr4ng3r
  • wargio

Links/Resources

Exploitation capabilities improvements (175 hour project)

Since modern architectures now enforce W^X, exploiters use ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why some tools can be used to ease this construction: ImmunityDBG has mona.py, and there are also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It’s a shame that despite having ESIL, Rizin doesn’t have something similar yet. One possible solution would be to build an external plugin or tool that reuses the power of librz and rz-gg. Moreover, it makes sense to think about SROP, COOP, and BROP support.

While the rz-gg tool can create custom shellcode, its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.

Task

  1. Update the shellcode database, improve rz-gg features and documentation
  2. Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like:
register reg1 = 0;
register reg2 = whatever;
register reg3 = reg1 + reg2;
system(reg3);
  3. Write a compiler which uses an SMT solver (Z3, for example) to produce the ropchain.
  4. Support the main architectures: x86, ARM, MIPS, PowerPC
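As a rough illustration of the parsing step, the snippet from the task list could be turned into a small statement list like this. The grammar here is an assumption made up for the sketch (only `register NAME = EXPR;` assignments, `+` expressions, and calls); the names `parse_chain`, `parse_expr`, and `parse_operand` are hypothetical, and `whatever` is simply kept as a symbolic name. Feeding the result to an SMT solver is out of scope here.

```python
import re

# Hypothetical grammar: "register NAME = EXPR" and "FUNC(ARG, ...)".
ASSIGN_RE = re.compile(r"^register\s+([A-Za-z_]\w*)\s*=\s*(.+)$")
CALL_RE = re.compile(r"^([A-Za-z_]\w*)\s*\((.*)\)$")


def parse_operand(token):
    """Turn one token into an int literal or a symbolic name."""
    token = token.strip()
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", token):
        return int(token, 16)
    if re.fullmatch(r"\d+", token):
        return int(token, 10)
    if re.fullmatch(r"[A-Za-z_]\w*", token):
        return token
    raise ValueError(f"bad operand: {token!r}")


def parse_expr(text):
    """Parse 'a', or 'a + b + ...' folded left into nested ('add', lhs, rhs)."""
    operands = [parse_operand(part) for part in text.split("+")]
    expr = operands[0]
    for rhs in operands[1:]:
        expr = ("add", expr, rhs)
    return expr


def parse_chain(source):
    """Parse the whole program into a list of statement tuples."""
    statements = []
    for raw in source.split(";"):
        stmt = raw.strip()
        if not stmt:
            continue
        assign = ASSIGN_RE.match(stmt)
        if assign:
            statements.append(("assign", assign.group(1), parse_expr(assign.group(2))))
            continue
        call = CALL_RE.match(stmt)
        if call:
            args = [parse_expr(a) for a in call.group(2).split(",") if a.strip()]
            statements.append(("call", call.group(1), args))
            continue
        raise ValueError(f"cannot parse statement: {stmt!r}")
    return statements


EXAMPLE = """
register reg1 = 0;
register reg2 = whatever;
register reg3 = reg1 + reg2;
system(reg3);
"""

for statement in parse_chain(EXAMPLE):
    print(statement)
```

A compiler stage would then walk this list, mapping each assignment and call to constraints over the available gadgets.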

Skills

The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.

Difficulty

Advanced

Benefits for the participant

The participant will improve their skills in software exploitation and solvers.

Benefits for the project

This feature would greatly help during exploit development, and people would be able to ditch mona.py for Rizin ;)

Assess requirements for evaluation

  • 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver
  • Final term: Working ropchain compiler, covered by tests and documented in the Rizin book.

Mentors

  • xvilka
  • ret2libc

Links/Resources

Bindings for languages other than C/C++ (175 hour project)

Rizin offers a convenient scripting interface through the rz-pipe APIs, which build upon its command-based interface. While this reduced interface is beneficial and well-suited for many scripting tasks, building more complex applications generally requires direct access to the public C API that Rizin offers. This API can be used directly from C and C++, as is done in Cutter for example, but for other languages no generic bindings exist so far. The goal of this task is to use a bindings generator such as SWIG to expose Rizin’s C API to languages such as Python, Java, or OCaml.

Task

  • Integrate SWIG-generated bindings into Rizin’s build system
  • Write SWIG interfaces for all mature parts of Rizin’s C API
  • Integrate the Python bindings into Cutter’s Python support

Skills

The participant should be comfortable with the C and Python languages, as well as have a deep understanding of common memory management patterns such as ownership and reference counting.

Difficulty

Advanced

Benefits for the participant

The participant will gain experience exposing a C-based API with manual memory management to high-level, object-oriented languages with automatic memory management.

Assess requirements for midterm/final evaluation

  • 1st term: Bindings can be generated as part of the standard Rizin build system and small parts of the core API are already usable.
  • Final term: All relevant parts of the API can be used through bindings and also from within Cutter’s Python interpreter.

Mentors

  • thestr4ng3r
  • xvilka

Links/Resources

Cutter

Plugins and Python High Level API (175 hour project)

Our current public API for plugin authors is somewhat limited. We need to improve a lot of things about our plugin support and take it a few steps further. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin’s API for disassembly, analysis, and other purposes, see the Rizin bindings task above.

Task

  • Expose everything Cutter can offer to plugin authors. This includes a high-level API, integration of the plugin management, etc.
  • Accessing everything from Python (like Blender) - see issue #1662
  • Python integration and IPython console.

Skills

The participant should be comfortable with the C++ and Python languages, and be familiar with the Qt framework.

Difficulty

Advanced

Benefits for the participant

The participant will gain experience creating a suitable API for scripting graphical interface programs.

Benefits for the project

It will greatly improve the scripting experience, make the API more consistent, and make it easier for the community to create Cutter plugins. Moreover, it will simplify testing of the Cutter features.

Assess requirements for midterm/final evaluation

  • 1st term: Design of the high level API and required Rizin changes. Review and implement all missing API functions that are accessible as interface controls.
  • Final term: Implement the way to show the API when hovered over some interface control, create documentation.

Mentors

  • thestr4ng3r
  • Megabeets

Links/Resources

Multi-Tasking and Event-driven architecture (350 hour project)

The information Cutter shows about functions, strings, and imports, as well as the analysis itself, is all produced in Rizin and only displayed in Cutter. Currently, Cutter pulls most information from Rizin only on demand. This is problematic because sometimes the user makes changes (via plugins, the console widget, and more) that affect the information in Rizin, but Cutter doesn’t know about these changes and cannot apply them to the UI. For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ <addr>, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (Edit -> Refresh Contents). The goal of this task is to use an event-driven architecture to overcome this limitation.

In addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.

Tasks

The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. For example, add an event for function creation, section creation, flag deletion, name changes, and more.

  • Add events to all the relevant functions inside Rizin
  • Add support for these events in Cutter and refresh and update the relevant widgets per each event
  • Support analysis in the background and allow the user to start their session while Rizin is analyzing (see #1856, #1574)
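The flow described in these tasks can be sketched with a plain observer pattern. This is only an illustration in Python: `EventBus`, `Core`, `FunctionsWidget`, and the `"function_created"` event name are hypothetical stand-ins and do not reflect the actual rz_events or Cutter interfaces.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical event dispatcher: callbacks are registered per event type."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def hook(self, event_type, callback):
        self._hooks[event_type].append(callback)

    def send(self, event_type, **data):
        # Copy the list so a callback may register new hooks while we iterate.
        for callback in list(self._hooks[event_type]):
            callback(data)


class Core:
    """Stand-in for the Rizin side: every state change emits an event."""

    def __init__(self, bus):
        self.bus = bus
        self.functions = {}

    def create_function(self, addr, name):
        self.functions[addr] = name
        self.bus.send("function_created", addr=addr, name=name)


class FunctionsWidget:
    """Stand-in for a Cutter widget: it updates itself when notified."""

    def __init__(self, bus):
        self.rows = []
        bus.hook("function_created", self.on_function_created)

    def on_function_created(self, event):
        self.rows.append((event["addr"], event["name"]))


bus = EventBus()
widget = FunctionsWidget(bus)
core = Core(bus)
core.create_function(0x401000, "fcn.00401000")
print(widget.rows)
```

The point of the design is that the widget never polls: whoever changes the state (a script, the console, a plugin) goes through the core, and the core notifies every subscriber.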

Skills

The participant should be comfortable with C++ for Cutter and C for Rizin. They should also be familiar with the Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches, is a plus.

Difficulty

Advanced

Benefits for the participant

The participant will gain experience creating complex event-driven software in both C and C++.

Benefits for the project

It will make it possible to work on big files effortlessly in Cutter and will improve analysis quality as well.

Assess requirements for midterm/final evaluation

  • 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter.
  • Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background.

Mentors

  • thestr4ng3r

Heap viewer completion (175 hour project)

Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.

Task

  • Complete Cutter’s implementation of the Windows heap widget #2723
  • Improve the performance of the Windows heap parser
  • Fix Windows heap parsing errors
  • Make the implementation work with remote debugging modes

Skills

The participant should be comfortable with C++ and be familiar with the Qt framework.

Difficulty

Medium

Benefits for the participant

The participant will gain an understanding of how modern runtimes provide the heap for various programs, which will be beneficial for binary exploitation skills.

Benefits for the project

It will greatly improve the debugging and reverse engineering experience for complex programs, and it will also provide a way to design exploitation techniques with the help of Rizin/Cutter.

Assess requirements for midterm/final evaluation

  • 1st term: Design and implement heap visualization widgets, add Rizin tests and fixes
  • Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation.

Mentors

  • xvilka
  • Megabeets
  • yossizap

Links/Resources

Diffing mode (175 hour project)

Binary diffing is one of the most common tasks for a reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. Rizin provides basic diffing features out of the box with the rz-diff tool, but Cutter has no interface to represent similar functionality.

Task

  • Expose basic rz-diff features in Cutter
  • Create an interface to choose two files for diffing
  • Create a way to show the differences in all main widgets:
    • Hexadecimal view
    • Disassembly view
    • Graph view
    • Pseudocode view

Skills

The participant should be comfortable with the C++ language and be familiar with the Qt framework.

Difficulty

Medium

Benefits for the participant

The participant will gain experience creating efficient graphical interfaces.

Benefits for the project

It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.

Assess requirements for midterm/final evaluation

  • 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views.
  • Final term: Implement the diff modes for graph and pseudocode views, create the documentation.

Mentors

  • xvilka
  • Megabeets

Links/Resources

Microtasks

When taking any of the microtasks, please make sure someone isn’t already working on it, and let us know if you are going to work on a particular one.

File formats

Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.

Disassemblers and assemblers

Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.

ELF binary parsing.

Rizin parses a lot of information about the ELF but doesn’t print everything.

Moreover, some information about PLT stubs is not resolved correctly.

Analysis

The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance Rizin’s analysis engine.

See these issues or the “Analysis” project on our GitHub dashboard.

Heap analysis #157

Currently, Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (macOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn’t depend on the debugger backend, and we may be able to use different heap tools.

Class analysis for C++/ObjectiveC/Swift/Dlang/Java #416

Analysis classes, accessible under the ac command, are a relatively new feature of Rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.

Devirtualize method calls using class vtables #414

Consider the following call: call dword [eax + 0x6c] Let’s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.

So there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.

When that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.
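The lookup described above amounts to dividing the offset by the pointer size and indexing each candidate vtable. A minimal sketch, assuming vtables are available as plain lists of method addresses keyed by class name; `resolve_virtual_call`, `method_index`, and the sample classes and addresses are hypothetical and not part of Rizin:

```python
def method_index(offset, pointer_size):
    """Convert a byte offset into a vtable slot index."""
    if offset < 0 or offset % pointer_size != 0:
        raise ValueError(f"offset {offset:#x} is not a multiple of {pointer_size}")
    return offset // pointer_size


def resolve_virtual_call(vtables, offset, pointer_size=4, class_name=None):
    """Return {class: destination} for every vtable that has a slot at `offset`.

    With `class_name` set, only that class is considered; otherwise every
    vtable that is not too small for the offset contributes a candidate.
    """
    index = method_index(offset, pointer_size)
    candidates = {}
    for name, methods in vtables.items():
        if class_name is not None and name != class_name:
            continue
        if index < len(methods):
            candidates[name] = methods[index]
    return candidates


# Made-up sample data: 32-bit binary, so each slot is 4 bytes wide.
VTABLES = {
    "Widget": [0x401000 + 0x10 * i for i in range(30)],
    "Button": [0x402000 + 0x10 * i for i in range(28)],
    "Small": [0x403000, 0x403010, 0x403020, 0x403030],
}

# call dword [eax + 0x6c]  ->  slot 0x6c / 4 = 27
for cls, dest in resolve_virtual_call(VTABLES, 0x6C).items():
    print(f"{cls}: {dest:#x}")
```

Here `Small` is skipped because its vtable has only four slots, while `Widget` and `Button` each yield one possible destination.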

Add classes list to Vb

Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.

Refactoring

Use internal API instead of commands

Currently, Rizin’s source code is rife with calls to rz_core_cmd()-like functions that run a Rizin command. While this is a useful shortcut for developers, it is a likely source of bugs whenever the command syntax or behavior changes. Such changes are invisible to the compiler, so it cannot warn about them, whereas it does catch a changed count or type of function arguments. Thus, all these calls should eventually be replaced with direct calls to the corresponding API functions. If there is no corresponding API function, then one should be created. Good examples of such cases are:

In general you can just search for rz_core_cmd pattern in any place inside librz/.

Miscellaneous

Shell (dietline) improvements

Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the “dietline”-labeled issues.

Improving regression suite and testing

Numerous issues need to be solved, along with improving parallel execution and performance. A good example is allowing better filtering of the test types to run, for example to ignore debug tests. Another interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command-line tool for interacting with Godbolt: cce.

Another important part of improving the test suite is expanding it to cover more formats and cases. See issue #114 for more details on how it can be done.

Unbreaking broken tests

Almost one thousand tests are marked as “broken” in our test suite. The task is to take any of those, investigate why it fails, and decide whether the test still makes sense or is irrelevant today. Then try to fix some of the broken tests.

Better portability

Due to mistakes in handling data for big-endian platforms in the Rizin code, a lot of tests still don’t pass on our System Z CI worker. Most of the broken tests are related to parsing the formats, in particular reading integers in a portable way. See #297 for details on these formats. In most cases the solution would be to use the rz_read_*() API functions: Developers Guide: Manage Endianess.
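The underlying problem can be illustrated outside of Rizin: an integer assembled byte by byte gives the same result on every host, while a native-order read depends on the machine running the code. The sketch below is in Python; `read_le32` and `read_be32` are hypothetical helpers written for this example and are not the signatures of the rz_read_*() functions.

```python
import struct


def read_le32(buf, offset=0):
    """Read an unsigned 32-bit little-endian integer, independent of the host."""
    chunk = buf[offset:offset + 4]
    if len(chunk) != 4:
        raise ValueError("need 4 bytes")
    return chunk[0] | (chunk[1] << 8) | (chunk[2] << 16) | (chunk[3] << 24)


def read_be32(buf, offset=0):
    """Read an unsigned 32-bit big-endian integer, independent of the host."""
    chunk = buf[offset:offset + 4]
    if len(chunk) != 4:
        raise ValueError("need 4 bytes")
    return (chunk[0] << 24) | (chunk[1] << 16) | (chunk[2] << 8) | chunk[3]


def read_native32(buf, offset=0):
    """Non-portable read: the result depends on the byte order of the host CPU."""
    return struct.unpack_from("=I", buf, offset)[0]


data = bytes([0x78, 0x56, 0x34, 0x12])
print(hex(read_le32(data)))      # 0x12345678 on every host
print(hex(read_be32(data)))      # 0x78563412 on every host
print(hex(read_native32(data)))  # differs between little- and big-endian hosts
```

A parser that uses the native read for a little-endian file format works on x86 but silently produces byte-swapped values on a big-endian machine, which is the class of failure seen on the big-endian CI worker.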

RzGhidra

There are many small issues in the decompiler output:

Some of these issues might be related to how Rizin and RzGhidra integrate and might require changes on the Rizin side.

Also note that most of these fixes should be paired with a test to verify that they will not break in the future.