TL;DR Jump to the Ideas list.

Introduction

Each year since 2015, we have participated in Google Summer of Code as the Radare2 project and accomplished many goals. This year we participate as a fork, Rizin, while effectively continuing the same process with the same mentors.

Mentors

Members of the Rizin and Cutter core teams have volunteered to guide students for GSoC’21. They already guided students during GSoC and RSoC in past years as part of the Radare2 project. Please feel free to reach out to any of them if you need help selecting a project.

  • Anton Kochkov Mattermost: xvilka – @akochkov
  • Riccardo Schirone Mattermost: ret2libc
  • Florian Märkl Mattermost/Telegram: @thestr4ng3r – @thestr4ng3r
  • Antide Petit IRC/Telegram: xarkes – @xarkes_
  • Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_
  • Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad
  • Yossi Zapesochini Mattermost/Telegram: @yossizap
  • And many others

Development methodology

Currently, all repositories are hosted under the main organization account on GitHub, and bugs are tracked in GitHub issues too. We mostly use our own Mattermost instance, IRC, and Telegram for communication. We have a test suite (running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to verify that all features still work and that pull requests or commits don’t break anything, to ensure support for different operating systems (Linux, macOS, Windows, FreeBSD, OpenBSD) and different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples, we use asciinema to record the sessions.

See also our guides for corresponding projects:

For those who want to get introduced to the Rizin codebase and practices, we recommend picking one of the easy issues for Rizin or Cutter to start with.

License

Rizin is modular: it aims to make all of its elements and features easily reusable from other projects. LGPL3 is the minimum license requirement to get code merged in Rizin. Contributors can also choose Apache, BSD, MIT, Public Domain, or other similar licenses. We exclude GPL as a valid license for the project because we aim to support proprietary software that uses Rizin, while protecting our free codebase.

Instructions for students

Students who want to apply to the Rizin project for Google Summer of Code 2021 are required to submit a small pull request accomplishing one of the microtasks (see below) as part of their application. Alternatively, you can choose any of the GitHub issues for Rizin, provided it is big enough to be a qualification task and still small enough to be finished in no more than a couple of weeks.

Programming languages

Most of Rizin is written in C (conforming to the C99 standard), and hence we expect students to be familiar with the C programming language. For some of our tasks or microtasks, such as collaborative RE or rz-pm, students should know the Go programming language. For the Cutter tasks, students should know the basics of C++ and the Qt framework.

  1. Read Google’s instructions for participating
  2. Pick any project from the list of ideas that you’re interested in (or propose your own).
  3. Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you.
  4. Submit it using Google’s web interface.

Student proposal guidelines

  1. Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing.
  2. Try to split the GSoC period into tasks, and each task into subtasks. It helps us understand how you plan to accomplish your goals, but more importantly, it will help you understand the task deeply enough before starting and prioritize the important things to do first.
  3. Please note how much time per day/week you are able to spend on this project.
  4. Specify your timezone so that we can assign you a mentor in the same one, to ease communication.
  5. Submit your proposal early, not at the last minute!
  6. Be sure to choose a “backup” idea (the second task you want to do), so that conflicts (two students for one task) can be resolved.

Project Ideas

Rizin

Type Analysis Improvements

Currently we have type support in Rizin, including a basic (low-level) ability to edit types with pf and higher-level, C-like types with the t command. It is possible, for example, to parse C type definitions from C headers or to load them from a “precompiled” SDB file. However, despite such features being present, many of them still lack the right structures and connections between them to make complex automated and manual analysis of code using types convenient. The goal of this task is to build upon the currently available features and re-think or re-design some of them to fit into the bigger picture of the entire framework. The overall plan for this project is tracked at https://github.com/rizinorg/rizin/projects/3. There are certain dependencies between some of the subtasks, but as long as these are respected, subtasks to be taken for GSoC can be picked by preference.

Task (proposal)

  1. Bundle all types functionality in a new module RzTypes #369
  2. Refactor some base type accesses to use the RzAnalysisBaseType API #368
  3. Replace the current TCC-based C types parser by a Tree-sitter based one #275

Skills

The student should know C and be familiar with the basics of program analysis. They should also be passionate about software architecture.

Difficulty

Hard

Benefits for the student

The student will understand modern program analysis problems related to type analysis, as well as gain skills in the field of software architecture of a large, modular C project with complex dependencies between modules.

Assess requirements for midterm/final evaluation

  • 1st term: RzTypes module exists and contains all relevant code.
  • Final term: Tree-sitter based C parser is implemented and integrated into the analysis framework.

Mentors

  • xvilka
  • thestr4ng3r

Links/Resources

CPU/Platform profiles

While an instruction set defines an architecture, it is common for particular CPU or SoC models to implement only a subset of it or to extend it with custom instructions and registers. Moreover, various SoC modifications can define interaction with peripheral devices through ports (rare), registers or MMIO spaces. All this helps the reverse engineering process, because a lot of the code makes sense at a glance once you see that it accesses certain registers (if named) or peripheral devices (when the MMIO area is defined). A common example is SVD loading for the ARM architecture.

A good example of what a CPU profile should look like:

  • asm.cpu to be dynamically populated by listing the available CPU dedicated plaintext/sdb files
  • Add RAM_SIZE info in the CPU files and remove hardcoding from .h and .c files
  • Add ROM_SIZE info in the CPU files and remove hardcoding from .h and .c files
  • Add INTERRUPT_VECTOR_SIZE info in the CPU files and remove hardcoding from .h and .c files
  • Add IO_REGISTERS, EXTENDED_IO_REGISTER, MMIO_REGISTER, coprocessor register info in the CPU files and remove hardcoding from .h and .c files
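
As a rough illustration of the kind of data such a profile could carry, here is a minimal Python sketch. The plaintext layout, the `IO_REGISTER.NAME=address` key style, the `CpuProfile` class, the `parse_cpu_profile` helper and all values are hypothetical and only serve to show how named registers could be looked up by address. This is not Rizin's actual file format.

```python
from dataclasses import dataclass, field

# Hypothetical plaintext profile; keys mirror the list above, values are made up.
PROFILE_TEXT = """
RAM_SIZE=0x800
ROM_SIZE=0x8000
INTERRUPT_VECTOR_SIZE=4
IO_REGISTER.PORTA=0x20
IO_REGISTER.DDRA=0x21
MMIO_REGISTER.UART_DATA=0x40001000
"""

@dataclass
class CpuProfile:
    ram_size: int = 0
    rom_size: int = 0
    interrupt_vector_size: int = 0
    registers: dict = field(default_factory=dict)  # address -> register name

    def name_for(self, addr):
        """Return the register name for an address, or None if unknown."""
        return self.registers.get(addr)

def parse_cpu_profile(text):
    """Parse KEY=VALUE lines into a CpuProfile (illustrative format only)."""
    profile = CpuProfile()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        number = int(value, 0)  # base 0 accepts both 0x... and decimal
        if key == "RAM_SIZE":
            profile.ram_size = number
        elif key == "ROM_SIZE":
            profile.rom_size = number
        elif key == "INTERRUPT_VECTOR_SIZE":
            profile.interrupt_vector_size = number
        elif "." in key:
            # e.g. IO_REGISTER.PORTA -> register name "PORTA"
            _, name = key.split(".", 1)
            profile.registers[number] = name
    return profile

profile = parse_cpu_profile(PROFILE_TEXT)
```

With a table like this, an analysis pass that sees an access to a known address could annotate it with the register name returned by `profile.name_for(addr)` instead of a raw number.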

Task

  1. Implement support for CPU profiles
  2. Implement support for platform profiles
  3. Add support for register and MMIO specific setups
  4. Integrate these into the analysis loop, handling register and memory accesses.
  5. Implement tests and documentation in the Rizin book
  6. Provide an API for setting these values from rz-pipe and lang-* plugins

Skills

The student should know C and understand the basics of hardware platforms, architectures and chips.

Difficulty

Medium

Benefits for the student

The student will become more familiar with reverse engineering for various architectures and platforms, while improving the efficiency of Rizin.

Benefits for the project

Huge benefits for end users in UX and better support for extension.

Assess requirements for evaluations

  • 1st term: CPU and platform profiles, some most common profiles, integration with the analysis loop
  • Final term: Support for more platforms, regression and unit tests, documentation (including Rizin book).

Mentors

  • xvilka
  • deroad

Links/Resources

Rz-diff improvements

Rizin has had the ability to perform binary diffing for over a decade. Nevertheless, the support is quite basic and there is room for improvement. One of the most important tasks is to deepen the integration with the analysis loop, which will allow Rizin to find and highlight differences in argument counts, local variable counts, their types and other analysis metainformation. The next big task is to modernize rz-diff (and the corresponding parts in RCore) in terms of performance and user interface. Finally, the rz-diff and rizin diffing features should be covered with regression tests and unit tests.

Tasks

  • Support diffing of the different parts of the same buffer/file
  • Split view for hexadecimal view and disassembly diffing mode
  • Improve the integration with analysis (variables and types differences)
  • Integrate ESIL and decompilation (rz-ghidra, jsdec) pseudocode as options for binary diffing
  • Implement the most important diffing strategies from Diaphora
  • Write test cases for the Rizin regression tests and improve the results.
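
To make the idea of diffing analysis metainformation more concrete, here is a small Python sketch that compares two versions of a function by argument count, argument types and local variable count. The `FunctionInfo` record and `diff_function` helper are hypothetical illustrations and do not correspond to any existing rz-diff API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionInfo:
    """Hypothetical summary of what analysis knows about one function."""
    name: str
    args: tuple        # argument types, in order
    local_vars: tuple  # local variable types, in order

def diff_function(old, new):
    """Return a list of human-readable differences between two functions."""
    changes = []
    if len(old.args) != len(new.args):
        changes.append(f"argument count: {len(old.args)} -> {len(new.args)}")
    # Compare types position by position for the arguments both versions share.
    for i, (a, b) in enumerate(zip(old.args, new.args)):
        if a != b:
            changes.append(f"argument {i} type: {a} -> {b}")
    if len(old.local_vars) != len(new.local_vars):
        changes.append(
            f"local variable count: {len(old.local_vars)} -> {len(new.local_vars)}"
        )
    return changes

old = FunctionInfo("parse_header", ("char *", "int"), ("int",))
new = FunctionInfo("parse_header", ("char *", "size_t", "int"), ("int", "int"))
report = diff_function(old, new)
```

A real implementation would also need to match functions between the two binaries first; this sketch assumes that pairing is already known.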

Skills

The student should know C and be familiar with the basics of program analysis. Experience with other binary diffing software is a plus.

Difficulty

Medium

Benefits for the student

The student will understand modern program analysis problems as applied to binary diffing, and how to improve the performance of patch analysis.

Benefits for the project

This feature will make Rizin usable for day-to-day patch analysis of modern software, as well as improve the automation and performance of this task.

Assess requirements for midterm/final evaluation

  • 1st term: rz-diff/rizin should support highlighting types, arguments, and variables differences between functions.
  • Final term: Implement split view for hex, disassembly, and graph modes, along with interface and performance improvements. Write regression tests for all implemented features and add documentation to the Rizin book.

Mentors

  • xvilka
  • Megabeets

Links/Resources

Exploitation capabilities improvements

Since modern architectures now enforce W^X, exploit writers use ROP. (Un)fortunately, building a ROP chain by hand can be tedious, which is why tools exist to ease this construction: ImmunityDBG has mona.py, and there are also ROPgadget and dropper. There are even tools that can generate ROP chains automatically, for example exrop. It’s a shame that despite having ESIL, Rizin doesn’t have something similar yet. One possible solution would be to build an external plugin or tool that reuses the power of librz and rz-gg. Moreover, it makes sense to think about SROP, COOP and BROP support.

Although the rz-gg tool can create custom shellcode, its shellcode database is outdated, so updating it is crucial for the tool to stay relevant.

Task

  1. Update the shellcodes database, improve rz-gg features and documentation
  2. Implement a ropchain syntax parser that uses rz-gg or a custom DSL, something like:

       register reg1 = 0;
       register reg2 = whatever;
       register reg3 = reg1 + reg2;
       system(reg3);

  3. Write a compiler which uses an SMT solver (like Z3 for example) to produce the ropchain.
  4. Support the main architectures: x86, ARM, MIPS, PowerPC
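
As a starting point for the parser part of this task, the following Python sketch turns the four-line DSL snippet above into a list of statements. The grammar it accepts (only `register NAME = EXPR;` with `+` and simple calls) and the `parse_ropchain` helper are assumptions made for illustration. Gadget search and the SMT-based compilation step are not shown.

```python
import re

ROP_SOURCE = """
register reg1 = 0;
register reg2 = whatever;
register reg3 = reg1 + reg2;
system(reg3);
"""

# Assumed grammar: "register NAME = EXPR" and "FUNC(ARG, ...)".
ASSIGN_RE = re.compile(r"^register\s+(\w+)\s*=\s*(.+)$")
CALL_RE = re.compile(r"^(\w+)\((.*)\)$")

def parse_ropchain(source):
    """Parse the toy ropchain DSL into ("assign"|"call", name, operands) tuples."""
    statements = []
    for raw in source.split(";"):
        stmt = raw.strip()
        if not stmt:
            continue
        m = ASSIGN_RE.match(stmt)
        if m:
            name, expr = m.groups()
            operands = [tok.strip() for tok in expr.split("+")]
            statements.append(("assign", name, operands))
            continue
        m = CALL_RE.match(stmt)
        if m:
            func, args = m.groups()
            arg_list = [a.strip() for a in args.split(",") if a.strip()]
            statements.append(("call", func, arg_list))
            continue
        raise SyntaxError(f"cannot parse statement: {stmt!r}")
    return statements

program = parse_ropchain(ROP_SOURCE)
```

The resulting statement list is the kind of intermediate form a later stage could translate into constraints for a solver, with each register assignment becoming a requirement on the machine state before the final call.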

Skills

The student should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.

Difficulty

Advanced

Benefits for the student

The student will improve their skills in software exploitation and solvers.

Benefits for the project

This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)

Assess requirements for evaluation

  • 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver
  • Final term: Working ropchain compiler, covered by tests and documented in the Rizin book.

Mentors

  • xvilka
  • ret2libc

Links/Resources

Bindings for languages other than C/C++

Rizin offers a convenient scripting interface through the rz-pipe APIs, which build upon its command-based interface. While this reduced interface is beneficial and well-suited for many scripting tasks, building more complex applications generally requires direct access to the public C API that Rizin offers. Using this API directly is possible in C and C++, as is done in Cutter for example, but for other languages no generic bindings exist so far. The goal of this task is to use a bindings generator such as SWIG to expose Rizin’s C API to languages such as Python, Java or OCaml.

Task

  • Integrate SWIG-generated bindings into Rizin’s build system
  • Write SWIG interfaces for all mature parts of Rizin’s C API
  • Integrate the Python bindings into Cutter’s Python support

Skills

The student should be comfortable with the C and Python languages, as well as have a deep understanding of common memory management patterns such as ownership and reference counting.

Difficulty

Advanced

Benefits for the student

The student will gain experience in exposing a C-based API with manual memory management to high-level, object-oriented languages with automatic memory management.

Assess requirements for midterm/final evaluation

  • 1st term: Bindings can be generated as part of the standard Rizin build system and small parts of the core API are already usable.
  • Final term: All relevant parts of the API can be used through bindings and also from within Cutter’s Python interpreter.

Mentors

  • thestr4ng3r
  • xvilka

Links/Resources

Cutter

Plugins and Python High Level API

We currently have almost no API for plugin authors to use. We need to improve a lot of things about our plugin support and take it a few steps further. This task is only about improving the Python interface in Cutter, specifically its graphical user interface components. For a task about exposing Rizin’s API for disassembly, analysis and other purposes, see the Rizin bindings task above.

Task

  • Expose everything Cutter can offer to plugin authors. This includes a high-level API, integration of plugin management, etc.
  • Accessing everything from Python (like Blender) - see issue #1662
  • Python integration and IPython console.

Skills

The student should be comfortable with the C++ and Python languages, and be familiar with the Qt framework.

Difficulty

Advanced

Benefits for the student

The student will gain experience in creating a suitable API for scripting graphical interface programs.

Benefits for the project

It will greatly improve the scripting experience, make the API more consistent, and make it easier for the community to create Cutter plugins. Moreover, it will simplify testing of Cutter features.

Assess requirements for midterm/final evaluation

  • 1st term: Design the high-level API and the required Rizin changes. Review and implement all missing API functions that are accessible as interface controls.
  • Final term: Implement a way to show the API when hovering over an interface control, and create documentation.

Mentors

  • thestr4ng3r
  • Megabeets

Links/Resources

Multi-Tasking and Event-driven architecture

Cutter is a reverse engineering framework that is powered by Rizin. The information it gets about functions, strings, imports, and the analysis is all produced in Rizin and displayed in Cutter. Currently, Cutter pulls information from Rizin only on demand. This is problematic because sometimes the user makes changes (via plugins, the console widget, and more) that affect the information from Rizin, but Cutter doesn’t know about these changes and so cannot apply them to the UI. For example, if a user defines a new function in a Python script or via the console widget by using the Rizin command af @ <addr>, Cutter will not show this new function in the Functions widget until the user refreshes the interface manually (edit -> Refresh Contents).

In addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.

Tasks

The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events. For example, add an event for function creation, for section creation, for flag deletion, for name changes, and more.

  • Add events to all the relevant functions inside Rizin
  • Add support for these events in Cutter and refresh and update the relevant widgets per each event
  • Support analysis in the background and allow the user to start its session while Rizin is analyzing (see #1856, #1574)
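
The event flow described above can be pictured with a small, self-contained Python sketch of a publish/subscribe hub. The `EventBus`, `Core` and `FunctionsWidget` classes and the event names are hypothetical stand-ins, not the rz_events API or Cutter's widget classes. The point is only that the widget updates itself when the core emits an event, instead of waiting for a manual refresh.

```python
from collections import defaultdict

class EventBus:
    """Tiny publish/subscribe hub, standing in for an event mechanism in the core."""
    def __init__(self):
        self._hooks = defaultdict(list)

    def hook(self, event_type, callback):
        self._hooks[event_type].append(callback)

    def send(self, event_type, payload):
        # Iterate over a copy so callbacks may register further hooks safely.
        for callback in list(self._hooks[event_type]):
            callback(payload)

class FunctionsWidget:
    """Stand-in for a UI widget that lists functions."""
    def __init__(self, bus):
        self.rows = []
        bus.hook("function_created", self.on_function_created)
        bus.hook("function_deleted", self.on_function_deleted)

    def on_function_created(self, fcn):
        self.rows.append(fcn)

    def on_function_deleted(self, fcn):
        self.rows.remove(fcn)

class Core:
    """Stand-in for the analysis core that owns the function list."""
    def __init__(self, bus):
        self.bus = bus
        self.functions = {}

    def create_function(self, addr, name):
        self.functions[addr] = name
        self.bus.send("function_created", (addr, name))

    def delete_function(self, addr):
        name = self.functions.pop(addr)
        self.bus.send("function_deleted", (addr, name))

bus = EventBus()
core = Core(bus)
widget = FunctionsWidget(bus)
core.create_function(0x401000, "main")
core.create_function(0x401080, "helper")
core.delete_function(0x401080)
```

In a real GUI the callbacks would have to be dispatched onto the UI thread, which matters once analysis runs in the background. This sketch ignores threading entirely.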

Skills

The student should be comfortable with C++ for Cutter and C for Rizin. The student should be familiar with the Qt framework.

Difficulty

Advanced

Benefits for the student

The student will gain experience in creating complex event-driven software in both the C and C++ languages.

Benefits for the project

It will allow working with big files effortlessly in Cutter and will improve analysis quality as well.

Assess requirements for midterm/final evaluation

  • 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter.
  • Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background.

Mentors

  • thestr4ng3r
  • Karliss

Heap viewer

We already have a nice heap (and memory map) parser and visualizer in Rizin (dm and dmh commands). After debugging becomes a first-class citizen in cutterland it would be awesome to have memory map and heap visualizations.

Task

  • Expose Rizin API/commands for Cutter to use for visualization
  • Design and implement heap navigation and inspection widgets
  • Provide the integration with current debugging mode in Cutter
  • Make the implementation work with both local (native) and remote debugging modes

Skills

The student should be comfortable with C++ and be familiar with the Qt framework.

Difficulty

Medium

Benefits for the student

The student will gain an understanding of how modern runtimes provide the heap for various programs, which will be beneficial for binary exploitation skills.

Benefits for the project

It will greatly improve the debugging and reverse engineering experience for complex programs, and will also provide a way to design exploitation techniques with the help of Rizin/Cutter.

Assess requirements for midterm/final evaluation

  • 1st term: Design and implement heap visualization widgets, add Rizin test and fixes
  • Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation.

Mentors

  • xvilka
  • Megabeets

Links/Resources

Diffing mode

Binary diffing is one of the most common tasks for a reverse engineer. There are many tools available, but most of them are either detached from the main RE toolbox or poorly integrated. Rizin provides basic diffing features out of the box with the rz-diff tool, but Cutter has no interface to represent similar functionality.

Task

  • Expose basic rz-diff features in Cutter
  • Create an interface to choose two files for diffing
  • Create a way to show the differences in all main widgets:
    • Hexadecimal view
    • Disassembly view
    • Graph view
    • Pseudocode view

Skills

The student should be comfortable with the C++ language and be familiar with the Qt framework.

Difficulty

Advanced

Benefits for the student

The student will gain experience in creating efficient graphical interfaces.

Benefits for the project

It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.

Assess requirements for midterm/final evaluation

  • 1st term: Expose the rz-diff features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views.
  • Final term: Implement the diff modes for graph and pseudocode views, create the documentation.

Mentors

  • xvilka
  • Megabeets

Links/Resources

Microtasks

When taking any of the microtasks, please make sure someone isn’t already working on it, and let us know if you are going to work on a particular one.

File formats

Implementing support for any new file format counts as a microtask. See the New File-Format label for pending issues.

Disassemblers and assemblers

Implementing support for any new architecture counts as a microtask. See the New-Architecture label for pending issues.

ELF binary parsing.

Rizin parses a lot of information about the ELF but doesn’t print everything. Thus, improving the output of the i* commands and the rz-bin tool is important to match up with readelf, for example by adding file offset and memory alignment to the segments information (iSS command).

Moreover, some information about PLT stubs is not resolved correctly.

Analysis

The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin’s analysis engine.

See these issues or the “Analysis” project on our GitHub dashboard.

Basefind #413

There are plenty of external scripts and plugins for finding the most probable base address for raw firmware images. Opening raw firmware images with rizin is a common use case, so it makes sense to implement this as a part of the rizin core.
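
One heuristic that such tools commonly rely on, assumed here for illustration and not necessarily what issue #413 will implement, is to score each candidate base by how many 32-bit words in the image would point at the start of a string if the image were loaded at that base. The Python sketch below builds a toy image and recovers its base this way. All function names and the image layout are hypothetical.

```python
import struct

def find_string_offsets(image, min_len=4):
    """Offsets of NUL-terminated printable ASCII runs of at least min_len bytes."""
    offsets = set()
    start = None
    for i, byte in enumerate(image):
        if 0x20 <= byte < 0x7F:
            if start is None:
                start = i
        else:
            if start is not None and byte == 0 and i - start >= min_len:
                offsets.add(start)
            start = None
    return offsets

def read_words(image):
    """All aligned little-endian 32-bit words in the image."""
    count = len(image) // 4
    return struct.unpack_from(f"<{count}I", image)

def guess_base(image, candidates):
    """Return (base, score) for the candidate with the most pointer-to-string hits."""
    strings = find_string_offsets(image)
    words = read_words(image)
    best_base, best_score = None, -1
    for base in candidates:
        score = sum(1 for w in words if (w - base) in strings)
        if score > best_score:
            best_base, best_score = base, score
    return best_base, best_score

# Toy "firmware": a pointer table followed by the strings it refers to.
TRUE_BASE = 0x08000000
buf = bytearray(64)
for slot, (offset, text) in enumerate([(16, b"boot ok"), (32, b"uart init"), (48, b"panic")]):
    struct.pack_into("<I", buf, slot * 4, TRUE_BASE + offset)
    buf[offset:offset + len(text)] = text
image = bytes(buf)

base, score = guess_base(image, range(0, 0x10000000, 0x10000))
```

On real firmware the candidate space and the number of strings are far larger, so practical implementations need a smarter search than this brute-force loop.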

Heap analysis #157

Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn’t depend on the debugger backend, and we may be able to use different heap tools.

Class analysis for C++/ObjectiveC/Swift/Dlang/Java #416

Analysis classes, accessible under the ac command, are a relatively new feature of rizin. They provide a way to both manually and automatically manage and use information about classes in the binary.

Devirtualize method calls using class vtables #414

Consider the following call: call dword [eax + 0x6c] Let’s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.

So there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.

When that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.
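
The lookup described above can be sketched in a few lines of Python. The vtable contents, class names and the `resolve_vtable_call` helper are invented for illustration. A 32-bit target is assumed, matching the dword call in the example.

```python
POINTER_SIZE = 4  # assumed 32-bit target, matching "call dword [eax + 0x6c]"

# Hypothetical recovered vtables: class name -> list of method addresses.
VTABLES = {
    "Shape": [0x401000 + 0x10 * i for i in range(32)],
    "Circle": [0x402000 + 0x10 * i for i in range(40)],
    "Tiny": [0x403000, 0x403010],
}

def resolve_vtable_call(offset, class_name=None, vtables=VTABLES):
    """Map a vtable offset to possible call destinations.

    With class_name, only that class's vtable is consulted. Without it,
    every vtable large enough to contain the offset contributes a candidate.
    """
    if offset % POINTER_SIZE != 0:
        raise ValueError("offset is not pointer-aligned")
    index = offset // POINTER_SIZE
    names = [class_name] if class_name is not None else sorted(vtables)
    results = {}
    for name in names:
        table = vtables[name]
        if index < len(table):  # skip vtables that are too small for the offset
            results[name] = table[index]
    return results

candidates = resolve_vtable_call(0x6C)
only_circle = resolve_vtable_call(0x6C, "Circle")
```

Here offset 0x6c selects slot 27, so the two-entry vtable of the made-up class Tiny is skipped while the other two classes each yield one candidate destination.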

Add classes list to Vb

Vb already supports browsing bin classes. The same thing should be implemented for classes from analysis.

Signatures

Rizin has good support for loading and creating signatures, but it is not yet complete, so some problems remain, for example: #272.

Although Rizin supports loading FLIRT signatures from IDA Pro, not all of them are supported yet, e.g. version 5 compression.

Refactoring

Use “newshell” instead of old switch/case handling

Rizin is in the middle of the switch from the old style switch/case manual parsing of every command to the centralized Tree-Sitter-based parser, providing every command handler argc/argv arguments. Best candidates for the initial switch are:

  • librz/core/cmd_egg.c
  • librz/core/cmd_hash.c
  • librz/core/cmd_plugins.c

A good example of transition is in these pull requests for t (types) command conversion:

Use internal API instead of commands

Currently, Rizin’s source code is rife with calls to rz_core_cmd()-like functions that run a Rizin command. While this is a useful shortcut for developers, it is a likely source of bugs when the command syntax or behavior changes. Such changes are invisible to the compiler, so it cannot warn about the changed syntax, unlike a change in a function’s argument count or types. Thus, all these calls should eventually be replaced with direct calls to the corresponding API functions. If there is no corresponding API function, one should be created. Good examples of such cases are:

In general you can just search for rz_core_cmd pattern in any place inside librz/.

Improving the uplifting of the code to IL

Rizin has its own intermediate language, ESIL, but does not yet support it for all architectures. So the task is to add ESIL support to any architecture that doesn’t have it yet.

Miscellaneous

Shell (dietline) improvements

Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the “dietline”-labeled issues.

Improving regression suite and testing

Numerous issues need to be solved, along with improving parallel execution and performance. A good example is allowing better filtering of the test types to run, for example to ignore debug tests. Another interesting idea is to set up and reuse the Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt: cce.

Another important part of improving the test suite is expanding it to cover more formats and cases. See issue #114 for more details on how this can be done.

RzGhidra

There are many small issues in the decompiler output:

Some of these issues might be related to how Rizin and RzGhidra integrate and might require changes on the Rizin side.

Also note that a fix for most of these issues should be paired with a test to verify it will not break in the future.