TL;DR Jump to the Ideas list.
Introduction
This year, we participate again, effectively continuing the tradition since 2015.
Mentors
Members of the Rizin and Cutter core teams have volunteered to guide participants for GSoC’24. They have already been guiding the participants for the GSoC and RSoC in past years. Please feel free to reach out to any of them if you need any help in selecting a project.
- Anton Kochkov Mattermost: xvilka – @akochkov
- Riccardo Schirone Mattermost: ret2libc @RickySkiro
- Florian Märkl Mattermost/Telegram: @thestr4ng3r – @thestr4ng3r
- Itay Cohen Mattermost/Telegram: @Megabeets @Megabeets_
- Giovanni Dante Grazioli Mattermost/Telegram: @deroad @der0ad
- And many others
Development methodology
Currently, all repositories are hosted on GitHub main organization account and bugs are tracked on GitHub issues too. We are primarily using our own Mattermost instance, IRC, and Telegram) for communication. We have a testsuite (which is running on GitHub Actions, Travis CI, AppVeyor and SourceHut) to test and verify that all the features are still working and that a pull requests or commits don’t break anything, to ensure the support of different operating systems (Linux, MacOS, Windows, FreeBSD, OpenBSD), different architectures (x86/x86_64, ARM64, PowerPC, SystemZ), and to find regressions. We encourage contributors to write test cases and documentation in order to verify the implementation and ensure that everything fits well together. For complex bugs and examples, we’re using ASCIInema to record the sessions.
See also our guides for corresponding projects:
- Rizin Contributing Guide and Developers Intro
- Cutter Contributing Guide and Developers Intro
For those who want to get introduced to the Rizin codebase and practices, we recommend to pick one of the easy issues for Rizin or Cutter to start with.
License
Rizin is modular: this means that it aims to make all the elements and features easily reusable from other projects. The choice of LGPL3 as a license is the minimum requirement to get code merged in Rizin. Contributors can choose Apache, BSD, MIT, Public Domain, or other similar licenses. The reason to exclude GPL as a valid license for the project is because we aim to support proprietary software that uses Rizin while protecting our free codebase.
Instructions for participants
Participants who want to apply to the Rizin project for the Google Summer of Code 2024 are required to submit a small pull request accomplishing one of the microtasks (see below) as part of their application. You can also choose any of the GitHub issues for Rizin if they are big enough to be a qualification task and still small enough to be finished in no more than a couple of weeks. To help participants understand how to contribute to the project, there are issues marked as “good first issue” for both Rizin and Cutter.
Programming languages
Most of Rizin is written in C (conforming to the C99 standard), and hence, we expect participants to be familiar with C programming language. For some of our tasks or microtasks, such as rz-pm, they should know the Go programming language. For the Cutter tasks, it is a requirement to know C++ and Qt framework basics.
Recommended steps
- Read Google’s instructions for participating
- Grab any of the projects from the list of ideas that you’re interested in (or propose your own).
- Write a first draft proposal using Google Docs and our template and ask one of the mentors or administrators to review it with you.
- Submit it using Google’s web interface.
Participant proposal guidelines
- Keep it simple enough to fit in no more than a couple of pages. Try to be clear and concise in your writing.
- Try to split the entire GSoC period into tasks and each task into subtasks. It helps us understand how you plan to accomplish your goals, but more importantly, it’ll help you understand the task deep enough before starting and prioritize important things to do first.
- Please note how much time a day/week you can spend on this project.
- Please specify which category you apply for - medium task or extended deadline one.
- Specify your timezone so we can assign you a mentor in the same one to ease communication.
- Submit your proposal early, not at the last minute!
- Be sure to choose a “backup” idea (the second task you want to do) so that conflicts (two participants for one task) can be resolved.
Project Ideas
Cutter
Improving usability and user experience (175 hour project)
The Cutter’s backend provides many features that are not exposed or exposed in Cutter efficiently. The goal of this task would be to figure out the users’ biggest pain points and address them by improving or reworking the interface. Some of the issues are already in our GitHub, while others might be figured during the cross-comparison with other tools.
Task
- Add a scrollbar to the disassembly and hexdump widgets
- Better syntax highlight and theming
- Managing window/widget overlays
- Add information about status of the analysis, signature searching, and other operations
- Address various small UI problems that make user’s life harder than necessary
Skills
The participant should be comfortable with the C++ and be familiar with Qt framework. Basics of the design/UX would be a plus.
Difficulty
Advanced
Benefits for the participant
The participant will gain an experience of creating comfortable and efficient user interface with C++/Qt.
Benefits for the project
It will make interface and user experience more consistent, on par with Rizin itself, and other tools.
Assess requirements for midterm/final evaluation
- 1st term: Add scrollbar to necessary widget, improve theming and syntax highlight
- Final term: Managing widgets layouts, docking; provide action status information
Mentors
- thestr4ng3r
- xvilka
- Megabeets
Links/Resources
Plugins and Python High Level API (175 hour project)
Our current public API to be used by plugin authors is somewhat limited. We need to improve a lot of things about our Plugins support and take it few steps ahead. This task is only about improving the public C++ and Python interface of Cutter, specifically its graphical user interface components. For a task about exposing Rizin’s API for disassembly, analysis and other purposes, see the Rizin bindings task above.
Task
- Expose everything Cutter can offer for plugin authors. This includes high level API, integration of the plugin management etc.
- Accessing everything from Python (like Blender) - see issue #1662
- Python integration and IPython console.
Skills
The participant should be comfortable with the C++ and Python languages, and be familiar with Qt framework
Difficulty
Advanced
Benefits for the participant
The participant will gain an experience of creating a suitable API for scripting graphical interface programs.
Benefits for the project
It will greatly improve the scripting experience, will make API more consistent and will ease creating Cutter plugins by the community. Moreover, it will simplify testing of the Cutter features.
Assess requirements for midterm/final evaluation
- 1st term: Design of the high level API and required Rizin changes. Review and implement all missing API functions that are accessible as interface controls.
- Final term: Implement the way to show the API when hovered over some interface control, create documentation.
Mentors
- thestr4ng3r
- Megabeets
Links/Resources
Multi-Tasking and Event-driven architecture (350 hour project)
The information Cutter gets about functions, strings, imports, and the analysis are all performed in Rizin and only displayed in Cutter. Currently, it is pulling most information from Rizin only on demand. This is problematic because sometimes the user performs changes (via plugins, the console widget, and more) that are affecting the information from Rizin, but Cutter doesn’t know about these changes to apply the to the UI. For example, if a user will define a new function in a Python script or via the console widget by using the Rizin command af @ <addr>
, Cutter will not show this new function in the Functions widget until the user will refresh the interface manually (Edit -> Refresh Contents).
The goal of this task is to use an event-driven architecture to overcome this limitation.
In addition, this task will also handle the analysis in the background feature, to allow the analysis performed by Rizin to happen while the interface is active.
Tasks
The overall implementation of this task should start from Rizin by adding events to many of the functions. This can be done using rz_events
. For example, add an even for function creating, for section creation, for flag deletion, for name changed, and more
- Add events to all the relevant functions inside Rizin
- Add support for these events in Cutter and refresh and update the relevant widgets per each event
- Support analysis in the background and allow the user to start its session while Rizin is analyzing (see #1856, #1574)
Skills
The participant should be comfortable with the C++ for Cutter and C for Rizin. They should also be familiar with Qt framework. Experience in GUI code architecture, for example using functional reactive programming or Elm-like approaches is a plus.
Difficulty
Advanced
Benefits for the participant
The participant will gain an experience of creating complex event-driven software in both C and C++ languages.
Benefits for the project
It will allow to work on big files effortlessly in Cutter, will improve analysis quality as well.
Assess requirements for midterm/final evaluation
- 1st term: Implement events everywhere in the relevant places across Rizin code and event-driven interaction with Cutter.
- Final term: Add support for the Cutter interface refresh based on the events from Rizin, implement analysis in background.
Mentors
- thestr4ng3r
Heap viewer completion (175 hour project)
Thanks to the work that was done in the previous GSoC, Cutter and Rizin have nice visualizations of the heap and memory maps. We would like to expand on this feature with performance improvements to the heap parsers and support more memory allocators.
Task
- Complete Cutter’s implementation of the windows heap widget #2723
- Improve the performance of the Windows heap parser
- Fix Windows heap parsing errors
- Make the implementation work with remote debugging modes
Skills
The participant should be comfortable with the C++, and be familiar with Qt framework
Difficulty
Medium
Benefits for the participant
The participant will gain the understanding on how modern runtimes provide the heap for various programs, which will be beneficial for the binary exploitation skills.
Benefits for the project
It will greatly improve the debugging and reverse engineering experience for complex programs, also provides the way to design the exploitation techniques with the help of Rizin/Cutter.
Assess requirements for midterm/final evaluation
- 1st term: Design and implement heap visualization widgets, add Rizin test and fixes
- Final term: Various bugfixes related to the heap inspection support on various platforms and allocators, tests and documentation.
Mentors
- xvilka
- Megabeets
Links/Resources
- Issue #1041
- Heap Viewer plugin for IDA Pro
- Heap parsing for MacOS, tmalloc, jmalloc
- Dynamic Allocator Detection
- “heap”-marked Rizin issues
Diffing mode (175 hour project)
Binary diffing is one of the most common tasks for the reverse engineer. There are many
tools available, but most of them are either detached from the main RE toolbox or poorly integrated.
Rizin provides basic diffing features out of the box with rz-diff
tool, but Cutter has no
interface to represent similar functionality.
Task
- Expose basic
rz-diff
features in the Cutter - Create the interface to choose two files for diffing
- Create the way to show the differences in all main widgets:
- Hexadecimal view
- Disassembly view
- Graph view
- Pseudocode view
Skills
The participant should be comfortable with the C++ language, and be familiar with Qt framework
Difficulty
Medium
Benefits for the participant
The participant will gain an experience of creating efficient graphical interfaces.
Benefits for the project
It will greatly benefit the project since Cutter will be the only FOSS RE tool to provide this feature out of the box.
Assess requirements for midterm/final evaluation
- 1st term: Expose the
rz-diff
features in the Cutter core and create the interface for opening files for diffing. Implement the diff modes for hexadecimal and disassembly views. - Final term: Implement the diff modes for graph and pseudocode views, create the documentation.
Mentors
- xvilka
- deroad
Links/Resources
Rizin
RzIL uplifting migration (350 hour project)
Rizin has had an intermediate language for over a decade. Major architectures are supporting uplifting to ESIL. During the RSoC 2021, the initial version of the new intermediate language, which is based on the BAP’s Core Theory was implemented. In the following years it was improved and some of the architectures were ported to use RZIL instead of ESIL. The main goal of this project is to finish the migration of one or more existing architectures that still use ESIL or add a RzIL support for the architectures that hadn’t any uplifting at all.
Tasks
- Implement a RZIL uplifting for any non-trivial architecture, preferably that is supported by ESIL already
- Improve the integration with analysis (variables and types differences) for the chosen architecture
- Write the test cases for Rizin regression tests and improve the results.
- Update and use rz-tracetest for the chosen architectures
- Implement necessary commands and APIs in Rizin for visual representation of the IL tree
- Implement standard and graph views in Cutter for the IL output (optional)
Due to the sensivity of uplifting to the precision, it’s important to follow these steps:
- For every single lifted opcode, have at the very least one asm test in
test/db/asm/...
containing the IL to detect changes when the code is changed and have it type-checked. This should produce 100% coverage on the lifter C code (except e.g.malloc()
error handling). - Run
rz-tracetest
on real traces. It’s also possible to write custom assembly programs that execute specific obscure instructions where it’s hard to be sure that they were implemented correctly on many random inputs and then feed these executions intorz-tracetest
. - Few
rz-test
command tests that emulate some code snippets in rizin. For example a simple decryption loop to check the overall integration in rizin, or for specific edge cases (like running a division by zero).
Skills
The participant should know C and bits of C++ as well as be familiar with basics of the program analysis. Having an experience with other intermediate language, SAT/SMT, and mathematical logic is a plus.
Difficulty
Medium
Benefits for the participant
The participant will understand the state of the art of intermediate languages research, it’s relation to the mathematical logic, SMT, and program analysis. Moreover, the participant will become familiar with the both symbolic and concrete emulation during the implementation process.
Benefits for the project
Migrating most architectures will help to deprecate and remove outdated ESIL and will help improving the analysis precision. Adding uplifting for new architectures that weren’t even supported by ESIL will improve the analysis to even greater degree.
Assess requirements for midterm/final evaluation
- 1st term: finish the RZIL uplifting for the chosen architecture with basic instruction-level tests
- Final term: implement all changes in the analysis code, added more complex integration tests with types analysis
Mentors
- xvilka
- thestr4ng3r
Links/Resources
- RZIL-labeled issues
- ESIL-labeled issues
- ESIL to RZIL conversion tracking issue
- Cutter: IL output and graph visual representation
Debugger improvements and portability (175 hour project)
Rizin debugger already supports most of the platforms, including native and remote debugging. Nevertheless, for most platforms it’s limited mostly to the x86/x86_64 and ARMv8, often lacking the tests. The task would be to add missing architectures to the native debugger, e.g. MIPS to the Linux Native, ARMv7/ARMv8 to the FreeBSD, System Z debugger for Linux, HPPA debugger for Linux, VAX debugger for NetBSD, and so on. Moreover, some information isn’t available during the debugging mode, e.g. source-level breakpoints or names, it would be necessary to make sure debug commands understand those.
With the help of emulators like QEMU and OpenSIMH we could extend our CI to automatically test these debuggers.
Task
- Integrated source-level information loaded from DWARF or PDB into debug commands and print
p
commands - Support for missing architectures that are supported by Rizin statically in the Linux native debugger
- Support for missing architectures that are supported by Rizin statically in the BSD native debugger
- Cover more platforms supported by the debugger with automated tests, with CI whenever it’s possible
- Fix the bugs in debuggers, minor refactorings of the code
Skills
- Good knowledge of the C language
- Some experience in debugging with GDB or LLDB
Difficulty
Hard
Benefits for the participant
Participant will understand how debugging works on the low level, and will gain experience with variety of different platforms and operating systems.
Assess requirements for midterm/final evaluation
- 1st term: `SystemZ, MIPS, HPPA support in Linux native, remote GDB debuggers
- Final term: ARM and SPARC support in *BSD debuggers, VAX support in NetBSD
Mentors
- xvilka
- thestr4ng3r
- ret2libc
Links/Resources
FRIDA integration (175 hour project)
FRIDA is the famous dynamic instrumentation toolkit that is immensely popular among mobile device researches. Rizin could be easily integrated with Frida by creating a plugin that will allow to connect to the Frida instance, receive traces, set breakpoints, get information and events from it.
Task
- Create the basic plugin that allows attaching, spwaning, launching processes within Frida loco ally
- Support remote connection
- Add feature to receive information from the Frida instanced
- Add breakpoints and run/step/continue feature’s
- Support calling functions and scripts in the context of the instrumented process
Skills
Participant should know C as well as have the experience of working with debuggers.
Difficulty
Hard
Benefits for the participant
Participant will understand and learn how to use Frida toolkit, also the internals of the debugging and instrumentation processes.
Assess requirements for midterm/final evaluation
- 1st term: Implement core of the FRIDA plugin, allowing local and remote debugging features
- Final term: Add support for extended features like calling functions or scripts within the context
Mentors
- xvilka
- thestr4ng3r
- wargio
Links/Resources
Rewriting GPL-only code (175 hour project)
Currently some of the Rizin main features rely on the GPL-only code copied from binutils or GCC. The goal is to rewrite all this code from GPL-only to LGPL or any other less restrictive license. It is quite important for better adoption of Rizin as a library in other FOSS and commercial projects.
Tasks
- Rewrite C++ demangler to and remove the GPL code
- Rewrite some of the mainstream architectures that still rely on binutils without using GPL-only code
Good example of such architectures are:
- SPARC (there is already capstone-based RzAsm and RzAnalysis plugin but it’s less complete than binutils-based one, capstone update is also required)
- Xtensa (It’s better to implement/update it in Capstone)
- ARC (Same, better to implement/update it in Capstone)
- Lanai
- CRIS
- VAX
Skills
Participant should know C and basics of C++ for understanding the mangling scheme
Difficulty
Medium
Benefits for the participant
Participant will understand how C++ type information is stored in the name of the methods and classes.
Assess requirements for midterm/final evaluation
- 1st term: Basic demangling for C++ is rewritten under less restrictive license.
- Final term: At least one binutils-based architecture is reimplemented with more permissive license.
Mentors
- xvilka
- thestr4ng3r
- wargio
Links/Resources
- rizin: rewrite/remove GPL-only code
- rz-libdemangle: rewrite/remove GPL-only code
- Update binutils code to latest
Exploitation capabilities improvements (175 hour project)
Since modern architectures are now enforcing W^X, exploiters are using ROP. (Un)fortunately, building ROP chain by hand can be tedious, this is why some tools can be used to ease this construction: ImmunityDBG has mona.py, there is also ROPgadget and dropper.There exist even tools that can generate ROP chains automatically, for example exrop. It’s a shame that despite having RzIL, Rizin doesn’t have something similar yet. One of the possible solutions would be to build an external plugin or tool which will reuse power of librz and rz-gg. Moreover it makes sense to think about SROP, COOP and BROP support.
The rz-gg
tool while has the ability to create a custom shellcode has the outdated database of the shellcodes, so updating them is crucial for the tool to be relevant.
Task
- Update the shellcodes database, improve
rz-gg
features and documentation - Implement a ropchain syntax parser that uses
rz-gg
or a custom DSL, something like:
register reg1 = 0;
register reg2 = whatever;
register reg3 = reg1 + reg2;
system(reg3);
- Write a compiler which uses SMT solver (like Z3 for example) to produce the ropchain.
- Support main architectures - x86, ARM, MIPS, PowerPC
Skills
The participant should be comfortable with the C language, know some assembly and a high-level language. Also, knowing a little bit of automatic binary analysis wouldn’t hurt.
Difficulty
Advanced
Benefits for the participant
The participant will improve their skills in software exploitation and solvers.
Benefits for the project
This feature would greatly help during exploits development, and people would be able to ditch mona.py for Rizin ;)
Assess requirements for evaluation
- 1st term: Creating the language for defining the ROP chain semantics and integrating it with SMT solver
- Final term: Working ropchain compiler, covered by tests and documented in the Rizin book.
Mentors
- xvilka
- ret2libc
Links/Resources
- ROPGadget
- Ropper
- Angrop
- ROPC
- exrop
- roper2
- mona.py from corelan
- Hunting for ROP Gadgets in Style (2012)
- dropper a BARF-based rop chain generator
- Materials about the exloitation workshop at Hack.lu 2014
- Slides for the exploitation part of workshop at Hack.lu 2015
- RzEgg related bugs
Microtasks
When taking any of microtasks please be sure someone isn’t already working on them, and let us know if you are going to work on a particular one.
Cutter UX improvements
There are many small issues and missing features that when implemented will improve the user experience significantly:
- Allow adding new flags from hexdump
- Scrollbar inside disasssembly windows
- Variables and values popup widgets on mouse hover
- Allow to set RzRun profiles from the GUI during debugging
- Double-click on the type in Disasm and Graph widgets should switch to the Types windows and show the selected type
- Set breakpoint inside X-Refs window
- Unified dialogue to set debug symbols servers
See full list at our User Experience project covering all parts of RizinOrg: Rizin, Cutter, RzGhidra, rz-pm.
File formats
Implementing the support for any new file format counts as a microtask. See New File-Format label for pending issues.
Disassemblers and assemblers
Implementing the support for any new architecture counts as a microtask. See New-Architecture label for pending issues.
Two notable examples are updating existing bytecode plugins to support newer versions of the respective languages:
Analysis
The current code analysis has many caveats and issues which need addressing. Fixing them and writing more tests is important to stabilize and enhance rizin’s analysis engine.
See these issues on our GitHub dashboard.
Heap analysis #157
Currently Rizin has support for heap exploration and analysis, but the feature is still basic and can be improved. Additionally, other allocators can be added (MacOS, tmalloc, etc.), but this should be done after a proper refactoring, because heap analysis shouldn’t depend on the debugger backend, and we may be able to use different heap tools.
Class analysis for C++/ObjectiveC/Swift/Dlang/Java #416
Analysis classes, accessible under the ac
command, is a relatively new feature of rizin.
They provide a way to both manually and automatically manage and use information about classes in the binary.
Devirtualize method calls using class vtables #414
Consider the following call: call dword [eax + 0x6c]
Let’s assume eax is the base pointer of a vtable we have saved in class analysis and we want to find out the actual address of the called method.
So there should be a command that takes the offset (in this case 0x6c) and looks up the actual destination. It should be possible to call this command with a specific class, so it only looks into its vtable, or without a class, so it gives a list of possible destinations for all vtables that are not too small for the offset.
When that is implemented, one could also add a command that does the same thing, but automatically takes the offset from the opcode at the current seek.
Add classes list to Vb
Vb
already supports browsing bin classes. The same thing should be implemented for classes from
analysis.
Rizin legacy code refactoring
Use internal API instead of commands
Currently, Rizin’s source code is rife with calls to rz_core_cmd()
-like functions that run the Rizin command. While it is a useful shortcut for developer, it makes a good source of the potential bugs in case of the command syntax or behavior change. If these changes happen they are invisible to the compiler, so it cannot warn on the changed syntax. It isn’t the case of changed function arguments count or type.
Thus, all these calls eventually should be substituted with direct calls to the corresponding API
functions. If there is no corresponding API function, then one should be created.
Good examples of such cases are:
- Refactor Graph processing from commands to the API use
- Refactor Visual mode from commands to the API use
- Refactor Panels mode from commands to the API use
In general you can just search for rz_core_cmd
pattern in any place inside librz/
.
Miscellaneous
Shell (dietline) improvements
Currently Rizin uses its own readline-compatible implementation of the input handling in the embedded shell that is compact and portable between all supported platforms. It supports both Emacs and Vi modes, but not all bindings and features are supported. Some are omitted by choice, but some were simply not implemented. See the “dietline”-labeled issues.
Improving regression suite and testing
It is required to solve numerous issues, along with improving parallel execution and performance. The next interesting idea is to setup and reuse Godbolt compilation engine for generating tests for different compilers and compilation options. There is even a command line tool for interacting with Godbolt - cce.
Unbreaking broken tests
Almost one thousand of tests marked as “broken” in our testsuite. The task is to take any of those, investigate why it fails, if the test makes sense now or already irrelevant today. Then to try to fix some of the broken tests.
RzGhidra
There are many small issues in the decompiler output:
pdgsd
commands showing incorrect P-code- Improvements in recovering jump tables
- rz-ghidra can’t detect string
- Ghidra Decompiler Error: Could not finish collapsing block structure
- Mishandled tail jump with relocation inside the jump function
- Prioritize keeping vars with lower addresses
- Minor improvements for the SLEIGH plugin
Some of these issues might be related on how Rizin and RzGhidra integrate and might require changes in the Rizin side.
Also note that most of these issues should be paired with the test to verify it will not break in the future.