Abstract
In this post I review the code of rust-bindgen, a tool that generates Rust bindings for C libraries. I start with the general concepts of language bindings, then explain the idea behind the project and the details of how it works.
Binding
Concepts of Binding
A binding from a programming language to a library or operating system service is an application programming interface (API) that provides the glue code needed to use that library or service in a particular programming language [binding-wiki].
Two related problems [binding-acm]:
- How specific language features and capabilities are mapped to the interface.
- How the language binding is documented as a standard.
Mapping Interfaces
- Direct mapping: lexically substitute source identifiers with the equivalent target identifiers.
  - Simple to implement.
  - Does not use the target language's strong type definitions or exception mechanism.
- Abstract mapping: start from the API's interface semantics, then produce a representation of the abstract interface in terms appropriate to the language at hand.
  - Makes full use of the semantics of the target language.
  - Requires human analysis.
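To make the two mapping styles concrete, here is a minimal sketch that binds two C standard library functions by hand. The names `direct_abs` and `c_string_length` are my own illustrations, not part of rust-bindgen. The block is written as `unsafe extern "C"`, which compilers from Rust 1.82 onwards accept; older toolchains write plain `extern "C"` instead.

```rust
use std::ffi::{CString, NulError};
use std::os::raw::{c_char, c_int};

// Direct mapping: the C prototypes are substituted one-to-one.
// Nothing here uses Rust's type system to rule out misuse.
unsafe extern "C" {
    fn abs(x: c_int) -> c_int;
    fn strlen(s: *const c_char) -> usize;
}

// Thin wrapper over the direct mapping; the caller still thinks in C terms.
pub fn direct_abs(x: c_int) -> c_int {
    unsafe { abs(x) }
}

// Abstract mapping (hypothetical helper): start from what the API means
// ("length of a string") and express it with Rust types. Interior NUL
// bytes, which C cannot represent, become an explicit error.
pub fn c_string_length(text: &str) -> Result<usize, NulError> {
    let c_text = CString::new(text)?;
    Ok(unsafe { strlen(c_text.as_ptr()) })
}
```

A generator can only produce the first style automatically; the second style is the part that requires human analysis.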
Documentation for Binding
- Thin document: cites the existing document instead of restating it.
- Thick document: describes the full semantics in a dedicated language binding document.
Review of rust-bindgen
Idea behind the project
rust-bindgen is a native binding generator for the Rust language, originally created by Jyun-Yan You (crabtw) and now maintained by Yamakaky. The project was originally ported from Clay's bindgen.
The general idea behind the project is AST translation:
- Use Clang to retrieve the AST of the source files.
- Traverse the AST and mark each node with entity information (convert each node to a global entity).
- Generate new definitions for each entity and build new AST items.
- Write the new AST items out as Rust code.
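The steps above can be sketched as a toy pipeline. Everything in this block (`CType`, `Node`, `Global`, `collect_globals`, `generate`) is a simplified stand-in that I made up for illustration; the real project works on Clang cursors and syntex AST nodes rather than on these enums and strings.

```rust
// Hypothetical, simplified stand-ins; not rust-bindgen's real types.
#[derive(Debug, Clone, PartialEq)]
enum CType {
    Int,
    Void,
}

// What a parsed C declaration might look like after Clang has run.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    FunctionDecl { name: String, params: Vec<(String, CType)>, ret: CType },
    VarDecl { name: String, ty: CType },
    Other, // anything the generator does not care about
}

// The "global entity" each interesting node is converted to.
#[derive(Debug, Clone, PartialEq)]
enum Global {
    Func { name: String, params: Vec<(String, CType)>, ret: CType },
    Var { name: String, ty: CType },
}

// Step 2: traverse the nodes and keep only the global entities.
fn collect_globals(nodes: &[Node]) -> Vec<Global> {
    nodes
        .iter()
        .filter_map(|node| match node {
            Node::FunctionDecl { name, params, ret } => Some(Global::Func {
                name: name.clone(),
                params: params.clone(),
                ret: ret.clone(),
            }),
            Node::VarDecl { name, ty } => Some(Global::Var {
                name: name.clone(),
                ty: ty.clone(),
            }),
            Node::Other => None,
        })
        .collect()
}

fn rust_type(ty: &CType) -> &'static str {
    match ty {
        CType::Int => "::std::os::raw::c_int",
        CType::Void => "()",
    }
}

// Steps 3 and 4: build one foreign item per entity and print the module.
fn generate(globals: &[Global]) -> String {
    let mut out = String::from("extern \"C\" {\n");
    for global in globals {
        match global {
            Global::Func { name, params, ret } => {
                let args: Vec<String> = params
                    .iter()
                    .map(|(n, t)| format!("{}: {}", n, rust_type(t)))
                    .collect();
                out.push_str(&format!("    pub fn {}({})", name, args.join(", ")));
                if *ret != CType::Void {
                    out.push_str(&format!(" -> {}", rust_type(ret)));
                }
                out.push_str(";\n");
            }
            Global::Var { name, ty } => {
                out.push_str(&format!("    pub static mut {}: {};\n", name, rust_type(ty)));
            }
        }
    }
    out.push_str("}\n");
    out
}
```

Feeding `collect_globals` a list with one function, one variable and one `Node::Other` yields two globals, and `generate` turns them into a single `extern "C"` block.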
Project structure of rust-bindgen
The project directory layout is shown below.
+-- Cargo.toml
+-- src
|   +-- clang.rs
|   +-- gen.rs
|   +-- lib.rs
|   +-- main.rs
|   +-- parser.rs
|   +-- types.rs
- clang.rs is a wrapper around Clang-related objects: Cursor, Type (CXType), TranslationUnit, etc.
- types.rs defines the representations of top-level entities (Global) and C types (Type).
- parser.rs traverses the AST and marks it with Global information. The result is a vector of globals (type definitions, composite types, enums, global variables, function prototypes, and so on).
- gen.rs classifies the globals vector into fs (GFunc), vs and gs, extracts definitions and functions, and makes new AST items (Translation). The result is the definitions of the new items (defs) and the attributes of the module.
- lib.rs is the higher-level wrapper for Bindings and builder. Its write function uses syntex_syntax::print::pprust to write the defs to a Rust source file.
Intermediate Running Result of rust-bindgen
- Source code:
    int (*foo) (int x, int y)
- globals, after parsing:
    "foo"
- defs, after generating:
    Item { ident: #0, attrs: [], id: 4294967295, node: ForeignMod(ForeignMod { abi: C, items: [ForeignItem { ident: foo#0, attrs: [], node: Static(type(::std::option::Option<extern "C" fn(x: ::std::os::raw::c_int, y: ::std::os::raw::c_int)
- Rust source, after writing:
    extern "C" {
        pub static mut foo: ::std::option::Option<extern "C" fn(x: ::std::os::raw::c_int, y: ::std::os::raw::c_int) -> ::std::os::raw::c_int>;
    }
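The generated type deserves a note: a C function pointer may be NULL, so it is translated to an `Option` around an `extern "C" fn`. The sketch below shows how such a value is used from Rust. Since the C symbol `foo` is not available here, `add` and `local_foo` are local stand-ins that I made up, and `call_if_set` and `pointer_sized` are hypothetical helpers.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

// Same shape as the type generated for `int (*foo)(int x, int y)`.
type BinaryOp = Option<extern "C" fn(x: c_int, y: c_int) -> c_int>;

// Local stand-in for a C function that the pointer could refer to.
extern "C" fn add(x: c_int, y: c_int) -> c_int {
    x + y
}

// Stand-in for reading the C global `foo`; here it is always set.
fn local_foo() -> BinaryOp {
    Some(add)
}

// A NULL pointer shows up as `None`, so the call is skipped safely.
fn call_if_set(op: BinaryOp, x: c_int, y: c_int) -> Option<c_int> {
    op.map(|f| f(x, y))
}

// `Option<fn>` uses the null value for `None`, so it stays pointer-sized
// and keeps the same layout as the C function pointer.
fn pointer_sized() -> bool {
    size_of::<BinaryOp>() == size_of::<usize>()
}
```

With the real binding, reading `foo` additionally needs an `unsafe` block because it is a `static mut` defined on the C side.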
Translation Rules for Function
The processing of a function definition:
1. Call extract_functions (gen.rs) to extract (abi, ast):
   1. Match the ty field of GFunc; if it is TFuncPtr, go to 2.
   2. Call the cfunc_to_rs function to convert:
      1. Make the name with rust_id.
      2. Extract attrs with mk_link_name_attr.
      3. Call mk_foreign_item to return an ast item.
   3. Return the ast item.
2. Call mk_extern to make the ast::Item.
Translation Rules for Types and Consts
Decide whether the global is a type or a const.
- Processing of consts: pass each entry of vs to cvar_to_rs.
  - Make the name with rust_id.
  - Extract attrs with mk_link_name_attr.
  - Call mk_foreign_item to return an ast item.
- Processing of types: call extract_definition for gs, which handles:
  - GType
  - GCompDecl
  - GComp
  - GEnumDecl
  - GEnum
  - GVar
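The split into fs, vs and gs can be sketched as a single pass over the globals. The variant names follow the text, but the payloads are simplified, and the rule that only a `GVar` with a known constant value goes to gs is my reading of the lists above, so treat it as an assumption rather than the exact logic in gen.rs.

```rust
// Simplified stand-in for the Global enum; real variants carry richer data.
#[derive(Debug, Clone, PartialEq)]
enum Global {
    GType(String),
    GCompDecl(String),
    GComp(String),
    GEnumDecl(String),
    GEnum(String),
    GVar { name: String, const_value: Option<i64> },
    GFunc(String),
    GOther,
}

// Returns (fs, vs, gs): functions, plain variables, and definitions.
fn classify(globals: Vec<Global>) -> (Vec<Global>, Vec<Global>, Vec<Global>) {
    let mut fs = Vec::new();
    let mut vs = Vec::new();
    let mut gs = Vec::new();
    for global in globals {
        match global {
            Global::GOther => {} // nothing to generate
            Global::GFunc(_) => fs.push(global),
            // Assumption: a variable without a constant value is a plain
            // extern static and is handled by cvar_to_rs.
            Global::GVar { const_value: None, .. } => vs.push(global),
            // Everything else needs a definition from extract_definition.
            _ => gs.push(global),
        }
    }
    (fs, vs, gs)
}
```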
Analysis of Corrode
Corrode reads a C source file and prints an equivalent module in Rust syntax.
- Written in Haskell.
- Follows the idea of translating the AST.
Its processing steps are:
- Use language-c to extract the command-line arguments it cares about. The rest are passed to the preprocessor.
- The user may have specified the -o <outputfile> option. Corrode not only ignores it but also suppresses it, so that the preprocessor does not write its output where a binary was expected to be written. It also force-undefines preprocessor symbols that would indicate support for language features it cannot actually handle, and removes optimization flags that make GCC define preprocessor symbols.
- Run the preprocessor, unless the input appears to have already been preprocessed, in which case it is read as-is.
- Get language-c to parse the preprocessed source into a CTranslUnit.
- Generate a list of Rust items from this C translation unit.
- Pretty-print all the items as a String.
- Write the final string to a file with the same name as the input, except with any extension replaced by ".rs".
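Two of these steps, filtering the command line and choosing the output file name, are easy to sketch. Corrode itself is written in Haskell; the Rust below is only my paraphrase of the description, and `preprocessor_args` and `output_path` are hypothetical names. I also only handle the separate `-o <file>` form and flags starting with `-O`, which is narrower than what a real driver has to cope with.

```rust
use std::path::{Path, PathBuf};

// Keep the arguments that are safe to hand to the preprocessor.
fn preprocessor_args(args: &[&str]) -> Vec<String> {
    let mut kept = Vec::new();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == "-o" {
            // Drop the option and the file name that follows it.
            iter.next();
            continue;
        }
        if arg.starts_with("-O") {
            // Optimization flags make GCC define extra preprocessor symbols.
            continue;
        }
        kept.push(arg.to_string());
    }
    kept
}

// Same name as the input, with any extension replaced by ".rs".
fn output_path(input: &str) -> PathBuf {
    Path::new(input).with_extension("rs")
}
```

For an invocation such as `-I include -o prog -O2 main.c`, only `-I include main.c` reaches the preprocessor and the result is written to `main.rs`.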
References
- [binding-wiki] Wikipedia. Language binding. https://en.wikipedia.org/wiki/Language_binding
- [binding-acm] David Emery. Standards, APIs, Interfaces and Bindings. http://oldwww.acm.org/tsc/apis.html
- [ast-instrument] adamrehn. AST Instrumentation (examples by language). http://adamrehn.com/articles/ast-instrumentation-examples-by-language/
- [basic-tranform-clang] Eli Bendersky. Basic source-to-source transformation with Clang. http://eli.thegreenplace.net/2012/06/08/basic-source-to-source-transformation-with-clang/
- [transform-clang] Olaf Krzikalla. Performing Source-to-Source Transformations with Clang. http://llvm.org/devmtg/2013-04/krzikalla-slides.pdf
- [globalanalysis] Diomidis Spinellis. Global Analysis and Transformations in Preprocessed Languages. http://www.spinellis.gr/pubs/jrnl/2003-TSE-Refactor/html/Spi03r.html