GSOC Week 7

Continuing with the previous week’s work, I wrapped the methods using nested data structures like dict & list, overloaded methods and LibCppTest attributes containing raw pointers. I wrote additional tests to check the wrappers associated with above cases.

I also did some refactoring to enable the user to pass any integer argument without the need of explicit conversion (for e.g. passing 1 instead of “1L” or “as.integer(1)”). This is because R stores any integer type number as double by default.

Wrapping python dictionary as named list in R.

Currently, dictionary is wrapped as a named list where the name(like the key in python) is confined as a character type only. This means, a key will always be stored after being casted to character and we lose information about its type. Also, apart from the basic data types like integer, numeric and logical we cannot use any other object as name in our list as it would be impossible to convert it back to its original type unlike the primitve types.

The below code demonstrates the conversion of a python dict as a named list in R :

> dict <- reticulate::py_dict(c(42L,12L),c(2.0,1.0))
> dict
{42: 2.0, 12: 1.0}
> named_list <- py_to_r(dict)
> named_list
$`42`
[1] 2

$`12`
[1] 1

> names(named_list)
[1] "42" "12"

We also need to be careful if the function requires a dict where the key is bytes, as reticulate does not convert byte type.

> py_builtin <- reticulate::import_builtins()
> dict <- reticulate::py_dict(c(py_builtin$bytes('foo','utf-8')),c("bar"))
> dict
{b'foo': 'bar'}
> reticulate::py_to_r(dict)
$`b'foo'`
[1] "bar"
> dict <- py_eval("dict(zip([k.decode('utf-8') for k in r.dict.keys()],r.dict.values()))")
> dict
$foo
[1] "bar"
> names(dict)
[1] "foo"

Here, notice that we first need to convert the bytes into string of the python dict, then only we get the name as character type correctly.

Nested Data Structures using primitive data type

Consider, the case when we have a nested python list of integer type.

> py_run_string("x = [[[1,2,3],[4,5,6]]]")
> py_eval("x")
[[1]]
[[1]][[1]]
[1] 1 2 3

[[1]][[2]]
[1] 4 5 6

The inner most structure of the converted list is a vector instead of list. This is because a homogeneous python list of primitive data type is converted as vector by reticulate. Although nothing is wrong with this conversion but as we can have nested data structure of any type in general, so, it would be better to return a nested data structure with list at each nesting level. This, ensures similar behaviour for any type of nested data structure.

For, example:

> purrr::modify_depth(py_eval("x"),2,as.list)
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] 1

[[1]][[1]][[2]]
[1] 2

[[1]][[1]][[3]]
[1] 3


[[1]][[2]]
[[1]][[2]][[1]]
[1] 4

[[1]][[2]][[2]]
[1] 5

[[1]][[2]][[3]]
[1] 6

For all the cases, using nested data structure of primitive type, I have done hard-coding so that these data structures are returned with list at every level of depth.

Going Forward

I will start wrapping OpenMS and testing the bindings using the modified autowrap.

Written on July 23, 2020