GSOC Week 3

The wrapper script is available here

This week I completed creating the reticulate wrappers for C++ constructs provided by libcpp test script. For all the process functions, which can be wrapped by autowrap, we can also create reticulate wrappers for them. Processes numbered from 211 to 214, have been wrapped by autowrap but after generating the extension module for libcpp_test and running them leads to crashing of python session which may be due to errorful wrapping of these functions.

process213(std::map<int, float> & in, std::map<std::string, std::vector< std::vector<Int> > > & arg2) is not wrapped by autowrap as their has no conversion provider corresponding to it. All the other processes have been wrapped in reticulate and conversions are happening as expected.

Using native R data structures to facilitate conversion from R to python and vice-versa.

All process functions ranged from 16 to 21 take arguments of type std::map<>. In python, the C++ map can be represented as a dictionary. In R, we don’t have any native map data structure. However, reticulate converts a python dictionary to named list in R,with the keys of the dictionary as names of the list elements. So, the conversion of python dictionary is possible but their are also some cons associated with this conversion.

The names are always stored as character type whatever be the data type of the original object.
R allows for different elements to have the same name.

This would mean, we need to check that the names are unique for each such argument type in our function. Also, we cannot store any R object as name, as it will be stored as string and we the loose information about that object.

Some functions take set type as argument. If we pass a list/vector with unique values to the function, then the final list/vector returned would be unordered, since internally we convert R list/vector to python set which is unordered.

The conversion of other data structures like nested vectors, pair is easily achieved using reticulate. In, some cases, there will be extra overhead to perform conversion not supported by reticulate. Example - converting R character string to bytes in python, if the python function strictly requires bytes.

Wrapping of a function where arguments are of map type.

Consider process20

	#' void  process20(libcpp_map[int, float] & in_)
	#' 
	#' takes a named list of double type and names as integer type.
	process20 = function(dict){

	# check that list is named, all name values can be converted to integer type and the list elements are of double type.
	stopifnot(is_list(dict) && !is.null(names(dict)) && all(check.numeric(names(dict),only.integer = T)) && all(sapply(dict,function(d) is_double(d))))

	# check that list name should be unique.
	if(!length(unique(names(dict))) == length(names(dict))) {
	stop("List names should be unique")
	}

	if(length(dict)==1){
	py$key <- list(as.integer(names(dict)))
	py$val <- unname(dict)
	} else{
	py$key <- as.integer(names(dict))
	py$val <- unname(dict)
	}

	py_run_string("d = dict(zip(key,val))")

	py_call(private$py_obj$process20,py_eval("d",convert = F))

	ans<-py_eval("d")
	py_run_string("del key;del val;del d")
	py_gc$collect()

	tryCatch({
	eval.parent(substitute(dict<-ans))
	invisible(NULL)
	}, error = function(c){invisible(NULL)} )

},

We first perform all the necessary checks on the argument. Next, we create two python lists key and val to store the names and elements of the list. Then, we create a dictionary using these lists. Now, we call the python function passing the dictionary as argument. Convert the dictionary and store it before deleting the python object. Finally, if we want pass by reference behaviour, we change our original list.

Using eval.parent(substitute()), we can evaluate expressions in the parent environment of the function. If, we simply perform this change inside the function enviornment, then a copy of the argument is first created and all the changes are made to this copy. This, is because of the copy on modify semantics in R. However, we need to careful while implementing this approach for achieving pass by reference behaviour.

Consider a simple example:

fun <- function(xval){
  eval.parent(substitute(xval<-4))
}

This function behaves as expected if a variable is passed to it. However, if the passed value is not a variable, then it becomes an invalid assignment operation and results in error.

> x <- 5
> fun(x)
> x
[1] 4
> fun(6)
 Error in 6 <- 4L : invalid (do_set) left-hand side to assignment

A possible workaround this problem(shown above in process20), is to do this operation inside tryCatch block in order to handle the error.

Testing process20

C++ definition

	void  process20(std::map<int, float> & in)
    {
            in[23] = 42.0;  
    }

So, we can handle both the cases as seen in the above example.

Here is an example, for a function where values are passed by reference and and it its return type is not void.

C++ definition

	std::pair<int, LibCppTest> process4(std::pair<int, LibCppTest> & in)
    {
            in.first = 42;
            return std::pair<int, LibCppTest>(in.first, in.second);
    };

Written on June 22, 2020