GSOC Week 1

This week features an attempt aimed towards addressing the issue of using 64 bit integers in R. As, OpenMS utilizes 64 bit integers as an example to store unique feature IDs. This issue can also be solved by storing these IDs as “integer64” objects (of bit64 package) but many tidyverse packages like ggplot2 don’t support the integer64 type. Moreover, bit64 package cannot be used for representing 64 bit integers.

Using gmp package

GMP is a library for arbitrary precision arithmetic. The precision is only limited by the available memory, which is not a very big concern. I have used the gmp R package to represent the feature ID (often a 64 bit integer) as biginteger object.

Consider the function getUniqueId() of Feature Class

getId.JPG

Here, we first call the function getUniqueId on the python object using py_call of reticulate which returns the python output of a function without converting it into R object. We then need to wrap the integer value as string before converting it to a big integer as otherwise the argument is coerced to double and we loose precision. The py_str returns the string representation of a Python object.

If, fmap is an object of FeatureMap class, then the ID representation of a feature would be like below:

bigint.JPG

We can also easily set the Feature Id by passing the integer value to setter function as string.

setId.JPG

Reticulate gives us the power call R objects within python and vice versa. Here, id is a string holding the integer value. We first create, a global R object id1, which we can then access in the main python module by using r keyword. For, py_eval function we set the convert option to False, so that we get back the python integer object. We, then call our function passing this integer object and finally remove id1 from the global environment.

We, can easily set, clear or get the Feature Id shown as follows:

demonstrate.JPG

Creating a wrapper list to store features of a FeatureMap.

I have added a private field map_list in class FeatureMap and a function gen_map() to populate the map_list as R6 objects of class Feature at the time of loading the content from a FeatureXMLFile. If, we don’t store the Feature objects and return a new object each time, this will create a problem as previous changes won’t be present in the new object.

map_list.JPG

Here, reticulate::iterate returns a list storing all the features contained in the python featureMap object. Thus, we can access the feature by index simply through map_list.

getFeature.JPG

Written on June 9, 2020