Monday, July 28, 2014

Unfortunately, I don't really have much progress to report.  I've been working on fixing translation for the last week or so.

I've spent an embarrassing amount of time working on supporting the == operator on Utf8Str (which is the class that wraps a UTF-8 encoded string internally).  RPython generally doesn't support magic methods, but I had already added support for a few of them, such as __getitem__ and __len__, so I decided to add support for __eq__ as well.  Although I did have the option of just using an 'equals' method (like one would do in Java) rather than the == operator, being able to use a Utf8Str as if it were a regular string is ideal.
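To make the idea concrete, here is a toy sketch of a wrapper that behaves a little like a string through magic methods.  It is plain Python, not RPython, and it is not the real Utf8Str: the class name ToyUtf8Str, its attribute, and the choice to compare only against other wrappers are all assumptions made for illustration.

```python
class ToyUtf8Str(object):
    """Hypothetical stand-in for a class wrapping UTF-8 encoded bytes."""

    def __init__(self, data):
        # data is assumed to be valid UTF-8 encoded bytes
        self.bytes = data

    def __len__(self):
        # Length in code points: count every byte that is not a
        # continuation byte (continuation bytes look like 0b10xxxxxx).
        return sum(1 for b in bytearray(self.bytes) if (b & 0xC0) != 0x80)

    def __eq__(self, other):
        # Only compare against other wrappers in this sketch.
        if isinstance(other, ToyUtf8Str):
            return self.bytes == other.bytes
        return False

    def __ne__(self, other):
        return not self.__eq__(other)


a = ToyUtf8Str(u'h\xe9llo'.encode('utf-8'))
b = ToyUtf8Str(u'h\xe9llo'.encode('utf-8'))
print(len(a), a == b)
```

With __eq__ in place, callers can write a == b instead of a.equals(b), which is the convenience being described above.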

The problem with implementing support for a method like __eq__ is that RPython is much more static than Python.  Whether to call __eq__ or to use the built-in equality operator has to be decided at translation time rather than at run time.  Consider the following example:

class Base(object):
    pass

class Derived(Base):
    def __eq__(self, other):
        return True

def f(a):
    if a:
        obj = Base()
    else:
        obj = Derived()

    return obj == ''

When translating a function such as f, the RPython type annotator reduces the type of the variable obj to the common base class.  This means that when considering the statement ``obj == ''``, the type annotator will consider obj to be an instance of Base and won't take into account that an __eq__ method may be defined in some cases.  After translation, Derived.__eq__ would never be called.  My solution was to raise an error any time a class is encountered that defines __eq__ and has a base class which doesn't.  That is to say, the class at the root of the hierarchy must define __eq__ if any of its subclasses are to.
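The rule can be sketched in plain Python roughly as follows.  This is not the actual annotator code; the helper names has_custom_eq and check_eq_hierarchy are made up for illustration, and the sketch assumes that 'object' itself is exempt from the rule and that inheriting __eq__ from an ancestor counts as having it.

```python
def has_custom_eq(cls):
    # True if cls or one of its ancestors (other than object) defines __eq__.
    return any('__eq__' in klass.__dict__
               for klass in cls.__mro__ if klass is not object)


def check_eq_hierarchy(cls):
    # Only classes that define __eq__ themselves need checking.
    if '__eq__' not in cls.__dict__:
        return
    for base in cls.__mro__[1:]:
        if base is object:
            continue
        if not has_custom_eq(base):
            raise TypeError("%s defines __eq__ but its base class %s does not"
                            % (cls.__name__, base.__name__))


class Base(object):
    pass

class Derived(Base):
    def __eq__(self, other):
        return True

class Root(object):
    def __eq__(self, other):
        return self is other

class Leaf(Root):
    def __eq__(self, other):
        return True


check_eq_hierarchy(Leaf)         # fine: the root of the hierarchy defines __eq__
try:
    check_eq_hierarchy(Derived)  # rejected: Base has no __eq__
except TypeError as e:
    print(e)
```

Rejecting the Base/Derived hierarchy up front means the situation in f can never arise silently: either every class the annotator might generalize to has an __eq__, or translation fails with an error.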

Sunday, July 13, 2014

Status update

Unfortunately, I don't have much of interest to talk about this week.  Lately I've been working on fixing the parts of the built-in modules that were broken by moving to UTF-8.  Most of the issues stem from the existing code expecting each unicode character to map cleanly onto one wchar_t.  Also, since the RPython library for interacting with native code, known as "rffi", assumes that you want to use RPython unicode objects when manipulating arrays of wchar_t, some extra massaging has to be done to cast to the appropriate int type.  While working on this, I finally had to set up a Windows VM to test 16-bit wchar_t, which I had somehow managed to avoid until now.
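As a small illustration of why "one character, one wchar_t" breaks down, the snippet below counts how many UTF-8 bytes and how many 16-bit UTF-16 code units a few characters need.  It uses only Python's standard codecs and has nothing to do with rffi; the helper names are made up, and it treats a 16-bit wchar_t as holding one UTF-16 code unit, which is how Windows uses it.

```python
def utf8_len(ch):
    # Number of bytes the character takes in UTF-8.
    return len(ch.encode('utf-8'))


def utf16_units(ch):
    # Number of 16-bit code units the character takes in UTF-16.
    return len(ch.encode('utf-16-le')) // 2


for ch in (u'a', u'\xe9', u'\u20ac', u'\U0001F600'):
    print(repr(ch), utf8_len(ch), utf16_units(ch))
```

The last character lies outside the Basic Multilingual Plane, so it needs four bytes in UTF-8 and a surrogate pair (two code units) in UTF-16.  Code that assumes one wchar_t per character goes wrong on exactly such input when wchar_t is 16 bits wide.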

When I finish the modules, I'll be able to start working on making everything translate.