Sunday, May 18, 2014

Introduction

About me:

I am from Topeka, Kansas. I just finished a bachelor's degree in computer science and will start working on a master's degree in the fall.
This is the second year I have been selected to participate in a GSoC project. Last year I worked on creating a cffi-port of wxPython, which I, unfortunately, was not able to finish. This summer I will be working on another project with PyPy.

Unfortunately, classes only ended last Friday for me, so I did not have much of an opportunity to work on this project during the community bonding period.

About the project:

The title of my project is: Improvements for Bytearrays and Unicode Strings in PyPy. The project has two separate parts (as you could probably guess from the title):

The first part of my project will be to fix bytearrays in PyPy. Currently, the complexity of many of the operations on bytearrays is higher than it should be. At some point, the str, unicode, and bytearray implementations were refactored so they shared more code. Unfortunately, this caused a number of operations on bytearrays to create an RPython string representation of the array, which requires making a copy (bytearrays are mutable and RPython strings are immutable). This was included in my project both because it really needed to be fixed and to help make sure the project would fill the entire summer. This is what I will be working on first.
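
To make the cost concrete, here is a rough application-level sketch of the difference. It is not PyPy's actual RPython code: both function names are made up for illustration, and bytes(buf) stands in for building the immutable RPython string. The first version copies the whole array before doing any work, so even a short prefix check takes time proportional to the length of the bytearray. The second looks only at the bytes it needs.

```python
def startswith_via_copy(buf, prefix):
    # Hypothetical stand-in for the shared code path: take an immutable
    # snapshot of the mutable buffer first. The copy alone is O(len(buf)).
    snapshot = bytes(buf)
    return snapshot.startswith(prefix)


def startswith_in_place(buf, prefix):
    # Hypothetical direct version: read the bytearray's own storage and
    # compare at most len(prefix) bytes, with no copy.
    if len(prefix) > len(buf):
        return False
    for i in range(len(prefix)):
        if buf[i] != prefix[i]:
            return False
    return True


buf = bytearray(b"hello world")
print(startswith_via_copy(buf, b"hello"), startswith_in_place(buf, b"hello"))
```

Both versions give the same answer; only the amount of work differs, and that difference grows with the size of the bytearray.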

The more significant part of my project will be to convert PyPy's unicode implementation to use UTF-8 internally. Currently, unicode objects in PyPy internally use RPython unicode objects, which use either UTF-16 or UTF-32 depending on the size of the platform's wchar_t. The idea is to simplify the implementation somewhat by implementing only one representation while providing the same application-level interface, independent of the platform. I'll have more to write about this when I start working on it.
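
As a small illustration of what changing the representation means, the sketch below encodes the same four characters three ways and then looks up a character by index directly in the UTF-8 bytes. It is plain Python rather than RPython, the helper names are made up, and the linear scan is only the simplest possible approach, not necessarily what the final implementation will do. It also assumes the input is valid UTF-8.

```python
def char_len(lead_byte):
    # Length in bytes of a UTF-8 sequence, judged from its first byte.
    # Assumes valid UTF-8 (no continuation byte in lead position).
    if lead_byte < 0x80:
        return 1
    if lead_byte < 0xE0:
        return 2
    if lead_byte < 0xF0:
        return 3
    return 4


def codepoint_at(utf8, index):
    # Hypothetical helper: return the character at code point `index`
    # by walking the UTF-8 bytes from the start (a linear scan).
    if index < 0:
        raise IndexError("negative index not supported in this sketch")
    pos = 0
    for _ in range(index):
        if pos >= len(utf8):
            raise IndexError("index out of range")
        pos += char_len(utf8[pos])
    if pos >= len(utf8):
        raise IndexError("index out of range")
    return utf8[pos:pos + char_len(utf8[pos])].decode("utf-8")


text = "a\u00e9\u20ac\U0001F600"  # 1-, 2-, 3- and 4-byte characters in UTF-8
utf8 = text.encode("utf-8")
print(len(utf8), len(text.encode("utf-16-le")), len(text.encode("utf-32-le")))
print(codepoint_at(utf8, 2) == text[2])
```

The application-level result (text[2]) is the same whichever encoding is used underneath, which is the property the project needs to preserve.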