Current Projects:

Below is a list of some of the projects I'm currently working on, with links to any related publicly available code repositories.

Natural Language and Programming Language Comparisons

Intuition suggests that people find reading and writing programs more difficult than reading and writing their native languages. However, statistical language modeling shows that the actual text of natural language is less repetitive, less predictable, and more ambiguous than code. Some of this restrictiveness in code is imposed by its constraints (the need to compile, a restricted grammar, etc.), but is some of it also a deliberate choice by developers, who limit their own options because programming languages are harder to use? While good coding style recommends code reuse via modularization, copy-pasted code and patterns are still quite common. I would like to define language difficulty more formally and associate it with the observable features of code and natural language. This would allow a grounded approach to comparing not only natural languages with programming languages, but also different programming languages with one another. Knowing which elements of programming are most cognitively difficult would allow better tools and more targeted approaches to making programming languages easier both to learn and to use.
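As a rough illustration of what "less predictable" means here, predictability is commonly measured as the cross-entropy of held-out text under a language model: fewer bits per token means more predictable text. The sketch below is only a toy under stated assumptions, not the method used in this project. The function name bigram_cross_entropy, the choice of a bigram model with add-one smoothing, and the two tiny token sequences are all made up for illustration and stand in for real corpora.

```python
import math
from collections import Counter


def bigram_cross_entropy(train_tokens, test_tokens):
    """Bits per token of test_tokens under an add-one-smoothed bigram model.

    Hypothetical helper for illustration; real studies use larger n-gram
    orders, better smoothing, and much larger corpora.
    """
    if len(test_tokens) < 2:
        raise ValueError("need at least two test tokens to score a bigram")

    vocab_size = len(set(train_tokens) | set(test_tokens))
    # How often each token appears as the first element of a bigram.
    context_counts = Counter(train_tokens[:-1])
    bigram_counts = Counter(zip(train_tokens, train_tokens[1:]))

    total_bits = 0.0
    num_bigrams = 0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams from getting p = 0.
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
        total_bits -= math.log2(p)
        num_bigrams += 1
    return total_bits / num_bigrams


# Toy stand-ins for a code corpus and an English corpus (assumptions, not data).
code_tokens = "for i in range ( n ) :".split() * 50
text_tokens = "the cat sat on the mat and the dog sat on the cat".split() * 50

# Train on the first 80% of each sequence and score the remaining 20%.
code_split = len(code_tokens) * 4 // 5
text_split = len(text_tokens) * 4 // 5
code_bits = bigram_cross_entropy(code_tokens[:code_split], code_tokens[code_split:])
text_bits = bigram_cross_entropy(text_tokens[:text_split], text_tokens[text_split:])

print(f"code-like sequence: {code_bits:.3f} bits/token")
print(f"text-like sequence: {text_bits:.3f} bits/token")
```

In the code-like sequence every token has exactly one possible successor, while in the text-like sequence a word such as "the" can be followed by several different words, so the model is less certain and the text-like score comes out higher. Real comparisons would of course need real corpora and a stronger model.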

Other:

Cache Model

I have repackaged and slightly modified Zhaopeng Tu's Cache Model into an easier-to-use form and included sample corpora from both code and English sources. Check it out here.