Code provides a unique environment for studying human cognitive processes because it operates under dual-channel constraints: developers simultaneously write code for two 'audiences', the machine that executes it and the humans who must read, comprehend, and maintain it. The natural language (NL) channel (comments, variable names, how the code is structured, etc.) targets only developers, while the algorithmic (AL) channel (the derived executable meaning of the code) is used by both developers and the machine. The asymmetries between these channels, and the ways they interact, remain underexplored, and exploiting them could improve software engineering and program analysis tools.
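To make the two channels concrete, consider a pair of hypothetical Python functions (my own illustration, not material from the project) that are identical on the AL channel but differ on the NL channel:

```python
# Both functions perform exactly the same computation on every input,
# so they are identical on the AL channel. Only the NL channel
# (identifier names and documentation) differs.

def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp_f - 32) * 5 / 9

def g(a):
    return (a - 32) * 5 / 9
```

The machine cannot distinguish the two, yet a human reader comprehends the first far more readily.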
Moreover, though coding shares some properties with natural language, these constraints (among other factors) shape how code is written, and the formal AL channel enables controlled experiments that are not possible in natural language. Studies of code can therefore teach us not only how humans think when coding, but also, by contrast, how they think when writing natural language.
The use of language models for code has led to an explosion of tools for tasks such as defect prediction and repair, code completion, and recovering minified or obfuscated code. Underlying many of these tools is the metric of surprisal, the negative log-probability a model assigns to a token in context, with lower model surprisal considered 'better'. Though surprisal has been linked to cognitive load in natural language, its role in programming languages is less understood.
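As a minimal sketch of how per-token surprisal is computed, the following assumes a causal language model from the Hugging Face transformers library; "gpt2" is a stand-in for whichever code-trained model a given tool uses:

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# "gpt2" is a placeholder; any causal LM with the same interface works.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def token_surprisals(code: str):
    """Return (token, surprisal in bits) pairs for a code snippet."""
    ids = tokenizer(code, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Surprisal of token i is -log2 P(token_i | tokens_<i), so align each
    # position's predictive distribution with the token that follows it.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    nats = -log_probs[torch.arange(targets.size(0)), targets]
    bits = (nats / math.log(2)).tolist()
    return list(zip(tokenizer.convert_ids_to_tokens(targets.tolist()), bits))
```

Summing or averaging these per-token values gives a snippet-level score, which is how 'less surprising code' is quantified.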
My current project takes advantage of these dual-channel constraints to study the impact of language model surprisal on code and to help validate its use. We can create code that is exactly equivalent on the AL channel but different on the NL channel; that is, code with the same meaning but different implementations. This framing lets us precisely measure the effect of surprisal in a way not possible in natural language, where different ways of saying the same thing can carry different connotative meanings. I have been presenting equivalent code fragments with different surprisal to human subjects to see how surprisal affects preference and ease of comprehension. Do humans prefer less surprising code? Can they execute such code more accurately and quickly? How large are these effects, and can such transformations be used to improve existing software?
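For illustration (these are hypothetical fragments of my own, not the project's actual stimuli), one such meaning-preserving transformation swaps conventional phrasings for unconventional ones that a language model rates as more surprising:

```python
# Conventional phrasing: language models typically assign it low surprisal.
if count == 0:
    total = total + price

# AL-equivalent rewrite (assuming numeric operands): the reversed comparison
# and reordered addition leave the behavior unchanged, but models typically
# assign these unidiomatic forms higher surprisal.
if 0 == count:
    total = price + total
```

Because both fragments compute the same result, any difference in subjects' preference, speed, or accuracy can be attributed to the NL channel rather than to what the code does.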