WRITTEN BY: Matt Busekroos
With her research group, AnyScale Learning for All (ALFA), Una-May O’Reilly develops new data-driven analyses of online programming courses, deep learning techniques for program representations, adversarial attacks on models of programs, model training for adversarial robustness, cyber hunting tools, and cyber arms race models. O’Reilly joined CSAIL in 1996 after earning a PhD from Carleton University in Ottawa, Canada.
We asked O’Reilly to contrast deep learning and deep understanding. According to her, deep learning, as the term is used in the field, refers to a set of computer science techniques in which machine learning is applied to data to arrive at some prediction or insight from that data. Deep learning, she said, is a way of training and developing models, while deep understanding is really about meaning.
“What we're seeking with deep learning and all these algorithm techniques is to have a deep understanding of some piece of data, whether it be code or it be text, or it be images,” O’Reilly said. “We are really seeking that deep understanding and we're not there yet.”
O’Reilly faces some challenges in this area of research. For example, “It's really hard to understand code, but we need to have a deep understanding of code before we can try and automatically, for example, find a bug or see whether someone has altered code to make it run maliciously,” she said.
The group’s efforts in code understanding with deep learning include thinking about how code can harbor vulnerabilities that attackers can exploit.
“We can develop detectors of malware, but these machine learning detectors of malware are deep learning networks,” O’Reilly said. “They can be attacked just the same way as models that are doing image processing can be attacked. My group has looked at the nature of those attacks: whether they target binary code, because you often run a lot of apps with their binary representation, or whether the adversary is actually modifying source code. We are trying to detect whether the source code has been modified.”
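As a rough illustration of the general idea, and not the group’s actual method, one class of source-level attack applies a semantics-preserving transformation, such as renaming identifiers, that leaves a program’s behavior unchanged while perturbing the surface features a learned detector sees. The sketch below assumes a hypothetical `detector` callable that labels source text:

```python
import ast

def rename_identifiers(source: str, mapping: dict[str, str]) -> str:
    """Apply a semantics-preserving identifier renaming to Python source.

    Renaming variables never changes what the program computes, but it
    can flip the prediction of a model that keys on surface tokens.
    (This sketch only renames ast.Name nodes, not function parameters.)
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in mapping:
            node.id = mapping[node.id]
    return ast.unparse(tree)

def attack(source: str, detector, candidates: list[dict[str, str]]):
    """Greedy search over candidate renamings for one that flips the label.

    `detector` is a placeholder for any learned classifier that maps
    source text to a benign/malicious label; it is not a real API.
    """
    original_label = detector(source)
    for mapping in candidates:
        variant = rename_identifiers(source, mapping)
        if detector(variant) != original_label:
            return variant  # adversarial example found
    return None
```

The point of the sketch is that every variant the attack produces is functionally identical to the original program, so a detector that changes its answer has been fooled by presentation rather than behavior.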
O’Reilly and her group also pursued experiments to understand how people read code. They collaborated with neuroscientists at MIT who specialize in language and research how it is processed in the brain.
“We did fMRI (functional magnetic resonance imaging) experiments where we showed subjects pieces of code,” O’Reilly said. “Then in contrast, we showed them a piece of text that described the same actions taking place in the code.”
The group was able to determine how much the language region was recruited to process the code, in contrast to the brain regions devoted to logic and mathematical reasoning. They recruited approximately 20 to 30 people with competence in writing Python and pre-tested them to gauge their abilities. Additionally, the group designed a protocol of contrasting questions that allowed them to tease apart the use of language versus mathematics. Throughout the experiment, they observed subjects’ brains in the fMRI machine, looking for contrasting activations in the identified regions.
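For illustration only, since the study’s actual stimuli are not reproduced here, a matched code/prose stimulus pair might contrast a short Python function with a sentence describing the same behavior:

```python
# Code stimulus: a subject reads this function in the scanner.
def mystery(xs):
    total = 0
    for x in xs:
        if x % 2 == 0:
            total += x
    return total

# Matched prose stimulus (shown to the subject as text, not code):
# "Go through the list of numbers, add up every even one,
#  and report the sum."
```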
Additionally, O’Reilly mentioned a project in which her group looked for bugs in smart contracts, the software programs that execute transactions on the blockchain platforms underpinning digital currencies.
“We had a project where we actually scraped smart contracts from the digital memory,” O’Reilly said. “We were looking for the vulnerabilities that someone might be able to come in and exploit and steal digital assets. We were looking at Solidity, a language for writing smart contracts in Ethereum.”
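The group’s own analyses are more sophisticated, but a toy sketch conveys what hunting for such vulnerabilities can mean: the classic reentrancy bug arises when a contract sends funds via an external call before updating its own bookkeeping. The check below is a hypothetical regex heuristic written in Python, not a real analyzer or the group’s tooling:

```python
import re

# Toy heuristic: flag Solidity source where an external call
# (`.call{value: ...}` or the older `.call.value(...)`) appears before
# any debiting assignment to a `balances` mapping -- the classic
# reentrancy shape. Real analyzers parse the contract; this regex
# sketch only illustrates the idea.
EXTERNAL_CALL = re.compile(r"\.call(\{value:|\.value\()")
STATE_UPDATE = re.compile(r"balances\[[^\]]+\]\s*(=|-=)")

def flag_possible_reentrancy(solidity_source: str) -> bool:
    call = EXTERNAL_CALL.search(solidity_source)
    update = STATE_UPDATE.search(solidity_source)
    # Suspicious if the external call happens before the state update.
    return bool(call) and (update is None or call.start() < update.start())

vulnerable = """
function withdraw(uint amount) public {
    require(balances[msg.sender] >= amount);
    (bool ok, ) = msg.sender.call{value: amount}("");
    require(ok);
    balances[msg.sender] -= amount;
}
"""
print(flag_possible_reentrancy(vulnerable))  # True: the call precedes the update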
On the horizon, O’Reilly said, one thing she has been interested in working on with her team is understanding how people learn to code.
“One of the things we do is we look at the data for students who are taking an online ‘Introduction to Programming’ course,” she said. “And what we're able to do with data science is to actually take each one of those students’ interaction sequences as they move through the digital platform. We can analyze those digital sequences to come up with a narrative of how a student is learning.”
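As a loose illustration of that kind of analysis, with invented field names and event types rather than the actual course data, one can reassemble each student’s time-ordered event sequence from a platform log and derive simple behavioral summaries from it:

```python
from collections import defaultdict

# Hypothetical event log from an online course platform: each record is
# (student_id, timestamp, action). The actions and features here are
# invented for illustration; the real datasets and analyses differ.
events = [
    ("s1", 10, "watch_video"), ("s1", 55, "attempt_problem"),
    ("s1", 70, "attempt_problem"), ("s2", 12, "attempt_problem"),
    ("s2", 40, "read_forum"), ("s2", 90, "attempt_problem"),
]

# Reassemble each student's time-ordered interaction sequence.
sequences = defaultdict(list)
for student, ts, action in sorted(events, key=lambda e: e[1]):
    sequences[student].append(action)

# A simple per-student "narrative" feature: does the student try
# problems immediately, or consult other resources first?
for student, seq in sequences.items():
    first_attempt = seq.index("attempt_problem")
    style = "dives into problems" if first_attempt == 0 else "studies first"
    print(student, seq, "->", style)
```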
Working on ML approaches to code bugs and detector attacks dovetails with a broader focus of the ALFA group on Artificial Adversarial Intelligence. ALFA is interested, in general, in how conflicts requiring intelligence, especially learning, escalate. The group is just starting to study disinformation through the lens of adversarial behavior, considering its role in climate change denial and political manipulation.