Languages, Systems, and Data Seminar (Fall 2020)

Time: Fridays, noon - 1:15pm (PT)
Location: The Internet
Organizers: Lindsey Kuper and Tyler Sorensen

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating students should register for the 2-credit course CSE 280O.

For fall 2020, this seminar is completely virtual. We are excited to welcome a roster of external speakers from around the world!

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
Oct. 2	None	Social event and class introduction
Oct. 9	Matthías Páll Gissurarson	Weakening Type Systems for Faster Prototyping
Oct. 16	Wen Kokke	An introduction to Session Types
Oct. 23	Kalev Alpernas	Correct and Secure Serverless Computing
Oct. 30	Daniel Lehmann	Everything Old is New Again: Binary Security of WebAssembly
Nov. 6	Uma Zalakain	Mechanising the Linear π-Calculus
Nov. 13	Katherine Ye	Penrose: from mathematical notation to beautiful diagrams
Nov. 20	Jacob Nelson	Bundled References: An Abstraction for Highly-Concurrent Linearizable Range Queries
Dec. 4	Alexa VanHattum	Vectorization for Digital Signal Processors via Equality Saturation
Dec. 11	Irene Yoon	Modular, compositional, and executable semantics for LLVM IR

Oct. 2

Class introduction

Oct. 9

Speaker: Matthías Páll Gissurarson (Chalmers University)

Title: Weakening Type Systems for Faster Prototyping

Abstract: Types and type systems are great to provide the compiler with a partial specification of our programs, but it can often be tricky to write code that matches said specification. In this talk, I will demonstrate how we can allow developers to opt-in to a weaker type system “just get it to compile”, and what we can do to “fix” the code during compilation so that it matches the specification provided using synthesis and coercions, and hint to the developer what the issue is and how they might fix it.

Oct. 16

Speaker: Wen Kokke (University of Edinburgh)

Title: An introduction to Session Types

Abstract: Much like the λ-calculus is the foundational calculus for functions, the π-calculus is the foundational calculus for message-passing concurrency. Both are terrifyingly powerful, in the sense that if you can compute something, you can do it using these languages. They’re also very scary, in the sense that you can easily write functions that “go wrong”, e.g., by getting different kinds of data mixed up, looping forever, or just getting stuck. In this talk, I’ll introduce the fundamentals of the π-calculus, and the continued effort to tame its potential for going wrong, while leaving as much of its power as possible intact. I’ll do all of this by analogy to the λ-calculus, and I’ll finish up by talk about concurrent λ-calculus—basically the answer to the question “What do I get if I smash my λs and πs together really hard?”

(The talk starts with a brief recap of the relevant bits of the untyped and simply-typed λ-calculus, just in case you could use a refresher.)

Bio: Wen is a programming languages researcher at the University of Edinburgh, where she works on session types. She is also a researcher at Heriot-Watt University, where she works on lightweight verification for neural networks. In her spare time, she enjoys cooking and runs a small art space.

Oct. 23

Speaker: Kalev Alpernas (Tel Aviv University)

Title: Correct and Secure Serverless Computing

Abstract: Serverless computing is a popular cloud computing paradigm that allows for easy deployment, rapid prototyping, and effortless, near-unlimited scalability. However, serverless computing can provide these benefits by introducing several restrictions and limitations on cloud applications, including limiting task execution time, requiring the use of ephemeral execution environments, and requiring that programs adopt an event-driven programming model. These limitations make it harder to write correct and secure applications.

In this talk I will present two projects aimed at bridging these security and correctness gaps. The first—Trapeze—is a runtime IFC system that guarantees termination-sensitive non-interference, ensuring that sensitive data never leaks from a serverless application. The second—Watchtower—is a runtime monitoring system that checks for violation of arbitrary temporal correctness properties that cross-cut the entire application reporting to the user when violations occur. Watchtower also includes a record-and-replay component for locally reproducing and debugging property violations.

Bio: Kalev is a 4th year PhD candidate at Tel Aviv University, under the supervision of Prof. Mooly Sagiv. His research interests are in the intersection of serverless computing and PL/formal methods.

Oct. 30

Speaker: Daniel Lehmann (University of Stuttgart)

Title: Everything Old is New Again: Binary Security of WebAssembly

Abstract: WebAssembly is an increasingly popular, low-level binary format designed to run code in browsers and on other platforms safely and securely, by strictly separating code and data, enforcing types, and limiting indirect control flow. Still, vulnerabilities in memory-unsafe source languages can translate to vulnerabilities in WebAssembly binaries. We have analyzed to what extent vulnerabilities are exploitable in WebAssembly binaries, and how this compares to native code. We find that many classic vulnerabilities which, due to common mitigations, are no longer exploitable in native binaries, are completely exposed in WebAssembly. Moreover, WebAssembly enables unique attacks, such as overwriting supposedly constant data or manipulating the heap using a stack overflow. In this talk, we will explain several attack primitives that allow an attacker (i) to write arbitrary memory, (ii) to overwrite sensitive data, and (iii) to trigger unexpected behavior by diverting control flow or manipulating the host environment. This can ultimately lead to new forms of cross-site scripting in the browser or remote code execution on Node.js. We will also demonstrate one of our three end-to-end exploits, which cover three different WebAssembly platforms. In our quantitative evaluation of real-world WebAssembly binaries, we also measure how likely our attack primitives are feasible in practice. Overall, our findings show a perhaps surprising lack of binary security in WebAssembly. Finally, we will discuss some potential mitigations and give recommendations on how to harden WebAssembly binaries in the future.

Nov. 6

Speaker: Uma Zalakain (University of Glasgow)

Title: Mechanising the Linear π-Calculus

Abstract: The π-calculus is a computational model for communication and concurrency. The linear π-calculus restricts the π-calculus by demanding that every communication channel is used exactly once. This results in more fine grained control over communication, avoids race conditions, and is in itself enough to serve as a target language to which the session-typed π-calculus can be compiled to.

This talk will focus on mechanizing the linear π-calculus. I will first present a mechanized syntax and an operational semantics for the untyped π-calculus. On top of that, I will use leftover typing to define a resource-aware type system that is parametrized by a set of usage coalgebras. I will compare this type system with its more traditional alternative, and comment on some of its type safety properties.

Finally, I will briefly introduce some of our ongoing research, which aims to mechanize the decidable typechecking of the linear π-calculus. We do so by borrowing ideas from co-contextual type checking algorithms and applying them to the linear π-calculus. This results in a constraint satisfaction problem that, when satisfied, returns type substitutions that can be used to mechanically build typing derivations for terms.

Bio: Uma is a 2nd year PhD student at the University of Glasgow, where she works on the machine verification of typed process calculi under the supervision of Dr Ornela Dardha. On her spare time she enjoys going out along the local riverbank for a run, and she uses every opportunity she has to hike through the Italian Alps and the Basque Pyrenees.

Nov. 13

Speaker: Katherine Ye (Carnegie Mellon University)

Title: Penrose: from mathematical notation to beautiful diagrams

Abstract: How do you design a system for automatically visualizing mathematics? In this talk I will discuss the approach taken in our SIGGRAPH 2020 paper to build a tool, called Penrose, for creating mathematical diagrams.

The basic functionality of Penrose is to translate abstract statements written in familiar math-like notation into one or more possible visual representations. Rather than rely on a fixed library of visualization tools, the visual representation is user-defined in a constraint-based specification language; diagrams are then generated automatically via constrained numerical optimization. The system is user-extensible to many domains of mathematics, and is fast enough for iterative design exploration. In contrast to tools that specify diagrams via direct manipulation or low-level graphics programming, Penrose enables rapid creation and exploration of diagrams that faithfully preserve the underlying mathematical meaning. We demonstrate the effectiveness and generality of the system by showing how it can be used to illustrate a diverse set of concepts from mathematics and computer graphics.

For more information (and pictures!), please see our paper page: https://penrose.ink/siggraph20

Nov. 20

Speaker: Jacob Nelson (Lehigh University)

Title: Bundled References: An Abstraction for Highly-Concurrent Linearizable Range Queries

Abstract: As the name suggests, range queries provide the capability to return all values of a set, whose keys are contained in a given range. In a concurrent setting, this particular operation presents challenges since it can be long running. To offer linearizable range queries, we must ensure that the operations observe a consistent snapshot of the data structure even in the midst of ongoing point operations (i.e. get, put and delete). Bundled references are a new building block to provide linearizable range query operations for highly concurrent linked data structures. At its core, a bundled reference maintains the history of a given data structure link to allow range queries to traverse a path through the data structure corresponding to a particular moment in time. By traversing only links that were “alive” at the range queries outset, a range query observes a view of the data structure that is consistent with the target atomic snapshot and is made of the minimal amount of nodes that should be accessed to preserve linearizability.

Bio: Jacob Nelson is a third-year PhD student at Lehigh University in Bethlehem, PA. His interests encompass a wide range of systems topics including highly concurrent data structures and distribution using remote direct memory access (RDMA). His most recent work is focused on how to leverage high performance one-sided RDMA operations to build a data-movement oriented transactional key-value store. Check out his website for more info!

Dec. 4

Speaker: Alexa VanHattum (Cornell University)

Title: Vectorization for Digital Signal Processors via Equality Saturation

Abstract: Compute-heavy embedded systems, from augmented reality to 5G networking, rely on specialized hardware in the form of digital signal processors (DSPs). However, DSPs are designed to prioritize energy efficiency and predictability over programmability, with simple in-order architectures that offer little hardware-provided parallelism. Existing auto-vectorizing compilers can struggle to optimize small linear algebra kernels that rely on complex data movements. An expert can reach state-of-the-art performance by hand-writing specialized implementations to use vector instructions, but they must repeat this manual effort for each size instance. In this talk, I will describe Diospyros, a search-based compiler that automates this task of finding efficient vectorizations for smaller linear algebra kernels. Diospyros combines two automated reasoning techniques, symbolic evaluation and equality saturation over rewrite rules, to vectorize computations with irregular structure. We show that Diospyros outperforms DSP libraries by 2.8x on average and demonstrate how search-based techniques can help users reach performance competitive with expert tuning with less manual effort.

Dec. 11

Speaker: Irene Yoon (University of Pennsylvania)

Title: Modular, compositional, and executable semantics for LLVM IR

Abstract: The LLVM framework has been a long-time favorite for compiler enthusiasts. It is both a widely used industrial-strength compiler (most recognizably, Apple’s latest macOS and iOS development tools) and a popular research tool (winning the ACM Software Systems Award in 2012).

How do we best ensure that LLVM-based tools (compilers, optimizers, code instrumentation passes, etc.) do what they’re supposed to — especially for safety- or security-critical applications?

This talk is an introduction to a novel formal semantics for a large, sequential subset of the LLVM IR, mechanized in the Coq proof assistant. We will see how the use of modern semantic reasoning techniques allow us to write a compositional, modular, and executable semantics. In particular, I will discuss how the development of an interaction tree-based semantics gives us (1) expressive combinators for defining compositional semantics, (2) a modular separation of concerns for effects in a language, and (3) a “for-free” extraction of an executable definitional interpreter.

No experience with LLVM or formal verification technologies will be assumed.

Source can be found on our GitHub