Languages, Systems, and Data Seminar (Spring 2022)

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper and Tyler Sorensen

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For spring 2022, we will continue to host the LSD Seminar on Zoom, but local folks can gather in person in the lab to tune into the Zoom talks together, assuming in-person meetings are allowed.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
April 1	Olivia Hsu	Compilation of Sparse Array Programming Models
April 8	Suyash Gupta	Resilient and Scalable Architecture for Permissioned Blockchain Fabrics
April 15	Amanda Liu	Verified Tensor-Program Optimization Via High-Level Scheduling Rewrites
April 22	Will Crichton	Modular Information Flow Through Ownership
April 29	Slim Lim and Geoffrey Litt	Peritext: A CRDT for Collaborative Rich Text Editing
May 6	Asta Halkjær From	A Naive Prover for First-Order Logic Formalized in Isabelle/HOL
May 13	Farid Zakaria	Reproducibility is hard: Insights and improvements at the bottom of the stack
May 20	Michael Coblenz	Squashing Bugs and Empowering Programmers with User-Centered Programming Language Design
May 27	Douglas Creager	Incremental, zero-config Code Navigation using stack graphs
June 3	Yao Li	Program Adverbs and Tlön Embeddings

April 1

Speaker: Olivia Hsu

Title: Compilation of Sparse Array Programming Models

Abstract: This talk explains how to compile sparse array programming languages. A sparse array programming language is an array programming language that supports element-wise application, reduction, and broadcasting of arbitrary functions over dense and sparse arrays with any fill value. Such a language has great expressive power and can express sparse and dense linear and tensor algebra, functions over images, exclusion and inclusion filters, and even graph algorithms. This talk describes our compiler strategy, which generalizes prior work in the literature on sparse tensor algebra compilation to support any function applied to sparse arrays, instead of only addition and multiplication. To achieve this, we generalize the notion of sparse iteration spaces beyond intersections and unions. These iteration spaces are automatically derived by considering how algebraic properties annotated onto functions. We then show how to compile these iteration spaces to efficient code.

Bio: Olivia is a computer science PhD student at Stanford University advised by Professor Kunle Olukotun and Professor Fredrik Kjolstad. She currently works on mapping sparse applications to domain-specific architectures, reconfigurable dataflow hardware, and accelerators through the TACO compiler. Her research interests broadly include computer architecture, computer and programming systems, compilers, programming models and languages, and digital circuits/VLSI.

April 8

Speaker: Suyash Gupta

Title: Resilient and Scalable Architecture for Permissioned Blockchain Fabrics

Abstract: Since the introduction of Bitcoin—the first widespread application driven by blockchains—the interest in the design of blockchain-based applications has increased tremendously. At the core of these blockchain applications are consensus protocols that aim at securely replicating a client request among all replicas, even if some replicas are Byzantine faulty. Unfortunately, modern consensus protocols either yield low throughput or face design limitations. In this work, we present the design of three consensus protocols that facilitate efficient consensus among the replicas. Our protocols help to scale consensus through the principles of phase-reduction, parallelization, and geo-scale clustering while ensuring no compromise in fault-tolerance. Further, we believe that the focus on consensus protocols is only one-side of the story. Specifically, we present the design of a well-crafted permissioned blockchain fabric (ResilientDB) that can help even a slow consensus protocol outperform a faster protocol. Our results indicate that it is easy to scale BFT protocols to hundreds of replicas and achieve throughputs of the order 350K txns/s.

Bio: Suyash Gupta is a postdoctoral researcher at the RISELab, University of California, Berkeley. He is also the Lead Architect of ResilientDB fabric. Prior to joining RISELab, he received his Ph.D. degree from University of California, Davis. He also holds two Master of Science degrees; one from Purdue University and another from Indian Institute of Technology Madras. His current research focuses on attaining safe and efficient, fault tolerant distributed consensus and communication. He has also co-authored a book on fault-tolerant distributed transaction processing at Morgan & Claypool. In his free time, Suyash likes to code and has won awards at several hackathons.

April 15

Speaker: Amanda Liu

Title: Verified Tensor-Program Optimization Via High-Level Scheduling Rewrites

Abstract: We present a lightweight Coq framework for optimizing tensor kernels written in a pure, functional array language. Optimizations rely on user scheduling using a series of verified, semantics-preserving rewrites. Unusually for compilation targeting imperative code with arrays and nested loops, all rewrites are source-to-source within a purely functional language. Our language comprises a set of core constructs for expressing high-level computation detail and a set of what we call reshape operators, which can be derived from core constructs but trigger low-level decisions about storage patterns and ordering. We demonstrate that not only is this system capable of deriving the optimizations of existing state-of-the-art languages like Halide and generating comparably performant code, it is also able to schedule a family of useful program transformations beyond what is reachable in Halide.

Bio: Amanda Liu is a second year PhD student working with Prof. Adam Chlipala and Prof. Jonathan Ragan-Kelley. Her interests are using formal methods and programming languages to develop verified, principled methods for writing high-performance systems.

April 22

Speaker: Will Crichton

Title: Modular Information Flow Through Ownership

Abstract: Statically analyzing information flow, or how data influences other data within a program, is a challenging task in imperative languages. Analyzing pointers and mutations requires access to a program’s complete source. However, programs often use pre-compiled dependencies where only type signatures are available. We demonstrate that ownership types can be used to soundly and precisely analyze information flow through function calls given only their type signature. From this insight, we built Flowistry, a system for analyzing information flow in Rust, an ownership-based language. In this talk, I will describe how Flowistry works, how we prove its soundness, and how we show that the ownership-based modular approximation does not lose much precision versus a whole-program analysis.

Bio: Will Crichton is a 6th year PhD student at Stanford University advised by Profs. Pat Hanrahan and Maneesh Agrawala. His research combines cognitive psychology and programming language theory to understand how people program, and to design better tools for programmers.

April 29

Speakers: Slim Lim and Geoffrey Litt

Title: Peritext: A CRDT for Collaborative Rich Text Editing

Abstract: Conflict-Free Replicated Data Types (CRDTs) support decentralized collaborative editing of shared data, enabling peer-to-peer sharing and flexible branching and merging workflows. While there is extensive work on CRDTs for plain text, much less is known about CRDTs for rich text with formatting. No algorithms have been published, and existing open-source implementations do not always preserve user intent. In this talk, we describe a model of intent preservation in rich text editing, developed through a series of concurrent editing scenarios. We then describe Peritext, a CRDT algorithm for rich text that satisfies the criteria of our model. The key idea is to store formatting spans alongside the plaintext character sequence, linked to a stable identifier for the first and last character of each span, and then to derive the final formatted text from these spans in a deterministic way that ensures concurrent operations commute.

Bios:

Slim Lim is a first-year PhD student at UC Berkeley studying the intersection of programming languages and human-computer interaction. She is particularly interested in type systems for UI programming and UI programming for type systems. Previously she worked as a software engineer developing collaborative text editing features. Even more previously, she researched semantic tooling for Cascading Style Sheets, culminating in the Inactive CSS inspection feature in Firefox 70.

Geoffrey Litt is a third-year PhD student at MIT in HCI, researching end-user programming and local-first collaboration. His recent research interests include powering UIs with a malleable client-side database, and managing schema evolution in distributed systems

May 6

Speaker: Asta Halkjær From

Title: A Naive Prover for First-Order Logic Formalized in Isabelle/HOL

Abstract: When using sequent calculus to prove a formula in classical first-order logic by hand, we may rely on our intuition to pick the right sequence of rule applications, choose the most useful instantiations of variables and so on. In implementing a prover to do so, we must very carefully ensure that the mechanical process always finds a proof for a provable formula. This can be intricate and proving that the crafted algorithm lives up to expectations can be even more challenging, especially in a formalized setting. To mitigate all this, we may instead rely on a fair stream of all possible rule applications to all possible formulas. Such a stream works as an oracle, telling us exactly which rule to apply to which formula, resulting in a very simple prover. In this talk, I present a prover based on this technique, with formalized soundness and completeness proofs in the proof assistant Isabelle/HOL.

Bio: Asta is a 3rd and final year PhD student in computer science and logic at the Technical University of Denmark where she is advised by Jørgen Villadsen and Nina Gierasimczuk. She works on using proof assistants, especially Isabelle/HOL, to develop and formalize results in logic and is also interested in the pedagogical benefits of using interactive theorem provers in teaching. She is the author of multiple entries in the Archive of Formal Proofs.

May 13

Speaker: Farid Zakaria

Title: Reproducibility is hard: Insights and improvements at the bottom of the stack

Abstract: One of the fundamental data management units within a Linux system are shared object files with which processes dynamically link to at startup. The mechanism and approach to dynamic linking is becoming increasingly inadequate for complex systems because of the lack of data management principles. Novel software packaging models, such as Nix, have emerged and are becoming increasingly popular to try to tame the chaos induced by the Filesystem Hierarchy Standard but suffer from limitations of a simplistic schema. Restricted to a minimal schema that only allows modification of a list of directory paths to modify the search query, the resolution of needed dependencies can quickly become prohibitively costly. Shrinkwrap is a tool that when used within hermetic packaging models such as Nix, embosses the required dependencies to fixed locations avoid repetitive lookups by storing the absolute paths of the needed dependencies. This ability to have queries cached is currently missing from these increasingly popular store-based distributions and has meaningful improvements on startup times by up to 35x.

Bio: Farid Zakaria is a 2nd year PhD student at University of California Santa Cruz and is advised by Dr. Carlos Maltzhan. He is currently looking into opportunities to improve reproducibility of software, especially deeper in the software stack. He is an active member of the NixOS community and his research areas include linkers, build systems and operating systems.

May 20

Speaker: Michael Coblenz

Title: Squashing Bugs and Empowering Programmers with User-Centered Programming Language Design

Abstract: Programming languages are simultaneously formal systems and user interfaces with which programmers work. Unfortunately, programmers have a hard time writing safe software: serious bugs and security vulnerabilities are common. In many cases, however, languages with strong safety guarantees have been hard to use. In this talk, I’ll discuss user-centered design methods I developed to help language designers create languages that are easier to use. I’ll show how I created and evaluated Obsidian, a new smart contract language that uses a linear type system, and Bronze, a new garbage collector for Rust. I found that programmers are more effective when using the resulting languages than when using prior languages.

Bio: Michael Coblenz is a Basili Postdoctoral Fellow at the University of Maryland. Michael developed the PLIERS method, which integrates user-centered methods into the design process for safe programming languages. He created Glacier, an immutability system for Java; Obsidian, a strongly-typed language for smart contracts; and Bronze, a garbage collector that improves usability of Rust. He holds a Ph.D. in computer science from Carnegie Mellon University. Previously, he was a Senior Software Engineer at Apple. His work, which is at the intersection of programming languages, software engineering, and human-computer interaction research, has been published at OOPSLA, TOPLAS, ICSE, and TOCHI.

May 27

Speaker: Douglas Creager

Title: Incremental, zero-config Code Navigation using stack graphs

Abstract: Exploring a large or unfamiliar codebase can be tricky. Code Navigation features like “jump to definition” and “find all references” let you discover how different pieces of code relate to each other. To power these features, we need to extract lists of symbols from the code, and describe the language-specific rules for how those symbols relate to each other.

It’s difficult to add Code Nav to a large hosted service like GitHub, where we must support hundreds of programming languages, hundreds of millions of repositories, and petabytes of history. At this scale, we have a different set of design constraints than a local IDE. We need our data extraction to be incremental, so that we can reuse previous results for files that haven’t changed in a newly pushed commit, saving both compute and storage costs. And to support cross-repo lookups, it should require zero configuration — repo owners should not have to set up anything manually to activate the feature.

In this talk I’ll describe stack graphs, which use a graphical notation to define the name binding rules for a programming language. They work equally well for dynamic languages like Python and JavaScript, and for static languages like Go and Java. Our solution is fast — processing most commits within seconds of us receiving your push. It does not require setting up a CI job, or tapping into a project-specific build process. And it is open-source, building on the tree-sitter project’s existing ecosystem of language tools.

Bio: Douglas Creager manages the Semantic Code team at GitHub. We apply academic programming language research to build productivity tools that can operate on all of the code hosted on GitHub. Our goal is to make it easier for developers and maintainers to understand what their code does, and how other developers use it.

June 3

Speaker: Yao Li

Title: Program Adverbs and Tlön Embeddings

Abstract: This will be a two-part talk. In the first part, I’d like to present you the first formal verification of a networked server implemented in C. Interaction trees, a general structure for representing reactive computations, are used to tie together disparate verification and testing tools (Coq, VST, and QuickChick) and to axiomatize the behavior of the operating system on which the server runs (CertiKOS). The main theorem connects a specification of acceptable server behaviors, written in a straightforward “one client at a time” style, with the CompCert semantics of the C program. The variability introduced by low-level buffering of messages and interleaving of multiple TCP connections is captured using network refinement, a variant of observational refinement. In the second part, I’ll dive into a question to the first part: “why interaction trees”? I’d like to show you that, inspired by applicative functors, selective functors, and other structures, we can define a collection of data structures and theories, which we call program adverbs, that capture a variety of computational patterns. Program adverbs are themselves composable, allowing them to be used to specify the semantics of languages with multiple computation patterns. We use program adverbs as the basis for a new class of semantic embeddings called Tlön embeddings.

Bio: Yao Li is a Ph.D. candidate at the University of Pennsylvania (advised by Stephanie Weirich) and an incoming assistant professor at Portland State University (Fall 2022). His main research interests are programming languages and formal verification. His research aims to (1) advance the state of the art of verification on real-world software and (2) make verification easier to use from a programming languages perspective. Previously, Yao Li obtained his bachelor’s and master’s degrees in Software Engineering from Shanghai Jiao Tong University. He was a research intern at Microsoft Research in 2018 and a visiting researcher at Università della Svizzera italiana in 2015. He also served as a mentor in Research Experience for Undergraduates at the University of Pennsylvania in 2021 and in Google Summer of Code for Scala in 2016.