Languages, Systems, and Data Seminar (Winter 2024)

Time: Fridays, noon - 1:05pm (PT)
Location: The Internet / The LSD Lab (Engineering 2, Room 398)
Organizers: Lindsey Kuper, Tyler Sorensen, and Gan Shen

The Languages, Systems, and Data Seminar meets weekly to discuss interesting topics in the areas of programming languages, systems, databases, formal methods, security, software engineering, verification, architecture, and beyond. Our goal is to encourage interactions and discussions between students, researchers, and faculty with interests in these areas. The seminar is open to everyone interested. Participating UCSC students should register for the 2-credit course CSE 280O (let the organizers know if you’re an undergrad and need a permission code).

For winter 2024, we will continue to host the LSD Seminar in a hybrid fashion. Anyone can attend on Zoom, and local folks can gather in person in the lab. Speakers can join either in person or on Zoom, whichever is convenient.

Talks will be advertised on the ucsc-lsd-seminar-announce (for anyone) and lsd-group (for UCSC-affiliated people) mailing lists.

Date	Speaker	Title
Jan. 12	Beta Ziliani	Making monkeys and ducks behave with Crystal Lang
Jan. 19	Talk rescheduled for Feb. 9
Jan. 26	Devashish Purandare	Shimmer: Supercharging Adoption of Modern SSD Interfaces
Feb. 2	Joe Cutler	delta: ordered types for stream processing
Feb. 9	Katherine Philip	Formalizing Type-Directed Specialization
Feb. 16	Laura Israel	The Persistence of Past: A Demand Semantics for Mechanized Cost Analysis of Lazy Programs
Feb. 23	Achilles Benetopoulos	Don’t Let APIs Constrain Your Distributed Systems
March 1	Karuna Grewal	Expressive Policies for Microservice Networks
March 8	Abhiroop Sarkar	HasTEE+ - Confidential Computing with Haskell
March 15	Matthew C. Davis	NaNofuzz to TestLoop: A Journey from Empirical to Theoretical Research (and back again)

Jan. 12

Speaker: Beta Ziliani

Title: Making monkeys and ducks behave with Crystal Lang

Abstract: In the zoo of programming languages there are two cute yet rather misbehaved animals, typically found in the Dynamic Languages section: the Duck Typing and the Monkey Patching.

The Duck Typing is hardly seen. You hear a “quack!“, but you can’t easily tell if it’s coming from an actual duck, a parrot, or a recording. Monkey Patching, like the name suggests, patches any existing creature to change their behavior. It can even make a dog quack!

While these two animals bring lots of joy, they are also quite dangerous when used in the wild, as they can bring unexpected behavior to the rest of the creatures.

Crystal is a rarity in the Static Languages section that has Duck Typing and Monkey Patching. Given the strong – yet barely visible – fences of types, it manages to properly contain these beasts. In this talk I will present Crystal and provide a glimpse at how it manages to feel so dynamic while being safe and efficient.

Bio: Beta leads the development of the Crystal Programming Language and teaches about programming languages at Universidad Nacional de Córdoba in Argentina. With a recent past as a researcher in programming languages, he developed tools and theories for two very different languages: Coq and Lua. He has no ducks nor monkeys, despite them being effective weapons against Córdoba’s venomous scorpions.

Jan. 19

Rescheduled for Feb. 9

Jan. 26

Speaker: Devashish Purandare

Title: Shimmer: Supercharging Adoption of Modern SSD Interfaces

Abstract: While the demand for NAND-flash-based storage keeps growing, capacity increase comes at the cost of degradation in performance and device lifetime. Traditional storage abstractions exacerbate such degradation, increasing garbage collection overhead due to in-place updates. Even flash-optimized append-only systems suffer the overhead of interleaving unrelated data streams. Host-device coordination techniques allow the host to provide the device with hints, which help the SSD to ensure performance isolation across write streams and reduce garbage collection overhead by grouping data related by lifetime. However, such devices have seen limited adoption due to the difficulty of rewriting applications and filesystems and the impact on their stability, security, and portability in the face of changing interfaces.

We present Shimmer, an all-userspace shim layer that enables host-device coordination with no change to the system or the application. Shimmer utilizes application insights to provide intelligent placement hints to any capable storage backend and can shield applications and filesystems from changing interfaces. We demonstrate Shimmer’s ease of adoption by enabling host-device coordination for popular IO intensive applications: RocksDB, WiredTiger and Cachelib. Shimmer improves write throughput by 20-90%, depending on the application over modern filesystems like f2fs and can match application-specific backends like ZenFs, offering up to 14× lower tail latency and reduced write amplification.

Bio: Dev is a PhD candidate researching storage systems, SSDs, and data management at the Center for Research in Storage and Systems, UC Santa Cruz. He works on systems that enable easier host-device coordination for performance and reliability improvements. His latest project, Shimmer, allows easy shim layers to generate and provide lifetime hints that improve the performance of log-structured systems on top of SSDs with no change to applications or operating systems.He will be graduating in March and is actively seeking full time opportunities.

Feb. 2

Speaker: Joe Cutler

Title: delta: ordered types for stream processing

Abstract: We present delta, a new language for building stream processing programs. Unlike other streaming languages which restrict the programmer to a small set of combinators, programs in delta are written as standard functional list programs, written as if they transform the entire input at once. However, delta is equipped with a semantics which runs these list programs as stream programs, producing their outputs incrementally as inputs arrive. To accomplish this, delta uses a novel “ordered” substructural type system to ensure that all well-typed programs can be executed incrementally. The type system also enforces further guarantees, ensuring that the programs are monotone — they never “retract” a previous output — and that they are deterministic, even in the presence of parallel inputs. In the talk, I’ll discuss the design of the delta type system, how it guarantees the above properties, and how it enables a radical new programming model for stream processing.

Bio: Joe Cutler is a 3rd year PhD Student at The University of Pennsylvania, where he works with Benjamin Pierce on all things type systems.

Feb. 9

Speaker: Katherine Philip

Title: Formalizing Type-Directed Specialization

Abstract: We present a formal study of type-directed specialization, an implementation and optimization technique for implementing parametric polymorphism in programming languages. Type-directed specialization (also known as monomorphization) systematically eliminates polymorphic code by generating monomorphic copies, each specialized for a particular type. We focus on the full specialization of parametric polymorphic expressions in an ML-like language that uses the Hindley-Milner type system. We formalize this with a family of representation transformation functions that translate source language programs into a novel target language that is capable of binding the potentially infinite families of specialized functions generated by full specialization. A key contribution of this paper is to prove that these functions preserve typing for all well-typed source programs. Finally, we lay the groundwork for future study of semantics preservation and for the formalization of other kinds of specialization and representation transformations.

Bio: Katherine is a PhD student at Portland State University, advised by Mark P. Jones. They are interested in the design and implementation of efficient languages for low-level systems development. Currently, they are working on the Habit programming language (https://www.habit-lang.org/).

Feb. 16

Speaker: Laura Israel

Title: The Persistence of Past: A Demand Semantics for Mechanized Cost Analysis of Lazy Programs

Abstract: Lazy evaluation is a powerful tool that enables better compositionality and potentially better performance in functional programming, but it is challenging to analyze its computation cost. Existing works either require manually annotating sharing, or rely on separation logic to reason about heaps of mutable cells. In this paper, we propose a bidirectional demand semantics that allows for reasoning about the computation cost of lazy programs without relying on special program logics. To show the effectiveness of our approach, we apply the demand semantics to formally prove that Okasaki’s banker’s queue is both amortized and persistent using the Coq theorem prover. We also propose the reverse physicist’s method, a novel variant of the classical physicist’s method, which enables mechanized, modular and compositional reasoning about amortization and persistence with the demand semantics.

Bio: Laura Israel is a first-year doctoral student at Portland State University studying formal verification with Yao Li. She has a particular interest in complexity theory and its formalization in proof assistants. Laura developed a passing interest in theoretical CS while getting her BA in Psychology at Reed College, resulting in a stint in the tech industry, then culminating in her current research at PSU. When she’s not doing math thinly-veiled as computer science, you can find her hanging out with her cat Socks, running Dungeons and Dragons, or writing (and occasionally publishing) short stories.

Feb. 23

Speaker: Achilles Benetopoulos

Title: Don’t Let APIs Constrain Your Distributed Systems

Abstract: Large scale distributed applications frequently consist of compositions of independent systems communicating over RPC-like interfaces. Despite the familiar model they offer, RPCs constitute a low-level programming interface; just like an application programmer needs to be aware of how a computer executes their program when writing in something like C, they need to become distributed systems experts to correctly implement complex distributed application logic. Additionally, the shared-nothing model that underpins these systems as well as the hard boundaries established by the remote service interfaces introduce overheads in the hot path of applications and preclude end-to-end optimizations. As distribution becomes more pervasive, and as our understanding of the limitations of attempting to scale out computation over the network deepens, it is time to reimagine the role of infrastructure in orchestrating distribution and empowering domain experts with big data problems but not necessarily the distributed systems expertise to implement a bespoke system for their needs.

We introduce Magpie, a distributed single-level store and runtime. Magpie, like Distributed Shared Memory systems before it, organizes application data in a unified global address space. Unlike previous systems, Magpie always co-locates a computation with its set of data dependencies, avoiding the coherence overheads that made DSM untenable. We accomplish this co-location through fine-grained data and code mobility, enabled by a programming model that allows users to write programs as compositions of local, location-independent functions, which the runtime is free to schedule anywhere in the cluster their data dependencies can be met. In this talk I will go over Magpie’s design, its programming model, and our experience using it to build distributed infrastructure and applications.

Bio: Achilles is a third year PhD candidate at UCSC, working at the intersection of distributed systems, databases, and programming languages with Peter Alvaro. He still cooks when he gets too frustrated by computers.

March 1

Speaker: Karuna Grewal

Title: Expressive Policies for Microservice Networks

Abstract: Microservice-based application deployments need to administer safety properties while serving requests. However, today such properties can be specified only in limited ways that can lead to overly permissive policies and the potential for illegitimate flow of information across microservices, or ad hoc policy implementations.

We argue that a range of use cases require safety properties for the flow of requests across the whole microservice network, rather than only between adjacent hops. To begin specifying such expressive policies, we propose a system for declaring and deploying service tree policies. These policies are compiled down into declarative filters that are inserted into microservice deployment manifests. We use a light-weight dynamic monitor based enforcement mechanism, using ideas from automata theory. Experiments with our preliminary prototype show that we can capture a wide class of policies that we describe as case studies.

(This is a joint work with Brighten Godfrey from UIUC and Justin Hsu from Cornell University that appeared at HotNets’23.)

Bio: Karuna Grewal is a third year Ph.D. student in the Computer Science department at Cornell University advised by Prof. Justin Hsu. Her current research focus is to apply techniques from programming languages and formal methods to distributed and networked systems with a focus on security properties.

March 8

Speaker: Abhiroop Sarkar

Title: HasTEE+ - Confidential Computing with Haskell

Abstract: This talk presents HasTEE+, a Domain-Specific Language in Haskell designed for seamless programming of Trusted Execution Environments (TEEs). HasTEE+ offers a high-level programming model that uses the Haskell-type system to compartmentalise confidential portions of a program within the TEE unit. It then allows the specification of Information Flow Control (IFC) policies and incorporates a dynamic mechanism for enforcing them. HasTEE+ also includes built-in support for remote attestation using the Intel RA-TLS protocol. The practicality of HasTEE+ was demonstrated through various practical case studies including a Federated Learning system and a Data Clean Room. I will conclude the talk by discussing potential research avenues for enhancing the security guarantees provided by HasTEE+

Bio: Abhiroop Sarkar is a final year PhD student at Chalmers University, Sweden. Defending his thesis at the end of March, his research involves improving systems security for both cloud and embedded environments using programming language techniques, particularly functional programming.

March 15

Speaker: Matthew C. Davis

Title: NaNofuzz to TestLoop: A Journey from Empirical to Theoretical Research (and back again)

Abstract: Software testing labor cost in the United States exceeded an estimated $47 billion in 2021. For 50 years, Automatic Test sUite Generators (ATUGs) have intended to help developers generate test suites. The vast majority of ATUGs have focused on effectiveness measures such as code coverage, mutants killed, and bugs elicited. But few ATUGs have focused on the developer’s experience. In this talk we discuss NaNofuzz, an ATUG that focuses on the developer’s experience, as well as TestLoop, a theoretical model of the developer’s test suite generation process. We describe ways in which our empirical ATUG research informs and refines our theoretical research – and vice-versa.

Bio: Matthew C. Davis is a Software Engineering PhD student at Carnegie Mellon University’s School of Computer Science and is advised by Dr. Brad A. Myers and Dr. Joshua Sunshine. Matthew’s research focuses on interventions that improve the human ability to build and test useful software and is motivated by twenty years of experience as a software engineer and as a global technology director in industry, where he observed first-hand many ways in which developers encounter barriers to efficiently and effectively using various types of common software engineering tools.