## Open Source Projects

### DopCert: Database Optimizations Certified

DopCert is a framework developed using the Coq Proof Assistant for formally verifing query optimizations in databases. It consists of the following parts: 1) HoTTSQL, a SQL like language that covers all major SQL features, including selection-projection-join, aggregation, correlated subqueries, and indexes. We provide a machine checkable denotational semantics of HoTTSQL using Homotopy Type Theory for proving rewriting rules. 2) A library that consists of building blocks of proofs and automatic decision procedures (e.g., decision procedure for conjunctive queries). 3) Machine checkable proofs for existing rewrite rules from database literature as well as real-world optimizers, ranging from basic ones such as selection push down to complex ones such as magic sets rewrites.

For more information, visit the DopCert website.

### Bagpipe: BGP Policy Verification

Internet Service Providers (ISPs) rely on the Border Gateway Protocol (BGP) to exchange routing information, which is necessary to deliver traffic over the Internet. The BGP behavior of an ISP’s routers is determined by their configurations. Router misconfigurations have caused major monetary loss, performance degradation, service outages, and security violations. Some of the effects of misconfiguration are highly visible, such as the worldwide extended downtime of YouTube in 2009 and route hijacks by China Telecom in 2010 and 2014. Less visible is the high cost of developing and maintaining correct configurations, which requires checking invariants across hundreds of thousands of lines of configuration for all of an ISP’s routers.

Bagpipe is a tool which enables an ISP to express its BGP policy in a domain-specific specification language and verify that its router configurations implement this policy. We evaluated Bagpipe on Internet2 and Selfnet, two ISPs with a combined total of over 100,000 lines of router configuration. We identified and expressed policies for these ISPs, and found 19 inconsistencies between the policies and the router configurations without issuing any false positives.

For more information, visit the bagpipe website.

### Dataflow Analysis for Google error-prone

Error­-prone is a Java compiler extension that is used in Google’s internal build system to eliminate classes of serious bugs from entering the code base.

Error-prone checks used to be based on AST matching, which made it hard to write checks that require flow-sensitive information. For example, one might want to check that a field access does not happen with a null variable, or that a lock is released after it is acquired.

During my Google internship, I ported the Checker Framework’s dataflow framework to error-prone (the port required bug fixes, performance improvements, feature enhancements, and infrastructure changes).

It is now easy to write flow-sensitive error-prone checks. Some of these checks are run at Google with every Java code commit.

Some of the code is open source. See for example Boolean equality bug fix, performance improvements, and enhanced switch support.

### A Type System for Format Strings

Most programming languages support format strings, but their use is error-prone. Using the wrong format string syntax, or passing the wrong number or type of arguments, leads to unintelligible text output, program crashes, or security vulnerabilities.

In this project, we developed a type system that guarantees that calls to format string APIs will never fail. In Java, this means that the API will not throw exceptions. In C, this means that the API will not return negative values, corrupt memory, etc.

We instantiated this type system for Java’s Formatter API, and evaluated it on 6 large and well-maintained open-source projects. Format string bugs are common in practice (our type system found 104 bugs), and the annotation burden on the user of our type system is low (on average, for every bug found, only 1.0 annotations need to be written).

### Hardware Assisted Resource Overcommitment for the Linux KVM on System Z

One important advantage of virtualizing a computer’s physical resources is that it enables resource overcommitment (more resources are promised than can be delivered, in the hope that they will not actually be used).

Traditionally, memory overcommitment has been handled by lazily assigning memory to the virtualized guest, dynamically redistributing the guests’ memory, or swapping out guest memory. These techniques have problems, that we overcame by sharing more information about memory usage between the hypervisor (Linux KVM) and guest (any Operating System), in an approach called Collaborative Memory Management (CMM). The code is part of the Linux kernel (see this patch, and this follow up patch).

A spinlock is a commonly used locking primitive that spins in an idle loop until the accessed resource becomes unlocked. If the CPU is overcommited (there are more virtual cores than physical cores) not all virtual cores can run at the same time. In such a scenario, holding a spinlock can waste large amounts of CPU time. We overcame this problem by sharing information about held spinlocks between the hypervisor and guest, which allows the hypervisor to make better CPU scheduling descisions. The code is part of the Linux kernel (see this patch).

## Publications

“Formal Semantics & Verification for the Border Gateway Protocol” (Tech Report) by Konstantin Weitz, Doug Woos, Arvind Krishnamurthy, Michael D. Ernst, and Zachary Tatlock.

“Scalable Verification of Border Gateway Protocol Configurations with an SMT Solver” by Konstantin Weitz, Doug Woos, Emina Torlak, Michael D. Ernst, Arvind Krishnamurthy, and Zachary Tatlock. In OOPSLA 2016.

“Formal Semantics and Automated Verification for the Border Gateway Protocol” by Konstantin Weitz, Doug Woos, Emina Torlak, Michael D. Ernst, Arvind Krishnamurthy, and Zachary Tatlock. In NetPL 2016.

“HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics (arXiv)” by Shumo Chu, Konstantin Weitz, Alvin Cheung and Dan Suciu.

“A type system for format strings” by Konstantin Weitz, Gene Kim, Siwakorn Srisakaokul, and Michael D. Ernst. In ISSTA 2014.
Download: PDF, BibTeX Entry, Slides (PDF), Slides (ODP)

“A format string checker for Java” (Demo Paper) by Konstantin Weitz, Siwakorn Srisakaokul, Gene Kim, and Michael D. Ernst. In ISSTA 2014.

“Real-Time Collaborative Analysis with (Almost) Pure SQL: A Case Study in Biogeochemical Oceanography” by Daniel Halperin, Francois Ribalet, Konstantin Weitz, Mak A. Saito, Bill Howe, and E. Virginia Armbrust. In SSDBM 2013.

## (Dis)advantages of using Abstract Data Types in Proof Assistants

This blog post explains the advantages and disadvantages of using Abstract Data Types (ADTs, see TaPL Chapter 24) in a Proof Assistant. On the plus side, ADTs promote data representation independence, code reuse, and clean code extraction; but they also do not support fix/match syntax, have to expose derived operations, and prohibit computational reasoning.

## Machine Checkable Pictorial Mathematics for Category Theory

Pictures are essential to conveying proofs and definitions (Pinto and Tall, 2002), and are thus common in traditional pen and paper mathematics. While proof assistants offer several benefits over pen and paper mathematics, proofs and definitions in proof assistants are entirely textual (i.e. lack images), and are thus often harder to convey than their traditional equivalents. This blog post changes this unsatisfactory situation by showing how to write pictorial definitions in a proof assistant. Specifically, this blog post shows how to write commutative diagrams for category theory in Coq.

## 3 Shocking Ways to Break Mathematics

This post provides formal proofs of three paradoxes that occur in type systems with unrestricted recursion, negative data types, or type in type (Girard’s paradox). The title is a pun on BuzzFeedy headlines.