Open Source Projects

SpaceSearch: Verifying Solver Aided Tools

Many effective verification tools build on automated solvers. These tools reduce problems in an application domain (ranging from data-race detection to compiler optimization validation) to the domain of a highly optimized solver like Z3. However, this reduction is rarely formally verified in practice, leaving the end-to-end soundness of the tool in question. SpaceSearch is a library to verify such tools by means of a proof assistant.

For more information, visit the SpaceSearch github repository.


Cosette: An Automated SQL Solver

Cosette is a framework developed using the Coq Proof Assistant for formally verifing query optimizations in databases. It consists of the following parts: 1) HoTTSQL, a SQL like language that covers all major SQL features, including selection-projection-join, aggregation, correlated subqueries, and indexes. We provide a machine checkable denotational semantics of HoTTSQL using Homotopy Type Theory for proving rewriting rules. 2) A library that consists of building blocks of proofs and automatic decision procedures (e.g., decision procedure for conjunctive queries). 3) Machine checkable proofs for existing rewrite rules from database literature as well as real-world optimizers, ranging from basic ones such as selection push down to complex ones such as magic sets rewrites.

For more information, visit the Cosette website.


Bagpipe: BGP Policy Verification

Internet Service Providers (ISPs) rely on the Border Gateway Protocol (BGP) to exchange routing information, which is necessary to deliver traffic over the Internet. The BGP behavior of an ISP’s routers is determined by their configurations. Router misconfigurations have caused major monetary loss, performance degradation, service outages, and security violations. Some of the effects of misconfiguration are highly visible, such as the worldwide extended downtime of YouTube in 2009 and route hijacks by China Telecom in 2010 and 2014. Less visible is the high cost of developing and maintaining correct configurations, which requires checking invariants across hundreds of thousands of lines of configuration for all of an ISP’s routers.

Bagpipe is a tool which enables an ISP to express its BGP policy in a domain-specific specification language and verify that its router configurations implement this policy. We evaluated Bagpipe on Internet2 and Selfnet, two ISPs with a combined total of over 100,000 lines of router configuration. We identified and expressed policies for these ISPs, and found 19 inconsistencies between the policies and the router configurations without issuing any false positives.

For more information, visit the bagpipe website.


Dataflow Analysis for Google error-prone

Error­-prone is a Java compiler extension that is used in Google’s internal build system to eliminate classes of serious bugs from entering the code base.

Error-prone checks used to be based on AST matching, which made it hard to write checks that require flow-sensitive information. For example, one might want to check that a field access does not happen with a null variable, or that a lock is released after it is acquired.

During my Google internship, I ported the Checker Framework’s dataflow framework to error-prone (the port required bug fixes, performance improvements, feature enhancements, and infrastructure changes).

It is now easy to write flow-sensitive error-prone checks. Some of these checks are run at Google with every Java code commit.

Some of the code is open source. See for example Boolean equality bug fix, performance improvements, and enhanced switch support.


A Type System for Format Strings

Most programming languages support format strings, but their use is error-prone. Using the wrong format string syntax, or passing the wrong number or type of arguments, leads to unintelligible text output, program crashes, or security vulnerabilities.

In this project, we developed a type system that guarantees that calls to format string APIs will never fail. In Java, this means that the API will not throw exceptions. In C, this means that the API will not return negative values, corrupt memory, etc.

We instantiated this type system for Java’s Formatter API, and evaluated it on 6 large and well-maintained open-source projects. Format string bugs are common in practice (our type system found 104 bugs), and the annotation burden on the user of our type system is low (on average, for every bug found, only 1.0 annotations need to be written).

Download: Paper (PDF), Slides (PDF), Slides (ODP), Demo Paper (PDF), Format String Checker Implementation


Hardware Assisted Resource Overcommitment for the Linux KVM on System Z

One important advantage of virtualizing a computer’s physical resources is that it enables resource overcommitment (more resources are promised than can be delivered, in the hope that they will not actually be used).

Traditionally, memory overcommitment has been handled by lazily assigning memory to the virtualized guest, dynamically redistributing the guests’ memory, or swapping out guest memory. These techniques have problems, that we overcame by sharing more information about memory usage between the hypervisor (Linux KVM) and guest (any Operating System), in an approach called Collaborative Memory Management (CMM). The code is part of the Linux kernel (see this patch, and this follow up patch).

A spinlock is a commonly used locking primitive that spins in an idle loop until the accessed resource becomes unlocked. If the CPU is overcommited (there are more virtual cores than physical cores) not all virtual cores can run at the same time. In such a scenario, holding a spinlock can waste large amounts of CPU time. We overcame this problem by sharing information about held spinlocks between the hypervisor and guest, which allows the hypervisor to make better CPU scheduling descisions. The code is part of the Linux kernel (see this patch).


Publications

“Formal Semantics & Verification for the Border Gateway Protocol” (Tech Report) by Konstantin Weitz, Doug Woos, Arvind Krishnamurthy, Michael D. Ernst, and Zachary Tatlock.
Download: PDF.

“Scalable Verification of Border Gateway Protocol Configurations with an SMT Solver” by Konstantin Weitz, Doug Woos, Emina Torlak, Michael D. Ernst, Arvind Krishnamurthy, and Zachary Tatlock. In OOPSLA 2016.
Download: PDF.

“Formal Semantics and Automated Verification for the Border Gateway Protocol” by Konstantin Weitz, Doug Woos, Emina Torlak, Michael D. Ernst, Arvind Krishnamurthy, and Zachary Tatlock. In NetPL 2016.
Download: PDF.

“HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics (arXiv)” by Shumo Chu, Konstantin Weitz, Alvin Cheung and Dan Suciu.
Download: PDF

“A type system for format strings” by Konstantin Weitz, Gene Kim, Siwakorn Srisakaokul, and Michael D. Ernst. In ISSTA 2014.
Download: PDF, BibTeX Entry, Slides (PDF), Slides (ODP)

“A format string checker for Java” (Demo Paper) by Konstantin Weitz, Siwakorn Srisakaokul, Gene Kim, and Michael D. Ernst. In ISSTA 2014.
Download: PDF, BibTeX Entry.

“Real-Time Collaborative Analysis with (Almost) Pure SQL: A Case Study in Biogeochemical Oceanography” by Daniel Halperin, Francois Ribalet, Konstantin Weitz, Mak A. Saito, Bill Howe, and E. Virginia Armbrust. In SSDBM 2013.
Download: PDF, BibTeX Entry, Talk

Blog Posts

HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics

SQL is the lingua franca for retrieving structured data. Existing semantics for SQL, however, either do not model crucial features of the language (e.g., relational algebra lacks bag semantics, correlated subqueries, and aggregation), or make it hard to formally reason about SQL query rewrites (e.g., the SQL standard’s English is too informal). This post focuses on the ways that HoTT concepts (e.g., Homotopy Types, the Univalence Axiom, and Truncation) enabled us to develop HoTTSQL — a new SQL semantics that makes it easy to formally reason about SQL query rewrites. Our paper also details the rich set of SQL features supported by HoTTSQL.

Read more ...

(Dis)advantages of using Abstract Data Types in Proof Assistants

This blog post explains the advantages and disadvantages of using Abstract Data Types (ADTs, see TaPL Chapter 24) in a Proof Assistant. On the plus side, ADTs promote data representation independence, code reuse, and clean code extraction; but they also do not support fix/match syntax, have to expose derived operations, and prohibit computational reasoning.

Read more ...

Machine Checkable Pictorial Mathematics for Category Theory

Pictures are essential to conveying proofs and definitions (Pinto and Tall, 2002), and are thus common in traditional pen and paper mathematics. While proof assistants offer several benefits over pen and paper mathematics, proofs and definitions in proof assistants are entirely textual (i.e. lack images), and are thus often harder to convey than their traditional equivalents. This blog post changes this unsatisfactory situation by showing how to write pictorial definitions in a proof assistant. Specifically, this blog post shows how to write commutative diagrams for category theory in Coq.

Read more ...

3 Shocking Ways to Break Mathematics

This post provides formal proofs of three paradoxes that occur in type systems with unrestricted recursion, negative data types, or type in type (Girard’s paradox). The title is a pun on BuzzFeedy headlines.

Read more ...

Abstract Data Types and Object Oriented Programming in Coq (Part I)

This post describes how to use data abstractions, specifically Abstract Data Types (ADTs) and Object Oriented Programming (OOP), to hide a value’s representation in a functional programming language with dependent types.

Read more ...

Connections between Natural Numbers, Types, Sets, and Propositions

This post shows connections between natural numbers, types, sets and propositions. This post draws ideas from the Curry Howard correspondence which shows a connection between functional programs and mathematical proofs, and Chris Taylor’s blog post which shows a connection between natural numbers and Haskell types.

Read more ...