An Empirical Study on the Use and Misuse of Java 8 Streams,
10 pages (to appear).
by Raffi Khatchadourian, Yiming Tang, Mehdi Bagherzadeh and Baishakhi Ray.
[FASE 2020]
Poster: Which Similarity Metric to Use for Software Documents? A Study on Information Retrieval-Based Software Engineering Tasks, Md Masudur Rahman, Saikat Chakraborty, Baishakhi Ray. 2 pages. ICSE’18-Poster.
Poster: Searching for High-performing Software Configurations with Metaheuristic Algorithms, Chong Tang, Kevin Sullivan, Baishakhi Ray. 2 pages. ICSE’18-Poster.
Poster: A Recommender System for Developer Onboarding, Chao Liu, Dan Yang, Xiaohong Zhang, Haibo Hu, Jed Barson, Baishakhi Ray. 2 pages. ICSE’18-Poster.
On the “Naturalness” of Buggy Code,
12 pages, acceptance rate: 19%
by Baishakhi Ray, Vincent Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, Premkumar Devanbu.
[ICSE 2016]
@inproceedings{ray2016naturalness,
title={On the ``Naturalness'' of Buggy Code},
author={Ray, Baishakhi and Hellendoorn, Vincent and Godhane, Saheel and Tu, Zhaopeng and Bacchelli, Alberto and Devanbu, Premkumar},
series = {ICSE '16},
year={2016},
organization={ACM}
}
Real software, the kind working programmers produce by the kLOC to solve
real-world problems, tends to be “natural”, like speech or natural language; it
tends to be highly repetitive and predictable. Researchers have captured this
naturalness of software through statistical models and used them to good effect
in suggestion engines, porting tools, coding standards checkers, and idiom
miners. This suggests that code that appears improbable, or surprising, to a
good statistical language model is “unnatural” in some sense, and thus possibly
suspicious. In this paper, we investigate this hypothesis. We consider a large
corpus of bug fix commits (ca. 7,139), from 10 different Java projects, and
focus on its language statistics, evaluating the naturalness of buggy code and
the corresponding fixes. We find that code with bugs tends to be more entropic
(i.e. unnatural), becoming less so as bugs are fixed. Ordering files for
inspection by their average entropy yields cost-effectiveness scores comparable
to popular defect prediction methods. At a finer granularity, focusing on
highly entropic lines is similar in cost-effectiveness to some well-known
static bug finders (PMD, FindBugs) and ordering warnings from these bug finders
using an entropy measure improves the cost-effectiveness of inspecting code
implicated in warnings. This suggests that entropy may be a valid, simple way
to complement the effectiveness of PMD or FindBugs, and that search-based
bug-fixing methods may benefit from using entropy both for fault-localization
and searching for fixes.
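The ranking idea above can be sketched with a toy language model. The paper uses a cache-augmented n-gram model over lexed source tokens; the bigram model, add-one smoothing, and hand-made token lists below are simplifying assumptions for illustration only:

```python
import math
from collections import defaultdict

def train_bigram(corpus):
    """Count unigrams and bigrams over a corpus of tokenized lines."""
    unigrams, bigrams = defaultdict(int), defaultdict(int)
    for tokens in corpus:
        padded = ["<s>"] + tokens
        for prev, cur in zip(padded, padded[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams

def line_entropy(tokens, unigrams, bigrams, vocab_size):
    """Average negative log2-probability per token (add-one smoothing).
    A high value means the line is 'surprising' to the model."""
    padded = ["<s>"] + tokens
    bits = sum(-math.log2((bigrams[(p, c)] + 1) / (unigrams[p] + vocab_size))
               for p, c in zip(padded, padded[1:]))
    return bits / len(tokens)
```

Sorting lines by `line_entropy` in descending order then yields an inspection order, analogous to how the paper orders files and static-analysis warnings.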
2015
Assert Use in GitHub Projects,
11 pages, acceptance rate: 18.5%
by Casey Casalnuovo, Prem Devanbu, Abilio Oliveira, Vladimir Filkov, Baishakhi Ray.
[ICSE 2015]
@inproceedings{casalnuovo2015assert,
title={Assert Use in GitHub Projects},
author={Casalnuovo, Casey and Devanbu, Premkumar and Oliveira, Abilio and Filkov, Vladimir and Ray, Baishakhi},
series = {ICSE '15},
year={2015},
organization={ACM}
}
Assertions in a program are believed to help with automated verification, code
understandability, maintainability, fault localization, and diagnosis, all eventually leading
to better software quality. Using a large dataset of assertions in C and C++ programs, we
confirmed this claim, i.e., methods with assertions do have significantly fewer defects. Assertions
also appear to play a positive role in collaborative software development, where many
programmers are working on the same method. We further characterized assertion usage along
process and product metrics. Such detailed characterization of assertions will help to predict
relevant locations of useful assertions and will improve code quality.
@inproceedings{ray2015uniqueness,
title={The Uniqueness of Changes: Characteristics and Applications},
author={Ray, Baishakhi and Nagappan, Meiyappan and Bird, Christian and Nagappan, Nachiappan and Zimmermann, Thomas},
series = {MSR '15},
year={2015},
organization={ACM}
}
Changes in software development come in many forms. Some changes are frequent, idiomatic, or
repetitive (e.g. adding checks for nulls or logging important values) while others are unique.
We hypothesize that unique changes are different from the more common similar (or non-unique)
changes in important ways; they may require more expertise or represent code that is more complex
or prone to mistakes. As such, these unique changes are worthy of study. In this paper, we present a
definition of unique changes and provide a method for identifying them in software project history.
Based on the results of applying our technique on the Linux kernel and two large projects at
Microsoft, we present an empirical study of unique changes. We explore how prevalent unique changes
are and investigate where they occur along the architecture of the project. We further investigate
developers’ contribution towards uniqueness of changes. We also describe potential applications of
leveraging the uniqueness of change and implement two of those applications, evaluating the risk of
changes based on uniqueness and providing change recommendations for non-unique changes.
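One way to operationalize "unique change" is to call a change unique when nothing similar appears elsewhere in the project history. This is a simplified stand-in for the paper's actual definition, and the similarity threshold is a hypothetical parameter:

```python
from difflib import SequenceMatcher

def is_unique(change, history, threshold=0.8):
    """Treat a change as unique if no other recorded change is
    textually similar above the (hypothetical) threshold."""
    return all(SequenceMatcher(None, change, past).ratio() < threshold
               for past in history)
```

A repetitive edit like adding a null check will typically match an earlier change and be classified as non-unique, while a one-off refactoring will not.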
Gender and Tenure Diversity in GitHub Teams,
10 pages, acceptance rate: 20%.
by Bogdan Vasilescu, Daryl Posnett, Baishakhi Ray, Mark van den Brand, Alexander Serebrenik, Premkumar Devanbu, Vladimir Filkov.
[CHI 2015]
@inproceedings{vasilescu2015diversity,
title={Gender and Tenure Diversity in GitHub Teams},
author={Vasilescu, Bogdan and Posnett, Daryl and Ray, Baishakhi and van den Brand, Mark and Serebrenik, Alexander and Devanbu, Premkumar and Filkov, Vladimir},
series = {CHI '15},
year={2015},
organization={ACM}
}
Using GitHub, we studied gender and tenure diversity in online
programming teams. Using the results of a survey and regression modeling of a
GitHub data set comprising over 2 million projects, we studied how diversity
relates to team productivity and turnover. We showed that both gender and
tenure diversity are positive and significant predictors of productivity. These
results can inform decision-making on all levels, leading to better outcomes in
recruiting and performance.
@inproceedings{ray2014lang,
title={A Large Scale Study of Programming Languages and Code Quality in Github},
author={Ray, Baishakhi and Posnett, Daryl and Filkov, Vladimir and Devanbu, Premkumar},
booktitle={Proceedings of the ACM SIGSOFT 22nd International Symposium on the
Foundations of Software Engineering},
series = {FSE '14},
year={2014},
organization={ACM}
}
To investigate whether a programming language is the right tool for the job, I gathered a
very large data set from GitHub (728 projects, 63M lines of code, 29K authors, 1.5M commits,
in 17 languages). Using a mixed-methods approach, combining multiple regression modeling with
visualization and text analytics, I studied the effect of language features such as static vs.
dynamic typing and strong vs. weak typing on software quality. By triangulating findings from
different methods, and controlling for confounding effects such as code size, project age, and
contributors, I observed that a language design choice does have a significant, but modest
effect on software quality.
@inproceedings{brubaker2014using,
title={Using Frankencerts for Automated Adversarial Testing of Certificate Validation
in SSL/TLS Implementations},
author={Brubaker, Chad and Jana, Suman and Ray, Baishakhi and Khurshid, Sarfraz and
Shmatikov, Vitaly},
booktitle={IEEE Symposium on Security and Privacy 2014},
year={2014},
organization={IEEE}
}
In today’s open software market, multiple software products offer users similar
functionality. For example, there is a pool of popular SSL/TLS libraries (e.g.,
OpenSSL, GnuTLS, NSS, CyaSSL, PolarSSL, MatrixSSL, etc.) for securing network
connections from man-in-the-middle attacks. Certificate validation is a crucial part of
SSL/TLS connection setup. Though implemented differently, the certificate validation logic of
these different libraries should serve the same purpose, following the SSL/TLS protocol, i.e.
for a given certificate, all of the libraries should either accept or reject it. In
collaboration with security researchers at the University of Texas at Austin, we designed the
first large-scale framework for testing certificate validation logic in SSL/TLS
implementations. First, we generated millions of synthetic certificates by randomly mutating
parts of real certificates and thus induced unusual combinations of extensions and
constraints. A valid SSL implementation should be able to detect and reject the unusual
mutants. Next, using a differential testing framework, we checked whether one SSL/TLS
implementation accepts a certificate while another rejects the same certificate. We used such
discrepancies as an oracle for finding flaws in individual implementations. We uncovered 208
discrepancies between popular SSL/TLS implementations, many of which were caused by serious
security vulnerabilities.
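The two-step recipe, recombining fields of real certificates into mutants and then using disagreement between implementations as the bug oracle, can be sketched with toy dict-based certificates and stub validators. The field names and the deliberately buggy `validator_b` are illustrative assumptions, not real library behavior:

```python
import random

def frankencerts(seeds, n, rng=None):
    """Recombine field values drawn from seed certificates, inducing
    unusual combinations of extensions and constraints."""
    rng = rng or random.Random(0)
    fields = list(seeds[0])
    return [{f: rng.choice(seeds)[f] for f in fields} for _ in range(n)]

def validator_a(cert):
    # Checks both expiry and the issuer's CA flag.
    return cert["not_expired"] and cert["issuer_is_ca"]

def validator_b(cert):
    # Deliberately buggy stub: never checks the CA flag.
    return cert["not_expired"]

def differential_test(certs, impls):
    """Return certificates on which implementations disagree; each
    discrepancy points at a flaw in at least one implementation."""
    found = []
    for cert in certs:
        verdicts = {name: impl(cert) for name, impl in impls.items()}
        if len(set(verdicts.values())) > 1:
            found.append((cert, verdicts))
    return found
```

No single implementation needs to be trusted as ground truth: any disagreement is, by the protocol's determinism, evidence of a bug somewhere.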
@inproceedings{ray2013detecting,
title={Detecting and characterizing semantic inconsistencies in ported code},
author={Ray, Baishakhi and Kim, Miryung and Person, Suzette and Rungta, Neha},
booktitle={Automated Software Engineering (ASE), 2013 IEEE/ACM 28th International Conference on},
pages={367--377},
year={2013},
organization={IEEE}
}
In order to automatically detect copy-paste errors, I investigated: (1) What are the common
types of copy-paste errors? (2) How can they be automatically detected? By analyzing
the version histories of FreeBSD and Linux, I found five common types of copy-paste errors and
then leveraging this categorization I designed a two-stage analysis technique to detect and
characterize copy-paste errors. The first stage of the analysis, SPA, detects and categorizes
inconsistencies in repetitive changes based on a static control and data dependence analysis.
SPA successfully identifies copy-paste errors with 65% to 73% precision, an improvement of 14
to 17 percentage points over previous tools. The second stage of the analysis,
SPA++, uses the inconsistencies computed by SPA to direct symbolic execution in order to
generate program behaviors that are impacted by the inconsistencies. SPA++ further compares
these program behaviors leveraging logical equivalence checking (implemented with the Z3
theorem prover) and generates test inputs that exercise program paths containing the reported
inconsistencies. A case study shows that SPA++ can refine the results reported by SPA and help
developers analyze copy-paste inconsistencies. I collaborated with researchers from NASA for
this work.
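A core pattern SPA targets, an identifier renamed in most but not all of a pasted snippet, can be illustrated with a token-level consistency check. This is a rough stand-in for SPA's actual control- and data-dependence analysis:

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def rename_inconsistencies(original, pasted):
    """Pair up identifiers of a snippet and its pasted copy in order;
    an identifier mapped to two different names is a suspect."""
    mapping, issues = {}, []
    for src, dst in zip(IDENT.findall(original), IDENT.findall(pasted)):
        if src in mapping and mapping[src] != dst:
            issues.append((src, mapping[src], dst))
        mapping.setdefault(src, dst)
    return issues
```

For example, pasting `if (p != NULL) free(p);` and renaming only the first `p` to `q` leaves a dangling reference that the mapping check flags.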
@inproceedings{mcdonnell2013empirical,
title={An empirical study of API stability and adoption in the Android ecosystem},
author={McDonnell, Tyler and Ray, Baishakhi and Kim, Miryung},
booktitle={Software Maintenance (ICSM), 2013 29th IEEE International Conference on},
pages={70--79},
year={2013},
organization={IEEE}
}
In today’s software ecosystem, which is primarily governed by web, cloud, and mobile
technologies, APIs play a key role in connecting disparate software. Big players like
Google, Facebook, and Microsoft aggressively publish new APIs to accommodate new feature
requests, bug fixes, and performance improvements. We investigated how such fast-paced
API evolution affects the overall software ecosystem. Our study on Android API evolution
showed that developers are hesitant to adopt fast-evolving, unstable APIs. For
instance, while Android updates 115 APIs per month on average, clients adopt the new APIs
rather slowly, with a median lagging period of 16 months. Furthermore, client code that
uses new APIs is typically more defect prone than code without API adaptation. To the best
of my knowledge, this is the first work studying API adoption in a large software
ecosystem, and the study suggests how to promote API adoption and how to facilitate growth
of the overall ecosystem.
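The lagging-period metric is straightforward: for each API, measure the months from its release to a client's first use of it, then take the median across APIs. The API names and dates below are invented to illustrate the computation:

```python
from datetime import date
from statistics import median

def lag_months(released, adopted):
    """Whole months between API release and first client use."""
    return (adopted.year - released.year) * 12 + (adopted.month - released.month)

def median_adoption_lag(api_release, first_use):
    """Median lag, in months, across all adopted APIs."""
    return median(lag_months(api_release[api], when)
                  for api, when in first_use.items())
```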
@inproceedings{Ray2012,
title = {A Case Study of Cross-system Porting in Forked Projects},
author = {Ray, Baishakhi and Kim, Miryung},
booktitle = {Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering},
series = {FSE 2012},
articleno = {53},
pages = {53:1--53:11}
}
This paper empirically demonstrates that developers spend a significant amount of time and
effort in introducing similar features and bug-fixes in and across different projects.
This involves a significant amount of repeated work. To automatically identify the
repetitive changes, I designed Repertoire, a source code change analysis tool that
compares the edit contents and the corresponding operations of program patches to identify
similar changes, with 94% precision and 84% recall. Using Repertoire, I showed that
developers often introduce a significant amount of repeated changes within and across
projects. Most notably, repetitive changes among forked projects (different variants of an
existing project, e.g., FreeBSD, NetBSD and OpenBSD) incur significant duplicate work. In
each BSD release, on average, more than twelve thousand lines are ported from peer
projects, and more than 25% of active developers participate in cross-system porting in
each release.
@inproceedings{ray2012repertoire,
title={Repertoire: A cross-system porting analysis tool for forked software projects},
author={Ray, Baishakhi and Wiley, Christopher and Kim, Miryung},
booktitle={Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering},
series = {FSE 2012},
articleno = {8},
pages = {8:1--8:4},
}
To create a new variant of an existing project, developers often copy an existing
codebase and modify it. This process is called software forking. After forking software,
developers often port new features or bug fixes from peer projects. Repertoire analyzes
repeated work of cross-system porting among forked projects. It takes the version
histories as input and identifies ported edits by comparing the content of individual
patches. It also shows users the extent of ported edits, where and when the ported edits
occurred, which developers ported code from peer projects, and how long it takes for
patches to be ported.
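Identifying ported edits by comparing patch content can be sketched by intersecting the added lines of unified diffs. Repertoire actually compares edit contents and operations at a finer granularity, so this set-based overlap is a simplification:

```python
def added_lines(patch):
    """Content of '+' lines in a unified diff (the '+++' header excluded)."""
    return {line[1:].strip() for line in patch.splitlines()
            if line.startswith("+") and not line.startswith("+++")}

def ported_fraction(patch_a, patch_b):
    """Fraction of patch_a's added lines that also appear in patch_b,
    a crude signal that the edit was ported between projects."""
    a = added_lines(patch_a)
    return len(a & added_lines(patch_b)) / len(a) if a else 0.0
```

Run over the version histories of two forked projects, a high fraction between a pair of patches marks a likely cross-system port.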
@inproceedings{park2012empirical,
title={An empirical study of supplementary bug fixes},
author={Park, Jihun and Kim, Miryung and Ray, Baishakhi and Bae, Doo Hwan},
booktitle={Mining Software Repositories (MSR), 2012 9th IEEE Working Conference on},
pages={40--49},
year={2012},
organization={IEEE}
}
A recent study finds that errors of omission are harder for programmers to detect than
errors of commission. While several change recommendation systems already exist to
prevent or reduce omission errors during software development, there have been very few
studies on why errors of omission occur in practice and how such errors could be
prevented. In order to understand the characteristics of omission errors, this paper
investigates a group of bugs that were fixed more than once in open source
projects—those bugs whose initial patches were later considered incomplete and to which
programmers applied supplementary patches.
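The study's unit of analysis, bugs fixed more than once, can be recovered from a version history by grouping fix commits on the bug identifier referenced in their messages. The commit hashes and bug ids below are invented:

```python
from collections import defaultdict

def supplementary_fixes(commits):
    """Group (sha, bug_id) fix commits by bug id; a bug patched more
    than once indicates an incomplete initial fix."""
    by_bug = defaultdict(list)
    for sha, bug_id in commits:
        by_bug[bug_id].append(sha)
    return {bug: shas for bug, shas in by_bug.items() if len(shas) > 1}
```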
@inproceedings{rossbach2011ptask,
title={PTask: Operating system abstractions to manage GPUs as compute devices},
author={Rossbach, Christopher J and Currey, Jon and Silberstein, Mark and Ray, Baishakhi and Witchel, Emmett},
shorthand = {SOSP'11},
booktitle={Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles},
pages={233--248},
year={2011},
organization={ACM}
}
GPUs are typically used for high-performance rendering or batch-oriented computations, but
not for general-purpose compute-intensive tasks such as brain-computer interfaces or file
system encryption. Current operating systems treat the GPU as an I/O device rather than a
general-purpose computational resource like the CPU. To overcome this, we proposed the PTask
API, a new set of OS abstractions. As part of this work, I ported EncFS, a FUSE-based
encrypted file system for Linux, to the CUDA framework so that it can use the GPU for AES
encryption and decryption. Using PTask’s GPU scheduling mechanism, I showed that running
EncFS on the GPU rather than the CPU made sequential reads and writes of a 200MB file 17%
and 28% faster, respectively.
WhozThat? Evolving an Ecosystem for Context-Aware Mobile Social Networks,
6 pages
by Aaron Beach, Mike Gartrell, Sirisha Akkala, Jack Elston, John Kelley, Keisuke Nishimoto, Baishakhi Ray,
Sergei Razgulin, Karthik Sundaresan, Bonnie Surendar, Michael Terada, Richard Han.
IEEE Network Magazine, Special Issue on Composable Context-Aware Services, 2008.