Saumya K Debray
- Professor, Computer Science
- Member of the Graduate Faculty
Contact
- (520) 621-4527
- Gould-Simpson, Rm. 735
- Tucson, AZ 85721
- debray@cs.arizona.edu
Degrees
- Ph.D. Computer Science
- SUNY at Stony Brook, Stony Brook, New York
Awards
- College of Science Distinguished Career Teaching Award
- University of Arizona, Summer 2022
- Faculty Outstanding Service Award
- Department of Computer Science, Spring 2019
- Outstanding Faculty Service
- Department of Computer Science, The University of Arizona, Tucson, Spring 2016
- Outstanding Research
- Department of Computer Science, The University of Arizona, Tucson, Spring 2016
- Most influential paper from IEEE Working Conference on Reverse Engineering (WCRE) 2005
- IEEE, Spring 2015
- Outstanding Mentor
- Dept. of Computer Science, The University of Arizona, Tucson, Spring 2015
- Tech Launch Arizona Catapult Award
- University of Arizona, Spring 2014
- Course Exemplar for the ACM Computer Science Curricula 2013
- ACM, Spring 2013
Courses
2024-25 Courses
- Principle Of Compilation, CSC 553 (Fall 2024)
2023-24 Courses
- Compilers+Systems Sftwr, CSC 453 (Spring 2024)
- Research, CSC 900 (Spring 2024)
- Dissertation, CSC 920 (Fall 2023)
- Principle Of Compilation, CSC 553 (Fall 2023)
- Research, CSC 900 (Fall 2023)
2022-23 Courses
- Dissertation, CSC 920 (Spring 2023)
- Honors Independent Study, CSC 499H (Spring 2023)
- Honors Thesis, CSC 498H (Spring 2023)
- Research, CSC 900 (Spring 2023)
- Directed Research, CSC 492 (Fall 2022)
- Dissertation, CSC 920 (Fall 2022)
- Honors Thesis, CSC 498H (Fall 2022)
- Principle Of Compilation, CSC 553 (Fall 2022)
- Research, CSC 900 (Fall 2022)
2021-22 Courses
- Dissertation, CSC 920 (Spring 2022)
- Directed Research, CSC 492 (Fall 2021)
- Independent Study, CSC 699 (Fall 2021)
- Research, CSC 900 (Fall 2021)
2020-21 Courses
- Honors Thesis, CSC 498H (Spring 2021)
- Principle Of Compilation, CSC 553 (Spring 2021)
- Research, CSC 900 (Spring 2021)
- Compilers+Systems Sftwr, CSC 453 (Fall 2020)
- Honors Thesis, CSC 498H (Fall 2020)
- Research, CSC 900 (Fall 2020)
2019-20 Courses
- Adv Tpcs:Doctoral Colloq, CSC 695C (Spring 2020)
- Compilers+Systems Sftwr, CSC 453 (Spring 2020)
- Directed Research, CSC 492 (Fall 2019)
- Independent Study, CSC 599 (Fall 2019)
- Intro to Computer Prog II, CSC 120 (Fall 2019)
- Thesis, CSC 910 (Fall 2019)
2018-19 Courses
- Honors Thesis, CSC 498H (Spring 2019)
- Thesis, CSC 910 (Spring 2019)
- Adv Tpcs:Doctoral Colloq, CSC 695C (Fall 2018)
- Compilers+Systems Sftwr, CSC 453 (Fall 2018)
- Directed Research, CSC 492 (Fall 2018)
- Honors Thesis, CSC 498H (Fall 2018)
- Independent Study, CSC 599 (Fall 2018)
- Principle Of Compilation, CSC 553 (Fall 2018)
2017-18 Courses
- Directed Research, CSC 392 (Summer I 2018)
- Directed Research, CSC 492 (Summer I 2018)
- Thesis, CSC 910 (Summer I 2018)
- Adv Tpcs:Doctoral Colloq, CSC 695C (Spring 2018)
- Compilers+Systems Sftwr, CSC 453 (Spring 2018)
- Directed Research, CSC 492 (Spring 2018)
- Independent Study, CSC 499 (Spring 2018)
- Independent Study, CSC 599 (Spring 2018)
- Thesis, CSC 910 (Spring 2018)
- Adv Tpcs:Doctoral Colloq, CSC 695C (Fall 2017)
- Directed Research, CSC 392 (Fall 2017)
- Directed Research, CSC 492 (Fall 2017)
- Independent Study, CSC 699 (Fall 2017)
- Thesis, CSC 910 (Fall 2017)
2016-17 Courses
- Directed Research, CSC 492 (Spring 2017)
- Independent Study, CSC 499 (Spring 2017)
- Intro to Computer Prog II, CSC 120 (Spring 2017)
- Adv Tpcs:Doctoral Colloq, CSC 695C (Fall 2016)
- Directed Research, CSC 392 (Fall 2016)
- Directed Research, CSC 492 (Fall 2016)
- Honors Thesis, CSC 498H (Fall 2016)
2015-16 Courses
- Adv Tpcs:Doctoral Colloq, CSC 695C (Spring 2016)
- Compilers+Systems Sftwr, CSC 453 (Spring 2016)
- Dissertation, CSC 920 (Spring 2016)
- Honors Independent Study, CSC 499H (Spring 2016)
- Honors Thesis, CSC 498H (Spring 2016)
- System Programming+Unix, CSC 352 (Spring 2016)
Scholarly Contributions
Journals/Publications
- Lim, H., & Debray, S. (2023). Directed Test Program Generation for JIT Compiler Bug Localization. arXiv preprint arXiv:2307.08885. Bug localization techniques for Just-in-Time (JIT) compilers are based on analyzing the execution behaviors of the target JIT compiler on a set of test programs generated for this purpose; characteristics of these test inputs can significantly impact the accuracy of bug localization. However, current approaches for automatic test program generation do not work well for bug localization in JIT compilers. This paper proposes a novel technique for automatic test program generation for JIT compiler bug localization that is based on two key insights: (1) the generated test programs should contain both passing inputs (which do not trigger the bug) and failing inputs (which trigger the bug); and (2) the passing inputs should be as similar as possible to the initial seed input, while the failing programs should be as different as possible from it. We use a structural analysis of the seed program to determine which parts of the code should be mutated for each of the passing and failing cases. Experiments using a prototype implementation indicate that test inputs generated using our approach yield significantly better bug localization results than existing approaches.
- Jacobsen, B., Rahaman, S., & Debray, S. K. (2021). Optimization to the Rescue: Evading Binary Code Stylometry with Adversarial Use of Code Optimizations. Proceedings of the 2021 Workshop on Research on offensive and defensive techniques in the Context of Man At The End (MATE) Attacks. Recent work suggests that it may be possible to determine the author of a binary program simply by analyzing stylistic features preserved within it. As this poses a threat to the privacy of programmers who wish to distribute their work anonymously, we consider steps that can be taken to mislead such analysis. We begin by exploring the effect of compiler optimizations on the features used for stylistic analysis. Building on these findings, we propose a gray-box attack on a state-of-the-art classifier using compiler optimizations. Finally, we discuss our results, as well as implications for the field of binary stylometry.
- Kang, X., & Debray, S. K. (2021). A Framework for Automatic Exploit Generation for JIT Compilers. Proceedings of the 2021 Workshop on Research on offensive and defensive techniques in the Context of Man At The End (MATE) Attacks.More infoThis paper proposes a framework for automatic exploit generation in JIT compilers, focusing in particular on heap corruption vulnerabilities triggered by dynamic code, i.e., code generated at runtime by the JIT compiler. The purpose is to help assess the severity of vulnerabilities and thereby assist with vulnerability triage. The framework consists of two components: the first extracts high-level representations of exploitation primitives from existing exploits, and the second uses the primitives so extracted to construct exploits for new bugs. We are currently building a prototype implementation of the framework focusing on JavaScript JIT compilers. To the best of our knowledge, this is the first proposal to consider automatic exploit generation for code generated dynamically by JIT compilers.
- Lim, H., & Debray, S. K. (2021). Automated Bug Localization in JIT Compilers. 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’21). NOTE: This is a CORE A conference.
- Bartels, J., Stephens, J., & Debray, S. K. (2020). Representing and Reasoning About Dynamic Code. 35th IEEE/ACM International Conference on Automated Software Engineering. This conference is ranked A* in CORE. Acceptance rate: 93 accepted out of 414 submissions (22.5%).
- Stephens, J., Yadegari, B., Scheidegger, C. E., Debray, S. K., & Collberg, C. S. (2017). Probabilistic Obfuscation through Covert Channels. Third IEEE European Symposium on Security and Privacy (EuroS&P). This paper presents a program obfuscation framework that uses covert channels through the program's execution environment to obfuscate information flow through the program. Unlike prior works on obfuscation, the use of covert channels removes visible information flows from the computation of the program and reroutes them through the program's runtime system and/or the operating system. This renders these information flows, and the corresponding control and data dependencies, invisible to program analysis tools such as symbolic execution engines. Additionally, we present the idea of probabilistic obfuscation, which uses imperfect covert channels to leak information with some probabilistic guarantees. Experimental evaluation of our approach against state-of-the-art detection and analysis techniques shows that these engines are not well-equipped to handle these obfuscations, particularly those of the probabilistic variety.
- Strout, M., Debray, S. K., Isaacs, K., Kreaseck, B., Cardenas-Rodriguez, J., Hurwitz, B., Volk, K., Badger, S., Bertolacci, I., Bartels, J., Devkota, S., Encinas, A., Gaska, B., Sackos, T., Stephens, J., Willer, S., & Yadegari, B. (2017). Language-Agnostic Optimization and Parallelization for Interpreted Languages. The 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017).
- Yadegari, B., & Debray, S. K. (2017). Control Dependencies in Interpretive Systems. The 17th International Conference on Runtime Verification. Interpretive systems—interpreters and associated software components such as just-in-time (JIT) compilers—are ubiquitous in modern computing systems. This makes it important to have good program analyses for reasoning about and manipulating such systems. Control dependence, which plays a fundamental role in a number of program analyses, is an important contender in this regard. Existing algorithms for (dynamic) control dependence analysis do not take into account some important runtime characteristics of interpretive computations, and as a result produce results that may be imprecise and/or unsound. This paper describes a new notion of control dependence for interpretive systems, together with an analysis algorithm that addresses the problems described above for control dependence analysis in interpretive computations. This significantly improves dynamic control dependence information, with corresponding improvements in client analyses such as dynamic slicing and implicit information flow. To the best of our knowledge, this is the first proposal to reason about low-level dynamic control dependencies in interpretive systems in the presence of dynamic code generation and optimization.
- Yadegari, B., Stephens, J., & Debray, S. K. (2017). Analysis of Exception-Based Control Transfers. 7th ACM Conference on Data and Application Security and Privacy.
- Dalla Preda, M., Giacobazzi, R., & Debray, S. K. (2015). Modeling Metamorphism by Abstract Interpretation. Theoretical Computer Science.
- Debray, S. K., Yadegari, B., Johannesmeyer, B., & Whitely, B. (2015). A Generic Approach to Automatic Deobfuscation of Executable Code. IEEE Symposium on Security and Privacy. CORE: A*; csrankings: one of the top three conferences listed.
- Debray, S., Qiu, J., Yadegari, B., Johannesmeyer, B., & Su, X. (2015). Identifying and Understanding Self-Checksumming Defenses in Software. Fifth ACM Conference on Data and Application Security and Privacy (CODASPY).
- Yadegari, B., & Debray, S. K. (2015). Symbolic Execution of Obfuscated Code. ACM Conference on Computer and Communications Security (CCS). CORE: A*; csrankings: one of the top three conferences listed.
- Debray, S. K., & Yadegari, B. (2014). Bit-Level Taint Analysis. 14th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM).
- Debray, S. K., Qiu, J., Yadegari, B., Johannesmeyer, B., & Su, X. (2014). A Framework for Understanding Dynamic Anti-Analysis Defenses. Proc. 4th Program Protection and Reverse Engineering Workshop (PPREW-4).
- Lu, G., & Debray, S. (2013). Weaknesses in defenses against web-borne malware (short paper). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7967 LNCS, 139-149. Abstract: Web-based mechanisms, often mediated by malicious JavaScript code, play an important role in malware delivery today, making defenses against web-borne malware crucial for system security. This paper explores weaknesses in existing approaches to the detection of malicious JavaScript code. These approaches generally fall into two categories: lightweight techniques focusing on syntactic features such as string obfuscation and dynamic code generation; and heavier-weight approaches that look for deeper semantic characteristics such as the presence of shellcode-like strings or execution of exploit code. We show that each of these approaches has its weaknesses, and that state-of-the-art detectors using these techniques can be defeated using cloaking techniques that combine emulation with dynamic anti-analysis checks. Our goal is to promote a discussion in the research community focusing on robust defensive techniques rather than ad-hoc solutions. © 2013 Springer-Verlag.
- Lu, G., Chadha, K., & Debray, S. (2013). A simple client-side defense against environment-dependent web-based malware. Proceedings of the 2013 8th International Conference on Malicious and Unwanted Software: "The Americas", MALWARE 2013, 124-131. Abstract: Web-based malware tend to be environment-dependent, which poses a significant challenge for defending against web-based attacks, because the malicious code - which may be exposed and activated only under specific environmental conditions such as the version of the browser - may not be triggered during analysis. This paper proposes a simple approach for defending against environment-dependent malware. Instead of increasing analysis coverage in the detector, the goal of this technique is to ensure that the client will take the same execution path as the one examined by the detector. This technique is designed to work alongside a detector; it can handle cases that existing multi-path exploration techniques are incapable of, and provides an efficient way to identify discrepancies in a JavaScript program's execution behavior in a user's environment compared to its behavior in a sandboxed detector, thereby detecting false negatives that may have been caused by environment dependencies. Experiments show that this technique can effectively detect environment-dependent behavior discrepancies of various forms, including those seen in real malware. © 2013 IEEE.
- Lu, G., & Debray, S. (2012). Automatic simplification of obfuscated JavaScript code: A semantics-based approach. Proceedings of the 2012 IEEE 6th International Conference on Software Security and Reliability, SERE 2012, 31-40. Abstract: JavaScript is a scripting language that is commonly used to create sophisticated interactive client-side web applications. However, JavaScript code can also be used to exploit vulnerabilities in the web browser and its extensions, and in recent years it has become a major mechanism for web-based malware delivery. In order to avoid detection, attackers often take advantage of the dynamic nature of JavaScript to create highly obfuscated code. This paper describes a semantics-based approach for automatic deobfuscation of JavaScript code. Experiments using a prototype implementation indicate that our approach is able to penetrate multiple layers of complex obfuscations and extract the core logic of the computation, which makes it easier to understand the behavior of the code. © 2012 IEEE.
- Lu, G., Coogan, K., & Debray, S. (2012). Automatic simplification of obfuscated JavaScript code. Communications in Computer and Information Science, 285 CCIS, 348-359. Abstract: JavaScript is a scripting language that is commonly used to create sophisticated interactive client-side web applications. It can also be used to carry out browser-based attacks on users. Malicious JavaScript code is usually highly obfuscated, making detection a challenge. This paper describes a simple approach to deobfuscation of JavaScript code based on dynamic analysis and slicing. Experiments using a prototype implementation indicate that our approach is able to penetrate multiple layers of complex obfuscations and extract the core logic of the computation. © 2012 Springer-Verlag.
- Zhang, R., Debray, S., & Snodgrass, R. T. (2012). Micro-specialization: Dynamic code specialization of database management systems. Proceedings - International Symposium on Code Generation and Optimization, CGO 2012, 63-73. Abstract: Database management systems (DBMSes) form a cornerstone of modern IT infrastructure, and it is essential that they have excellent performance. Much of the work to date on optimizing DBMS performance has emphasized ensuring efficient data access from secondary storage. This paper shows that DBMSes can also benefit significantly from dynamic code specialization. Our approach focuses on the iterative query evaluation loops typically used by such systems. Query evaluation involves extensive references to the relational schema, predicate values, and join types, which are all invariant during query evaluation, and thus are subject to dynamic value-based code specialization. We introduce three distinct types of specialization, each corresponding to a particular kind of invariant. We realize these techniques, in concert termed micro-specialization, via a DBMS-independent run-time environment and apply them to a high performance open-source DBMS, PostgreSQL. We show that micro-specialization requires minimal changes to the DBMS and can yield performance improvements simultaneously across a wide range of queries and modifications, in terms of storage, CPU usage, and I/O time of standard DBMS benchmarks. We also discuss an integrated development environment that helps DBMS developers apply micro-specializations to identified target code sequences. Copyright © 2012 ACM.
- Zhang, R., Snodgrass, R. T., & Debray, S. (2012). Application of micro-specialization to query evaluation operators. Proceedings - 2012 IEEE 28th International Conference on Data Engineering Workshops, ICDEW 2012, 315-321.More infoAbstract: Relational database management systems support a wide variety of data types and operations. Such generality involves much branch condition checking, which introduces inefficiency within the query evaluation loop. We previously introduced micro-specialization, which improves performance by eliminating unnecessary branching statements and the actual code branches by exploiting invariants present during the query evaluation loop. In this paper, we show how to more aggressively apply micro-specialization to each individual operator within a query plan. Rather than interpreting the query plan, the DBMS dynamically rewrites its object code to produce executable code tailored to the particular query. We explore opportunities for applying micro-specialization to DBMSes, focusing on query evaluation. We show through an examination of program execution profiles that even with a simple query in which just a few operators are micro-specialized, significant performance improvement can be achieved. © 2012 IEEE.
- Zhang, R., Snodgrass, R. T., & Debray, S. (2012). Micro-specialization in DBMSes. Proceedings - International Conference on Data Engineering, 690-701. Abstract: Relational database management systems are general in the sense that they can handle arbitrary schemas, queries, and modifications; this generality is implemented using runtime metadata lookups and tests that ensure that control is channelled to the appropriate code in all cases. Unfortunately, these lookups and tests are carried out even when information is available that renders some of these operations superfluous, leading to unnecessary runtime overheads. This paper introduces micro-specialization, an approach that uses relation- and query-specific information to specialize the DBMS code at runtime and thereby eliminate some of these overheads. We develop a taxonomy of approaches and specialization times and propose a general architecture that isolates most of the creation and execution of the specialized code sequences in a separate DBMS-independent module. Through three illustrative types of micro-specializations applied to PostgreSQL, we show that this approach requires minimal changes to a DBMS and can improve the performance simultaneously across a wide range of queries, modifications, and bulk-loading, in terms of storage, CPU usage, and I/O time of the TPC-H and TPC-C benchmarks. © 2012 IEEE.
- Coogan, K., & Debray, S. (2011). Equational reasoning on x86 assembly code. Proceedings - 11th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2011, 75-84.More infoAbstract: Analysis of software is essential to addressing problems of correctness, efficiency, and security. Existing source code analysis tools are very useful for such purposes, but there are many instances where high-level source code is not available for software that needs to be analyzed. A need exists for tools that can analyze assembly code, whether from disassembled binaries or from handwritten sources. This paper describes an equational reasoning system for assembly code for the ubiquitous Intel x86 architecture, focusing on various problems that arise in low-level equational reasoning, such as register-name aliasing, memory indirection, condition-code flags, etc. Our system has successfully been applied to the problem of simplifying execution traces from obfuscated malware executables. © 2011 IEEE.
- Coogan, K., Gen, L. u., & Debray, S. (2011). Deobfuscation of virtualization-obfuscated software: A semantics-based approach. Proceedings of the ACM Conference on Computer and Communications Security, 275-284.More infoAbstract: When new malware are discovered, it is important for researchers to analyze and understand them as quickly as possible. This task has been made more difficult in recent years as researchers have seen an increasing use of virtualization-obfuscated malware code. These programs are difficult to comprehend and reverse engineer, since they are resistant to both static and dynamic analysis techniques. Current approaches to dealing with such code first reverse-engineer the byte code interpreter, then use this to work out the logic of the byte code program. This outside-in approach produces good results when the structure of the interpreter is known, but cannot be applied to all cases. This paper proposes a different approach to the problem that focuses on identifying instructions that affect the observable behavior of the obfuscated code. This inside-out approach requires fewer assumptions, and aims to complement existing techniques by broadening the domain of obfuscated programs eligible for automated analysis. Results from a prototype tool on real-world malicious code are encouraging. © 2011 ACM.
- Debray, S., & Patel, J. (2010). Reverse engineering self-modifying code: Unpacker extraction. Proceedings - Working Conference on Reverse Engineering, WCRE, 131-140.More infoAbstract: An important application of binary-level reverse engineering is in reconstructing the internal logic of computer malware. Most malware code is distributed in encrypted (or "packed") form; at runtime, an unpacker routine transforms this to the original executable form of the code, which is then executed. Most of the existing work on analysis of such programs focuses on detecting unpacking and extracting the unpacked code. However, this does not shed any light on the functionality of different portions of the code so obtained, and in particular does not distinguish between code that performs unpacking and code that does not; identifying such functionality can be helpful for reverse engineering the code. This paper describes a technique for identifying and extracting the unpacker code in a self-modifying program. Our algorithm uses offline analysis of a dynamic instruction trace both to identify the point(s) where unpacking occurs and to identify and extract the corresponding unpacker code. © 2010 IEEE.
- Preda, M. D., Giacobazzi, R., Debray, S., Coogan, K., & Townsend, G. M. (2010). Modelling metamorphism by abstract interpretation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 6337 LNCS, 218-235. Abstract: Metamorphic malware apply semantics-preserving transformations to their own code in order to foil detection systems based on signature matching. In this paper we consider the problem of automatically extracting metamorphic signatures from such malware. We introduce a semantics for self-modifying code, later called phase semantics, and prove its correctness by showing that it is an abstract interpretation of the standard trace semantics. Phase semantics precisely models the metamorphic code behavior by providing a set of traces of programs which correspond to the possible evolutions of the metamorphic code during execution. We show that metamorphic signatures can be automatically extracted by abstract interpretation of the phase semantics, and that regular metamorphism can be modelled as a finite state automata abstraction of the phase semantics. © 2010 Springer-Verlag.
- Coogan, K., Debray, S., Kaochar, T., & Townsend, G. (2009). Automatic static unpacking of malware binaries. Proceedings - Working Conference on Reverse Engineering, WCRE, 167-176.More infoAbstract: Current malware is often transmitted in packed or encrypted form to prevent examination by anti-virus software. To analyze new malware, researchers typically resort to dynamic code analysis techniques to unpack the code for examination. Unfortunately, these dynamic techniques are susceptible to a variety of anti-monitoring defenses, as well as "time bombs" or "logic bombs," and can be slow and tedious to identify and disable. This paper discusses an alternative approach that relies on static analysis techniques to automate this process. Alias analysis can be used to identify the existence of unpacking, static slicing can identify the unpacking code, and control flow analysis can be used to identify and neutralize dynamic defenses. The identified unpacking code can be instrumented and transformed, then executed to perform the unpacking. We present a working prototype that can handle a variety of malware binaries, packed with both custom and commercial packers, and containing several examples of dynamic defenses. © 2009 IEEE.
- Krishnamoorthy, N., Debray, S., & Fligg, K. (2009). Static detection of disassembly errors. Proceedings - Working Conference on Reverse Engineering, WCRE, 259-268.More infoAbstract: Static disassembly is a crucial first step in reverse engineering executable files, and there is a considerable body of work in reverse-engineering of binaries, as well as areas such as semantics-based security analysis, that assumes that the input executable has been correctly disassembled. However, disassembly errors, e.g., arising from binary obfuscations, can render this assumption invalid. This work describes a machine-learning-based approach, using decision trees, for statically identifying possible errors in a static disassembly; such potential errors may then be examined more closely, e.g., using dynamic analyses. Experimental results using a variety of input executables indicate that our approach performs well, correctly identifying most disassembly errors with relatively few false positives. © 2009 IEEE.
- Preda, M. D., Christodorescu, M., Jha, S., & Debray, S. (2008). A semantics-based approach to malware detection. ACM Transactions on Programming Languages and Systems, 30(5).More infoAbstract: Malware detection is a crucial aspect of software security. Current malware detectors work by checking for signatures, which attempt to capture the syntactic characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes current detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter the syntactic properties of the malware byte sequence without significantly affecting their execution behavior. This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behavior of malware as well as that of the program being checked for infection, and uses abstract interpretation to "hide" irrelevant aspects of these behaviors. As a concrete application of our approach, we show that (1) standard signature matching detection schemes are generally sound but not complete, (2) the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers and (3) the malware detection scheme proposed by Kinder et al. and based on standard model-checking techniques is sound in general and complete on some, but not all, obfuscations handled by the semantics-aware malware detector. © 2008 ACM.
- He, H., Debray, S. K., & Andrews, G. R. (2007). The revenge of the overlay: Automatic compaction of OS kernel code via on-demand code loading. EMSOFT'07: Proceedings of the Seventh ACM and IEEE International Conference on Embedded Software, 75-83. Abstract: There is increasing interest in using general-purpose operating systems, such as Linux, on embedded platforms. It is especially important in embedded systems to use memory efficiently because embedded processors often have limited physical memory. This paper describes an automatic technique for reducing the memory footprint of general-purpose operating systems on embedded platforms by keeping infrequently executed code on secondary storage and loading such code only if it is needed at run time. Our technique is based on an old idea - memory overlays - and it does not require hardware or operating system support for virtual memory. A prototype of the technique has been implemented for the Linux kernel. We evaluate our approach with two benchmark suites: MiBench and MediaBench, and a Web server application. The experimental results show that our approach reduces memory requirements for the Linux kernel code by about 53% with little degradation in performance. Copyright 2007 ACM.
- He, H., Trimble, J., Perianayagam, S., Debray, S., & Andrews, G. (2007). Code compaction of an operating system kernel. International Symposium on Code Generation and Optimization, CGO 2007, 283-295. Abstract: General-purpose operating systems, such as Linux, are increasingly being used in embedded systems. Computational resources are usually limited, and embedded processors often have a limited amount of memory. This makes code size especially important. This paper describes techniques for automatically reducing the memory footprint of general-purpose operating systems on embedded platforms. The problem is complicated by the fact that kernel code tends to be quite different from ordinary application code, including the presence of a significant amount of hand-written assembly code, multiple entry points, implicit control flow paths involving interrupt handlers, and frequent indirect control flow via function pointers. We use a novel "approximate decompilation" technique to apply source-level program analysis to hand-written assembly code. A prototype implementation of our ideas on an Intel x86 platform, applied to a Linux kernel that has been configured to exclude unnecessary code, obtains a code size reduction of close to 24%. © 2007 IEEE.
- Preda, M. D., Christodorescu, M., Jha, S., & Debray, S. (2007). A semantics-based approach to malware detection. ACM SIGPLAN Notices, 42(1), 376-388.More infoAbstract: Malware detection is a crucial aspect of software security. Current malware detectors work by checking for "signatures," which attempt to capture (syntactic) characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes such detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter syntactic properties of the malware byte sequence without significantly affecting their execution behavior. This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behaviors of malware as well as the program being checked for infection, and uses abstract interpretation to "hide" irrelevant aspects of these behaviors. As a concrete application of our approach, we show that the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers. Copyright © 2007 ACM.
- Preda, M. D., Christodorescu, M., Jha, S., & Debray, S. (2007). A semantics-based approach to malware detection. Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 377-388. Abstract: Malware detection is a crucial aspect of software security. Current malware detectors work by checking for "signatures," which attempt to capture (syntactic) characteristics of the machine-level byte sequence of the malware. This reliance on a syntactic approach makes such detectors vulnerable to code obfuscations, increasingly used by malware writers, that alter syntactic properties of the malware byte sequence without significantly affecting their execution behavior. This paper takes the position that the key to malware identification lies in their semantics. It proposes a semantics-based framework for reasoning about malware detectors and proving properties such as soundness and completeness of these detectors. Our approach uses a trace semantics to characterize the behaviors of malware as well as the program being checked for infection, and uses abstract interpretation to "hide" irrelevant aspects of these behaviors. As a concrete application of our approach, we show that the semantics-aware malware detector proposed by Christodorescu et al. is complete with respect to a number of common obfuscations used by malware writers. Copyright © 2007 ACM.
- Debray, S. (2005). Code compression. Lecture Notes in Computer Science, 3350, 5-6.More infoAbstract: The use of classical compiler optimizations to reduce the size of the generated code was discussed. Code size reduction is possible using techniques for code factoring, which aims to reduce code size by getting rid of repeated code fragments. Classical optimizations, coupled with code factoring, gives code size reductions of around 30% on average. An orthogonal direction to code size reduction involves dynamic code mutation.
- Dux, B., Iyer, A., Debray, S., Forrester, D., & Kobourov, S. (2005). Visualizing the behavior of dynamically modifiable code. Proceedings - IEEE Workshop on Program Comprehension, 337-340.More infoAbstract: Recent years have seen an increased recognition of some of the advantages offered by dynamically modifiable code, i.e., code that changes during the execution of the program. In its full generality, it can be very difficult to understand the behavior of such self-modifiable code. This paper describes a system that graphically displays the execution behavior of dynamic code, focusing on code modifications and their effect on the structure of the program, i.e., the call graph and control flow graphs of functions. This can help users visualize the structure of runtime code modifications and understand the behavior of dynamically modifiable programs. © 2005 IEEE.
- Jo, C., Mernik, M., Bryant, B. R., Ancona, M., Auguston, M., Cheung, S., Debray, S. K., Doh, K., Gabbrielli, M., Harris, T., Heering, J., Jeffery, C., Johnstone, A., Leung, H., Lins, R. D., Logozzo, F., E., P., Meijer, E., Michaelson, G., Pareja-Flores, C., et al. (2005). Editorial: Programming languages track. Proceedings of the ACM Symposium on Applied Computing, 2, 1383-1384.
- Madou, M., Anckaert, B., Moseley, P., Debray, S., Sutter, B. D., & Bosschere, K. D. (2005). Software protection through dynamic code mutation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3786 LNCS, 194-206.More infoAbstract: Reverse engineering of executable programs, by disassembling them and then using program analyses to recover high level semantic information, plays an important role in attacks against software systems, and can facilitate software piracy. This paper introduces a novel technique to complicate reverse engineering. The idea is to change the program code repeatedly as it executes, thereby thwarting correct disassembly. The technique can be made as secure as the least secure component of opaque variables and pseudorandom number generators. © Springer-Verlag Berlin Heidelberg 2006.
- Snavely, N., Debray, S., & Andrews, G. R. (2005). Unpredication, unscheduling, unspeculation: Reverse engineering Itanium executables. IEEE Transactions on Software Engineering, 31(2), 99-115. Abstract: EPIC (Explicitly Parallel Instruction Computing) architectures, exemplified by the Intel Itanium, support a number of advanced architectural features, such as explicit instruction-level parallelism, instruction predication, and speculative loads from memory. However, compiler optimizations that take advantage of these features can profoundly restructure the program's code, making it potentially difficult to reconstruct the original program logic from an optimized Itanium executable. This paper describes techniques to undo some of the effects of such optimizations and thereby improve the quality of reverse engineering such executables. © 2005 IEEE.
- Udupa, S. K., Debray, S. K., & Madou, M. (2005). Deobfuscation: Reverse engineering obfuscated code. Proceedings - Working Conference on Reverse Engineering, WCRE, 2005, 45-56. Abstract: In recent years, code obfuscation has attracted attention as a low-cost approach to improving software security by making it difficult for attackers to understand the inner workings of proprietary software systems. This paper examines techniques for automatic deobfuscation of obfuscated programs, as a step towards reverse engineering such programs. Our results indicate that much of the effects of code obfuscation, designed to increase the difficulty of static analyses, can be defeated using simple combinations of straightforward static and dynamic analyses. Our results have applications to both software engineering and software security. In the context of software engineering, we show how dynamic analyses can be used to enhance reverse engineering, even for code that has been designed to be difficult to reverse engineer. For software security, our results serve as an attack model for code obfuscators, and can help with the development of obfuscation techniques that are more resilient to straightforward reverse engineering. © 2005 IEEE.
- Debray, S. (2004). Writing efficient programs performance issues in an undergraduate CS curriculum. SIGCSE Bulletin (Association for Computing Machinery, Special Interest Group on Computer Science Education), 36(1), 275-279.More infoAbstract: Performance is an essential aspect of many software systems, and it is important for programmers to understand performance issues. However, most undergraduate curricula do not explicitly cover performance issues-performance monitoring and profiling tools, performance improvement techniques, and case studies-in their curricula. This paper describes how we address this topic as part of a third-year programming course. We focus on tools and techniques for monitoring and improving performance, as well as the interaction between clean program design and performance tuning. Copyright 2004 ACM.
- Debray, S. (2004). Writing efficient programs: Performance issues in an undergraduate CS curriculum. Proceedings of the SIGCSE Technical Symposium on Computer Science Education, 275-279.More infoAbstract: Performance is an essential aspect of many software systems, and it is important for programmers to understand performance issues. However, most undergraduate curricula do not explicitly cover performance issues - performance monitoring and profiling tools, performance improvement techniques, and case studies - in their curricula. This paper describes how we address this topic as part of a third-year programming course. We focus on tools and techniques for monitoring and improving performance, as well as the interaction between clean program design and performance tuning.
- Debray, S., & Evans, W. S. (2003). Cold code decompression at runtime. Communications of the ACM, 46(8), 54-60. Abstract: The decompression of selected code fragments during program execution using a software-based technique is discussed. The profiling information from the original, uncompressed program is used to choose code fragments that are infrequently executed. This limits the effect of the time overhead involved in dynamic decompression on the execution speed of the entire program. It is expected that the better compression of the infrequently executed code can contribute to a significant improvement in overall size reduction, but the increased decompression effort does not lead to a significant runtime penalty. The benefits of using data compression techniques to dynamically decompress infrequently executed code are also discussed.
- Fernández, M., Espasa, R., & Debray, S. (2003). Load redundancy elimination on executable code. Concurrency Computation Practice and Experience, 15(10), 979-997.More infoAbstract: Optimizations performed at link time or directly applied to final program executables have received increased attention in recent years. This paper discusses the discovery and elimination of redundant load operations in the context of a link-time optimizer, an optimization that we call Load Redundancy Elimination (LRE). Our experiments show that between 50% and 75% of a program's memory references can be considered redundant because they are accessing memory locations that have been referenced less than 200-400 instructions away. We then present three profile-based LRE algorithms targeted at optimizing away these redundancies. Our results show that between 5% and 30% of the redundancy detected can indeed be eliminated, which translates into program speedups of around 8%. We also test our algorithm assuming different cache latencies, and show that, if latencies continue to grow, the load redundancy elimination will become more important. Copyright © 2003 John Wiley & Sons, Ltd.
- Linn, C., & Debray, S. (2003). Obfuscation of executable code to improve resistance to static disassembly. Proceedings of the ACM Conference on Computer and Communications Security, 290-299.More infoAbstract: A great deal of software is distributed in the form of executable code. The ability to reverse engineer such executables can create opportunities for theft of intellectual property via software piracy, as well as security breaches by allowing attackers to discover vulnerabilities in an application. The process of reverse engineering an executable program typically begins with disassembly, which translates machine code to assembly code. This is then followed by various decompilation steps that aim to recover higher-level abstractions from the assembly code. Most of the work to date on code obfuscation has focused on disrupting or confusing the de-compilation phase. This paper, by contrast, focuses on the initial disassembly phase. Our goal is to disrupt the static disassembly process so as to make programs harder to disassemble correctly. We describe two widely used static disassembly algorithms, and discuss techniques to thwart each of them. Experimental results indicate that significant portions of executables that have been obfuscated using our techniques are disassembled incorrectly, thereby showing the efficacy of our methods. Copyright 2003 ACM.
- Snavely, N., Debray, S., & Andrews, G. (2003). Unscheduling, Unpredication, Unspeculation: Reverse Engineering Itanium Executables. Reverse Engineering - Working Conference Proceedings, 4-13.More infoAbstract: EPIC (Explicitly Parallel Instruction Computing) architectures, exemplified by the Intel Itanium, support a number of advanced architectural features, such as explicit instruction-level parallelism, instruction predication, and speculative loads from memory. However, compiler optimizations to take advantage of such architectural features can profoundly restructure the program's code, making it potentially difficult to reconstruct the original program logic from an optimized Itanium executable. This paper describes techniques to undo some of the effects of such optimizations and thereby improve the quality of reverse engineering such executables.
- Debray, S. (2002). Making compiler design relevant for students who will (most likely) never design a compiler. SIGCSE Bulletin (Association for Computing Machinery, Special Interest Group on Computer Science Education), 341-345.More infoAbstract: Compiler Design courses are a common component of most modern Computer Science undergraduate curricula. At the same time, however, compiler design has become a highly specialized topic, and it is not clear that a significant number of Computer Science students will find themselves designing compilers professionally. This paper argues that the principles, techniques, and tools discussed in compiler design courses are nevertheless applicable to a wide variety of situations that would generally not be considered to be compiler design. Generalizing the content of compiler design courses to emphasize this broad applicability can make them more relevant to students.
- Debray, S., & Evans, W. (2002). Profile-guided code compression. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 95-105.More infoAbstract: As computers are increasingly used in contexts where the amount of available memory is limited, it becomes important to devise techniques that reduce the memory footprint of application programs while leaving them in an executable form. This paper describes an approach to applying data compression techniques to reduce the size of infrequently executed portions of a program. The compressed code is decompressed dynamically (via software) if needed, prior to execution. The use of data compression techniques increases the amount of code size reduction that can be achieved; their application to infrequently executed code limits the runtime overhead due to dynamic decompression; and the use of software decompression renders the approach generally applicable, without requiring specialized hardware. The code size reductions obtained depend on the threshold used to determine what code is "infrequently executed" and hence should be compressed: for low thresholds, we see size reductions of 13.7% to 18.8%, on average, for a set of embedded applications, without excessive runtime overhead.
- Rajagopalan, M., Debray, S. K., Hiltunen, M. A., & Schlichting, R. D. (2002). Profile-directed optimization of event-based programs. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 106-116.More infoAbstract: Events are used as a fundamental abstraction in programs ranging from graphical user interfaces (GUIs) to systems for building customized network protocols. While providing a flexible structuring and execution paradigm, events have the potentially serious drawback of extra execution overhead due to the indirection between modules that raise events and those that handle them. This paper describes an approach to addressing this issue using static optimization techniques. This approach, which exploits the underlying predictability often exhibited by event-based programs, is based on first profiling the program to identify commonly occurring event sequences. A variety of techniques that use the resulting profile information are then applied to the program to reduce the overheads associated with such mechanisms as indirect function calls and argument marshaling. In addition to describing the overall approach, experimental results are given that demonstrate the effectiveness of the techniques. These results are from event-based programs written for X Windows, a system for building GUIs, and Cactus, a system for constructing highly configurable distributed services and network protocols.
- Muth, R., Debray, S. K., Watterson, S., & Bosschere, K. D. (2001). alto: a link-time optimizer for the Compaq Alpha. Software - Practice and Experience, 31(1), 67-101.More infoAbstract: Traditional optimizing compilers are limited in the scope of their optimizations by the fact that only a single function, or possibly a single module, is available for analysis and optimization. In particular, this means that library routines cannot be optimized to specific calling contexts. Other optimization opportunities, exploiting information not available before link time, such as addresses of variables and the final code layout, are often ignored because linkers are traditionally unsophisticated. A possible solution is to carry out whole-program optimization at link time. This paper describes alto, a link-time optimizer for the Compaq Alpha architecture. It is able to realize significant performance improvements even for programs compiled with a good optimizing compiler with a high level of optimization. The resulting code is considerably faster than that obtained using the OM link-time optimizer, even when the latter is used in conjunction with profile-guided and inter-file compile-time optimizations.
- Sutter, B. D., Bus, B. D., Bosschere, K. D., & Debray, S. (2001). Combining global code and data compaction. SIGPLAN Notices (ACM Special Interest Group on Programming Languages), 36(8), 29-38.More infoAbstract: Computers are increasingly being incorporated in devices with a limited amount of available memory. As a result research is increasingly focusing on the automated reduction of program size. Existing literature focuses on either data or code compaction or on highly language dependent techniques. This paper shows how combined code and data compaction can be achieved using a link-time code compaction system that reasons about the use of both code and data addresses. The analyses proposed rely only on fundamental properties of linked code and are therefore generally applicable. The combined code and data compaction is implemented in SQUEEZE, a link-time program compaction system, and evaluated on SPEC2000, MediaBench and C++ programs, resulting in total binary program size reductions of 23.6%-46.6%. This compaction involves no speed tradeoff, as the compacted programs are on average about 8% faster. Copyright ACM 2001.
- Debray, S. K., Evans, W., Muth, R., & Sutter, B. D. (2000). Compiler techniques for code compaction. ACM Transactions on Programming Languages and Systems, 22(2), 378-415.More infoAbstract: In recent years there has been an increasing trend toward the incorporation of computers into a variety of devices where the amount of memory available is limited. This makes it desirable to try to reduce the size of applications where possible. This article explores the use of compiler techniques to accomplish code compaction to yield smaller executables. The main contribution of this article is to show that careful, aggressive, interprocedural optimization, together with procedural abstraction of repeated code fragments, can yield significantly better reductions in code size than previous approaches, which have generally focused on abstraction of repeated instruction sequences. We also show how "equivalent" code fragments can be detected and factored out using conventional compiler techniques, and without having to resort to purely linear treatments of code sequences as in suffix-tree-based approaches, thereby setting up a framework for code compaction that can be more flexible in its treatment of what code fragments are considered equivalent. Our ideas have been implemented in the form of a binary-rewriting tool that reduces the size of executables by about 30% on the average.
- Debray, S., & Hickey, T. (2000). Constraint-based termination analysis for cyclic active database rules. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1861 LNAI, 1121-1136.More infoAbstract: There are many situations where cyclic rule activations - where some set of active database rules may be activated repeatedly until the database satisfies some condition - arise naturally. However, most existing approaches to termination analysis of active rules, which typically rely on checking that the triggering graph for the rules is acyclic, cannot infer termination for such rules. We present a constraint-based approach to termination analysis that is able to handle such cyclic rule activations for a wide class of rules. © Springer-Verlag Berlin Heidelberg 2000.
- Muth, R., & Debray, S. (2000). On the complexity of flow-sensitive dataflow analyses. Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 67-80.More infoAbstract: This paper attempts to address the question of why certain dataflow analysis problems can be solved efficiently, but not others. We focus on flow-sensitive analyses, and give a simple and general result that shows that analyses that require the use of relational attributes for precision must be PSPACE-hard in general. We then show that if the language constructs are slightly strengthened to allow a computation to maintain a very limited summary of what happens along an execution path, inter-procedural analyses become EXPTIME-hard. We discuss applications of our results to a variety of analyses discussed in the literature. Our work elucidates the reasons behind the complexity results given by a number of authors, improves on a number of such complexity results, and exposes conceptual commonalities underlying such results that are not readily apparent otherwise.
- Bigot, P. A., & Debray, S. (1999). Return value placement and tail call optimization in high level languages. Journal of Logic Programming, 38(1), 1-29.More infoAbstract: This paper discusses the interaction between tail call optimization and the placement of output values in functional and logic programming languages. Implementations of such languages typically rely on fixed placement policies: most functional language implementations return output values in registers, while most logic programming systems return outputs via memory. Such fixed placement policies incur unnecessary overheads in many commonly encountered situations: the former are unable to implement many intuitively iterative computations in a truly iterative manner, while the latter incur a performance penalty due to additional memory references. We describe an approach that determines, based on a low-level cost model for an implementation together with an estimated execution profile for a program, whether or not the output of a procedure should be returned in registers or in memory. This can be seen as realising a restricted form of inter-procedural register allocation, and avoids the disadvantages associated with the fixed register and fixed memory output placement policies. Experimental results indicate that it provides good performance improvements compared to existing approaches. © 1999 Elsevier Science Inc. All rights reserved.
- Debray, S., Muth, R., & Weippert, M. (1998). Alias analysis of executable code. Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 12-24.More infoAbstract: Recent years have seen increasing interest in systems that reason about and manipulate executable code. Such systems can generally benefit from information about aliasing. Unfortunately, most existing alias analyses are formulated in terms of high-level language features, and are unable to cope with features, such as pointer arithmetic, that pervade executable programs. This paper describes a simple algorithm that can be used to obtain aliasing information for executable code. In order to be practical, the algorithm is careful to keep its memory requirements low, sacrificing precision where necessary to achieve this goal. Experimental results indicate that it is nevertheless able to provide a reasonable amount of information about memory references across a variety of benchmark programs.
- Bigot, P. A., & Debray, S. K. (1997). A simple approach to supporting untagged objects in dynamically typed languages. Journal of Logic Programming, 32(1), 25-47. Abstract: In many modern high-level programming languages, the exact low-level representation of data objects cannot always be predicted at compile-time. Implementations usually get around this problem using descriptors ("tags") and/or indirect ("boxed") representations. However, the flexibility so gained can come at the cost of significant performance overheads. The problem is especially acute in dynamically typed languages, where both tagging and boxing are necessary in general. This paper discusses a straightforward approach to using untagged and unboxed values in dynamically typed languages. An implementation of our algorithms allows a dynamically typed language to attain performance close to that of highly optimized C code on a variety of benchmarks (including many floating-point intensive computations) and dramatically reduces heap usage.
- Debray, S. (1997). Resource-bounded partial evaluation. Proceedings of the ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation, 179-192. Also appears in SIGPLAN Notices (ACM Special Interest Group on Programming Languages), 32(12), 179-192. Abstract: Most partial evaluators do not take the availability of machine-level resources, such as registers or cache, into consideration when making their specialization decisions. The resulting resource contention can lead to severe performance degradation - causing, in extreme cases, the specialized code to run slower than the unspecialized code. In this paper we consider how resource considerations can be incorporated within a partial evaluator. We develop an abstract formulation of the problem, show that optimal resource-bounded partial evaluation is NP-complete, and discuss simple heuristics that can be used to address the problem in practice.
- Debray, S. K., & Proebsting, T. A. (1997). Interprocedural Control Flow Analysis of First-Order Programs with Tail-Call Optimization. ACM Transactions on Programming Languages and Systems, 19(4), 568-585. Abstract: Knowledge of low-level control flow is essential for many compiler optimizations. In systems with tail-call optimization, the determination of interprocedural control flow is complicated by the fact that control flow at procedure returns is not readily evident from the call graph of the program. This article shows how interprocedural control-flow analysis of first-order programs can be carried out using well-known concepts from parsing theory. In particular, we show that context-insensitive (or zeroth-order) control-flow analysis corresponds to the notion of FOLLOW sets in context-free grammars, while context-sensitive (or first-order) control-flow analysis corresponds to the notion of LR(1) items. The control-flow information so obtained can be used to improve the precision of interprocedural dataflow analyses as well as to extend certain low-level code optimizations across procedure boundaries.
- Debray, S. K., Gudeman, D., & Bigot, P. (1996). Detection and optimization of suspension-free logic programs. Journal of Logic Programming, 29(1-3), 170-194. Abstract: In recent years, language mechanisms to suspend, or delay, the execution of goals until certain variables become bound have become increasingly popular in logic programming languages. While convenient, such mechanisms can make control flow within a program difficult to predict at compile time, and therefore render many traditional compiler optimizations inapplicable. Unfortunately, this performance cost is also incurred by programs that do not use any delay primitives. In this paper, we describe a simple dataflow analysis for detecting computations where suspension effects can be ignored, and discuss several low-level optimizations that rely on this information. Our algorithm has been implemented in the jc system. Optimizations based on information it produces result in significant performance improvements, demonstrating speed comparable to or exceeding that of optimized C programs.
- Lopez, P., Hermenegildo, M., & Debray, S. (1996). A Methodology for granularity-based control of parallelism in logic programs. Journal of Symbolic Computation, 21(4-6), 715-734. Abstract: Several types of parallelism can be exploited in logic programs while preserving correctness and efficiency, i.e. ensuring that the parallel execution obtains the same results as the sequential one and the amount of work performed is not greater. However, such results do not take into account a number of overheads which appear in practice, such as process creation and scheduling, which can induce a slow-down, or, at least, limit speedup, if they are not controlled in some way. This paper describes a methodology whereby the granularity of parallel tasks, i.e. the work available under them, is efficiently estimated and used to limit parallelism so that the effect of such overheads is controlled. The run-time overhead associated with the approach is usually quite small, since as much work is done at compile time as possible. Also, a number of run-time optimizations are proposed. Moreover, a static analysis of the overhead associated with the granularity control process is performed in order to decide whether it is worthwhile. The performance improvements resulting from the incorporation of grain size control are shown to be quite good, especially for systems with medium to large parallel execution overheads.
- Bruynooghe, M., Debray, S., Hermenegildo, M., & Maher, M. (1995). Special section: Ten Years of Logic Programming. The Journal of Logic Programming, 23(2), 87-88.
- Debray, S. K. (1995). On the complexity of dataflow analysis of logic programs. ACM Transactions on Programming Languages and Systems, 17(2), 331-365. Abstract: It is widely held that there is a correlation between complexity and precision in dataflow analysis, in the sense that the more precise an analysis algorithm, the more computationally expensive it must be. The details of this relationship, however, appear not to have been explored extensively. This article reports some results on this correlation in the context of logic programs. A formal notion of the 'precision' of an analysis algorithm is proposed, and this is used to characterize the worst-case computational complexity of a number of dataflow analyses with different degrees of precision. While this article considers the analysis of logic programs, the technique proposed, namely the use of 'exactness sets' to study relationships between complexity and precision of analyses, is not specific to logic programming in any way, and is equally applicable to flow analyses of other language families.
- Giacobazzi, R., Debray, S. K., & Levi, G. (1995). Generalized semantics and abstract interpretation for constraint logic programs. The Journal of Logic Programming, 25(3), 191-247. Abstract: We present simple and powerful generalized algebraic semantics for constraint logic programs that are parameterized with respect to the underlying constraint system. The idea is to abstract away from standard semantic objects by focusing on the general properties of any (possibly nonstandard) semantic definition. In constraint logic programming, this corresponds to a suitable definition of the constraint system supporting the semantic definition. An algebraic structure is introduced to formalize the notion of a constraint system, thus making classical mathematical results applicable. Both top-down and bottom-up semantics are considered. Nonstandard semantics for constraint logic programs can then be formally specified using the same techniques used to define standard semantics. Different nonstandard semantics for constraint logic languages can be specified in this framework. In particular, abstract interpretation of constraint logic programs can be viewed as an instance of the constraint logic programming framework itself.
- Leonard, P. A., Bailey, M. L., & Debray, S. K. (1995). Experiences with high-level parallel programming systems. Conference Proceedings - International Phoenix Conference on Computers and Communications, 472-478. Abstract: A programmer developing software for parallel systems usually is responsible for the management of low-level issues such as communication and synchronization. This is potentially problematic from the perspectives of correctness and portability. This paper considers an alternative approach, namely, the use of high-level parallel programming systems. We report our experiences with implementing a parallel simulation problem using two very different programming systems: Poker, a high-level tool for parallel programming, based on the programming language C; and Janus, a concurrent logic programming language. Overall, our experience was that the use of such high-level tools greatly simplified the task of developing parallel programs for 'ordinary' programmers. While the fundamentally different programming paradigms underlying the two systems resulted in implementations with very different code, our overall conclusions about the strengths and weaknesses of the two systems were remarkably similar. This suggests that the abstractions supported by a high-level parallel programming system are fundamentally more important in achieving good programming solutions than the particular underlying programming paradigm.
- Bosschere, K. D., Debray, S., Gudeman, D., & Kannan, S. (1994). Call forwarding: a simple interprocedural optimization technique for dynamically typed languages. Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 409-420. Abstract: This paper discusses call forwarding, a simple interprocedural optimization technique for dynamically typed languages. The basic idea behind the optimization is straightforward: find an ordering for the 'entry actions' of a procedure, and generate multiple entry points for the procedure, so as to maximize the savings realized from different call sites bypassing different sets of entry actions. We show that the problem of computing optimal solutions to arbitrary call forwarding problems is NP-complete, and describe an efficient greedy algorithm for the problem. Experimental results indicate that (i) this algorithm is effective, in that the solutions produced are generally close to optimal; and (ii) the resulting optimization leads to significant performance improvements for a number of benchmarks tested. (A toy sketch of this idea appears after this publication list.)
- Bruynooghe, M., Debray, S., Hermenegildo, M., & Maher, M. (1994). Guest editors' introduction. The Journal of Logic Programming, 19-20(SUPPL. 1), 1-3.
- Debray, S., & Ramakrishnan, R. (1994). Abstract interpretation of logic programs using magic transformations. The Journal of Logic Programming, 18(2), 149-176. Abstract: In dataflow analysis of logic programs, information must be propagated according to the control strategy of the language under consideration. However, for languages with top-down control flow, naive top-down dataflow analyses may have problems guaranteeing completeness and/or termination. What is required in the dataflow analysis is a bottom-up fixpoint computation, guided by the (possibly top-down) control strategy of the language. This paper describes the application of the magic templates algorithm, originally devised as a technique for efficient bottom-up computation of logic programs, to dataflow analysis of logic programs. It turns out that a direct application of the magic templates algorithm can result in an undesirable loss in precision, because connections between "calling patterns" and the corresponding "success patterns" may be lost. We show how the original magic templates algorithm can be modified to avoid this problem, and prove that the resulting analysis algorithm is at least as precise as any other abstract interpretation that uses the same abstract domain and abstract operations.
- Bowman, M., Debray, S. K., & Peterson, L. L. (1993). Reasoning about naming systems. ACM Transactions on Programming Languages and Systems, 15(5), 795-825. Abstract: This paper reasons about naming systems as specialized inference mechanisms. It describes a preference hierarchy that can be used to specify the structure of a naming system's inference mechanism and defines criteria by which different naming systems can be evaluated. For example, the preference hierarchy allows one to compare naming systems based on how discriminating they are and to identify the class of names for which a given naming system is sound and complete. A study of several example naming systems demonstrates how the preference hierarchy can be used as a formal tool for designing naming systems.
- Codish, M., Debray, S. K., & Giacobazzi, R. (1993). Compositional analysis of modular logic programs. Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 451-464. Abstract: This paper describes a semantic basis for a compositional approach to the analysis of logic programs. A logic program is viewed as consisting of a set of modules, each module defining a subset of the program's predicates. Analyses are constructed by considering abstract interpretations of a compositional semantics. The abstract meaning of a module corresponds to its analysis and composition of abstract meanings corresponds to composition of analyses. Such an approach is essential for large program development so that altering one module does not require re-analysis of the entire program. We claim that for a substantial class of programs, compositional analyses which are based on a notion of abstract unfolding provide the same precision as non-compositional analysis. A compositional analysis for ground dependencies is included to illustrate the approach. To the best of our knowledge this is the first account of a compositional framework for the analysis of logic programs.
- Debray, S. K. (1993). QD-Janus: A sequential implementation of Janus in Prolog. Software - Practice and Experience, 23(12), 1337-1360. Abstract: Janus is a language designed for distributed constraint programming. This paper describes QD-Janus, a sequential implementation of Janus in Prolog. The compiler uses a number of novel analyses and optimizations to improve the performance of the system. The choice of Prolog as the target language for a compiler, although unusual, is motivated by the following: (i) the semantic gap between Janus and Prolog is much smaller than that between Janus and, say, C or machine language - this simplifies the compilation process significantly, and makes it possible to develop a system with reasonable performance fairly quickly; (ii) recent progress in Prolog implementation techniques, and the development of Prolog systems whose speeds are comparable to those of imperative languages, indicates that the translation to Prolog need not entail a significant performance loss compared to native code compilers; and (iii) compilation to Prolog can benefit immediately from a significant body of work on, and implementations of, parallel Prolog systems. Our experience indicates that translation of logic programming languages to Prolog, accompanied by the development of good program analysis and optimization tools, is an effective way to quickly develop flexible and portable implementations with good performance and low cost.
- Debray, S. K., & Lin, N. (1993). Cost analysis of logic programs. ACM Transactions on Programming Languages and Systems, 15(5), 826-875. Abstract: Cost analysis of programs has been studied in the context of imperative and functional programming languages. For logic programs, the problem is complicated by the fact that programs may be nondeterministic and produce multiple solutions. This paper addresses these problems and develops a method for (semi-)automatic analysis of the worst-case cost of a large class of logic programs. The primary contribution of this paper is the development of techniques to deal with nondeterminism and the generation of multiple solutions via backtracking. (A worked toy cost recurrence appears after this publication list.)
- Debray, S. K. (1992). A simple code improvement scheme for Prolog. The Journal of Logic Programming, 13(1), 57-88. Abstract: The generation of efficient code for Prolog programs requires sophisticated code transformation and optimization systems. Much of the recent work in this area has focused on high-level transformations, typically at the source level. Unfortunately, such high-level transformations suffer from the deficiency of being unable to address low-level implementation details. This paper presents a simple code improvement scheme that can be used for a variety of low-level optimizations. Applications of this scheme are illustrated using low-level optimizations that reduce tag manipulation, dereferencing, trail testing, environment allocation, and redundant bound checks. The transformation scheme serves as a unified framework for reasoning about a variety of low-level optimizations that have, to date, been dealt with in a more or less ad hoc manner.
- Debray, S. K. (1992). Efficient dataflow analysis of logic programs. Journal of the ACM, 39(4), 949-984. Abstract: A framework for efficient dataflow analyses of logic programs is investigated. A number of problems arise in this context: aliasing effects can make analysis computationally expensive for sequential logic programming languages; synchronization issues can complicate the analysis of parallel logic programming languages; and finiteness restrictions to guarantee termination can limit the expressive power of such analyses. Our main result is to give a simple characterization of a family of flow analyses where these issues can be ignored without compromising soundness. This results in algorithms that are simple to verify and implement, and efficient in execution. Based on this approach, we describe an efficient algorithm for flow analysis of sequential logic programs, extend this approach to handle parallel executions, and finally describe how infinite chains in the analysis domain can be accommodated without compromising termination.
- Debray, S. K. (1992). Guest editor's introduction. The Journal of Logic Programming, 13(2-3), 99-101.
- Hermenegildo, M., Warren, R., & Debray, S. (1992). Global flow analysis as a practical compilation tool. The Journal of Logic Programming, 13(4), 349-366. Abstract: This paper addresses the issue of the practicality of global flow analysis in logic program compilation, in terms of speed of the analysis, precision, and usefulness of the information obtained. To this end, design and implementation aspects are discussed for two practical abstract interpretation-based flow analysis systems: MA3, the MCC And-parallel Analyzer and Annotator; and Ms, an experimental mode inference system developed for SB-Prolog. The paper also provides performance data obtained from these implementations and, as an example of an application, a study of the usefulness of the mode information obtained in reducing run-time checks in independent and-parallelism. Based on the results obtained, it is concluded that the overhead of global flow analysis is not prohibitive, and the results of analysis can be quite precise and useful.
- Debray, S. K., & Lin, N. (1991). Automatic complexity analysis of logic programs. Array, 599-613. Abstract: Automatic complexity analysis of programs has been widely studied in the context of functional languages. This paper develops a method for automatic analysis of the worst-case complexity of a large class of logic programs. The primary contribution of this paper is that it shows how to deal with nondeterminism and the generation of multiple solutions via backtracking. One advantage of our method is that analyses for different complexity measures (e.g. time complexity, space complexity, number of solutions, etc.) are performed in a unified framework that simplifies both formal reasoning about, and implementation of, the algorithms.
- Debray, S. K., & Warren, D. S. (1990). Towards banishing the cut from Prolog. IEEE Transactions on Software Engineering, 16(3), 335-349. Abstract: Logic programs can often be inefficient. The usual solution to this problem has been to return some control to the user in the form of impure language features like cut. The authors argue that it is not necessary to resort to such impure features for efficiency. This point is illustrated by considering how most of the common uses of cut can be eliminated from Prolog source programs, relying on static analysis to generate them at compile time. Three common situations where the cut is used are considered. Static analysis techniques are given to detect such situations, and applicable program transformations are described. Two language constructs, firstof and oneof, for situations involving don't-care nondeterminism, are suggested. These constructs have better declarative readings than the cut and extend better to parallel evaluation strategies. Together, these proposals result in a system where users need rely much less on cuts for efficiency, thereby promoting a purer programming style without sacrificing efficiency.
- Debray, S. K., Lin, N., & Hermenegildo, M. (1990). Task granularity analysis in logic programs. SIGPLAN Notices (ACM Special Interest Group on Programming Languages), 25(6), 174-188. Abstract: While logic programming languages offer a great deal of scope for parallelism, there is usually some overhead associated with the execution of goals in parallel because of the work involved in task creation and scheduling. In practice, therefore, the 'granularity' of a goal, i.e. an estimate of the work available under it, should be taken into account when deciding whether or not to execute a goal concurrently as a separate task. This paper describes a method for estimating the granularity of a goal at compile time. The runtime overhead associated with our approach is usually quite small, and the performance improvements resulting from the incorporation of grainsize control can be quite good. This is shown by means of experimental results. (A toy sketch of granularity-based task spawning appears after this publication list.)
- Debray, S. K. (1989). Flow analysis of dynamic logic programs. The Journal of Logic Programming, 7(2), 149-176. Abstract: Research on flow analysis and optimization of logic programs typically assumes that the programs being analyzed are static, i.e. any code that can be executed at runtime is available for analysis at compile time. This assumption may not hold for "real" programs, which can contain dynamic goals of the form call(X), where X is a variable at compile time, or where predicates may be modified via features like assert and retract. In such contexts, a compiler must be able to take the effects of such dynamic constructs into account in order to perform nontrivial flow analyses that can be guaranteed to be sound. This paper outlines how this may be done for certain kinds of dynamic programs. Our techniques allow analysis and optimization techniques that have been developed for static programs to be extended to a large class of "well-behaved" dynamic programs.
- Debray, S. K. (1989). Static inference of modes and data dependencies in logic programs. ACM Transactions on Programming Languages and Systems, 11(3), 418-450. Abstract: Mode and data dependency analyses find many applications in the generation of efficient executable code for logic programs. For example, mode information can be used to generate specialized unification instructions where permissible, to detect determinacy and functionality of programs, to generate index structures more intelligently, to reduce the number of runtime tests in systems that support goal suspension, and in the integration of logic and functional languages. Data dependency information can be used for various source-level optimizing transformations, to improve backtracking behavior, and to parallelize logic programs. This paper describes and proves correct an algorithm for the static inference of modes and data dependencies in a program. The algorithm is shown to be quite efficient for programs commonly encountered in practice.
- Debray, S. K., & Warren, D. S. (1989). Functional computations in logic programs. ACM Transactions on Programming Languages and Systems, 11(3), 451-481. Abstract: Although the ability to simulate nondeterminism and to compute multiple solutions for a single query is a powerful and attractive feature of logic programming languages, it is expensive in both time and space. Since programs in such languages are very often functional, that is, they do not produce more than one distinct solution for a single input, this overhead is especially undesirable. This paper describes how programs may be analyzed statically to determine which literals and predicates are functional, and how the program may then be optimized using this information. Our notion of 'functionality' subsumes the notion of 'determinacy' that has been considered by various researchers. Our algorithm is less reliant on language features such as the cut, and thus extends more easily to parallel execution strategies, than others that have been proposed.
- Debray, S. K. (1988). Profiling Prolog programs. Software - Practice and Experience, 18(9), 821-839. Abstract: Profilers play an important role in the development of efficient programs. Profiling techniques developed for traditional languages are inadequate for logic programming languages, for a number of reasons: first, the flow of control in logic programming languages, involving backtracking and failure, is significantly more complex than in traditional languages; second, the time taken by a unification operation, the principal primitive operation of such languages, cannot be predicted statically because it depends on the size of the input; and finally, programs may change at run-time because clauses may be added or deleted using primitives such as assert and retract. This paper describes a simple profiler for Prolog. The ideas outlined here may be used either to implement a simple interactive profiler, or integrated into Prolog compilers.
- Debray, S. K., & Mishra, P. (1988). Denotational and operational semantics for Prolog. The Journal of Logic Programming, 5(1), 61-91. Abstract: The semantics of PROLOG programs is usually given in terms of the model theory of first-order logic. However, this does not adequately characterize the computational behavior of PROLOG programs. PROLOG implementations typically use a sequential evaluation strategy based on the textual order of clauses and literals in a program, as well as nonlogical features like cut. In this work we develop a denotational semantics that captures the computational behavior of PROLOG. We present a semantics for "cut-free" PROLOG, which is then extended to PROLOG with cut. For each case we develop a congruence proof that relates the semantics to a standard operational interpreter. As an application of our denotational semantics, we show the correctness of some standard "folk" theorems regarding transformations on PROLOG programs.
- Debray, S. K., & Warren, D. S. (1988). Automatic mode inference for logic programs. The Journal of Logic Programming, 5(3), 207-229. Abstract: In general, logic programs are undirected, i.e., there is no concept of "input" and "output" arguments to a procedure. An argument may be used either as an input or as an output argument, and programs may be executed either in a "forward" direction or in a "backward" direction. However, it is often the case that in a given program, a predicate is used with some of its arguments used consistently as input arguments and others as output arguments. Such mode information can be used by a compiler to effect various optimizations. This paper considers the problem of automatically inferring the modes of the predicates in a program. The dataflow analysis we use is more powerful than approaches relying on syntactic characteristics of programs. Our work differs from that of Mellish in that (1) we give a sound and efficient treatment of variable aliasing in mode inference; (2) by propagating instantiation information using state transformations rather than through dependencies between variables, we achieve greater precision in the treatment of unification, e.g. through =/2; and (3) we describe an efficient implementation based on the dynamic generation of customized mode interpreters. Several optimizations to improve the performance of the mode inference algorithm are described, as are various program optimizations based on mode information.
- Debray, S. K. (1987). Flow analysis of a simple class of dynamic logic programs. Array, 307-316. Abstract: Research on flow analysis and optimization of logic programs has typically concentrated on programs that are static, i.e. which do not change at runtime through the use of constructs like Prolog's 'assert'. It is often the case, however, that a program may use assert in a way that has only local effects on parts of the program, leaving the rest of the program unaffected. In such cases, it would be desirable to be able to identify such unaffected portions of the program and carry out analysis and optimization on these portions as before. An outline is presented of how this might be done for a simple class of dynamic programs, using a type analysis scheme to estimate the effects of runtime modifications to the program. The author's approach allows static analysis and optimization techniques that have been developed for static programs to be extended to a reasonably large class of 'well-behaved' dynamic programs.
- Debray, S. K. (1986). Register allocation in a Prolog machine. Array, 267-275. Abstract: The author considers the Prolog engine described by D. H. D. Warren (1983). He describes three register allocation algorithms for such a machine. These strategies rely on a high-level analysis of the source program to compute information which is then used for register allocation during code generation. The algorithms are simple yet quite efficient, and produce code of good quality.
- Debray, S. K. (1986). Towards banishing the cut from Prolog. Array, 2-12. Abstract: The author considers the use of the 'cut' language feature in logic programming to provide greater efficiency by returning some control to the user. He argues that it is not necessary to resort to such impure features for efficiency. This point is illustrated by considering how most of the common uses of cut can be eliminated from Prolog source programs, relying on static analysis to generate them at compile time. Three common situations where the cut is used are considered. Static analysis techniques are given to detect such situations, and applicable program transformations are described. The author also proposes new language constructs, 'firstof' and 'oneof', for situations involving 'don't care' nondeterminism. These constructs have better declarative readings than the cut and extend better to parallel evaluation strategies.
- Debray, S. K., & Warren, D. S. (1986). Automatic mode inference for Prolog programs. Array, 78-88. Abstract: The problem of automatically inferring the modes of the predicates in a program is considered. The dataflow analysis used is more powerful than approaches relying on syntactic characteristics of programs. A sound and efficient treatment of variable aliasing in mode inference is given. By propagating instantiation information using state transformations, rather than through dependencies between variables, greater precision in the treatment of unification than in previous studies is achieved. Several optimizations to improve the performance of the mode inference algorithm are described, as are various program optimizations based on mode information.
- Smolka, S. A., Frank, A. J., & Debray, S. K. (1985). Testing protocol robustness the CCS way. Array, 93-109. Abstract: We present a procedure to decide if a finite-state communication protocol provides end-users with a robust communication medium. The underlying model is algebraic and is based on Milner's CCS and observation equivalence, the notion that two processes cannot be distinguished by an external observer. In particular, we test if the protocol's visible behavior is observationally equivalent to the behavior of an ideal communication medium. We develop asynchronous versions of the protocols to illustrate our technique.
- Warren, D. S., Ahamad, M., Debray, S. K., & Kale, L. V. (1984). Executing distributed Prolog programs on a broadcast network. Array, 12-21.
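A minimal Python sketch of the granularity-control idea referenced above (task granularity analysis, 1990; granularity-based control of parallelism, 1996): a goal is run as a separate parallel task only when an estimate of its work exceeds the task-creation overhead. The cost model, threshold, and function names below are illustrative assumptions for this profile page, not material from those papers.

    # Hypothetical illustration: spawn a subtask only when its estimated
    # "grain size" exceeds the cost of creating and scheduling a task.
    from concurrent.futures import ThreadPoolExecutor

    SPAWN_OVERHEAD = 50_000        # assumed cost units for task creation/scheduling

    def estimated_cost(n: int) -> int:
        """Stand-in for a compile-time cost estimate (quadratic in input size)."""
        return n * n

    def work(n: int) -> int:
        return sum(i * i for i in range(n))

    def run_goal(pool: ThreadPoolExecutor, n: int):
        # Granularity control: parallelize only coarse-grained goals.
        if estimated_cost(n) > SPAWN_OVERHEAD:
            return pool.submit(work, n)    # coarse grain: worth a separate task
        return work(n)                     # fine grain: cheaper to run in-line

    with ThreadPoolExecutor() as pool:
        results = [run_goal(pool, n) for n in (10, 100, 1000)]
        print([r.result() if hasattr(r, "result") else r for r in results])

In this toy run only the n = 1000 goal is spawned; the two small goals execute sequentially, avoiding the spawning overhead.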
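As a hedged illustration of the flavor of the cost analyses cited above (automatic complexity analysis, 1991; cost analysis of logic programs, 1993), the sketch below sets up a worst-case cost recurrence for naive list reversal under a toy cost model (one unit per clause resolution, with append on a list of length k costing k + 1 units) and checks it against the closed form T(n) = n(n + 1)/2 + n + 1. The cost model and closed form are assumptions made only for this example.

    # Toy worst-case cost recurrence for:
    #   reverse([], []).
    #   reverse([H|T], R) :- reverse(T, R1), append(R1, [H], R).
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n: int) -> int:
        if n == 0:
            return 1                 # base clause: one resolution step
        return T(n - 1) + n + 1      # clause head + recursive call + append on n-1 elements

    for n in (0, 1, 5, 10):
        closed_form = n * (n + 1) // 2 + n + 1
        assert T(n) == closed_form
        print(n, T(n))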
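The sketch below illustrates, under stated assumptions, the call-forwarding idea from the 1994 paper above: order a procedure's entry actions so that call sites which have already established some of them can jump past the longest possible prefix. The greedy ordering, action names, and call-site sets here are hypothetical stand-ins, not the paper's actual algorithm or benchmarks (the paper shows the optimal problem is NP-complete and uses its own greedy heuristic).

    # Hypothetical greedy ordering of entry actions for call forwarding.
    def greedy_order(entry_actions, call_sites):
        """Order entry actions so call sites can bypass long prefixes.

        entry_actions: set of checks the procedure performs on entry.
        call_sites: list of sets; each set holds the checks a call site has
                    already established (and could therefore skip).
        """
        order, remaining, live = [], set(entry_actions), list(call_sites)
        while remaining:
            # Pick the action satisfied by the most call sites that can still
            # bypass everything ordered so far.
            best = max(remaining, key=lambda a: sum(a in s for s in live))
            order.append(best)
            remaining.discard(best)
            live = [s for s in live if best in s]   # sites that keep skipping
        return order

    actions = {"deref(X)", "check_int(X)", "check_nonvar(Y)"}
    sites = [{"deref(X)", "check_int(X)"}, {"deref(X)"}, set()]
    print(greedy_order(actions, sites))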
Proceedings Publications
- Strout, M., Debray, S. K., Isaacs, K., Kreaseck, B., Cardenas-Rodriguez, J., Hurwitz, B., Volk, K., Badger, S., Bertolacci, I., Bartels, J., Devkota, S., Encinas, A., Gaska, B., Sackos, T., Stephens, J., Willer, S., & Yadegari, B. (2017, October). Language-Agnostic Optimization and Parallelization for Interpreted Languages. In The 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC 2017).
Presentations
- Bartels, J., Stephens, J., & Debray, S. K. (2017, October). Constructing Dynamic Control Flow Graphs from Execution Traces. International Workshop on Dynamic Analysis (WODA 2017).
- Debray, S. K. (2016, December). Information flow obfuscation. Keynote address: Software Security, Protection, and Reverse Engineering Workshop.
- Debray, S. K. (2015, March). Deobfuscation: ten years later. 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering. Montreal, Canada: IEEE. Invited talk.
- Debray, S. K. (2014, June). Understanding Programs that Don't Want to be Understood. Dagstuhl Seminar 14241: "Challenges in Analysing Executables: Scalability, Self-Modifying Code and Synergy". Dagstuhl, Germany.
Others
- Debray, S. K., Snodgrass, R. T., & Zhang, R. (2017, February). Methods of Microspecialization in Database Management Systems. United States Patent and Trademark Office. U.S. Patent No. 9,576,002.
- Debray, S. K., & Collberg, C. S. (2014, June). Code Obfuscation: An Informal Assessment. White paper provided at the request of Dr. Mike Hsieh (Program Manager at DARPA).
- Debray, S. K., Giacobazzi, R., & Collberg, C. S. (2014, June). Strength of Obfuscation. White paper created at the request of Dr. Michael Hsieh (Program Manager, DARPA).