Christian S Collberg
- Professor, Computer Science
- Member of the Graduate Faculty
Contact
- (520) 621-6612
- Gould-Simpson, Rm. 758
- Tucson, AZ 85721
- collberg@cs.arizona.edu
Degrees
- Ph.D. Computer Science
- Lund University, Lund, Sweden
Interests
No activities entered.
Courses
2024-25 Courses
-
Reverse Engineering
CSC 465 (Spring 2025)
2023-24 Courses
-
Special Topics in Comp Science
CSC 496 (Spring 2024) -
Honors Thesis
CSC 498H (Fall 2023)
2022-23 Courses
-
Directed Research
CSC 492 (Spring 2023) -
Honors Thesis
CSC 498H (Spring 2023) -
Research
CSC 900 (Fall 2022)
2021-22 Courses
-
Dissertation
CSC 920 (Spring 2022) -
Dissertation
CSC 920 (Fall 2021)
2020-21 Courses
-
Computer Security
CSC 466 (Spring 2021) -
Dissertation
CSC 920 (Spring 2021) -
Computer Security
CSC 466 (Fall 2020) -
Computer Security
CSC 566 (Fall 2020) -
Dissertation
CSC 920 (Fall 2020)
2019-20 Courses
-
Compar Programming Lang
CSC 372 (Spring 2020) -
Directed Research
CSC 492 (Spring 2020) -
Dissertation
CSC 920 (Spring 2020) -
Computer Security
CSC 466 (Fall 2019) -
Computer Security
CSC 566 (Fall 2019) -
Research
CSC 900 (Fall 2019)
2018-19 Courses
-
Compar Programming Lang
CSC 372 (Spring 2019) -
Directed Research
CSC 492 (Spring 2019) -
Research
CSC 900 (Spring 2019) -
Computer Security
CSC 466 (Fall 2018) -
Computer Security
CSC 566 (Fall 2018) -
Directed Research
CSC 492 (Fall 2018) -
Research
CSC 900 (Fall 2018)
2017-18 Courses
-
Adv Tpcs:Doctoral Colloq
CSC 695C (Spring 2018) -
Directed Research
CSC 392 (Spring 2018) -
Adv Tpcs:Doctoral Colloq
CSC 695C (Fall 2017) -
Compar Programming Lang
CSC 372 (Fall 2017) -
Computer Security
CSC 466 (Fall 2017) -
Computer Security
CSC 566 (Fall 2017)
2016-17 Courses
-
Independent Study
CSC 599 (Summer I 2017) -
Adv Tpcs:Doctoral Colloq
CSC 695C (Spring 2017) -
Computer Security
CSC 466 (Spring 2017) -
Adv Tpcs:Doctoral Colloq
CSC 695C (Fall 2016) -
Computer Security
CSC 466 (Fall 2016) -
Computer Security
CSC 566 (Fall 2016) -
Independent Study
CSC 399 (Fall 2016)
2015-16 Courses
-
Adv Tpcs:Doctoral Colloq
CSC 695C (Spring 2016) -
Computer Security
CSC 466 (Spring 2016)
Scholarly Contributions
Journals/Publications
- Collberg, C., & Proebsting, T. A. (2016). Repeatability in computer systems research. Communications of the ACM, 59(3), 62--69.
- Collberg, C., Gibson, A., Martin, S., Shinde, N., Herzberg, A., & Shulman, H. (2013). Provenance of exposure: Identifying sources of leaked documents. 2013 IEEE Conference on Communications and Network Security, CNS 2013, 367-368.More infoAbstract: We design a provenance system for documents on clouds. The system allows writing documents by several collaborating individuals. Provenance allows recovery of information about the sequence of significant events relevant to the documents. Existing provenance systems focus on editing events, such as creation or removal of document parts. In this work, we introduce provenance of exposure events, allowing identification of one, or more, individuals which are possible sources of the exposure to external source of a particular version of documents. Our design provides a practical solution for provenance of documents via not-fully-trusted cloud systems, with support for provenance of both exposure and editing events. © 2013 IEEE.
- Collberg, C., Nagra, J., & Wang, F. (2013). Surreptitious software: Models from biology and history. Communications in Computer and Information Science, 374(PART II), 1-21.More infoAbstract: Over the last decade a bewildering array of techniques have been proposed to protect software from piracy, malicious reverse engineering, and tampering. While we can broadly classify these techniques as obfuscation, watermarking/fingerprinting, birthmarking, and tamper-proofing there is a need for a more constructive taxonomy. In this paper we present a model of Surreptitious Software techniques inspired by defense mechanisms found in other areas: we will look at the way humans have historically protected themselves from each other and from the elements, how plants and animals have evolved to protect themselves from predators, and how secure software systems have been architected to protect against malicious attacks. In this model we identify a set of primitives which underlie many protection schemes. We propose that these primitives can be used to characterize existing techniques and can be combined to construct novel schemes which address a specific set of protective requirements. © Springer-Verlag Berlin Heidelberg 2007.
- Collberg, C., Martin, S., Myers, J., & Nagra, J. (2012). Distributed application tamper detection via continuous software updates. ACM International Conference Proceeding Series, 319-328.More infoAbstract: We present a new general technique for protecting clients in distributed systems against Remote Man-at-the-end (R-MATE) attacks. Such attacks occur in settings where an adversary has physical access to an untrusted client device and can obtain an advantage from tampering with the hardware itself or the software it contains. In our system, the trusted server overwhelms the analytical abilities of the untrusted client by continuously and automatically generating and pushing to him diverse client code variants. The diversity subsystem employs a set of primitive code transformations that provide an ever-changing attack target for the adversary, making tampering difficult without this being detected by the server. Copyright 2012 ACM.
- Collberg, C., Davidson, J., Giacobazzi, R., Gu, Y. X., Herzberg, A., & Wang, F. (2011). Toward digital asset protection. IEEE Intelligent Systems, 26(6), 8-13.More infoAbstract: The goal of software protection (SP) research is to make software safe from malicious attacks by preventing adversaries from tampering, reverse engineering, and illegally redistributing software. The Digital Asset Protection Association (DAPA) was launched in July 2011 to address the challenges of such attacks and SP research in general. As DAPA activities and efforts get underway, the ultimate goal is to establish standards and baseline definitions for SP research and to promote coordinated, open efforts among academia and industry. © 2006 IEEE.
- Falcarin, P., Collberg, C., Atallah, M., & Jakubowski, M. (2011). Guest editors' introduction: Software protection. IEEE Software, 28(2), 24-27.More infoAbstract: Software protection is increasingly becoming an important requirement for industrial software development, especially when building systems for military defense, national infrastructure, and medical informatics. Every software vendor should be aware of the potential for attacks against its products and the techniques available to mitigate these attacks. Employing software protection techniques can mean the difference between business survival and failure. The guest editors present this special issue on new tools and techniques for software protection. © 2006 IEEE.
- Ceccato, M., Preda, M. D., Nagra, J., Collberg, C., & Tonella, P. (2009). Trading-off security and performance in barrier slicing for remote software entrusting. Automated Software Engineering, 16(2), 235-261.More infoAbstract: Network applications often require that a trust relationship is established between a trusted host (e.g., the server) and an untrusted host (e.g., the client). The remote entrusting problem is the problem of ensuring the trusted host that whenever a request from an untrusted host is served, the requester is in a genuine state, unaffected by malicious modifications or attacks. Barrier slicing helps solve the remote entrusting problem. The computation of the sensitive client state is sliced and moved to the server, where it is not possible to tamper with it. However, this solution might involve unacceptable computation and communication costs for the server, especially when the slice to be moved is large. In this paper, we investigate the trade-off between security loss and performance overhead associated with moving only a portion of the barrier slice to the server and we show that this trade-off can be reduced to a multi-objective optimization problem. We describe how to make decisions in practice with reference to a case study, for which we show how to choose among the alternative options. © 2009 Springer Science+Business Media, LLC.
- Collberg, C., Huntwork, A., Carter, E., Townsend, G., & Stepp, M. (2009). More on graph theoretic software watermarks: Implementation, analysis, and attacks. Information and Software Technology, 51(1), 56-67.More infoAbstract: This paper presents an implementation of the watermarking method proposed by Venkatesan et al. in their paper [R. Venkatesan, V. Vazirani, S. Sinha, A graph theoretic approach to software watermarking, in: Fourth International Information Hiding Workshop, Pittsburgh, PA, 2001]. An executable program is marked by the addition of code for which the topology of the control-flow graph encodes a watermark. We discuss issues that were identified during construction of an actual implementation that operates on Java bytecode. We present two algorithms for splitting a watermark number into a redundant set of pieces and an algorithm for turning a watermark number into a control-flow graph. We measure the size and time overhead of watermarking, and evaluate the algorithm against a variety of attacks. © 2008 Elsevier B.V. All rights reserved.
- Zhang, C., Wang, J., Thomborson, C., Wang, C., & Collberg, C. (2009). A semi-dynamic multiple watermarking scheme for java applications. Proceedings of the ACM Conference on Computer and Communications Security, 59-71.More infoAbstract: Software protection and security has been a more and more important issue. In order to prevent software from unauthorized use and modification, a great many techniques have been proposed and developed. In this paper, we address this issue through a prevention technique called software watermarking, and we propose a novel software watermarking scheme, which can embed multiple non-interfering watermarks into the same program. Unlike published schemes, this scheme encodes the watermark into mapping functions and then embeds the mapping codes, which are generated from these functions, into the program at the articulation points of its control flow graph. The extraction in this scheme, which is based on dynamically loading a reconstructed program to recover the watermark, is also a novel approach to the software watermarking field. Experimental results indicate that the size and performance overheads caused by this scheme can keep steady. Copyright 2009 ACM.
- Camenisch, J., Collberg, C., Johnson, N. F., & Sallee, P. (2007). Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics: Preface. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 4437 LNCS, v.
- Ceccato, M., Preda, M. D., Nagra, J., Collberg, C., & Tonella, P. (2007). Barrier slicing for remote software trusting. SCAM 2007 - Proceedings 7th IEEE International Working Conference on Source Code Analysis and Manipulation, 27-36.More infoAbstract: Remote trusting aims at verifying the "healthy" execution of a program running on an untrusted client that communicates with a trusted server via network connection. After giving a formal definition of the remote trusting problem and a test to determine whether an attack against a given remote trusting scheme is successful or not, we propose a protection against malicious modification of the client code, based on the replication of a portion of the client on the server. To minimize the size of the code that is replicated, we propose to use barrier slicing. We show the feasibility of our approach on a case study. Our results indicate that a barrier slice is significantly smaller than the corresponding backward slice while providing the same level of protection. © 2007 IEEE.
- Collberg, C. S., Thomborson, C., & Townsend, G. M. (2007). Dynamic graph-based software fingerprinting. ACM Transactions on Programming Languages and Systems, 29(6).More infoAbstract: Fingerprinting embeds a secret message into a cover message. In media fingerprinting, the secret is usually a copyright notice and the cover a digital image. Fingerprinting an object discourages intellectual property theft, or when such theft has occurred, allows us to prove ownership. The Software Fingerprinting problem can be described as follows. Embed a structure W into a program P such that: W can be reliably located and extracted from P even after P has been subjected to code transformations such as translation, optimization and obfuscation; W is stealthy; W has a high data rate; embedding W into P does not adversely affect the performance of P; and W has a mathematical property that allows us to argue that its presence in P is the result of deliberate actions. In this article, we describe a software fingerprinting technique in which a dynamic graph fingerprint is stored in the execution state of a program. Because of the hardness of pointer alias analysis such fingerprints are difficult to attack automatically. © 2007 ACM.
- Collberg, C., Myles, G., & Stepp, M. (2007). An empirical study of Java bytecode programs. Software - Practice and Experience, 37(6), 581-641.More infoAbstract: We present a study of the static structure of real Java bytecode programs. A total of 1132 Java jar-files were collected from the Internet and analyzed. In addition to simple counts (number of methods per class, number of bytecode instructions per method, etc.), structural metrics such as the complexity of control-flow and inheritance graphs were computed. We believe this study will be valuable In the design of future programming languages and virtual machine instruction sets, as well as In the efficient implementation of compilers and other language processors. Copyright © 2006 John Wiley & Sons, Ltd.
- Collberg, C., Nagra, J., & Wang, F. (2007). Surreptitious software: Models from Biology and History. Communications in Computer and Information Science, 1, 1-21.More infoAbstract: Over the last decade a bewildering array of techniques have been proposed to protect software from piracy, malicious reverse engineering, and tampering. While we can broadly classify these techniques as obfuscation, watermarking/fingerprinting, birthmarking, and tamperproofing there is a need for a more constructive taxonomy. In this paper we present a model of Surreptitious Software techniques inspired by defense mechanisms found in other areas: we will look at the way humans have historically protected themselves from each other and from the elements, how plants and animals have evolved to protect themselves from predators, and how secure software systems have been architected to protect against malicious attacks. In this model we identify a set of primitives which underlie many protection schemes. We propose that these primitives can be used to characterize existing techniques and can be combined to construct novel schemes which address a specific set of protective requirements. © Springer-Verlag Berlin Heidelberg 2007.
- Myles, G., & Collberg, C. (2006). Software watermarking via opaque predicates: Implementation, analysis, and attacks. Electronic Commerce Research, 6(2), 155-171.More infoAbstract: Within the software industry software piracy is a great concern. In this article we address this issue through a prevention technique called software watermarking. Depending on how a software watermark is applied it can be used to discourage piracy; as proof of authorship or purchase; or to track the source of the illegal redistribution. In particular we analyze an algorithm originally proposed by Geneviève Arboit in A Method for Watermarking Java Programs via Opaque Predicates. This watermarking technique embeds the watermark by adding opaque predicates to the application. We have found that the Arboit technique does withstand some forms of attack and has a respectable data-rate. However, it is susceptible to a variety of distortive attacks. One unanswered question in the area of software watermarking is whether dynamic algorithms are inherently more resilient to attacks than static algorithms. We have implemented and empirically evaluated both static and dynamic versions within the SandMark framework. © Springer Science + Business Media, LLC 2006.
- Stepp, M., & Collberg, C. (2006). Browser Toolbars. Phishing and Countermeasures: Understanding the Increasing Problem of Electronic Identity Theft, 493-521.
- Ben-Amram, A., Collberg, C., & Kobourov, S. (2005). Rights and wrongs in scientific publications (multiple letters) [1]. Communications of the ACM, 48(8), 11-.
- Collberg, C., & Kobourov, S. (2005). Self-plagiarism in computer science. Communications of the ACM, 48(4), 88-94.More infoAbstract: The results of an automated search for self-plagiarism in computer science are discussed. Self-plagiarism occurs when authors reuse portions of their previous writings in subsequent research papers. The self-plagiarism can give the idea to the public that, research dollars are spent on rehashing old results rather than an original research, and it can indicate to the colleagues that academic dishonesty is not a big problem. The steps considered to solve the self-plagiarism problem includes the tools such as moss or Glatt Plagiarism services used by the professors to detect plagiarism among students.
- Collberg, C., & Sahoo, T. R. (2005). Software watermarking in the frequency domain: Implementation, analysis, and attacks. Journal of Computer Security, 13(5), 721-755.More infoAbstract: In this paper we analyze the SHKQ software watermarking algorithm, originally due to Stern, Hachez, Koeune and Quisquater. The algorithm has been implemented within the SANDMARK framework, a system designed to allow effective study of software protection algorithms (such as code obfuscation, software watermarking, and code tamper-proofing) targeting Java bytecode. The SHKQ algorithm embeds a watermark in a program using a spread spectrum technique. The idea is to spread the watermark over the entire application by modifying instruction frequencies. Spreading the watermark over the code provides a high level of stealth and some manner of resilience against attack. In this paper we describe the implementation of the SHKQ algorithm, in particular the issues that arise when targeting Java bytecodes. We then present an empirical examination of the robustness of the watermark against a wide variety of attacks. We conclude that SHKQ, while stealthy, is easily attacked by simple distortive transformations. © 2005 - IOS Press and the authors. All rights reserved.
- Myles, G., & Collberg, C. (2005). K-gram software birthmarks. Proceedings of the ACM Symposium on Applied Computing, 1, 314-318.More infoAbstract: Software birthmarking relies on unique characteristics that are inherent to a program to identify the program in the event of suspected theft. In this paper we present and empirically evaluate a novel birthmarking technique which uniquely identifies a program through instruction sequences. To evaluate the strength of the birthmarking technique we examine two properties: credibility and resilience to semantics-preserving transformations. We show that the technique provides both high credibility and resilience. Additionally, it complements previously proposed static birthmarking techniques. Copyright 2005 ACM.
- Myles, G., Collberg, C., Heidepriem, Z., & Navabi, A. (2005). The evaluation of two software watermarking algorithms. Software - Practice and Experience, 35(10), 923-938.More infoAbstract: In this paper we analyze the effectiveness of two different software watermarking algorithms. The first is an algorithm proposed by Akito Monden et al. and the second an algorithm proposed by Robert L. Davidson and Nathan Myhrvold of the Microsoft Corporation. We have implemented these techniques within the SANDMARK framework, a system designed to study the effectiveness of software protection algorithms on Java bytecode. To the best of our knowledge this is the first implementation and empirical evaluation of these algorithms with respect to a set of properties such as bit-rate, stealth, and resilience to attack. We demonstrate through the use of the SANDMARK framework that both of these algorithms have a high bit-rate but are unstealthy and easy to attack. Copyright © 2005 John Wiley & Sons, Ltd.
- Collberg, C. S., & Proebsting, T. A. (2004). Problem identification using program checking. Discrete Applied Mathematics, 144(3), 270-280.More infoAbstract: We describe A λgoVista, a web-based search engine that assists computer scientists find algorithms and implementations that solve specific problems. AλgoVista also allows algorithm designers to advertise their results in a forum accessible to programmers and theoreticians alike. AλgoVista is not keyword based. Rather, users provide inputs⇒output samples that describe the behavior of their needed algorithm. This query-by-example requires no knowledge of specialized terminology - the user only needs an ability to formalize her problem. AλgoVista's search mechanism is based on a novel application of program checking, a technique developed as an alternative to program verification and testing. AλgoVista operates at http://www.algovista.com. © 2004 Elsevier B.V. All rights reserved.
- Collberg, C., Carter, E., Debray, S., Huntwork, A., & Stepp, M. (2004). Dynamic path-based software watermarking. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 107-118.More infoAbstract: Software watermarking is a tool used to combat software piracy by embedding identifying information into a program. Most existing proposals for software watermarking have the shortcoming that the mark can be destroyed via fairly straightforward semantics-preserving code transformations. This paper introduces path-based watermarking, a new approach to software watermarking based on the dynamic branching behavior of programs. The advantage of this technique is that error-correcting and tamper-proofing techniques can be used to make path-based watermarks resilient against a wide variety of attacks. Experimental results, using both Java bytecode and IA-32 native code, indicate that even relatively large watermarks can be embedded into programs at modest cost.
- Collberg, C., Carter, E., Debray, S., Huntwork, A., Kacecioglu, J., Linn, C., & Stepp, M. (2004). Dynamic path-based software watermarking. ACM SIGPLAN Notices, 39(6), 107-118.More infoAbstract: Software watermarking is a tool used to combat software piracy by embedding identifying information into a program. Most existing proposals for software watermarking have the shortcoming that the mark can be destroyed via fairly straightforward semantics-preserving code transformations. This paper introduces path-based watermarking, a new approach to software watermarking based on the dynamic branching behavior of programs. The advantage of this technique is that error-correcting and tamper-proofing techniques can be used to make path-based watermarks resilient against a wide variety of attacks. Experimental results, using both Java bytecode and IA-32 native code, indicate that even relatively large watermarks can be embedded into programs at modest cost.
- Collberg, C., Huntwork, A., Carter, E., & Townsend, G. (2004). Graph Theoretic Software Watermarks: Implementation, Analysis, and Attacks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3200, 192-207.More infoAbstract: This paper presents an implementation of the novel watermarking method proposed by Venkatcsan, Vazirani, and Sinha in their recent paper A Graph Theoretic Approach to Software Watermarking. An executable program is marked by the addition of code for which the topology of the control-flow graph encodes a watermark. We discuss issues that were identified during construction of an actual implementation that operates on Java bytecode. We measure the size and time overhead of watermarking, and evaluate the algorithm against a variety of attacks. © Springer-Verlag 2004.
- Collberg, C., Kobourov, S. G., & Westbrook, S. (2004). AlgoVista: An algorithmic search tool in an educational setting. SIGCSE Bulletin (Association for Computing Machinery, Special Interest Group on Computer Science Education), 36(1), 462-466.More infoAbstract: AλgoVista is a web-based search engine that assists programmers to find algorithms and implementations that solve specific problems. The search engine is not keyword based but rather requires users to provide (input ⇒ output) samples that describe the behavior of their needed algorithm. The system is easy to use. To search for a particular algorithm or classify a combinatorial structure a user simply draws the query in a drawing pane on a web browser. The result of the search is a list of links to web resources describing or providing implementations of the algorithm. AλgoVista has many interesting applications in an educational setting. The search engine can help research students classify obscure problems and locate algorithms that would otherwise be hard to find in textbooks. Students can also add calls in their own programs to AλgoVista's database of executable problem specifications in order to dynamically check the correctness of their programs. Finally, instructors can use AλgoVista to set novel assignments in algorithms and data structures classes. This paper briefly describes AλgoVista and reports on its use in two algorithms and theory classes, one at the undergraduate and one at the graduate level. Copyright 2004 ACM.
- Collberg, C., Kobourov, S. G., & Westbrook, S. (2004). Algovista: An algorithmic search tool in an educational setting. Proceedings of the SIGCSE Technical Symposium on Computer Science Education, 462-466.More infoAbstract: AλgoVista is a web-based search engine that assists programmers to find algorithms and implementations that solve specific problems. The search engine is not keyword based but rather requires users to provide (input ⇒ output) samples that describe the behavior of their needed algorithm. The system is easy to use. To search for a particular algorithm or classify a combinatorial structure a user simply draws the query in a drawing pane on a web browser. The result of the search is a list of links to web resources describing or providing implementations of the algorithm. AλgoVista has many interesting applications in an educational setting. The search engine can help research students classify obscure problems and locate algorithms that would otherwise be hard to find in textbooks. Students can also add calls in their own programs to AλgoVista's database of executable problem specifications in order to dynamically check the correctness of their programs. Finally, instructors can use AλgoVista to set novel assignments in algorithms and data structures classes. This paper briefly describes AλgoVista and reports on its use in two algorithms and theory classes, one at the undergraduate and one at the graduate level.
- Heffner, K., & Collberg, C. (2004). The obfuscation executive. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3225, 428-440.More infoAbstract: Code obfuscations are semantics-preserving code transformations used to protect a program from reverse engineering. There is generally no expectation of complete, long-term, protection. Rather, there is a trade-off between the protection afforded by an obfuscation (i.e. the amount of resources an adversary has to expend to overcome the layer of confusion added by the transformation) and the resulting performance overhead. In this paper we examine the problems that arise when constructing an Obfuscation Executive. This is the main loop in charge of a) selecting the part of the application to be obfuscated next, b) choosing the best transformation to apply to this part, c) evaluating how much confusion and overhead has been added to the application, and d) deciding when the obfuscation process should terminate. © Springer-Verlag 2004.
- Myles, G., & Collberg, C. (2004). Detecting software theft via whole program path birthmarks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3225, 404-415.More infoAbstract: A software birthmark is a unique characteristic of a program that can be used as a software theft detection technique. In this paper we present and empirically evaluate a novel birthmarking technique - Whole Program Path Birthmarking -which uniquely identifies a program based on a complete control flow trace of its execution. To evaluate the strength of the proposed technique we examine two important properties: credibility and tolerance against program transformations such as optimization and obfuscation. Our evaluation demonstrates that, for the detection of theft of an entire program, Whole Program Path birthmarks are more resilient to attack than previously proposed techniques. In addition, we illustrate several instances where a birthmark can be used to identify program theft even when an embedded watermark was destroyed by program transformation. software piracy, copyright protection, software birthmark. © Springer-Verlag 2004.
- Myles, G., & Collberg, C. (2004). Software watermarking through register allocation: Implementation, analysis, and attacks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2971, 274-293.More infoAbstract: In this paper we explore the application of the QP water-marking algorithm proposed by G. Qu and M. Potkonjak to software watermarking. The algorithm was originally proposed as a technique for watermarking the graph coloring problem which can be applied to a variety of media such as FPGA designs and software through register allocation. We implemented the algorithm within the SandMark framework, a system that allows the study of watermarking, tamper-proofing, and obfuscation algorithms for Java bytecode. Through the use of this framework we were able to perform an empirical evaluation of the algorithm. In particular we demonstrate that the use of register allocation, while incurring no performance overhead and being stealthy, is in fact vulnerable to attacks such as decompile/recompile. We also demonstrate that the QP algorithm does not allow for accurate watermark recognition without significant modifications. © Springer-Verlag 2004.
- Collberg, C., Kobourov, S., Carter, E., & Thomborson, C. (2003). Error-correcting graphs for software watermarking. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2880, 156-167.More infoAbstract: In this paper, we discuss graph-theoretic approaches to software watermarking and fingerprinting. Software watermarking is used to discourage intellectual property theft and software fingerprinting is used to trace intellectual property copyright violations. We focus on two algorithms that encode information in software through the use of graph structures. We then consider the different attack models intended to disable the watermark while not affecting the correctness or performance of the program. Finally, we present several classes of graphs that can be used for watermarking and fingerprinting and analyze their properties (resiliency, data rate, performance, and stealthiness). © Springer-Verlag Berlin Heidelberg 2003.
- Collberg, C., Kobourov, S., Carter, E., & Thomborson, C. (2003). Graph-Based Approaches to Software Watermarking. LECTURE NOTES IN COMPUTER SCIENCE, 156-167.
- Collberg, C., Kobourov, S., Nagra, J., Pitts, J., & Wampler, K. (2003). A System for Graph-Based Visualization of the Evolution of Software. Proceedings of ACM Symposium on Software Visualization, 77-86.More infoAbstract: We describe GEVOL, a system that visualizes the evolution of software using a novel graph drawing technique for visualization of large graphs with a temporal component. GEVOL extracts information about a Java program stored within a CVS version control system and displays it using a temporal graph visualizer. This information can be used by programmers to understand the evolution of a legacy program: Why is the program structured the way it is? Which programmers were responsible for which parts of the program during which time periods? Which parts of the program appear unstable over long periods of time and may need to be rewritten? This type of information will complement that produced by more static tools such as source code browsers, slicers, and static analyzers.
- Collberg, C., Myles, G., & Huntwork, A. (2003). Sandmark - A tool for software protection research. IEEE Security and Privacy, 1(4), 40-49.More infoAbstract: The effectiveness of software-based methods for software protection from piracy, tampering and reverse engineering by Sandmark tool is discussed. The Sandmark framework is designed for implementation and evaluation of software-based techniques such as software watermarking and code obfuscation. The control flow analysis is used in several Sandmark algorithms in which it is more natural to modify control-flow graph method than its instruction list representation. The aim of Sandmark watermarking module is to develop techniques which determine embedding and recognition algorithms having smallest performance overhead and highest resilence attacks.
- Collberg, C. S. (2002). A fuzzy visual query language for a domain-specific web search engine. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2317 LNAI, 176-190.More infoAbstract: AλgoVista is a web-based search engine that assists programmers to find algorithms and implementations that solve specific problems. AλgoVista is not keyword based but rather requires users to provide - in a very simple textual language - input⇒output samples that describe the behavior of their needed algorithm. Unfortunately, even this simple language has proven too challenging for casual users. To overcome this problem and make AλgoVista more accessible to novice programmers, we are designing and prototyping a visuall anguage for creating AλgoVista queries. Since web users do not have the patience to learn fancy query languages (be they textual or visual), our goal is to make this language and its implementation natural enough to require virtually no explanation or user training. AλgoVista operates at http://algovista.com. © Springer-Verlag 2002.
- Collberg, C. S. (2002). Automatic derivation of compiler machine descriptions. ACM Transactions on Programming Languages and Systems, 24(4), 369-408.More infoAbstract: We describe a method designed to significantly reduce the effort required to retarget a compiler to a new architecture, while at the same time producing fast and effective compilers. The basic idea is to use the native C compiler at compiler construction time to discover architectural features of the new architecture. From this information a formal machine description is produced. Given this machine description, a native code-generator can be generated by a back-end generator such as BEG or burg. A prototype automatic Architecture Discovery Tool (called ADT) has been implemented. This tool is completely automatic and requires minimal input from the user. Given the Internet address of the target machine and the command-lines by which the native C compiler, assembler, and linker are invoked, ADT will generate a BEG machine specification containing the register set, addressing modes, instruction set, and instruction timings for the architecture. The current version of ADT is general enough to produce machine descriptions for the integer instruction sets of common RISC and CISC architectures such as the Sun SPARC, Digital Alpha, MIPS, DEC VAX and Intel x86.
- Collberg, C. S., & Thomborson, C. (2002). Watermarking, tamper-proofing, and obfuscation - Tools for software protection. IEEE Transactions on Software Engineering, 28(8), 735-746.More infoAbstract: We identify three types of attack on the intellectual property contained in software and three corresponding technical defenses. A defense against reverse engineering is obfuscation, a process that renders software unintelligible but still functional. A defense against software piracy is watermarking, a process that makes it possible to determine the origin of software. A defense against tampering is tamper-proofing, so that unauthorized modifications to software (for example, to remove a watermark) will result in nonfunctional code. We briefly survey the available technology for each type of defense.
- Collberg, C. S., Davey, S., & Proebsting, T. A. (2000). Language-agnostic program rendering for presentation, debugging and visualization. IEEE Symposium on Visual Languages, Proceedings, 183-190.More infoAbstract: We describe a language-independent and specification-driven program rendering tool that is able to produce high-quality code renderings of arbitrary complexity. The tool can incorporate arbitrary types of information together with the program code, allowing it to be used for debugging and profiling as well as for producing beautiful renderings of programs for publication. We also present a model for the rendering of programs and apply it to the design of a rendering of Java control flow.
- Collberg, C., & Thomborson, C. (1999). Software watermarking: Models and dynamic embeddings. Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 311-324.More infoAbstract: Watermarking embeds a secret message into a cover message. In media watermarking the secret is usually a copyright notice and the cover a digital image. Watermarking an object discourages intellectual property theft, or when such theft has occurred, allows us to prove ownership. The Software Watermarking problem can be described as follows. Embed a structure W into a program P such that: W can be reliably located and extracted from P even after P has been subjected to code transformations such as translation, optimization and obfuscation; W is stealthy; W has a high data rate; embedding W into P does not adversely affect the performance of P; and W has a mathematical property that allows us to argue that its presence in P is the result of deliberate actions. In the first part of the paper we construct an informal taxonomy of software watermarking techniques. In the second part we formalize these results. Finally, we propose a new software watermarking technique in which a dynamic graphic watermark is stored in the execution state of a program.
- Collberg, C., Thomborson, C., & Low, D. (1998). Breaking abstractions and unstructuring data structures. Proceedings of the IEEE International Conference on Computer Languages, 28-38.More infoAbstract: To ensure platform independence, mobile programs are distributed in forms that are isomorphic to the original source code. Such codes are easy to decompile, and hence they increase the risk of malicious reverse engineering attacks. Code obfuscation is one of several techniques which has been proposed to alleviate this situation. An obfuscator is a tool which - through the application of code transformations - converts a program into an equivalent one that is more difficult to reverse engineer. In a previous paper [5] we have described the design of a control flow obfuscator for Java. In this paper we extend the design with transformations that obfuscate data structures and abstractions. In particular, we show how to obfuscate classes, arrays, procedural abstractions and built-in data types like strings, integers, and booleans.
- Collberg, C., Thomborson, C., & Low, D. (1998). Manufacturing cheap, resilient, and stealthy opaque constructs. Conference Record of the Annual ACM Symposium on Principles of Programming Languages, 184-196.More infoAbstract: It has become common to distribute software in forms that are isomorphic to the original source code. An important example is Java bytecode. Since such codes are easy to decompile, they increase the risk of malicious reverse engineering attacks. In this paper we describe the design of a Java code obfuscator, a tool which - through the application of code transformations - converts a Java program into an equivalent one that is more difficult to reverse engineer. We describe a number of transformations which obfuscate control-flow. Transformations are evaluated with respect to potency (To what degree is a human reader confused?), resilience (How well are automatic deobfuscation attacks resisted?), cost (How much time/space overhead is added?), and stealth (How well does obfuscated code blend in with the original code?). The resilience of many control-altering transformations rely on the resilience of opaque predicates. These are boolean valued expressions whose values are known to the obfuscator but difficult to determine for an automatic deobfuscator. We show how to construct resilient, cheap, and stealthy opaque predicates based on the intractability of certain static analysis problems such as alias analysis.
- Collberg, C. S. (1997). Reverse Interpretation + Mutation Analysis = Automatic Retargeting. SIGPLAN Notices (ACM Special Interest Group on Programming Languages), 32(5), 57-70.More infoAbstract: There are three popular methods for constructing highly retargetable compilers: (1) the compiler emits abstract machine code which is interpreted at run-time, (2) the compiler emits C code which is subsequently compiled to machine code by the native C compiler, or (3) the compiler's code-generator is generated by a back-end generator from a formal machine description produced by the compiler writer. These methods incur high costs at run-time, compile-time, or compiler-construction time, respectively. In this paper we will describe a novel method which promises to significantly reduce the effort required to retarget a compiler to a new architecture, while at the same time producing fast and effective compilers. The basic idea is to use the native C compiler at compiler construction time to discover architectural features of the new architecture. From this information a formal machine description is produced. Given this machine description, a native code-generator can be generated by a back-end generator such as BEG or burg. A prototype Automatic Architecture Discovery Unit has been implemented. The current version is general enough to produce machine descriptions for the integer instruction sets of common RISC and CISC architectures such as the Sun SPARC, Digital Alpha, MIPS, DEC VAX, and Intel x86. The tool is completely automatic and requires minimal input from the user: principally, the user needs to provide the internet address of the target machine and the command-lines by which the C compiler, assembler, and linker are invoked.
- Collberg, C. S. (1997). Reverse interpretation+mutation analysis = automatic retargeting. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 57-70.More infoAbstract: There are three popular methods for constructing highly retargetable compilers: (1) the compiler emits abstract machine code which is interpreted at run-time, (2) the compiler emits C code which is subsequently compiled to machine code by the native C compiler, or (3) the compiler's code-generator is generated by a back-end generator from a formal machine description produced by the compiler writer. These methods incur high costs at run-time, compile-time, or compiler-construction time, respectively. In this paper we will describe a novel method which promises to significantly reduce the effort required to retarget a compiler to a new architecture, while at the same time producing fast and effective compilers. The basic idea is to use the native C compiler at compiler construction time to discover architectural features of the new architecture. From this information a formal machine description is produced. Given this machine description, a native code-generator can be generated by a back-end generator such as BEG or burg. A prototype Automatic Architecture Discovery Unit has been implemented. The current version is general enough to produce machine descriptions for the integer instruction sets of common RISC and CISC architectures such as the Sun SPARC, Digital Alpha, MIPS, DEC VAX, and Intel x86. The tool is completely automatic and requires minimal input from the user: principally, the user needs to provide the internet address of the target machine and the command-lines by which the C compiler, assembler, and linker are invoked.
- Collberg, C. S., & Krampell, M. G. (1987). DESIGN AND IMPLEMENTATION OF MODULAR LANGUAGES SUPPORTING INFORMATION HIDING.. Conference Proceedings - Annual Phoenix Conference, 224-228.More infoAbstract: The design of a module concept for high-level systems programming languages is discussed. It is shown that a modular language that fully supports D. L. Parnas' (1972) principle of information hiding while at the same time maintaining module independence cannot be translated using standard techniques. An alternative transition process is proposed that is based on partial module binding at the intermediate code level and allows the design of modular languages where implementation details of all types of objects exported from a module may be safely hidden. It is shown that this translation process will make important language constructs such as inline subprograms, iterators, and generic units efficiently implementable without introducing any module interdependencies. This in turn will reduce the amount of recompilation necessary when exported objects are changed.
Proceedings Publications
- Collberg, C. (2018, March). Code Obfuscation: Why is This Still a Thing? (Keynote Address). In Proceedings of the Eighth ACM Conference on Data and Application Security and Privacy.
- Banescu, S., Collberg, C. S., & Pretschner, A. (2017, August). Predicting the Resilience of Obfuscated Code Against Symbolic Execution Attacks via Machine Learning. In USENIX Security Symposium.More infoThis is ranked in category Computer Security on CSRankings.This conference is ranked A* in CORE
- Banescu, S., Collberg, C., Ganesh, V., Newsham, Z., & Pretschner, A. (2016). Code obfuscation against symbolic execution attacks. In Proceedings of the 32nd Annual Conference on Computer Security Applications.
- Collberg", &. (2016, may). Engineering Code Obfuscation (Invited Talk). In EUROCRYPT, 2 page abstract for Invited Presentation.
- Taylor, C., & Collberg, C. (2016). A Tool for Teaching Reverse Engineering. In 2016 USENIX Workshop on Advances in Security Education (ASE 16).
- Kanzaki, Y., Monden, A., & Collberg, C. (2015). Code artificiality: A metric for the code stealth based on an n-gram model. In Proceedings of the 1st International Workshop on Software Protection.
- Kanzaki, Y., Thomborson, C., Monden, A., & Collberg, C. (2015). Pinpointing and Hiding Surprising Fragments in an Obfuscated Program. In Proceedings of the 5th Program Protection and Reverse Engineering Workshop.
- Paul, M., Collberg, C., & Bambauer, D. (2015). A Possible Solution for Privacy Preserving Cloud Data Storage. In Cloud Engineering (IC2E), 2015 IEEE International Conference on.
- Chan, P. P., & Collberg, C. (2014). A Method to Evaluate CFG Comparison Algorithms. In Quality Software (QSIC), 2014 14th International Conference on.
- Collberg, C., Gibson, A., Martin, S., Shinde, N., Herzberg, A., & Shulman, H. (2013). Provenance of exposure: Identifying sources of leaked documents. In Communications and Network Security (CNS), 2013 IEEE Conference on.
Presentations
- Collberg, C. S. (2018, March). Tigress: A Source-to-Source-ish Obfuscation Tool (Keynote Address). Software Security, Protection and Reverse Engineering Workshop. Puerto Rico: ACSAC.
- Collberg, C. S. (2017, December). Dare to Share: Risks and Rewards of Artifact Sharing in Computer Science (Invited Essayist Keynote). ACSAC. Orlando, FL: ACM.More infoInvited Essayist Keynote
- Collberg, C. S. (2017, July). Probabilistic Obfuscation through Covert Channels. Dagstuhl. Germany.
- Collberg, C. S. (2017, October). FindResearch.org: How to Encourage Sharing of Research Artifacts. ACM SIGCOMM 2017 Reproducibility Workshop (Reproducibility’17).More infoInvited presentation
- Bambauer, D. E., Collberg, C. S., & Paul, M. (2015, Summer). A Possible Solution for Privacy Preserving Cloud Data Storage. 2015 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E). New York, NY: IC2E.
Others
- Debray, S. K., Collberg, C. S., Debray, S. K., & Collberg, C. S. (2014, June). Code Obfuscation: An Informal Assessment. White paper provided at the request of Dr. Mike Hsieh (Program Manager at DARPA).
- Debray, S. K., Giacobazzi, R., Collberg, C. S., Debray, S. K., Giacobazzi, R., & Collberg, C. S. (2014, June). Strength of Obfuscation. White paper created at the request of Dr. Michael Hsieh (Program Manager, DARPA).