Past, Present and Future of Effective C

Dennis Ritchie and Ken Thompson invented the C programming language at Bell Telephone Laboratories in 1972 [Ritchie 1993]. C is a highly successful systems programming language that can work with a wide range of computing hardware and architectures. Nearly 50 years later, C remains as vital and popular as ever.

System languages are designed for performance and ease of access to the underlying hardware while providing high-level programming features. While other languages offer newer language features, their compilers and libraries are often written in C. C is layered directly on top of the hardware, making it more sensitive to evolving hardware features, such as vectorized instructions, than higher-level languages that usually rely on C for their efficiency.

In this short blog post, I describe the history of the C language and C language standardization, survey some of the exciting innovations being worked on today, particularly with respect to security, and describe some C language and library features that are on the radar for C23. I also describe why I wrote Effective C: An Introduction to Professional C Programming [Seacord 2020], who I wrote it for, and how you might benefit from reading it.

1969-1972

C is used for a wide variety of applications. Development of the UNIX operating system started in 1969, and its code was rewritten in C in 1972. Most operating systems, including the Microsoft Windows kernel, Linux, the macOS kernel, and the iOS, Android, and Windows Phone kernels, are largely written in C. Most embedded systems and Internet-of-Things (IoT) devices are programmed in C. Many of the Google open source community’s 2,000-plus projects are written in C or C++, as are most desktop applications.

1978-1985

Brian Kernighan and Dennis Ritchie published The C Programming Language [Kernighan 1978] on February 22, 1978. Frequently referred to as K&R C (after the authors), this was the first widely available book on the subject.

By 1982 it was clear that C needed formal standardization. The first edition of K&R was the best approximation of a standard, but it no longer described the language in actual use. For example, it mentioned neither the void nor the enum type. ANSI established the X3J11 committee in the summer of 1983, with the goal of producing a C standard. From the beginning, the X3J11 committee took a cautious, conservative view of language extensions [Ritchie 1993]. Their goal was “to develop a clear, consistent, and unambiguous Standard for the C programming language which codifies the common, existing definition of C and which promotes the portability of user programs across C language environments.”

I bought my first copy of K&R in 1985 when I was learning to program in C. I was part of a team using C to develop the Interactive Context Editor (ICE) at IBM Kingston in the mid-1980s. ICE supported syntax-based editing of the Software Engineering Design Language (SEDL), a design language based on Ada.

1988-1999

In April 1988, the second edition of the K&R book was published [Kernighan 1988], updated to cover the then-new ANSI C standard, particularly with the inclusion of reference material on the standard library. I bought a copy of this edition as well, but it was never as well-thumbed as my original.

X3J11 produced its report [ANSI 89] at the end of 1989 and it was ratified as ANSI X3.159-1989, “Programming Language C.” This 1989 version is referred to as ANSI C or C89. In 1990, the ANSI C Standard was adopted (unchanged) by a joint technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) and published as the first edition of the C Standard, C90 [ISO/IEC 9899:1990]. C89/C90 incorporated the types of formal arguments in the type signature of a function (using syntax borrowed from C++), the type qualifiers const and volatile, and slightly different type promotion rules.

C continued to improve both in practice and in its formalization in the C Standard. ISO/IEC published the second edition of the C Standard (C99) in 1999 [ISO/IEC 9899:1999]. C99 introduced restricted pointers, variable length arrays, mixed declarations and statements, the long long int type, and designated initializers (among other language and library features) and removed some things too, such as implicit int.
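As a rough illustration of the C99 features listed above, the following sketch uses each of them once. The type and function names (point, copy_ints, sum_squares, make_point) are hypothetical and chosen only for this example. Note that variable length arrays became an optional feature in C11, so the sketch assumes a compiler that supports them.

```c
#include <stddef.h>

/* Hypothetical record type, used only to show designated initializers. */
struct point {
    int x;
    int y;
    int z;
};

/* restrict tells the compiler that dst and src do not overlap. */
void copy_ints(size_t n, int *restrict dst, const int *restrict src) {
    for (size_t i = 0; i < n; i++) { /* declaration inside the for statement */
        dst[i] = src[i];
    }
}

/* Sums the squares of 0..n-1 using a variable length array.
   Assumes n is small enough for the array to fit in automatic storage. */
long long sum_squares(size_t n) {
    if (n == 0) {
        return 0; /* a zero-length VLA would be undefined behavior */
    }
    long long squares[n]; /* VLA, declared after a statement */
    for (size_t i = 0; i < n; i++) {
        squares[i] = (long long)i * (long long)i;
    }
    long long total = 0; /* mixed declarations and statements */
    for (size_t i = 0; i < n; i++) {
        total += squares[i];
    }
    return total;
}

/* Designated initializers may appear in any order; members that are
   not named (z here) are initialized to zero. */
struct point make_point(void) {
    struct point pt = { .y = 2, .x = 1 };
    return pt;
}
```

For example, sum_squares(4) adds 0, 1, 4, and 9, and make_point() returns a point whose z member is zero because it was left out of the initializer.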

2001-2020

According to the TIOBE index, C has been either the first or second most popular programming language since 2001; it was also TIOBE’s programming language of the year in 2019. C has remained successful by staying true to its principles. C is meant to be a small, simple language. Consequently, the language strives to provide only a single way to perform any operation (known also as conservation of mechanism). C programs should be fast, even if they are not guaranteed to be portable. The language attempts to stay out of the programmer’s way and not prevent them from doing what needs to be done. More recently, C has worked towards making support for safety and security demonstrable.

In 2003, I joined the CERT division of the SEI at Carnegie Mellon University and began work on Secure Coding in C and C++ [Seacord 2013]. During this time, I worked with Randy Meyers (the INCITS J11 chair) to improve the specification for the bounds-checked interfaces (now Annex K of the C Standard). I was subsequently invited to attend the C Standards meeting in Lillehammer, Norway in the Spring of 2005. I’ve participated in the WG14 committee as an expert ever since. I completed the first edition of Secure Coding in C and C++ soon afterwards, followed a few years later by the first edition of the CERT C Secure Coding Standard [Seacord 2014].

ISO/IEC published the third edition of the C Standard (C11) in 2011 [ISO/IEC 9899:2011]. C11 introduced support for multiple threads of execution including an improved memory sequencing model, atomic objects, and thread-local storage. C11 also added static assertions and conditional support for bounds-checking interfaces to help with security, and it removed the insecure gets function.
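A minimal sketch of three of these C11 additions (static assertions, atomic objects, and thread-local storage) might look like the following. The names hits, per_thread_hits, record_hit, and total_hits are hypothetical. The sketch assumes an implementation that provides the optional <stdatomic.h> header, and it deliberately avoids <threads.h>, which is also optional and not available on every platform.

```c
#include <assert.h>    /* static_assert */
#include <limits.h>    /* CHAR_BIT */
#include <stdatomic.h> /* atomic_int, atomic_fetch_add, atomic_load */

/* Checked at compile time; translation fails if the condition is false. */
static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Shared counter that can be updated from several threads without a
   data race. Static storage duration means it starts at zero. */
static atomic_int hits;

/* Each thread gets its own copy of this object. */
static _Thread_local int per_thread_hits;

/* Records one event and returns the new shared total. */
int record_hit(void) {
    per_thread_hits++;
    return atomic_fetch_add(&hits, 1) + 1; /* fetch_add returns the old value */
}

/* Returns the current shared total. */
int total_hits(void) {
    return atomic_load(&hits);
}
```

In a single-threaded program, two calls to record_hit return 1 and then 2. The point of the atomic type is that concurrent calls from several threads would still produce a consistent total.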

The latest version of the C Standard (as of this writing) is the fourth edition, published in 2018 as C17 [ISO/IEC 9899:2018]. It only consists of technical corrections and clarifications, so it is largely the same as C11 but with fewer defects.

Having established a reputation as a C language expert because of the C Language books I had published, my Secure Coding in C and C++ course, and my long-time participation on the C Standards Committee, I have often been asked for recommendations. Many programming books are fairly amateurish, so my tendency was to recommend K&R although it was becoming increasingly obsolete. When Bill Pollack approached me at the REcon conference in Montreal with the idea of writing Effective C, I could see the need for an introduction to the C language that was accessible to anyone who wanted to learn C programming, but not oversimplified in a manner that promulgates the development of incorrect and insecure code.

Robert C. Seacord

2020-

A major revision referred to as C23 is currently under development. The goal of the C Standards Committee is not to innovate but to standardize existing practice. Before new features are incorporated into the language and library, there has to be sufficient implementation experience to show that these features are being used successfully by C language programmers and that their benefits outweigh their costs.

Innovations can come from academia and industry. Peter Sewell is a Professor of Computer Science at the University of Cambridge Computer Laboratory. Peter has done significant work along with collaborator Kayvan Memarian in exploring C semantics and pointer provenance [Memarian 2019]. Intel has developed a set of special integer types spelled as _ExtInt(N), where N is an integral constant expression representing the number of bits to be used to represent the type [Ballman 2020b]. Aaron Ballman labored tirelessly to introduce attributes to C23. Attributes are a mechanism by which the developer can attach extra information to language entities with a generalized syntax, instead of introducing new syntactic constructs or keywords for each feature [Ballman 2019]. This work in turn is based on implementation experience using the Microsoft __declspec and GNU __attribute__ features. All of this work shows how the C language continues to evolve and improve, albeit in a steady and deliberate fashion.

One new proposal for C23, “Defer Mechanism for C,” is a collaboration between several researchers and members of the C Standards Committee [Ballman 2020a]. This proposal describes an attempt to adapt the defer statement from the Go programming language to the C language. This involves the introduction of a defer statement, a guard statement, a panic macro, and a recover function. A defer statement defers the execution of a deferred statement until the containing guarded block terminates. A defer statement is associated with its nearest enclosing guarded block, and its deferred statement is sequenced in last-in-first-out order after all statements that are contained in that guarded block and before the guarded block itself terminates. A guard statement indicates that any deferred statements within the guarded block are executed just before the guarded block terminates. The compound statement of a guard statement is called a guarded block.

The following code fragment:

guard {
  void * const p = malloc(25);
  if (!p) break;
  defer {
    free(p);
  }

  void * const q = malloc(25);
  if (!q) break;
  defer {
    free(q);
  }

  if (something_bad) {
    break; // initiates execution of deferred statements
  }
  ...
}

is equivalent to:

{
  void * const p = malloc(25);
  if (!p) goto GUARD_END;
  if (false) {
    DEFER0_START:;
    free(p);
    goto GUARD_END;
  }

  void * const q = malloc(25);
  if (!q) goto DEFER0_START;
  if (false) {
    DEFER1_START:;
    free(q);
    goto DEFER0_START;
  }

  if (something_bad) {
    goto DEFER1_START;
  }
  ...
  goto DEFER1_START;
  GUARD_END:;
}

Here, each break statement that ends execution of the guarded block takes into account the defer statements encountered up to that point. The first break, reached before any defer statement, simply leaves the block. The second jumps to the first deferred statement. The third jumps to the second deferred statement and then continues with the first. As a result, cleanup code runs only for the memory allocations that actually succeeded.

The panic macro is called to indicate an abnormal execution condition. It triggers the execution of all active deferred statements of the current thread in the reverse order in which they were encountered. A panic can be recovered by a call to the recover function within the deferred statements.

The defer mechanism is meant to address several shortcomings in the C language. The C language lacks a generic mechanism for releasing allocated resources. Various techniques are used in practice, including the use of a goto chain [Seacord 2020].
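For comparison, here is a minimal sketch of a goto chain in standard C, mirroring the two 25-byte allocations from the guarded block shown earlier. The function name use_two_buffers, its something_bad parameter, and the label names are hypothetical. The memset and memcpy calls stand in for whatever real work the elided part of the guarded block would do.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Acquires two buffers and releases them in reverse order on every
   exit path. Returns 0 on success and -1 on failure. */
int use_two_buffers(bool something_bad) {
    int ret = -1; /* assume failure until all of the work is done */
    void *p = NULL;
    void *q = NULL;

    p = malloc(25);
    if (!p) goto done;   /* nothing acquired yet, nothing to release */

    q = malloc(25);
    if (!q) goto free_p; /* only p needs to be released */

    if (something_bad) goto free_q; /* release q, then p */

    /* Placeholder work on the two buffers. */
    memset(p, 0, 25);
    memcpy(q, p, 25);
    ret = 0;

    /* Cleanup labels appear in the reverse order of acquisition, so
       jumping to a label releases that resource and all earlier ones. */
free_q:
    free(q);
free_p:
    free(p);
done:
    return ret;
}
```

The success path falls through the same labels as the failure paths, so each resource has exactly one release site. The cost is that the programmer must keep the label order in sync with the acquisition order by hand, which is the bookkeeping the defer proposal aims to remove.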

A reference implementation is available at https://gitlab.inria.fr/gustedt/defer.

The defer proposal is just that—a proposal. It still needs to go through review and likely further evolution, after which it may or may not be adopted by the C Standards committee for inclusion in C23. However, the reference implementation is real and can be used today. An important consideration in the adoption of new features for the C language is developer experience using these features, so please download and try the reference implementation, and let us know what you think.

Future

C has been around and successful for a long, long time, and there is no indication this is going to change any time soon. Memory-safe languages such as Go and Rust are gaining in popularity, but combined they still have less than 2% of the marketplace. C has a considerable advantage in existing code and compiler support for a wide variety of architectures and embedded platforms.

There are still significant ways in which the C language, library, and ecosystem can be improved. There is still a considerable amount of work required to more precisely define the behavior for parallel execution in C. One effort I was intimately involved with was to develop a coding standard that addressed both security and safety. Somewhat surprisingly, these two communities appear to lack common ground, although there is clearly a need to build systems that are both safe and secure. Another effort that sputtered out was a study group meant to produce extensions to the C language to simplify writing parallel programs. Such an extension could be valuable if it can simplify the development of parallel programs and reduce data races.

The future evolution of the C language requires maintaining a balance between preserving existing code and adopting modern language features that can be beneficially adapted to the C language domain.

References

[Ballman 2019] Aaron Ballman. N2335 Attributes in C. March, 2019. http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2335.pdf

[Ballman 2020a] Aaron Ballman, Alex Gilding, Jens Gustedt, Robert C. Seacord, Martin Uecker, Freek Wiedijk. N2542 Defer Mechanism for C. 2020-07-19.

[Ballman 2020b] Aaron Ballman, Melanie Blower, Tommy Hoffner, Erich Keane. N2590 Adding a Fundamental Type for N-bit integers. 2020-10-30.

[Kernighan 1978] B. W. Kernighan and D. M. Ritchie. 1978. The C programming language. Prentice-Hall, Inc., USA.

[Kernighan 1988] Kernighan, Brian; Ritchie, Dennis M. (March 1988). The C Programming Language (2nd ed.). Englewood Cliffs, NJ: Prentice Hall. ISBN 0-13-110362-8.

[ISO/IEC 9899:1990] ISO/IEC. 1990. “Programming Languages—C,” 1st ed. ISO/IEC 9899:1990.

[ISO/IEC 9899:1999] ISO/IEC. 1999. “Programming Languages—C,” 2nd ed. ISO/IEC 9899:1999.

[ISO/IEC 9899:2011] ISO/IEC. 2011. “Programming Languages—C,” 3rd ed. ISO/IEC 9899:2011.

[ISO/IEC 9899:2018] ISO/IEC. 2018. “Programming Languages—C,” 4th ed. ISO/IEC 9899:2018.

[Memarian 2019] Kayvan Memarian, Victor B. F. Gomes, Brooks Davis, Stephen Kell, Alexander Richardson, Robert N. M. Watson, and Peter Sewell. 2019. Exploring C semantics and pointer provenance. Proc. ACM Program. Lang. 3, POPL, Article 67 (January 2019), 32 pages. DOI:https://doi.org/10.1145/3290380

[Ritchie 1993] Dennis M. Ritchie. 1993. The development of the C language. In The second ACM SIGPLAN conference on History of programming languages (HOPL-II). Association for Computing Machinery, New York, NY, USA, 201–208. DOI:https://doi.org/10.1145/154766.155580 

[Seacord 2013] Seacord, Robert C. 2013. Secure Coding in C and C++, 2nd ed. Boston: Addison-Wesley Professional.

[Seacord 2014] Seacord, Robert C. 2014. The CERT C Coding Standard: 98 Rules for Developing Safe, Reliable, and Secure Systems, 2nd ed. Boston: Addison-Wesley Professional.

[Seacord 2020] Robert C. Seacord. Effective C: An Introduction to Professional C Programming. August 2020. ISBN-13: 9781718501041.