C considered dangerous
> He asked: why is there no argument to memcpy() to specify the maximum destination length?
I'm confused by this. The third argument provides the destination length, so what good would a "maximum destination length" do? I guess he must mean that because the length is often computed, you'd need a fourth argument to ensure the length isn't greater than some sane upper bound. But you can easily fix that using an if statement around the memcpy.
Perhaps because the memory buffers might be of different size.
Maybe a memcpy_oobp (out of bounds protection) signature could be:

    memcpy_oobp(void *dst, size_t dst_size, void *src, size_t src_size);

Then again, I guess you could just as well do:

    memcpy(dst, src, min(dst_size, src_size));

But having to explicitly specify both destination and source sizes might have prevented a lot of buffer overwrite bugs.

> But having to explicitly specify both destination and source sizes might have prevented a lot of buffer overwrite bugs.
A good way to prevent this is to have a buffer abstraction, where the size is a property of the type, e.g.,

    typedef struct {
        size_t bytes_used;
        size_t capacity;
        void *data;
    } buf_t;

    int buf_init(buf_t *buf);
    void buf_cleanup(buf_t *buf);
    void buf_copy(buf_t *dst, buf_t *src);
    /* ... */

Of course, it doesn't prevent people from using memcpy directly.

I guess so. One of the LWN comments mentions a Microsoft function memcpy_s defined as:

    memcpy_s(void *dest, size_t destSize, const void *src, size_t count);

which is effectively equivalent to your memcpy_oobp function. However the Microsoft function also returns an error code which must be checked (because count might be larger than destSize), thus providing another way for the programmer to screw up. I'm not sure if this is better or worse than just copying the min() as in your second example. It probably depends on the situation.
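As a sketch of the buffer abstraction above, a plausible buf_copy could clamp to the destination's capacity. The clamping semantics here are an assumption, not something the original comment specified:

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t bytes_used;
    size_t capacity;
    void *data;
} buf_t;

/* Assumed semantics: copy at most dst->capacity bytes and record
   how many bytes were actually stored. */
static void buf_copy(buf_t *dst, const buf_t *src)
{
    size_t n = src->bytes_used < dst->capacity ? src->bytes_used
                                               : dst->capacity;
    memcpy(dst->data, src->data, n);
    dst->bytes_used = n;
}
```

Whether silent truncation like this is acceptable depends on the situation, which is exactly the min()-versus-error-code debate above.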
Using min() seems like it could be incredibly dangerous as an "implicit" behavior, not to mention surprising.
I'd wager it'd be much better to just specify that abort() gets called in the "overflow" case. (Given that overflow is basically never what you want anyway.)
Yeah, it'll crash but at least it won't be surprising/undefined behavior.
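A minimal sketch of that abort-on-overflow idea (memcpy_checked is a hypothetical name, not an existing function):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked copy: refuses to overflow the destination.
   Aborting on overflow is deliberate -- overflow is basically never
   what the caller intended. */
static void memcpy_checked(void *dst, size_t dst_size,
                           const void *src, size_t count)
{
    if (count > dst_size)
        abort();
    memcpy(dst, src, count);
}
```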
For extra fun, the Microsoft implementation of memcpy_s returns an error instead of crashing if either of the pointers is NULL (thankfully doesn't apply if the copy size is 0). There's a reason I don't like writing software for Windows ...
Just use memcpy_s. This has the destbuf size argument. It's even in C11, but you need the safeclib or MSVC, as no libc cares about the safety annex.
Thankfully, compiler warnings and static analyzers have become much better in recent years. For instance, gcc can now warn about a missing 'break;' mentioned in the article (you need to add a special comment like '/* fall through */' if it's intentional). Also, clang-tidy is getting better with each release. I highly recommend using it, although the initial configuration will take some time, depending on the code base.
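For illustration, this is the kind of switch gcc's -Wimplicit-fallthrough warns about, with the comment that silences it (the counters and function name are made up):

```c
/* Sketch: 'a' counts as both an a and a letter. With
   -Wimplicit-fallthrough enabled, gcc warns about the missing break
   unless the fall through is marked as intentional. */
static int letters;
static int a_count;

static void classify(char c)
{
    switch (c) {
    case 'a':
        a_count++;
        /* fall through */
    case 'b':
        letters++;
        break;
    default:
        break;
    }
}
```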
Alas! strlcpy and strlcat are still not present in the glibc, despite numerous attempts, mainly for religious reasons (ie. "BSD sucks").
And yes, having something like "if (strlcat(buffer, src, sizeof(buffer)) >= sizeof(buffer)) { abort(); }" is much better than a buffer overrun. But security does not always seem to be a real concern, compared to politics.
C is dangerous partly because assembly language is dangerous. We will always need some layer on top of assembly that is mostly unchecked and reflects back to how cpu instructions work. This is probably something we must live with until we have processors with the notion of type checking.
C is dangerous partly because of swaths of undefined behaviour and loose typing. Eliminating much of the undefined behaviour, either by defining the behaviour or forcing the compiler to refuse to compile it, could be of some help. There are still classes of undefined behaviour that cannot be worked around, but narrowing those down to a minimal set would make them easier to deal with. Strong typing would help build programs that won't compile unless they are correct, at least in terms of the types of values.
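As one concrete example of working around undefined behaviour rather than triggering it: signed overflow is UB, so the check has to happen before the addition. A sketch:

```c
#include <limits.h>

/* Returns nonzero if a + b would overflow int. The test itself uses
   only well-defined arithmetic, because evaluating a + b on overflow
   would already be undefined behaviour -- the compiler is free to
   assume it never happens. */
static int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    else
        return a < INT_MIN - b;
}
```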
C is dangerous partly because of the stupid standard library, which isn't necessarily a core language problem since other libraries can be used. The standard library should be replaced with any of the sane libraries that different projects have written for themselves to avoid using libc. It's perfectly possible not to have minefields like memcpy() or strcpy(), or functions like strtok() which introduce nice invisible access to internal static storage (fixed by a re-entrant variant like strtok_r()), or strtol() which requires multiple checks to determine how it actually failed. The problem here is that if there are X standards, adding one to replace them all will make it X+1 standards.
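To illustrate the strtol() point: distinguishing its failure modes takes several separate checks. A sketch of a stricter wrapper (parse_long is a made-up name):

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a base-10 long, rejecting empty input, range errors, and
   trailing garbage -- three separate checks strtol leaves to you. */
static int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)
        return -1;          /* no digits consumed */
    if (errno == ERANGE)
        return -2;          /* out of range for long */
    if (*end != '\0')
        return -3;          /* trailing garbage */
    *out = v;
    return 0;
}
```

Note that errno must be cleared before the call, since strtol only sets it on failure; forgetting that is itself a classic bug.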
Yet, good programmers already avoid 99% of the problems by manually policing themselves. For them, C is simple, productive, and manageable in a lot more cases and domains than it is for the less experienced programmers.
Ironically other systems programming languages developed outside AT&T walls since 1961 did not suffer from the majority of C's pain points regarding memory corruption.
I really wish Bell Labs had been allowed to sell UNIX.
Terrible title. It's not remotely news that C is dangerous. This talk seems to be about ways of mitigating the dangers. Why not call it "Mitigating the dangers of C" or something else that is less of a tired cliche?
Because "Making C Less Dangerous" is the actual title of the talk, and "Towards less dangerous C" is part of the agenda?
The title is completely misleading.
I write a ton of C and I completely agree with the title. With 20+ years of experience.
Kernel drivers and embedded system bare metal firmware.
The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests. And that something can lead to disastrous crashes and security vulnerabilities.
Don't you think that with the tools we have now it's easier to control the quality of code produced (Clang memory sanitizers and so on)? I feel more at ease to ship C code today after instrumenting it than a few years ago...
Tooling absolutely helps to reduce defects. That's why you use them.
That said, sometimes I'm shocked what kind of disasters get past the analyzers.
Stakes are higher than ever. It's not just about functional correctness and avoiding crashes anymore. Your code needs to be secure against outside world malicious actions. Getting rid of counterintuitive security vulnerabilities is very, very hard.
I would say that is why security conscious developers use them.
Sadly we are a very, very tiny percentage, as proven by Herb Sutter's question to the audience at CppCon (1% of the audience answered positively), and the frequent CVE updates.
Not really, as is proven almost on a daily basis.
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=memory+corr...
How do you know that developers working on those used tools such as the Clang Memory Sanitizer?
Because many on that list are well known FOSS projects that supposedly have such processes in place, including manual review before accepting patches into mainline, like the Linux kernel being discussed here.
For embedded systems I mostly go with "dynamic memory allocation of any kind is evil" and that solves a lot of issues already.
You can still overwrite memory but it suddenly became much less likely.
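A sketch of what that "no dynamic allocation" style often looks like in practice, using a fixed static pool (the names and sizes here are made up):

```c
#include <stddef.h>

/* Hypothetical fixed-capacity pool: all storage is static, so there
   is no heap, no fragmentation, and exhaustion is an explicit,
   testable condition rather than a runtime surprise. */
#define SLOT_COUNT 16
#define SLOT_BYTES 64

static unsigned char pool[SLOT_COUNT][SLOT_BYTES];
static size_t slots_used;

static unsigned char *slot_alloc(void)
{
    if (slots_used >= SLOT_COUNT)
        return NULL;    /* pool exhausted; caller must handle it */
    return pool[slots_used++];
}
```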
> For embedded systems I mostly go with "dynamic memory allocation of any kind is evil" and that solves a lot of issues already.
Yeah, bare metal systems often don't allocate at all. Although one sin they often commit is using the same buffer for multiple purposes. What could go wrong...
Perhaps even more common is allocating a buffer on stack and writing past bounds somehow. Also DMA to/from stack is usually not a great idea...
The above things sound dumb, but they can easily happen when you build your abstraction layers and use them carelessly.
> DMA to/from stack
wait what oh my god
That only eliminates a certain case of bugs. There are still plenty of foot-shotguns available - memcpy/memset, strlen, gets/puts, printf, any file IO, networking calls, etc.
This is my view as well, from the same industry. However, the quality of the tools available in C to deal with its issues far exceed those in any other language. I would love to drop C from all my systems, but the alternatives simply aren't there.
The alternatives were there before UNIX took over server room and workstation market.
Just imagine how many millions the IT industry and PhD research have spent developing solutions that would improve C's safety, many of them largely ignored by most C developers.
> The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests.
That's also true of all the other languages.
Well, I can say the same about Python, Erlang, Lua, in addition to C and C++. I believe C is not worse than these languages, only that C requires different (sometimes very different) skills and discipline.
I'm absolutely sure same skill level programmer will create less defects in Python, Erlang and Lua than in C. You really have to try to overwrite memory in those languages.
Of course you can shoot yourself in the foot with stuff like metatables in Lua and Python metaclasses and whatnot. Then again, you should see some of the C macro messes around...
Anyway, I don't like it when people defend C with that age-old argument that it just requires a clever, disciplined programmer who never makes mistakes. Either such programmers don't exist or they're very rare.
> I'm absolutely sure same skill level programmer will create less defects in Python, Erlang and Lua than in C.
Fewer defects, or just different (arguably less severe) defects? It's great that you're sure, but evidence would be even better.
Ok, that's a fair point. I don't have the evidence for that.
Scripting languages do have their pitfalls. Lua and python can have type mismatches and even typos causing misbehavior, things that usually aren't issues with C.
However, you do need significantly less code than in C.
Python, Erlang, Lua = Logic Errors
C and C++ = Logic Errors + Memory Corruption + UB
From this point of view,
Σ Logic Errors < Σ (Logic Errors + Memory Corruption + UB)
hmmm... In my experience, I have had far fewer logic errors in C++ than in Python or JS because I tend to encode the domain logic into the types as much as possible, so that I can piggyback on the compiler.
And how many memory corruption and UB errors did your Python and JS code have?
none, but I hardly ever encounter them in C++ too since I always develop in debug mode with sanitizers and debug std containers, so they blow up immediately. In C that's another story...
So you are part of the 1% audience crowd from CppCon, developing software alone in C++ without any third party binary libs. :)
It's probably a reference to https://en.m.wikipedia.org/wiki/Considered_harmful
I agree, it's very click-baity. C is actually great and is only really dangerous because it gives the programmer so much control.
I'm a C coder first and foremost and I strongly disagree with this mentality (even though I know it's extremely pervasive in our circles). "Footguns don't make bugs, coders do" is technically true but if we could keep the footguns at a minimum and only get them out of the locker when truly necessary instead of having them spread all over the place all the time I'm sure it wouldn't hurt.
C is a very useful language and one you basically have to know if you're interested in low level software but it's very, very far from flawless.
If you look at many high profile software vulnerabilities of late (heartbleed, goto fail, etc...) many can be traced to the lack of safety and/or bad ergonomics of the C language.
We need to grow up as an industry and accept that using a seatbelt doesn't mean that you're a bad driver. Shit happens.
> C is actually great and is only really dangerous because it gives the programmer so much control.
This doesn't actually refute the assertion that C is dangerous :)
Control and increased safety are not mutually exclusive. I'll take safe-by-default, unsafe-when-asked any day. It's not 1972 anymore.
"Programmers using C are considered dangerous"
C actually gives one rather limited control over modern hardware with its memory hierarchies and superscalar CPUs. Programming language research has also moved on a lot since the 70's, which is why we should be considering less dangerous languages (e.g. better type systems and less undefined behaviour). Languages like ATS and Rust also support explicit memory management, whilst being a whole lot safer.
C alone doesn't provide the control directly, but you as a programmer can absolutely leverage C to take control of the memory hierarchies by controlling your data access patterns. IOW, high locality of reference.
Good C-compilers will most of the time take care of the superscalar CPU friendliness. When they don't, you can always drop down to the assembler level, and it'll mesh well with C.
High locality of reference can be achieved in any language that supports unboxed types; it doesn't require C (even a very high-level language like Haskell has support for this). But this is a long way from having complete control over how each memory hierarchy is used.
Likewise most static languages defer to the compiler for CPU-specific performance optimisations and will permit foreign native calls into C or ASM where necessary. So I don't see how this is an argument in C's favour.
> High-locality of reference can be achieved in any language that supports unboxed types, it doesn't require C (even a very high-level language like Haskell supports this).
You often also need correct alignment. Cache-line or page. Your unboxed access across two pages can cause two TLB misses, L1 misses etc. Not to mention two page faults.
Sometimes you need to ensure two (or more) buffers are NOT aligned in a particular way to avoid interfering with CPU caching mechanisms.
The only support C is giving you for this is that it has sized unboxed types (and raw pointer access). Even then, you'd have to trust the compiler and take measurements to be sure.
That's not true, since C11 we have:

    #include <stdlib.h>
    void *aligned_alloc(size_t alignment, size_t size);

which works like malloc() but lets you specify the required alignment.

Could I not just call such a function from my alternative language?
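A quick usage sketch of aligned_alloc (and yes, it's equally callable from any language with a C FFI); note that C11 requires size to be an integral multiple of alignment:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a 4096-byte block aligned to a 64-byte cache line.
   4096 is a multiple of 64, as C11 requires for aligned_alloc. */
static void *cacheline_block(void)
{
    return aligned_alloc(64, 4096);
}
```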
Only for sizes <4KB
That's not true. You can also control data alignment in various ways. For example, you don't need to write to the beginning of an allocated buffer, but skip to the point where low order address bits are what you want.
Something like this for example:

    char *aligned_buf;
    char *buf;
    size_t max_align_offset = (1 << align) - 1;
    buf = malloc(length_needed + max_align_offset);
    aligned_buf = (char *)(((uintptr_t)buf + max_align_offset)
                           & ~(uintptr_t)max_align_offset);

In the example, if align==8, you have 256 byte alignment. If it's 12, 4kB alignment.

Good luck ensuring that doesn't trigger UB across all target architectures and compilers being used.
I agree, but that's besides the point.
It was just an example to show one way how C can control alignment.
This example looks like it should be turned into a malloc variant, i.e. needs only to be written once. Raw pointer access is also available in other languages of course, albeit it is usually made much more difficult.
Even in the 70's there was NEWP, PL/I, PL/S, PL/8, Concurrent Pascal, Mesa, BLISS, Modula-2, ....
C wins them all in implicit conversions and opportunities for memory corruption.
Their major sin was to be tied to commercial OSes, instead of one with source code available for a symbolic price to universities.
Are you suggesting that other languages provide more control over modern hardware?
Yes. Currently access to modern hardware features is either via cumbersome APIs (e.g. NUMA, AVX intrinsics), handled via the OS (e.g. paging, scheduling), or handled via the hardware itself (cache memory hierarchy). The problem will get worse as modern CPUs and machines continue to diverge from those originally targeted by C in the 1970s.
Hopefully the Zig [1] language will become a better alternative to C in upcoming years. Not talking about higher level code, where Rust or Go can be a better choice.
No language can become an alternative to C in the context of UNIX like OS because no one is going to re-write them from scratch, given their symbiotic nature.
Even if the complete userspace of Aix, HP-UX, *BSD, GNU/Linux, OS X, iOS, Solaris, .... gets re-written in something else, there will always be the kernel written in C.
Hence why improving C's lack of safety is so important to get a proper IT stack.
The problem with Zig is that they changed almost everything. I think there's a high risk they introduced new design problems that we won't know about fully until Zig has been used in anger for 10 years.
I've always felt that C is near the sweet spot. I'd rather see a minimal change to C that broke backwards compatibility (because it has to) and fixed the top ten simple problems.
Why don't they use valgrind?
The kernel has CONFIG_HAVE_DEBUG_KMEMLEAK.