C considered dangerous
> He asked: why is there no argument to memcpy() to specify the maximum destination length?
I'm confused by this. The third argument provides the destination length, so what good would a "maximum destination length" do? I guess he must mean that because the length is often computed, you'd need a fourth argument to ensure the length isn't greater than some sane upper bound. But you can easily fix that using an if statement around the memcpy.
Perhaps because the memory buffers might be of different size.
Maybe a memcpy_oobp (out of bounds protection) signature could be:

    memcpy_oobp(void *dst, size_t dst_size, void *src, size_t src_size);

Then again, I guess you could just as well do:

    memcpy(dst, src, min(dst_size, src_size));

But having to explicitly specify both destination and source sizes might have prevented a lot of buffer overwrite bugs.

> But having to explicitly specify both destination and source sizes might have prevented a lot of buffer overwrite bugs.
A good way to prevent this is to have a buffer abstraction, where the size is a property of the type, e.g.,

    typedef struct {
        size_t bytes_used;
        size_t capacity;
        void *data;
    } buf_t;

    int buf_init(buf_t *buf);
    void buf_cleanup(buf_t *buf);
    void buf_copy(buf_t *dst, buf_t *src);
    /* ... */

Of course, it doesn't prevent people from using memcpy directly.

I guess so. One of the LWN comments mentions a Microsoft function memcpy_s defined as:

    memcpy_s(void *dest, size_t destSize, const void *src, size_t count);

which is effectively equivalent to your memcpy_oobp function. However the Microsoft function also returns an error code which must be checked (because count might be larger than destSize), thus providing another way for the programmer to screw up. I'm not sure if this is better or worse than just copying the min() as in your second example. It probably depends on the situation.
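As a sketch of the buffer abstraction above, a plausible buf_copy could clamp to the destination's capacity. The clamping semantics here are an assumption, not something the original comment specified:

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    size_t bytes_used;
    size_t capacity;
    void *data;
} buf_t;

/* Assumed semantics: copy at most dst->capacity bytes and record
   how many bytes were actually stored. */
static void buf_copy(buf_t *dst, const buf_t *src)
{
    size_t n = src->bytes_used < dst->capacity ? src->bytes_used
                                               : dst->capacity;
    memcpy(dst->data, src->data, n);
    dst->bytes_used = n;
}
```

Whether silent truncation like this is acceptable depends on the situation, which is exactly the min()-versus-error-code debate above.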
Using min() seems like it could be incredibly dangerous as an "implicit" behavior, not to mention surprising.
I'd wager it'd be much better to just specify that abort() gets called in the "overflow" case. (Given that overflow is basically never what you want anyway.)
Yeah, it'll crash but at least it won't be surprising/undefined behavior.
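A minimal sketch of that abort-on-overflow idea (memcpy_checked is a hypothetical name, not an existing function):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked copy: refuses to overflow the destination.
   Aborting on overflow is deliberate -- overflow is basically never
   what the caller intended. */
static void memcpy_checked(void *dst, size_t dst_size,
                           const void *src, size_t count)
{
    if (count > dst_size)
        abort();
    memcpy(dst, src, count);
}
```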
For extra fun, the Microsoft implementation of memcpy_s returns an error instead of crashing if either of the pointers is NULL (thankfully doesn't apply if the copy size is 0). There's a reason I don't like writing software for Windows ...
Just use memcpy_s. This has the destbuf size argument. It's even in C11, but you need the safeclib or MSVC, as no libc cares about the safety annex.
Thankfully, compiler warnings and static analyzers have become much better in recent years. For instance, gcc can now warn about a missing 'break;' mentioned in the article (you need to add a special comment like '/* fall through */' if it's intentional). Also, clang-tidy is getting better with each release. I highly recommend using it, although the initial configuration will take some time, depending on the code base.
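For illustration, this is the kind of switch gcc's -Wimplicit-fallthrough warns about, with the comment that silences it (the counters and function name are made up):

```c
/* Sketch: 'a' counts as both an a and a letter. With
   -Wimplicit-fallthrough enabled, gcc warns about the missing break
   unless the fall through is marked as intentional. */
static int letters;
static int a_count;

static void classify(char c)
{
    switch (c) {
    case 'a':
        a_count++;
        /* fall through */
    case 'b':
        letters++;
        break;
    default:
        break;
    }
}
```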
Alas! strlcpy and strlcat are still not present in the glibc, despite numerous attempts, mainly for religious reasons (ie. "BSD sucks").
And yes, having something like "if (strlcat(buffer, src, sizeof(buffer)) >= sizeof(buffer)) { abort(); }" is much better than a buffer overrun. But security does not always seem to be a real concern, compared to politics.
C is dangerous partly because assembly language is dangerous. We will always need some layer on top of assembly that is mostly unchecked and reflects back to how cpu instructions work. This is probably something we must live with until we have processors with the notion of type checking.
C is dangerous partly because of swaths of undefined behaviour and loose typing. Eliminating much of the undefined behaviour, either by defining the behaviour or forcing the compiler to refuse to compile it, could be of some help. There are still classes of undefined behaviour that cannot be worked around, but narrowing those down to a minimal set would make them easier to deal with. Strong typing would help build programs that won't compile unless they are correct, at least in terms of the types of values.
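As one concrete example of working around undefined behaviour rather than triggering it: signed overflow is UB, so the check has to happen before the addition. A sketch:

```c
#include <limits.h>

/* Returns nonzero if a + b would overflow int. The test itself uses
   only well-defined arithmetic, because evaluating a + b on overflow
   would already be undefined behaviour -- the compiler is free to
   assume it never happens. */
static int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    else
        return a < INT_MIN - b;
}
```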
C is dangerous partly because of the stupid standard library, which isn't necessarily a core language problem since other libraries can be used. The standard library should be replaced with any of the sane libraries that different projects have written for themselves to avoid using libc. It's perfectly possible not to have minefields like memcpy() or strcpy(), or functions like strtok() which introduce nice invisible access to internal static storage (fixed by a re-entrant variant like strtok_r()), or strtol() which requires multiple checks to determine how it actually failed. The problem here is that if there are X standards, adding one to replace them all will make it X+1 standards.
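To illustrate the strtol() point: distinguishing its failure modes takes several separate checks. A sketch of a stricter wrapper (parse_long is a made-up name):

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a base-10 long, rejecting empty input, range errors, and
   trailing garbage -- three separate checks strtol leaves to you. */
static int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)
        return -1;          /* no digits consumed */
    if (errno == ERANGE)
        return -2;          /* out of range for long */
    if (*end != '\0')
        return -3;          /* trailing garbage */
    *out = v;
    return 0;
}
```

Note that errno must be cleared before the call, since strtol only sets it on failure; forgetting that is itself a classic bug.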
Yet, good programmers already avoid 99% of the problems by manually policing themselves. For them, C is simple, productive, and manageable in a lot more cases and domains than it is for the less experienced programmers.
Ironically other systems programming languages developed outside AT&T walls since 1961 did not suffer from the majority of C's pain points regarding memory corruption.
I really wish Bell Labs had been allowed to sell UNIX.
Terrible title. It's not remotely news that C is dangerous. This talk seems to be about ways of mitigating the dangers. Why not call it "Mitigating the dangers of C" or something else that is less of a tired cliche?
Because "Making C Less Dangerous" is the actual title of the talk, and "Towards less dangerous C" is part of the agenda?
The title is completely misleading.
I write a ton of C and I completely agree with the title. With 20+ years of experience.
Kernel drivers and embedded system bare metal firmware.
The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests. And that something can lead to disastrous crashes and security vulnerabilities.
Don't you think that with the tools we have now it's easier to control the quality of code produced (Clang memory sanitizers and so on)? I feel more at ease to ship C code today after instrumenting it than a few years ago...
Tooling absolutely helps to reduce defects. That's why you use them.
That said, sometimes I'm shocked what kind of disasters get past the analyzers.
Stakes are higher than ever. It's not just about functional correctness and avoiding crashes anymore. Your code needs to be secure against outside world malicious actions. Getting rid of counterintuitive security vulnerabilities is very, very hard.
I would say that is why security conscious developers use them.
Sadly we are a very, very tiny percentage, as proven by Herb Sutter's question to the audience at CppCon (1% of the audience answered positively), and the frequent CVE updates.
Not really, as is proven almost on a daily basis.
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=memory+corr...
How do you know that developers working on those used tools such as the Clang Memory Sanitizer?
Because many on that list are well known FOSS projects that supposedly have such processes in place, including manual review before accepting patches into mainline, like the Linux kernel being discussed here.
For embedded systems I mostly go with "dynamic memory allocation of any kind is evil" and that solves a lot of issues already.
You can still overwrite memory but it suddenly became much less likely.
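A sketch of what that "no dynamic allocation" style often looks like in practice, using a fixed static pool (the names and sizes here are made up):

```c
#include <stddef.h>

/* Hypothetical fixed-capacity pool: all storage is static, so there
   is no heap, no fragmentation, and exhaustion is an explicit,
   testable condition rather than a runtime surprise. */
#define SLOT_COUNT 16
#define SLOT_BYTES 64

static unsigned char pool[SLOT_COUNT][SLOT_BYTES];
static size_t slots_used;

static unsigned char *slot_alloc(void)
{
    if (slots_used >= SLOT_COUNT)
        return NULL;    /* pool exhausted; caller must handle it */
    return pool[slots_used++];
}
```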
> For embedded systems I mostly go with "dynamic memory allocation of any kind is evil" and that solves a lot of issues already.
Yeah, bare metal systems often don't allocate at all. Although one sin they often commit is using the same buffer for multiple purposes. What could go wrong...
Perhaps even more common is allocating a buffer on stack and writing past bounds somehow. Also DMA to/from stack is usually not a great idea...
The above things sound dumb, but they can easily happen when you build your abstraction layers and use them carelessly.
> DMA to/from stack
wait what oh my god
That only eliminates a certain case of bugs. There are still plenty of foot-shotguns available - memcpy/memset, strlen, gets/puts, printf, any file IO, networking calls, etc.
This is my view as well, from the same industry. However, the quality of the tools available in C to deal with its issues far exceed those in any other language. I would love to drop C from all my systems, but the alternatives simply aren't there.
The alternatives were there before UNIX took over server room and workstation market.
Just imagine how many millions the IT industry and PhD research have spent developing solutions that would improve C's safety, many of them largely ignored by most C developers.
> The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests.
That's also true of all the other languages.
Well, I can say the same about Python, Erlang, Lua, in addition to C and C++. I believe C is not worse than these languages, only that C requires different (sometimes very different) skills and discipline.
I'm absolutely sure same skill level programmer will create less defects in Python, Erlang and Lua than in C. You really have to try to overwrite memory in those languages.
Of course you can shoot yourself in the foot with stuff like metatables in Lua and Python metaclasses and whatnot. Then again, you should see some of the C macro messes around...
Anyway, I don't like it when people defend C with that age-old argument that it just requires a clever, disciplined programmer who never makes mistakes. Either such programmers don't exist or they're very rare.
> I'm absolutely sure same skill level programmer will create less defects in Python, Erlang and Lua than in C.
Fewer defects, or just different (arguably less severe) defects? It's great that you're sure, but evidence would be even better.
Ok, that's a fair point. I don't have the evidence for that.
Scripting languages do have their pitfalls. Lua and python can have type mismatches and even typos causing misbehavior, things that usually aren't issues with C.
However, you do need significantly less code than in C.
Python, Erlang, Lua = Logic Errors
C and C++ = Logic Errors + Memory Corruption + UB
From this point of view,
Σ Logic Errors < Σ (Logic Errors + Memory Corruption + UB)
hmmm... In my experience, I have had far fewer logic errors in C++ than in Python or JS because I tend to encode the domain logic into the types as much as possible, so that I can piggyback on the compiler.
And how many memory corruption and UB errors did your Python and JS code have?
none, but I hardly ever encounter them in C++ too since I always develop in debug mode with sanitizers and debug std containers, so they blow up immediately. In C that's another story...
So you are part of the 1% audience crowd from CppCon, developing software alone in C++ without any third party binary libs. :)
It's probably a reference to https://en.m.wikipedia.org/wiki/Considered_harmful
I agree, it's very click-baity. C is actually great and is only really dangerous because it gives the programmer so much control.
I'm a C coder first and foremost and I strongly disagree with this mentality (even though I know it's extremely pervasive in our circles). "Footguns don't make bugs, coders do" is technically true but if we could keep the footguns at a minimum and only get them out of the locker when truly necessary instead of having them spread all over the place all the time I'm sure it wouldn't hurt.
C is a very useful language and one you basically have to know if you're interested in low level software but it's very, very far from flawless.
If you look at many high profile software vulnerabilities of late (heartbleed, goto fail, etc...) many can be traced to the lack of safety and/or bad ergonomics of the C language.
We need to grow up as an industry and accept that using a seatbelt doesn't mean that you're a bad driver. Shit happens.
> C is actually great and is only really dangerous because it gives the programmer so much control.
This doesn't actually refute the assertion that C is dangerous :)
Control and increased safety are not mutually exclusive. I'll take safe-by-default, unsafe-when-asked any day. It's not 1972 anymore.
"Programmers using C are considered dangerous"
C actually gives one rather limited control over modern hardware with its memory hierarchies and superscalar CPUs. Programming language research has also moved on a lot since the 70's, which is why we should be considering less dangerous languages (e.g. better type systems and less undefined behaviour). Languages like ATS and Rust also support explicit memory management, whilst being a whole lot safer.
C alone doesn't provide the control directly, but you as a programmer can absolutely leverage C to take control of the memory hierarchies by controlling your data access patterns. IOW, high locality of reference.
Good C-compilers will most of the time take care of the superscalar CPU friendliness. When they don't, you can always drop down to the assembler level, and it'll mesh well with C.
High locality of reference can be achieved in any language that supports unboxed types; it doesn't require C (even a very high-level language like Haskell has support for this). But this is a long way from having complete control over how each memory hierarchy is used.
Likewise most static languages defer to the compiler for CPU-specific performance optimisations and will permit foreign native calls into C or ASM where necessary. So I don't see how this is an argument in C's favour.
> High-locality of reference can be achieved in any language that supports unboxed types, it doesn't require C (even a very high-level language like Haskell supports this).
You often also need correct alignment. Cache-line or page. Your unboxed access across two pages can cause two TLB misses, L1 misses etc. Not to mention two page faults.
Sometimes you need to ensure two (or more) buffers are NOT aligned in a particular way to avoid interfering with CPU caching mechanisms.
The only support C is giving you for this is that it has sized unboxed types (and raw pointer access). Even then, you'd have to trust the compiler and take measurements to be sure.
That's not true, since C11 we have:

    #include <stdlib.h>
    void *aligned_alloc(size_t alignment, size_t size);

which works like malloc() but lets you specify the required alignment.

Could I not just call such a function from my alternative language?
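A quick usage sketch of aligned_alloc (and yes, it's equally callable from any language with a C FFI); note that C11 requires size to be an integral multiple of alignment:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a 4096-byte block aligned to a 64-byte cache line.
   4096 is a multiple of 64, as C11 requires for aligned_alloc. */
static void *cacheline_block(void)
{
    return aligned_alloc(64, 4096);
}
```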
Only for sizes <4KB
That's not true. You can also control data alignment in various ways. For example, you don't need to write to the beginning of an allocated buffer, but skip to the point where low order address bits are what you want.
Something like this for example:

    char *aligned_buf;
    char *buf;
    size_t max_align_offset = (1 << align) - 1;
    buf = malloc(length_needed + max_align_offset);
    aligned_buf = (char *)(((uintptr_t)buf + max_align_offset)
                           & ~(uintptr_t)max_align_offset);

In the example, if align==8, you have 256 byte alignment. If it's 12, 4kB alignment.

Good luck ensuring that doesn't trigger UB across all target architectures and compilers being used.
I agree, but that's besides the point.
It was just an example to show one way how C can control alignment.
This example looks like it should be turned into a malloc variant, i.e. needs only to be written once. Raw pointer access is also available in other languages of course, albeit it is usually made much more difficult.
Even in the 70's there was NEWP, PL/I, PL/S, PL/8, Concurrent Pascal, Mesa, BLISS, Modula-2, ....
C wins them all in implicit conversions and opportunities for memory corruption.
Their major sin was to be tied to commercial OSes, instead of one with source code available for a symbolic price to universities.
Are you suggesting that other languages provide more control over modern hardware?
Yes. Currently access to modern hardware features is either via cumbersome APIs (e.g. NUMA, AVX intrinsics), handled via the OS (e.g. paging, scheduling), or handled via the hardware itself (cache memory hierarchy). The problem will get worse as modern CPUs and machines continue to diverge from those originally targeted by C in the 1970s.
Hopefully the Zig [1] language will become a better alternative to C in upcoming years. Not talking about higher level code, where Rust or Go can be a better choice.
No language can become an alternative to C in the context of UNIX like OS because no one is going to re-write them from scratch, given their symbiotic nature.
Even if the complete userspace of Aix, HP-UX, *BSD, GNU/Linux, OS X, iOS, Solaris, .... gets re-written in something else, there will always be the kernel written in C.
Hence why improving C's lack of safety is so important to get a proper IT stack.
The problem with Zig is that they changed almost everything. I think there's a high risk they introduced new design problems that we won't know about fully until Zig has been used in anger for 10 years.
I've always felt that C is near the sweet spot. I'd rather see a minimal change to C that broke backwards compatibility (because it has to) and fixed the top ten simple problems.
Why don't they use valgrind?
The kernel has CONFIG_HAVE_DEBUG_KMEMLEAK.