Cdecl – Turns English phrases into C declarations

150 points by dammitcoetzee 7 years ago · 55 comments


Here's an easy way to understand how these things work: in C, the type of a pointer/function/array mess is declared by how it's used. For a declaration like "int ( * ( * foo)(void))[3]", you can read it as "for a variable foo, after computing the expression ( * ( * foo)(void))[3], the result is an int."

So one way to read C "gibberish" is to ignore the type at the beginning and parse the rest as an expression like a normal parse tree. First we take foo. Then we dereference it (so foo is a pointer). Next we call it as a function with no arguments (so foo is a pointer to a function that takes no arguments). Next, we dereference it again. Then we index into the result as an array. Finally, we reach the end, so we look at the declared type and find that this type is an int. So foo is a pointer to a function that takes no arguments and returns a pointer to an array of 3 ints.
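
A minimal sketch of that reading, assuming a hypothetical helper `get_triple` and backing array `triple` (neither is from the comment). The expression in `read_element` has the same shape as the declaration of `foo`:

```c
#include <stddef.h>

/* Hypothetical storage for the example: an array of 3 ints. */
static int triple[3] = {10, 20, 30};

/* A function that takes no arguments and returns a pointer to an array of 3 ints. */
static int (*get_triple(void))[3]
{
    return &triple;
}

/* The declaration from the comment: foo is a pointer to such a function. */
static int (*(*foo)(void))[3] = get_triple;

/* Use mirrors the declaration: dereference foo, call it, dereference the
   result, index it, and what comes out is an int. */
static int read_element(size_t i)
{
    return (*(*foo)())[i];
}
```

Calling `read_element(1)` from a `main` walks exactly the steps listed above: pointer, call, pointer, index, int.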

You can also use this to go backwards. What's the syntax for a function that takes an integer argument and returns a pointer to an array of function pointers taking no arguments and returning integers? Well, we want to take foo, call it, dereference it, then index into an array, then dereference it again, then call it again, then get an int. Or int (* (* (foo)(int))[5])(void).
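
Going backwards can be checked the same way. In this sketch the names `one`, `two`, `table` and `call_entry` are made up for illustration, and the redundant parentheses around `foo` are dropped, which does not change the type:

```c
/* Two hypothetical functions taking no arguments and returning int. */
static int one(void) { return 1; }
static int two(void) { return 2; }

/* An array of 5 pointers to such functions. */
static int (*table[5])(void) = { one, two, one, two, one };

/* The declaration built above: foo takes an int and returns a pointer to
   an array of 5 pointers to functions (void) returning int.
   The int argument is ignored in this sketch. */
static int (*(*foo(int unused))[5])(void)
{
    (void)unused;
    return &table;
}

/* Use mirrors the declaration: call foo, dereference, index, dereference,
   call, and the result is an int. */
static int call_entry(int i)
{
    return (*(*foo(0))[i])();
}
```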

exitcode00 7 years ago

How about just using Ada? It has the added bonus of not being a gimmick (depending on who you ask I suppose ; )
Ada: type Ret_Typ is array (1..3) of Integer; Foo : access function return not null access Ret_Typ := null;
C: int (*(*foo)(const void *))[3]
Cdecl: declare foo as pointer to function (pointer to const void) returning pointer to array 3 of int
- tomjakubowski 7 years ago
  In the interest of furthering annoying language smuggery, the rough Rust equivalent:
  foo: fn() -> Box<[i32; 3]>
  Alternately, if the pointer is into static memory and not something allocated on the heap:
  foo: fn() -> &'static [i32; 3];
  That's pretty nice to look at and not too hard to read. In my opinion, for commonly used syntax (like fn decls), some well-chosen punctuation marks (', ->, :, in this case) are often a boon to readability compared to keywords. So I think the Rust syntax in this case is nicer than Ada's.
  But in any case, while complicated C declarations may be uglier and take more effort to read than those in other languages, they are at least tractable once you learn the trick of "declaration follows use" and working backwards as GP describes.
  Separately, though, what do you mean by your "gimmick" comment?
  - exitcode00 7 years ago
    
    > Separately, though, what do you mean by your "gimmick" comment?
    Just meaning the website CDecl - it's a neat tool to make C readable in English, but Ada is a real language used to make planes fly etc. Many people have contempt for it though, which is why I made the joke xD
    Interesting that you can note which type of memory an anonymous type comes from in Rust. I suppose it's for optimization purposes? Doesn't seem that helpful from a pure typing perspective.
    As an aside I doubt a layman would be able to understand that notation in Rust, whereas my girlfriend might be able to grasp or read Ada code or the output of CDecl.
  - masklinn 7 years ago
    
    > Alternately, if the pointer is into static memory and not something allocated on the heap
    There's really no need to box a 12-byte array in the first place.
- jcranmer 7 years ago
  
  I'm not defending C's syntax as sane here, because it's not. It boils down to two problems:
  1. The syntax isn't "type id, id, id;", it's "type expr, expr, expr;" The trend for C-style languages has been to move to the former type syntax, so C/C++ is the anomaly here.
  2. Pointer declarators show up to the left of the name while function and array declarators show up to the right of the name. This means you can't figure out the type by scanning in one direction. Contrast this with LLVM, where function arguments and pointer types both go to the right of the leaf type (while arrays are infix), or Rust, where they both live on the left of the leaf type.
  - exitcode00 7 years ago
    
    Reading from both ends is maddening for seasoned devs, but in a general sense most languages are symbol salads these days for arbitrary, subjective reasons. The decisions made during the C development to squeeze the juice out of 60 character wide terminals haunt us to this day... like case sensitivity
    
    kouteiheika 7 years ago
    
    > like case sensitivity
    What's wrong with case sensitivity?
    
    exitcode00 7 years ago
    
    Sure the current doctrine of programming says it's good, but let's take a step back. First of all it's counterintuitive compared to writing English, and it damages readability, so you may have naming conflicts without realizing it or the compiler being able to warn you.
    Also, in addition to remembering what a function is called you have to also remember its casing which different libraries may want to be phrased differently (like C libraries versus C++ libraries).
    Bypassing the very real issues it may cause in the design of your software it also may lead to silly library cruft (see Java's Color class) - how many ways can you spell blue?
    https://docs.oracle.com/javase/7/docs/api/java/awt/Color.htm...
    
    naniwaduni 7 years ago
    
    Case-sensitivity is, more or less, a default: you have different strings that have distinct encodings, so you treat them as different identifiers.
    The alternative to case-sensitivity requires your compiler to know about case, and, more importantly, how to do case-folding. At that point, you can either choose to (a) restrict identifiers to some (probably ASCII) limited subset of characters, (b) only make some subset of acceptable characters (reliably) case-insensitive, (c) require every compiler to have tables for case-folding.
    That's before we get into the locale-dependence of case-folding, which makes the letter "i" unreliable.
    And you still have to distinguish Color and Colour.
    
    exitcode00 7 years ago
    
    It is actually possible to define all of those things in a compiler standard and force them to do it certain ways or document possible "implementation permissions", but, again, Unicode identifiers are rare in practice because programming is designed for English speakers, like it or not.
    http://www.ada-auth.org/standards/2xaarm/html/AA-2-3.html
    > And you still have to distinguish Color and Colour.
    Still have to do that with case sensitive langs - what's the point here?
    
    jcranmer 7 years ago
    
    Case insensitivity sounds good except it quickly runs afoul of "language isn't so simple."
    If I define a variable as "groß", does "GROSS" or "GROẞ" match it (or both, which probably implies "gross" would match as well)? What about "ê" and "E"? Or the infamous i/I/İ/ı debacle, which could make matching "insane" to "INSANE" locale-dependent? How do you define case-insensitivity in a way that makes sense?
    
    exitcode00 7 years ago
    
    These are solved problems though and unicode identifiers are rare in practice...
    See Normalization Form KC and Clause 21 of ISO/IEC 10646:2017.
    "Normalization forms are the mechanisms allowing the selection of a unique coded representation among alternative; but equivalent coded text representations of the same text. Normalization forms for use with ISO/IEC 10646 are specified in the Unicode Standard UAX#15..." yada yada
    
    jcranmer 7 years ago
    
    Unicode normalization doesn't actually solve a single problem I mentioned. All of the listed characters are equal to themselves in both NFC and NFKC.
    Also Unicode identifiers aren't rare in terms of language support. Most of the popular languages support them--C/C++, C#, Java, PHP, Python, Perl, Swift, Go, Rust, Ruby, JavaScript, even Ada. It's actually difficult to find a popular language that prohibits Unicode identifiers entirely (MATLAB does, not sure about Visual Basic).
userbinator 7 years ago

The other important part is to remember that operator precedence in declarations is the same as in ordinary expressions, i.e. array subscripting and function call have higher precedence than pointer dereference.
It is notable that the chapter in K&R which discusses declarations also presents a partial version of the cdecl program and one of the exercises is for the reader to complete it --- really helping to dispel the notion that compilers are mysterious magic. In my experience, it's rare for an introductory book on a programming language to also contain such "hints" on how it could be implemented.
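
A small sketch of the precedence point, with made-up names (`values`, `precedence_check`). Without parentheses the subscript binds first, so the two declarations below are different types:

```c
#include <stddef.h>

static int values[3] = {1, 2, 3};

/* Returns 1 if sizes and values match what the precedence rules predict. */
static int precedence_check(void)
{
    /* [] binds tighter than *: an array of 3 pointers to int. */
    int *array_of_pointers[3] = { &values[0], &values[1], &values[2] };

    /* Parentheses make the * apply first: a pointer to an array of 3 ints. */
    int (*pointer_to_array)[3] = &values;

    return sizeof array_of_pointers == 3 * sizeof(int *)
        && sizeof pointer_to_array == sizeof(int (*)[3])
        && *array_of_pointers[2] == 3      /* index first, then dereference */
        && (*pointer_to_array)[2] == 3;    /* dereference first, then index */
}
```

The same rule explains why `int *f(void)` is a function returning a pointer, while `int (*f)(void)` is a pointer to a function.
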
ambrop7 7 years ago
However the declaration-mirrors-use idea does not apply to function arguments. If you have "void (* f)(int * arg)", you would not use it like "(* f)(* arg)" unless your arg is actually "int * * ".
This could be fixed. Instead of "void (* f)(int * x)" we would write "void (* f)(x &int)". Now it makes sense, the declaration says that we could call the function if we pass the address of some int y, as if by "(* f)(&y)". The specific syntax "x &int" says that the address of an int is x, the same way as "int * x" says that dereferenced x is an int.
What about "void (* f)(int x[10])" (pretending arrays could actually be passed)? With the pointer we relied on the existing opposite of the dereference operator, but there is nothing like that for arrays, that would make an array out of an element. Let's look to Python for inspiration, where the expression "[y]* N" will make a list of N elements with the value y. This gives us: "void (* f)(x [int]* N)". See how the declaration tells us that we could call the function using "(* f)([y]* N)" for some int y.
There's one more we need to solve: "void (* f)(void (* g)(int))". Since the parameter g of * f is a function pointer, we need to pass the address of a function, so clearly & will be involved. But we need a function to take the address of, and we don't have any available. Inspired by the C++ lambda syntax, let's invent function conjuration: "(Args) -> Ret" is an expression that conjures a function taking Args and returning Ret. Hence the solution: "void (* f)(g &(int) -> void)". It says that you could write "(* f)(&(int) -> void)", to call * f with the address of a conjured function taking an int and returning void.
We do need to be aware that the syntax for arguments in function conjuration expressions is the same as in top-level declarations. So we would need to rewrite "void (* f)(void (* g)(void (* h)(int * x)))" as "void (* f)(g &(void (* h)(x &int)) -> void)". So for each function pointer, its arguments must be declared in the other declaration mode.
Since this makes no sense at all, we have to conclude that the original C declaration syntax forms need to be deprecated and only the newly invented syntax forms should be used.
```
  x &int;   (int * x)
  x &&int;   (int * * x)
  f &(x &int) -> void;   (void (* f)(int * x))
  f &(x [int]* 10) -> void;   (void (* f)(int x[10]))
```
The new syntax can also be used for function declarations:
```
  main (argc int, argv [&char]*?) -> int
  {
      return 0;
  }
```
See how we've invented a different declaration syntax (some sort of dual of C's current syntax), that actually respects "declaration-mirrors-use" better than C does and makes much more sense to humans.
- watergatorman 7 years ago
  
  1) The use of the Python feature for arrays I find confusing as it is not orthogonal to the rest of your new and improved syntax for C.
  Everywhere else, you change C's declaration order of <declaration-specifier> <declarator>, in your new syntax to place the identifier of the declarator first, followed by any pointer ops, and lastly the type. You are changing the pointer op "" from a prefix that needed to be read right-to-left, after locating the identifier of the declarator, into a suffix "&" following the identifier, to be read left-to-right.
  I agree that your change to left-to-right declaration order is definitely more readable.
  2) But in your array syntax, borrowed from Python, the type is placed inside the array brackets, which used to hold the constant-expression denoting the array size. The array size is moved from within the brackets to be last, instead of the type being last, as in all your other syntax "rules". So, for arrays, the declaration syntax no longer reads simply left-to-right, since type is between declarator identifier and array size.
  Wouldn't this be clearer, to have the type last and the constant-expression remain inside the array brackets? C syntax: (void ( f)(int x[10]))
  use this instead for your new C syntax: f &(x [10] int) -> void;
  3) I have a similar problem with your function syntax:
  instead of:
  main (argc int, argv [&char]*?) -> int { return 0; }
  why not put the type last, so as to be consistent with all your other syntax?
  main (argc int, argv [] &char) -> int { return 0; }
  This is how the Go programming language does it, except for the preceding "func" reserved word and "string" in place of pointer to char: func main(argc int, argv [] string) int ...
  5) The biggest problem I have is with adding "C++ lambda syntax" to C, to solve the problem of passing a function as actual parameter argument. That would mean you have 2 styles of pointers, one as a prefix and one as a suffix to the declarator identifier. So you now have to read both right-to-left and left-to-right, which seems to cancel out the benefits of only reading declarations in left-to-right order!
  Would it be simpler, and preserve left-to-right declaration order, to provide a FunctionType as in the Go programming language? A parameter that is passed a function as argument is declared to have a FunctionType. Pointers to function are not apparently needed, at least not at the user level.
  6) Q: How do these proposed changes affect the parsing of the new C syntax? Current C syntax can be parsed with predictive, non-backtracking parsers, in linear-time. I don't want to use backtracking, GLR, or other complex methods, if they are avoidable. At least C can now be parsed with Yacc or Bison. (See A13 Grammar in K&R, "The C Programming Language" or Jacques-Henri Jourdan, François Pottier "A Simple, Possibly Correct LR Parser for C11")
  - ambrop7 7 years ago
    
    For the arrays, I agree "[10] int" is better.
    For functions, I think the -> syntax is the only thing that makes sense. It's just natural, first you need the arguments then you get the return value.
    > That would mean you have 2 styles of pointers, one as a prefix and one as a suffix to the declarator identifier. So you now have to read both right-to-left and left-to-right, which seems to cancel out the benefits of only reading declarations in left-to-right order!
    I'm not following. There are not two styles of pointers, a pointer is declared like "&type". Functions are declared like "(args) -> ret" which is read left-to-right (function taking such arguments and returning such value). A function pointer is simply a pointer to a function like "&(args) -> ret".
    > Q: How do these proposed changes affect the parsing of the new C syntax?
    I guess I should have added </sarcasm>? C would never adopt such a radical change. In any case, I don't see how it would be fundamentally more difficult to parse than the current declaration syntax. There would be problems disambiguating the two (what is "foo bar;" if foo and bar are both typedefs?). Maybe changing to require a colon after the name would make that simpler "fun: (arg: int) -> int".
    
    watergatorman 7 years ago
    
    By 2 styles of pointers used in functions, see below taken from your examples.
    By reading both left-to-right and right-to-left I mean: "star" pointer in front of the identifier read right-to-left with return function type on the left and ampersand pointer read left-to-right with return function type on the right of "->"
    Here are 2 of your function examples you gave:
    This example has both an outermost "void" function return type on the left and another to the right of "->" "void (* f)(g &(int) -> void)" Instead use the following to always read left-to-right and get rid of the "star" pointer: "f &(g &(int) -> void) -> void"
    Your next example has 2 "void" function return types on the left, and one "void" to the right of "->" "void (* f)(g &(void (* h)(x &int)) -> void)" Instead use the following to always read left-to-right: "f &(g &(h &(x &int) -> void) -> void) -> void"
    
    ambrop7 7 years ago
    
    I did say "original C declaration syntax forms need to be deprecated and only the newly invented syntax forms should be used". So the new invented syntax alone is complete (you can express anything with &, (args)->ret and [count]type).
    
    watergatorman 7 years ago
    
    Ok, I got it now. Thanks. My faulty understanding.
    I believe you have a clean, readable syntax for C declarations.
    The Go Programming Language's declaration syntax is very similar, except that "star" is retained and acts just as your "&"
    I don't have a complete list, but have been looking at classifying programming languages into one of 2 categories:
    Category 1 declaration syntax: Type identifier ;
    or
    Category 2 declaration syntax: identifier Type ; with perhaps a colon or other punctuation between the identifier and Type.
  - watergatorman 7 years ago
    
    I did something wrong and the asterisk or star, representing C's pointer op, has been dropped from my prior posting. I apologize, also for poor formatting.
Hex08 7 years ago

*An array of 4 ints
Great explanation though, it really helps to read things inside-out
- ramshorns 7 years ago
  
  Which part are you correcting?
  The declaration "int bar[3];" is an array of 3 ints, which are bar[0], bar[1] and bar[2]. Declaration mimics use but it's not exactly the same; in this case the size replaces the indices, which are all less than it.

ridiculous_fish 7 years ago

Hey, this is my site, first published 2009! This is the venerable cdecl enhanced with blocks support.

It used to be a shared host with a PHP script shelling out to the cdecl executable, written in K&R C. Now it's that same executable running on AWS Lambda.

Yes Lambda really will run arbitrary ELF binaries.

buboard 7 years ago

Didn't know what blocks are; it seems it's an Apple extension.
- saagarjha 7 years ago
  
  Yup, they’re an extension to C/Objective-C/C++ implemented in Clang: https://en.m.wikipedia.org/wiki/Blocks_(C_language_extension...
erroneousboat 7 years ago

Thanks for creating it, it really helps with learning C.
kitd 7 years ago

Nice work.
Could you run it as a preprocessor macro?

saagarjha 7 years ago

For all of its simplicity, the syntax for complex types in C is pretty horrible. Yes, I know the "inside out" rule, and can usually read these, but that doesn't make it any less bad.

TorKlingberg 7 years ago

I've been a professional C programmer for years, but I rarely find cdecl useful (command line or website). Not because complex C declarations are intuitive to me, but because cdecl fails on any unknown types. Real world C code is full of typedefs.

kitd 7 years ago

Could you not substitute in a known type, get the result and insert the unknown type back in afterwards?

nurettin 7 years ago

Back in early 2000s, we had bots on IRC doing this. My favorite technique was to pass the type to a template function, assign it to an integer and then parse the compile time error produced by gcc to extract the type.

Eli_P 7 years ago

If I recall correctly, this one came as an exercise in Knuth's book of C programming; the same chapter also explained C declarations and precedence.

userbinator 7 years ago

The K in K&R is for (Brian) Kernighan, not Knuth.
Knuth does not use C in his books.

pkaye 7 years ago

It's better to create a series of typedefs and build up the declaration. Most of the time you need those sub-typedefs anyway.
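
As a rough sketch of that approach, here is a type like "pointer to function (void) returning pointer to array 3 of int" built in steps. All the typedef and helper names (`int3`, `int3_getter`, `int3_getter_ptr`, `get_storage`) are invented for the example:

```c
/* Step 1: name the innermost piece, an array of 3 ints. */
typedef int int3[3];

/* Step 2: a function taking no arguments and returning a pointer to it. */
typedef int3 *int3_getter(void);

/* Step 3: a pointer to such a function. */
typedef int3_getter *int3_getter_ptr;

static int3 storage = {7, 8, 9};

static int3 *get_storage(void) { return &storage; }

/* Same type as: int (*(*foo)(void))[3] */
static int3_getter_ptr foo = get_storage;

static int second_element(void)
{
    /* This initialization compiles without a diagnostic only if both
       spellings name the same type. */
    int (*(*same_type)(void))[3] = foo;
    return (*same_type())[1];
}
```

Each typedef is readable on its own, and the intermediate names tend to be useful elsewhere, for instance as parameter types.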

nwmcsween 7 years ago

Extremely sparingly. Typedefs like in glib are a nightmare, and arbitrary typedefs like char -> char_t are just useless.

valerij 7 years ago

on topic of function pointers, is there a template to turn

  std::function<foo(bar, baz)>

into

  foo(*)(bar, baz)

?

bartbes 7 years ago

Sure. Here's one: https://godbolt.org/z/vWl4NE
- valerij 7 years ago
  
  huh. this was easier than expected

mey 7 years ago

Need this for bash and by proxy regex.

bewuethr 7 years ago

There is https://explainshell.com - not Bash specific, though.
drewsberry 7 years ago

There's https://regex101.com/ for regex, does an excellent job, can't really ask for more

SidiousL 7 years ago

The actual principle behind the C type declarations is "declaration follows use". Let me explain what this means. Take this declaration

   int *pi;

It means that when I dereference the variable pi, I get an int. This also explains why

   int *pi, i;

declares `pi` as a pointer to `int` and `i` as an `int`. From this point of view it makes sense stylistically to put * near the variable.

Declaration of array types is similar. For example,

   int arr[10];

means that when I take an element of `arr`, I obtain an `int`. Hence, `arr` is an array of ints.

Pointers to functions work the same way. For example,

   int (*f)(char, double);

means that if I dereference the variable `f` and I evaluate it on a `char` and on a `double`, then I get an `int`. Hence, the type of `f` is "pointer to function which takes as arguments a char and a double and returns an int".
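
Putting the three declarations side by side, as a sketch (the function `add_code` and the values are assumptions made up for the demo):

```c
/* Hypothetical function matching the signature int (char, double). */
static int add_code(char c, double d)
{
    return (int)c + (int)d;
}

static int demo_sum(void)
{
    int *pi, i;                 /* *pi is an int, and i is an int */
    int arr[10] = {0};          /* arr[k] is an int */
    int (*f)(char, double);     /* (*f)(c, d) is an int */

    i = 5;
    pi = &i;
    arr[3] = 4;
    f = add_code;

    /* Each expression below has the same shape as its declaration. */
    return *pi + arr[3] + (*f)(1, 2.0);
}
```

Here `demo_sum()` adds 5, 4 and 3, so every use site yields a plain int, exactly as the declarations say.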

skookumchuck 7 years ago

The way to make complex C declarations legible is to use typedefs for the subtypes (like function pointers).

Jerry2 7 years ago

Tried it on this gibberish but it complains about syntax:

((void(*)(void))0)();

poizan42 7 years ago

Besides not being a declaration it really is gibberish - calling a null pointer is undefined behaviour. I believe the correct way of calling a function at address zero is ((void(*)(void))(intptr_t)0)(); which is merely implementation defined.
C has the weird thing that a literal zero in a pointer context becomes a null-pointer which may not actually have the bit pattern 0. And when the optimizer sees a guaranteed null pointer it tends to optimize the whole branch away since entering that would be UB. So if you try calling address zero with "((void(*)(void))0)();" you might end up with the whole function optimized away as well as any function it gets inlined into.
tntn 7 years ago

That's not a declaration.

unnouinceput 7 years ago

tried: declare xxx as integer pointer to array of string equal to "mumu" and "kaka"

got: bad character '"'...apostrophe instead of double quote has the same result...well, I guess I expected too much

ComputerGuru 7 years ago

You are mixing type declarations and values. Foo equals bar is not a constraint that can be specified via the type system (generally speaking).
