Hacking the Python syntax: Ternary operator



Photo by Our Life in Pixels on Unsplash

In 5 lines of code

Raimi Karim

Introduction | Ternary operator | Alternate lambda syntax | No return keyword in function (coming soon)| List comprehension++ (coming soon)

Changelogs:
31 Dec 2022 — Use Medium’s new code block for syntax highlighting
5 Jan 2022 — Fix typos and improve clarity

Table of contents

  1. Ternary operator
  2. Current syntax
  3. Target syntax
  4. Adding tokens
  5. Changing the grammar
  6. Examples

For environment setup, read Part 0 — Introduction. For the completed code, see my fork here.

1. Ternary operator

The ternary operator is common in many languages. It is used to express a simple conditional expression concisely, usually in one line.

2. Current syntax

In Python, we already have the conditional expression:

"good" if x==100 else "bad"

that follows the rule

<consequent> if <test> else <alternative>
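One thing worth noticing about this rule: although the test is written in the middle, Python evaluates it first and then evaluates only one of the two outcomes. Here is a small sketch in stock Python that shows this. The note helper and the calls list are made up purely for illustration.

```python
# Hypothetical helper: records the order in which each part gets evaluated.
calls = []

def note(label, value):
    calls.append(label)
    return value

# <consequent> if <test> else <alternative>
result = note("consequent", "good") if note("test", True) else note("alternative", "bad")

print(result)  # good
print(calls)   # ['test', 'consequent']
```

So the reading order (consequent, test, alternative) and the evaluation order (test, then one outcome) are not the same.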

I find the conditional expression awkward to read in this order.

I prefer to read the condition/test first, then the 2 outcomes together. (I only realised this after learning about JavaScript’s ternary operator.)

3. Target syntax

So, written with the ternary operator, the conditional expression looks like this:

score>=50 ? "pass" : "fail"

where the rule now becomes

<test> ? <consequent> : <alternative>

It’s a nice-to-have syntax because I think it’s more readable — first, ask a yes-no question, secondly, indicate it with a ? (which semantically means a question in most natural languages anyway), and finally give the 2 different outcomes, separated by a colon :.

Some languages that implement this ternary operator syntax include C, Java, JavaScript and Swift.

4. Adding tokens

The ? is not an existing token as of Python v3.11.0a2.


Photo by Brett Jordan on Unsplash

📜 Source code compilation: Generating tokens

In lexical analysis, a token is a sequence of characters (“token value”) attributed to an assigned meaning (“token name”).

Tokenisation is the process of identifying tokens from a string of input characters. It is the first step of compiling our Python source code (see here).


Snippet from the Tokens file, which includes token names like NUMBER, STRING, LPAR and RPAR

The Grammar/Tokens file defines all the tokens allowed in Python. Here are some examples of tokens:

  • token name: NUMBER; token values are number literals like 123 and 0xff
  • token name: STRING; token values are string literals like "hello!" and 'hello!'
  • token name: LPAR; token value: '('
  • token name: RPAR; token value: ')'
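To see token names and token values side by side, here is a minimal sketch using the standard library's tokenize module. Note that this is the pure-Python tokenizer shipped with the standard library, not the C tokenizer that the interpreter compiles with, but it reports the same token names. The token_pairs function is a made-up helper.

```python
import io
import token
import tokenize

def token_pairs(source):
    """Return (token name, token value) pairs for a snippet of Python source."""
    readline = io.StringIO(source).readline
    return [
        # exact_type resolves generic OP tokens to names like LPAR and RPAR
        (token.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]

for name, value in token_pairs('print(0xff, "hello!")\n'):
    print(f"{name:10} {value!r}")
```

The printed pairs include NUMBER for 0xff, STRING for "hello!", and LPAR/RPAR for the parentheses.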

The ternary operator has 2 symbols: ? and :. The : is already the token value for COLON, but the question mark ? is not an existing token in Python, so let’s add it below the list of existing tokens:

QUESTIONMARK            '?'


then run make regen-token to regenerate the tokens.

💡 What does make regen-token do?

This command runs a Python script (yes, a Python script!) that reads the Grammar/Tokens file and generates several files including the token.c file. The C files are later compiled and used during tokenisation.

To understand more, start with the regen-token target in the ./Makefile file.

To learn more about Makefiles, here is a good introduction.
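If you want a quick sanity check after regenerating the tokens, here is a rough sketch. It assumes that the standard library's Lib/token.py is among the regenerated files (so that the new token shows up in token.EXACT_TOKEN_TYPES) and that you run it with the rebuilt interpreter. The has_exact_token helper is hypothetical.

```python
import token

def has_exact_token(symbol):
    """True if `symbol` is registered as an operator/delimiter token in this build."""
    return symbol in token.EXACT_TOKEN_TYPES

print(has_exact_token(":"))  # True (this is COLON)
print(has_exact_token("?"))  # False on a stock build; expected True on the patched build
```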

5. Changing the grammar

In this section we’ll cover the following:

5.1 Add an alternative

5.2 Update variables and grammar actions

5.3 (Optional) Conditional expression AST node

5.4 How to raise an error

5.5 Improve error messages

Now that we’ve added the token, let’s change the grammar of Python to include the ternary operator syntax for conditional expression.

Wait… what’s grammar?


Photo by Brett Jordan on Unsplash

A grammar is a set of rules that defines the combinations of tokens considered to be correct in the language.

We want to define what it means to be writing the correct ternary operator syntax in this existing set of rules.

Let’s head over to the grammar file Grammar/python.gram. In this file, what you see is a multitude of rules, some funny looking syntax, and some familiar keywords like 'return'.

Since we’re working off the conditional expression which has the if and else keywords, let’s find that using regex: 'if'.*'else' (cos it’s cool to do that but otherwise you could do a Cmd+F for 'if').

In the python.gram file:

expression[expr_ty] (memo):
    | invalid_expression
    | invalid_legacy_expression
    | a=disjunction 'if' b=disjunction 'else' c=expression {
        _PyAST_IfExp(b, a, c, EXTRA) }
    | disjunction
    | lambdef

and we find that the if-else rule is parked under expression.

An expression is defined to be one of these 5 things (“alternatives”):

  1. invalid_expression (ignore this for now)
  2. invalid_legacy_expression (ignore this for now)
  3. a=disjunction 'if' b=disjunction 'else' c=expression ... (what we want to start with!)
  4. disjunction (ignore this for now)
  5. lambdef (ignore this for now)

where the | means ‘or.’

5.1 Add an alternative

To add a conditional expression for the ternary operator, let’s hastily make a duplicate of the current conditional expression, and only replace 'if' with '?', and 'else' with ':'. Then we’ll see where this goes.

Add the following rule

a=disjunction '?' b=disjunction ':' c=expression {
_PyAST_IfExp(b, a, c, EXTRA) }

in expression[expr_ty]:

expression[expr_ty] (memo):
    | invalid_expression
    | invalid_legacy_expression
    | a=disjunction 'if' b=disjunction 'else' c=expression {
        _PyAST_IfExp(b, a, c, EXTRA) }
    | a=disjunction '?' b=disjunction ':' c=expression {
        _PyAST_IfExp(b, a, c, EXTRA) }
    | disjunction
    | lambdef

Re-run using make regen-pegen && make -j4.

💡 What does make regen-pegen do?
This command regenerates the parser source code, the parser.c file, which will later be compiled and used to parse our Python code.

How is the parser.c generated? There is a Python module Tools/peg_generator/pegen that reads the python.gram file, and finally generates parser.c.

To understand more, start by looking at the regen-pegen target in the ./Makefile.

Let’s try it in the REPL by running the ./python.exe executable and testing our conditional expression with the ternary operator:

>>> x = 100
>>> x>=50 ? "pass" : "fail"
True

It works 🎉!

Just that… we’re expecting "pass", not True 😞. Our grammar action still passes b as the test and a as the consequent, so the non-empty string "pass" is treated as the (always true) condition and the result is the value of x>=50. This brings us to the next section.

5.2 Update variables and grammar actions

Recall that the only difference between the previous conditional expression and our version is the position of <consequent> and <test>.

Current conditional expression:

a=<consequent> if b=<test> else c=<alternative>

Conditional expression using ternary operator:

a=<test> ? b=<consequent> : c=<alternative>

In the alternative we have added, let’s switch the positions of b and a in the ‘function arguments’ from

_PyAST_IfExp(b, a, c, EXTRA)

to

_PyAST_IfExp(a, b, c, EXTRA)

The final expression[expr_ty] should look like this:

expression[expr_ty] (memo):
    | invalid_expression
    | invalid_legacy_expression
    | a=disjunction 'if' b=disjunction 'else' c=expression {
        _PyAST_IfExp(b, a, c, EXTRA) }
    | a=disjunction '?' b=disjunction ':' c=expression {
        _PyAST_IfExp(a, b, c, EXTRA) }
    | disjunction
    | lambdef

and, since the grammar has changed again, rerun make regen-pegen && make -j4 && ./python.exe.

In the Python REPL:

>>> x = 100
>>> x==100 ? "good" : "bad"
'good'

Whew, it works 🎉🎉! But why does it work?

Let’s first understand grammar variables and grammar actions at a high level, then look at how they relate to the AST and _PyAST_IfExp.

expression[expr_ty] (memo):
    | invalid_expression
    | invalid_legacy_expression
    | a=disjunction 'if' b=disjunction 'else' c=expression {
        _PyAST_IfExp(b, a, c, EXTRA) }
    | disjunction
    | lambdef

In the alternative that we added, we have

  • the a=, b= and c=, and
  • the curly braces { ... } which contain some kind of function _PyAST_IfExp with arguments a, b and c.

The a, b and c in this rule are variables: each one captures whatever is parsed by the sub-rule on the right of its =. These variables are then used in grammar actions.

A grammar action tells the parser what AST node to generate if the alternative is successfully parsed.
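To make the idea of "capture variables, then build a node" concrete, here is a toy sketch in stock Python. It is not how the real parser generator works: it naively splits the text on the first ? and the first : after it (so it breaks on strings or nested expressions containing those characters), then plays the role of the grammar action by building an ast.IfExp from the three captured parts. The names parse_ternary and evaluate_ternary are made up.

```python
import ast

def parse_ternary(source):
    """Toy stand-in for the rule: a=<test> '?' b=<consequent> ':' c=<alternative>."""
    test_src, rest = source.split("?", 1)
    body_src, orelse_src = rest.split(":", 1)
    # "Variables": each one captures the AST of a sub-expression.
    a = ast.parse(test_src.strip(), mode="eval").body
    b = ast.parse(body_src.strip(), mode="eval").body
    c = ast.parse(orelse_src.strip(), mode="eval").body
    # "Grammar action": build the conditional-expression node from a, b and c.
    node = ast.IfExp(test=a, body=b, orelse=c)
    return ast.fix_missing_locations(ast.Expression(body=node))

def evaluate_ternary(source, **names):
    """Compile the toy AST and evaluate it with the given variable values."""
    code = compile(parse_ternary(source), "<ternary>", "eval")
    return eval(code, {}, names)

print(evaluate_ternary('x == 100 ? "good" : "bad"', x=100))  # good
print(evaluate_ternary('x == 100 ? "good" : "bad"', x=1))    # bad
```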

📜 Source code compilation: Generating ASTs

AST stands for abstract syntax tree and it’s a high-level representation of your source code in a tree data structure.

Generating ASTs is the second step in source code compilation.

One example of an AST node is the IfExp node, which is created by the _PyAST_IfExp function.

📜 Source code compilation: Beyond ASTs

What happens after the AST is generated? It is transformed into a graph data structure that represents the flow of your program (a control flow graph). Bytecode instructions are then emitted based on this graph.

To tie it all together: once we successfully parse the ternary operator alternative, the grammar action calls _PyAST_IfExp to generate an AST node for the conditional expression.

5.3 (Optional) Conditional expression AST node

We said that the _PyAST_IfExp was “some kind of function.” Well, it is indeed a C function!

In the header file Include/internal/pycore_ast.h, we see its function declaration:

expr_ty _PyAST_IfExp(expr_ty test, expr_ty body, expr_ty orelse, 
int lineno, int col_offset, int end_lineno,
int end_col_offset, PyArena *arena);

💡 C header file
In C and C++, a header file is a place to declare function signatures without their bodies.

In the function declaration, the first 3 arguments represent what we wrote in the grammar file in the following order:

  1. test: the condition,
  2. body: the consequent, used if the condition is true, and
  3. orelse: the alternative, used if the condition is false.
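You can see the same three fields from Python itself. Here is a small sketch using the standard library's ast module on the existing if-else syntax (so it runs on a stock interpreter too). The exact text printed by ast.dump varies a little between Python versions, so no expected output is shown for those lines.

```python
import ast

source = '"good" if x == 100 else "bad"'
node = ast.parse(source, mode="eval").body  # the conditional-expression node

print(type(node).__name__)    # IfExp
print(ast.dump(node.test))    # the condition: x == 100
print(ast.dump(node.body))    # the consequent: "good"
print(ast.dump(node.orelse))  # the alternative: "bad"
```

Notice that test comes first in the node even though it is written second in the source, which is why the original grammar action passes b before a.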

The rest of the arguments are injected by the EXTRA macro. These are what the authors call automatic variables: _start_lineno, _start_col_offset, etc. are automatically injected by the parser.

#define EXTRA _start_lineno, _start_col_offset, \
_end_lineno, _end_col_offset, p->arena

💡 C macro
In C and C++, a macro is loosely defined as “search and replace.” Here, every time EXTRA is encountered in the C source code, it will be expanded to _start_lineno, _start_col_offset, _end_lineno, _end_col_offset, p->arena.
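The position information passed through EXTRA ends up as attributes on the AST node, which we can inspect from Python. A minimal sketch on a stock interpreter, again using the existing if-else syntax:

```python
import ast

source = '"good" if x == 100 else "bad"'
node = ast.parse(source, mode="eval").body

# Start and end position of the whole conditional expression.
position = (node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)
print(position)  # (1, 0, 1, 29)
```

Here the expression starts at column 0 of line 1 and ends at column 29, which is the length of the source string.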

5.4 How to raise an error

Let’s now test our syntax in the REPL by writing an incomplete conditional expression:

>>> x = 100
>>> x==100 ? "good"
SyntaxError: invalid syntax

Great! It threw a SyntaxError as expected but the error message just said… invalid syntax? That’s… not a useful error message. It’s always a good idea to give more useful messages to help programmers in debugging.

💡 Better error messages in Python 3.10
One of the features of the Python 3.10 release is to provide users with more useful error messages for errors related to syntax, indentation, attribute and variable names (and even give possible suggestions!). See the release notes here.

So the question now is… can we raise an error while parsing? Yes, we can! This is through the invalid_expression alternative.

But you might think, wait… how is that a rule?

Remember previously we said that grammar actions generate AST nodes upon successful parsing? Well… there’s a little more to it.


python.gram: invalid_expression rule

Have a look at the invalid_expression rule. Notice that the grammar actions in every alternative are all RAISE_SYNTAX_ERROR_KNOWN_RANGE. If you look at all other rules that start with invalid_, you’ll see that their grammar actions are RAISE_*_ERROR_*.

So to raise a syntax error, we define an alternative whose grammar action raises an error (instead of creating an AST node).

💡 How syntax errors are generated
Read more here on how syntax errors are generated.
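If you want to poke at what the parser reports without squinting at the REPL, here is a small sketch that compiles a snippet and hands back the SyntaxError object. It uses the existing if-else syntax so it runs on a stock interpreter; the syntax_error_for helper is made up, and the message text you get depends on your Python version.

```python
def syntax_error_for(source):
    """Compile `source` as an expression and return the SyntaxError, or None."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError as err:
        return err
    return None

err = syntax_error_for('"good" if x == 100')  # incomplete: no else part
print(err.msg)                  # wording depends on the Python version
print(err.lineno, err.offset)   # where the parser says the problem is
```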

5.5 Improve error messages

So instead of just telling the programmer that their syntax is wrong, we reassure them that hey it’s no biggie, you only forgot a colon.

Let’s add the following alternative under the invalid_expression rule:

| a=disjunction '?' b=disjunction !':' {
RAISE_SYNTAX_ERROR_KNOWN_RANGE(
b, a, "expected ':' after '?' expression") }

The invalid_expression should look like this:

invalid_expression:
# !(NAME STRING) is not matched so we don't show this error with some invalid string prefixes like: kf"dsfsdf"
# Soft keywords need to also be ignored because they can be parsed as NAME NAME
| !(NAME STRING | SOFT_KEYWORD) a=disjunction b=expression_without_invalid {
_PyPegen_check_legacy_stmt(p, a) ? NULL : p->tokens[p->mark-1]->level == 0 ? NULL :
RAISE_SYNTAX_ERROR_KNOWN_RANGE(a, b, "invalid syntax. Perhaps you forgot a comma?") }
| a=disjunction 'if' b=disjunction !('else'|':') {
RAISE_SYNTAX_ERROR_KNOWN_RANGE(a, b, "expected 'else' after 'if' expression") }
| a=disjunction '?' b=disjunction !':' {
RAISE_SYNTAX_ERROR_KNOWN_RANGE(b, a, "expected ':' after '?' expression") }

Let’s regenerate the parser and compile with make regen-pegen && make -j4, start ./python.exe, and run the following in the REPL:

>>> 10>2 ? "correct"

File "<stdin>", line 1
10>2 ? "correct"
^
SyntaxError: expected ':' after '?' expression

Voilà! We get a more meaningful error message!

6. Examples

Ternary operator in a return statement:

>>> def evaluate(score):
...     return score >= 50 ? "pass" : "fail"
...
>>> evaluate(100)
'pass'

Ternary operator in a list comprehension

>>> [x%2 ? "odd" : "even" for x in range(5)]
['even', 'odd', 'even', 'odd', 'even']

Ternary operator in f-string without parentheses

>>> score = 100
>>> f'{score >= 50 ? "pass" : "fail"}'
File "<stdin>", line 1
(score >= 50 ? 'pass' )
^
SyntaxError: f-string: expected ':' after '?' expression

Ternary operator in f-string with parentheses

>>> score = 100
>>> f'{(score >= 50 ? "pass" : "fail")}'
'pass'

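For comparison, here are the same examples written with the existing if-else syntax, so you can run them on a stock interpreter and check that the results match. The last line is there as a hint for the f-string case: inside an f-string's braces, a top-level colon starts the format specification, which would explain why the unparenthesised ternary gets cut off at its colon.

```python
def evaluate(score):
    return "pass" if score >= 50 else "fail"

print(evaluate(100))  # pass

labels = ["odd" if x % 2 else "even" for x in range(5)]
print(labels)  # ['even', 'odd', 'even', 'odd', 'even']

score = 100
in_fstring = f'{"pass" if score >= 50 else "fail"}'
print(in_fstring)  # pass

# A top-level colon inside the braces begins the format spec:
padded = f"{score:>5}"
print(repr(padded))  # '  100'
```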
That’s all for now, folks! Stay tuned for more, like Alternate lambda syntax, No return keyword in function and List comprehension++!
