Human Thinking vs Parser Thinking

May 12, 2016 / Parser, Compiler, ECMAScript, JavaScript

Here’s an interesting discussion that came up during a code review at my company Kash.

One reviewer said the following JavaScript code looks weird and was surprised that it even works:

   new ClassName().something()

and that the following coding convention is more preferable and more readable:

   (new ClassName()).something()

The author of the change disagrees and wonders why there is confusion about that. The code says to instantiate a class and then call a function of the instance. Seems pretty self-explanatory.

I was then asked for my opinion.

At first my human brain agrees that (new ClassName()).something() reads more intuitively. However, new ClassName().something() is obviously valid in JavaScript grammar, so I wondered how it worked exactly. I pulled out the ECMAScript 1 parser I wrote and figured it out. Below I’ll outline the difference between how we humans think and how parsers think about this code.

How Humans Think

   new ClassName().something()

When us humans look at the code above, we probably think in terms of two parts and from left to right. This is because there is a space after the keyword new, so we automatically want to separate into two parts.

We either group the instantiation together as part 1, and the invocation as part 2:

//  _____________  ___________
// |   part 1    ||   part2   |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾
   new ClassName().something()

or we see that maybe it could be interpreted as follows:

//  ____________  _____________________
// |   part 1   ||   part2             |
//  ‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
   new           ClassName().something()

Since there are two ways to look at the code, our brain all of a sudden sees some ambiguity and the brackets help avoid dealing with it.

With the brackets added, nobody is confused and seems more natural in helping us decipher code.

//  _______________  ___________
// |   part 1      ||   part2   |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾
   (new ClassName()).something()

The brackets have a similar effect as when we use them in math expressions. Kind of like 1 + (2 * 3) might be easier to read, than 1 + 2 * 3. There are lots of spaces and we’re trying to group things so we can comprehend it. Although we might be tempted to read it as (1 + 2) * 3, instead we have to look at the rest of the expression and understand some rules.

Looking at rest of the expression is exactly how a parser would look at things.

How Parsers Think

Using my parser I figured out the resulting Abstract Syntax Tree (AST), which is the result after a parser decided what the source code means.

A JavaScript parser would implement the ECMAScript grammar specification in order to determine if a group of text is infact valid JavaScript source code.

The process of determining what the source code means is similar to humans reading code. i.e. one keyword or a group of text at a time

The difference is that a parser might need to finish looking at a complete expression before knowing what the interpretation should be, but a human would be quick to judge. When we see new ClassName().something() there is an urge to immediately start separating it into logical parts with the help of the space character.

You’ll see how our human way of looking at things is different than how a parser ultimately sees things. The parser’s resulting AST is shown below:

//  ______________________________________________________________________________________________
// |   ExpressionStatement                                                                        |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓
//  ______________________________________________________________________________________________
// |   CallExpression                                                                             |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓                                                                              ↓
//  _____________________________________________________________________________  _______________
// |   MemberExpression                                                          ||   Arguments   |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓                                                   ↓        ↓                 ↓
//  __________________________________________________  _______  ________________  _______________
// |   MemberExpression                               ||   .   ||   Identifier   ||   Arguments   |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓                                                   ↓        ↓                 ↓
//  __________________________________________________  _______  ________________  _______________
// |   NewExpression                                  ||   .   ||   Identifier   ||   Arguments   |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                       ↓                ↓        ↓                 ↓
//  _________  ______________________  _______________  _______  ________________  _______________
// |   new   ||   MemberExpression   ||   Arguments   ||   .   ||   Identifier   ||   Arguments   |
//  ‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                       ↓                ↓        ↓                 ↓
//  _________  ______________________  _______________  _______  ________________  _______________
// |   new   ||   Identifier         ||   Arguments   ||   .   ||   Identifier   ||   Arguments   |
//  ‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                       ↓                ↓        ↓                 ↓
       new        ClassName               ()               .        something         ()

That was probably a lot of mumbo jumbo if you’re not familiar with programming language grammar. Fear not, what you really need to know is that the parser in the end see the code more like this:

//  _________________________
// |   part 2                | <- part 2 = CallExpression
// |_____________            |
// |   part 1    |           | <- part 1 = NewExpression
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
   new ClassName().something()

It sees that whole thing (represented with part 2) as what’s called a CallExpression, which means an invocation of something. Within that CallExpression, there is a NewExpression (part 1) that needs to be evaluated in order to know what we’re doing an invocation on.

So how the parser ultimately represents the code is different than how humans think. We think we need to do part 1 and then do part 2. Parser thinks there’s a part 2 and in order to figure out part 2, it needs to figure out part 1.

The difference is subtle eh?

Is There Ambiguity?

Going back to what’s ambiguous for us humans. One might ask, isn’t it ambiguous to the parser too? Could it read the same source code as follows?

//  ____________  _____________________
// |   part 1   ||   part2             |
//  ‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
   new           ClassName().something()

Is there ambiguity in the language? How does that work?

Well, let’s check with the grammar and see. What we want to see is if we could have NewExpression as the overall expression instead of CallExpression. We want to answer whether the following is possible according to the grammar:

//  __________________________  
// |   part 2                 | <- part 2 = NewExpression
// |    ______________________|
// |   |   part 1             | <- part 1 = CallExpression
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  
   new  ClassName().something()

To understand rest of the exploration, you need to know that a programming language grammar specifies what different things could be expanded into. I will be exploring all possible expansions based on the grammar to see if our desired outcome is possible.

We’ll start the expansion with what we want the top level to be, a NewExpression:

//  _________________________
// |   ExpressionStatement   |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓
//  _________________________
// |   NewExpression         |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾

Now the grammar tells us that a NewExpression could either be expanded into new NewExpression or MemberExpression. Note that the new NewExpression expansion is recursive. That’s not a mistake.

Let’s first see if expanding to new NewExpression would work.

//  ____________________________________________________________
// |   NewExpression                                            |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓
//  _________  _________________________________________________
// |   new   ||   NewExpression                                 |
//  ‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓
//  _________  _________________________________________________
// |   new   ||   MemberExpression                              |
//  ‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                       ↓        ↓
//  _________  ______________________  _______  ________________
// |   new   ||   MemberExpression   ||   .   ||   Identifier   |
//  ‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                       ↓        ↓
       new        ?                       .        something

We first did the expansion from NewExpresion to new NewExpression. Then taking the NewExpression after new and expanding that to MemberExpression.

From here, the language grammar says MemberExpression could be expanded to MemberExpression . Identifier. Trying that we see it’s not possible for this to have . Identifier Arguments at the end, which is necessary to match .something().

Going back one step, the grammar also says MemberExpression can also become PrimaryExpression. If you go through that path, you’ll also see that doesn’t work in the end.

Now let’s see if NewExpression expanding to MemberExpression immediately would work.

//  _____________________________________________________________________________
// |   NewExpression                                                             |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓
//  _____________________________________________________________________________
// |   MemberExpression                                                          |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                                                  ↓
//  _________  _________________________________________________  _______________
// |   new   ||   MemberExpression                              ||   Arguments   |
//  ‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                       ↓        ↓                 ↓
//  _________  ______________________  _______  ________________  _______________
// |   new   ||   MemberExpression   ||   .   ||   Identifier   ||   Arguments   |
//  ‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//     ↓          ↓                       ↓        ↓                 ↓
       new        ?                       .        something         ()

We follow the possible expansions as far as we can go and gets closer to what we need. However, we’re left with one final MemberExpression and it’s not possible to expand that to Identifier Arguments, which we need to represent ClassName().

Therefore, when encountering new ClassName().something(), the only way the parser is going to interpret the code is that it is an overall CallExpression. There is no confusion here.

But What About new (ClassName().something())?

One small question remains and that’s how would new (ClassName().something()) work? Is that allowed?

We can see how the parsing will play out below:

//  ________________________________________________________________________________
// | NewExpression                                                                  |
//  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//   ↓      ↓
//  _____  _________________________________________________________________________
// | new || NewExpression                                                           |
//  ‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//   ↓      ↓
//  _____  _________________________________________________________________________
// | new || MemberExpression                                                        |
//  ‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//   ↓      ↓
//  _____  _________________________________________________________________________
// | new || PrimaryExpression                                                       |
//  ‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
//   ↓      ↓        ↓                                                            ↓
//  _____  ___  _______________________________________________________________  ___
// | new || ( || Expression                                                    || ) |
//  ‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾
//   ↓      ↓        ↓                                                            ↓
//  _____  ___  _______________________________________________________________  ___
// | new || ( || CallExpression                                                || ) |
//  ‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾
//   ↓      ↓    ↓                                                   ↓            ↓
//  _____  ___  __________________________________________________  ___________  ___
// | new || ( || CallExpression                                   || Arguments || ) |
//  ‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾
//   ↓      ↓    ↓                                ↓    ↓             ↓            ↓
//  _____  ___  _______________________________  ___  ____________  ___________  ___
// | new || ( || CallExpression                || . || Identifier || Arguments || ) |
//  ‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾
//   ↓      ↓    ↓                   ↓            ↓    ↓             ↓            ↓
//  _____  ___  __________________  ___________  ___  ____________  ___________  ___
// | new || ( || MemberExpression || Arguments || . || Identifier || Arguments || ) |
//  ‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾
//   ↓      ↓    ↓                   ↓            ↓    ↓             ↓            ↓
//  _____  ___  __________________  ___________  ___  ____________  ___________  ___
// | new || ( || Identifier       || Arguments || . || Identifier || Arguments || ) |
//  ‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾‾‾‾‾‾‾‾‾  ‾‾‾
//   ↓      ↓    ↓                   ↓            ↓    ↓             ↓            ↓
     new    (    ClassName           ()           .    something     ()           )

Great! So the grammar says that’s valid JavaScript.

In this case, the parent AST node is a NewExpression and has a CallExpression as a child of it. This is reversed from previously where CallExpression is the parent and NewExpression is the child.

Conclusion

So in the end there is no ambiguity of how JavaScript parser see new ClassName().something(). Our brain will probably still be a bit hesitant from time to time, but just remember to see it as an overall CallExpression like the parser does, or just put brackets around to help yourself.

How you should deal with the difference between human thinking and parser thinking depends on your team’s coding convention and decision.

But either way, hopefully once you understood how the parser see things in the end, you’ll never forget that. I sure won’t.