Rust-Infused Compilers: Nushell

Posted Jan 15, 2024. Updated Feb 01, 2025 ‐ 6 min read

(2025-02-01) I wanted to write a series of articles about compilers written in Rust, but I'm not sure I have the will and strength to push it forward. I'm publishing this unfinished article because I don't want to delete it. At least not now. Maybe later ;). I'm good at writing things and then deleting them instead of publishing ;)

The Nushell scripting language is the language of Nushell. It sounds a little weird, but there is a name conflict with the existing Nu language. Sophia June Turner, the primary author of Nushell (the shell) and the main architect of Nushell (the language), has suggested just using the Nushell name :).

OK, so we have explained the language naming. Let's focus on the language features.

Basic language features

There is a long chapter about custom commands in the Nushell documentation. It describes all the details in depth.

Let's create a math.nu file.

def "operation" [] {
  ["you can call the following subcommands: +, -, *, /, **"]
}

def "operation +" [a, b] {
  let c = $a + $b
  [$a + $b = $c]
}

def "operation -" [a: int, b: int] {
  let c = $a - $b
  [$a - $b = $c]
}

def "operation *" [a: float, b: float] {
  let c = $a * $b
  [$a * $b = $c]
}

def "operation /" [a: float, b: int] {
  let c = $a / $b
  [$a / $b = $c]
}

def "operation **" [a: float, b: int = 2] {
  let c = $a ** $b
  [$a ** $b = $c]
}

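We have here a base command, subcommands named after operators, parameters without types, parameters with int and float types, and a parameter with a default value.

Now let's check how it works.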

source math.nu

operation
╭───┬────────────────────────────────────────────────────────╮
│ 0 │ you can call the following subcommands: +, -, *, /, ** │
╰───┴────────────────────────────────────────────────────────╯

operation + 2 3
╭───┬───╮
│ 0 │ 2 │
│ 1 │ + │
│ 2 │ 3 │
│ 3 │ = │
│ 4 │ 5 │
╰───┴───╯

operation - 20 3
╭───┬────╮
│ 0 │ 20 │
│ 1 │ -  │
│ 2 │  3 │
│ 3 │ =  │
│ 4 │ 17 │
╰───┴────╯

operation * 3.5 2.5
╭───┬──────╮
│ 0 │ 3.50 │
│ 1 │ *    │
│ 2 │ 2.50 │
│ 3 │ =    │
│ 4 │ 8.75 │
╰───┴──────╯

operation / 3.5 2.5
Error: nu::parser::parse_mismatch

  × Parse mismatch during operation.
   ╭─[entry #112:1:1]
 1 │ operation / 3.5 2.5
   ·                 ─┬─
   ·                  ╰── expected int
   ╰────

operation / 3.5 2
╭───┬──────╮
│ 0 │ 3.50 │
│ 1 │ /    │
│ 2 │    2 │
│ 3 │ =    │
│ 4 │ 1.75 │
╰───┴──────╯

operation ** 5
╭───┬───────╮
│ 0 │  5.00 │
│ 1 │ ^     │
│ 2 │     2 │
│ 3 │ =     │
│ 4 │ 25.00 │
╰───┴───────╯

operation ** 5 5
╭───┬─────────╮
│ 0 │    5.00 │
│ 1 │ ^       │
│ 2 │       5 │
│ 3 │ =       │
│ 4 │ 3125.00 │
╰───┴─────────╯

Let's create a color.nu file.

# tell me your favorite color function
def color [
  name?: string # name of a color
] {
  if ($name == null) {
    "tell me what color you like"
  } else {
    $"you like ($name) color"
  }
}

def "color hex" [
  name: string # name of a color
  --hex: string # hex value of a color
] {
  if ($hex == null) {
    $"your favorite color is ($name)"
  } else {
    $"your favorite color is ($name) and it's hex value is ($hex)"
  }
}

def "color others" [
  ...names: string # names of other colors
] {
  print "your other preferred colors are:"
  for $name in $names {
    print $name
  }
}

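We have here a documentation comment for a command, an optional positional parameter (name?), a named flag that takes a value (--hex), and rest parameters (...names).

Now let's check how it works.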

source color.nu

color
tell me what color you like

color red
you like red color

color hex red
your favorite color is red

color hex red --hex
Error: nu::parser::missing_flag_param

  × Missing flag argument.
   ╭─[entry #151:1:1]
 1 │ color hex red --hex
   ·               ──┬──
   ·                 ╰── flag missing string argument
   ╰────

color hex red --hex #ff0000
Error: nu::parser::missing_flag_param

  × Missing flag argument.
   ╭─[entry #148:1:1]
 1 │ color hex red --hex #ff0000
   ·               ──┬──
   ·                 ╰── flag missing string argument
   ╰────

color hex red --hex "#ff0000"
your favorite color is red and it's hex value is #ff0000

color others black white blue
your other preferred colors are:
black
white
blue

help color
tell me your favorite color function

Usage:
  > color (name)

Subcommands:
  color hex -
  color others -

Flags:
  -h, --help - Display the help message for this command

Parameters:
  name <string>: name of a color (optional)

Input/output types:
  ╭───┬───────┬────────╮
  │ # │ input │ output │
  ├───┼───────┼────────┤
  │ 0 │ any   │ any    │
  ╰───┴───────┴────────╯

help color hex
Usage:
  > color hex {flags} <name>

Flags:
  --hex <String> - hex value of a color
  -h, --help - Display the help message for this command

Parameters:
  name <string>: name of a color

Input/output types:
  ╭───┬───────┬────────╮
  │ # │ input │ output │
  ├───┼───────┼────────┤
  │ 0 │ any   │ any    │
  ╰───┴───────┴────────╯

Let's create a pipe.nu file.

def lightest_processes [] {
  ps | sort-by mem | sort-by virtual | take 10
}

def "print all" [] {
  each { |process| $"light process is ($process)" }
}

def "print name" [] {
  each { |process| $"light process is ($process.name)"  }
}

We have here a command that produces a table, and two commands that read their pipeline input with each.

Now let's check how it works.

source pipe.nu

lightest_processes
╭───┬─────┬──────┬─────────────────────────────┬──────────┬──────┬─────┬─────────╮
│ # │ pid │ ppid │            name             │  status  │ cpu  │ mem │ virtual │
├───┼─────┼──────┼─────────────────────────────┼──────────┼──────┼─────┼─────────┤
│ 0 │   2 │    0 │ kthreadd                    │ Sleeping │ 0.00 │ 0 B │     0 B │
│ 1 │   3 │    2 │ rcu_gp                      │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 2 │   4 │    2 │ rcu_par_gp                  │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 3 │   5 │    2 │ slub_flushwq                │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 4 │   6 │    2 │ netns                       │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 5 │   8 │    2 │ kworker/0:0H-events_highpri │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 6 │  10 │    2 │ mm_percpu_wq                │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 7 │  11 │    2 │ rcu_tasks_kthread           │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 8 │  12 │    2 │ rcu_tasks_rude_kthread      │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 9 │  13 │    2 │ rcu_tasks_trace_kthread     │ Unknown  │ 0.00 │ 0 B │     0 B │
╰───┴─────┴──────┴─────────────────────────────┴──────────┴──────┴─────┴─────────╯

lightest_processes | print name
╭───┬──────────────────────────────────────────────╮
│ 0 │ light process is kthreadd                    │
│ 1 │ light process is rcu_gp                      │
│ 2 │ light process is rcu_par_gp                  │
│ 3 │ light process is slub_flushwq                │
│ 4 │ light process is netns                       │
│ 5 │ light process is kworker/0:0H-events_highpri │
│ 6 │ light process is mm_percpu_wq                │
│ 7 │ light process is rcu_tasks_kthread           │
│ 8 │ light process is rcu_tasks_rude_kthread      │
│ 9 │ light process is rcu_tasks_trace_kthread     │
╰───┴──────────────────────────────────────────────╯

lightest_processes | print all
╭──────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                0 │ light process is {pid: 2, ppid: 0, name: kthreadd, status: Sleeping, cpu: 0, mem: 0 B, virtual: 0 B}                                 │
│                1 │ light process is {pid: 3, ppid: 2, name: rcu_gp, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                                    │
│                2 │ light process is {pid: 4, ppid: 2, name: rcu_par_gp, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                                │
│                3 │ light process is {pid: 5, ppid: 2, name: slub_flushwq, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                              │
│                4 │ light process is {pid: 6, ppid: 2, name: netns, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                                     │
│                5 │ light process is {pid: 8, ppid: 2, name: kworker/0:0H-events_highpri, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}               │
│                6 │ light process is {pid: 10, ppid: 2, name: mm_percpu_wq, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                             │
│                7 │ light process is {pid: 11, ppid: 2, name: rcu_tasks_kthread, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                        │
│                8 │ light process is {pid: 12, ppid: 2, name: rcu_tasks_rude_kthread, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                   │
│                9 │ light process is {pid: 13, ppid: 2, name: rcu_tasks_trace_kthread, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                  │
╰──────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

OK, now we know a little about basic Nushell features.

There are other chapters in the documentation that explain other topics.

There is also a nice repository that contains a lot of sample scripts.

Let's assume that we have a basic understanding of the language features and syntax. Let's try to figure out how it works under the hood.

Parser crate

Here you can find the source code of the parser crate. There is also a new-nu-parser, but I won't describe it in this article.

There is a great description of how the Nushell parser works in its README.md file. Please take a moment to read it before reading the rest of this article.

It divides the interpretation process into three main phases: lexing, lite parsing, and parsing.

Entry point

Let's start analyzing how the parser works by reading the parse function. This function is called from the nu-command crate and is the main entry point of the interpretation process. After setting up a few things related to StateWorkingSet from the nu-protocol crate, the parse function quickly calls the lexer through the lex function.

Lexer

As you could read in the lexing description, the lexer's whole purpose is to determine where tokens start and end. The lex function calls the lex_internal function. This function gets a reference to a byte array with the code plus a few other parameters and returns a tuple containing a vector of Token structs and an optional ParseError enum. (BTW, this error handling is pretty neat. I usually use Result<T, E> in 99.99% of cases, because I want either a result or an error. But this approach may be useful in some cases, because the caller still gets the tokens together with the error.)
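To make the difference from Result<T, E> concrete, here is a minimal sketch of the same return shape. It is not nu-parser code: split_words and LexError are hypothetical names invented for this illustration. The only point is that the caller receives every word found so far together with the first error, if any.

```rust
/// Hypothetical error type, used only for this sketch.
#[derive(Debug, PartialEq)]
pub enum LexError {
    /// A `"` was opened at this byte offset and never closed.
    UnterminatedQuote(usize),
}

/// Splits `input` on spaces, keeping double-quoted text together.
/// Returns all words found, plus an error if a quote was left open.
pub fn split_words(input: &str) -> (Vec<String>, Option<LexError>) {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut quote_start: Option<usize> = None;

    for (idx, ch) in input.char_indices() {
        match (ch, quote_start) {
            // Opening quote: remember where it started.
            ('"', None) => {
                quote_start = Some(idx);
                current.push(ch);
            }
            // Closing quote.
            ('"', Some(_)) => {
                quote_start = None;
                current.push(ch);
            }
            // A space outside of quotes ends the current word.
            (' ', None) => {
                if !current.is_empty() {
                    words.push(std::mem::take(&mut current));
                }
            }
            // Anything else (including spaces inside quotes) is accumulated.
            _ => current.push(ch),
        }
    }
    if !current.is_empty() {
        words.push(current);
    }

    // The words are returned even when an error is reported.
    let error = quote_start.map(LexError::UnterminatedQuote);
    (words, error)
}
```

With Result<T, E> the partial list of words would be lost on error. With the tuple, a caller such as an editor integration could still highlight the words that were recognized.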

Here is what the Token struct looks like.

pub struct Token {
    pub contents: TokenContents,
    pub span: Span,
}

The TokenContents enum tells us the type of a token.

pub enum TokenContents {
    Item,
    Comment,
    Pipe,
    PipePipe,
    Semicolon,
    OutGreaterThan,
    OutGreaterGreaterThan,
    ErrGreaterThan,
    ErrGreaterGreaterThan,
    OutErrGreaterThan,
    OutErrGreaterGreaterThan,
    Eol,
}

The span field tells us where the token is located.

pub struct Span {
    pub start: usize,
    pub end: usize,
}
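A span carries no text of its own, only offsets into the source. The sketch below shows how token text could be recovered from it. It assumes that end is exclusive, which matches the one-byte Eol span built in the lexer code quoted later (idx to idx + 1). The token_text helper is hypothetical and is not part of nu-protocol.

```rust
/// Same shape as the `Span` shown above: byte offsets into the source.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical helper: returns the bytes a span covers,
/// treating `end` as exclusive. Returns `None` if the span
/// does not fit inside `source`.
pub fn token_text(source: &[u8], span: Span) -> Option<&[u8]> {
    source.get(span.start..span.end)
}
```

For the input `ls | sort-by name`, the span 0..2 covers `ls`, 3..4 covers `|`, and 5..12 covers `sort-by`.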

OK. We know more or less what parameters the lex_internal function takes and what it should return. Let's have a look at how it works. The main part is a long while loop. It goes through all bytes in the input array and tries to determine what each one means.

Let's have a look at a super simple fragment.

} else if c == b'\r' {
    // Ignore a stand-alone carriage return
    curr_offset += 1;
}

When it finds a carriage return, it just increments curr_offset. This way the carriage return is not classified as any token and is ignored.

Let's have a look at something a little more complicated.

} else if c == b'\n' {
    // If the next character is a newline, we're looking at an EOL (end of line) token.
    let idx = curr_offset;
    curr_offset += 1;
    if !additional_whitespace.contains(&c) {
        output.push(Token::new(
            TokenContents::Eol,
            Span::new(span_offset + idx, span_offset + idx + 1),
        ));
    }
}

In this case, when a newline character \n is detected, a Token of type TokenContents::Eol is added to the returned vector of Tokens, unless \n is listed in additional_whitespace.

The lex_internal function has checks for the following characters: '|', ';', '\r', '\n', '#', ' ', '\t'. If a character is not on the list, then the lex_item function is called.
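The dispatch described above can be sketched as a single loop with one branch per special byte. This is a simplified stand-in, not the nu-parser implementation: Kind, the flattened Token, is_terminator and lex_simple are hypothetical names, and the sketch ignores `||`, redirections, additional_whitespace and nesting.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Kind {
    Item,
    Comment,
    Pipe,
    Semicolon,
    Eol,
}

/// Simplified token: kind plus byte offsets (`end` is exclusive).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Token {
    pub kind: Kind,
    pub start: usize,
    pub end: usize,
}

/// Bytes that end an item in this sketch.
fn is_terminator(c: u8) -> bool {
    matches!(c, b' ' | b'\t' | b'\r' | b'\n' | b'|' | b';' | b'#')
}

pub fn lex_simple(input: &[u8]) -> Vec<Token> {
    let mut output = Vec::new();
    let mut curr_offset = 0;

    while let Some(&c) = input.get(curr_offset) {
        let start = curr_offset;
        let kind = match c {
            b'|' => { curr_offset += 1; Some(Kind::Pipe) }
            b';' => { curr_offset += 1; Some(Kind::Semicolon) }
            b'\n' => { curr_offset += 1; Some(Kind::Eol) }
            // Whitespace and carriage returns produce no token.
            b'\r' | b' ' | b'\t' => { curr_offset += 1; None }
            // A comment runs up to, but not including, the newline.
            b'#' => {
                while curr_offset < input.len() && input[curr_offset] != b'\n' {
                    curr_offset += 1;
                }
                Some(Kind::Comment)
            }
            // Anything else starts an item that runs until a terminator.
            _ => {
                while curr_offset < input.len() && !is_terminator(input[curr_offset]) {
                    curr_offset += 1;
                }
                Some(Kind::Item)
            }
        };
        if let Some(kind) = kind {
            output.push(Token { kind, start, end: curr_offset });
        }
    }
    output
}
```

For the input `ls | wc; # hi` followed by a newline, this sketch yields the kinds Item, Pipe, Item, Semicolon, Comment, Eol.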

The lex_item function is designed to identify a single token. It returns a tuple that contains a Token and an Option<ParseError>.

Just like in the lex_internal function, there is a long while loop. This time we have a nice comment about how it's intended to work.

// The process of slurping up a baseline token repeats:
//
// - String literal, which begins with `'` or `"`, and continues until
//   the same character is encountered again.
// - Delimiter pair, which begins with `[`, `(`, or `{`, and continues until
//   the matching closing delimiter is found, skipping comments and string
//   literals.
// - When not nested inside of a delimiter pair, when a terminating
//   character (whitespace, `|`, `;` or `#`) is encountered, the baseline
//   token is done.
// - Otherwise, accumulate the character into the current baseline token.
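The rules from this comment can be sketched in a few lines. This is only an illustration under simplifying assumptions: slurp_item is a hypothetical function, it does not skip comments inside delimiter pairs, and it reports an unclosed quote or delimiter as a plain byte instead of a ParseError.

```rust
/// Returns the exclusive end offset of the token starting at `start`,
/// plus the closing character still expected if the input ended early.
pub fn slurp_item(input: &[u8], start: usize) -> (usize, Option<u8>) {
    let mut offset = start;
    let mut quote: Option<u8> = None; // currently open quote character
    let mut stack: Vec<u8> = Vec::new(); // expected closing delimiters

    while let Some(&c) = input.get(offset) {
        if let Some(q) = quote {
            // Inside a string literal only the same quote character ends it.
            if c == q {
                quote = None;
            }
        } else {
            match c {
                b'\'' | b'"' => quote = Some(c),
                b'[' => stack.push(b']'),
                b'(' => stack.push(b')'),
                b'{' => stack.push(b'}'),
                b']' | b')' | b'}' => {
                    if stack.last() == Some(&c) {
                        stack.pop();
                    }
                }
                // A terminator ends the token only outside delimiter pairs.
                b' ' | b'\t' | b'\r' | b'\n' | b'|' | b';' | b'#' if stack.is_empty() => break,
                // Otherwise the byte is accumulated into the token.
                _ => {}
            }
        }
        offset += 1;
    }
    (offset, quote.or(stack.last().copied()))
}
```

With this sketch, `[1, 2] | length` yields a first token that ends at offset 6, so the spaces inside the brackets do not split it, while `{ls` reports that a `}` is still expected.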

lex_item uses two helper functions: is_item_terminator and is_special_item.

It also checks whether we are trying to use bash-like redirections that are not legal in Nushell.

b"2>&1" => {
    err = Some(ParseError::ShellOutErrRedirect(span));
    Token {
        contents: TokenContents::Item,
        span,
    }
}
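Note that the fragment records an error and still produces a token. Below is a self-contained sketch of that pattern. The types are reduced stand-ins for the real ones, classify_item is a hypothetical name, and only the `2>&1` form quoted above is checked.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Reduced stand-in for the real ParseError enum.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    ShellOutErrRedirect(Span),
}

/// Reduced stand-in for the real Token struct.
#[derive(Debug, PartialEq)]
pub struct Token {
    pub span: Span,
}

/// Builds a token for `span` and flags a bash-style `2>&1` redirection.
/// The token is returned in both cases; only the error differs.
pub fn classify_item(input: &[u8], span: Span) -> (Token, Option<ParseError>) {
    let is_bash_redirect = input.get(span.start..span.end) == Some(&b"2>&1"[..]);
    let err = if is_bash_redirect {
        Some(ParseError::ShellOutErrRedirect(span))
    } else {
        None
    };
    (Token { span }, err)
}
```

Because the token is kept, the later phases can continue and the error can point at the exact span of the offending text.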

I think these are the most important parts and functions of the lexer.

After the lex function finishes, the parse_block function is called.

Lite parsing

One of the parameters the parse_block function takes is a reference to an array of Tokens. It returns a Block structure. At the beginning, the lite_parse function is called.