Rust-Infused Compilers: Nushell
Posted Jan 15, 2024. Updated Feb 01, 2025 ‐ 6 min read
(2025-02-01) I wanted to write a series of articles about compilers written in Rust, but I'm not sure I have the will and strength to push it forward. I'm publishing this unfinished article because I don't want to delete it. At least not now. Maybe later ;). I'm good at writing things, then not publishing them and deleting them ;)
The Nushell scripting language is the language of the Nushell shell. It sounds a little weird, but there is a name conflict with the existing Nu language. Sophia June Turner, the primary author of Nushell (the shell) and the main architect of Nushell (the language), has suggested simply using the name Nushell :).
Ok, so we have explained the naming. Let's focus on the language features.
Basic language features
There is a long chapter about custom commands in the Nushell documentation. It describes all the details in depth.
Let's create a math.nu file.
def "operation" [] {
    ["you can call the following subcommands: +, -, *, /, **"]
}
def "operation +" [a, b] {
    let c = $a + $b
    [$a + $b = $c]
}
def "operation -" [a: int, b: int] {
    let c = $a - $b
    [$a - $b = $c]
}
def "operation *" [a: float, b: float] {
    let c = $a * $b
    [$a * $b = $c]
}
def "operation /" [a: float, b: int] {
    let c = $a / $b
    [$a / $b = $c]
}
def "operation **" [a: float, b: int = 2] {
    let c = $a ** $b
    [$a ** $b = $c]
}
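Here we have a parent command, operation, with subcommands whose names contain spaces ("operation +", "operation -" and so on). It also shows untyped parameters (operation +), typed int and float parameters, and a parameter with a default value (b: int = 2 in operation **).
Now let's check how it works.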
source math.nu
operation
╭───┬────────────────────────────────────────────────────────╮
│ 0 │ you can call the following subcommands: +, -, *, /, ** │
╰───┴────────────────────────────────────────────────────────╯
operation + 2 3
╭───┬───╮
│ 0 │ 2 │
│ 1 │ + │
│ 2 │ 3 │
│ 3 │ = │
│ 4 │ 5 │
╰───┴───╯
operation - 20 3
╭───┬────╮
│ 0 │ 20 │
│ 1 │ -  │
│ 2 │  3 │
│ 3 │ =  │
│ 4 │ 17 │
╰───┴────╯
operation * 3.5 2.5
╭───┬──────╮
│ 0 │ 3.50 │
│ 1 │ *    │
│ 2 │ 2.50 │
│ 3 │ =    │
│ 4 │ 8.75 │
╰───┴──────╯
operation / 3.5 2.5
Error: nu::parser::parse_mismatch
  × Parse mismatch during operation.
   ╭─[entry #112:1:1]
 1 │ operation / 3.5 2.5
   ·                 ─┬─
   ·                  ╰── expected int
   ╰────
operation / 3.5 2
╭───┬──────╮
│ 0 │ 3.50 │
│ 1 │ /    │
│ 2 │    2 │
│ 3 │ =    │
│ 4 │ 1.75 │
╰───┴──────╯
operation ** 5
╭───┬───────╮
│ 0 │  5.00 │
│ 1 │ ^     │
│ 2 │     2 │
│ 3 │ =     │
│ 4 │ 25.00 │
╰───┴───────╯
operation ** 5 5
╭───┬─────────╮
│ 0 │    5.00 │
│ 1 │ ^       │
│ 2 │       5 │
│ 3 │ =       │
│ 4 │ 3125.00 │
╰───┴─────────╯
Let's create a color.nu file.
# tell me your favorite color function
def color [
    name?: string # name of a color
] {
    if ($name == null) {
        "tell me what color you like"
    } else {
        $"you like ($name) color"
    }
}
def "color hex" [
    name: string # name of a color
    --hex: string # hex value of a color
] {
    if ($hex == null) {
        $"your favorite color is ($name)"
    } else {
        $"your favorite color is ($name) and it's hex value is ($hex)"
    }
}
def "color others" [
    ...names: string # names of other colors
] {
    print "your other preferred colors are:"
    for $name in $names {
        print $name
    }
}
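Here we have an optional positional parameter (name?: string), a subcommand with a string flag (--hex: string), and a subcommand with a rest parameter (...names: string). The comment above def color and the comments after each parameter become documentation that the help command displays.
Now let's check how it works.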
source color.nu
color
tell me what color you like
color red
you like red color
color hex red
your favorite color is red
color hex red --hex
Error: nu::parser::missing_flag_param
  × Missing flag argument.
   ╭─[entry #151:1:1]
 1 │ color hex red --hex
   ·               ──┬──
   ·                 ╰── flag missing string argument
   ╰────
color hex red --hex #ff0000
Error: nu::parser::missing_flag_param
  × Missing flag argument.
   ╭─[entry #148:1:1]
 1 │ color hex red --hex #ff0000
   ·               ──┬──
   ·                 ╰── flag missing string argument
   ╰────
color hex red --hex "#ff0000"
your favorite color is red and it's hex value is #ff0000
color others black white blue
your other preferred colors are:
black
white
blue
help color
tell me your favorite color function
Usage:
> color (name)
Subcommands:
color hex -
color others -
Flags:
-h, --help - Display the help message for this command
Parameters:
name <string>: name of a color (optional)
Input/output types:
╭───┬───────┬────────╮
│ # │ input │ output │
├───┼───────┼────────┤
│ 0 │ any   │ any    │
╰───┴───────┴────────╯
help color hex
Usage:
> color hex {flags} <name>
Flags:
--hex <String> - hex value of a color
-h, --help - Display the help message for this command
Parameters:
name <string>: name of a color
Input/output types:
╭───┬───────┬────────╮
│ # │ input │ output │
├───┼───────┼────────┤
│ 0 │ any   │ any    │
╰───┴───────┴────────╯
Let's create a pipe.nu file.
def lightest_processes [] {
    ps | sort-by mem | sort-by virtual | take 10
}
def "print all" [] {
    each { |process| $"light process is ($process)" }
}
def "print name" [] {
    each { |process| $"light process is ($process.name)" }
}
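Here we have a command that produces pipeline output: lightest_processes runs ps, sorts the result, and takes the first 10 rows. We also have two commands that consume their pipeline input with each. print all interpolates the whole process record, while print name uses only its name field.
Now let's check how it works.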
source pipe.nu
lightest_processes
╭───┬─────┬──────┬─────────────────────────────┬──────────┬──────┬─────┬─────────╮
│ # │ pid │ ppid │            name             │  status  │ cpu  │ mem │ virtual │
├───┼─────┼──────┼─────────────────────────────┼──────────┼──────┼─────┼─────────┤
│ 0 │   2 │    0 │ kthreadd                    │ Sleeping │ 0.00 │ 0 B │     0 B │
│ 1 │   3 │    2 │ rcu_gp                      │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 2 │   4 │    2 │ rcu_par_gp                  │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 3 │   5 │    2 │ slub_flushwq                │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 4 │   6 │    2 │ netns                       │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 5 │   8 │    2 │ kworker/0:0H-events_highpri │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 6 │  10 │    2 │ mm_percpu_wq                │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 7 │  11 │    2 │ rcu_tasks_kthread           │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 8 │  12 │    2 │ rcu_tasks_rude_kthread      │ Unknown  │ 0.00 │ 0 B │     0 B │
│ 9 │  13 │    2 │ rcu_tasks_trace_kthread     │ Unknown  │ 0.00 │ 0 B │     0 B │
╰───┴─────┴──────┴─────────────────────────────┴──────────┴──────┴─────┴─────────╯
lightest_processes | print name
╭───┬──────────────────────────────────────────────╮
│ 0 │ light process is kthreadd                    │
│ 1 │ light process is rcu_gp                      │
│ 2 │ light process is rcu_par_gp                  │
│ 3 │ light process is slub_flushwq                │
│ 4 │ light process is netns                       │
│ 5 │ light process is kworker/0:0H-events_highpri │
│ 6 │ light process is mm_percpu_wq                │
│ 7 │ light process is rcu_tasks_kthread           │
│ 8 │ light process is rcu_tasks_rude_kthread      │
│ 9 │ light process is rcu_tasks_trace_kthread     │
╰───┴──────────────────────────────────────────────╯
lightest_processes | print all
╭───┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ 0 │ light process is {pid: 2, ppid: 0, name: kthreadd, status: Sleeping, cpu: 0, mem: 0 B, virtual: 0 B}                   │
│ 1 │ light process is {pid: 3, ppid: 2, name: rcu_gp, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                      │
│ 2 │ light process is {pid: 4, ppid: 2, name: rcu_par_gp, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                  │
│ 3 │ light process is {pid: 5, ppid: 2, name: slub_flushwq, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                │
│ 4 │ light process is {pid: 6, ppid: 2, name: netns, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}                       │
│ 5 │ light process is {pid: 8, ppid: 2, name: kworker/0:0H-events_highpri, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B} │
│ 6 │ light process is {pid: 10, ppid: 2, name: mm_percpu_wq, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}               │
│ 7 │ light process is {pid: 11, ppid: 2, name: rcu_tasks_kthread, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}          │
│ 8 │ light process is {pid: 12, ppid: 2, name: rcu_tasks_rude_kthread, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}     │
│ 9 │ light process is {pid: 13, ppid: 2, name: rcu_tasks_trace_kthread, status: Unknown, cpu: 0, mem: 0 B, virtual: 0 B}    │
╰───┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Ok, now we know a little about basic Nushell features.
The documentation has other chapters that explain further topics.
There is also a nice repository that contains a lot of sample scripts.
Let's assume that we have a basic understanding of the language features and syntax. Let's try to figure out how it works under the hood.
Parser crate
Here you can find the source code of the parser crate. There is also new-nu-parser, but I won't describe it in this article.
There is a great description of how the Nushell parser works in its README.md file. Please take a moment and read it before reading the rest of this article.
It divides the interpretation process into three main phases: lexing, lite parsing, and parsing.
Entry point
Let's start analyzing how the parser works by reading the parse function. This function is called from the nu-command crate and is the main entry point of the interpretation process. After setting up a few things related to StateWorkingSet from the nu-protocol crate, parse quickly calls the lexer through the lex function.
Lexer
As you could read in the lexing description, the lexer's whole purpose is to determine where tokens start and end. The lex function calls the lex_internal function. This function takes a reference to a byte slice with the code plus a few other parameters. It returns a tuple containing a vector of Token structs and an optional ParseError. (BTW, this error handling is pretty neat. I use Result<T, E> in 99.99% of cases, because I want either a result or an error. But returning both can be useful in some cases: the lexer can report an error and still hand back all the tokens it found.)
Here is what the Token struct looks like.
pub struct Token {
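The sketch below shows why the tuple shape can be handy. It has the same shape as lex_internal's return value but is not nu-parser code. A hypothetical lex_words splits a line on spaces and keeps going after it spots an unterminated string. It returns every word together with the first error. LexError, lex_words and lex_words_strict are made-up names, and the splitting is deliberately naive (spaces inside strings are not preserved).

```rust
/// Hypothetical error type standing in for nu-parser's ParseError.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LexError {
    /// A `"` string that never closes; `start` is its byte offset.
    UnterminatedString { start: usize },
}

/// Splits `input` on single spaces and keeps going after an error,
/// returning every word plus the first error seen, in the same
/// `(tokens, Option<error>)` shape as lex_internal.
pub fn lex_words(input: &str) -> (Vec<&str>, Option<LexError>) {
    let mut words = Vec::new();
    let mut err = None;
    let mut offset = 0; // byte offset of the current piece
    for word in input.split(' ') {
        if !word.is_empty() {
            let unterminated =
                word.starts_with('"') && (word.len() < 2 || !word.ends_with('"'));
            if unterminated && err.is_none() {
                err = Some(LexError::UnterminatedString { start: offset });
            }
            // The word is kept even when it produced an error.
            words.push(word);
        }
        offset += word.len() + 1; // +1 for the separating space
    }
    (words, err)
}

/// The Result-based alternative: callers get either the words or the
/// error, and the partially lexed words are thrown away on failure.
pub fn lex_words_strict(input: &str) -> Result<Vec<&str>, LexError> {
    match lex_words(input) {
        (words, None) => Ok(words),
        (_, Some(e)) => Err(e),
    }
}
```

With the tuple version, a caller such as an editor or a REPL can still highlight echo and the rest of the line while reporting the error. The Result version can only report the error.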
    pub contents: TokenContents,
    pub span: Span,
}
The TokenContents enum tells us the type of the token.
pub enum TokenContents {
    Item,
    Comment,
    Pipe,
    PipePipe,
    Semicolon,
    OutGreaterThan,
    OutGreaterGreaterThan,
    ErrGreaterThan,
    ErrGreaterGreaterThan,
    OutErrGreaterThan,
    OutErrGreaterGreaterThan,
    Eol,
}
The span field tells us where the token is located in the source.
pub struct Span {
    pub start: usize,
    pub end: usize,
}
Ok. We know more or less what parameters the lex_internal function takes and what it returns. Let's have a look at how it works. The main part is a long while loop that goes through all bytes of the input and tries to determine what each one means.
Let's have a look at a super simple fragment.
} else if c == b'\r' {
    // Ignore a stand-alone carriage return
    curr_offset += 1;
}
When it finds a carriage return, it just increments curr_offset. The carriage return is therefore not classified as any token and is simply ignored.
Let's have a look at something a little more complicated.
} else if c == b'\n' {
    // If the next character is a newline, we're looking at an EOL (end of line) token.
    let idx = curr_offset;
    curr_offset += 1;
    if !additional_whitespace.contains(&c) {
        output.push(Token::new(
            TokenContents::Eol,
            Span::new(span_offset + idx, span_offset + idx + 1),
        ));
    }
}
Here, when a newline char \n is detected, a Token of type TokenContents::Eol covering that single byte is added to the returned vector of tokens. The exception is when \n was passed in additional_whitespace. In that case the newline is skipped without producing a token.
The lex_internal function has checks for the following chars: '|', ';', '\r', '\n', '#', ' ', '\t'. If a char is not on the list, the lex_item function is called.
This function is designed to identify a single token. It returns a tuple containing a Token and an Option<ParseError>.
Just like in lex_internal, there is a long while loop. This time we have a nice comment explaining how it's intended to work.
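Putting the carriage-return and newline handling together with the other single-byte cases gives the heavily simplified, hypothetical loop below. It is not the real lex_internal. Kind, Tok, mini_lex and is_terminator are made-up names, and strings, delimiters, || and redirections are ignored. Two behaviours are my own assumptions: a comment runs up to the next newline, and the fallback branch simply consumes bytes until a terminator, standing in for lex_item. It does reproduce the behaviour described above. A stand-alone \r is skipped, \n becomes an Eol token unless it is listed in additional_whitespace, and spans are shifted by span_offset.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Item,
    Comment,
    Pipe,
    Semicolon,
    Eol,
}

/// A token with absolute byte offsets (end is exclusive).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Tok {
    pub kind: Kind,
    pub start: usize,
    pub end: usize,
}

/// Bytes that end a plain item in this sketch.
fn is_terminator(c: u8, additional_whitespace: &[u8]) -> bool {
    matches!(c, b' ' | b'\t' | b'\r' | b'\n' | b'|' | b';' | b'#')
        || additional_whitespace.contains(&c)
}

pub fn mini_lex(input: &[u8], span_offset: usize, additional_whitespace: &[u8]) -> Vec<Tok> {
    let mut output = Vec::new();
    let mut curr_offset = 0;
    while curr_offset < input.len() {
        let c = input[curr_offset];
        let start = curr_offset;
        let kind = if c == b'|' {
            curr_offset += 1;
            Some(Kind::Pipe)
        } else if c == b';' {
            curr_offset += 1;
            Some(Kind::Semicolon)
        } else if c == b'\r' {
            // Stand-alone carriage return: skipped, no token.
            curr_offset += 1;
            None
        } else if c == b'\n' {
            curr_offset += 1;
            // Eol only if the caller did not ask to treat '\n' as whitespace.
            if additional_whitespace.contains(&c) { None } else { Some(Kind::Eol) }
        } else if c == b'#' {
            // Assumption: a comment runs up to (not including) the next newline.
            while curr_offset < input.len() && input[curr_offset] != b'\n' {
                curr_offset += 1;
            }
            Some(Kind::Comment)
        } else if c == b' ' || c == b'\t' || additional_whitespace.contains(&c) {
            curr_offset += 1;
            None
        } else {
            // Stand-in for lex_item: consume bytes until a terminator.
            while curr_offset < input.len()
                && !is_terminator(input[curr_offset], additional_whitespace)
            {
                curr_offset += 1;
            }
            Some(Kind::Item)
        };
        if let Some(kind) = kind {
            output.push(Tok { kind, start: span_offset + start, end: span_offset + curr_offset });
        }
    }
    output
}
```

Every branch advances curr_offset by at least one byte, so the loop always terminates. The fallback branch is only reached for a byte that is not itself a terminator.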
// The process of slurping up a baseline token repeats:
//
// - String literal, which begins with `'` or `"`, and continues until
//   the same character is encountered again.
// - Delimiter pair, which begins with `[`, `(`, or `{`, and continues until
//   the matching closing delimiter is found, skipping comments and string
//   literals.
// - When not nested inside of a delimiter pair, when a terminating
//   character (whitespace, `|`, `;` or `#`) is encountered, the baseline
//   token is done.
// - Otherwise, accumulate the character into the current baseline token.
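The four rules in that comment can be sketched as a standalone function. This is my simplified reading, not the real lex_item. scan_item, ItemError and the helper names are hypothetical, and the terminator set is taken from the comment. Two further assumptions: a # inside a delimiter pair starts a comment that runs to the end of the line, and mismatched closing delimiters are simply accumulated.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ItemError {
    /// A string literal that never closes; holds the opening quote's offset.
    UnterminatedString(usize),
    /// Input ended while this closing delimiter was still expected.
    UnclosedDelimiter(u8),
}

/// Terminators taken from the comment: whitespace, `|`, `;` and `#`.
fn is_item_terminator(c: u8) -> bool {
    matches!(c, b' ' | b'\t' | b'\r' | b'\n' | b'|' | b';' | b'#')
}

/// Closing delimiter for an opening one, if `c` opens a pair.
fn closing_delimiter(c: u8) -> Option<u8> {
    match c {
        b'[' => Some(b']'),
        b'(' => Some(b')'),
        b'{' => Some(b'}'),
        _ => None,
    }
}

/// Returns the exclusive end offset of the baseline token that starts at
/// `start`, plus an optional error, following the four rules above.
pub fn scan_item(input: &[u8], start: usize) -> (usize, Option<ItemError>) {
    let mut expected: Vec<u8> = Vec::new(); // stack of closing delimiters
    let mut i = start;
    while i < input.len() {
        let c = input[i];
        if c == b'\'' || c == b'"' {
            // Rule 1: string literal, skip to the same quote character.
            let quote_start = i;
            i += 1;
            while i < input.len() && input[i] != c {
                i += 1;
            }
            if i == input.len() {
                return (i, Some(ItemError::UnterminatedString(quote_start)));
            }
            i += 1; // consume the closing quote
        } else if let Some(close) = closing_delimiter(c) {
            // Rule 2: an opening delimiter starts (or deepens) nesting.
            expected.push(close);
            i += 1;
        } else if expected.last() == Some(&c) {
            // Matching closing delimiter: one level less nested.
            expected.pop();
            i += 1;
        } else if expected.is_empty() && is_item_terminator(c) {
            // Rule 3: a terminator outside any delimiter pair ends the token.
            break;
        } else if c == b'#' {
            // Only reachable inside a delimiter pair: skip the comment.
            while i < input.len() && input[i] != b'\n' {
                i += 1;
            }
        } else {
            // Rule 4: accumulate the byte into the current token.
            i += 1;
        }
    }
    match expected.pop() {
        Some(close) => (i, Some(ItemError::UnclosedDelimiter(close))),
        None => (i, None),
    }
}
```

A stack of expected closers is what makes a whole block like { |x| $x } one token even though it contains spaces and a |. Once the stack is empty again, the next space ends the token.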
lex_item uses two helper functions: is_item_terminator and is_special_item.
It also checks whether we are trying to use Bash-like redirections, which are not legal in Nushell. In that case it records an error but still returns an ordinary Item token:
b"2>&1" => {
    err = Some(ParseError::ShellOutErrRedirect(span));
    Token {
        contents: TokenContents::Item,
        span,
    }
}
I think these are the most important parts and functions of the lexer.
After the lex function has run, the parse_block function is called.
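The important detail is that the lexer does not stop here. It records the error and still returns an ordinary Item token, so later stages keep working on the rest of the input. Below is a minimal, self-contained sketch of that arm. It uses cut-down, hypothetical versions of Span, Token and ParseError and a made-up finish_item function; only the 2>&1 case from the snippet is modelled.

```rust
/// Cut-down stand-ins for the real types, just enough for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenContents {
    Item,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub contents: TokenContents,
    pub span: Span,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseError {
    ShellOutErrRedirect(Span),
}

/// Hypothetical helper: turns the bytes of a finished item into a token.
/// `2>&1` still yields an Item token, but an error is reported alongside it.
pub fn finish_item(input: &[u8], span: Span) -> (Token, Option<ParseError>) {
    let mut err = None;
    let token = match &input[span.start..span.end] {
        b"2>&1" => {
            err = Some(ParseError::ShellOutErrRedirect(span));
            Token { contents: TokenContents::Item, span }
        }
        _ => Token { contents: TokenContents::Item, span },
    };
    (token, err)
}
```

Matching the byte slice directly against a byte-string literal, as the original arm does, keeps the check cheap and readable.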
Lite parsing
One of the parameters the parse_block function takes is a reference to a slice of Tokens, and it returns a Block structure. At the very beginning, it calls the lite_parse function.
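The README describes lite parsing as grouping the lexed tokens into units. As I understand it, those units are pipelines made of commands. Here is a rough, hypothetical sketch of that grouping; lite_group, Kind, Pipeline and Command are my names, not nu-parser's. Items accumulate into a command, | starts the next command in the same pipeline, and ; or a newline ends the pipeline. Comments are simply dropped, and a newline always ends a pipeline here, which is a simplification.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Item,
    Comment,
    Pipe,
    Semicolon,
    Eol,
}

/// Hypothetical shapes: a command is a list of item texts and a pipeline
/// is a list of commands; the returned block is a list of pipelines.
pub type Command<'a> = Vec<&'a str>;
pub type Pipeline<'a> = Vec<Command<'a>>;

pub fn lite_group<'a>(tokens: &[(Kind, &'a str)]) -> Vec<Pipeline<'a>> {
    let mut block: Vec<Pipeline<'a>> = Vec::new();
    let mut pipeline: Pipeline<'a> = Vec::new();
    let mut command: Command<'a> = Vec::new();
    for &(kind, text) in tokens {
        match kind {
            Kind::Item => command.push(text),
            Kind::Comment => {} // dropped in this sketch
            // `|` closes the current command; the pipeline continues.
            Kind::Pipe => pipeline.push(std::mem::take(&mut command)),
            // `;` or a newline closes both the command and the pipeline.
            Kind::Semicolon | Kind::Eol => {
                if !command.is_empty() {
                    pipeline.push(std::mem::take(&mut command));
                }
                if !pipeline.is_empty() {
                    block.push(std::mem::take(&mut pipeline));
                }
            }
        }
    }
    // Flush whatever is left when the input does not end with `;` or newline.
    if !command.is_empty() {
        pipeline.push(command);
    }
    if !pipeline.is_empty() {
        block.push(pipeline);
    }
    block
}
```

For the tokens of ls | sort-by size; pwd # where am I, this yields two pipelines. The first has the commands [ls] and [sort-by size], and the second has the single command [pwd].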