Intro To Slices

Work in Progress


OK, actually I think I want to split this up into a few pieces so I can focus better on a specific audience. I should also probably not bother with all the screenshots as I go, and just make notes on what I want to show. Then at the end I can get all the shots in one "take".

I think the more useful/interesting addition this page can offer over existing content elsewhere is visualizing the references in more complex slicing situations. This was prompted by strings::split and friends, after all.

  • slice of bytes

  • str

    • vs a slice of runes

  • slice of strings

    • getting the strings stored in memory out of order

    • from split()

    • from dupall

  • slice of padded structs


This page is a beginner's deep dive into Slices in the Hare programming language. I will assume some basic familiarity with (fancy-pants words incoming) "computer programming in an imperative paradigm" - if you know a little Python, C, Go, Java, JavaScript (no relation), Rust, etc. then what follows will probably be familiar to you.

Context: Languages


There have been many, many programming languages created over the years. Computers capable of being programmed entirely within themselves (as opposed to using separate physical media like punched tape, or physical connections like a plugboard or Core Rope Memory) have only existed since the 1940s, but we've already created over 700 different programming languages!

And naturally, with so many languages, their designers have made quite a lot of different choices about what to call things, so the same name can mean different things in different languages. For example: in JavaScript an Array can change length with the push() method, but in Go (and Hare) Arrays have fixed lengths, decided when the program is compiled, before it ever runs.

So, we have also developed various ways of talking about common patterns across programming languages, tied to various theoretical (mathematical) descriptions of computers, programs, and data. Naturally, these theories and ways of talking also have many different names for the same concepts, and occasionally conflicting or confusing uses of the same names...

The result: "Array" can refer to the abstract Array "data type", the Array "data structure", or the set of behaviors and symbols associated with a particular language's notion of an Array, depending on the context. Keep that in mind as we move along.

Slices, Preamble

In the past couple of decades, "slice types" have featured in several popular programming languages, the most notable for us being Go. These types get their name from the conceptual action of "slicing" into existing data. Unlike slicing a loaf of bread, "slicing" data is not destructive. A more descriptive name might be "windowing", since the result of slicing is a "window" into a "range of contiguous data." Let's break that down...
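As a tiny sketch of the "window, not a cut" idea, here is a Hare program (the names loaf and window are just made up for illustration) where slicing leaves the original array whole, and writing through the slice shows up in the array because both refer to the same memory. It assumes a reasonably recent Hare toolchain.

```hare
use fmt;

export fn main() void = {
    let loaf: [4]int = [10, 20, 30, 40];

    // Slicing does not remove anything from `loaf`; the slice only
    // records where its window starts and how many elements it covers.
    let window: []int = loaf[1..3];
    fmt::println("loaf len:", len(loaf), "window len:", len(window))!;

    // Writing through the window writes to the underlying array,
    // because the slice points into loaf's memory rather than a copy.
    window[0] = 99;
    fmt::println("loaf[1] is now", loaf[1])!;
};
```

The array keeps all four elements, while the slice sees only the two in the middle.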

Contiguous Data, aka. Arrays, Vectors, etc.

Hare is a "systems programming" language, so using it often involves thinking explicitly about the layout of data in the computer's memory. "Contiguous data" here refers to multiple pieces of data arranged one after the other in memory, with no other data in between them. The most straightforward way to get contiguous data in Hare is by declaring an Array, like so:

let array_of_ints: [3]int = [3, 2, 1];

This code tells us: when our program is run, there is a region of memory in which the three integers are stored, and the symbol array_of_ints refers to the address of that memory.
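One way to convince yourself the three integers really are contiguous, without a debugger, is a small Hare check like the following sketch. It compares the array's size with the size of one int, and the distance between the addresses of neighbouring elements; the exact numbers printed depend on the platform's int size.

```hare
use fmt;

let array_of_ints: [3]int = [3, 2, 1];

export fn main() void = {
    // An array's storage is just its elements laid end to end, so its
    // size is the element count times the size of one element.
    fmt::println(len(array_of_ints), size(int), size([3]int))!;
    assert(size([3]int) == len(array_of_ints) * size(int));

    // Neighbouring elements sit exactly size(int) bytes apart.
    const first = &array_of_ints[0]: uintptr;
    const second = &array_of_ints[1]: uintptr;
    assert((second - first): size == size(int));
};
```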


Heads Up!

If this is your first time thinking about memory, I recommend first digging all the way down to how computer memory actually works. Here is a pretty decent basic description.

Then I recommend learning more about memory addresses. This video lecture series offers a thorough introduction with C++ code examples.

If we throw in a dummy main() function, we can look at our array with a debugger. With the file main.ha:

export fn main() void = void;
let array_of_ints: [3]int = [3, 2, 1];

running the terminal commands:

hare build -lc -o bin
gdb -q ./bin

gives us a gdb prompt.

the gdb command

info variable array_of_ints

then tells us the address of our array:

[Screenshot: gdb showing the address of array_of_ints]

Now let's take a look at our array in memory. From here on the gdb commands get more esoteric, but thankfully there are many tutorials and reference sheets, like this one or this one.

With the command x/3xw 0x76c28 we can see the three integers comprising our array.

[Screenshot: gdb output of x/3xw 0x76c28]

If instead we use x/12xb 0x76c28 we see the same data, but divided into individual bytes.

[Screenshot: gdb output of x/12xb 0x76c28]

Here we can more clearly see that, on the platform I'm building this code for, each int is 4 bytes long and stored with its "least significant byte" first (aka "little endian").
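To double-check the byte order from inside a program, here is a rough Hare sketch that reinterprets the array's memory as raw bytes and prints them in order. It assumes, like the gdb session, that int is 4 bytes (the assert stops the program otherwise), and the order it prints depends on the machine: on a little-endian one it should match the x/12xb output.

```hare
use fmt;

let array_of_ints: [3]int = [3, 2, 1];

export fn main() void = {
    // Assumption: a 4-byte int, so the whole array occupies 12 bytes.
    assert(size(int) == 4, "this sketch assumes a 4-byte int");

    // View the array's memory as 12 individual bytes and copy them out.
    const bytes = *(&array_of_ints: *[12]u8);
    for (let i = 0z; i < len(bytes); i += 1) {
        fmt::printf("{} ", bytes[i])!;
    };
    fmt::println()!;
};
```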

Now that we've got some contiguous data, let's slice it!

Slices

Slicing in Hare means appending '[', '..' and ']' to an expression, optionally with a start index before the '..' and an end index after it. For example, array_of_ints[0..2] slices the array we just made, starting at index 0 (the first item) and stopping just before index 2 (the third item). The result is a window into our array that shows its first two elements!

This code demonstrates it, and behaves just as we'd expect:

use fmt;

let array_of_ints: [3]int = [3, 2, 1];

export fn main() void = {
    let slice: []int = array_of_ints[0..2];
    fmt::println("slice has len:", len(slice))!;
    fmt::println("the first element is", slice[0])!;
    fmt::println("the second element is", slice[1])!;
};
[Screenshot: running the program above]
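Both indices in a slicing expression are optional: leaving out the start means "from the beginning", and leaving out the end means "through the end". Here is a small sketch of the three shorthand forms against the same array (the names head, tail and whole are just illustrative).

```hare
use fmt;

let array_of_ints: [3]int = [3, 2, 1];

export fn main() void = {
    // No start index: the window begins at index 0, so head is [3, 2].
    const head: []int = array_of_ints[..2];
    // No end index: the window runs to the end, so tail is [2, 1].
    const tail: []int = array_of_ints[1..];
    // Neither index: a window onto the whole array, [3, 2, 1].
    const whole: []int = array_of_ints[..];

    fmt::println(len(head), len(tail), len(whole))!;
    fmt::println(tail[0], tail[1])!;
};
```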

But what have we really done here?

The representation of a slice in Hare is defined to be of this form:

// Declared in the standard library's types module, as types::slice:
export type slice = struct {
    data: nullable *opaque,
    length: size,
    capacity: size,
};

We can perform an easy sanity check for this like so:

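use fmt;

let array_of_ints: [3]int = [3, 2, 1];

export fn main() void = {
    let slice: []int = array_of_ints[0..2];
    fmt::println("slice has len:", len(slice))!;
    // Every []int value has the same size, however many elements it covers.
    fmt::println("this slice has size", size([]int))!;
};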

This prints 24: on this platform pointers and sizes each take up 8 bytes, and a slice holds one pointer and two sizes (8 + 8 + 8 = 24). Here it is in the debugger (please ignore that the address of array_of_ints has changed; I had to modify the code above to make the slice variable visible to gdb):

[Screenshot: the slice variable inspected in gdb]

We can see from gdb that our slice variable is equivalent to the following:

let slice = types::slice {
    data = 0x0076cc8: uintptr: *opaque, // the address of array_of_ints[0]
    length = 2,
    capacity = 3,
};
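The same three fields can also be read from inside the program, without a debugger, by reinterpreting a pointer to the slice as a pointer to types::slice. This is only a sketch for poking at the representation (real code should just use len() and friends), and the capacity it prints is whatever the compiler stored; the gdb session showed 3 for this slice.

```hare
use fmt;
use types;

let array_of_ints: [3]int = [3, 2, 1];

export fn main() void = {
    let slice: []int = array_of_ints[0..2];

    // View the slice value itself as the types::slice struct.
    const repr = &slice: *types::slice;
    fmt::println("length:", repr.length)!;
    fmt::println("capacity:", repr.capacity)!;

    // The data field holds the address of the window's first element.
    const data = repr.data: uintptr;
    const start = &array_of_ints[0]: uintptr;
    fmt::println("points at array_of_ints[0]:", data == start)!;

    // Every slice type is the same size, whatever its element type.
    assert(size([]int) == size(types::slice));
    assert(size([]u8) == size([]int));
};
```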
