Syscalls in Go
  |   Source

syscalls in Go

I was reviewing a PR to a project I know little about the other day, and one of the test was marked with Skip and a comment that it takes too long. This piqued my curiosity, so I took a deeper look. The function in question takes a library of valid characters and produces a random string of some length from that library. It is very straightforward, and would have gone completely unnoticed had it not been for a bit of developer paranoia. The author wrote a test invoking the function 1 million times, and it takes a handful of seconds. "That's odd!", I thought. This is basically string and number manipulation; even inefficient code tends to run quickly on modern machines. So began my journey under the hood of the crypto/rand package.

Here is the original function:

func genRandString(n int) string {
        pool := []rune(SOME_ODD_CHARS)
        b := make([]rune, n)
        for i := range b {
                r, err := rand.Int(rand.Reader, big.NewInt(int64(len(pool))))
                if err != nil {
                        log.Fatalf("encountered an error %v", err)
                }
                b[i] = pool[r.Int64()]
        }
        return string(b)
}

I was in a hurry and not paying attention. My original glance at the code had me thinking that it was searching the space of valid permutations. This was an ignorant thought, but it made me wonder if the necessary bytes could be read at once. As it turns out, there is!

Before we get to that, let me explain what the previous function does; it wasn't obvious to me at first.

  • Initialize a slice of default bytes (value 0) of length n
  • For every byte in the slice
    • Read a random int in the range of the library.
    • Update this byte with the random value
  • Return the mutated slice

I dug into the crypto/rand library a bit and found that you can read the random bytes in one call. Here is the updated function:

func genRandStringV2(n int) string {
        pool := []rune(SOME_ODD_CHARS)
        b := make([]rune, n)
        r := make([]byte, n)
        rand.Read(r)
        for i := range r {
                b[i] = pool[int(r[i])%len(pool)]
        }
        return string(b)
}

Time to test! This won't run on the playground because it relies on time and it takes too long.

Test code

package main

import (
        "crypto/rand"
        "fmt"
        "log"
        "math/big"
        "os"
        "time"
)

const SOME_ODD_CHARS string = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&*?()"

func genRandString(n int) string {
        pool := []rune(SOME_ODD_CHARS)
        b := make([]rune, n)
        for i := range b {
                r, err := rand.Int(rand.Reader, big.NewInt(int64(len(pool))))
                if err != nil {
                        log.Fatalf("encountered an error %v", err)
                }
                b[i] = pool[r.Int64()]
        }
        return string(b)
}

func genRandStringV2(n int) string {
        pool := []rune(SOME_ODD_CHARS)
        b := make([]rune, n)
        r := make([]byte, n)
        rand.Read(r)
        for i := range r {
                b[i] = pool[int(r[i])%len(pool)]
        }
        return string(b)
}

func main() {
        if len(os.Args) != 2 {
                log.Fatalf("usage: script [v1, v2]")
        }
        if os.Args[1] == "v1" {
                start := time.Now()
                for i := 0; i < 1000000; i++ {
                        genRandString(10)
                }
                fmt.Printf("First version took %v\n", time.Since(start))
        } else {
                start := time.Now()
                for i := 0; i < 1000000; i++ {
                        genRandStringV2(10)
                }
                fmt.Printf("Second version took %v\n", time.Since(start))
        }
}

Performance

 go run cmd/main.go v1
First version took 5.958197642s

❯ go run cmd/main.go v2
Second version took 375.601598ms

Awesome, mission accomplished! Leave a note on the PR and walk away...

As I was writing the note, I realized that I completely misread the original function in the first place. Functionally speaking, there is no difference between the two versions. Rolling a fair die n times is statistically no different than rolling n dice one time. In my naïve view of computing things, I saw bunch of stuff flying around on the heap, some CPU, and dash of magic.

I sat back and thought about this for a bit. Cryptographic randomness requires a bit of entropy, so maybe the difference is side effects. Maybe there is some setup overhead in the crypto library.

There is an init function in the library, but that is only called once. The folks in our internal chat had to draw me a picture before the light went off, but the basic idea was "it's faster to read n bytes at once than to read 1 byte n times." Again, my head was stuck in the land of pure functions; why should this matter?! ... unless there are side effects.

The right idea finally made its way into my thick skull. I did some interweb searching and figured out how to use strace. I first searched for entropy, but that came up empty. Next, I tried random. Eureka!

Tracing syscalls

 strace -T ./main v1 2>&1 | rg "random" | wc -l
2979598 strace -T ./main v2 2>&1 | rg "random" | wc -l
432364

Anyone who does frequent systems programming probably stopped reading long ago. My domain is network automation, so the majority of my code is talking to slow network devices or making calls to APIs over the network which tend to be slow as well. In other words, this overhead is so bad that scripts and programs often take several seconds to run. What I learned from this is that syscalls are black boxes just like the network. You are making an API call and turning over control to another entity. Who knows what the kernel is going to do or when it will get around to doing it.

I am glad I chased this rabbit down the hole because I feel like it was a valuable lesson. No, I don't need to prematurely optimize away syscalls in network-heavy code, but I do need to expand my view of the computing universe and how things work under the hood. Supply chains have been disrupted, inflation is crazy high, and companies are looking to cut costs at every turn. As developers, we should probably try to be a bit more efficient in the systems we create because economic laws may bite us long before Moore's law does!