groot: reading ROOT data, with Go, faster than ROOT

JI-2021

16 November 2021

Sebastien Binet

CNRS/IN2P3/LPC-Clermont

(a brief) History of software in HEP

2

50's-90's: FORTRAN77

c     == hello.f ==
      program main
      implicit none
      write ( *, '(a)' ) 'Hello from FORTRAN'
      stop
      end
$ gfortran -c hello.f && gfortran -o hello hello.o
$ ./hello
Hello from FORTRAN
3

90's-...: C++

#include <cstdlib>   // for EXIT_SUCCESS
#include <iostream>
int main(int, char **) {
  std::cout << "Hello from C++" << std::endl;
  return EXIT_SUCCESS;
}
$ c++ -o hello hello.cxx && ./hello
Hello from C++
4

00's-...: python

print "Hello from python"
$ python ./hello.py
Hello from python
5

ROOT

6

ROOT: a file format

ROOT is a set of C++ libraries and a data-analysis framework for High Energy Physics data.

But it also comes with a file format.

Basically, all the LHC data (including the data used to discover the Higgs boson) is stored using ROOT.

It's TTrees all the way down

7

Reading ROOT data

C++ is currently the lingua franca of HEP software.

Python is coming up fast on its heels.

What should another framework or set of libraries, written in an experimental, innovative, or brand-new language, do to read/write ROOT data?

Two avenues come to mind: reimplement the ROOT file format from scratch, in the new language;

or: reuse the existing C++ library, wrapping it (manually, or via SWIG).

8

Reading ROOT data

I initially tried the latter avenue:

Unfortunately for Go, calling C from Go is quite expensive (an overhead on the order of ~10 function calls). Let alone calling C++ from C from Go.

It's also quite inconvenient to require installing ROOT (and all its dependencies), as this breaks the seamless experience of installing Go code:

## may fail because of missing dependencies
$> go get github.com/go-hep/croot

⇒ Implement reading/writing ROOT data from Go, from first principles.

9

Reading ROOT data w/o ROOT

There are actually many attempts/libraries to read (and sometimes write too) ROOT data w/o ROOT, in a variety of languages.

Several of these projects have been sharing tips and knowledge about the ROOT file format.

10

How to read ROOT files w/o ROOT?

11

ROOT file format

ROOT files are binary files: reading them back requires knowing the exact byte layout (endianness, sizes, offsets, ...) of everything stored inside.

These issues are usually addressed by a document specifying the file format.

Unfortunately, ROOT doesn't provide such a clear description of its on-disk format...

12

Reading back binary files

Once saved, a float64, a float32, or your favorite struct are all just sequences of bytes. How does one chop this stream back up into structured data?

i.e.: how does one read back:

[]byte{42, 42,  1, 154,  2, 0,  0,  0, 4,  0,  0,   0,  0, 0,  0, 24,
       45, 68, 84, 251, 33, 9, 64,
}

How does one know that it's a (uint8, uint16, uint32, int64, float64)? And not, e.g., 23x(uint8)? (or any other combination)
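For illustration, a minimal Go sketch that imposes the first interpretation on that stream with encoding/binary. The little-endian byte order here is an assumption, chosen so that the last 8 bytes decode to π (ROOT itself stores its on-disk data big-endian):

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "log"
)

func main() {
    raw := []byte{42, 42, 1, 154, 2, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 24,
        45, 68, 84, 251, 33, 9, 64,
    }
    var v struct {
        U8  uint8
        U16 uint16
        U32 uint32
        I64 int64
        F64 float64
    }
    // encoding/binary decodes the struct fields in order, with no padding.
    err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &v)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%+v\n", v) // {U8:42 U16:298 U32:666 I64:1024 F64:3.141592653589793}
}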

This is usually addressed with a TLV (Tag-Length-Value) encoding scheme.

One nice property of TLV: it allows readers to skip over unknown data.
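A minimal Go sketch of that property, with a hypothetical framing (1-byte tag followed by a 4-byte big-endian length; this is not ROOT's actual layout):

package sketch

import (
    "encoding/binary"
    "io"
)

// walk reads TLV records from r, skipping records whose tag is unknown.
func walk(r io.Reader, known map[byte]func(payload []byte) error) error {
    for {
        var hdr [5]byte // 1-byte tag + 4-byte big-endian length
        if _, err := io.ReadFull(r, hdr[:]); err != nil {
            if err == io.EOF {
                return nil // clean end of stream
            }
            return err
        }
        size := int64(binary.BigEndian.Uint32(hdr[1:]))
        decode, ok := known[hdr[0]]
        if !ok {
            // unknown tag: the length field lets us skip its payload entirely.
            if _, err := io.CopyN(io.Discard, r, size); err != nil {
                return err
            }
            continue
        }
        payload := make([]byte, size)
        if _, err := io.ReadFull(r, payload); err != nil {
            return err
        }
        if err := decode(payload); err != nil {
            return err
        }
    }
}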

13

ROOT uses a kind of TLV scheme.

14

ROOT file

At a very high-level, a ROOT file can be seen as:

[file header]
[record1]
[record2]
[...]
[file footer]

The ROOT file header encodes: a "root" magic number, the file-format version, the locations of the first and last data records, the compression setting, and the location of the streamers.

The ROOT file footer holds the streamers (metadata about types).
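As a taste of the bootstrap, a sketch that checks the magic number and reads the file-format version (a 4-byte "root" magic followed by a big-endian int32 version):

package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "log"
    "os"
)

func main() {
    f, err := os.Open("out.root")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    var hdr [8]byte
    if _, err := io.ReadFull(f, hdr[:]); err != nil {
        log.Fatal(err)
    }
    if string(hdr[:4]) != "root" { // every ROOT file starts with this magic
        log.Fatal("not a ROOT file")
    }
    vers := binary.BigEndian.Uint32(hdr[4:]) // on-disk data is big-endian
    fmt.Printf("ROOT file version: %d\n", vers)
}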

15

A ROOT record consists of its key (TKey) and its associated data (an opaque binary blob).

16

TKey

The on-disk representation of a TKey is:

| Data Member | Explanation |
|-------------|-------------|
|  fNbytes    | Number of bytes for the compressed object and key. |
|  fObjlen    | Length of uncompressed object. |
|  fDatime    | Date/Time when the object was written. |
|  fKeylen    | Number of bytes for the key structure. |
|  fCycle     | Cycle number of the object. |
|  fSeekKey   | Address of the object on file (points to fNbytes). This is redundant information, used to cross-check the file's integrity. |
|  fSeekPdir  | Pointer to the directory supporting this object.|
|  fClassName | Object class name. |
|  fName      | Name of the object. |
|  fTitle     | Title of the object. |
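In Go, that header maps naturally onto a plain struct. A sketch only: the exact integer widths are assumptions (ROOT uses 32-bit seek pointers for small files and 64-bit ones for large files), and groot's actual type lives in its riofs package:

// tkey mirrors the on-disk TKey header described above.
type tkey struct {
    nbytes   int32  // number of bytes for the compressed object and key
    objlen   int32  // length of the uncompressed object
    datime   uint32 // date/time the object was written
    keylen   int16  // number of bytes for the key structure
    cycle    int16  // cycle number of the object
    seekKey  int64  // address of the object on file (points to nbytes)
    seekPdir int64  // pointer to the directory supporting this object
    class    string // object class name
    name     string // name of the object
    title    string // title of the object
}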
17

Deserialization

Knowing the position and length of the user data to read, ROOT knows how to unmarshal/deserialize that user data: the "how-to" part is done by cross-referencing the name of the user data class with the list of streamers stored within the ROOT file footer.

ROOT stores metadata about the types stored in a ROOT file within that very file: ROOT files are self-describing. You don't need any a priori knowledge about the data stored inside a ROOT file to be able to correctly interpret all its bytes.

One "just" needs to be able to correctly interpret and decode/encode:

and the bootstrap process is complete.

18

TStreamerInfo & TStreamerElement

A TStreamerInfo encodes metadata about a class: its name, its version, and the list of its members, each described by a TStreamerElement.

e.g.: ROOT describes the C++ type P3 below:

struct P3 {
    int    Px;
    double Py;
    int    Pz;
};

as:

StreamerInfo for "P3" version=1 title=""
 int    Px      offset=  0 type=  3 size=  4  
 double Py      offset=  4 type=  8 size=  8  
 int    Pz      offset= 12 type=  3 size=  4  
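On the Go side, a reader could map this streamer description onto a plain struct (a sketch; C++ int is assumed to map to Go's int32):

// Go equivalent of the streamed C++ P3 above.
type P3 struct {
    Px int32
    Py float64
    Pz int32
}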
19

The streamer-element vocabulary is quite exhaustive and allows one to represent pretty much every possible C++ class:

StreamerElement              // encodes name, size, shape, min/max, ... 
|
+-- StreamerBase             // base class
+-- StreamerBasicType        // builtin type (char, uint16, ...)
+-- StreamerBasicPointer     // pointer to a builtin type
+-- StreamerLoop             // repeats of a type
+-- StreamerObject           // a TObject
+-- StreamerObjectAny        // a user class
+-- StreamerObjectAnyPointer // pointer to a user class
+-- StreamerObjectPointer    // pointer to a TObject
+-- StreamerSTL              // STL container (vector/set/map/pair/unordered_xxx/...)
+-- StreamerSTLstring        // std::string
+-- StreamerString           // TString

ROOT can support reading multiple versions of a type, through the version field of the TStreamerInfo.

MyClass at version 1 may have two fields f1 and f2, both of type float; at version 2, those same fields may be of types float and double.

20

Reading ROOT files w/o ROOT

Pretty simple, right? Right... (w/o having to reverse-engineer the file format, it might have been.)

But that doesn't cover the main physicist use case: TTrees.

The (most used) ROOT API to read/write user data is through TFile and TTree.

ROOT stores data in binary files, organized into TDirectories and TKeys.

21

TFile + TKey

With TFile and TKey, one could already address a simple physicist use-case: recording events from a detector readout to a file.

We could do something like:

f, err := groot.Create("out.root")
if err != nil {
    panic(err)
}
defer f.Close()

for i, evt := range detector.Readout() {
  log.Printf("recording event %d...", i)
  key := fmt.Sprintf("evt-%03d", i)
  err := f.Put(key, evt)
  if err != nil {
      panic(err)
  }
}
22

TFile + TKey

But: every event is written under its own TKey (with its per-key bookkeeping overhead: name, title, timestamps, ...), and reading back a single variable means deserializing entire events.

It's doable (that's more or less what the Python folks do with pickles.) But it's no panacea.

Enter TTree...

23

TTree

TTree is an API to efficiently store and retrieve large collections of structured user data.

Once a TTree is filled and written to a TFile, one can read it back, re-attaching variables to each of its branches (or a subset thereof), to inspect the stored data.

A TTree is kind of like a database table, where you can store builtin types, fixed- and variable-size arrays, and (possibly nested) containers and user classes.

24

void write() {
    auto f = TFile::Open("out.root", "RECREATE");
    auto t = new TTree("t", "title");

    int32_t n = 0;
    double  px = 0;
    double  arr[10];
    double  vec[20];

    t->Branch("n",   &n,  "n/I");
    t->Branch("px",  &px, "px/D");
    t->Branch("arr", arr, "arr[10]/D");
    t->Branch("vec", vec, "vec[n]/D");

    for (int i = 0; i < NEVTS; i++) {
        // fill data: n, px, arr, vec with some values
        fill_data(&n, &px, arr, vec); // arrays decay to pointers

        t->Fill(); // commit data to tree.
    }

    f->Write(); // commit data to disk.
    f->Close();

    exit(0);
}
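For comparison, a sketch of the equivalent write in Go with groot. This is hedged: it assumes groot's rtree.NewWriter and rtree.WriteVar API, whose exact shape may differ between groot versions, and only writes the two scalar branches:

package main

import (
    "log"

    "go-hep.org/x/hep/groot"
    "go-hep.org/x/hep/groot/rtree"
)

func main() {
    f, err := groot.Create("out.root")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    var (
        n  int32
        px float64
    )
    // one WriteVar per branch, bound to the Go variables above.
    w, err := rtree.NewWriter(f, "t", []rtree.WriteVar{
        {Name: "n", Value: &n},
        {Name: "px", Value: &px},
    })
    if err != nil {
        log.Fatal(err)
    }
    defer w.Close()

    for i := 0; i < 100; i++ {
        n, px = int32(i), float64(i)/10 // fill data with some values
        if _, err := w.Write(); err != nil { // commit data to tree
            log.Fatal(err)
        }
    }
}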
25

26

TTree (x-ray) scan

TTree reuses much of the TKey + TStreamers infrastructure.

When one connects a branch with some user data, the TTree knows, from that point on, how to serialize/deserialize the user data into chunks of bytes: TBaskets, in ROOT speak.

To support arbitrarily nested containers/user data, ROOT introduces the notion of branches with sub-branches with sub-branches, ... that, in the end, have leaves.

This is controlled by the "split level" of a tree.

27

TTree writing

auto t = new TTree("t", "my tree");
auto n = int32_t(0);
struct {
    int32_t i32;
    int64_t i64;
    double  f64;
} d;
t->Branch("n", &n, "n/I");
t->Branch("d", &d);

// -> leaf_n = TLeaf<int32_t>(t, "n");
// -> leaf_d = TLeaf<struct> (t, "d");
28

TTree writing (modes)

[n, d], [n, d], [n, d], [n, d], ...     // row-wise: whole entries, one after another
[n, n, n, n, ...], [d, d, d, d, ...]    // column-wise: one stream per branch
[n, n, n, n, ...], [d.i32, d.i32, d.i32, d.i32, ...],
[d.i64, d.i64, d.i64, d.i64, ...], [d.f64, d.f64, d.f64, d.f64, ...]
                                        // fully split: one stream per leaf/member

All these different ways of storing data are, ultimately, represented as TBaskets holding the serialized representation of those chunks of user data as bytes.

Each TBasket's payload is compressed (or not). A TBasket's payload may contain data from multiple entries.
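For instance, a zlib-compressed payload can be inflated with Go's standard library. A sketch only: real ROOT payloads carry a small 9-byte header identifying the compression algorithm and sizes, which this ignores:

package sketch

import (
    "bytes"
    "compress/zlib"
    "io"
)

// inflate decompresses a zlib-compressed basket payload into the raw,
// serialized bytes of its entries.
func inflate(compressed []byte) ([]byte, error) {
    rc, err := zlib.NewReader(bytes.NewReader(compressed))
    if err != nil {
        return nil, err
    }
    defer rc.Close()
    return io.ReadAll(rc)
}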

29

TTree reading

auto f = TFile::Open("out.root", "READ");
auto t = f->Get<TTree>("t");

auto n = int32_t(0);
struct {
    int32_t i32;
    int64_t i64;
    double  f64;
} d;

t->SetBranchAddress("n", &n);
t->SetBranchAddress("d", &d);

for (int64_t i = 0; i < t->GetEntries(); i++) {
    t->GetEntry(i);
    printf("evt=%d, n=%d, d.i32=%d, d.i64=%d, d.f64=%f\n",
            i, n, d.i32, d.i64, d.f64);
}
30

TTree reading

Once a TTree is requested, ROOT needs to locate it on disk and then deserialize it (only the "metadata", not the full associated dataset payload) using the usual ROOT machinery (streamers+TKey).

A TTree knows its total number of entries and the list of its branches.

A TBranch knows the list of its baskets, where each basket lives on file, and which range of entries each basket holds.

31

TTree reading

Whenever somebody asks to read entry n from disk, a branch locates the basket holding that entry, reads and decompresses its payload, and deserializes entry n into the user-provided variable (a sketch of the lookup step follows below).

And voilà, you know how (at a very coarse level) TTrees read and present data to users.
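The lookup step can be as simple as a binary search over the branch's table of basket boundaries. A minimal Go sketch, where basketEntry mirrors the role of TBranch's fBasketEntry array (the first entry of each basket):

package sketch

import "sort"

// basketOf returns the index of the basket holding entry n, given the
// sorted first entry of each basket.
func basketOf(basketEntry []int64, n int64) int {
    // locate the first basket starting beyond n...
    i := sort.Search(len(basketEntry), func(i int) bool {
        return basketEntry[i] > n
    })
    return i - 1 // ...entry n lives in the previous one
}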

32

Go-HEP/groot

33

groot

groot is a pure-Go implementation of (a subset of) ROOT.

34

groot reading

    f, err := groot.Open("f.root")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    o, err := f.Get("t")
    if err != nil {
        log.Fatal(err)
    }

    v := struct {
        N int32 `groot:"n"`
        D struct {
            I32 int32   `groot:"i32"`
            I64 int64   `groot:"i64"`
            F64 float64 `groot:"f64"`
        } `groot:"d"`
    }{}

    r, err := rtree.NewReader(o.(rtree.Tree), rtree.ReadVarsFromStruct(&v))
    if err != nil {
        log.Fatal(err)
    }
    defer r.Close()

    err = r.Read(func(ctx rtree.RCtx) error {
        fmt.Printf(
            "evt=%d, n=%d, d.i32=%d, d.i64=%d, d.f64=%v\n",
            ctx.Entry, v.N, v.D.I32, v.D.I64, v.D.F64,
        )
        return nil
    })
    if err != nil {
        log.Fatal(err)
    }
35

groot reading speed

Reading some ATLAS data, with Go-HEP v0.26, compared to ROOT/C++ 6.20

5.2 ms/kEvt (3.7 s for 720 kEvts)  [groot v0.26]
2.6 ms/kEvt (1.9 s for 720 kEvts)  [ROOT  v6.20]

"Only" ~2 times slower, w/o any optimization wrt baskets buffering, TTreeCache, ...

And that was like that since the inception of groot.

36

groot reading speed

Reading some ATLAS data, with Go-HEP v0.26, compared to ROOT/C++ 6.20

5.2 ms/kEvt (3.7 s for 720 kEvts)  [groot v0.26]
2.6 ms/kEvt (1.9 s for 720 kEvts)  [ROOT  v6.20]

And it had been like that since the inception of groot.

Until v0.27 (released in May 2020):

1.6 ms/kEvt (1.1 s for 720 kEvts)  [groot v0.27]
2.6 ms/kEvt (1.9 s for 720 kEvts)  [ROOT  v6.20]

Almost twice as fast as ROOT :)

See, for more information:

How come groot is faster than ROOT at reading ROOT data?

Thanks to Go's lightweight goroutines...

37

groot & Go

Go is known to compile very fast and to run reasonably fast. But at the moment, Go binaries are usually slower than C++ ones for number crunching.

How could a Go binary be faster than a C++ one?

Reading a TTree is basically: reading baskets from disk, decompressing them, and deserializing their payloads into user data. In other words, work dominated by I/O and latencies:

38

Latency

           0.5 ns - CPU L1 dCACHE reference
           1   ns - speed-of-light (a photon) travels a 1 ft (30.5 cm) distance
           5   ns - CPU L1 iCACHE Branch mispredict
           7   ns - CPU L2  CACHE reference
          71   ns - CPU cross-QPI/NUMA best  case on XEON E5-46*
         100   ns - MUTEX lock/unlock
         100   ns - own DDR MEMORY reference
         135   ns - CPU cross-QPI/NUMA best  case on XEON E7-*
         202   ns - CPU cross-QPI/NUMA worst case on XEON E7-*
         325   ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
      10,000   ns - Compress 1K bytes with Zippy PROCESS
      20,000   ns - Send 2K bytes over 1 Gbps NETWORK
     250,000   ns - Read 1 MB sequentially from MEMORY
     500,000   ns - Round trip within a same DataCenter
  10,000,000   ns - DISK seek
  10,000,000   ns - Read 1 MB sequentially from NETWORK
  30,000,000   ns - Read 1 MB sequentially from DISK
 150,000,000   ns - Send a NETWORK packet CA -> Netherlands
39

groot rtree Reader

With the new re-engineered rtree.Reader, groot can infer the set of branches that will be read and the whole span of entries the user will iterate over.

Thus, for each requested branch, groot can prefetch and decompress upcoming baskets ahead of time, concurrently, in dedicated goroutines.

So when one requests entry N, everything is already in memory, ready to be used.
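A minimal sketch of the read-ahead idea (not groot's actual implementation): one goroutine per branch inflates baskets ahead of the consuming loop, handing them over through a buffered channel.

package sketch

// decompress is a stand-in for basket inflation (see the zlib sketch earlier);
// here it just returns its input.
func decompress(raw []byte) []byte { return raw }

// prefetch inflates the baskets of one branch ahead of the consuming loop.
func prefetch(baskets [][]byte) <-chan []byte {
    out := make(chan []byte, 4) // keep up to 4 baskets ready in memory
    go func() {
        defer close(out)
        for _, raw := range baskets {
            out <- decompress(raw) // I/O + decompression happen off the hot loop
        }
    }()
    return out
}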

40

41

groot rtree

An additional concurrency axis (not yet implemented) would be to have N concurrent goroutines each requesting/handling one entry of the tree (and filling in turn the user data)...

but being already ~2x faster than ROOT isn't too bad.

Now, the same kind of optimization should also be applied to writing...

That's all folks
42

Conclusions & Prospects

It is possible to read ROOT data faster than ROOT itself:

## CMS data

name                                  time/op
ReadCMSScalar/GoHEP/Zlib-8            3.92s ± 2%  // only read scalar data
ReadCMSScalar/ROOT-TreeBranch/Zlib-8  7.98s ± 2%  // ditto
ReadCMSScalar/ROOT-TreeReader/Zlib-8  6.60s ± 2%  // ditto

name                                  time/op
ReadCMSAll/GoHEP/Zlib-8               18.4s ± 1%  // read all branches
ReadCMSAll/ROOT-TreeBranch/Zlib-8     30.4s ± 2%  // ditto
ReadCMSAll/ROOT-TreeReader/Zlib-8     [N/A]       // comparison meaningless (b/c of loading-on-demand)

## ATLAS data

1.6 ms/kEvt (1.1 s for 720 kEvts)  [groot v0.27]
2.6 ms/kEvt (1.9 s for 720 kEvts)  [ROOT  v6.20]

(in spite of the lack of specifications for ROOT's file format)

43

Conclusions & Prospects

ROOT-7 is slated to ship with a new file format: RNTuple.

Its specifications are a bit more fleshed out, and there are plans to provide a C API (as well as storage outside of .root files).


groot will probably try to implement that format as well.

44

Thank you
