ROOT I/O & groot

LPC-Dev, 2020-06-26

Sebastien Binet





ROOT is many things:

and... a way to write and read user data.

The (most used) ROOT API to read/write user data is through:

(ROOT-7 will most probably introduce a new API based on RNTuple)

ROOT stores data in binary files, organized into TDirectories and TKeys.


Reading ROOT files w/o ROOT

Pretty simple, right? Right... (well, without having to reverse-engineer the file format, it might have been.)

But that doesn't cover the main physicist use case: TTrees.


TFile + TKey

With TFile and TKey, one could already address the following physicist use-case:

we could do something like:

f, err := groot.Create("out.root")
if err != nil { panic(err) }
defer f.Close()

for i, evt := range detector.Readout() {
  log.Printf("recording event %d...", i)
  key := fmt.Sprintf("evt-%03d", i)
  err := f.Put(key, evt)
  if err != nil { panic(err) }
}

TFile + TKey


It's doable (that's more or less what one does in Python with pickles), but it's no panacea.

Enter TTree...



TTree is an API to:

Once a TTree is filled and written to a TFile, one can read it back, re-attaching variables to each of its branches (or a subset thereof), to inspect the stored data.

A TTree is kind of like a database, where you can store:


void write() {
    auto f = TFile::Open("out.root", "RECREATE");
    auto t = new TTree("t", "title");

    int32_t n = 0;
    double  px = 0;
    double  arr[10];
    double  vec[20];

    t->Branch("n",   &n,  "n/I");
    t->Branch("px",  &px, "px/D");
    t->Branch("arr", arr, "arr[10]/D");
    t->Branch("vec", vec, "vec[n]/D");

    for (int i = 0; i < NEVTS; i++) {
        // fill data: n, px, arr, vec with some values
        fill_data(&n, &px, arr, vec);

        t->Fill(); // commit data to tree.
    }

    f->Write(); // commit data to disk.
}



TTree (x-ray) scan

TTree reuses much of the TKey + TStreamers infrastructure.

When one connects a branch with some user data:

at that point, the TTree knows how to serialize/deserialize the user data into chunks of bytes, TBasket in ROOT speak.

To support arbitrarily nested containers/user-data, ROOT introduces the notion of branches with sub-branches with sub-branches, ... that, ultimately, have leaves.

This is controlled by the "split level" of a tree.


TTree writing

auto t = new TTree("t", "my tree");
auto n = int32_t(0);
struct {
    int32_t i32;
    int64_t i64;
    double  f64;
} d;

t->Branch("n", &n, "n/I");
t->Branch("d", &d);

// -> leaf_n = TLeaf<int32_t>(t, "n");
// -> leaf_d = TLeaf<struct> (t, "d");

TTree writing (modes)

All these different ways of storing data are, ultimately, represented as TBaskets holding the serialized representation of these [...] user data as bytes.

Each TBasket's associated payload is compressed (or not). A TBasket payload may contain data from multiple entries.
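The grouping and compression can be sketched in a few lines of Go (a toy model, not the real TBasket layout): entries accumulate in a buffer, and flushing emits one compressed blob holding all of them.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// basket accumulates serialized entries and, on flush, emits one
// compressed payload holding all of them -- loosely mimicking how
// a TBasket groups several entries into a single blob.
type basket struct {
	buf bytes.Buffer
	n   int // entries buffered so far
}

func (b *basket) add(entry []byte) { b.buf.Write(entry); b.n++ }

func (b *basket) flush() []byte {
	var out bytes.Buffer
	w := zlib.NewWriter(&out)
	w.Write(b.buf.Bytes())
	w.Close()
	b.buf.Reset()
	b.n = 0
	return out.Bytes()
}

func inflate(p []byte) []byte {
	r, _ := zlib.NewReader(bytes.NewReader(p))
	raw, _ := io.ReadAll(r)
	return raw
}

func main() {
	var b basket
	for i := 0; i < 3; i++ {
		b.add([]byte("entry-data")) // several entries, one basket
	}
	blob := b.flush()
	fmt.Println(len(inflate(blob))) // 30 bytes of raw entry data
}
```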


TTree reading

auto f = TFile::Open("out.root", "READ");
auto t = f->Get<TTree>("t");

int32_t n = 0;
struct {
    int32_t i32;
    int64_t i64;
    double  f64;
} d;

t->SetBranchAddress("n", &n);
t->SetBranchAddress("d", &d);

for (int64_t i = 0; i < t->GetEntries(); i++) {
    t->GetEntry(i); // load entry i into n and d.
    printf("evt=%lld, n=%d, d.i32=%d, d.i64=%lld, d.f64=%f\n",
            i, n, d.i32, d.i64, d.f64);
}

TTree reading

Once a TTree is requested, ROOT needs to locate it on disk and then deserialize it (only the "metadata", not the full associated dataset payload) using the usual ROOT machinery (streamers+TKey).

A TTree knows:

A TBranch knows:


TTree reading

Whenever somebody asks to read entry n from disk:

And voilà, you know how (at a very coarse level) TTrees read and present data to users.
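The first step, finding which basket holds entry n, amounts to a lookup in the branch's table of basket first-entry offsets. A coarse sketch in Go (invented names, stdlib only):

```go
package main

import (
	"fmt"
	"sort"
)

// basketOf returns the index of the basket holding entry n, given
// the (sorted) first-entry offset of each basket -- a very coarse
// model of the lookup a branch does before fetching, decompressing
// and deserializing the right basket.
func basketOf(offsets []int64, n int64) int {
	// first basket whose first entry is > n, minus one.
	i := sort.Search(len(offsets), func(i int) bool {
		return offsets[i] > n
	})
	return i - 1
}

func main() {
	// three baskets, starting at entries 0, 100 and 250.
	offsets := []int64{0, 100, 250}
	fmt.Println(basketOf(offsets, 42))  // 0
	fmt.Println(basketOf(offsets, 100)) // 1
	fmt.Println(basketOf(offsets, 300)) // 2
}
```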





groot is a pure-Go implementation of (a subset of) ROOT.


groot reading

    f, err := groot.Open("f.root")
    if err != nil { panic(err) }
    defer f.Close()
    o, err := f.Get("t")
    if err != nil { panic(err) }

    var v struct {
        N int32 `groot:"n"`
        D struct {
            I32 int32   `groot:"i32"`
            I64 int64   `groot:"i64"`
            F64 float64 `groot:"f64"`
        } `groot:"d"`
    }

    r, err := rtree.NewReader(o.(rtree.Tree), rtree.ReadVarsFromStruct(&v))
    if err != nil { panic(err) }
    defer r.Close()

    err = r.Read(func(ctx rtree.RCtx) error {
        fmt.Printf(
            "evt=%d, n=%d, d.i32=%d, d.i64=%d, d.f64=%v\n",
            ctx.Entry, v.N, v.D.I32, v.D.I64, v.D.F64,
        )
        return nil
    })

groot reading speed

Reading some ATLAS data, with Go-HEP v0.26, compared to ROOT/C++ 6.20

5.2 ms/kEvt (3.7 s for 720 kEvts)  [groot v0.26]
2.6 ms/kEvt (1.9 s for 720 kEvts)  [ROOT  v6.20]

And it had been like that since groot's inception. Until v0.27 (released in May 2020):

1.6 ms/kEvt (1.1 s for 720 kEvts)  [groot v0.27]
2.6 ms/kEvt (1.9 s for 720 kEvts)  [ROOT  v6.20]

Almost twice as fast as ROOT :)

See, for more information:

How come groot is faster than ROOT to read ROOT data?

Thanks to Go's lightweight goroutines...


groot & Go

Go is known to be very fast to compile and relatively fast to execute. But at the moment, Go binaries are usually slower than their C++ counterparts for number crunching.

How could a Go binary be faster than a C++ one?

Reading a TTree is basically:



           0.5 ns - CPU L1 dCACHE reference
           1   ns - speed-of-light (a photon) travels a 1 ft (30.5 cm) distance
           5   ns - CPU L1 iCACHE Branch mispredict
           7   ns - CPU L2  CACHE reference
          71   ns - CPU cross-QPI/NUMA best  case on XEON E5-46*
         100   ns - MUTEX lock/unlock
         100   ns - own DDR MEMORY reference
         135   ns - CPU cross-QPI/NUMA best  case on XEON E7-*
         202   ns - CPU cross-QPI/NUMA worst case on XEON E7-*
         325   ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
      10,000   ns - Compress 1K bytes with Zippy PROCESS
      20,000   ns - Send 2K bytes over 1 Gbps NETWORK
     250,000   ns - Read 1 MB sequentially from MEMORY
     500,000   ns - Round trip within a same DataCenter
  10,000,000   ns - DISK seek
  10,000,000   ns - Read 1 MB sequentially from NETWORK
  30,000,000   ns - Read 1 MB sequentially from DISK
 150,000,000   ns - Send a NETWORK packet CA -> Netherlands
(the digit columns above line up with the ns / µs / ms scales)

groot rtree Reader

With the new re-engineered rtree.Reader, groot can infer:

and thus, for each requested branch:

So when one requests entry N, everything is already in memory, ready to be used.
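The readahead idea can be sketched with one goroutine per branch parking decompressed baskets in a buffered channel, so I/O and decompression overlap with the consumer's processing. This is a stdlib-only sketch of the concept, not rtree.Reader's actual code; `prefetch` and `depth` are made-up names.

```go
package main

import "fmt"

// prefetch launches a goroutine that reads baskets ahead of the
// consumer, parking up to `depth` payloads in a buffered channel.
func prefetch(baskets [][]byte, depth int) <-chan []byte {
	ch := make(chan []byte, depth)
	go func() {
		defer close(ch)
		for _, b := range baskets {
			// in real life: read from disk + decompress here,
			// overlapping that work with the consumer's loop.
			ch <- b
		}
	}()
	return ch
}

func main() {
	baskets := [][]byte{{1}, {2, 3}, {4, 5, 6}}
	n := 0
	for b := range prefetch(baskets, 2) {
		n += len(b) // consumer sees baskets already "in memory"
	}
	fmt.Println(n) // 6
}
```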


groot rtree

An additional concurrency axis (not yet implemented) would be to have N concurrent goroutines each requesting/handling one entry of the tree (and filling in turn the user data)...

But being already ~2x faster than ROOT isn't too bad.

Now, the same kind of optimization should also be applied to writing...

That's all folks

Thank you
