database/sql: provide optional way to mitigate convT2E allocations #6918

Open
bradfitz opened this issue Dec 9, 2013 · 12 comments
@bradfitz
Contributor

bradfitz commented Dec 9, 2013

In debugging the performance of Camlistore start-up (which slurps the index from disk to
memory), I compared two popular MySQL drivers' allocations.  Even the one that was being
careful to not allocate (go-mysql-driver) was still allocating memory proportional to
the data size (169.55 bytes in MySQL and 153.36 allocated):

2013/12/09 03:32:55 2512532 rows, 177786156 bytes; took 2.375200197s, 945ns each, 71.38
MB/s
2013/12/09 03:32:55 allocated = 160809408

I looked into where the allocations were coming from.

In the go-mysql-driver's Rows.Next (http://golang.org/pkg/database/sql/driver/#Rows)
call:

     .  153.5   597: dest[i], isNull, n, err = readLengthEncodedString(data[pos:])
....
     .      .      45d2bd: MOVQ AX,18(SP)
     .  153.5      45d2c2: CALL runtime.convT2E(SB)


It's exclusively from runtime.convT2E calls, assigning a []byte to an interface{}
(driver.Value).

160809408 bytes / 2512532 rows / 2 columns = 32 bytes allocated per scanned string/[]byte
column, even when the user of database/sql is trying to reduce allocations with
sql.RawBytes.

Proposal:

Right now, the http://golang.org/pkg/database/sql/driver/#Rows Next method's dest
[]Value slice is purely an output that the driver is expected to write into.

If we change it to also be an optional input to the driver, we could supply it with
*[]byte values, and drivers can either work as they do today and write to dest, or
new/updated drivers can notice a *[]byte in dest, use that pointer to assign directly,
and signal that they've done so with a sentinel value:

   *(dest[i].(*[]byte)) = rawByteSlice
   dest[i] = driver.SentinelValue

driver.SentinelValue can be a pointer, so it fits into the empty interface without
allocation.
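
A rough sketch of how a driver's Next could use this; driver.SentinelValue does not exist
today, and mysqlRows/readColumn are made-up names for illustration:

func (r *mysqlRows) Next(dest []driver.Value) error {
    for i := range dest {
        raw, err := r.readColumn(i) // raw is a []byte into the driver's read buffer
        if err != nil {
            return err
        }
        if p, ok := dest[i].(*[]byte); ok {
            // database/sql pre-populated dest[i] with the user's destination:
            // write through the pointer and mark the slot as already handled.
            *p = raw
            dest[i] = driver.SentinelValue
        } else {
            // Existing behavior: boxing raw into the interface allocates (convT2E).
            dest[i] = raw
        }
    }
    return nil
}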

The only other driver.Value type (http://golang.org/pkg/database/sql/driver/#Value) that
is bigger than a word is time.Time.  If we care about that (and perhaps we should), we
could extend this and say that, instead of supplying a *[]byte in the dest slice of
[]driver.Value, we populate it with *Sink pointers.

package driver

type Sink struct {
    // ...
}

func (s *Sink) SetBytes(b []byte) { ... }
func (s *Sink) SetTime(t time.Time) { ... }

And then calling a Set method on a Sink replaces the need to have a sentinel value.

If we do this, all existing drivers are still compatible.  New drivers can type-assert
for the *driver.Sink and call the Set method instead.
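
For the Sink variant, the per-column check on the driver side might look like this
(illustrative only, since Sink is not part of database/sql/driver):

func scanColumn(dest []driver.Value, i int, raw []byte) {
    // Hypothetical: if database/sql put a *driver.Sink in dest[i], hand the
    // bytes to it; otherwise fall back to the allocating interface assignment.
    if sink, ok := dest[i].(*driver.Sink); ok {
        sink.SetBytes(raw)
        return
    }
    dest[i] = raw
}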
@bradfitz
Contributor Author

bradfitz commented Dec 9, 2013

Comment 1:

Er, typo: there are 169.55 MB in MySQL and Go allocated 153.36 MB to read them, even
with sql.RawBytes:
        rows, err := db.Query("SELECT k, v FROM rows")
        if err != nil {
                log.Fatal(err)
        }
        var k, v sql.RawBytes
        n := 0
        var bytes int64
        for rows.Next() {
                if err := rows.Scan(&k, &v); err != nil {
                        log.Fatal(err)
                }
                bytes += int64(len(k) + len(v))
                n++
        }
        if err := rows.Err(); err != nil {
                log.Fatal(err)
        }

@julienschmidt
Contributor

Comment 2:

I think a big problem is that the row values are fetched before the destination type is
known ( http://golang.org/src/pkg/database/sql/sql.go?s=38680:38720#L1502 ), so we don't
know when to pass Sink pointers. Maybe I'm misunderstanding something, but if we do this
for every value, it will likely cause a lot of additional (and wasted) allocations.

In my opinion the best option would be to allow drivers to bypass the converter in
database/sql and somehow pass the args of Scan directly to the driver.
This has several benefits:

1. It allows type-aware conversions:
Currently the drivers "guess" the destination type for best results. For example, numeric
values are converted to int64 or float64. Date and time values lead to problems in this
case: if the value is converted to time.Time, it cannot be scanned into a string /
[]byte / sql.RawBytes, and if the value is returned as []byte, it cannot be scanned into
a time.Time var.
If the driver knows the destination type, it can convert directly to the correct type.
For example, if the destination type is sql.RawBytes, the driver can just pass a slice of
its net buffer. The driver does not need to convert numeric values first, which the
database/sql package would then convert back. This can get very expensive, e.g. when the
converter uses something like []byte(fmt.Sprintf(...)).
For date and time values, the driver can likewise choose the right type.

2. It is more flexible and allows additional types to be supported:
Currently the database/sql package is VERY restrictive. It only allows the driver.Value
types, which makes it impossible to even support uint64 with the high bit set (since it
must be converted to int64):
http://golang.org/src/pkg/database/sql/convert.go?s=2152:2306#L70
But this would also allow the driver to support additional types like complex numbers or
even database-specific types.

3. It allows performance improvements and requires fewer allocations:
I think this point is obvious.

To be clear, this is an optimization for rather high-hanging fruit. I don't expect every
single driver to implement this; it adds responsibility to the driver to support all
types correctly. But this path is purely optional. Driver authors could still choose the
easy way and let the database/sql package convert the values.

We would still have the problem that the destination type is unknown before rows.Scan.
Therefore, this requires 2 steps: 1) in rows.Next, ask the driver if more rows are
available or an error occurred; 2) in rows.Scan, pass the args to the driver and let the
driver copy the values into them, converting where necessary. A rough sketch of such an
interface follows below.
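
A minimal sketch of what that optional path could look like, with a hypothetical
interface name (illustrative only, not a concrete API proposal from this thread):

// Hypothetical optional interface; database/sql would type-assert for it and,
// when present, hand the Scan destinations straight to the driver instead of
// running its own converter.
type RowsDirectScanner interface {
    driver.Rows

    // ScanRow copies the current row's values directly into the caller's Scan
    // destinations (e.g. *string, *[]byte, *sql.RawBytes, *time.Time),
    // converting from the wire format as needed.
    ScanRow(dest ...interface{}) error
}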

@julienschmidt
Contributor

Comment 3:

And just for reference:
I also did a lot of benchmarking and profiling a few weeks ago. 2 MySQL drivers are
compared in our benchmark suite: https://github.com/go-sql-driver/sql-benchmark
I noticed 2 other things which could be optimized:
For each Query, a slice of all column names is requested just to determine the number of
columns for the dest slice. It would be better to just ask the driver for the number
instead (a rough sketch of such an interface follows after these two points):
https://golang.org/cl/17580043/ (will be mailed when the tree is open again).
The pooling logic does a lot of insertions into a list, which produces a lot of garbage
(relatively speaking): http://files.julienschmidt.com/public/gostuff/exec.mem.new.svg
One option would be to somehow allow the reuse of list.Elements
( http://golang.org/pkg/container/list/#Element ) in the list package.
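
A sketch of the kind of optional interface the first point suggests; the name is made up
here for illustration and is not necessarily what the CL above implements:

// Hypothetical optional interface: a driver implementing it lets database/sql
// size the dest slice without allocating a slice of column-name strings.
type RowsColumnCounter interface {
    driver.Rows

    // ColumnCount reports the number of columns in the current result set.
    ColumnCount() int
}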
I'm also experimenting a bit with the pooling logic, separating it from the rest of the
database/sql package and hoping that it could be made pluggable / exchangeable in a
future version (as has been requested before). Currently, this approach does not use
container/list at all. I'll let you know on the dev mailing list if my experiments lead
to something useful.

@dvyukov
Member

dvyukov commented Dec 10, 2013

Comment 4:

Thinking aloud, on amd64 two words have enough space for 2 pointers and 2 16-bit len/cap
fields, or probably even more, or a string pointer and 32-bit len field.

@rsc
Contributor

rsc commented Mar 3, 2014

Comment 5:

Labels changed: added release-none.

@griesemer
Contributor

Comment 6:

Labels changed: added repo-main.

@achille-roussel
Contributor

I've run into this same problem while trying to optimize software that uses database/sql
and go-sql-driver/mysql, where we use go generate to generate long argument lists for
batched inserts.

The code then makes a call to runtime.convT2E for each argument, which isn't ideal; in
our case it ends up being hundreds of dynamic memory allocations for a single SQL query.
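
A minimal illustration of where those allocations come from (the statement, slice, and
field names are made up; db is a *sql.DB):

args := make([]interface{}, 0, 3*len(rows))
for _, r := range rows {
    // Each append converts a concrete value to interface{}; for []byte and
    // other non-pointer types that conversion (runtime.convT2E) allocates.
    args = append(args, r.ID, r.Name, r.Payload)
}
if _, err := db.Exec(insertStmt, args...); err != nil {
    log.Fatal(err)
}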

The Sink idea is really appealing. I'll be happy to help make it happen if a hand is needed here.

@kardianos
Contributor

I'd love to see a concrete proposal or two.

@achille-roussel
Contributor

I've been thinking more about this problem and read more of the database/sql implementation, and I'm not sure it's a solvable problem without breaking the driver API.

The problem is that even if we introduced an optimized interface between database/sql and
the driver, the code has to fall back to the existing implementation to ensure backward
compatibility, which would then defeat escape analysis and still cause the argument list
to be dynamically allocated.
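
A simplified sketch of that escape-analysis point (execDirect and the control flow are
made up; only driver.Stmt is the real API):

package sketch

import "database/sql/driver"

// execDirect stands in for a hypothetical allocation-free driver entry point.
func execDirect(v []byte) error { return nil }

// Because args must be built as []driver.Value so the existing Stmt.Exec
// fallback can be called, and that interface call may retain the values,
// escape analysis heap-allocates the boxed argument even on runs that take
// the fast path.
func exec(s driver.Stmt, useFastPath bool, v []byte) error {
    args := []driver.Value{v} // v is boxed into an interface here
    if useFastPath {
        return execDirect(v)
    }
    _, err := s.Exec(args)
    return err
}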

@kardianos
Contributor

@achille-roussel Can you give me an example of a non-backwards compatible solution? If there is ever a Go2 this would be additionally useful.

@achille-roussel
Contributor

The issue is that passing arguments as interfaces between database/sql and the driver
requires dynamically allocating the values. The only way to avoid this is to pass
arguments as a concrete type (kind of like the Sink idea that was originally suggested
here). The API could be based on reflect.Value (if reflect.ValueOf didn't force its
argument to escape to the heap), or use some custom reflection-like value type that
permits the memory optimization.

There's another issue with driver.Valuer, because calling its Value() method will cause
the value it's called on and the returned value to escape to the heap. Dropping support
for custom types and extending the kinds of values that may be passed to the driver seems
like the only way to address this problem as well. For example, if the concrete value
type supported arrays and structs, a postgres driver could convert those to arrays and
compound types, while a mysql driver would reject those.
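
For reference, driver.Valuer's method is Value() (driver.Value, error); here is a minimal
example of a custom type whose conversion goes through that interface (the type itself is
made up):

package sketch

import (
    "database/sql/driver"
    "encoding/json"
)

// JSONText is an illustrative custom type implementing driver.Valuer.
type JSONText map[string]string

// Value satisfies driver.Valuer. Because it is invoked through an interface
// method and returns an interface value, both the receiver and the marshaled
// []byte escape to the heap.
func (j JSONText) Value() (driver.Value, error) {
    b, err := json.Marshal(j)
    return b, err
}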

@gopherbot

Change https://golang.org/cl/107995 mentions this issue: database/sql: add the driver.ValueScanner interface
