database/sql: provide optional way to mitigate convT2E allocations #6918

Open
bradfitz opened this issue Dec 9, 2013 · 12 comments
@bradfitz
Contributor

bradfitz commented Dec 9, 2013

In debugging the performance of Camlistore start-up (which slurps the index from disk to
memory), I compared two popular MySQL drivers' allocations.  Even the one that was being
careful to not allocate (go-mysql-driver) was still allocating memory proportional to
the data size (169.55 bytes in MySQL and 153.36 allocated):

2013/12/09 03:32:55 2512532 rows, 177786156 bytes; took 2.375200197s, 945ns each, 71.38
MB/s
2013/12/09 03:32:55 allocated = 160809408

I looked into where the allocations were coming from.

In the go-mysql-driver's Rows.Next (http://golang.org/pkg/database/sql/driver/#Rows)
call:

     .  153.5   597: dest[i], isNull, n, err = readLengthEncodedString(data[pos:])
....
     .      .      45d2bd: MOVQ AX,18(SP)
     .  153.5      45d2c2: CALL runtime.convT2E(SB)


It's exclusively from runtime.convT2E calls, assigning a []byte to an interface{}
(driver.Value).

160809408 bytes / 2512532 rows / 2 columns = 32 bytes allocated per scanned string/[]byte
column, even when the user of database/sql is trying to reduce allocations with
sql.RawBytes.

Proposal:

Right now, the http://golang.org/pkg/database/sql/driver/#Rows Next method's dest
[]Value slice is purely an output that the driver is expected to write into.

If we change it to also be an optional input to the driver, we could supply it with
*[]byte values, and drivers can either work as they do today and write to dest, or
new/updated drivers can notice a *[]byte in dest, use that pointer to assign directly,
and signal that they've done so with a sentinel value:

   *(dest[i].(*[]byte)) = rawByteSlice
   dest[i] = driver.SentinelValue

driver.SentinelValue can be a pointer, so it fits into the empty interface without
allocation.
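
A rough sketch of how a driver's Next could use this; driver.SentinelValue does not exist
today, and mysqlRows/readColumn are made-up names for illustration:

func (r *mysqlRows) Next(dest []driver.Value) error {
    for i := range dest {
        raw, err := r.readColumn(i) // raw is a []byte into the driver's read buffer
        if err != nil {
            return err
        }
        if p, ok := dest[i].(*[]byte); ok {
            // database/sql pre-populated dest[i] with the user's destination:
            // write through the pointer and mark the slot as already handled.
            *p = raw
            dest[i] = driver.SentinelValue
        } else {
            // Existing behavior: boxing raw into the interface allocates (convT2E).
            dest[i] = raw
        }
    }
    return nil
}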

The only other driver.Value type (http://golang.org/pkg/database/sql/driver/#Value) that
is bigger than a word is time.Time.  If we care about that (and perhaps we should), we
could extend this and say that, instead of supplying a *[]byte in the dest slice of
[]driver.Value, we populate it with *Sink pointers.

package driver

type Sink struct {
    // ...
}

func (s *Sink) SetBytes(b []byte) { ... }
func (s *Sink) SetTime(t time.Time) { ... }

And then calling a Set method on a Sink replaces the need to have a sentinel value.

If we do this, all existing drivers are still compatible.  New drivers can type-assert
for the *driver.Sink and call the Set method instead.
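
For the Sink variant, the per-column check on the driver side might look like this
(illustrative only, since Sink is not part of database/sql/driver):

func scanColumn(dest []driver.Value, i int, raw []byte) {
    // Hypothetical: if database/sql put a *driver.Sink in dest[i], hand the
    // bytes to it; otherwise fall back to the allocating interface assignment.
    if sink, ok := dest[i].(*driver.Sink); ok {
        sink.SetBytes(raw)
        return
    }
    dest[i] = raw
}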
@bradfitz
Contributor Author

bradfitz commented Dec 9, 2013

Comment 1:

Er, typo: there are 169.55 MB in MySQL and Go allocated 153.36 MB to read them, even
with sql.RawBytes:
        rows, err := db.Query("SELECT k, v FROM rows")
        if err != nil {
                log.Fatal(err)
        }
        var k, v sql.RawBytes
        n := 0
        var bytes int64
        for rows.Next() {
                if err := rows.Scan(&k, &v); err != nil {
                        log.Fatal(err)
                }
                bytes += int64(len(k) + len(v))
                n++
        }
        if err := rows.Err(); err != nil {
                log.Fatal(err)
        }

@julienschmidt
Contributor

Comment 2:

I think a big problem is that the row values are fetched before the destination type is
known ( http://golang.org/src/pkg/database/sql/sql.go?s=38680:38720#L1502 ), so we don't
know when to pass Sink pointers. Maybe I'm misunderstanding something, but if we do this
for every value, it will likely cause a lot of additional (and wasted) allocations.

In my opinion the best option would be to allow drivers to bypass the converter in
database/sql and somehow pass the args of Scan directly to the driver.
This has several benefits:

1. It allows type-aware conversions:
Currently the drivers "guess" the destination type for best results. For example, numeric
values are converted to int64 or float64. Date and time values lead to problems in this
case: if the value is converted to time.Time, it cannot be scanned into a string /
[]byte / sql.RawBytes, and if the value is returned as []byte, it cannot be scanned into
a time.Time var.
If the driver knows the destination type, it can convert directly to the correct type.
For example, if the destination type is sql.RawBytes, the driver can just pass a slice of
its net buffer. The driver does not need to convert numeric values first, which the
database/sql package would then convert back. This can get very expensive, e.g. when the
converter uses something like []byte(fmt.Sprintf(...)).
For date and time values, the driver can likewise choose the right type.

2. It is more flexible and allows additional types to be supported:
Currently the database/sql package is VERY restrictive. It only allows the driver.Value
types, which makes it impossible to even support uint64 with the high bit set (since it
must be converted to int64):
http://golang.org/src/pkg/database/sql/convert.go?s=2152:2306#L70
But this would also allow the driver to support additional types like complex numbers or
even database-specific types.

3. It allows performance improvements and requires fewer allocations:
I think this point is obvious.

To be clear, this is an optimization for rather high-hanging fruit. I don't expect every
single driver to implement this; it adds responsibility to the driver to support all
types correctly. But this path is purely optional. Driver authors could still choose the
easy way and let the database/sql package convert the values.

We would still have the problem that the destination type is unknown before rows.Scan.
Therefore, this requires 2 steps: 1) in rows.Next, ask the driver if more rows are
available or an error occurred; 2) in rows.Scan, pass the args to the driver and let the
driver copy the values into them, converting where necessary. A rough sketch of such an
interface follows below.
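
A minimal sketch of what that optional path could look like, with a hypothetical
interface name (illustrative only, not a concrete API proposal from this thread):

// Hypothetical optional interface; database/sql would type-assert for it and,
// when present, hand the Scan destinations straight to the driver instead of
// running its own converter.
type RowsDirectScanner interface {
    driver.Rows

    // ScanRow copies the current row's values directly into the caller's Scan
    // destinations (e.g. *string, *[]byte, *sql.RawBytes, *time.Time),
    // converting from the wire format as needed.
    ScanRow(dest ...interface{}) error
}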

@julienschmidt
Contributor

Comment 3:

And just for reference:
I also did a lot of benchmarking and profiling a few weeks ago. 2 MySQL drivers are
compared in our benchmark suite: https://github.com/go-sql-driver/sql-benchmark
I noticed 2 other things which could be optimized:
For each Query, a slice of all column names is requested just to determine the number of
columns for the dest slice. It would be better to just ask the driver for the number
instead (a rough sketch of such an interface follows after these two points):
https://golang.org/cl/17580043/ (will be mailed when the tree is open again).
The pooling logic does a lot of insertions into a list, which produces a lot of garbage
(relatively speaking): http://files.julienschmidt.com/public/gostuff/exec.mem.new.svg
One option would be to somehow allow the reuse of list.Elements
( http://golang.org/pkg/container/list/#Element ) in the list package.
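
A sketch of the kind of optional interface the first point suggests; the name is made up
here for illustration and is not necessarily what the CL above implements:

// Hypothetical optional interface: a driver implementing it lets database/sql
// size the dest slice without allocating a slice of column-name strings.
type RowsColumnCounter interface {
    driver.Rows

    // ColumnCount reports the number of columns in the current result set.
    ColumnCount() int
}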
I'm also experimenting a bit with the pooling logic, separating it from the rest of the
database/sql package and hoping that it could be made pluggable / exchangeable in a
future version (as has been requested before). Currently, this approach does not use
container/list at all. I'll let you know on the dev mailing list if my experiments lead
to something useful.

@dvyukov
Member

dvyukov commented Dec 10, 2013

Comment 4:

Thinking aloud, on amd64 two words have enough space for 2 pointers and 2 16-bit len/cap
fields, or probably even more, or a string pointer and 32-bit len field.

@rsc
Contributor

rsc commented Mar 3, 2014

Comment 5:

Labels changed: added release-none.

@griesemer
Contributor

Comment 6:

Labels changed: added repo-main.

@achille-roussel
Contributor

I've run into this same problem while trying to optimize software that uses database/sql
and go-sql-driver/mysql, where we use go generate to generate long argument lists for
batched inserts.

The code then makes a call to runtime.convT2E for each argument, which isn't ideal; in
our case it ends up being hundreds of dynamic memory allocations for a single SQL query.
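
A minimal illustration of where those allocations come from (the statement, slice, and
field names are made up; db is a *sql.DB):

args := make([]interface{}, 0, 3*len(rows))
for _, r := range rows {
    // Each append converts a concrete value to interface{}; for []byte and
    // other non-pointer types that conversion (runtime.convT2E) allocates.
    args = append(args, r.ID, r.Name, r.Payload)
}
if _, err := db.Exec(insertStmt, args...); err != nil {
    log.Fatal(err)
}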

The Sink idea is really appealing. I'll be happy to help make it happen if a hand is needed here.

@kardianos
Contributor

I'd love to see a concrete proposal or two.

@achille-roussel
Contributor

I've been thinking more about this problem and read more of the database/sql implementation, and I'm not sure it's a solvable problem without breaking the driver API.

The problem is that even if we introduced an optimized interface between database/sql and
the driver, the code has to fall back to the existing implementation to ensure backward
compatibility, which would then defeat escape analysis and still cause the argument list
to be dynamically allocated.
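
A simplified sketch of that escape-analysis point (execDirect and the control flow are
made up; only driver.Stmt is the real API):

package sketch

import "database/sql/driver"

// execDirect stands in for a hypothetical allocation-free driver entry point.
func execDirect(v []byte) error { return nil }

// Because args must be built as []driver.Value so the existing Stmt.Exec
// fallback can be called, and that interface call may retain the values,
// escape analysis heap-allocates the boxed argument even on runs that take
// the fast path.
func exec(s driver.Stmt, useFastPath bool, v []byte) error {
    args := []driver.Value{v} // v is boxed into an interface here
    if useFastPath {
        return execDirect(v)
    }
    _, err := s.Exec(args)
    return err
}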

@kardianos
Contributor

@achille-roussel Can you give me an example of a non-backwards compatible solution? If there is ever a Go2 this would be additionally useful.

@achille-roussel
Contributor

The issue is that passing arguments as interfaces between database/sql and the driver
requires dynamically allocating the values. The only way to avoid this is to pass
arguments as a concrete type (kind of like the Sink idea that was originally suggested
here). The API could be based on reflect.Value (if reflect.ValueOf didn't force its
argument to escape to the heap), or use some custom reflection-like value type that
permits the memory optimization.

There's another issue with driver.Valuer, because calling its Value() method will cause
the value it's called on and the returned value to escape to the heap. Dropping support
for custom types and extending the kinds of values that may be passed to the driver seems
like the only way to address this problem as well. For example, if the concrete value
type supported arrays and structs, a postgres driver could convert those to arrays and
compound types, while a mysql driver would reject those.
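
For reference, driver.Valuer's method is Value() (driver.Value, error); here is a minimal
example of a custom type whose conversion goes through that interface (the type itself is
made up):

package sketch

import (
    "database/sql/driver"
    "encoding/json"
)

// JSONText is an illustrative custom type implementing driver.Valuer.
type JSONText map[string]string

// Value satisfies driver.Valuer. Because it is invoked through an interface
// method and returns an interface value, both the receiver and the marshaled
// []byte escape to the heap.
func (j JSONText) Value() (driver.Value, error) {
    b, err := json.Marshal(j)
    return b, err
}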

@gopherbot

Change https://golang.org/cl/107995 mentions this issue: database/sql: add the driver.ValueScanner interface
