database/sql: provide optional way to mitigate convT2E allocations #6918
Er, typo: there are 169.55 MB in MySQL and Go allocated 153.36 MB to read them, even with sql.RawBytes:

```go
rows, err := db.Query("SELECT k, v FROM rows")
if err != nil {
	log.Fatal(err)
}
var k, v sql.RawBytes
n := 0
var bytes int64
for rows.Next() {
	if err := rows.Scan(&k, &v); err != nil {
		log.Fatal(err)
	}
	bytes += int64(len(k) + len(v))
	n++
}
if err := rows.Err(); err != nil {
	log.Fatal(err)
}
```
I think a big problem is that the row values are fetched before the destination type is known (http://golang.org/src/pkg/database/sql/sql.go?s=38680:38720#L1502), so we don't know when to pass Sink pointers. Maybe I'm misunderstanding something, but if we do this for every value, it will likely cause a lot of additional (and wasted) allocations. In my opinion the best option would be to allow the drivers to bypass the converter of database/sql and somehow pass the args of Scan directly to the driver. This has several benefits:

1. It allows type-aware conversions: Currently the drivers "guess" the destination type for best results. For example, numeric values are converted to int64 or float64. Date and time values cause problems in this scheme: if the value is converted to time.Time, it cannot be scanned into a string / []byte / sql.RawBytes; if the value is returned as []byte, it cannot be scanned into a time.Time var. If the driver knows the destination type, it can convert directly to the correct type. For example, if the destination type is sql.RawBytes, the driver can just pass a slice of its net buffer. The driver also does not need to convert numeric values first only for the database/sql package to convert them back, which can get very expensive, e.g. when the converter uses something like []byte(fmt.Sprintf(...)). For date and time values, the driver can likewise choose the right type.

2. It is more flexible and allows additional types to be supported: Currently the database/sql package is VERY restrictive. It only allows the driver.Value types, which makes it impossible to even support uint64 with the high bit set (since it must be converted to int64): http://golang.org/src/pkg/database/sql/convert.go?s=2152:2306#L70 Bypassing the converter would also allow the driver to support additional types such as complex numbers or even database-specific types.

3. It allows performance improvements and requires fewer allocations: I think this point is obvious.
To be clear, this is an optimization for rather high-hanging fruit. I don't expect every single driver to implement this; it adds responsibility on the driver to support all types correctly. But this path is purely optional: driver authors could still choose the easy way and let the database/sql package convert the values. We would still have the problem that the destination type is unknown before rows.Scan. Therefore, this requires two steps:

1. In rows.Next, ask the driver whether more rows are available or an error occurred.
2. In rows.Scan, pass the args to the driver and let the driver copy the values into them, converting where necessary.
And just for reference: I also did a lot of benchmarking and profiling a few weeks ago. Two MySQL drivers are compared in our benchmark suite: https://github.com/go-sql-driver/sql-benchmark I noticed two other things that could be optimized:

- For each query, a slice of all column names is requested just to determine the number of columns for the dest slice. It would be better to just ask the driver for the number instead: https://golang.org/cl/17580043/ (will be mailed when the tree is open again).
- The pooling logic does a lot of insertions into a list, which produces (relatively) a lot of garbage: http://files.julienschmidt.com/public/gostuff/exec.mem.new.svg One option would be to somehow allow the reuse of list.Elements (http://golang.org/pkg/container/list/#Element) in the list package.

I'm also experimenting a bit with the pooling logic, separating it from the rest of the database/sql package, in the hope that it could be made pluggable / exchangeable in a future version (as was requested before). Currently, this approach does not use container/list at all. I'll let you know on the dev mailing list if my experiments lead to something useful.
I've run into this same problem while trying to optimize software that uses database/sql and go-sql-driver/mysql, where we use go generate to generate long argument lists for batched inserts. The code then makes a call to runtime.convT2E for each argument, which isn't ideal; in our case it ends up being hundreds of dynamic memory allocations for a single SQL query. The Sink idea is really appealing. I'll be happy to help make it happen if a hand is needed here.
I'd love to see a concrete proposal or two. |
I've been thinking more about this problem and read more of the database/sql implementation, and I'm not sure it's solvable without breaking the driver API. The problem is that even if we introduced an optimized interface between database/sql and the driver, the code would have to fall back to the existing implementation to ensure backward compatibility, which would defeat escape analysis and still cause the argument list to be dynamically allocated.
@achille-roussel Can you give me an example of a non-backwards-compatible solution? If there is ever a Go 2, this would be additionally useful.
The issue is passing arguments as interfaces between There's another issue with |
Change https://golang.org/cl/107995 mentions this issue: |