Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/cgo: do not let Go pointers end up in C #8310

Closed
rsc opened this issue Jul 1, 2014 · 35 comments
Closed

cmd/cgo: do not let Go pointers end up in C #8310

rsc opened this issue Jul 1, 2014 · 35 comments

Comments

@rsc
Copy link
Contributor

rsc commented Jul 1, 2014

We need some way to prevent Go pointers from crossing the boundary into C.

People write code like (calling a C function that takes a T* to fill in).

var x C.T
C.f(&x)

One option is to disallow this - a Go pointer &x is being passed to C.

Another option is to have cgo's wrapper for C.f do a shallow copy into an allocated C
pointer, then run f, then copy the data back into the original Go data, then free the
allocated C pointer, then return.

If the second option handles 99% of the pointers being passed, then we probably should
do it. On the other hand, the free might create a dangling pointer that could confuse
people. At least it's a dangling pointer into the C heap and not the Go heap. :-)
@griesemer
Copy link
Contributor

Comment 2:

Just for reference, and not meant as implementation suggestion: Java's JNI passes
pointers as "handles": A JNI-wrapper (a little stub, generated on-the fly for each call
signature) pushes the arguments on the stack again in C (reverse) order. For each Java
object, the stack address of the corresponding Java pointer is pushed. In C, that stack
address (which is immutable for the duration of the C call) acts as the handle via which
the underlying Java object is accessed. This permits GC to move objects around, but
comes at the cost of an extra (hopefully inlined) function call and indirection for each
Java data structure access from within C. It also requires the respective C code to be
written specifically with JNI in mind.

@ianlancetaylor
Copy link
Contributor

Comment 3:

I think JNI is a good example of something we don't want.

@randall77
Copy link
Contributor

Comment 4:

Another option is to pin pointers that are passed into C.f for the duration of C.f. 
This allows the C code to use raw pointers, but does not allow the C code to squirrel
the pointer away past the immediate call.
There's no way to enforce the no-squirrel rule, unfortunately.
This requires a mechanism to pin objects in the heap.  If/when we have a copying
collector this will be a pain, but might be necessary for other reasons (runtime
internals?).

@rsc
Copy link
Contributor Author

rsc commented Jul 2, 2014

Comment 5:

There's no way to enforce the no-squirrel rule, and also if we have any kind of write
barrier on pointer updates then the C code would not respect that, which could be
problematic. I think it will be much simpler in the long run if we keep C from ever
seeing a Go pointer.

@dvyukov
Copy link
Member

dvyukov commented Jul 2, 2014

Comment 6:

Updating Go objects from C code never worked. C code runs concurrently with GC. GC needs
to see consistent picture of the world. If C updates Go objects, GC corrupts heap.

@rsc
Copy link
Contributor Author

rsc commented Jul 2, 2014

Comment 7:

All the more reason to not let Go pointers end up in C.

@dominikh
Copy link
Member

dominikh commented Jul 2, 2014

Comment 8:

Does this issue affect all Go pointers? The following is a somewhat common pattern in
cgo land: `x := make([]byte, 4096); C.foo(&x[0], len(x))`

@minux
Copy link
Member

minux commented Jul 2, 2014

Comment 9:

Re #8, it will affect *all* Go pointers crossing language boundary.

@bryanturley
Copy link

Comment 10:

I am confused, Is this for an upcoming copying collector?
Would the basic read/write syscalls need to be double buffered?
I think the problem isn't stated clearly.

@ianlancetaylor
Copy link
Contributor

Comment 11:

One goal is to make it possible to write a copying collector.  Another goal is to avoid
the possibility of the Go garbage collector collecting a pointer that has been passed to
C code.

@gopherbot
Copy link

Comment 12:

Please keep in mind leveldb, sqlite etc where one might actually want to store binary
blobs of data that e.g. came from a network socket. There has to be a better way than an
extra copy round for everything, e.g. for when you know the data pointed to does not
contain further pointers.

@andlabs
Copy link
Contributor

andlabs commented Aug 3, 2014

Comment 13:

The copying proposal would be bad for code like my ui package, which wants to store a Go
pointer on the C side for the purposes of giving it back to Go when an event comes in.
dominikh says he also has callback functions which work similarly.
I don't know how you intend to deal with this,  unless it's something through
unsafe.Pointer, but then you would need to use unsafe.Pointer for C-side types allocated
in Go, which is also ugly.

@alexbrainman
Copy link
Member

Comment 14:

@pietro10, windows standard packages pass pointers to Go objects into Windows kernel in
a few places too. So the situation is known.
Alex

@RLH
Copy link
Contributor

RLH commented Aug 4, 2014

Comment 15:

Passing a Go heap pointer to C can break today's 1.3 GC which does not scan
C stacks / variables.

@ianlancetaylor
Copy link
Contributor

Comment 16:

In the long run there just isn't any way that we can support passing a Go pointer into
C.  We want to make it possible to write a moving garbage collector, but that means that
the garbage collector has to be able to change all pointers.  The GC won't be able to
see pointers stored in C code.  So we can't permit that.
To pass a pointer into C and get it back you will have to do something like use a
map[uintptr]*X.  When you want to pass the pointer down, convert to uintptr and store
the pointer in the map.  When you get it back, use the map to get the real Go pointer.

@alexbrainman
Copy link
Member

Comment 17:

Ian, would you be kind enough to show small program to make it clear for me? Thank you.
Alex

@ianlancetaylor
Copy link
Contributor

Comment 18:

var ptrs map[uintptr]*T
func F(p *T) {
        u := uintptr(unsafe.Pointer(p))
        if ptrs[u] != nil {
                panic("need more sophisticated scheme")
        }
        ptrs[u] = p
        C.store(u)
}
func G() *T {
        u := C.fetch()
        return ptrs[u]
}

@alexbrainman
Copy link
Member

Comment 19:

Thank you for the code, Ian. But I don't see how your proposal helps with original issue
raised by rsc:
var x C.T
C.f(&x)
What will you do here?
Also, there is still issue of Go objects been moved by GC. For example, in windows net
package we pass Go buffers (just like in #8) to kernel. The syscall returns immediately,
while kernel proceed to write to passed buffer. We keep pointer to buffer in Go, until
kernel finished writing (just like your proposal suggests). But that approach will fail,
if GC will move the buffer.
Alex

@minux
Copy link
Member

minux commented Aug 5, 2014

Comment 20:

For those kind of requirements, we probably should just allocate the
buffer from C land, and not from the Go heap.

@ianlancetaylor
Copy link
Contributor

Comment 21:

The point of this issue is to resolve exactly what we will promise people who use cgo,
kernel syscalls, and Windows functions.  See the original issue description.

@alexbrainman
Copy link
Member

Comment 22:

@minux, allocating the buffer from C land will work. But, I think, it would be expensive
to do it as is. We'll end-up writing custom "external" memory allocators for these.
Alex

@ianlancetaylor
Copy link
Contributor

Comment 23:

We don't literally have to allocate by calling C code.  We just have to allocate using
an allocator that the Go GC does not examine.
I think the open question here is how much of that cgo can or should do for you
automatically.

@minux
Copy link
Member

minux commented Aug 5, 2014

Comment 24:

perhaps we can use sync.Pool to manage buffers allocated on C heaps, then they
could be made pretty cheap to use.
One main downside is that we can't return C buffers as is to Go land, otherwise
it becomes impossible to manage their life time and might lead to memory leaks,
and this necessitates at least one copying from C buffer to Go buffer.

@alexbrainman
Copy link
Member

Comment 25:

> We don't literally have to allocate by calling C code.  We just have to allocate using
an allocator that the Go GC does not examine.
If this is provided by Go runtime or similar, then it should work just fine. I assume we
are talking about memory that needs to live after syscall returns. You would have to
find a separate solution for any code that just uses Go memory during syscall, just like:
var x C.T
C.f(&x)
Alex

@btracey
Copy link
Contributor

btracey commented Aug 5, 2014

Comment 26:

I don't understand what is meant by a pointer "crossing the boundary" to C. If Go
maintains the pointer while C is doing stuff with the underlying data, is that okay? In
github.com/gonum/blas/cblas [0], we send []float64 to C by sending unsafe.Pointer(&x[0])
to C. The x variable stays alive on the Go side, but the values it points to are
modified by the C code. Is the plan to still support this?
[0] https://github.com/gonum/blas/blob/master/cblas/blas.go

@ianlancetaylor
Copy link
Contributor

Comment 27:

tracey.brendan: see comment #16.  We haven't yet decided what the exact rules will be,
but the general case of passing a pointer to C will not work even if the pointer is
maintained on the Go side.  I don't know whether the use case you describe will work or
not, but it's clearly an important case to keep in mind.

@btracey
Copy link
Contributor

btracey commented Aug 5, 2014

Comment 28:

Okay, thanks Ian. I wasn't sure if "into C" meant gone from Go's eyes, or at all.

@andlabs
Copy link
Contributor

andlabs commented Aug 5, 2014

Comment 29:

In my opinion map[token]pointer seems really inelegant; I tried to actually remove
similar (map[C pointer]Go pointer with more data) from package ui's rewrite (redo/
folder). I'm willing to hear counterarguments, though.
Also by moving GC do you mean only stack pointers will move or will heap pointers move
too?

@minux
Copy link
Member

minux commented Aug 6, 2014

Comment 30:

stack pointers are already moving with the copying stack in 1.3.
moving GC means that the heap pointers can also move (e.g.
compacting GC)

@gopherbot
Copy link

Comment 31 by voidlogic7:

This is going to break a huge number of existing cgo bindings. I understand this is
necessary to support things like copying/compacting GC... but in light how much existing
(and in production) Go software this is going to break I imagine whatever release this
becomes reality in, many people will be stuck on the previous release quite a while as
they scramble to replace offending dependencies.
Since the common case is buffers ([]byte, etc) being passed to C, filled and handed
back, what about introducing a cgo function to make this situation easier:
buf := C.make([]byte, 2048)
defer C.release(buf)
C.someFunction((...)buf.Data)
C.make would be like make, except the GC would be unaware of the resulting memory
(different memory arena from normal GC'd memory?).
C.release would be like 'free' but rather then freeing memory it just makes the GC aware
of it and eligible for later collection. (These functions are meant to be Go functions
so the above code only crosses the boundary once. Also notice no extra copying is
necessary to go on to use buf inside of Go objects [unless the GC decides to copy a
"released" slice into the normal GC space later of course])
I hope this would make it trivial to "fix" most broken cgo dependencies and at the same
time stay out of the way of GC advancement.
The alternative to something like this is either making two cgo calls (one to perform
the action and make the allocation) and another to free the buffer later OR to create
some kind of pooling object that allocates the pool on the C side and returns batches
later on the Go side.
This solution isn't perfect but I think its better than the alternatives. In some ways
this is like pinning without the burdons of supporting arbitrary one of pinning in the
primary GC arena.

@rsc
Copy link
Contributor Author

rsc commented Aug 12, 2014

Comment 32:

Disabling comments until we post our new proposal. We are still working on the wording.

Labels changed: added restrict-addissuecomment-commit.

@griesemer
Copy link
Contributor

Comment 33:

Labels changed: added repo-main.

@golang golang locked and limited conversation to collaborators Dec 8, 2014
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc removed accepted labels Apr 14, 2015
@ianlancetaylor
Copy link
Contributor

See also #12116 .

@ianlancetaylor
Copy link
Contributor

Proposal at #12416 .

@ianlancetaylor
Copy link
Contributor

Proposal #12416 was implemented. Closing this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests