This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: fix for 764437 AF_UNIX socket special linux socket names
Type: Stage:
Components: Extension Modules Versions: Python 2.5
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: arigo Nosy List: arigo, irmen, jlgijsbers, nnorwitz
Priority: normal Keywords: patch

Created on 2004-11-07 18:18 by irmen, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
socketmodule.patch irmen, 2004-11-07 18:18 patch for src/Modules/socketmodule.c
linux_socket_abstract_namespace.diff arigo, 2006-04-12 13:19 Another patch (try #2)
socket.diff nnorwitz, 2006-04-14 05:01 nn v1
socket.diff nnorwitz, 2006-04-19 06:29 nn v2
Messages (15)
msg47237 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2004-11-07 18:18
The patch addresses two things:
1. the socket name (obtained using getsockname() or
getpeername() ) will now be correct in case of the
special linux socket names (starting with 0 byte)
2. the socket module now has a new constant:
UNIX_PATH_MAX , that can be used to build
correctly-sized path names.
msg47238 - (view) Author: Johannes Gijsbers (jlgijsbers) * (Python triager) Date: 2004-11-07 20:11
Logged In: YES 
user_id=469548

As you noted on IRC: this needs a new regression test.
msg47239 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2004-11-08 11:05
Logged In: YES 
user_id=129426

Not only does it need a new regression test, but if it is
accepted as-is, the socket module docs need to be updated
too (mention the new constant for use with the special
naming case)
msg47240 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-04-12 12:57
Logged In: YES 
user_id=4771

I have made an independent patch for this, based on another
interpretation of "man 7 unix": all the names in the Linux
abstract namespace don't have to be UNIX_PATH_MAX characters
long; instead, their length is remembered by the system and
available as 'addrlen', which is correctly passed around in
socketmodule.c.

The attached patch allows these special names of any length
to be set and read, and the length is preserved.  It does
not expose UNIX_PATH_MAX because it's not an essential
piece of information.  I've written a small test too.

I don't think the docs need updating for this patch, beyond
a mention in the NEWS file.
msg47241 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-04-12 13:19
Logged In: YES 
user_id=4771

Added "#ifdef linux" in my patch, following the idea from
the original patch.
msg47242 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-04-13 04:09
Logged In: YES 
user_id=33168

Armin, do you want someone to review this?  Is it
small/straightforward enough that it could be checked in? 
Just trying to figure out if there's anything I should do.
msg47243 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-04-13 12:10
Logged In: YES 
user_id=4771

Hi Neal.  Although the patch is small I'd appreciate a
quick review, yes.  Also, Irmen, it would be nice to know
if my solution -- trying to preserve the length of the
addresses -- suits your original use case as well.
msg47244 - (view) Author: Irmen de Jong (irmen) (Python triager) Date: 2006-04-13 14:31
Logged In: YES 
user_id=129426

Armin, I don't have access to a Linux machine anymore so I
can't really comment on this issue anymore, sorry.
Also this patch was not made for a specific use case I had,
it was a patch for an open bug filed by someone else (bug
764437).
Previous discussion can be found there.
msg47245 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-04-14 05:01
Logged In: YES 
user_id=33168

Wow!  Now I see why you wanted review.

Modules/socketmodule.c:

In the first #ifdef linux section (968+), why the calc for
addrlen?  Isn't the len just sizeof(a->sun_path) and that
can be passed to FromStringAndSize()?

In the second #ifdef linux section (1098+), should if (len >
sizeof ...) be >= like the other check?

It's probably easier to show what I mean in code.  Attached
is another patch.  I didn't test it.  There's a big comment
that we should address in either patch about a negative
length issue.
msg47246 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-04-14 08:33
Logged In: YES 
user_id=4771

You're missing the point of my patch.  It is to preserve
the originally specified length of the name.  The names
'\x00abc', '\x00abc\x00' and '\x00abc\x00\x00' are all
different for the Linux IP stack, as far as I can tell.
In operations like bind() we specify the length of the
name by setting addrlen to the length of the header
plus the number of bytes actually used in the sun_path
array.  The kernel code remembers this length, and
passes it back in &addrlen to functions like
getsockaddr().  That's why I compute the number of
bytes that are really meaningful as the addrlen minus
the headers, instead of just UNIX_PATH_MAX.

About the second #ifdef, I really meant the > to be >
and not >=, because it is acceptable to pass a name
containing exactly UNIX_PATH_MAX bytes -- the abstract
namespace names don't have to be followed by a
terminating '\x00'.  By contrast, the regular names
must not be longer than UNIX_PATH_MAX-1 characters to
allow for the extra '\x00'.

In some sense, the Linux abstract namespace is rather
strange from the point of view of C, but not if you
think about it as you would think about Python strings.
Indeed, apart from the requirement that they start
with '\x00', these names have an intrinsic length
which is not related to any embedded '\x00'.  It is
stored together with the name in some kernel
structures, but outside the sun_path array of bytes.

But enough theory: the patch you propose don't pass
its own new test on my machine because of this :-)
Does it pass on yours?

I'm not too concerned about addrlen ending up shorter
than the size of the headers.  That would be a Linux
kernel bug.
msg47247 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-04-18 05:42
Logged In: YES 
user_id=33168

My man page (man 7 unix) says:

sun_family  always contains AF_UNIX.  sun_path contains the
zero-terminated pathname of the socket in the file system. 
 If  sun_path  starts with  a zero byte it refers to the
abstract namespace maintained by the Unix protocol module. 
The socket's address in this namespace is  given by  the
rest of the bytes in sun_path.  Note that names in the
abstract namespace are not zero-terminated.

I read that (rest of the bytes in sun_path) to mean that if
the first byte is NUL, the address is all the remaining
bytes in sun_path (ie, sun_path is treated as binary).  Also
since the abstract namespace is not NUL-terminated.  I don't
know if the man page is correct.  Do you understand the
manpage the way I do?
msg47248 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-04-18 11:52
Logged In: YES 
user_id=4771

The man page is indeed confusing.  I was relying on
small experiments.  On my Linux box at least, the
address does not have to occupy the whole of the
sun_path array.  You can really do the following:
set sun_path to '\x00!\x00\x00\x00\x00\x00\x00...'
then call bind() with an addrlen that covers only the
first two characters '\x00!', and then call bind() on
another socket with the same sun_path but an addrlen
that covers the first three characters '\x00!\x00'.
The two addresses don't conflict and are regarded as
different.

Conversely, if I try to bind() two sockets with the
same addrlen and a sun_path that differs after the
number of bytes included in the addrlen, I get an
"Address already in use", thus showing that addrlen
is really meaningful.

Another hint is that if you let the system pick a
default local address for your socket, as e.g. is done
by s=socket(AF_UNIX); s.connect(...), then getsockname()
will return a name that looks like '\x00000003'.  More
precisely, the addrlen is much shorter than the maximum
that would correspond to the whole sun_path array.

A more practical example: it is bogus to ignore the
addrlen and always return the whole sun_path array,
because if we did that, then the following kind of code
would fail:

  >>> s1 = socket(AF_UNIX); s2 = socket(AF_UNIX)
  >>> s1.bind('\x00hello')
  >>> s2.connect(s1.getsockname())

because getsockname() would return a string that is
108 characters long, and the connect would fail.
(Just try it out!  It gives 'Connection refused'.)
msg47249 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-04-19 06:29
Logged In: YES 
user_id=33168

I guess I really need to test this on my box.  I don't know
if the man page is correct or if I'm reading it correct. 
BTW what linux are you running (kernel version/distro)?  I'm
2.6.8-gentoo-r4.
msg47250 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-04-19 06:33
Logged In: YES 
user_id=33168

Damn, I wasn't finished.  I uploaded a new patch.  The only
difference is that I added 2 test cases.  I ran this under
valgrind and there were no errors.  I was concerned about
the boundaries, particularly given how poor the man page is.

Are anonymous namespaces supported on other platforms?

I'm fine with this going in, but it needs an update to
Misc/NEWS and Doc/lib/libsocket.tex.  Hmmm, there's nothing
remotely close in the doc currently about AF_UNIX.  I'll
leave it to your best judgment Armin.
msg47251 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2006-04-19 11:51
Logged In: YES 
user_id=4771

Checked in as r45560 with a Misc/NEWS entry.
No update to the docs (they basically refer to the
platform's socket documentation for this kind of
details).

I don't know if other platforms support something
similar; the man page documents this extension as
Linux-specific.  I wouldn't worry about it unless
somebody shows up and points to another platform...
History
Date User Action Args
2022-04-11 14:56:08adminsetgithub: 41144
2004-11-07 18:18:12irmencreate