classification
Title: Add support for reading records with arbitrary separators to the standard IO stack
Type: enhancement Stage: resolved
Components: IO, Library (Lib) Versions: Python 3.4
process
Status: closed Resolution: later
Dependencies: Superseder:
Assigned To: Nosy List: Douglas.Alan, abarnert, akira, amaury.forgeotdarc, benjamin.peterson, calestyo, eric.araujo, facundobatista, georg.brandl, jcon, maggyero, martin.panter, ncoghlan, nessus42, pconnell, pitrou, r.david.murray, ralph.corderoy, rhettinger, wolma, ysj.ray
Priority: normal Keywords: patch

Created on 2005-02-26 07:24 by ncoghlan, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
pep-newline.txt abarnert, 2014-07-21 00:32 draft PEP for expanding the newline argument
pep-peek.txt abarnert, 2014-07-21 00:33 draft PEP for adding IOBase.peek, making this easier for end users to solve
io-newline-issue1152248.patch akira, 2014-07-26 02:06 Added support for alternative newlines in _pyio.TextIOWrapper. Updated documentation. Added more io tests. No C implementation. No implementation for binary files.
io-newline-issue1152248-2.patch akira, 2014-07-26 17:02 Reuploaded the patch so that it applies cleanly on the current tip
Messages (43)
msg61179 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2005-02-26 07:24
There is no canonical way to iterate through a file in
chunks *other* than whole lines without reading the
whole file into memory.

Allowing the separator to be specified as an argument
to file.readlines and file.xreadlines would greatly
simplify the task.

See here for an example interface of the useful options:
http://mail.python.org/pipermail/python-list/2005-February/268482.html
msg61180 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2005-02-26 07:38
I don't know whether (x)readlines is the right place, since
you are _not_ operating on lines.

What about (x)readchunks?
msg61181 - (view) Author: Douglas Alan (nessus42) Date: 2005-02-28 18:57
In reply to birkenfeld, I'm not sure why you don't want to
call lines separated with an alternate line-separation
string "lines", but if you want to call them something else,
I should think they should be called "records" rather than
"chunks".

|>oug
msg61182 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2005-06-27 04:25
The OP's request is not a non-starter.  There is a proven
precedent in AWK, which allows programmer-specifiable record
separators.
msg61183 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2005-06-27 09:29
As Douglas Alan's sample implementation (and his second
attempt [1]) show, getting this right (and reasonably
efficient) is actually a non-trivial exercise. Leveraging
the existing readlines infrastructure is an idea worth
considering.

[1]
http://mail.python.org/pipermail/python-list/2005-February/268547.html
msg61184 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2005-06-27 11:22
Seems the most likely place you'd want to use this is to select a non-
native line ending in a situation where you didn't want to use universal
newlines (select \r as a line ending on Unix, for example, and allow
\n to just be another character).  In that case they'd clearly still be
lines, so embellishing the normal line reading machinery without
adding a new method would be most appropriate.
msg63060 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-02-27 02:41
Raymond disapproved it, Skip discouraged it, and Nick didn't push it any
more, all more than two years ago.

Nick, please, if you feel this is worthwhile, raise the discussion in
python-dev.
msg63067 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-02-27 08:48
For the record, I thought it was a reasonable request.

AWK has a similar feature.  The AWK book shows a number of example 
uses.  Google's codesearch shows that the feature does get used in the 
field:  http://www.google.com/codesearch?q=lang%3Aawk+RS&hl=en

I think this request should probably be kept open.
msg63068 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-02-27 11:08
Sorry, I misunderstood you. I assign this to myself to give it a try.
msg63134 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-02-29 11:58
The mail.python.org link I posted previously is broken. Here's an
updated link to the relevant c.l.p. thread:
http://mail.python.org/pipermail/python-list/2005-February/310020.html

From my point of view, I still think it's an excellent idea and would be
happy to review a patch, but I'm unlikely to get around to implementing
it myself.

Also keep in mind that we now have the option of doing this only for the
new io module in Python 3.0 - it may be easier to do that and implement
something in pure Python rather than having to deal with the 2.x file
implementation.

(P.S. I found the double negative in Raymond's original comment a little
tricky to parse even as a native English speaker. I would also take
Skip's comment as merely discouraging adding a completely new method
rather than the original idea)
msg64084 - (view) Author: Facundo Batista (facundobatista) * (Python committer) Date: 2008-03-19 18:52
I took a look at it...

It's not as simple as I originally thought.

The way would be to adapt the Py_UniversalNewlineFread() function to
*not* process the normal separators (like \n or \r), but the passed one.

A critical point would be to handle more-than-1-byte characters... I
concur with Nick that this would be better suited for Py3k.

So, I'm stepping down from this, and flagging it for that version.
msg82188 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2009-02-15 23:58
Any further work on this should wait until the io-in-c branch has landed
(or at least be based on that branch).
msg87801 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2009-05-15 09:18
> cat temp
this is$#a weird$#file$#
> ./python
Python 3.1b1+ (py3k:72632:72633M, May 15 2009, 05:11:27)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> f = open('temp', newline='$#')
[50354 refs]
>>> f.readlines()
['this is$#', 'a weird$#', 'file$#', '\n']

All I did was comment out the 'newline' argument validity check in textio.c.
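
For comparison, here is a minimal sketch of what an unpatched interpreter does with the same argument. It assumes a stock CPython 3 build in which the validity check is still in place, and it uses an in-memory stream as a stand-in for the `temp` file above:

```python
import io

# In-memory stand-in for the 'temp' file shown in the session above.
raw = io.BytesIO(b"this is$#a weird$#file$#\n")

try:
    # With the validity check in place, any newline value other than
    # None, '', '\n', '\r' or '\r\n' is refused at construction time.
    io.TextIOWrapper(raw, encoding="ascii", newline="$#")
except ValueError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```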
msg87802 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2009-05-15 10:17
While RDM's quick test is encouraging, I think one of the key things is
going to be developing tests for the various cases:

- binary mode, single byte line ending
- binary mode, multi-byte line ending
- text mode, single byte single char line ending*
- text mode, multi-byte single char line ending
- text mode, multiple char line ending

The text mode tests would need to cover a variety of encodings (e.g.
ASCII, latin-1, UTF-8, UTF-16, UTF-32 and maybe something like koi8-r
and/or some of the CJK codecs).

*if applicable to codec under test
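
A rough sketch of how the text-mode part of such a test matrix might look. `split_decoded` and the `CASES` table are hypothetical helpers invented for illustration, not part of any patch; the separators are chosen only so that each one is encodable in the codec it is paired with:

```python
import codecs

def split_decoded(data, encoding, separator, chunk_size=3):
    """Decode `data` a few bytes at a time and split the text on `separator`."""
    decoder = codecs.getincrementaldecoder(encoding)()
    pending = ""
    records = []
    for start in range(0, len(data), chunk_size):
        # The incremental decoder buffers incomplete multi-byte sequences.
        pending += decoder.decode(data[start:start + chunk_size])
        # Everything before the last separator is complete; keep the rest.
        *complete, pending = pending.split(separator)
        records.extend(complete)
    pending += decoder.decode(b"", final=True)
    *complete, pending = pending.split(separator)
    records.extend(complete)
    if pending:
        records.append(pending)
    return records

# (encoding, separator) pairs loosely following the list above.
CASES = [
    ("ascii", "\0"),      # single byte, single char
    ("latin-1", "\xa7"),  # single byte, single non-ASCII char
    ("utf-8", "\u2028"),  # multi-byte, single char
    ("utf-16", "\0"),     # multi-byte, single char, with BOM
    ("utf-32", "$#"),     # multiple chars
    ("koi8-r", "\r\n"),   # multiple chars
]

results = {}
for encoding, separator in CASES:
    text = separator.join(["alpha", "beta", "gamma"]) + separator
    results[encoding] = split_decoded(text.encode(encoding), encoding, separator)
```

The small `chunk_size` is deliberate: it forces both multi-byte characters and multi-character separators to straddle read boundaries, which is where a hand-rolled splitter is most likely to go wrong.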
msg87803 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-15 11:13
-1 on this idea. readlines() exists precisely because line endings are
special when it comes to text IO (because of the various platform
differences).

If you want to split on any character, you can just use read() followed
by split(). No need to graft additional complexity on the file IO classes.
msg87805 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-15 11:24
And it's certainly not easy to do correctly :)
msg87806 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-15 11:25
Uh, trying again to remove the keyword :-(
msg87807 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-15 11:34
Ok, let me qualify my position a bit:
- -1 for abusing the newline parameter
- -1 for abusing readlines()
- +0 on an additional method ("readchunks" was suggested) which does the
splitting, either on a single character or on a string

Please bear in mind the latter should involve, for each of the C and
Python implementations:
- a generic unoptimized version for BufferedIOBase
- a generic unoptimized version for TextIOBase
- an optimized version for BufferedReader/BufferedRandom
- an optimized version for TextIOWrapper

However, it is certainly an interesting task for someone wanting to play
with C code, optimizations, etc.
msg87808 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2009-05-15 11:46
I agree with Antoine - given that the newlines parameter now deals with
Skip's alternate line separator use case, a new method "readrecords"
that takes a mandatory record separator makes more sense than using
readlines to read things that are not lines. (of course, taking the
alternate line ending use case away also reduces the total number of use
cases for the new method).

Note that the problem with the read()+split() approach is that you
either have to read the whole file into memory (which this RFE is trying
to avoid) or you have to do your own buffering and so forth to split
records as you go. Since the latter is both difficult to get right and
very similar to what the IO module already has to do for readlines(), it
makes sense to include the extra complexity there.
msg87817 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-15 13:07
> Note that the problem with the read()+split() approach is that you
> either have to read the whole file into memory (which this RFE is trying
> to avoid) or you have to do your own buffering and so forth to split
> records as you go. Since the latter is both difficult to get right and
> very similar to what the IO module already has to do for readlines(), it
> makes sense to include the extra complexity there.

I wonder how often this use case happens though. Usually you first split
on lines, and only then you split on another character or string (think
CSV files, HTTP headers, etc.).

When you don't split on lines, conversely, you probably have a binary
format, and binary formats have more efficient ways of chunking (for
example, a couple of bytes at the beginning indicating the length of the
chunk).
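
To make the length-prefix idea concrete, here is a minimal sketch. The 2-byte big-endian header and the names `write_record` and `read_records` are assumptions chosen for illustration, not an existing format or API:

```python
import io
import struct

HEADER = struct.Struct(">H")  # assumed layout: 2-byte big-endian length

def write_record(stream, payload):
    """Write one record as <length><payload>."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)

def read_records(stream):
    """Yield payloads from a buffered binary stream until EOF."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise EOFError("truncated length prefix")
        (length,) = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated record")
        yield payload

buffer = io.BytesIO()
for item in (b"first", b"with\nnewline", b""):
    write_record(buffer, item)
buffer.seek(0)
records = list(read_records(buffer))
```

No separator scanning is needed here, and payloads may contain any byte, including newlines.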
msg87823 - (view) Author: Douglas Alan (nessus42) Date: 2009-05-15 17:46
Antoine Pitrou <report@bugs.python.org> wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:

> > Note that the problem with the read()+split() approach is that you
> > either have to read the whole file into memory (which this RFE is trying
> > to avoid) or you have to do your own buffering and so forth to split
> > records as you go. Since the latter is both difficult to get right and
> > very similar to what the IO module already has to do for readlines(), it
> > makes sense to include the extra complexity there.

> I wonder how often this use case happens though.

Every day for me.  The reason that I originally brought up this request
some years back on comp.lang.python was that I wanted to be able to use
Python easily like I use the xargs program.

E.g.,

   find -type f -regex 'myFancyRegex' -print0 | stuff-to-do-on-each-file.py

With "-print0" the line separator is changed to null, so that you can
deal with filenames that have newlines in them.

("find" and "xargs" traditionally have used newline to separate files,
but that fails in the face of filenames that have newlines in them, so
the -print0 argument to find and the "-0" argument to xargs were
thankfully eventually added as a fix for this issue.  Nulls are not
allowed in filenames.  At least not on Unix.)
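
For illustration, a minimal sketch of what the receiving script has to do by hand today. `iter_nul_separated` is a hypothetical helper name, and the in-memory stream stands in for `sys.stdin.buffer`:

```python
import io
import os

def iter_nul_separated(stream, chunk_size=8192):
    """Yield file names from a binary stream of NUL-terminated paths."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # The last piece may be an incomplete name; keep it for later.
        *names, pending = pending.split(b"\0")
        for name in names:
            yield os.fsdecode(name)
    if pending:
        # Input that does not end with a NUL still yields its last name.
        yield os.fsdecode(pending)

# In a real script the stream would be sys.stdin.buffer.
fake_stdin = io.BytesIO(b"plain.txt\0has\nnewline.txt\0")
names = list(iter_nul_separated(fake_stdin, chunk_size=4))
```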

> When you don't split on lines, conversely, you probably have a binary
> format,

That's not true for the daily use case I just mentioned.

|>ouglas

P.S. I wrote my own version of readlines, of course, as the archives of
comp.lang.python will show.  I just don't feel that everyone should be
required to do the same, when this is the sort of thing that sysadmins
and other Unix-savvy folks are wont to do on a daily basis.

P.P.S. Another use case is that I often end up with files that have
been transferred back and forth between Unix and Windows and
god-knows-what-else, and the newlines end up being some weird mixture of
carriage returns and line feeds (and sometimes some other stray
characters such as "=20" or somesuch) that many programs seem to have a
hard time recognizing as newlines.
msg109038 - (view) Author: Ralph Corderoy (ralph.corderoy) Date: 2010-07-01 10:05
Google has led me here because I'm trying to see how to process find(1)'s -print0 output with Python.  Perl's -0 option and $/ variable makes this trivial.

    find -name '*.orig' -print0 | perl -n0e unlink

awk(1) has its RS, record separator, variable too.  There's a clear need, and it should also be possible to modify or re-open sys.stdin to change the existing separator.
msg109098 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-02 10:41
Ralph, core developers have not rejected this idea. It needs a patch now (even rough) to get the discussion further.
msg109117 - (view) Author: Douglas Alan (Douglas.Alan) Date: 2010-07-02 17:31
Until this feature gets built into Python, you can use a Python-coded generator such as this one to accomplish the same effect:

def fileLineIter(inputFile,
                 inputNewline="\n",
                 outputNewline=None,
                 readSize=8192):
   """Like the normal file iter but you can set what string indicates newline.
   
   The newline string can be arbitrarily long; it need not be restricted to a
   single character. You can also set the read size and control whether or not
   the newline string is left on the end of the iterated lines.  Setting
   newline to '\0' is particularly good for use with an input file created with
   something like "os.popen('find -print0')".
   """
   if outputNewline is None: outputNewline = inputNewline
   partialLine = ''
   while True:
       charsJustRead = inputFile.read(readSize)
       if not charsJustRead: break
       partialLine += charsJustRead
       lines = partialLine.split(inputNewline)
       partialLine = lines.pop()
       for line in lines: yield line + outputNewline
   if partialLine: yield partialLine
msg111152 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2010-07-22 06:44
This fileLineIter function looks like a good recipe to me. Can we close the issue then?
msg111168 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-22 11:42
A recipe in the comments on a tracker item isn't enough reason to close the RFE, no.

An entry on the cookbook with a pointer from the docs might be sufficient, although I'm still not averse to the idea of an actual readrecords method (with appropriate tests).
msg111177 - (view) Author: ysj.ray (ysj.ray) Date: 2010-07-22 14:32
I think it's a good idea to add a keyword argument to specify the separator of readlines().

I believe most people can accept the universal meaning of "line", which is similar in meaning to "record", that is, a chunk of data, perhaps from having used line separators other than '\n' in perl, awk, or the find command. Doing this probably doesn't pollute the meaning of "readlines". Splitting the file contents on a special character is really a common usage. Besides, I feel that using a line separator other than '\n' doesn't mean we're dealing with a binary format; in fact, I often deal with text formats that use '\t' as the record separator.
msg111189 - (view) Author: Douglas Alan (Douglas.Alan) Date: 2010-07-22 16:33
Personally, I think that this functionality should be built into Python's readlines. That's where a typical person would expect it to be, and this is something that is supported by most other scripting languages I've used. E.g., awk has the RS variable which lets you set the "input record separator", which defaults to newline. And as I previously pointed out, xargs and find provide the option to use null as their line separator.
msg111202 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-07-22 17:54
> Personally, I think that this functionality should be built into
> Python's readlines. That's where a typical person would expect it to
> be, and this is something that is supported by most other scripting
> language I've used.

Adding it to readline() and/or readlines() would modify the standard IO
Abstract Base Classes, and would therefore probably need discussion on
python-dev.
msg111220 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2010-07-22 22:15
On Fri, Jul 23, 2010 at 3:54 AM, Antoine Pitrou <report@bugs.python.org> wrote:
>
> Antoine Pitrou <pitrou@free.fr> added the comment:
>
>> Personally, I think that this functionality should be built into
>> Python's readlines. That's where a typical person would expect it to
>> be, and this is something that is supported by most other scripting
>> language I've used.
>
> Adding it to readline() and/or readlines() would modify the standard IO
> Abstract Base Classes, and would therefore probably need discussion on
> python-dev.

That's also the reason why I'm suggesting a separate readrecords()
method - the appropriate ABC should be able to implement it as a
concrete method based on something like the recipe above.
msg111453 - (view) Author: Ralph Corderoy (ralph.corderoy) Date: 2010-07-24 11:13
fileLineIter() is not a solution that allows this bug to be closed, no.
readline() needs modifying and if that means python-dev discussion then
that's what it needs.  Things to consider include changing the record
separator as the file is read.

    $ printf 'a b c\nd e f ' |
    > awk '{print "<" $0 ">"} NR == 1 {RS = " "}'
    <a b c>
    <d>
    <e>
    <f>
    $
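
A rough Python equivalent of that awk session, to show what a changeable separator would mean in practice. `RecordReader` is a hypothetical wrapper written for this illustration; it is not an existing or proposed io class:

```python
import io

class RecordReader:
    """Iterate over records; `separator` may be reassigned between reads."""

    def __init__(self, stream, separator="\n", chunk_size=8192):
        self.stream = stream
        self.separator = separator
        self.chunk_size = chunk_size
        self._pending = ""

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            # Always split with the separator that is current right now.
            head, found, tail = self._pending.partition(self.separator)
            if found:
                self._pending = tail
                return head
            chunk = self.stream.read(self.chunk_size)
            if not chunk:
                if not self._pending:
                    raise StopIteration
                head, self._pending = self._pending, ""
                return head
            self._pending += chunk

reader = RecordReader(io.StringIO("a b c\nd e f "), chunk_size=4)
output = []
for number, record in enumerate(reader, start=1):
    output.append("<" + record + ">")
    if number == 1:
        reader.separator = " "  # like RS = " " after the first record
```

Because the wrapper keeps its own buffer, text already read but not yet returned is simply re-split with the new separator.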
msg223490 - (view) Author: Andrew Barnert (abarnert) * Date: 2014-07-20 00:07
http://thread.gmane.org/gmane.comp.python.ideas/28310 discusses the same idea.

Guido raised a serious problem with either adding an argument to readline and friends, or adding new methods readrecord and friends: It means the hundreds of existing file-like objects that exist today no longer meet the file API.

Putting the separator in the constructor call solves that problem. Construction is not part of the file API, and different file-like objects' constructors are already wildly different. It also seems to fit in better with what perl, awk, bash, etc. do (where you either set something globally, or on the file, rather than on the line-reading mechanism). And it seems conceptually cleaner; a file shouldn't be changing line-endings in mid-stream, and if it does, that's similar to changing encodings.

Whether this should be done by reusing newline, or by adding another new parameter, I'm not sure. The biggest issue with reusing newline is that it has a meaning for write mode, not just for read mode (either terminal \n characters, or all \n characters, it's not entirely clear which, are replaced with newline), and I'm not sure that's appropriate here. (Or, worse, maybe it's appropriate for text files but not binary files?)

R. David Murray's patch doesn't handle binary files, or _pyio, and IIRC from testing the same thing there was one more problem to fix for text files as well… but it's not hard to complete. If I have enough free time tomorrow, I'll clean up what I have and post it.
msg223491 - (view) Author: Andrew Barnert (abarnert) * Date: 2014-07-20 00:41
While we're at it, Douglas Alan's solution wouldn't be an ideal solution even if it were a builtin. A fileLineIter obviously doesn't support the stream API. It means you end up with two objects that share the same file, but have separate buffers and out-of-sync file pointers. And it's a lot slower.

That being said, I think it may be useful enough to put in the stdlib—even more so if you pull the resplit-an-iterator-of-strings code out:

def resplit(strings, separator):
    partialLine = None
    for s in strings:
        if partialLine:
            partialLine += s
        else:
            partialLine = s
        if not s:
            break
        lines = partialLine.split(separator)
        partialLine = lines.pop()
        yield from lines
    if partialLine:
        yield partialLine

Now, you can do this:

from functools import partial

with open('rdm-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')
    lines = (line + '\n' for line in lines)

# Or, if you're just going to strip off the newlines anyway:
with open('file-0-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\0')

# Or, if you have a binary file:
with open('binary-example', 'rb') as f:
    chunks = iter(partial(f.read, 8192), b'')
    lines = resplit(chunks, b'\0')

# Or, if I understand ysj.ray's example:
with open('ysj.ray-example') as f:
    chunks = iter(partial(f.read, 8192), '')
    lines = resplit(chunks, '\r\n')
    records = resplit(lines, '\t')

# Or, if you have something that isn't a file at all:
lines = resplit((packet.body for packet in packets), '\n')
msg223492 - (view) Author: Andrew Barnert (abarnert) * Date: 2014-07-20 00:45
One last thing, a quick & dirty solution that works today, if you don't mind accessing private internals of stdlib classes, and don't mind giving up the performance of _io for _pyio, and don't need a solution for binary files:

import _pyio

class MyTextIOWrapper(_pyio.TextIOWrapper):
    def readrecord(self, sep):
        readnl, self._readnl = self._readnl, sep
        try:
            return self.readline()
        finally:
            self._readnl = readnl

Or, if you prefer:

class MyTextIOWrapper(_pyio.TextIOWrapper):
    def __init__(self, *args, separator, **kwargs):
        super().__init__(*args, **kwargs)
        self._readnl = separator

For binary files, there's no solution quite as simple; you need to write your own readline method by copying and pasting the one from _pyio.RawIOBase, and the modifications to use an arbitrary separator aren't quite as trivial as they look at first (at least if you want multi-byte separators).
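
To give a feel for why the multi-byte case is fiddly, here is a rough sketch of a standalone reader built on `BufferedReader.peek`. The `readrecord` function is hypothetical and written for this illustration only; it is not the readline code from _pyio:

```python
import io

def readrecord(reader, separator):
    """Return the next record, including its trailing separator.

    Like readline(), returns b"" at EOF and a separator-less final record
    if the stream does not end with the separator.
    """
    record = bytearray()
    while True:
        window = reader.peek(1)  # buffered bytes, position not advanced
        if not window:
            return bytes(record)  # EOF
        # Back up far enough to catch a separator split across two peeks.
        start = max(0, len(record) - len(separator) + 1)
        record += window
        index = record.find(separator, start)
        if index != -1:
            end = index + len(separator)
            # Consume only the part of the window that belongs to this record.
            reader.read(len(window) - (len(record) - end))
            del record[end:]
            return bytes(record)
        reader.read(len(window))  # consume what was peeked and keep looking

# A tiny buffer forces the two-byte separator to straddle reads.
stream = io.BufferedReader(io.BytesIO(b"a$#bb$#c"), buffer_size=2)
records = [readrecord(stream, b"$#") for _ in range(4)]
```

Because only the bytes belonging to the record are consumed, the stream position stays in sync and ordinary `read()` calls can be mixed in afterwards.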
msg224016 - (view) Author: Akira Li (akira) * Date: 2014-07-26 02:06
To make the discussion more specific, here's a patch that adds support
for alternative newlines in _pyio.TextIOWrapper. It also updates the
documentation and adds more io tests. It does not provide C
implementation or the extended newline support for binary files.

As a side-effect it also fixes the bug in line_buffering=True
behavior, see issue22069O.

Note: The implementation does no newline translations unless in legacy
special cases i.e., newline='\0' behaves like newline='\n'. This is a 
key distinction from the behavior described in
http://bugs.python.org/file36008/pep-newline.txt

The initial specification is from
https://mail.python.org/pipermail/python-ideas/2014-July/028381.html
msg224077 - (view) Author: Akira Li (akira) * Date: 2014-07-26 17:02
> As a side-effect it also fixes the bug in line_buffering=True
> behavior, see issue22069O.

It should be issue22069 "TextIOWrapper(newline="\n", line_buffering=True) 
mistakenly treat \r as a newline"

Reuploaded the patch so that it applies cleanly on the current tip.
msg224149 - (view) Author: Andrew Barnert (abarnert) * Date: 2014-07-28 02:14
Akira, your patch does this:

-        self._writetranslate = newline != ''
-        self._writenl = newline or os.linesep
+        self._writetranslate = newline in (None, '\r', '\r\n')
+        self._writenl = newline if newline is not None else os.linesep

Any reason you made the second change? Why change the value assigned to _writenl for newline='\n' when you don't want to actually change the behavior for those cases? Just so you can double-check at write time that _writetranslate is never set unless _writenl is '\r', '\r\n', or os.linesep?
msg224155 - (view) Author: Akira Li (akira) * Date: 2014-07-28 08:01
> Akira, your patch does this:
>
> -        self._writetranslate = newline != ''
> -        self._writenl = newline or os.linesep
> +        self._writetranslate = newline in (None, '\r', '\r\n')
> +        self._writenl = newline if newline is not None else os.linesep
>
> Any reason you made the second change? Why change the value assigned
> to _writenl for newline='\n' when you don't want to actually change
> the behavior for those cases? Just so you can double-check at write
> time that _writetranslate is never set unless _writenl is '\r',
> '\r\n', or os.linesep?

If newline='\n' then writenl is '\n' with and without the patch.
If newline='\n' then write('\n\r') writes '\n\r' with and without the
patch.

If newline='\n' then writetranslate=False (with the patch). It does not
change the result for newline='\n' as it is documented now [1]:

  [newline] can be None, '', '\n', '\r', and '\r\n'.
  ...
  If newline is any of the other legal values [namely '\r', '\n',
  '\r\n'], any '\n' characters written are translated to the given
  string.

[...] are added by me for clarity.

[1] https://docs.python.org/3.4/library/io.html#io.TextIOWrapper

writetranslate=False so that if newline='\0' then write('\0\n') would
write '\0\n', i.e., embedded '\n' characters are not corrupted if
newline='\0'. That is why it is the "no translation" patch:

+    When writing output to the stream:
+
+    - if newline is None, any '\n' characters written are translated to
+      the system default line separator, os.linesep
+    - if newline is '\r' or '\r\n', any '\n' characters written are
+      translated to the given string
+    - no translation takes place for any other newline value [any string].
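
The three write-side rules quoted above can be sketched as a plain function. `translate_for_write` is a hypothetical helper used only to illustrate the documented behaviour; it is not code from the patch:

```python
import os

def translate_for_write(text, newline):
    """Apply the write-side newline rules described in the patch docs."""
    if newline is None:
        # Default: '\n' becomes the platform line separator.
        return text.replace("\n", os.linesep)
    if newline in ("\r", "\r\n"):
        # Legacy values: '\n' becomes the given string.
        return text.replace("\n", newline)
    # Any other string, including '', '\n' and '\0': no translation.
    return text

crlf = translate_for_write("a\nb", "\r\n")
nul = translate_for_write("\0\n", "\0")
plain = translate_for_write("\n\r", "\n")
default = translate_for_write("a\nb", None)
```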
msg226397 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2014-09-05 03:09
Related:
* Issue 563491: 2002 proposal for parameter to readline, rejected at the time
* Issue 17083: Newline is hard coded for binary file readline

Fixing this issue for binary files would probably also satisfy Issue 17083.
msg387491 - (view) Author: Christoph Anton Mitterer (calestyo) Date: 2021-02-22 03:25
Just wondered whether this is still being considered?

Cheers,
Chris.
msg387512 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2021-02-22 10:51
I don't think so.
msg387515 - (view) Author: Christoph Anton Mitterer (calestyo) Date: 2021-02-22 14:35
Oh, what a pity,... 

Seemed like a pretty common use case, which is unnecessarily prone to buggy or inefficient (user-)implementations.
msg387552 - (view) Author: Christoph Anton Mitterer (calestyo) Date: 2021-02-23 06:04
btw, just something for the record:

I think the example given in msg109117 above is wrong:

Depending on the read size it will produce different results, given how split() works:

Imagine a byte sequence:
>>> b"\0foo\0barbaz\0\0abcd".split(b"\0")
[b'', b'foo', b'barbaz', b'', b'abcd']


Now the same sequence, however with a different read size (here a shorter one):
>>> b"\0foo\0barbaz\0".split(b"\0")
[b'', b'foo', b'barbaz', b'']
>>> b"\0abcd".split(b"\0")
[b'', b'abcd']

=> it's the same bytes, but in the 2nd case one gets an extra b''.
History
Date User Action Args
2022-04-11 14:56:09 admin set github: 41622
2021-02-23 06:04:14 calestyo set messages: + msg387552
2021-02-22 14:35:53 calestyo set messages: + msg387515
2021-02-22 10:52:01 pitrou set status: open -> closed; resolution: later; stage: needs patch -> resolved
2021-02-22 10:51:32 pitrou set messages: + msg387512
2021-02-22 03:25:11 calestyo set nosy: + calestyo; messages: + msg387491
2019-08-25 15:42:34 maggyero set nosy: + maggyero
2014-09-05 03:09:20 martin.panter set messages: + msg226397
2014-07-28 08:01:37 akira set messages: + msg224155
2014-07-28 02:14:36 abarnert set messages: + msg224149; versions: + Python 3.4, - Python 3.5
2014-07-26 17:02:19 akira set files: + io-newline-issue1152248-2.patch; messages: + msg224077
2014-07-26 08:24:20 pconnell set nosy: + pconnell
2014-07-26 05:48:57 rhettinger set versions: + Python 3.5, - Python 3.4
2014-07-26 02:06:43 akira set files: + io-newline-issue1152248.patch; nosy: + akira; messages: + msg224016; keywords: + patch
2014-07-21 00:33:29 abarnert set files: + pep-peek.txt
2014-07-21 00:32:51 abarnert set files: + pep-newline.txt
2014-07-20 17:40:41 wolma set nosy: + wolma
2014-07-20 02:01:11 martin.panter set nosy: + martin.panter
2014-07-20 00:45:25 abarnert set messages: + msg223492
2014-07-20 00:41:35 abarnert set messages: + msg223491
2014-07-20 00:07:05 abarnert set nosy: + abarnert; messages: + msg223490
2012-08-20 05:46:03 ncoghlan set title: Enhance file.readlines by making line separator selectable -> Add support for reading records with arbitrary separators to the standard IO stack; versions: + Python 3.4, - Python 3.2
2011-06-01 01:20:37 jcon set nosy: + jcon
2010-07-24 11:13:27 ralph.corderoy set messages: + msg111453
2010-07-22 22:15:10 ncoghlan set messages: + msg111220
2010-07-22 17:54:13 pitrou set messages: + msg111202
2010-07-22 16:33:43 Douglas.Alan set messages: + msg111189
2010-07-22 14:32:43 ysj.ray set nosy: + ysj.ray; messages: + msg111177
2010-07-22 11:42:53 ncoghlan set status: pending -> open; resolution: works for me -> (no value); messages: + msg111168
2010-07-22 06:44:24 amaury.forgeotdarc set status: open -> pending; nosy: + amaury.forgeotdarc; messages: + msg111152; resolution: works for me
2010-07-02 17:31:17 Douglas.Alan set nosy: + Douglas.Alan; messages: + msg109117
2010-07-02 10:41:06 eric.araujo set nosy: georg.brandl, rhettinger, facundobatista, ncoghlan, pitrou, benjamin.peterson, nessus42, eric.araujo, ralph.corderoy, r.david.murray; messages: + msg109098; components: + Library (Lib), - Interpreter Core
2010-07-01 10:05:04 ralph.corderoy set nosy: + ralph.corderoy; messages: + msg109038
2010-04-13 19:59:57 eric.araujo set nosy: + eric.araujo
2009-05-15 17:46:23 nessus42 set messages: + msg87823
2009-05-15 13:07:55 pitrou set messages: + msg87817
2009-05-15 11:47:10 ncoghlan set messages: - msg87809
2009-05-15 11:46:53 ncoghlan set messages: + msg87809
2009-05-15 11:46:28 ncoghlan set messages: + msg87808
2009-05-15 11:34:04 pitrou set messages: + msg87807
2009-05-15 11:25:13 pitrou set keywords: - easy; messages: + msg87806
2009-05-15 11:24:26 pitrou set messages: + msg87805
2009-05-15 11:13:00 pitrou set messages: + msg87803
2009-05-15 10:17:51 ncoghlan set messages: + msg87802
2009-05-15 09:18:37 r.david.murray set keywords: + easy; nosy: + r.david.murray; messages: + msg87801
2009-05-15 02:53:53 ajaksu2 set nosy: + benjamin.peterson, pitrou; components: + IO; versions: + Python 3.2, - Python 3.1
2009-02-16 06:15:00 skip.montanaro set nosy: - montanaro.historic
2009-02-15 23:58:45 ncoghlan set messages: + msg82188; stage: test needed -> needs patch
2009-02-15 23:49:49 ajaksu2 set stage: test needed; versions: + Python 3.1, - Python 3.0
2008-03-19 18:52:17 facundobatista set assignee: facundobatista ->; messages: + msg64084; versions: + Python 3.0
2008-02-29 11:58:52 ncoghlan set messages: + msg63134
2008-02-27 11:08:10 facundobatista set assignee: facundobatista; messages: + msg63068
2008-02-27 08:48:48 rhettinger set status: closed -> open; resolution: rejected -> (no value); messages: + msg63067
2008-02-27 02:41:20 facundobatista set status: open -> closed; nosy: + facundobatista; resolution: rejected; messages: + msg63060
2005-02-26 07:24:20 ncoghlan create