This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: mailbox / fromline matching
Type:
Stage:
Components: Library (Lib)
Versions: Python 2.2

process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To: barry
Nosy List: barry, camield, gvanrossum, loewis, mwh, tim.peters
Priority: normal
Keywords: patch

Created on 2002-02-22 14:54 by camield, last changed 2022-04-10 16:05 by admin. This issue is now closed.

Files
File name         Uploaded                     Description
mailbox.py        camield, 2002-02-22 14:54    mailbox.py diff
mailbox.py.diff2  camield, 2002-03-02 14:38    new diff
Messages (10)
msg39054 - (view) Author: Camiel Dobbelaar (camield) Date: 2002-02-22 14:54
mailbox.py does not parse this 'From' line correctly:
From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200
                                                ^^^^^
This is because the regex does not account for the
trailing timezone information.

Also, 'From' should match at the beginning of the line.
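
A minimal sketch of the kind of pattern the report asks for, assuming a ctime()-style date followed by an optional numeric timezone. The names FROM_LINE and is_from_line are hypothetical and written for illustration only; this is neither the regex in mailbox.py nor the submitted patch.

```python
import re

# Hypothetical pattern for illustration; NOT the regex shipped in mailbox.py.
FROM_LINE = re.compile(
    r"From \s*\S+\s+"              # "From " plus the envelope sender
    r"\w{3}\s+\w{3}\s+"            # day of week and month, e.g. "Mon Apr"
    r"\d{1,2}\s+"                  # day of month ("May  7" has two spaces)
    r"\d{1,2}:\d{2}(:\d{2})?\s+"   # time, seconds optional
    r"\d{4}"                       # four-digit year
    r"(\s+[+-]\d{4})?\s*$"         # optional numeric timezone, e.g. "+0200"
)


def is_from_line(line):
    """Return True if the line looks like an mbox 'From ' delimiter."""
    # match() anchors at the start of the string, so no leading ^ is needed.
    return FROM_LINE.match(line) is not None


with_tz = is_from_line("From camield@sentia.nl Mon Apr 23 18:22:28 2001 +0200")
without_tz = is_from_line("From camield@sentia.nl Mon Apr 23 18:22:28 2001")
quoted = is_from_line(">From camield@sentia.nl Mon Apr 23 18:22:28 2001")
```

Here with_tz and without_tz are both true, while the quoted line is rejected because it does not begin with "From ".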
msg39055 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2002-02-28 22:37
That From line is simply illegal, or at least nonstandard.

If your system uses this nonstandard format, you can extend
the mailbox parser by overriding the ._isrealfromline
method.

The pattern doesn't need ^ because match() is used, which
only matches at the start of the line.

Rejected.
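
The point about match() can be checked with a short sketch. The pattern below is a deliberately simplified stand-in (an assumption for illustration), not the one used by mailbox.py.

```python
import re

# Simplified stand-in pattern; the real mailbox.py pattern is stricter.
pattern = re.compile(r"From ")

# match() only succeeds when the pattern matches at the start of the string.
at_start = pattern.match("From camield@sentia.nl Mon Apr 23 18:22:28 2001")
quoted = pattern.match(">From camield@sentia.nl Mon Apr 23 18:22:28 2001")

# search() scans the whole string, so it would need an explicit ^ anchor.
found_by_search = pattern.search(">From camield@sentia.nl Mon Apr 23 18:22:28 2001")
```

at_start is a match object, quoted is None, and found_by_search is a match object, which is why an explicit ^ would only matter if search() were used.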
msg39056 - (view) Author: Camiel Dobbelaar (camield) Date: 2002-03-01 11:34
I have tracked this down to Pine, the mailreader. 

In imap/src/c-client/mail.c, it has this flag:
 static int notimezones = NIL;    /* write timezones in "From " header */

(so timezones are written in the "From" lines by default)

I also found the following comment in imap/docs/FAQ in the
Pine distribution:

"""
So, good mail reading software only considers a line to be a
"From " line if it follows the actual specification for a
"From " line. This means, among other things, that the day
of week is fixed-format: "May 14", but "May  7" (note the
extra space) as opposed to "May 7".  ctime() format for the
date is the most common, although POSIX also allows a
numeric timezone after the year.
"""

While I don't consider Pine to be the ultimate mailreader,
its heritage may warrant that the 'From ' lines it creates
are considered 'standard'.
msg39057 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2002-03-01 21:42
IMO, Jamie Zawinski (author of the original mail/news reader
in Netscape, among other accomplishments) wrote the
definitive answer on From_:

http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/content-length.html

As far as Python's support for this in the mailbox module,
for backwards compatibility, the UnixMailbox class has a
strict-ish interpretation of the From_ delimiter, which I
think should not change.  It also has a class called
PortableUnixMailbox which recognizes delimiters as specified
in JWZ's document.  Personally, if I was trolling over a
real world mbox file I'd only use PortableUnixMailbox (as
long as non-delimiter From_ lines were properly escaped -- I
have some code in Mailman which tries to intelligently "fix"
non-escaped mbox files).

I agree with the Rejected resolution.
msg39058 - (view) Author: Camiel Dobbelaar (camield) Date: 2002-03-02 14:34
PortableUnixMailbox is not that useful, because it only
matches '^From '.  From-quoting is an even bigger mess than
From-headerlines, so that does not really help.

I submit a new diff that matches '\n\nFrom ' or
'<start-of-file>From ', which makes PortableUnixMailbox
useful for my purposes.  It is not as intrusive as the
comment in mailbox.py suggests.
msg39059 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2002-03-02 16:47
Re-opening and assigning to myself.  I'll take a look at
your patches asap.
msg39060 - (view) Author: Michael Hudson (mwh) (Python committer) Date: 2002-03-16 16:43
Anything going to happen here by Monday?
msg39061 - (view) Author: Tim Peters (tim.peters) * (Python committer) Date: 2004-03-21 19:31
Since the Monday in question happened over 2 years ago, the 
answer to Michael's question is apparently "no" <wink>.  
Barry, we're stretching the conventional meaning of "asap" 
here -- can you close this one way or t'other now?
msg39062 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2004-08-18 11:36
The patch, as it stands, appears to be incorrect. It is
looking for *two* empty lines between messages, whereas
folders conventionally contain only a single empty line; this
is also what Zawinski says.

If that were fixed, I think the patch would be acceptable -
mailbox.py currently does not implement the rule that the
From_ line must be preceded by an empty line.
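
The rule described here can be sketched as follows, assuming a single empty line (or the start of the file) must precede a "From " delimiter. The function name split_mbox is hypothetical, and this is an illustration rather than the submitted patch or the mailbox.py implementation.

```python
def split_mbox(text):
    """Split mbox text into messages.

    A line starting with "From " counts as a delimiter only at the start
    of the file or directly after an empty line (illustrative rule).
    """
    messages = []
    current = None
    prev_blank = True  # treat start-of-file like a preceding empty line
    for line in text.splitlines(keepends=True):
        if prev_blank and line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
        prev_blank = line.strip() == ""
    if current is not None:
        messages.append("".join(current))
    return messages


sample = (
    "From a@example.org Mon Apr 23 18:22:28 2001\n"
    "Subject: one\n"
    "\n"
    "first body\n"
    "From here on this is still body text\n"
    "\n"
    "From b@example.org Tue Apr 24 09:00:00 2001\n"
    "Subject: two\n"
    "\n"
    "second body\n"
)
parts = split_mbox(sample)
```

With this sample, parts holds two messages, and the unescaped "From here on" line stays in the first body because no empty line precedes it. A body line that starts with "From " directly after an empty line would still be taken as a delimiter, which is why escaping remains necessary.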
msg39063 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2004-10-09 21:18
I see no follow-up to Martin's comment of 2004-08-18.
Therefore, closing.  However, if you come up with a patch that
addresses his comments, you can re-open it.
History
Date                 User     Action  Args
2022-04-10 16:05:01  admin    set     github: 36138
2002-02-22 14:54:52  camield  create