[Mailman2Bridge] fix message separation and improve "From_ lines" disambiguation (#4156)

* [Mailman2Bridge.php] enable PCRE_MULTILINE pattern modifier

Enable the PCRE_MULTILINE pattern modifier on mbox content parsing. Without it, `^` only matches at the very start of the data, so parsing a monthly archive yields only a single message.

* [Mailman2Bridge.php] extend mbox "From_ lines" pattern

Extend the PCRE pattern that matches individual "From_ lines", which is used to split mbox content into single messages.

In addition to starting with 'From ', a matching line now also has to end with a time and year (hh:mm:ss yyyy).

This makes the pattern slightly more robust against accidental matches when a line within the actual message body starts with 'From ', which Mailman 2 (Pipermail) may not be configured to disambiguate.

* [Mailman2Bridge.php] remove trailing slash from URI constant

---------

Co-authored-by: enwu <[email protected]>
enwu authored Jul 28, 2024
1 parent 049af3c commit 2fcba49
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions bridges/Mailman2Bridge.php
@@ -3,7 +3,7 @@
class Mailman2Bridge extends BridgeAbstract
{
const NAME = 'Mailman2Bridge';
- const URI = 'https://list.org/';
+ const URI = 'https://list.org';
const MAINTAINER = 'imagoiq';
const CACHE_TIMEOUT = 60 * 30; // 30m
const DESCRIPTION = 'Fetch latest messages from Mailman 2 archive (Pipermail)';
@@ -68,7 +68,7 @@ public function collectData()
throw new \Exception('Failed to gzdecode');
}
}
- $mboxParts = preg_split('/^From /', $data);
+ $mboxParts = preg_split('/^From\s.+\d{2}:\d{2}:\d{2}\s\d{4}$/m', $data);
// Drop the first element which is always an empty string
array_shift($mboxParts);
$mboxMails = array_reverse($mboxParts);
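
To illustrate the effect of the two pattern changes, here is a minimal standalone sketch. The sample mbox data and the `splitMbox()` helper are hypothetical and not part of the bridge; only the three patterns relate to this commit (the old one, a multiline-only variant shown for comparison, and the new one).

```php
<?php
// Hypothetical mbox sample with two messages. The first message body
// contains a line that starts with "From " but is not a From_ line.
$data = "From alice at example.org  Mon Jul  1 10:00:00 2024\n"
    . "Subject: First\n"
    . "\n"
    . "From what I can tell, this works.\n"
    . "From bob at example.org  Tue Jul  2 11:30:00 2024\n"
    . "Subject: Second\n"
    . "\n"
    . "Body two.\n";

// Hypothetical helper mirroring the split-and-shift steps in collectData().
function splitMbox(string $pattern, string $data): array
{
    $parts = preg_split($pattern, $data);
    // Drop the first element, which is an empty string because the data
    // starts with a From_ line.
    array_shift($parts);
    return $parts;
}

// Old pattern: without the m modifier, ^ matches only at the start of the data.
$oldParts = splitMbox('/^From /', $data);

// Multiline only: every line starting with "From " splits, including body text.
$multilineOnly = splitMbox('/^From /m', $data);

// New pattern: the line must also end with hh:mm:ss yyyy.
$newParts = splitMbox('/^From\s.+\d{2}:\d{2}:\d{2}\s\d{4}$/m', $data);

printf("%d %d %d\n", count($oldParts), count($multilineOnly), count($newParts));
// prints "1 3 2"
```

Note that the new pattern consumes the whole From_ line, so each resulting part begins with the newline that followed it rather than with the sender address.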
