AssertionError - assert stripped_name.endswith("&quot;") #35

Open
lancesnead opened this issue Oct 10, 2016 · 1 comment

lancesnead commented Oct 10, 2016

I occasionally get an AssertionError while scraping messages from my group, and it crashes the script. I was able to scrape the first 12,000 of my 14K+ messages before it occurred. My only workaround was to delete the offending message on Yahoo's servers so the scrape could continue. Here is the error log:

         'Received: from [66.218.67.136] by n15.grp.scd.yahoo.com with '
         'NNFMP; 23 Feb 2003 03:31:01 -0000\r\n'
         'Date: Sun, 23 Feb 2003 03:30:59 -0000\r\n'
         'To: [email protected]\r\n'
         'Subject: e-rock\r\n'
         'Message-ID: <[email protected]>\r\n'
         'User-Agent: eGroups-EW/0.82\r\n'
         'MIME-Version: 1.0\r\n'
         'Content-Type: text/plain; charset=ISO-8859-1\r\n'
         'Content-Length: 258\r\n'
         'X-Mailer: Yahoo Groups Message Poster\r\n'
         'From: "kelly <[email protected]>" '
         '<[email protected]>\r\n'
         'X-Originating-IP: 67.95.3.66\r\n'
         'X-Yahoo-Group-Post: member; u=105309409\r\n'
         'X-Yahoo-Profile: nurkpb\r\n'
         '\r\n'
         'hey guys\n'
         'i am going to the e-rock climb\n'
         'i would love to leave fri pm and carpool with anyone\n'
         'i live off 287 & Debbie Ln in Mansfield, it's on the way :)\n'
         'anyone interested?\n'
         'please let me know\n'
         'thanks\n'
         'kelly\n'
         'hm 817-453-9557\n'
         'cell 817-271-8596\n'
         '[email protected]\n'
         '\n'
         '\n'
         '\n',

 'replyTo': 'SENDER',
 'senderId': '0JhJdOkNdI_n8iu5_u_2RoTIEuZdDtsmmD7dvc9nr6I-bGg2BJwlrQfYtCFjPiNvJW_oxNVzVqmQRfV3Ml8zyM94kZAT3VnhmuZ3MiMhoSes6VqBYEohqg',
 'spamInfo': {'isSpam': False, 'reason': '0'},
 'specialLinks': [],
 'subject': 'e-rock',
 'systemMessage': False,
 'topicId': 2103,
 'userId': 105309409}
Traceback (most recent call last):
  File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 129, in <module>
    main()
  File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 125, in main
    arguments, cfg_args)
  File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 103, in invoke_subcommand
    return module.command(args)
  File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\subcommands\scrape_messages.py", line 50, in command
    msg = scraper.get_message(cur_message)
  File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 180, in get_message
    return self._massage_message(data)
  File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 109, in _massage_message
    assert stripped_name.endswith("&quot;")
AssertionError
C:\Temp\yahoo-groups-backup>yahoo-groups-backup.py scrape_messages TEXASMOUNTAINEERS
Using '--mongo-port' from config file
Using '--mongo-host' from config file
Using '--password' from config file
Using '--login' from config file
Processing the log-in page...
Inserted message #14325 by Kevin Dahlstrom/None/[email protected]
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Message #2103 is missing
Failed to process message:
{'authorName': 'ajfreeman2002 &lt;[email protected]&gt;',
 'canDelete': True,
 'contentTrasformed': False,
 'from': '&quot;ajfreeman2002 &lt;[email protected]&gt;&quot; '
         '&lt;[email protected]&gt;',
 'headers': {'messageIdInHeader': 'PGIzOTExditncGc3QGVHcm91cHMuY29tPg=='},
 'messageBody': '<div id="ygrps-yiv-960954217">Will there be top ropes set up '
                'at Enchanted Rock? or is it all lead <br/>\n'
                'climbing? <br/>\n'
                '<br/>\n'
                'We are interested if it is top roping.<br/>\n'
                'Ardis</div>',
 'msgId': 2102,
 'msgSnippet': 'Will there be top ropes set up at Enchanted Rock? or is it all '
               'lead climbing? We are interested if it is top roping. Ardis',
 'nextInTime': 2105,
 'nextInTopic': 0,
 'numMessagesInTopic': 1,
 'postDate': 1045956479,
 'prevInTime': 2101,
 'prevInTopic': 0,
 'profile': 'ajfreeman2002',
 'rawEmail': 'Return-Path: &lt;[email protected]&gt;\r\n'
             'X-Sender: [email protected]\r\n'
             'X-Apparently-To: [email protected]\r\n'
             'Received: (EGP: mail-8_2_3_4); 22 Feb 2003 23:28:00 -0000\r\n'
             'Received: (qmail 95895 invoked from network); 22 Feb 2003 '
             '23:28:00 -0000\r\n'
             'Received: from unknown (66.218.66.216)\n'
             '  by m5.grp.scd.yahoo.com with QMQP; 22 Feb 2003 23:28:00 '
             '-0000\r\n'
             'Received: from unknown (HELO n21.grp.scd.yahoo.com) '
             '(66.218.66.77)\n'
             '  by mta1.grp.scd.yahoo.com with SMTP; 22 Feb 2003 23:28:00 '
             '-0000\r\n'
             'Received: from [66.218.67.162] by n21.grp.scd.yahoo.com with '
             'NNFMP; 22 Feb 2003 23:28:00 -0000\r\n'
             'Date: Sat, 22 Feb 2003 23:27:59 -0000\r\n'
             'To: [email protected]\r\n'
             'Subject: enchanted rock\r\n'
             'Message-ID: &lt;[email protected]&gt;\r\n'
             'User-Agent: eGroups-EW/0.82\r\n'
             'MIME-Version: 1.0\r\n'
             'Content-Type: text/plain; charset=ISO-8859-1\r\n'
             'Content-Length: 126\r\n'
             'X-Mailer: Yahoo Groups Message Poster\r\n'
             'From: &quot;ajfreeman2002 &lt;[email protected]&gt;&quot; '
             '&lt;[email protected]&gt;\r\n'
             'X-Originating-IP: 65.56.122.36\r\n'
             'X-Yahoo-Group-Post: member; u=126064000\r\n'
             'X-Yahoo-Profile: ajfreeman2002\r\n'
             '\r\n'
             'Will there be top ropes set up at Enchanted Rock? or is it all '
             'lead \n'
             'climbing? \n'
             '\n'
             'We are interested if it is top roping.\n'
             'Ardis\n'
             '\n'
             '\n',
 'replyTo': 'SENDER',
 'senderId': 'Z7i25P28BJENDnHwomyWIDjJh2nCRNcAExYizy-R4tFhTCcVvL_912yGWz279n7YJVL9UUtm6R_tSi9PqUWnHdUtsoTV9Qa09WZrMzWG0A7UUS_VL8AnG-i-xbZzbBzxzCIsLk8_ia7RZiAr',

 'spamInfo': {'isSpam': False, 'reason': '0'},
 'specialLinks': [],
 'subject': 'enchanted rock',
 'systemMessage': False,
 'topicId': 2102,
 'userId': 126064000}
Traceback (most recent call last):
  File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 129, in <module>
    main()
  File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 125, in main
    arguments, cfg_args)
  File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 103, in invoke_subcommand
    return module.command(args)
  File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\subcommands\scrape_messages.py", line 50, in command
    msg = scraper.get_message(cur_message)
  File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 180, in get_message
    return self._massage_message(data)
  File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 109, in _massage_message
    assert stripped_name.endswith("&quot;")
AssertionError
@csaftoiu csaftoiu self-assigned this Oct 10, 2016
@csaftoiu
Owner

Ah wow, that's due to a very special "From" field, namely:

&quot;kelly &lt;[email protected]&gt;&quot; &lt;[email protected]&gt;

Or, when unescaped:

"kelly <[email protected]>" <[email protected]>

The parser didn't expect to see nested emails (an email address with < inside the quoted display name).

Thanks for the bug report. I know enough to do a bug fix for this now.
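
For illustration only, here is a minimal sketch of one way to split such a "From" value without asserting on where the closing quote falls. It is not the project's actual fix: the function name `split_from_field` and the example.com addresses are made up for this example, and it assumes the real address is always the last `<...>` group in the field.

```python
import html


def split_from_field(raw_from):
    """Split an HTML-escaped From value into (display_name, address).

    Hypothetical helper: assumes the real address is the last <...> group,
    so a '<' inside the quoted display name is left alone.
    """
    text = html.unescape(raw_from).strip()
    if text.endswith(">") and "<" in text:
        # rpartition splits on the LAST '<', which skips any nested address.
        name, _, addr = text[:-1].rpartition("<")
        name = name.strip()
        # Drop one pair of surrounding double quotes, if present.
        if len(name) >= 2 and name[0] == '"' and name[-1] == '"':
            name = name[1:-1]
        return name, addr.strip()
    # No angle-bracket address: treat the whole value as the address.
    return "", text


# Made-up addresses in the same shape as the field that tripped the assert.
nested = "&quot;kelly &lt;kelly@example.com&gt;&quot; &lt;poster@example.com&gt;"
print(split_from_field(nested))
```

Splitting on the last `<` rather than the first is the design choice that matters here: the display name may contain anything, including another address, while the trailing `<...>` group is the part the mail system generated.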

lancesnead changed the title from "AssertionError" to "AssertionError - assert stripped_name.endswith("&quot;")" on Oct 11, 2016