I occasionally get an AssertionError when scraping messages from my group, and it crashes the script. I was able to scrape the first 12,000 of my 14K+ messages before it occurred. My only workaround was to delete the offending message on Yahoo's servers and then continue the scrape. Here is the error log:
'Received: from [66.218.67.136] by n15.grp.scd.yahoo.com with '
'NNFMP; 23 Feb 2003 03:31:01 -0000\r\n'
'Date: Sun, 23 Feb 2003 03:30:59 -0000\r\n'
'To: [email protected]\r\n'
'Subject: e-rock\r\n'
'Message-ID: <[email protected]>\r\n'
'User-Agent: eGroups-EW/0.82\r\n'
'MIME-Version: 1.0\r\n'
'Content-Type: text/plain; charset=ISO-8859-1\r\n'
'Content-Length: 258\r\n'
'X-Mailer: Yahoo Groups Message Poster\r\n'
'From: "kelly <[email protected]>" '
'<[email protected]>\r\n'
'X-Originating-IP: 67.95.3.66\r\n'
'X-Yahoo-Group-Post: member; u=105309409\r\n'
'X-Yahoo-Profile: nurkpb\r\n'
'\r\n'
'hey guys\n'
'i am going to the e-rock climb\n'
'i would love to leave fri pm and carpool with anyone\n'
'i live off 287 & Debbie Ln in Mansfield, it's on the way :)\n'
'anyone interested?\n'
'please let me know\n'
'thanks\n'
'kelly\n'
'hm 817-453-9557\n'
'cell 817-271-8596\n'
'[email protected]\n'
'\n'
'\n'
'\n',
'replyTo': 'SENDER',
'senderId': '0JhJdOkNdI_n8iu5_u_2RoTIEuZdDtsmmD7dvc9nr6I-bGg2BJwlrQfYtCFjPiNvJW_oxNVzVqmQRfV3Ml8zyM94kZAT3VnhmuZ3MiMhoSes6VqBYEohqg',
'spamInfo': {'isSpam': False, 'reason': '0'},
'specialLinks': [],
'subject': 'e-rock',
'systemMessage': False,
'topicId': 2103,
'userId': 105309409}
Traceback (most recent call last):
File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 129, in <module>
main()
File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 125, in main
arguments, cfg_args)
File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 103, in invoke_subcommand
return module.command(args)
File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\subcommands\scrape_messages.py", line 50, in command
msg = scraper.get_message(cur_message)
File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 180, in get_message
return self._massage_message(data)
File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 109, in _massage_message
assert stripped_name.endswith("&quot;")
AssertionError
C:\Temp\yahoo-groups-backup>yahoo-groups-backup.py scrape_messages TEXASMOUNTAINEERS
Using '--mongo-port' from config file
Using '--mongo-host' from config file
Using '--password' from config file
Using '--login' from config file
Processing the log-in page...
Inserted message #14325 by Kevin Dahlstrom/None/[email protected]
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Skipped 1000 messages we already processed
Message #2103 is missing
Failed to process message:
{'authorName': 'ajfreeman2002 <[email protected]>',
'canDelete': True,
'contentTrasformed': False,
'from': '"ajfreeman2002 <[email protected]>" '
'<[email protected]>',
'headers': {'messageIdInHeader': 'PGIzOTExditncGc3QGVHcm91cHMuY29tPg=='},
'messageBody': '<div id="ygrps-yiv-960954217">Will there be top ropes set up '
'at Enchanted Rock? or is it all lead <br/>\n'
'climbing? <br/>\n'
'<br/>\n'
'We are interested if it is top roping.<br/>\n'
'Ardis</div>',
'msgId': 2102,
'msgSnippet': 'Will there be top ropes set up at Enchanted Rock? or is it all '
'lead climbing? We are interested if it is top roping. Ardis',
'nextInTime': 2105,
'nextInTopic': 0,
'numMessagesInTopic': 1,
'postDate': 1045956479,
'prevInTime': 2101,
'prevInTopic': 0,
'profile': 'ajfreeman2002',
'rawEmail': 'Return-Path: <[email protected]>\r\n'
'X-Sender: [email protected]\r\n'
'X-Apparently-To: [email protected]\r\n'
'Received: (EGP: mail-8_2_3_4); 22 Feb 2003 23:28:00 -0000\r\n'
'Received: (qmail 95895 invoked from network); 22 Feb 2003 '
'23:28:00 -0000\r\n'
'Received: from unknown (66.218.66.216)\n'
' by m5.grp.scd.yahoo.com with QMQP; 22 Feb 2003 23:28:00 '
'-0000\r\n'
'Received: from unknown (HELO n21.grp.scd.yahoo.com) '
'(66.218.66.77)\n'
' by mta1.grp.scd.yahoo.com with SMTP; 22 Feb 2003 23:28:00 '
'-0000\r\n'
'Received: from [66.218.67.162] by n21.grp.scd.yahoo.com with '
'NNFMP; 22 Feb 2003 23:28:00 -0000\r\n'
'Date: Sat, 22 Feb 2003 23:27:59 -0000\r\n'
'To: [email protected]\r\n'
'Subject: enchanted rock\r\n'
'Message-ID: <[email protected]>\r\n'
'User-Agent: eGroups-EW/0.82\r\n'
'MIME-Version: 1.0\r\n'
'Content-Type: text/plain; charset=ISO-8859-1\r\n'
'Content-Length: 126\r\n'
'X-Mailer: Yahoo Groups Message Poster\r\n'
'From: "ajfreeman2002 <[email protected]>" '
'<[email protected]>\r\n'
'X-Originating-IP: 65.56.122.36\r\n'
'X-Yahoo-Group-Post: member; u=126064000\r\n'
'X-Yahoo-Profile: ajfreeman2002\r\n'
'\r\n'
'Will there be top ropes set up at Enchanted Rock? or is it all '
'lead \n'
'climbing? \n'
'\n'
'We are interested if it is top roping.\n'
'Ardis\n'
'\n'
'\n',
'replyTo': 'SENDER',
'senderId': 'Z7i25P28BJENDnHwomyWIDjJh2nCRNcAExYizy-R4tFhTCcVvL_912yGWz279n7YJVL9UUtm6R_tSi9PqUWnHdUtsoTV9Qa09WZrMzWG0A7UUS_VL8AnG-i-xbZzbBzxzCIsLk8_ia7RZiAr',
'spamInfo': {'isSpam': False, 'reason': '0'},
'specialLinks': [],
'subject': 'enchanted rock',
'systemMessage': False,
'topicId': 2102,
'userId': 126064000}
Traceback (most recent call last):
File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 129, in <module>
main()
File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 125, in main
arguments, cfg_args)
File "C:\Temp\yahoo-groups-backup\yahoo-groups-backup.py", line 103, in invoke_subcommand
return module.command(args)
File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\subcommands\scrape_messages.py", line 50, in command
msg = scraper.get_message(cur_message)
File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 180, in get_message
return self._massage_message(data)
File "C:\Temp\yahoo-groups-backup\yahoo_groups_backup\scraper.py", line 109, in _massage_message
assert stripped_name.endswith("&quot;")
AssertionError
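In both failing messages the 'from' value has a display name that itself contains an angle-bracketed address (for example "ajfreeman2002 <...>" followed by the real <...> address), which may be what trips the assertion. Below is a minimal sketch of a more tolerant parse using the standard library's email.utils.parseaddr. The helper name split_from is hypothetical and is not part of yahoo-groups-backup, and the html.unescape step assumes the scraped value may still carry HTML entities; I have not checked how scraper.py actually parses this field.

```python
import html
from email.utils import parseaddr


def split_from(raw_from):
    """Split a From value into (display_name, address).

    Hypothetical helper, not part of yahoo-groups-backup.
    Assumption: the scraped value may contain HTML entities
    such as &quot; or &lt;, so it is unescaped first.
    """
    decoded = html.unescape(raw_from)
    # parseaddr treats the quoted string as one display name,
    # even when that name contains its own <...> part.
    name, address = parseaddr(decoded)
    return name, address


if __name__ == "__main__":
    # Placeholder addresses; the real ones are redacted in the log above.
    sample = '"ajfreeman2002 <aj@example.com>" <aj@example.com>'
    print(split_from(sample))
```

If parsing fails, parseaddr returns a pair of empty strings rather than raising, so the caller could log the message and skip it instead of stopping the whole scrape.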