First of all: I really like the project, good stuff!
Now to my request ;)
I use pjscrape to scrape a single file and then do some post processing of the collected data. Right now the post processing doesn't seem to be possible.
It would be great to have a post scrape callback, ideally with the data collected available.
I was going to say that you could do this in a custom writer, and you can, but it's more of a pain than I initially thought, and you can't really leverage the code in the base writer.
Just to make sure I understand the request here - you want to:
1. Scrape some data from one or more pages
2. Post-process the data with a custom function
3. Use the existing writers and formatters to write your output (e.g. JSON to STDOUT)
Is that right? If you wanted to do some custom writing in (3), I'd say just make a new writer, but if you want to take advantage of the existing writers and formatters you do need an addition to the library.
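To make the three steps concrete, here is a minimal sketch in plain JavaScript. It only illustrates the order of operations: `runPipeline`, `postProcess`, `format`, and `write` are hypothetical names chosen for this example and are not part of pjscrape's API.

```javascript
// Hypothetical sketch: none of these names come from pjscrape.
// items:       data collected by the scrapers (step 1)
// postProcess: custom function applied to all collected items (step 2)
// format:      turns the processed items into a string (step 3)
// write:       sends the formatted string somewhere (step 3)
function runPipeline(items, postProcess, format, write) {
    var processed = postProcess(items);
    var output = format(processed);
    write(output);
    return output;
}

// Stand-in data and functions, for illustration only.
var scraped = [{ title: ' First ' }, { title: 'Second' }];
var written = [];

var pipelineOutput = runPipeline(
    scraped,
    function (items) {
        // Post-processing: trim whitespace from every title.
        return items.map(function (item) {
            return { title: item.title.trim() };
        });
    },
    function (items) { return JSON.stringify(items); },
    function (text) { written.push(text); }
);
```

The point of the sketch is that the post-processing hook sits between collection and the existing format/write steps, so the built-in writers would not need to change.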
Actually, after creating the issue it occurred to me that I could use a writer...
I would like to do the post processing in the same process, as that processing will need to use phantom too.
What I need to achieve is to:
1. scrape a page
2. replace svg with img elements
3. process the svg into temporary image files
4. save the final HTML
5. write a summary file with images processed, etc.
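A rough sketch of the replacement and summary steps, working on an HTML string. `replaceSvgWithImg` and the `img-N.png` naming are hypothetical, the regular expression does not handle nested `svg` elements, and actually rendering each SVG to an image file would still need phantom, which is not shown here.

```javascript
// Hypothetical helper, not part of pjscrape or PhantomJS.
// Replaces every <svg>...</svg> block in an HTML string with an <img> tag
// and returns the rewritten HTML plus the list of extracted SVG sources.
// Assumption: svg elements are not nested (the lazy regex stops at the first </svg>).
function replaceSvgWithImg(html, nameFor) {
    var images = [];
    var rewritten = html.replace(/<svg\b[\s\S]*?<\/svg>/gi, function (svg) {
        var file = nameFor(images.length);
        images.push({ file: file, svg: svg });
        return '<img src="' + file + '">';
    });
    return { html: rewritten, images: images };
}

// Illustration with a stand-in page.
var page = '<p>a</p><svg width="1"><rect/></svg><p>b</p><svg></svg>';
var converted = replaceSvgWithImg(page, function (index) {
    return 'img-' + index + '.png';
});

// Data for the summary file: how many images were processed and their names.
var summary = {
    imagesProcessed: converted.images.length,
    files: converted.images.map(function (image) { return image.file; })
};
```

Each entry in `converted.images` keeps the original SVG markup, so a later step can render it to the matching temporary file.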
The point is to convert a complex JS driven page into a static HTML page.
I think I can live with a custom writer, in particular since I can add custom config options to drive the process.
In that context it is quite nice to be able to provide multiple files on the command line, since that means I can use one file for the actual code and another, created on the fly, that contains just the URL and other config settings.