intercept.js

JavaScript library for intercepting AJAX / XHR calls performed by a website in order to:

perform reverse engineering of communication between front-end and back-end; and
perform data extraction (a.k.a. scraping) of such a website.

Outline:

Getting Started
Processing AJAX Responses
Gathering Data
Pause Interception
Debug Mode
Working with Selenium
Disclaimer

1. Getting Started

Open a browser where you can access your Facebook profile.
Go to this URL to see the latest posts in the Facebook groups you have joined:

Press CTRL+SHIFT+I to open the Developer Tools:

Inject intercept.js into the webpage.

In the console tab, paste the source code of intercept.js and press ENTER.

Initialize intercept.js:

$$.init();

Scroll down to load new posts, and see the URLs of the AJAX calls in the console.
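
For readers curious how this kind of interception can work, the sketch below wraps `open` and `send` on an XHR-like class so that every completed request is reported to a callback. It is a minimal illustration, not the source of intercept.js: `installInterceptor`, `FakeXHR` and the callback shape are hypothetical names. The `_url` property mirrors the one read as `xhr._url` in section 2; that the library sets it this way is an assumption.

```javascript
// Hypothetical helper: patch an XHR-like class so a callback sees every finished request.
// In a browser you would pass window.XMLHttpRequest as XHRClass.
function installInterceptor(XHRClass, onResponse) {
    const originalOpen = XHRClass.prototype.open;
    const originalSend = XHRClass.prototype.send;

    XHRClass.prototype.open = function (method, url, ...rest) {
        this._url = url; // remember the URL for later (assumed convention)
        return originalOpen.call(this, method, url, ...rest);
    };

    XHRClass.prototype.send = function (...args) {
        // report the request to the callback once its response has loaded
        this.addEventListener('load', () => onResponse(this));
        return originalSend.apply(this, args);
    };
}

// Tiny stand-in for XMLHttpRequest so the sketch also runs outside a browser.
class FakeXHR {
    constructor() { this.listeners = []; this.responseText = ''; }
    open(method, url) { this.method = method; this.requestedUrl = url; }
    addEventListener(type, fn) { if (type === 'load') this.listeners.push(fn); }
    send() {
        this.responseText = '{"ok":true}';
        this.listeners.forEach(fn => fn());
    }
}

const seen = [];
installInterceptor(FakeXHR, xhr => seen.push(xhr._url));

const xhr = new FakeXHR();
xhr.open('POST', '/api/graphql/');
xhr.send();
console.log(seen); // [ '/api/graphql/' ]
```

Passing the class in as a parameter keeps the sketch testable with a stand-in; on a real page the patched class would be the browser's own `XMLHttpRequest`.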

2. Processing AJAX Responses

Initialize intercept.js with a custom parsing function.

For example, the code below extracts the content of each post from the AJAX response.

$$.init({
    parse: function(xhr) {
        var s = null; // complete response text
        var ar = null; // array of lines in the response text
        var x = null; // line in the response text
        var t = null; // line wrapped in array brackets
        var j = null; // parsed JSON of the line

        // get the content of all the posts
        if (xhr._url == '/api/graphql/') {
            s = xhr.responseText;
            ar = s.split("\n");
            for (let z = 0; z < ar.length; z++) {
                x = ar[z];
                // only the news feed stream lines carry post content,
                // so skip every other line before trying to parse it
                if (x.startsWith('{"label":"CometNewsFeed_viewerConnection$stream$CometNewsFeed_viewer_news_feed"')) {
                    // each line is wrapped in an array before being parsed as JSON
                    t = '['+x+']';
                    j = JSON.parse(t)[0];

                    let a = j.data.node.comet_sections.content.story.message;
                    if (a != null) {
                        console.log('POST: ' + a.text);
                    }
                }
            }
        }
    }
});
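
The parsing logic can also be factored into a standalone function, which makes it easy to try against a saved response before wiring it into `parse`. The sketch below is one way to do that; `extractPosts` is a hypothetical name, the sample response is made up, and each line is parsed directly with `JSON.parse` on the assumption that every matching line is a complete JSON object. The field path is the one used in the example above.

```javascript
// Hypothetical helper: pull post texts out of a newline-delimited JSON response.
function extractPosts(responseText, labelPrefix) {
    const posts = [];
    for (const line of responseText.split('\n')) {
        // skip lines that do not belong to the stream we care about
        if (!line.startsWith(labelPrefix)) continue;
        let json;
        try {
            json = JSON.parse(line);
        } catch (e) {
            continue; // ignore lines that are not valid JSON
        }
        // optional chaining avoids a TypeError when a level is missing
        const message = json?.data?.node?.comet_sections?.content?.story?.message;
        if (message != null) posts.push(message.text);
    }
    return posts;
}

// Made-up sample response: two stream lines (one without a message) and one unrelated line.
const prefix = '{"label":"feed"';
const sample = [
    '{"label":"feed","data":{"node":{"comet_sections":{"content":{"story":{"message":{"text":"Hello"}}}}}}}',
    '{"label":"other","data":{}}',
    '{"label":"feed","data":{"node":{"comet_sections":{"content":{"story":{"message":null}}}}}}'
].join('\n');

console.log(extractPosts(sample, prefix)); // [ 'Hello' ]
```

Inside `parse`, such a function could be called with `xhr.responseText` and the full label prefix shown earlier.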

In addition to logging the contents, you can store them in the $$.data array.

console.log('POST: ' + a.text);
$$.push(a.text);

3. Gathering Data

Every time you call the $$.push method, you add an element to the $$.data array.

console.log($$.data.length);
// => 1

You can clear both arrays, $$.data and $$.calls, by calling the $$.reset method:

$$.reset();
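
When collecting from a feed, the same item may end up in the array more than once (this is an assumption about how feeds behave, not something the library documents). A minimal sketch for removing repeats before exporting, assuming the collected items are plain strings as in the `$$.push(a.text)` example; `uniqueItems` is a hypothetical helper and the sample array is made up:

```javascript
// Hypothetical helper: drop repeated entries while keeping first-seen order.
function uniqueItems(items) {
    return [...new Set(items)];
}

// Made-up stand-in for the contents of $$.data after some scrolling.
const collected = ['first post', 'second post', 'first post'];
console.log(uniqueItems(collected)); // [ 'first post', 'second post' ]
```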

4. Pause Interception

You can pause interception:

$$.pause();

You can resume interception:

$$.play();

You can check whether interception is currently paused:

$$._paused
// => true
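
If you want to pause only for the duration of one task, a small wrapper can make sure interception is always resumed. The sketch below relies only on the `pause()` and `play()` methods shown above; `withPaused` is a hypothetical helper, and the `fake` object is a stand-in so the example runs without intercept.js loaded.

```javascript
// Hypothetical helper: run fn while the interceptor is paused, then resume.
// `interceptor` is expected to expose pause() and play(), as $$ does.
function withPaused(interceptor, fn) {
    interceptor.pause();
    try {
        return fn();
    } finally {
        interceptor.play(); // resume even if fn throws
    }
}

// Stand-in object so the sketch runs without intercept.js loaded.
const fake = {
    _paused: false,
    pause() { this._paused = true; },
    play() { this._paused = false; }
};

const pausedInside = withPaused(fake, () => fake._paused);
console.log(pausedInside, fake._paused); // true false
```

In the browser console, the same helper would be called as `withPaused($$, function() { /* ... */ })`.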

5. Debug Mode

You can ask intercept.js to store all requests and their responses in the $$.calls array.

$$.debug(true);

You can also enable debug mode when initializing:

$$.init({
    debug: true,
    parse: function(xhr) {
        // ...
    }
});

This feature is useful for developers who are reverse engineering a website.

$$.calls.length
// => 64

$$.calls[0].url
// => '/ajax/navigation/'
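
When reverse engineering, a quick count of calls per URL helps spot the endpoints worth parsing. The sketch below assumes only that each recorded call has a `url` property, as in `$$.calls[0].url`; `countCallsByUrl` is a hypothetical helper and the sample array is made up.

```javascript
// Hypothetical helper: count recorded calls per URL.
// Each entry is assumed to have a `url` property, as in $$.calls[0].url.
function countCallsByUrl(calls) {
    const counts = {};
    for (const call of calls) {
        counts[call.url] = (counts[call.url] || 0) + 1;
    }
    return counts;
}

// Made-up sample in place of the real $$.calls array.
const sampleCalls = [
    { url: '/ajax/navigation/' },
    { url: '/api/graphql/' },
    { url: '/api/graphql/' }
];
console.log(countCallsByUrl(sampleCalls));
// { '/ajax/navigation/': 1, '/api/graphql/': 2 }
```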

You can check if intercept.js is running in debug mode or not:

$$._debug
// => false

Debug mode also consumes resources, so keep it disabled in production environments.

6. Working with Selenium

You can automate your web-scraping using Selenium, injecting the intercept.js library using the Chrome DevTools Protocol (a.k.a. CDP).

You can find a full example here.

The example is written in Ruby, but you can use any other language, such as Python, if you prefer.
The example uses AdsPower Client to operate stealth browsers, but you can use plain Selenium/WebDriver instead.

In this section, we explain the example line by line.

In your Ruby script, include the required libraries:

require 'net/http'
require 'json'
require 'adspower-client'

Create the AdsPower client:

key = '*************8c95acbf*************'
client = AdsPowerClient.new(key: key);

Start the browser:

id = 'jdu****'
driver = client.driver(id)

Get the source code of the intercept.js library:

uri = URI.parse('https://raw.githubusercontent.com/leandrosardi/intercept/main/lib/intercept.js')
js1 = Net::HTTP.get(uri)

Get the source code of the scraper:

uri = URI.parse('https://raw.githubusercontent.com/leandrosardi/intercept/main/examples/facebook_group_posts.js')
js2 = Net::HTTP.get(uri)

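Inject the library and the scraper into the browser using CDP (the newline keeps the two sources separate even if the first file does not end with one):

driver.execute_cdp("Page.addScriptToEvaluateOnNewDocument", source: js1 + "\n" + js2)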

Open the URL to scrape:

url = 'https://www.facebook.com/?filter=groups&sk=h_chr'
driver.get(url)

Wait for the page to load:

sleep(5)

Reset the interceptor:

driver.execute_script('$$.reset();')

Click the link to load posts via AJAX:

a = driver.find_element(:css, 'a[href="/?filter=groups&sk=h_chr"]')
a.click

Wait for the AJAX calls to complete:

sleep(5)

Get the list of scraped posts:

s = driver.execute_script('return JSON.stringify($$.data)')
arr = JSON.parse(s)

Disclaimer

Use this library at your own risk.
