chore: set up rollup for multiplatform builds (#96)
karashiiro authored Jul 7, 2024
1 parent 334a5db commit a66fb5d
Showing 5 changed files with 1,870 additions and 1,152 deletions.
109 changes: 71 additions & 38 deletions README.md
@@ -1,43 +1,59 @@
# twitter-scraper

[![Documentation badge](https://img.shields.io/badge/docs-here-informational)](https://the-convocation.github.io/twitter-scraper/)

A port of [n0madic/twitter-scraper](https://github.com/n0madic/twitter-scraper) to Node.js.
A port of [n0madic/twitter-scraper](https://github.com/n0madic/twitter-scraper)
to Node.js.

> Twitter's API is annoying to work with, and has lots of limitations — luckily their frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No tokens needed. No restrictions. Extremely fast.
> Twitter's API is annoying to work with, and has lots of limitations — luckily
> their frontend (JavaScript) has its own API, which I reverse-engineered. No
> API rate limits. No tokens needed. No restrictions. Extremely fast.
>
> You can use this library to get the text of any user's Tweets trivially.
Known limitations:

* Search operations require logging in with a real user account via `scraper.login()`.
* Twitter's frontend API does in fact have rate limits ([#11](https://github.com/the-convocation/twitter-scraper/issues/11))
- Search operations require logging in with a real user account via
`scraper.login()`.
- Twitter's frontend API does in fact have rate limits
([#11](https://github.com/the-convocation/twitter-scraper/issues/11))

## Installation

This package requires Node.js v16.0.0 or greater.

NPM:

```sh
npm install @the-convocation/twitter-scraper
```

Yarn:

```sh
yarn add @the-convocation/twitter-scraper
```

TypeScript types have been bundled with the distribution.

## Usage
Most use cases are exactly the same as in [n0madic/twitter-scraper](https://github.com/n0madic/twitter-scraper).
Channel iterators have been translated into [AsyncGenerator](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/AsyncGenerator)
instances, and can be consumed with the corresponding `for await (const x of y) { ... }` syntax.

Most use cases are exactly the same as in
[n0madic/twitter-scraper](https://github.com/n0madic/twitter-scraper). Channel
iterators have been translated into
[AsyncGenerator](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/AsyncGenerator)
instances, and can be consumed with the corresponding
`for await (const x of y) { ... }` syntax.
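
As a rough illustration of the `for await` pattern described above, the sketch below consumes an async generator and stops early after a fixed number of items. `fakeTweets`, `FakeTweet`, and `take` are hypothetical stand-ins written for this example; they are not part of the package's API.

```ts
// Hypothetical item shape, used only for this sketch.
interface FakeTweet {
  id: string;
  text: string;
}

// Hypothetical stand-in for a scraper method that yields items one at a time.
async function* fakeTweets(count: number): AsyncGenerator<FakeTweet, void> {
  for (let i = 0; i < count; i++) {
    yield { id: String(i), text: `tweet #${i}` };
  }
}

// Collect at most `limit` items from any async generator.
async function take<T>(
  source: AsyncGenerator<T, void>,
  limit: number,
): Promise<T[]> {
  const items: T[] = [];
  if (limit <= 0) return items;
  for await (const item of source) {
    items.push(item);
    // Leaving the loop early also asks the generator to finish (return()).
    if (items.length >= limit) break;
  }
  return items;
}
```

Breaking out of the loop is usually enough to stop further requests, since the generator is not advanced past the last item consumed.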

### Browser usage
This package directly invokes the Twitter API, which does not have permissive CORS headers. With the default
settings, requests will fail unless you disable CORS checks, which is not advised. Instead, applications must
provide a CORS proxy and configure it in the `Scraper` options.

Proxies (and other request mutations) can be configured with the request interceptor transform:
This package directly invokes the Twitter API, which does not have permissive
CORS headers. With the default settings, requests will fail unless you disable
CORS checks, which is not advised. Instead, applications must provide a CORS
proxy and configure it in the `Scraper` options.

Proxies (and other request mutations) can be configured with the request
interceptor transform:

```ts
const scraper = new Scraper({
@@ -46,13 +62,11 @@ const scraper = new Scraper({
// The arguments here are the same as the parameters to fetch(), and
// are kept as-is for flexibility of both the library and applications.
if (input instanceof URL) {
const proxy =
"https://corsproxy.io/?" +
const proxy = "https://corsproxy.io/?" +
encodeURIComponent(input.toString());
return [proxy, init];
} else if (typeof input === "string") {
const proxy =
"https://corsproxy.io/?" + encodeURIComponent(input);
const proxy = "https://corsproxy.io/?" + encodeURIComponent(input);
return [proxy, init];
} else {
// Omitting handling for example
@@ -63,12 +77,15 @@ const scraper = new Scraper({
});
```

[corsproxy.io](https://corsproxy.io) is a public CORS proxy that works correctly with this package.
[corsproxy.io](https://corsproxy.io) is a public CORS proxy that works correctly
with this package.

The public CORS proxy [corsproxy.org](https://corsproxy.org) *does not work* at the time of writing (at least
not using their recommended integration on the front page).
The public CORS proxy [corsproxy.org](https://corsproxy.org) _does not work_ at
the time of writing (at least not using their recommended integration on the
front page).
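
The URL rewriting used in the interceptor can also be factored into a small helper, which keeps the proxy prefix in one place. This is only a sketch: `PROXY_PREFIX`, `proxied`, and `proxyRequest` are hypothetical names introduced here, and the prefix is the corsproxy.io form shown above.

```ts
// Assumed proxy prefix, copied from the interceptor example.
const PROXY_PREFIX = "https://corsproxy.io/?";

// Hypothetical helper: wrap a target URL in the proxy URL.
function proxied(input: string | URL): string {
  const target = typeof input === "string" ? input : input.toString();
  // The target must be percent-encoded because it travels in the query string.
  return PROXY_PREFIX + encodeURIComponent(target);
}

// Hypothetical interceptor body: returns the same [input, init] pair shape as
// the example above, with `init` passed through unchanged.
function proxyRequest<I>(input: unknown, init?: I): [string, I | undefined] {
  if (typeof input === "string" || input instanceof URL) {
    return [proxied(input), init];
  }
  // Other input types (such as Request objects) are not handled in this sketch.
  throw new Error("Unexpected request input type");
}
```

A function like `proxyRequest` could then be passed as the `request` transform, assuming its signature matches what the `Scraper` options expect.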

#### Next.js 13.x example:

```tsx
"use client";

@@ -82,13 +99,12 @@ export default function Home() {
transform: {
request(input: RequestInfo | URL, init?: RequestInit) {
if (input instanceof URL) {
const proxy =
"https://corsproxy.io/?" +
const proxy = "https://corsproxy.io/?" +
encodeURIComponent(input.toString());
return [proxy, init];
} else if (typeof input === "string") {
const proxy =
"https://corsproxy.io/?" + encodeURIComponent(input);
const proxy = "https://corsproxy.io/?" +
encodeURIComponent(input);
return [proxy, init];
} else {
throw new Error("Unexpected request input type");
@@ -120,19 +136,23 @@ export default function Home() {
```

### Edge runtimes
This package currently uses [`cross-fetch`](https://www.npmjs.com/package/cross-fetch) as a portable `fetch`.
Edge runtimes such as CloudFlare Workers sometimes have `fetch` functions that behave differently from the web
standard, so you may need to override the `fetch` function the scraper uses. If so, a custom `fetch` can be

This package currently uses
[`cross-fetch`](https://www.npmjs.com/package/cross-fetch) as a portable
`fetch`. Edge runtimes such as CloudFlare Workers sometimes have `fetch`
functions that behave differently from the web standard, so you may need to
override the `fetch` function the scraper uses. If so, a custom `fetch` can be
provided in the options:

```ts
const scraper = new Scraper({
fetch: fetch
fetch: fetch,
});
```

Note that this does not change the arguments passed to the function, or the expected return type. If the custom
`fetch` function produces runtime errors related to incorrect types, be sure to wrap it in a shim (not currently
Note that this does not change the arguments passed to the function, or the
expected return type. If the custom `fetch` function produces runtime errors
related to incorrect types, be sure to wrap it in a shim (not currently
supported directly by interceptors):

```ts
@@ -151,26 +171,37 @@ const scraper = new Scraper({
## Contributing

### Setup
This project currently targets Node 16.x and uses Yarn for package management. [Corepack](https://nodejs.org/dist/latest-v16.x/docs/api/corepack.html)
is configured for this project, so you don't need to install a particular package manager version manually.

Just run `corepack enable` to turn on the shims, then run `yarn` to install the dependencies.
This project currently requires Node 18.x for development and uses Yarn for
package management.
[Corepack](https://nodejs.org/dist/latest-v18.x/docs/api/corepack.html) is
configured for this project, so you don't need to install a particular package
manager version manually.

> The project supports Node 16.x at runtime, but requires Node 18.x to run its
> build tools.
Just run `corepack enable` to turn on the shims, then run `yarn` to install the
dependencies.

#### Basic scripts
* `yarn build`: Builds the project into the `dist` folder
* `yarn test`: Runs the package tests (see [Testing](#testing) first)

- `yarn build`: Builds the project into the `dist` folder
- `yarn test`: Runs the package tests (see [Testing](#testing) first)

Run `yarn help` for general `yarn` usage information.

### Testing
This package includes unit tests for all major functionality. Given the speed at which Twitter's private API
changes, failing tests are to be expected.

This package includes unit tests for all major functionality. Given the speed at
which Twitter's private API changes, failing tests are to be expected.

```sh
yarn test
```

Before running tests, you should configure environment variables for authentication.
Before running tests, you should configure environment variables for
authentication.

```
TWITTER_USERNAME= # Account username
@@ -181,5 +212,7 @@ PROXY_URL= # HTTP(s) proxy for requests (optional)
```

### Commit message format
We use [Conventional Commits](https://www.conventionalcommits.org), and enforce this with precommit checks.
Please refer to the Git history for real examples of the commit message format.

We use [Conventional Commits](https://www.conventionalcommits.org), and enforce
this with precommit checks. Please refer to the Git history for real examples of
the commit message format.
21 changes: 19 additions & 2 deletions package.json
@@ -8,7 +8,19 @@
"crawler"
],
"version": "0.13.1",
"main": "dist/_module.js",
"main": "dist/default/cjs/index.js",
"types": "./dist/types/index.d.ts",
"exports": {
"types": "./dist/types/index.d.ts",
"node": {
"import": "./dist/node/esm/index.mjs",
"require": "./dist/node/cjs/index.cjs"
},
"default": {
"import": "./dist/default/esm/index.mjs",
"require": "./dist/default/cjs/index.js"
}
},
"repository": "https://github.com/the-convocation/twitter-scraper.git",
"author": "karashiiro <[email protected]>",
"license": "MIT",
@@ -17,7 +29,7 @@
},
"packageManager": "[email protected]",
"scripts": {
"build": "tsc",
"build": "rimraf dist && rollup -c",
"commit": "cz",
"docs:generate": "typedoc --options typedoc.json",
"docs:deploy": "yarn docs:generate && gh-pages -d docs",
@@ -47,6 +59,7 @@
"@typescript-eslint/parser": "^5.59.7",
"cz-conventional-changelog": "^3.3.0",
"dotenv": "^16.3.1",
"esbuild": "^0.21.5",
"eslint": "^8.41.0",
"eslint-config-prettier": "^8.8.0",
"eslint-plugin-prettier": "^4.2.1",
@@ -56,6 +69,10 @@
"jest": "^29.5.0",
"lint-staged": "^13.2.2",
"prettier": "^2.8.8",
"rimraf": "^5.0.7",
"rollup": "^4.18.0",
"rollup-plugin-dts": "^6.1.1",
"rollup-plugin-esbuild": "^6.1.1",
"ts-jest": "^29.1.0",
"typedoc": "^0.24.7",
"typescript": "^5.0.4"
61 changes: 61 additions & 0 deletions rollup.config.mjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,61 @@
import dts from 'rollup-plugin-dts';
import esbuild from 'rollup-plugin-esbuild';

export default [
  {
    input: 'src/_module.ts',
    plugins: [
      esbuild({
        define: {
          PLATFORM_NODE: 'false',
          PLATFORM_NODE_JEST: 'false',
        },
      }),
    ],
    output: [
      {
        file: 'dist/default/cjs/index.js',
        format: 'cjs',
        sourcemap: true,
      },
      {
        file: 'dist/default/esm/index.mjs',
        format: 'es',
        sourcemap: true,
      },
    ],
  },
  {
    input: 'src/_module.ts',
    plugins: [
      esbuild({
        define: {
          PLATFORM_NODE: 'true',
          PLATFORM_NODE_JEST: 'false',
        },
      }),
    ],
    output: [
      {
        file: 'dist/node/cjs/index.cjs',
        format: 'cjs',
        sourcemap: true,
        inlineDynamicImports: true,
      },
      {
        file: 'dist/node/esm/index.mjs',
        format: 'es',
        sourcemap: true,
        inlineDynamicImports: true,
      },
    ],
  },
  {
    input: 'src/_module.ts',
    plugins: [dts()],
    output: {
      file: 'dist/types/index.d.ts',
      format: 'es',
    },
  },
];
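
The `define` entries in the config above replace the `PLATFORM_NODE` identifier with a literal at build time, so each bundle can take a different code path. The sketch below shows roughly how source code might branch on such a flag. `isNodeBuild` and `describeBuild` are hypothetical helpers written for illustration, not code from this commit.

```ts
// Build-time flag: the bundler's `define` option swaps this identifier for a
// literal `true` or `false`. The `declare` only satisfies the type checker.
declare const PLATFORM_NODE: boolean;

// Hypothetical helper. The `typeof` guard lets this sketch run unbundled,
// where the identifier was never defined.
function isNodeBuild(): boolean {
  return typeof PLATFORM_NODE !== "undefined" && PLATFORM_NODE;
}

// Hypothetical example of branching on the flag.
function describeBuild(): string {
  if (isNodeBuild()) {
    return "node";
  }
  return "default";
}
```

Once the flag is a literal, a bundler can typically drop the branch that can never run, which is presumably why the Node and default outputs are built as separate entries.
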
2 changes: 1 addition & 1 deletion tsconfig.json
@@ -1,6 +1,6 @@
{
"extends": "@tsconfig/node16/tsconfig.json",
"exclude": ["node_modules", "dist", "**/*.test.ts", "src/test-utils.ts"],
"exclude": ["node_modules", "dist", "examples", "**/*.test.ts", "src/test-utils.ts"],
"compilerOptions": {
// TODO: Remove "dom" from this when support for Node 16 is dropped
"lib": ["es2021", "dom"],