Automated scraper built on top of the Stack Exchange API.
Searches Stack Overflow for code fragments containing a given phrase; only code snippets in the selected threads are considered.
- Node.js
- Express
- React.js
- React Redux
- Redux Thunk
- Mongoose
- React Bootstrap
- Docker
- Scraping with Puppeteer or Cheerio
- Set all of the following environment variables, e.g.:
HOST=0.0.0.0
PORT=3000
MONGO_DB_USER=root
MONGO_DB_PASSWORD=example
MONGO_DB_NAME=appdb
MONGO_DB_PORT=27017
MONGO_DB_SERVICE_NAME=mongodb
CODE_FRAGMENTS_FETCH_LIMIT=10
JWT_TOKEN_SECRET=access_token_secret
STACK_API_KEY=stack_api_key
For details on obtaining the STACK_API_KEY, see https://api.stackexchange.com/docs/authentication.
Important: the API is rate-limited; see https://api.stackexchange.com/docs/throttle.
- Clone the repo
git clone https://github.com/adjaskam/stack-code-finder.git
- Install NPM packages for the client
cd client
npm i
- Start the project with concurrently (invoke from the root directory)
npm run dev:fullstack
Note: The backend is built from a Dockerfile, and backend development takes place inside the container.
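The environment variables listed above could be read and checked at startup roughly as follows. This is a minimal sketch, not the project's actual code: `loadConfig`, `buildMongoUri`, and the exact connection-string layout are assumptions made for illustration.

```javascript
// Hypothetical startup helper: names and URI layout are assumptions,
// not taken from the project sources.
const REQUIRED_VARS = [
  "HOST", "PORT", "MONGO_DB_USER", "MONGO_DB_PASSWORD", "MONGO_DB_NAME",
  "MONGO_DB_PORT", "MONGO_DB_SERVICE_NAME", "CODE_FRAGMENTS_FETCH_LIMIT",
  "JWT_TOKEN_SECRET", "STACK_API_KEY",
];

// Build a standard MongoDB connection string from the individual variables.
// Assumes the service name is resolvable as a hostname (e.g. in a Docker network).
function buildMongoUri(env) {
  const user = encodeURIComponent(env.MONGO_DB_USER);
  const password = encodeURIComponent(env.MONGO_DB_PASSWORD);
  return `mongodb://${user}:${password}@${env.MONGO_DB_SERVICE_NAME}:${env.MONGO_DB_PORT}/${env.MONGO_DB_NAME}`;
}

// Fail fast when a variable is missing instead of failing later at runtime.
function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    host: env.HOST,
    port: Number(env.PORT),
    fetchLimit: Number(env.CODE_FRAGMENTS_FETCH_LIMIT),
    jwtSecret: env.JWT_TOKEN_SECRET,
    stackApiKey: env.STACK_API_KEY,
    mongoUri: buildMongoUri(env),
  };
}
```

Depending on how the MongoDB user is created, the connection string may also need extra options (for example an authentication database); that detail is left out here.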
- POST /api/codefragments - start a job for the given tag (includes the scraping procedure). The application supports:
  - Preventing duplicate extracted fragments (by comparing MD5 hashes computed from code fragment factors).
  - Handling user-specific documents, meaning a single code fragment can be owned by multiple users.
  - Web scraping with either Puppeteer or Cheerio.
Example request body:
{
"tag": "Java",
"searchPhrase": "int",
"amount": 1,
"scraperType": "cheerio"
}
Example response:
{
"items":[
{
"questionId": "71860220",
"tag": "Java",
"searchPhrase": "int",
"codeFragment": "public class TekuciRacun implements IRacun{\n private String vlasnik;\n private int isplate;\n private int kredit;\nthis.stanje = stanje;\n }\n \n \n \n}\n",
"hashMessage": "2de6aac5afba3f6f44aa7f9e91cb9d8d",
"usersOwn": [
"[email protected]"
],
"_id": "625712efea1e61ff34001739",
"createdAt": "2022-04-13T18:14:07.563Z",
"updatedAt": "2022-04-13T18:14:07.563Z",
"__v": 0
}
],
"amount": 1,
"executionTime": 960
}
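The duplicate prevention and shared ownership described for this endpoint can be sketched as follows. Which "factors" actually feed the MD5 hash is not stated here, so hashing `tag`, `searchPhrase`, and `codeFragment` together is an assumption, as are the helper names `hashFragment` and `saveFragment`. A `Map` stands in for the MongoDB collection.

```javascript
const { createHash } = require("node:crypto");

// Assumption: the hash is derived from tag, search phrase and the code itself.
// JSON.stringify keeps the field boundaries unambiguous.
function hashFragment({ tag, searchPhrase, codeFragment }) {
  return createHash("md5")
    .update(JSON.stringify([tag, searchPhrase, codeFragment]))
    .digest("hex");
}

// store: Map<hashMessage, document>, an in-memory stand-in for the collection.
function saveFragment(store, fragment, userEmail) {
  const hashMessage = hashFragment(fragment);
  const existing = store.get(hashMessage);
  if (existing) {
    // Same fragment already stored: add the user as an owner instead of duplicating.
    if (!existing.usersOwn.includes(userEmail)) {
      existing.usersOwn.push(userEmail);
    }
    return existing;
  }
  const created = { ...fragment, hashMessage, usersOwn: [userEmail] };
  store.set(hashMessage, created);
  return created;
}
```

With this shape, two users who scrape the same fragment end up sharing one document whose `usersOwn` array lists both of them.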
- GET /api/codefragments/my - get all code fragments obtained by the current user
- DELETE /api/codefragments/:hashMessage - delete a code fragment by its MD5 hash
  - Available to authenticated users only.
  - A soft delete is performed (the user is removed from usersOwn) as long as other users still own the code fragment.
  - Once the usersOwn array of the code fragment is empty, the item is hard-deleted.
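The soft/hard delete rule for the DELETE endpoint can be illustrated with a small sketch. It assumes that a soft delete means removing the requesting user from `usersOwn`; the function name `deleteFragment`, its return values, and the `Map` used in place of the database are hypothetical.

```javascript
// store: Map<hashMessage, { usersOwn: string[], ... }>, an in-memory stand-in.
// Returns a string describing what happened (illustrative values only).
function deleteFragment(store, hashMessage, userEmail) {
  const fragment = store.get(hashMessage);
  if (!fragment || !fragment.usersOwn.includes(userEmail)) {
    return "not_found";
  }

  // Soft delete: the user gives up ownership, the document itself stays.
  fragment.usersOwn = fragment.usersOwn.filter((owner) => owner !== userEmail);

  // Hard delete: nobody owns the fragment any more, so remove the document.
  if (fragment.usersOwn.length === 0) {
    store.delete(hashMessage);
    return "hard_deleted";
  }
  return "soft_deleted";
}
```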
Authentication is required to handle user-specific documents and is based on the JWT standard. Registration requires no confirmation. Email addresses must be unique. All forms in the application are validated.
- POST /api/register - register a new user
- POST /api/login - log in to the service
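To show what a JWT-based login involves, here is a minimal HS256 sketch that uses only Node's built-in `crypto` module. It is not the project's implementation, and a real application would normally rely on a maintained JWT library; `issueToken`, `verifyToken`, and the one-hour default lifetime are assumptions for illustration. The secret would correspond to JWT_TOKEN_SECRET.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// base64url-encode a JSON value, as used for the JWT header and payload.
const encodePart = (value) =>
  Buffer.from(JSON.stringify(value)).toString("base64url");

// HMAC-SHA256 signature over "header.payload".
function signParts(data, secret) {
  return createHmac("sha256", secret).update(data).digest("base64url");
}

// Issue a token that carries the payload plus issued-at and expiry claims.
function issueToken(payload, secret, ttlSeconds = 3600) {
  const now = Math.floor(Date.now() / 1000);
  const header = encodePart({ alg: "HS256", typ: "JWT" });
  const body = encodePart({ ...payload, iat: now, exp: now + ttlSeconds });
  return `${header}.${body}.${signParts(`${header}.${body}`, secret)}`;
}

// Return the payload when the signature is valid and the token has not
// expired; otherwise return null.
function verifyToken(token, secret) {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts;
  const expected = Buffer.from(signParts(`${header}.${body}`, secret));
  const actual = Buffer.from(signature);
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
    return null;
  }
  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  return payload.exp > Math.floor(Date.now() / 1000) ? payload : null;
}
```

A login handler would call something like `issueToken({ email }, secret)` after checking the credentials, and protected routes would call `verifyToken` on the token sent by the client.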
- Handle user-specific documents - add authentication and per-user document ownership
- Work on performance - added cheerio as the main scraper
- Make the search for searchPhrase in the obtained content more precise (currently, the search only checks whether a fragment includes the given searchPhrase)
- Work on refresh tokens
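The limitation behind the searchPhrase item above is easy to reproduce: a plain substring check for "int" also matches "println". The sketch below contrasts that check with one possible refinement, matching the phrase only as a whole token. The refinement and the function names are assumptions, not a planned design.

```javascript
// Current behaviour as described above: plain substring check.
function includesPhrase(fragment, searchPhrase) {
  return fragment.includes(searchPhrase);
}

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Possible refinement (assumption): the phrase must not be surrounded by
// identifier characters, so "int" no longer matches inside "println".
function matchesWholeToken(fragment, searchPhrase) {
  const pattern = new RegExp(
    `(?<![A-Za-z0-9_])${escapeRegExp(searchPhrase)}(?![A-Za-z0-9_])`
  );
  return pattern.test(fragment);
}
```

For example, `includesPhrase("System.out.println(x);", "int")` is true, while `matchesWholeToken` rejects that fragment and still accepts `"private int kredit;"`.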