Rust: JustBooks: add new provider
julien-gautier-munic committed Jun 6, 2023
1 parent 282a299 commit 86a5439
Showing 10 changed files with 1,436 additions and 20 deletions.
25 changes: 13 additions & 12 deletions README.md
@@ -35,18 +35,19 @@ BookMetaData {
### Sources

| Source | Metadata (in addition to title and authors) | Notes |
|-------------------------------------------------------|--------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| [Babelio](https://www.babelio.com/)                   | blurb, keyword                              | No API available. No plan to build one.<br/>Babelio seems to block the IP if it detects that this bot is scraping                                                                                                            |
| [Decitre](https://www.decitre.fr/)                    | blurb, keywords in commentaries             |                                                                                                                                                                                                                             |
| [GoodReads](https://www.goodreads.com/)               | blurb, genres in English                    | An API was available, but GoodReads no longer issues new developer keys. [See this](https://help.goodreads.com/s/article/Does-Goodreads-support-the-use-of-APIs)                                                             |
| [Google Books](https://www.google.fr/books/)          | blurb, genres                               | [A real API](https://developers.google.com/books/docs/overview) is available to look up a book by ISBN <br/> Some books can't be found by ISBN, even though a search by title finds them and shows the right ISBN            |
| [ISBNSearcher](https://www.isbnsearcher.com/)         | blurb, main category in English             |                                                                                                                                                                                                                             |
| [Label Emmaus](https://www.label-emmaus.co/)          | blurb, genres                               |                                                                                                                                                                                                                             |
| [OpenLibrary](https://openlibrary.org/)               | blurbs are not translated                   | It is based on physical books; it is not really a book database                                                                                                                                                              |
| [Chasse aux Livres](https://www.chasse-aux-livres.fr/) | price only                                  | It is not possible to parse with Selenium                                                                                                                                                                                    |
| [AbeBooks](https://www.abebooks.fr/)                  | Seems to have good French blurbs            |                                                                                                                                                                                                                             |
| [Fnac](https://www.fnac.com/)                         | blurb, second-hand price                    |                                                                                                                                                                                                                             |
| [Librairie Kléber](https://www.librairie-kleber.com/) | blurb, price                                |                                                                                                                                                                                                                             |
| [JustBooks](https://www.justbooks.fr/) | blurb (seldom), prices | |

#### GoogleBooks
GoogleBooks has some inconsistencies:
1 change: 1 addition & 0 deletions lib/bridge_definitions.dart
@@ -130,6 +130,7 @@ enum ProviderEnum {
  BooksPrice,
  AbeBooks,
  LesLibraires,
  JustBooks,
}

class ProviderMetadataPair {
6 changes: 4 additions & 2 deletions native/src/api.rs
@@ -3,11 +3,11 @@ use std::fs::File;
use std::io::{Read, Write};

use crate::cached_client::CachedClient;
use crate::common::Ad;
use crate::common::{LbcCredential, Provider};
use crate::publisher::Publisher;
use crate::{abebooks, babelio, booksprice, google_books, leboncoin, leslibraires, justbooks};
use crate::common;
use itertools::Itertools;
use serde::{Deserialize, Serialize};
use strum::IntoEnumIterator;
@@ -20,6 +20,7 @@ pub enum ProviderEnum {
    BooksPrice,
    AbeBooks,
    LesLibraires,
    JustBooks,
}

#[derive(PartialEq, Debug, Deserialize, Serialize)]
@@ -180,6 +181,7 @@ pub fn get_metadata_from_provider(
        ProviderEnum::LesLibraires => {
            leslibraires::LesLibraires {}.get_book_metadata_from_isbn(&isbn)
        }
        ProviderEnum::JustBooks => justbooks::JustBooks {}.get_book_metadata_from_isbn(&isbn),
    }
}

2 changes: 2 additions & 0 deletions native/src/bridge_generated.rs
@@ -149,6 +149,7 @@ impl Wire2Api<ProviderEnum> for i32 {
            2 => ProviderEnum::BooksPrice,
            3 => ProviderEnum::AbeBooks,
            4 => ProviderEnum::LesLibraires,
            5 => ProviderEnum::JustBooks,
            _ => unreachable!("Invalid variant for ProviderEnum: {}", self),
        }
    }
@@ -218,6 +219,7 @@ impl support::IntoDart for ProviderEnum {
            Self::BooksPrice => 2,
            Self::AbeBooks => 3,
            Self::LesLibraires => 4,
            Self::JustBooks => 5,
        }
        .into_dart()
    }
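
Note on the two hunks above: the integer-to-variant table in `Wire2Api` and the variant-to-integer table in `IntoDart` have to stay in step, which is why `JustBooks` is appended at index 5 rather than inserted earlier. The sketch below only illustrates that round-trip contract. `ShownProvider`, `to_wire`, and the `TryFrom` impl are hypothetical names, not part of the generated bridge, and the enum lists only the four variants visible in this diff (indices 0 and 1 are not shown in the hunk). It also returns an error for unknown indices, whereas the generated code uses `unreachable!`.

```rust
use std::convert::TryFrom;

/// Illustration-only enum (hypothetical): the four variants visible in the
/// hunks above, pinned to the indices the generated bridge uses for them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShownProvider {
    BooksPrice = 2,
    AbeBooks = 3,
    LesLibraires = 4,
    JustBooks = 5,
}

impl TryFrom<i32> for ShownProvider {
    type Error = String;

    /// Integer (as sent over the bridge) to variant; unknown values are an error.
    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            2 => Ok(Self::BooksPrice),
            3 => Ok(Self::AbeBooks),
            4 => Ok(Self::LesLibraires),
            5 => Ok(Self::JustBooks),
            other => Err(format!("Invalid variant for ProviderEnum: {}", other)),
        }
    }
}

/// Variant to integer, the direction handled by `IntoDart` in the real bridge.
pub fn to_wire(provider: ShownProvider) -> i32 {
    provider as i32
}
```

Appending new variants at the end keeps every existing index stable on both the Dart and Rust sides.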
16 changes: 16 additions & 0 deletions native/src/justbooks.rs
@@ -0,0 +1,16 @@
use crate::{cached_client::CachedClient, common};
mod parser;
mod request;

pub struct JustBooks;

impl common::Provider for JustBooks {
    fn get_book_metadata_from_isbn(&self, isbn: &str) -> Option<common::BookMetaDataFromProvider> {
        let client = reqwest::blocking::Client::builder().build().unwrap();
        let cached_client = CachedClient {
            http_client: client,
        };
        let book_page = request::get_book_page(&cached_client, isbn);
        parser::extract_metadata(&book_page)
    }
}
108 changes: 108 additions & 0 deletions native/src/justbooks/parser.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
use crate::common::{html_select, BookMetaDataFromProvider};
use itertools::Itertools;

fn extract_author(author_scope: scraper::ElementRef) -> crate::common::Author {
    let author_span = author_scope
        .first_child()
        .expect("author scope > span should have a first child");

    // The whole author string is stored in `first_name`; `last_name` stays empty.
    crate::common::Author {
        first_name: author_span
            .value()
            .as_text()
            .expect("Should be a text")
            .trim()
            .to_string(),
        last_name: "".to_string(),
    }
}

pub fn extract_metadata(html: &str) -> Option<BookMetaDataFromProvider> {
    let doc = scraper::Html::parse_document(html);

    // The book data lives in a single schema.org/Book microdata scope.
    let book_select = html_select("div[itemscope][itemtype=\"http://schema.org/Book\"]");
    let res = doc.select(&book_select);
    let book_scope = match res.exactly_one() {
        Ok(book_scope) => book_scope,
        Err(_) => {
            eprintln!("Response should contain exactly one div element with the itemscope attribute and itemtype=\"http://schema.org/Book\"");
            return None;
        }
    };
    let title_select = html_select("[itemprop=\"name\"]");
    let title = book_scope
        .select(&title_select)
        .exactly_one()
        .expect("There should be exactly one element with itemprop=\"name\"")
        .first_child()
        .unwrap()
        .value()
        .as_text()
        .unwrap()
        .trim()
        .to_string();

    let authors_select = html_select("[itemprop=\"author\"]");
    let authors = book_scope
        .select(&authors_select)
        .map(extract_author)
        .collect_vec();

    // The blurb is optional: zero or one description element is accepted.
    let blurb = book_scope
        .select(&html_select("[itemprop=\"description\"]"))
        .at_most_one()
        .unwrap()
        .map(|d| {
            d.first_child()
                .unwrap()
                .value()
                .as_text()
                .unwrap()
                .trim()
                .to_string()
        });

    Some(BookMetaDataFromProvider {
        title: Some(title),
        authors,
        blurb,
        ..Default::default()
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn extract_metadata_with_blurb() {
        let html = std::fs::read_to_string("src/justbooks/test/9782953189018.html").unwrap();
        let md = extract_metadata(&html);
        assert_eq!(
            md,
            Some(BookMetaDataFromProvider {
                title: Some("La prière en sept chapitres par PADMASAMBHAVA".to_string()),
                authors: vec![crate::common::Author {
                    first_name: "Tchimé Rigdzin Rinpotché; James Low".to_string(),
                    last_name: "".to_string()
                }],
                blurb: Some("Traduction : Chhimed Rigdzin Rinpoche et James Low Tirées du Terma du Nord (Tchang Ter), ces prières furent écrites par Padmasambhava à la requête de ses cinq principaux disciples (Yéshé Tsogyel, Trisong Deutsen, etc.). On y retrouve le célèbre Sampa Lhundroup (prière qui exauce tous les souhaits) et le Bartché Namsel (prière qui élimine tous les obstacles). Avec texte en tibétain, phonétique, traduction mot à mot et traduction du vers. Introduction de James Low sur la foi et la dévotion dans le bouddhisme tibétain. Relié, 322 pages".to_owned()),
                keywords: vec![],
                market_price: vec![],
            })
        );
    }

    #[test]
    fn extract_metadata_without_blurb() {
        let html = std::fs::read_to_string("src/justbooks/test/9782298086294.html").unwrap();
        let md = extract_metadata(&html);
        assert_eq!(
            md,
            Some(BookMetaDataFromProvider {
                title: Some("1918 la terrible victoire".to_string()),
                authors: vec![],
                blurb: None,
                keywords: vec![],
                market_price: vec![],
            })
        );
    }
}
18 changes: 18 additions & 0 deletions native/src/justbooks/request.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
use crate::cached_client::Client;

pub fn get_book_page(client: &dyn Client, isbn: &str) -> String {
    client.make_request(
        format!("justbooks/get_book_url_{}.html", isbn).as_str(),
        &|http_client| {
            http_client
                .get(format!(
                    "https://www.justbooks.fr/search/?isbn={}&st=xl&ac=qr",
                    &isbn
                ))
                .send()
                .unwrap()
                .text()
                .unwrap()
        },
    )
}
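
Note on `get_book_page`: it passes `make_request` a cache key plus a closure that performs the HTTP call, which suggests a look-up-then-fetch pattern. The definitions of `Client` and `CachedClient` are not part of this commit, so the following is only a minimal sketch of that pattern under stated assumptions. The trait is a simplified stand-in (its closure takes no HTTP client argument), `MemoryCachedClient` is a hypothetical in-memory cache, and `cache_key_for` and `search_url_for` are hypothetical helpers that reuse the two format strings from the code above.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Simplified stand-in for the crate's `Client` trait (assumption: the real
/// trait hands an HTTP client to the closure; here the closure takes nothing).
pub trait Client {
    fn make_request(&self, cache_key: &str, fetch: &dyn Fn() -> String) -> String;
}

/// Hypothetical in-memory cache keyed by the same string used as `cache_key`.
pub struct MemoryCachedClient {
    cache: RefCell<HashMap<String, String>>,
}

impl MemoryCachedClient {
    pub fn new() -> Self {
        Self {
            cache: RefCell::new(HashMap::new()),
        }
    }
}

impl Client for MemoryCachedClient {
    fn make_request(&self, cache_key: &str, fetch: &dyn Fn() -> String) -> String {
        // Serve from the cache when the key is already known.
        let cached = self.cache.borrow().get(cache_key).cloned();
        if let Some(hit) = cached {
            return hit;
        }
        // Otherwise run the fetch closure once and remember its result.
        let body = fetch();
        self.cache
            .borrow_mut()
            .insert(cache_key.to_string(), body.clone());
        body
    }
}

/// Cache key, using the format string from `get_book_page`.
pub fn cache_key_for(isbn: &str) -> String {
    format!("justbooks/get_book_url_{}.html", isbn)
}

/// Search URL, using the format string from `get_book_page`.
pub fn search_url_for(isbn: &str) -> String {
    format!("https://www.justbooks.fr/search/?isbn={}&st=xl&ac=qr", isbn)
}
```

With a cache like this, requesting the same ISBN twice runs the fetch closure only once, which is useful when replaying saved pages in tests.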