Parse native CSS selectors #111
Comments
I am also interested in this. I am trying to port a scraper from Kotlin (JSoup) to Haskell. Having CSS selectors would make this a lot easier, since that's what I was using originally. There are some sites that are too hard to scrape manually.
I think supporting a basic set of CSS selectors could make sense, though trying to support all the different pseudo-classes would be a huge effort. Ideally we'd also have a way to parse the CSS selectors at compile time so you don't have to deal with runtime parse errors. That's not an area I'm super familiar with; maybe Template Haskell would be the way to go? I could see this working a couple of ways:
If someone is interested in investigating and working on this, I'd be able to provide guidance. Otherwise, I think this is interesting and something I'd like to look into, but I don't think I'd be able to get to it soon. Most major scalpel development happens in bursts when I happen to have large blocks of free time, and I'm not sure when that will happen next.
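To make the "basic set of CSS selectors" idea concrete, here is a minimal sketch of a runtime parser for a small subset: tag names, `.class`, `#id`, and whitespace-separated descendant chains. The `Compound` type and the `parseSelector` and `parseCompound` functions are hypothetical names invented for this illustration and are not part of scalpel. Pseudo-classes, attribute selectors, `*`, and combinators such as `>` are deliberately rejected. It uses only `base`.

```haskell
module Main where

import Data.Char (isAlphaNum)

-- | One compound selector such as @h1.foo#bar@.
-- Hypothetical type for this sketch; scalpel does not define it.
data Compound = Compound
  { tagName :: Maybe String  -- ^ Nothing when no tag is given, e.g. ".foo"
  , idName  :: Maybe String  -- ^ value after '#', if any
  , classes :: [String]      -- ^ values after each '.', in order
  } deriving (Eq, Show)

-- | Characters accepted in the simplified identifiers of this sketch.
isIdentChar :: Char -> Bool
isIdentChar c = isAlphaNum c || c == '-' || c == '_'

-- | Parse a descendant chain such as "div#main ul.nav" into its parts.
-- Returns Left with a message for anything outside the supported subset.
parseSelector :: String -> Either String [Compound]
parseSelector input = case words input of
  [] -> Left "empty selector"
  ws -> mapM parseCompound ws

-- | Parse a single compound selector (no whitespace).
parseCompound :: String -> Either String Compound
parseCompound s = go start rest
  where
    (tag, rest) = span isIdentChar s
    start = Compound (if null tag then Nothing else Just tag) Nothing []

    go acc [] = Right acc
    go acc (c : cs)
      | c == '.' || c == '#' =
          case span isIdentChar cs of
            ([], _) -> Left ("expected a name after " ++ show c ++ " in " ++ show s)
            (name, more)
              | c == '.'  -> go (acc { classes = classes acc ++ [name] }) more
              | otherwise -> go (acc { idName = Just name }) more
      | otherwise = Left ("unsupported syntax at " ++ show (c : cs))

main :: IO ()
main = do
  print (parseSelector "h1.foo")
  print (parseSelector "div#main ul.nav.top")
  print (parseSelector "a:hover")  -- pseudo-classes are rejected by this sketch
```

Because `parseSelector` is a pure function from `String` to `Either`, a Template Haskell quasi-quoter could in principle call it at compile time and turn a `Left` into a compile error, which is one possible route to the compile-time checking mentioned above. That wiring is not shown here.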
I am not very familiar with the internals of Scalpel, but I can see it uses tagsoup for selection. Maybe it would be possible to leverage Selenium to implement CSS selectors? There is already a Haskell library that could be used: https://github.com/haskell-webdriver/haskell-webdriver/blob/main/examples/readme-example-beginner.md. The webdriver would be a heavy dependency and it might not make sense for this project, but I think implementing the whole CSS parsing + selection from scratch would be a lot of work.
I recently tried out a similar library in Rust, https://github.com/rust-scraper/scraper, which I was able to use productively a bit quicker because I could reuse my knowledge of plain CSS selectors, which are documented for free on sites like https://www.w3schools.com/cssref/css_selectors.php, e.g.
let selector = Selector::parse("h1.foo").unwrap();
Would parsing CSS selectors make sense for scalpel as well?
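For comparison, the `h1.foo` selector from the Rust snippet can already be expressed with scalpel's existing combinators, just not as a CSS string. The sketch below assumes the `scalpel` package is installed. The `fooHeadings` helper and the sample HTML are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Text.HTML.Scalpel (hasClass, scrapeStringLike, texts, (@:))

-- | Collect the inner text of every <h1> tag that carries the class "foo".
-- Hypothetical helper; roughly what the CSS selector "h1.foo" would select.
fooHeadings :: String -> Maybe [String]
fooHeadings html = scrapeStringLike html (texts ("h1" @: [hasClass "foo"]))

main :: IO ()
main = print (fooHeadings "<h1 class=\"foo\">Hello</h1><h1>Other</h1>")
```

A CSS-string front end would presumably translate `"h1.foo"` into a selector value like the one built here, so users could reuse their existing CSS knowledge.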