Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typeclassed String to StringLike #82

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -10,3 +10,6 @@ cabal-dev
.stack-work/
cabal.sandbox.config
cabal.config
*.swp
*.swo
*.DS_Store
1 change: 1 addition & 0 deletions .travis/stack-lts-2.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -8,4 +8,5 @@ extra-deps:
- http-client-0.4.30
- http-client-tls-0.2.4
- pointedlist-0.6.1
- tagsoup-0.14.8@sha256:56b2023d2e9fdbff093719ce9af5285d2436b234c6c684d6a69f14595a8348ae
resolver: lts-2.0
1 change: 1 addition & 0 deletions .travis/stack-lts-6.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,5 @@ packages:
extra-deps:
- fail-4.9.0.0
- http-client-0.4.30
- tagsoup-0.14.8@sha256:56b2023d2e9fdbff093719ce9af5285d2436b234c6c684d6a69f14595a8348ae
resolver: lts-6.0
1 change: 1 addition & 0 deletions .travis/stack-lts-7.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -5,4 +5,5 @@ packages:
- ../examples/
extra-deps:
- fail-4.9.0.0
- tagsoup-0.14.8@sha256:56b2023d2e9fdbff093719ce9af5285d2436b234c6c684d6a69f14595a8348ae
resolver: lts-7.4
5 changes: 4 additions & 1 deletion examples/complex-predicates/Main.hs
Original file line number Diff line number Diff line change
@@ -1,10 +1,13 @@
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ExtendedDefaultRules #-}

import Text.HTML.Scalpel
import Control.Applicative
import Control.Monad
import Data.List (isInfixOf)

default (String)


exampleHtml :: String
exampleHtml = "<html>\
Expand Down Expand Up @@ -37,7 +40,7 @@ catComment :: Scraper String String
catComment =
-- 1. First narrow the current context to the div containing the comment's
-- textual content.
chroot ("div" @: [hasClass "comment", hasClass "text"]) $ do
chroot (("div" :: TagName String) @: [hasClass "comment", hasClass "text"]) $ do
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is why I chose to mix String and StringLike. It seems a lot less convenient to have to explicitly type each tag name as a TagName String.

Is there a specific benefit that using StringLike enables?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we have a String or Text specific file that just declares the type

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmmm, there's already Text.HTML.Scalpel and Text.HTML.Scalpel.Core. I'd rather not create more re-exports of the API if its not necessary.

Other than API consistency is there a benefit to having StringLike, Text, and String versions of the API? I'm not super familiar with the implementation of OverloadedStrings, but I think the conversion happens at run time. If that's the case I don't think there would be a performance difference since internally, the library is already converting to Text for comparing nodes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using libs that use Text and it's inconvenient to convert to strings and from strings.

What if we have a classed based file Text.HTML.Scalpel.StringLike and then just apply String to all the functions in Text.HTML.Scalpel?

Copy link
Owner

@fimad fimad May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, I was not really expecting people to be passing dynamic values as attributes. I think having a single Text.HTML.Scalpel.StringLike is a bit more palatable, let's go with that.

-- 2. Any can be used to access the root tag of the current context.
contents <- text anySelector
-- 3. Skip comment divs that do not contain "cat".
Expand Down
13 changes: 8 additions & 5 deletions examples/example-from-documentation/Main.hs
Original file line number Diff line number Diff line change
@@ -1,8 +1,11 @@
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ExtendedDefaultRules #-}

import Text.HTML.Scalpel
import Control.Applicative

default (String)


exampleHtml :: String
exampleHtml = "<html>\
Expand Down Expand Up @@ -35,19 +38,19 @@ main :: IO ()
main = print $ scrapeStringLike exampleHtml comments
where
comments :: Scraper String [Comment]
comments = chroots ("div" @: [hasClass "container"]) comment
comments = chroots (("div" :: TagName String) @: [hasClass "container"]) comment

comment :: Scraper String Comment
comment = textComment <|> imageComment

textComment :: Scraper String Comment
textComment = do
author <- text $ "span" @: [hasClass "author"]
commentText <- text $ "div" @: [hasClass "text"]
author <- text $ ("span" :: TagName String) @: [hasClass "author"]
commentText <- text $ ("div" :: TagName String) @: [hasClass "text"]
return $ TextComment author commentText

imageComment :: Scraper String Comment
imageComment = do
author <- text $ "span" @: [hasClass "author"]
imageURL <- attr "src" $ "img" @: [hasClass "image"]
author <- text $ ("span" :: TagName String) @: [hasClass "author"]
imageURL <- attr "src" $ ("img" :: TagName String) @: [hasClass "image"]
return $ ImageComment author imageURL
1 change: 1 addition & 0 deletions examples/html-to-markdown/Main.hs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
-- the rough markdown equivalent using serial scrapers.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE RankNTypes #-}
import Control.Applicative
import System.Environment
import Text.HTML.Scalpel
Expand Down
3 changes: 3 additions & 0 deletions scalpel-core/benchmarks/Main.hs
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ExtendedDefaultRules #-}

import Text.HTML.Scalpel.Core

Expand All @@ -9,6 +10,8 @@ import Data.Foldable (foldr')
import qualified Data.Text as T
import qualified Text.HTML.TagSoup as TagSoup

default (String, Int)


main :: IO ()
main = do
Expand Down
2 changes: 1 addition & 1 deletion scalpel-core/scalpel-core.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ library
, pointedlist
, regex-base
, regex-tdfa
, tagsoup >= 0.12.2
, tagsoup >= 0.14.8
, text
, vector
default-extensions:
Expand Down
4 changes: 4 additions & 0 deletions scalpel-core/src/Text/HTML/Scalpel/Core.hs
Original file line number Diff line number Diff line change
Expand Up @@ -61,10 +61,14 @@ module Text.HTML.Scalpel.Core (
, seekBack
, untilNext
, untilBack

-- * Select
, select
) where

import Text.HTML.Scalpel.Internal.Scrape
import Text.HTML.Scalpel.Internal.Scrape.StringLike
import Text.HTML.Scalpel.Internal.Select.Combinators
import Text.HTML.Scalpel.Internal.Select
import Text.HTML.Scalpel.Internal.Select.Types
import Text.HTML.Scalpel.Internal.Serial
45 changes: 23 additions & 22 deletions scalpel-core/src/Text/HTML/Scalpel/Internal/Scrape.hs
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ import Text.HTML.Scalpel.Internal.Select.Types
import Control.Applicative
import Control.Monad
import Data.Maybe
import Text.StringLike (StringLike, castString)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I never really liked that StringLike was a part of the TagSoup package.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think StringLike is an abomination as well


import qualified Control.Monad.Fail as Fail
import qualified Data.Vector as Vector
Expand Down Expand Up @@ -67,7 +68,7 @@ instance Fail.MonadFail (Scraper str) where

-- | The 'scrape' function executes a 'Scraper' on a list of
-- 'TagSoup.Tag's and produces an optional value.
scrape :: (TagSoup.StringLike str)
scrape :: (StringLike str)
=> Scraper str a -> [TagSoup.Tag str] -> Maybe a
scrape s = scrapeTagSpec s . tagsToSpec . TagSoup.canonicalizeTags

Expand All @@ -77,7 +78,7 @@ scrape s = scrapeTagSpec s . tagsToSpec . TagSoup.canonicalizeTags
--
-- This function will match only the first set of tags matching the selector, to
-- match every set of tags, use 'chroots'.
chroot :: (TagSoup.StringLike str)
chroot :: (StringLike str)
=> Selector -> Scraper str a -> Scraper str a
chroot selector inner = do
maybeResult <- listToMaybe <$> chroots selector inner
Expand All @@ -91,30 +92,30 @@ chroot selector inner = do
--
-- > s = "<div><div>A</div></div>"
-- > scrapeStringLike s (chroots "div" (pure 0)) == Just [0, 0]
chroots :: (TagSoup.StringLike str)
chroots :: (StringLike str)
=> Selector -> Scraper str a -> Scraper str [a]
chroots selector (MkScraper inner) = MkScraper
$ return . mapMaybe inner . select selector

-- | The 'matches' function takes a selector and returns `()` if the selector
-- matches any node in the DOM.
matches :: (TagSoup.StringLike str) => Selector -> Scraper str ()
matches :: (StringLike str) => Selector -> Scraper str ()
matches s = MkScraper $ withHead (pure ()) . select s

-- | The 'text' function takes a selector and returns the inner text from the
-- set of tags described by the given selector.
--
-- This function will match only the first set of tags matching the selector, to
-- match every set of tags, use 'texts'.
text :: (TagSoup.StringLike str) => Selector -> Scraper str str
text :: (StringLike str) => Selector -> Scraper str str
text s = MkScraper $ withHead tagsToText . select s

-- | The 'texts' function takes a selector and returns the inner text from every
-- set of tags (possibly nested) matching the given selector.
--
-- > s = "<div>Hello <div>World</div></div>"
-- > scrapeStringLike s (texts "div") == Just ["Hello World", "World"]
texts :: (TagSoup.StringLike str)
texts :: (StringLike str)
=> Selector -> Scraper str [str]
texts s = MkScraper $ withAll tagsToText . select s

Expand All @@ -123,15 +124,15 @@ texts s = MkScraper $ withAll tagsToText . select s
--
-- This function will match only the first set of tags matching the selector, to
-- match every set of tags, use 'htmls'.
html :: (TagSoup.StringLike str) => Selector -> Scraper str str
html :: (StringLike str) => Selector -> Scraper str str
html s = MkScraper $ withHead tagsToHTML . select s

-- | The 'htmls' function takes a selector and returns the html string from
-- every set of tags (possibly nested) matching the given selector.
--
-- > s = "<div><div>A</div></div>"
-- > scrapeStringLike s (htmls "div") == Just ["<div><div>A</div></div>", "<div>A</div>"]
htmls :: (TagSoup.StringLike str)
htmls :: (StringLike str)
=> Selector -> Scraper str [str]
htmls s = MkScraper $ withAll tagsToHTML . select s

Expand All @@ -141,7 +142,7 @@ htmls s = MkScraper $ withAll tagsToHTML . select s
--
-- This function will match only the first set of tags matching the selector, to
-- match every set of tags, use 'innerHTMLs'.
innerHTML :: (TagSoup.StringLike str)
innerHTML :: (StringLike str)
=> Selector -> Scraper str str
innerHTML s = MkScraper $ withHead tagsToInnerHTML . select s

Expand All @@ -150,7 +151,7 @@ innerHTML s = MkScraper $ withHead tagsToInnerHTML . select s
--
-- > s = "<div><div>A</div></div>"
-- > scrapeStringLike s (innerHTMLs "div") == Just ["<div>A</div>", "A"]
innerHTMLs :: (TagSoup.StringLike str)
innerHTMLs :: (StringLike str)
=> Selector -> Scraper str [str]
innerHTMLs s = MkScraper $ withAll tagsToInnerHTML . select s

Expand All @@ -160,22 +161,22 @@ innerHTMLs s = MkScraper $ withAll tagsToInnerHTML . select s
--
-- This function will match only the opening tag matching the selector, to match
-- every tag, use 'attrs'.
attr :: (Show str, TagSoup.StringLike str)
=> String -> Selector -> Scraper str str
attr :: (Show str, StringLike str)
=> str -> Selector -> Scraper str str
attr name s = MkScraper
$ join . withHead (tagsToAttr $ TagSoup.castString name) . select s
$ join . withHead (tagsToAttr $ castString name) . select s

-- | The 'attrs' function takes an attribute name and a selector and returns the
-- value of the attribute of the given name for every opening tag
-- (possibly nested) that matches the given selector.
--
-- > s = "<div id=\"out\"><div id=\"in\"></div></div>"
-- > scrapeStringLike s (attrs "id" "div") == Just ["out", "in"]
attrs :: (Show str, TagSoup.StringLike str)
=> String -> Selector -> Scraper str [str]
attrs :: (Show str, StringLike str)
=> str -> Selector -> Scraper str [str]
attrs name s = MkScraper
$ fmap catMaybes . withAll (tagsToAttr nameStr) . select s
where nameStr = TagSoup.castString name
where nameStr = castString name

-- | The 'position' function is intended to be used within the do-block of a
-- `chroots` call. Within the do-block position will return the index of the
Expand Down Expand Up @@ -211,7 +212,7 @@ attrs name s = MkScraper
-- , (2, "Third paragraph.")
-- ]
-- @
position :: (TagSoup.StringLike str) => Scraper str Int
position :: (StringLike str) => Scraper str Int
position = MkScraper $ Just . tagsToPosition

withHead :: (a -> b) -> [a] -> Maybe b
Expand All @@ -221,27 +222,27 @@ withHead f (x:_) = Just $ f x
withAll :: (a -> b) -> [a] -> Maybe [b]
withAll f xs = Just $ map f xs

foldSpec :: TagSoup.StringLike str
foldSpec :: StringLike str
=> (TagSoup.Tag str -> str -> str) -> TagSpec str -> str
foldSpec f = Vector.foldr' (f . infoTag) TagSoup.empty . (\(a, _, _) -> a)


tagsToText :: TagSoup.StringLike str => TagSpec str -> str
tagsToText :: StringLike str => TagSpec str -> str
tagsToText = foldSpec f
where
f (TagSoup.TagText str) s = str `TagSoup.append` s
f _ s = s

tagsToHTML :: TagSoup.StringLike str => TagSpec str -> str
tagsToHTML :: StringLike str => TagSpec str -> str
tagsToHTML = foldSpec (\tag s -> TagSoup.renderTags [tag] `TagSoup.append` s)

tagsToInnerHTML :: TagSoup.StringLike str => TagSpec str -> str
tagsToInnerHTML :: StringLike str => TagSpec str -> str
tagsToInnerHTML (tags, tree, ctx)
| len < 2 = TagSoup.empty
| otherwise = tagsToHTML (Vector.slice 1 (len - 2) tags, tree, ctx)
where len = Vector.length tags

tagsToAttr :: (Show str, TagSoup.StringLike str)
tagsToAttr :: (Show str, StringLike str)
=> str -> TagSpec str -> Maybe str
tagsToAttr attr (tags, _, _) = do
guard $ 0 < Vector.length tags
Expand Down
Loading