[NEWYORKER] Span tags are replaced with a paragraph tag #13

Open
PawanHegde opened this issue May 11, 2019 · 1 comment
Comments

@PawanHegde

Sites such as the New Yorker use span elements to make the first character of an article more prominent. The span is supposed to appear inline with the rest of the paragraph, but because Crux replaces spans with paragraph tags in its post-processing step, the single character ends up as its own paragraph in the output.

https://www.newyorker.com/news/our-columnists/putin-and-trumps-ominous-nostalgia-for-the-second-world-war

We could either retain span tags in the output without applying a minimum length (spans in these cases are usually very short), or remove the span tag and keep only its content.
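
To illustrate the second option (dropping the span tag but keeping its content), here is a rough sketch in Python using only the standard library. It is not Crux's code and does not use Crux's API; the names `SpanUnwrapper` and `unwrap_spans` are made up for this example, and it assumes the input is a small HTML fragment. Comments and doctype declarations are not re-emitted in this sketch.

```python
from html.parser import HTMLParser


class SpanUnwrapper(HTMLParser):
    """Hypothetical helper: re-emits HTML, dropping <span> tags but keeping their content."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            # Re-emit the start tag exactly as it appeared in the source.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        if tag != "span":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "span":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def unwrap_spans(html):
    """Return html with every <span> wrapper removed and its content left in place."""
    parser = SpanUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    sample = '<p><span class="dropcap">I</span>n May, the war ended.</p>'
    print(unwrap_spans(sample))  # <p>In May, the war ended.</p>
```

With this approach the leading character stays inline in its paragraph, so no minimum-length rule for spans is needed.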

I can create a PR if you want.

@chimbori
Owner

Sure, that sounds good! PRs are always welcome. All we request is that tests continue to pass: either the existing tests are unaffected, or they are updated to match the new expected extracted output.

https://github.com/chimbori/crux/blob/master/CONTRIBUTING.md

Thanks, Pawan!
