Insert space when text contains line break #736

Open
seezee opened this issue Nov 11, 2024 · 2 comments
Labels
bug (Something isn't working), Pagefind CLI (The CLI responsible for indexing content)

Comments

seezee commented Nov 11, 2024

I use markdown to create <dl> elements like so:

Foo
  : Bar

Which outputs:

<dl>
  <dt>Foo</dt>
  <dd>Bar</dd>
</dl>

In the search result excerpt, this shows up as "FooBar".

It looks like this also occurs between text nodes in any block-level element. For instance, if a paragraph ends with "Foo" and the next one starts with "Bar", the excerpt is "FooBar".

Excerpts should respect line breaks/carriage returns and insert a space between elements.

bglw (Contributor) commented Nov 20, 2024

👋 Hey @seezee

Thanks for bringing this up! Pagefind should indeed handle these elements better than it is now.

> Excerpts should respect line breaks/carriage returns and insert a space between elements.

There's actually no semantic line break here for Pagefind to spot — the whitespace between HTML elements doesn't carry any meaning and is lost when parsing, so it's equivalent to <dt>Foo</dt><dd>Bar</dd>.

Instead, Pagefind categorizes the elements that should break text into sentences, and makes sure to separate them with spaces and periods if required. dt and dl haven't been categorized correctly, so they are being treated as inline elements.
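A minimal sketch of that categorization idea, written in Python with the standard-library HTML parser rather than Pagefind's own code. The `BLOCK_TAGS` set, the `TextExtractor` class, and the `extract` helper are all hypothetical names for illustration; Pagefind's real list of sentence-breaking elements is not shown in this thread.

```python
from html.parser import HTMLParser

# Hypothetical set of tags that should break text; not Pagefind's actual list.
BLOCK_TAGS = {"p", "div", "li", "dt", "dd", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collects text, inserting a space after each tag listed in block_tags."""

    def __init__(self, block_tags):
        super().__init__()
        self.block_tags = block_tags
        self.parts = []

    def handle_data(self, data):
        # Whitespace-only text between elements carries no meaning, so skip it.
        if data.strip():
            self.parts.append(data)

    def handle_endtag(self, tag):
        # Only tags categorized as block-level separate the surrounding text.
        if tag in self.block_tags:
            self.parts.append(" ")

    def text(self):
        # Collapse runs of whitespace and trim the ends.
        return " ".join("".join(self.parts).split())


def extract(html, block_tags):
    parser = TextExtractor(block_tags)
    parser.feed(html)
    parser.close()
    return parser.text()


html = "<dl>\n  <dt>Foo</dt>\n  <dd>Bar</dd>\n</dl>"
print(extract(html, set()))       # every tag treated as inline: FooBar
print(extract(html, BLOCK_TAGS))  # dt and dd treated as block-level: Foo Bar
```

With an empty set every element behaves as inline and the text nodes are glued together, which reproduces the reported "FooBar". Adding dt and dd to the set is enough to separate them.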

In this case, I'll add handling so that when indexing, the dt is followed by a ":" if one is not already present, and the dd is followed by a ".". So indexing:

<dl>
  <dt>Morgawr</dt>
  <dd>A sea serpent</dd>

  <dt>Owlman</dt>
  <dd>A giant owl-like creature</dd>
</dl>

Will come through as "Morgawr: A sea serpent. Owlman: A giant owl-like creature."
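The proposed handling could be sketched roughly as follows. This is an illustration in Python, not the actual Pagefind change: `DefinitionListIndexer` and `index_definition_list` are hypothetical names, and skipping the "." when a dd already ends with one is an assumption (the comment above only states the "if not present" condition for the colon).

```python
from html.parser import HTMLParser


class DefinitionListIndexer(HTMLParser):
    """Hypothetical indexer: ends each <dt> with ':' and each <dd> with '.'."""

    def __init__(self):
        super().__init__()
        self.chunks = []     # finished text of each dt/dd, in document order
        self.current = None  # text pieces of the dt/dd currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self.current = []

    def handle_data(self, data):
        # Text outside dt/dd (such as whitespace between elements) is ignored.
        if self.current is not None:
            self.current.append(data)

    def handle_endtag(self, tag):
        if tag not in ("dt", "dd") or self.current is None:
            return
        text = " ".join("".join(self.current).split())
        self.current = None
        if not text:
            return
        suffix = ":" if tag == "dt" else "."
        # Assumption: only add the punctuation when it is not already there.
        if not text.endswith(suffix):
            text += suffix
        self.chunks.append(text)


def index_definition_list(html):
    parser = DefinitionListIndexer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


html = """
<dl>
  <dt>Morgawr</dt>
  <dd>A sea serpent</dd>

  <dt>Owlman</dt>
  <dd>A giant owl-like creature</dd>
</dl>
"""
print(index_definition_list(html))
```

The sketch does not handle nested definition lists or block elements inside a dd; it only shows how the terminating punctuation turns each term and description pair into its own sentence.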

bglw added the bug and Pagefind CLI labels Nov 20, 2024
seezee (Author) commented Nov 20, 2024

Awesome!
