
WikiEntityUtil - Apostrophe rendered as &rsquo; instead of &apos; #12

Open
GoogleCodeExporter opened this issue Mar 23, 2015 · 5 comments

Comments

@GoogleCodeExporter

WikiEntityUtil translates the "'" (apostrophe) character to the entity "&rsquo;". This entity is not recognized by the SAX parser if you feed it the HTML generated by WikiModel. I believe that WikiEntityUtil should be changed to map this character to "&apos;", which the SAX parser handles correctly.
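To illustrate the report, here is a minimal sketch using the JDK's built-in SAX parser (`javax.xml.parsers`), not WikiModel code; the class name `ApostropheCheck` and the sample strings are hypothetical. Without a DTD, an XML parser only knows the five predefined entities (`&lt;`, `&gt;`, `&amp;`, `&quot;`, `&apos;`) plus numeric character references, so `&rsquo;` is a well-formedness error while `&apos;` and `&#8217;` both parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class ApostropheCheck {

    /** Returns true if the fragment is well-formed XML according to the JDK's SAX parser. */
    static boolean parsesWithSax(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler rethrows fatal errors, so an undeclared entity ends the parse.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // e.g. "The entity "rsquo" was referenced, but not declared."
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesWithSax("<p>it&rsquo;s</p>")); // false: named entity, no DTD
        System.out.println(parsesWithSax("<p>it&apos;s</p>"));  // true: predefined XML entity
        System.out.println(parsesWithSax("<p>it&#8217;s</p>")); // true: numeric reference to U+2019
    }
}
```

Note that `&#8217;` keeps the typographic apostrophe, whereas `&apos;` turns it into a plain ASCII one.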

Original issue reported on code.google.com by [email protected] on 8 Jan 2008 at 4:41

@GoogleCodeExporter

This is one possible solution. We could replace all XHTML entities with the corresponding numeric character references; then every entity would be recognized by XML parsers. Right now the XHTML output cannot be parsed as is: many entities are not declared in the XML header (although browsers recognize them).
Examples of entities to replace:
{{{
&nbsp;
&quot;
&laquo;
&raquo;
&copy;
...
}}}

For the full list of entities to replace, see the org.wikimodel.wem.util.WikiEntityUtil class:

(http://wikimodel.googlecode.com/svn/trunk/org.wikimodel.wem/src/main/java/org/wikimodel/wem/util/WikiEntityUtil.java)
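A minimal sketch of the numeric-reference approach, assuming the goal is output that any XML parser accepts without a DTD. `NumericReferenceEscaper` is a hypothetical helper, not the actual `WikiEntityUtil` API: it keeps the five predefined XML entities for markup characters and writes every non-ASCII code point as a decimal reference, so `&nbsp;` becomes `&#160;` and `&laquo;` becomes `&#171;`.

```java
public class NumericReferenceEscaper {

    /**
     * Hypothetical replacement for named-entity output: escapes XML markup characters
     * with the predefined entities and every non-ASCII code point with a decimal
     * numeric character reference, so no DTD is needed to parse the result.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // Iterate by code point so supplementary characters become one reference.
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp < 0x80) {
                        out.appendCodePoint(cp); // plain ASCII passes through unchanged
                    } else {
                        out.append("&#").append(cp).append(';'); // e.g. U+00A0 -> &#160;
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u00ABcaf\u00E9\u00BB \u00A9 2008"));
        // prints: &#171;caf&#233;&#187; &#169; 2008
    }
}
```

Working on code points rather than `char`s means a character outside the Basic Multilingual Plane is written as one reference instead of two invalid surrogate references.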

Original comment by [email protected] on 10 Jan 2008 at 3:37

  • Changed state: Accepted

@GoogleCodeExporter

If you think this is a valid solution, it sounds good to me.

Original comment by [email protected] on 10 Jan 2008 at 4:31

@GoogleCodeExporter

Actually, I just realized what you are saying here. Is there some way to pre-define the common ones used in HTML, like:

{{{
&nbsp;
&lt;
&gt;
&amp;
&quot;
&apos;
}}}

It would be nice if the parser handled these common character entities.

Original comment by [email protected] on 11 Jan 2008 at 1:07

@GoogleCodeExporter

I just tried, and all of the above are already handled except for &nbsp;. I have searched for information on SAX parsers, and the only way I have found to add entities is to declare them in a DTD. This means that the input XML must have a DTD defined. Even then, I'm not sure whether DTD validation must be turned on for the SAX parser to recognize the character entities.
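On the validation question: the XML 1.0 specification requires even non-validating processors to read entity declarations in the document's internal DTD subset, so validation does not need to be turned on for those. The sketch below (hypothetical class `InternalSubsetDemo`, JDK SAX parser) declares `&nbsp;` inline and parses with validation off. Whether an external DTD is fetched by a non-validating parser is parser-dependent; in Xerces it is governed by the `http://apache.org/xml/features/nonvalidating/load-external-dtd` feature.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class InternalSubsetDemo {

    /** Parses xml with a non-validating SAX parser and returns the concatenated character data. */
    static String textOf(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // already the default; set explicitly to make the point
        StringBuilder text = new StringBuilder();
        try {
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length); // may be called several times per text node
                }
            });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The internal subset maps &nbsp; to U+00A0 via a numeric reference.
        String xml = "<!DOCTYPE p [ <!ENTITY nbsp \"&#160;\"> ]>\n"
                   + "<p>a&nbsp;b</p>";
        String text = textOf(xml);
        System.out.println(text.length());        // 3
        System.out.println((int) text.charAt(1)); // 160
    }
}
```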

Original comment by [email protected] on 11 Jan 2008 at 5:29

@GoogleCodeExporter

Danny, I think I fixed this some time ago by adding the XHTML DTDs to the XHTML parser. Could you try again and let me know if everything works for you?
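One way to give a SAX parser the XHTML DTDs without network access is an `org.xml.sax.EntityResolver` that maps the XHTML public identifier to a bundled copy. This is a sketch, not WikiModel's actual code: `LocalDtdResolverDemo` is hypothetical, and `STUB_DTD` is an in-memory stand-in declaring only two entities, whereas the real XHTML 1.0 DTDs pull in the full Latin-1, symbol, and special entity sets. It assumes the parser loads the external subset when not validating, which the JDK's default parser does.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdResolverDemo {

    static final String XHTML_STRICT_PUBLIC_ID = "-//W3C//DTD XHTML 1.0 Strict//EN";

    // Hypothetical stand-in for a bundled copy of the XHTML DTD (only two entities here).
    static final String STUB_DTD =
        "<!ENTITY nbsp \"&#160;\">\n<!ENTITY rsquo \"&#8217;\">";

    /** Parses an XHTML string, resolving the XHTML DTD locally, and returns its character data. */
    static String textOf(String xhtml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            StringBuilder text = new StringBuilder();
            // Serve the DTD from memory instead of fetching it from w3.org;
            // returning null falls back to the parser's default resolution.
            reader.setEntityResolver((publicId, systemId) ->
                XHTML_STRICT_PUBLIC_ID.equals(publicId)
                    ? new InputSource(new StringReader(STUB_DTD))
                    : null);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xhtml)));
            return text.toString();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                   + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                   + "<html><body><p>it&rsquo;s&nbsp;ok</p></body></html>";
        System.out.println(textOf(doc).equals("it\u2019s\u00A0ok"));
    }
}
```

Resolving by public identifier rather than system URL keeps the lookup working even if documents reference a mirror of the DTD.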

Thanks

Original comment by [email protected] on 26 Oct 2008 at 1:52
