Import EMOJI_DATA from emoji==2.0.0 instead of UNICODE_EMOJI #25

Open, 1 commit to merge into base: main

calebchiam

Hey @jfilter, I'm not sure whether you plan to upgrade the emoji dependency to v2.0.0, but I looked into it briefly and made the change in case you want to integrate it.

@stevefoga

@jfilter have there been any updates on this? I believe this fixes #24.

@cvzi

cvzi commented Aug 1, 2022

Hi, I am one of the developers of the emoji module. The emoji module also offers a replace_emoji function, which would be a better choice. I think something like this:

def remove_emoji(text):
    return emoji.replace_emoji(text, replace="")

There is another problem here:

def to_ascii_unicode(text, lang="en", no_emoji=False):
    """
    Try to represent unicode data in ascii characters similar to what a human
    with a US keyboard would choose.
    Works great for languages of Western origin, worse the farther the language
    gets from Latin-based alphabets. It's based on hand-tuned character mappings
    that also contain ascii approximations for symbols and non-Latin alphabets.
    """
    # normalize quotes before since this improves transliteration quality
    text = fix_strange_quotes(text)
    if not no_emoji:
        text = demojize(text, use_aliases=True)
    lang = lang.lower()
    # special handling for German text to preserve umlauts
    if lang == "de":
        text = save_replace(text, lang=lang)
    text = unidecode(text)
    # important to remove utility characters
    if lang == "de":
        text = save_replace(text, lang=lang, back=True)
    if not no_emoji:
        text = emojize(text, use_aliases=True)
    return text

The calls demojize(text, use_aliases=True) and emojize(text, use_aliases=True) no longer work as written: the use_aliases parameter was changed, and with version 2.0.0 they would be demojize(text, language='alias') and emojize(text, language='alias').
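
If both the 1.x and 2.x releases of emoji need to be supported for a while, one option might be to pick the keyword argument from the installed version. The following is only a rough sketch: alias_kwargs is a hypothetical helper name, not part of emoji or of this project, and it assumes that 1.x releases accept use_aliases=True (as the quoted code does) while 2.0.0 and later expect language='alias'.

```python
def alias_kwargs(emoji_version):
    """Return the keyword arguments that request alias names.

    Hypothetical helper: assumes emoji 1.x takes use_aliases=True and
    emoji >= 2.0.0 takes language='alias', as described in this thread.
    """
    # Only the major version number matters for this switch.
    major = int(emoji_version.split(".")[0])
    if major >= 2:
        return {"language": "alias"}
    return {"use_aliases": True}


print(alias_kwargs("2.0.0"))  # {'language': 'alias'}
print(alias_kwargs("1.7.0"))  # {'use_aliases': True}
```

Inside to_ascii_unicode the calls could then look like demojize(text, **alias_kwargs(emoji.__version__)) and emojize(text, **alias_kwargs(emoji.__version__)). Pinning emoji>=2.0.0 and using language='alias' directly would be simpler if older releases do not need to be supported.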

@jfilter
Owner

jfilter commented Aug 3, 2022

Hey @cvzi, thanks for bringing this up. I will look into this for the next release.
