-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CloudflareProtectedEmailAddress #6
base: master
Are you sure you want to change the base?
Conversation
26fc736
to
d6d9c80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming the plan is to get this working here, and then extract to a gem? Would be good to note that if so.
As per previous discussion, you also need to explain where the algorithm comes from.
require 'cgi' | ||
|
||
class CloudflareProtectedEmailAddress | ||
def initialize(obfuscated:) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's your thinking behind making this require a named parameter, rather than just a positional one? If there are likely to be other arguments that might make sense, but here it seems like it just complicates the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was leaning towards verbosity but I see why it complicates the interface.
end | ||
|
||
def key | ||
@key ||= obfuscated[0..1].hex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this really expensive enough to need caching? (Especially compared to unobfuscated
)
@key ||= obfuscated[0..1].hex | ||
end | ||
|
||
def unobfuscated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value returned from this still seems fairly obfuscated to me…
end | ||
|
||
it 'will make an obfuscated email address human readable' do | ||
email.must_equal '[email protected]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to have another test here as well. I could make this test suite pass with a MUCH simpler implementation :D
ae51cff
to
2d0f184
Compare
|
|
||
def key | ||
obfuscated[0..1].hex | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key
and obfuscated_characters
here are both taking responsibility for the same thing, but doing it in slightly different ways. This would be cleaner and more maintainable if those both used an underlying method that took care of the conversion e.g.
def hex_pairs
obfuscated.scan(/.{2}/).map(&:hex)
end
def obfuscated_characters
hex_pairs.drop(1)
end
def key
hex_pairs.first
end
(not sure if hex_pairs
is the right name here)
def escaped_email_address | ||
obfuscated_characters.reduce('') do |email, encoded_character| | ||
email + (encoded_character ^ key).to_s(16).prepend('%') | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit over-complicated.
Perhaps a simple map+join would be clearer?
obfuscated_characters.map { |char| (char ^ key).to_s(16).prepend('%') }.join
|
||
def human_readable | ||
return CGI.unescape(escaped_email_address) if CGI.unescape(escaped_email_address).include?('@') | ||
obfuscated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has become much too complicated and confusing now, I fear. If you really need to do this check, I'd suggest at least factoring out that CGI.unescape(escaped_email_address)
to a named method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of this also seems back-to-front. A guard clause should be used to break out in unusual cases, not be the main path, with a fall through to the odd ones.
Again, I'm not convinced you need this at all, but if you do, then it would be better as:
return obfuscated unless unescaped.include? '@'
unescaped
The Paraguay members page contains obfuscated email addresses. When initialized with an encrypted email address from the Paraguay members list page, CloudFlareProtectedEmail#unobfuscated_email will return the human readable address.
1488b48
to
9199f3c
Compare
attr_accessor :obfuscated | ||
|
||
def components | ||
obfuscated.scan(/.{2}/).map(&:hex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious here why you're using /.{2}/
rather than just ..
|
||
attr_accessor :obfuscated | ||
|
||
def components |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
components
is a bit too overloaded a term for me to be entirely comfortable with it here — I'm not sure it really conveys what's actually going on, and could be misleading. Is there a better name you can think of?
obfuscated.scan(/.{2}/).map(&:hex) | ||
end | ||
|
||
def obfuscated_characters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually quite a misleading name too, as it returns integers, not characters
components.first | ||
end | ||
|
||
def escaped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be easier to follow if it actually mentioned percent encoding (or hex encoding) somewhere. Ideally that would be obvious from the name of the method, though if that's not sensible, then a comment might be called for.
end | ||
|
||
def escaped | ||
obfuscated_characters.map { |char| (char ^ key).to_s(16).prepend('%') }.join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I've worked out what's actually going on here, I'm not entirely sure that .to_s(16)
gets you the correct format. This approach will convert 15 to '%f' when I think it's strictly meant to be '%0F'.
sprintf("#%02X" % i)
should get you the correct value, and means you wouldn't need the prepend
either.
c7ef388
to
ed9f546
Compare
ed9f546
to
54f0000
Compare
This is a step towards fixing: everypolitician/everypolitician-data#34867
Member email addresses have been obfuscated and the scraper is currently capturing a string of javascript instead of the email address:
:email=>"[email protected]/* <![CDATA[ */!function(t,e,r,n,c,a,p){try{t=document.currentScript||function(){for(t=document.getElementsByTagName('script'),e=t.length;e--;)if(t[e].getAttribute('data-cfhash'))return t[e]}();if(t&&(c=t.previousSibling)){p=t.parentNode;if(a=c.getAttribute('data-cfemail')){for(e='',r='0x'+a.substr(0,2)|0,n=2;a.length-n;n+=2)e+='%'+('0'+('0x'+a.substr(n,2)^r).toString(16)).slice(-2);p.replaceChild(document.createTextNode(decodeURIComponent(e)),c)}p.removeChild(t)}}catch(u){}}()/* ]]> */"
For example, when the obfuscated address is
b3d2dfd7dcc5d6c1d2f3d7dac3c6c7d2d7dcc09dd4dcc59dc3ca
, callingwill return:
[email protected]
The javascript comes from the Paraguay member pages. The key part is:
r
is the first character of the input string. The remainder is the obfuscated email. Two digit hexadecimals represent each character.The next step will be to use this class within
MemberPage
.This method of obfuscation is not particular to this site. For example, it is being used on the New Zealand site everypolitician/everypolitician-data#36492. The intention is to develop the class in this scraper before extracting it to a gem.