-
-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic/ByteOrderMark: improve code coverage #278
Generic/ByteOrderMark: improve code coverage #278
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @rodrigoprimo. Looking good.
There are no tests for the UTF-16 BOM characters, and the sniff does not support BOM characters used in other UTF encodings. I don't know much about this topic, so I'm not sure if we should add that or not.
I'd very much like to see tests added for the UTF-16 BOM characters too. Let me know if you need help generating those files.
As for other encodings - as the sniff does not look for those, we don't need to bother with tests for that.
Also keep in mind, that projects can specify the file encoding used in their file, but that PHPCS defaults to UTF-8 (since PHPCS 3.0.0) and that, in my estimation, it will be exceedingly rare to see PHP files encoded in anything other than ANSI/iso-8859-1 or UTF-8, though, it is, of course, possible.
cf65b45
to
470b05f
Compare
I think I managed to create the files with the UTF-16 BE and UTF-16 LE BOM characters, but there is a significant chance that I missed something, as this was the first time that I was playing with these encodings. So any help is appreciated. I used the following script to create the files (changing the encoding in the <?php
$file = fopen('src/Standards/Generic/Tests/Files/ByteOrderMarkUnitTest.5.inc', 'w');
fwrite($file, "\xFE\xFF");
$phpCode = "<?php\n// File with a UTF-16 Big Endian byte order mark (BOM) character before the opening PHP tag.";
$phpCodeUtf16 = iconv('UTF-8', 'UTF-16BE', $phpCode);
fwrite($file, $phpCodeUtf16);
fclose($file); And I checked the files using $ file src/Standards/Generic/Tests/Files/ByteOrderMarkUnitTest.5.inc
src/Standards/Generic/Tests/Files/ByteOrderMarkUnitTest.5.inc: PHP script, Unicode text, UTF-16, big-endian text
$ od -c src/Standards/Generic/Tests/Files/ByteOrderMarkUnitTest.5.inc
0000000 376 377 \0 < \0 ? \0 p \0 h \0 p \0 \n \0 /
0000020 \0 / \0 \0 F \0 i \0 l \0 e \0 \0 w
0000040 \0 i \0 t \0 h \0 \0 a \0 \0 U \0 T
0000060 \0 F \0 - \0 1 \0 6 \0 \0 B \0 i \0 g
0000100 \0 \0 E \0 n \0 d \0 i \0 a \0 n \0
0000120 \0 b \0 y \0 t \0 e \0 \0 o \0 r \0 d
0000140 \0 e \0 r \0 \0 m \0 a \0 r \0 k \0
0000160 \0 ( \0 B \0 O \0 M \0 ) \0 \0 c \0 h
0000200 \0 a \0 r \0 a \0 c \0 t \0 e \0 r \0
0000220 \0 b \0 e \0 f \0 o \0 r \0 e \0 \0 t
0000240 \0 h \0 e \0 \0 o \0 p \0 e \0 n \0 i
0000260 \0 n \0 g \0 \0 P \0 H \0 P \0 \0 t
0000300 \0 a \0 g \0 . \0 ;
0000310 One potential problem with the files that I created is that when using |
@rodrigoprimo I have checked the encoding of the files (using NotePad++ which has a great little feature for that) and they look fine to me. I also checked the encoding for the other files - The |
Thanks for checking!
Sure, I can do that. I guess the easiest way is to add a character outside of the ASCII (together with a comment explaining why it is there and why it should not be removed) to either Outside the scope of this PR, while checking the sniff code once again, I realized that it might make sense to return the number of tokens instead of |
That would very much depend on what editor you use to make that change and whether it automatically changes the file encoding if needed. It would also depend on the character used as files in iso-8859-1 can contain characters outside ASCII too... Shall I make it easier and just add a commit to sort this out with a file encoding conversion ?
💯 Good catch! I made a change like that for a number of other sniffs not too long ago (PR #95). For this sniff, I think that change should be made for all exit points of the function - not just the Go right ahead and create a PR for this 👍🏻 |
Works for me. Thanks. (I failed to provide more details, but I had checked that my editor defaults to UTF-8 when characters outside of the ANSCII range are added to the file)
Ok, I will create a PR for this. |
I'll do that now and will rebase the PR (and squash some commits) before merging. |
To add more tests that need to go into separate files.
This commit adds four more tests case files: * 2 to cover the cases where the sniff is triggered but the file does not contain a BOM character. One of these files is in UTF-8 encoding without a BOM, the other is in ANSI/iso-8859-1 encoding. * 2 to cover the different UTF-16 encodings with a BOM character. Includes adding a comment to the original test case file to document the encoding (UTF-8 with BOM) of that file.
470b05f
to
a9642e5
Compare
Description
This PR improves code coverage for the Generic.Files.ByteOrderMark sniff. It adds two more tests that need to go into separate files to exercise the code that handles the cases when the files do not contain a BOM character.
There are no tests for the UTF-16 BOM characters, and the sniff does not support BOM characters used in other UTF encodings. I don't know much about this topic, so I'm not sure if we should add that or not.
Related issues/external references
Part of #146
Types of changes
PR checklist