Chapter 23 Jetbrains Python Survey #33

Open
dcombs4 opened this issue Mar 30, 2023 · 0 comments

The JetBrains Python survey used in chapter 23 and subsequent chapters is very problematic. I ran into numerous problems when trying to make the jb2 DataFrame on page 233. The first problem was that the number ranges under 'company_size' (e.g. 2-10) were not interpreted correctly by Excel. The hyphen between the two numbers was replaced by a strange-looking three-character sequence. I had to go into the Excel file and manually change these back into hyphens using Ctrl-H (Find and Replace). But that created new problems.
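A three-character sequence in place of a dash is often a sign of an encoding mismatch. The sketch below assumes (this is not confirmed for the survey file) that the ranges were written with an en dash (U+2013) stored as UTF-8 and that the file was opened as Windows-1252. It uses only the standard library and a made-up value.

```python
# Assumption: the range separator is an en dash (U+2013), not an ASCII hyphen.
en_dash_range = "2\u201310"

# The en dash takes three bytes in UTF-8. Decoding those bytes as
# Windows-1252 yields three separate characters instead of one dash.
garbled = en_dash_range.encode("utf-8").decode("cp1252")
print(repr(garbled))

# Decoding with the matching codec keeps the dash intact; replacing it
# with an ASCII hyphen then gives the label the code expects.
clean = en_dash_range.encode("utf-8").decode("utf-8").replace("\u2013", "-")
print(clean)
```

If that assumption holds, reading the CSV directly with an explicit UTF-8 encoding and normalizing the dash in code would avoid the manual find-and-replace step in Excel.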

Once the hyphens were inserted, Excel regarded some of the number ranges as dates. For example, 2-10 was turned into 10-Feb. Changing the column format had no effect. After many hours of frustration, I finally discovered that adding a leading space prevented Excel from treating the range as a date.

But then Python had trouble recognizing other number range strings. I kept getting the error "ValueError: invalid literal for int() with base 10: '51-500'", and others like it. After more frustration I found that many of the string entries in the CSV file had extra leading or trailing whitespace. I tried to remove the whitespace in one sweep using pd.read_csv(jb, delim_whitespace=True), but I only got the following error: "ParserError: Error tokenizing data. C error: Expected 194 fields in line 961, saw 215".
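For reference, delim_whitespace=True tells read_csv to split fields on whitespace instead of commas, which would explain the field-count mismatch in the ParserError. A minimal sketch of stripping the padding after a normal comma-separated read is shown below. The miniature CSV and the size_map lookup are hypothetical stand-ins, not the survey data or the book's code.

```python
import io

import pandas as pd

# Hypothetical miniature of the file: comma-separated, values padded with spaces.
raw = "age,company_size\n30, 2-10 \n41,51-500 \n"

# Keep the default comma delimiter; the padding stays inside the string values.
df = pd.read_csv(io.StringIO(raw))

# Remove leading and trailing whitespace from the string column.
df["company_size"] = df["company_size"].str.strip()

# Hypothetical lookup from range label to a number. A label that still
# carried padding would not match a key and would come back as NaN.
size_map = {"2-10": 2, "51-500": 51}
sizes = df["company_size"].map(size_map)
print(sizes.tolist())
```

Checking sizes.isna().any() after the lookup is one way to spot labels that still do not match, without opening the file in Excel.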

I had to use Ctrl-H to replace each number range that contained whitespace with the same range without whitespace. As for the ranges that Excel thought were dates, I had to modify the Python code to match the leading space they now needed.

But that still was not the end. After fixing the strings, the code would not recognize "company_size" as an attribute. It gave me the following error: "AttributeError: 'DataFrame' object has no attribute 'company_size'". Again, it took me a few hours, but I finally figured out that the column name 'company_size' had an extra leading and trailing space in the file, so Python could not find it, since it technically did not match the name used in the code.
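A padded header can be confirmed and fixed in code. The sketch below uses a hypothetical two-column CSV whose header has the same kind of padding. Printing the column list with repr-style quoting makes the extra spaces visible, and Index.str.strip() removes them.

```python
import io

import pandas as pd

# Hypothetical miniature: the header cell for company_size is padded with spaces.
raw = "age, company_size \n30,2-10\n"
df = pd.read_csv(io.StringIO(raw))

# The list shows the padding explicitly, e.g. ' company_size '.
print(list(df.columns))

# Strip whitespace from every column name, then attribute access works.
df.columns = df.columns.str.strip()
sizes = df.company_size
print(sizes.tolist())
```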

Bottom line: the JetBrains survey is not ready to use out of the box, so to speak. Converting the file to CSV creates strange symbols that must be fixed by hand. Additionally, there is a lot of whitespace surrounding the data entries; unless you know the whitespace is there, it is impossible to make Python read the values as expected. Finally, some of the entries need a leading space so that Excel does not change them into dates, and the Python code must include the same space.

I am still frustrated about this because it took me approximately three days to figure out what was going on.
