60 votes

Python web-scraper to download table of transistor counts from Wikipedia

Is there a way to simplify this code? Yes. Don't scrape Wikipedia. Before you ask "should I scrape this thing?", your first thought should be "Is there an API that can give me the data I want?" In this ...
Reinderien's user avatar
  • 71.1k
51 votes

Python web-scraper to download table of transistor counts from Wikipedia

Let me tell you about IMPORTHTML()... So here's all the code you need in Google Sheets: ...
Eric Duminil's user avatar
  • 3,991
42 votes

Web scraping the titles and descriptions of trending YouTube videos

Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostpopular list of videos? If you make a ...
200_success's user avatar
18 votes
Accepted

Parsing RFC 4180 CSV with GOTOs

1) I would save constants (doubleQuote, etc.) as fields, so they don't take up extra space in an already fairly large method body. 2) I think your use of ...
Nikita B's user avatar
  • 13.1k
18 votes
Accepted

System that manages employee data for managers

Welcome to Code Review! Kudos on writing a fairly large program. Several things stand out in your program, but a few things first. If you are using any intelligent editor, please see if you can get a ...
hjpotter92's user avatar
  • 8,921
17 votes

Sensor logger for Raspberry Pi in a stratospheric probe

Do not call main recursively. You are setting yourself up for stack overflow. Consider instead ...
vnp's user avatar
  • 58.7k
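
The excerpt above is cut off before the suggested alternative. One common replacement for a self-calling main, assumed here and not taken from the answer, is a plain loop. A minimal sketch, where run_logger and read_sensor are hypothetical names:

```python
def run_logger(read_sensor, cycles):
    """Collect `cycles` readings with a loop, so the call stack stays flat."""
    readings = []
    for _ in range(cycles):
        readings.append(read_sensor())
    return readings


# 5000 cycles is well past CPython's default recursion limit of 1000,
# which a main() that calls itself once per cycle would eventually hit.
values = run_logger(lambda: 21.5, 5000)
```

A long-running logger would more likely use an unbounded while loop; the bounded version is used here only so the sketch terminates.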
16 votes

Web scraping the titles and descriptions of trending YouTube videos

Context manager: You open a file at the beginning of the program and close it explicitly at the end. Python provides a nice way to allocate and release resources (such as files) easily: they are ...
SylvainD's user avatar
  • 29.8k
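
A minimal sketch of the context-manager advice in the excerpt above. The function name write_lines and its parameters are hypothetical, not from the reviewed program:

```python
def write_lines(path, lines):
    """Write each string in `lines` to `path`, one per line."""
    # The with-statement closes the file when the block ends,
    # even if write() raises, so no explicit close() is needed.
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return f.closed  # True: the file was closed on leaving the block
```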
16 votes

Python web-scraper to download table of transistor counts from Wikipedia

Make sure to follow naming conventions. You name two variables inappropriately: ...
Carcigenicate's user avatar
14 votes

System that manages employee data for managers

Don't do wildcard imports: Use import tkinter as tk and then prefix all tk classes and commands with tk. (e.g.: ...
Bryan Oakley's user avatar
  • 2,169
14 votes

Read CSV and use bidirectional BFS to find the shortest connections between actors

It looks like you found a pretty nice approach there already. A few general things stand out: using globals; magical constants and unsafe C-style arrays/raw pointers: ...
sehe's user avatar
  • 1,398
13 votes
Accepted

Aggregate prescriptions per drug in a CSV file

Performance: Your performance problem is primarily due to the duplicate_id() function, namely the if id in id_list test. ...
200_success's user avatar
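
The body of duplicate_id() is not shown in the excerpt above, so the following is only a sketch of the general point: an in test against a list scans the whole list, while the same test against a set is a hash lookup. The name find_duplicates is hypothetical:

```python
def find_duplicates(ids):
    """Return the IDs that occur more than once, in order of first repeat."""
    seen = set()       # membership test is O(1) on average, O(n) for a list
    reported = set()   # IDs already added to the result
    duplicates = []
    for item in ids:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates
```

With a list in place of seen, the loop does quadratic work overall; with a set it stays roughly linear in the number of IDs.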
12 votes

Parsing RFC 4180 CSV with GOTOs

The biggest problem with gotos is that they make it really hard to refactor code easily. Your loop body alone is 66 lines of code long and has a nesting depth of 5. Length of code and nesting depth ...
Voo's user avatar
  • 525
12 votes

Are there ways to speed up this string fuzzy matching in Golang?

Review I - package go-fuzzywuzzy: After a quick read of your Go code, the Go code for package go-fuzzywuzzy, and the Go ...
peterSO's user avatar
  • 3,591
11 votes
Accepted

CSV file reader in PHP that supports large files (>15k lines)

Performance: As performance is your main concern, let's address it first. To process the example CSV file with ~36k lines, your original script needs around 139s*. The main bottlenecks are ...
insertusernamehere's user avatar
11 votes

Web scraping the titles and descriptions of trending YouTube videos

I'd definitely look into using an API directly as @200_success suggested to avoid any web-scraping or HTML parsing, but here are some additional suggestions to improve your current code focused mostly ...
alecxe's user avatar
  • 17.5k
11 votes
Accepted

Sensor logger for Raspberry Pi in a stratospheric probe

Have you already executed the code to see how it performs and if the battery will last? There is that famous Donald Knuth quote saying premature optimization is the root of all evil (or at least most ...
Kim's user avatar
  • 226
10 votes
Accepted

Searching for data from file1 in file2

There are things to improve: when you use the with context manager, you don't need to close the file explicitly; you can combine two context managers into one, reducing ...
alecxe's user avatar
  • 17.5k
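
A sketch of the two points in the excerpt above: a single with-statement that opens both files, and no explicit close(). The function name, parameter names, and matching rule are assumptions for illustration, since the reviewed code is not shown here:

```python
def lines_in_both(needles_path, haystack_path):
    """Return lines of haystack_path that also appear in needles_path."""
    # One with-statement manages both files; both are closed on exit.
    with open(needles_path, encoding="utf-8") as f1, \
         open(haystack_path, encoding="utf-8") as f2:
        needles = {line.strip() for line in f1}
        return [line.strip() for line in f2 if line.strip() in needles]
```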
10 votes
Accepted

Reducing the amount of duplicated code (python) - cricket matches

Disclaimer: I know nothing whatsoever about the rules of cricket. Useless check: Because either info['toss_decision'] equals ...
SylvainD's user avatar
  • 29.8k
9 votes

Parse one table out of CSV with multiple titled tables with the Python CSV module

In general this code looks very nice and seems to follow style guidelines well. I found your code easy to follow. You asked for: ... suggestions to make the code more compact, idiomatic, and fit for ...
Stephen Rauch's user avatar
9 votes
Accepted

Transform dict of lists to CSV

As per PEP 8, the standard indentation for Python code is 4 spaces. Since whitespace is significant in Python, this is a pretty strong convention that you should follow. The code itself isn't bad, ...
200_success's user avatar
9 votes
Accepted

Calculate average values for each day of the week for each Meter

Please note: the code has been reviewed in two parts. Some modifications to the first part can be found under update #1. As I suggested in the comments you can take advantage of CsvHelper to ...
Peter Csala's user avatar
  • 10.8k
9 votes

adding data to a CSV file for it to be read

DRY: You should store the long file path in a variable instead of repeating it multiple times: ...
toolic's user avatar
  • 15.7k
9 votes
Accepted

Very simple CSV-parser in Java

Side-effects: The Stream.forEach() operation should be used with care, since it operates via side-effects, and should not be used as a substitute for a proper ...
Alexander Ivanchenko's user avatar
8 votes
Accepted

Convert all CSV files in a given directory to JSON using Python

Looks good to me. It's a perfectly sensible approach. There's just one line I'm going to criticize: csvfile = os.path.splitext(filename)[0] Picking off ...
J_H's user avatar
  • 42.1k
8 votes
Accepted

Compute mean, variance and standard deviation of CSV number file

Portability: #import is a GCC extension (or perhaps a preview of C++20). There's no good reason not to simply ...
Toby Speight's user avatar
  • 88.3k
8 votes

String Similarity using fuzzywuzzy on big data

The first algorithmic recommendation is to use itertools.combinations instead of .permutations, since you don't care about order. ...
scnerd's user avatar
  • 2,090
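
To illustrate the combinations-versus-permutations point in the excerpt above, here is a small sketch. The sample names are made up, and it assumes the similarity score is symmetric, so comparing (a, b) and (b, a) would be redundant:

```python
from itertools import combinations, permutations

names = ["alice", "alicia", "bob"]

# permutations yields both (a, b) and (b, a): n * (n - 1) ordered pairs.
ordered_pairs = list(permutations(names, 2))

# combinations yields each unordered pair once: n * (n - 1) / 2 pairs.
unordered_pairs = list(combinations(names, 2))

print(len(ordered_pairs), len(unordered_pairs))  # prints "6 3"
```

Halving the number of pairs halves the number of similarity computations, which matters when every comparison is expensive.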
8 votes

Aggregate prescriptions per drug in a CSV file

If your dataset is large, but not larger than memory, you might want to consider using pandas for this: ...
Graipher's user avatar
  • 41.7k
8 votes

Sensor logger for Raspberry Pi in a stratospheric probe

Opening and closing files takes resources: with open('babar.txt', 'a') as f: f.write('a'*10000) takes 300 microseconds while: ...
Benoît P's user avatar
  • 809
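
The comparison in the excerpt above is truncated, so the following only sketches the two shapes being contrasted: reopening the file for every write versus opening it once. Function names are hypothetical and no timings are claimed:

```python
def log_reopening(path, readings):
    """Append each reading, opening and closing the file every time."""
    for value in readings:
        with open(path, "a", encoding="utf-8") as f:
            f.write(f"{value}\n")


def log_single_open(path, readings):
    """Append all readings through one open file handle."""
    with open(path, "a", encoding="utf-8") as f:
        for value in readings:
            f.write(f"{value}\n")
```

Both functions leave the same file contents; the second pays the open/close cost once. Whether a logger should also flush after each write, to limit data loss on power failure, is a separate trade-off not covered here.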
8 votes

Read/write a pipe-delimited file line by line with some simple text manipulation

Here is another way to organize your code. Instead of an if within the loop, use iterators more explicitly. Concretely: ...
Benjamin Kuykendall's user avatar

Only top scored, non community-wiki answers of a minimum length are eligible