60 votes - Python web-scraper to download table of transistor counts from Wikipedia
Is there a way to simplify this code?
Yes. Don't scrape Wikipedia. Before you ask "how do I scrape this thing?", your first thought should be "Is there an API that can give me the data I want?" In this ...
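The excerpt is cut off here, but the point generalizes: Wikipedia exposes its content through the MediaWiki Action API. A minimal sketch of that route, assuming the requests library and the "Transistor count" page; the wikitable parsing step is left out:

import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(page):
    # Ask the official API for the page's wikitext instead of scraping the rendered HTML.
    params = {
        "action": "parse",
        "page": page,
        "prop": "wikitext",
        "format": "json",
        "formatversion": 2,
    }
    response = requests.get(API, params=params, timeout=10)
    response.raise_for_status()
    return response.json()["parse"]["wikitext"]

# The transistor-count tables are ordinary wikitables inside this text and can
# be parsed with a wikitext library of your choice.
text = fetch_wikitext("Transistor count")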
51 votes - Python web-scraper to download table of transistor counts from Wikipedia
Let me tell you about IMPORTHTML()...
So here's all the code you need in Google Sheets:
...
42 votes - Web scraping the titles and descriptions of trending YouTube videos
Why web-scrape, when you can get the data properly through the YouTube Data API, requesting the mostPopular list of videos? If you make a ...
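For context, the truncated suggestion maps onto the videos.list endpoint of the YouTube Data API v3 with chart=mostPopular. A small sketch, assuming the requests library; the API key and region code are placeholders:

import requests

def trending_videos(api_key, region="US", max_results=10):
    # videos.list with chart=mostPopular returns the region's trending videos.
    params = {
        "part": "snippet",
        "chart": "mostPopular",
        "regionCode": region,
        "maxResults": max_results,
        "key": api_key,
    }
    response = requests.get("https://www.googleapis.com/youtube/v3/videos",
                            params=params, timeout=10)
    response.raise_for_status()
    return [(item["snippet"]["title"], item["snippet"]["description"])
            for item in response.json()["items"]]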
18 votes (Accepted) - Parsing RFC 4180 CSV with GOTOs
1) I would save constants (doubleQuote, etc.) as fields, so they don't take up extra space in an already fairly large method body.
2) I think your use of ...
18 votes (Accepted) - System that manages employee data for managers
Welcome to Code Review! Kudos for writing a fairly large program.
Several things pop out from your program, but a few things first. If you are using any intelligent editor, please see if you can get a ...
17 votes - Sensor logger for Raspberry Pi in a stratospheric probe
Do not call main recursively. You are setting yourself up for a stack overflow. Consider instead
...
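The suggested replacement is cut off; presumably it is a plain loop. A minimal sketch of that shape, with a placeholder log_reading() standing in for the actual sensor code:

import time

def log_reading():
    pass  # read the sensors and append a row to the log file

def main(interval_seconds=1.0):
    # Loop forever instead of main() calling itself after every reading,
    # which would eventually exhaust the call stack.
    while True:
        log_reading()
        time.sleep(interval_seconds)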
16 votes - Web scraping the titles and descriptions of trending YouTube videos
Context manager
You open a file at the beginning of the program and close it explicitly at the end.
Python provides a nice way to allocate and release resources (such as files) easily: they are ...
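A tiny illustration of the pattern being described, with a made-up file name and payload:

rows = ["title,description", "Some video,Some description"]

# The with statement closes the file automatically, even if an exception
# is raised inside the block, so no explicit close() call is needed.
with open("videos.csv", "w", encoding="utf-8") as f:
    f.write("\n".join(rows))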
16 votes - Python web-scraper to download table of transistor counts from Wikipedia
Make sure to follow naming conventions. You name two variables inappropriately:
...
14 votes - System that manages employee data for managers
Don't do wildcard imports
Use import tkinter as tk and then prefix all tk classes and commands with tk. (e.g. ...
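A short sketch of what the namespaced import looks like in practice; the widgets are just examples:

import tkinter as tk

root = tk.Tk()
# The tk. prefix makes it obvious where Label, Button, etc. come from.
label = tk.Label(root, text="Employee data")
label.pack()
root.mainloop()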
14 votes - Read CSV and use bidirectional BFS to find the shortest connections between actors
It looks like you found a pretty nice approach there already.
A few general things stand out:
Using globals
Magical constants and unsafe C-style arrays/raw pointers:
...
13 votes (Accepted) - Aggregate prescriptions per drug in a CSV file
Performance
Your performance problem is primarily due to the duplicate_id() function, namely the if id in id_list test. ...
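The excerpt stops before the fix, but testing membership in a list is O(n) per call; the usual remedy is a set. A hedged sketch, where only the function name is taken from the excerpt:

seen_ids = set()

def duplicate_id(record_id):
    # Set membership is O(1) on average, unlike scanning a list each time.
    if record_id in seen_ids:
        return True
    seen_ids.add(record_id)
    return False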
12 votes - Parsing RFC 4180 CSV with GOTOs
The biggest problem with gotos is that they make it really hard to refactor code.
Your loop body alone is 66 lines of code long and has a nesting depth of 5. Length of code and nesting depth ...
12 votes - Are there ways to speed up this string fuzzy matching in Golang?
Review I - package go-fuzzywuzzy
After a quick read of your Go code, the Go code for package go-fuzzywuzzy, and the Go ...
11 votes (Accepted) - CSV file reader in PHP that supports large files (>15k lines)
Performance
As performance is your main concern, let's address this first. To process the example CSV file with ~36k lines, your original script needs around 139 s*.
The main bottlenecks are ...
11 votes - Web scraping the titles and descriptions of trending YouTube videos
I'd definitely look into using an API directly as @200_success suggested to avoid any web-scraping or HTML parsing, but here are some additional suggestions to improve your current code focused mostly ...
11 votes (Accepted) - Sensor logger for Raspberry Pi in a stratospheric probe
Have you already executed the code to see how it performs and if the battery will last? There is that famous Donald Knuth quote saying premature optimization is the root of all evil (or at least most ...
10 votes (Accepted) - Searching for data from file1 in file2
There are things to improve:
when you use the with context manager, you don't need to close the file explicitly
you can combine two context managers into one, reducing ...
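What the combined form looks like, with placeholder file names and a made-up lookup:

with open("file1.txt", encoding="utf-8") as needles, \
     open("file2.txt", encoding="utf-8") as haystack:
    wanted = {line.strip() for line in needles}
    matches = [line for line in haystack if line.strip() in wanted]
# Both files are closed automatically when the block exits.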
10 votes (Accepted) - Reducing the amount of duplicated code (python) - cricket matches
Disclaimer: I know nothing whatsoever about the rules of cricket.
Useless check
Because either info['toss_decision'] equals ...
9 votes - Parse one table out of CSV with multiple titled tables with the Python CSV module
In general this code looks very nice and seems to follow style guidelines well. I found your code easy to follow. You asked for:
... suggestions to make the code more compact, idiomatic, and fit for ...
9 votes (Accepted) - Transform dict of lists to CSV
As per PEP 8, the standard indentation for Python code is 4 spaces. Since whitespace is significant in Python, this is a pretty strong convention that you should follow.
The code itself isn't bad, ...
9 votes (Accepted) - Calculate average values for each day of the week for each Meter
Please note: the code has been reviewed in two parts.
Some modifications to the first part can be found under update #1.
As I suggested in the comments, you can take advantage of CsvHelper to ...
9 votes - adding data to a CSV file for it to be read
DRY
You should store the long file path in a variable instead of repeating it multiple times:
...
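A tiny illustration of the DRY point, with a placeholder path:

import csv

DATA_PATH = "data/scores.csv"  # named once, reused everywhere

def append_row(row):
    with open(DATA_PATH, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(row)

def read_rows():
    with open(DATA_PATH, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))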
9 votes (Accepted) - Very simple CSV-parser in Java
Side-effects
The Stream.forEach() operation should be used with care, since it operates via side effects and should not be used as a substitute for a proper ...
8 votes (Accepted) - Convert all CSV files in a given directory to JSON using Python
Looks good to me. It's a perfectly sensible approach.
There's just one line I'm going to criticize:
csvfile = os.path.splitext(filename)[0]
Picking off ...
8 votes (Accepted) - Compute mean, variance and standard deviation of CSV number file
Portability
#import is a GCC extension (or perhaps a preview of C++20).
There's no good reason not to simply ...
8 votes - String Similarity using fuzzywuzzy on big data
The first algorithmic recommendation is to use itertools.combinations instead of .permutations, since you don't care about order....
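A small sketch of the difference, assuming the fuzzywuzzy package and an invented list of names; since fuzz.ratio(a, b) equals fuzz.ratio(b, a), each unordered pair needs scoring only once:

from itertools import combinations
from fuzzywuzzy import fuzz

names = ["ACME Corp", "ACME Corporation", "Acme Inc"]

# combinations() yields each unordered pair once, roughly halving the work
# that permutations() would do for a symmetric similarity score.
scores = {(a, b): fuzz.ratio(a, b) for a, b in combinations(names, 2)}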
8 votes - Aggregate prescriptions per drug in a CSV file
If your dataset is large, but not larger than memory, you might want to consider using pandas for this:
...
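The rest of the snippet is cut off; a hedged sketch of the pandas route, with assumed column names:

import pandas as pd

df = pd.read_csv("prescriptions.csv")

# Group once per drug and aggregate; the column names are assumptions.
summary = (df.groupby("drug_name")["cost"]
             .agg(["count", "sum"])
             .rename(columns={"count": "prescriptions", "sum": "total_cost"}))
summary.to_csv("per_drug_totals.csv")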
8 votes - Sensor logger for Raspberry Pi in a stratospheric probe
Opening and closing files takes resources:
with open('babar.txt', 'a') as f: f.write('a'*10000)
takes 300 micro-seconds while:
...
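The faster variant in the truncated comparison presumably keeps the file open across writes; a sketch of that idea with the same made-up file name:

# Open once, write many times: the handle is reused instead of being
# reopened (and re-closed) for every single write.
with open("babar.txt", "a") as f:
    for _ in range(1000):
        f.write("a" * 10000)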
8 votes - Read/write a pipe-delimited file line by line with some simple text manipulation
Here is another way to organize your code. Instead of an if within the loop, use iterators more explicitly. Concretely:
...
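The concrete code is cut off; one common shape of "use iterators more explicitly" is to consume the header with next() and then loop over the rest, instead of testing a flag inside the loop. The file names and the transformation below are placeholders:

with open("input.psv", encoding="utf-8") as src, \
     open("output.psv", "w", encoding="utf-8") as dst:
    lines = iter(src)
    header = next(lines)        # handle the header exactly once, outside the loop
    dst.write(header)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        fields[0] = fields[0].upper()   # some simple text manipulation
        dst.write("|".join(fields) + "\n")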