I'm trying to echo the contents of this link and it exhibits what to me is bizarre behavior.

git@gud:/home/git$ URL="https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv"
git@gud:/home/git$ content=$(wget $URL -q -O -)
git@gud:/home/git$ echo $content
2003,12,31,3,12374_month,day_of_week,births

I expected this code to print the contents as I see them when I open the link in a browser. Instead, the output, in its entirety, is 2003,12,31,3,12374_month,day_of_week,births.

I see this behaviour locally as well, after downloading the file. I tried both curl and copying and pasting the contents into a text editor and saving the file; both produce a file that behaves the same way. The same happens with cat, cut, head, tail and even awk.

This doesn't happen with other files, and the file reads fine in Python. What am I missing? How do I get it to work?

I realize that the file doesn't end with a new line character, but adding it doesn't fix it.

I'm on Ubuntu 18.04.1 LTS and the CLI I'm using is Bash release 4.4.19(1).

  • I have a hunch about what the problem is, but what behavior do you actually see? Commented Dec 31, 2018 at 19:58
  • @DavidZ The output is 2003,12,31,3,12374_month,day_of_week,births, that's it. Nothing more. It's the last line over a part of the first line. Commented Dec 31, 2018 at 20:00
  • Ah, I see. It would be useful to edit the question to mention that what you see is the last line over part of the first line - or alternatively, you could include the first and last couple lines in the question (maybe using ... to indicate that the middle has been omitted). Try to make it so that, if someone is not able to access the file at the link, they can still understand what's going wrong. Commented Dec 31, 2018 at 20:02
  • @DavidZ That's actually the whole output. I hopefully have made this clear now. Thank you. Commented Dec 31, 2018 at 20:05

1 Answer

The data file uses classic Mac-style end-of-line markers (carriage return only). A carriage return moves the cursor back to the start of the current line without advancing to the next one, so when you echo the content, or just cat the file, every line is printed over the previous one. If you were to view the file with less or vim, you would see the complete content.
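One way to confirm this without leaving the terminal is to make the carriage returns visible. The following is a minimal sketch that uses a small made-up sample instead of the real download (the sample contents and the variable names are assumptions for illustration only):

```shell
# Hypothetical three-line sample that uses CR as its only line terminator.
sample=$(mktemp)
printf 'year,month\r1994,1\r2003,12' > "$sample"

# cat -v prints control characters in caret notation, so each CR shows as ^M
# instead of moving the cursor back to the start of the line.
visible=$(cat -v "$sample")
echo "$visible"

rm -f "$sample"
```

Each carriage return should show up as ^M on a single long line, which is what the real file would look like as well.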

Try this:

$ URL="https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv"
$ curl -o data.csv "$URL"

The wc command thinks that the file has zero lines:

$ wc -l data.csv
0 data.csv
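
The zero is expected: wc -l counts newline (line feed) characters, and a file that uses only carriage returns contains none. As a rough sketch, again on a made-up sample and not on the real data file, the two characters can be counted separately:

```shell
# Hypothetical CR-only sample standing in for data.csv.
sample=$(mktemp)
printf 'year,month\r1994,1\r2003,12' > "$sample"

# tr -cd deletes every byte except the one listed; wc -c counts what is left.
# The $(( ... )) wrapper strips any padding that some wc versions add.
cr_count=$(( $(tr -cd '\r' < "$sample" | wc -c) ))
lf_count=$(( $(tr -cd '\n' < "$sample" | wc -c) ))
wc_lines=$(( $(wc -l < "$sample") ))

echo "CR: $cr_count, LF: $lf_count, wc -l: $wc_lines"

rm -f "$sample"
```

For this sample the counts are 2 carriage returns and 0 line feeds, which matches the 0 reported by wc -l.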

Now let's translate those end-of-line markers:

$ tr '\r' '\n' < data.csv > data-modified.csv

wc now sees a more reasonable number of lines:

$ wc -l data-modified.csv
3652 data-modified.csv

And if we were to cat the file:

$ cat data-modified.csv
.
.
.
2003,12,28,7,7645
2003,12,29,1,12823
2003,12,30,2,14438
2003,12,31,3,12374
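
The same translation can be applied directly to the pipeline from the question, without a temporary file. This is only a sketch: the fetch function is a hypothetical stand-in for the wget call so that the example runs offline, and the sample data is made up.

```shell
# Stand-in for `wget "$URL" -q -O -`: emits a CR-only stream like the data file.
fetch() { printf 'year,month\r1994,1\r2003,12'; }

# Translate CR to LF before capturing the output in a variable.
content=$(fetch | tr '\r' '\n')

# Quote the expansion: unquoted, the shell would split the value on newlines
# and echo would join the pieces with single spaces.
echo "$content"

# Count the lines that are now visible to line-oriented tools.
line_count=$(( $(printf '%s\n' "$content" | wc -l) ))
first_line=$(printf '%s\n' "$content" | head -n 1)
```

With the real URL, replacing the body of fetch with the original wget or curl command should give the same result, with head, tail, cut and awk then seeing one record per line.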

3 Comments

Thank you very much. I have a follow-up question (let me know if this is stretching it); hopefully you can help me. When I read the file in Python, I see \n, not \r, so it was easy for me to dismiss the end-of-line character as the cause. See here. Why is this happening? What knowledge am I missing?
Would it? I know that tr will be installed pretty much anywhere. That's less true for mac2unix.
