
I have a large CSV file with about 10,000 rows of text, and each row of my dataset consists of several lines. However, I only want to read, say, the first 20 lines of each row.

I came across the nrows parameter of pandas' read_csv method, which limits the number of rows of the dataset that are loaded. Is there also a way to read only the first 20 lines of data from each row in pandas?

  • Can you post the example dataset? Commented Jun 3, 2020 at 8:24
  • Doesn't a newline terminate a "row" in a CSV file? How can a row consist of multiple lines? Commented Jun 3, 2020 at 8:33
  • Thank you. The solution provided by Arun works for me, and I will edit my question. Commented Jun 3, 2020 at 9:40

3 Answers


You can read in the CSV with df = pd.read_csv(r'path\file.csv') (a raw string, so that \f is not interpreted as an escape sequence) and then select the first 20 rows with df_new = df.head(20). Is that what you were thinking of?


If I get your question correctly, your CSV file has multiple rows, where each row has multiple lines separated by the newline character '\n'. And you want to choose the first (say, for example) 3 lines from each row.

This can be achieved as:

import pandas as pd

# Read in CSV file using pandas-
data = pd.read_csv("example.csv")

# The first two rows (toy example) of dataset are-
data.iloc[0,0]
# 'Hello, this is first line\nAnd this is the second line\nThird and final line'

data.iloc[1,0]
# 'Today is 3rd June\nThe year is 2020\nSummer weather'

# First row's first line-
data.iloc[0,0].split("\n")[0]
# 'Hello, this is first line'

# First row's first two lines-
data.iloc[0,0].split("\n")[0:2]
# ['Hello, this is first line', 'And this is the second line']

The general syntax to get the first 'n' lines from row 'x' (assuming that the first column has the string data) is:

data.iloc[x,0].split("\n")[:n]

To select the first 'm' lines from each of the first 'x' rows (a row with fewer than m lines is returned whole, since slicing past the end of a list is not an error), use the code:

data.iloc[:x, 0].apply(lambda y: y.split("\n")[0:m])
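The same idea can be applied to every row, with the result stored in a new DataFrame. The following is only a minimal sketch: it assumes the text sits in the first column, builds a small in-memory CSV as a stand-in for example.csv, and the names csv_text, first_lines, and result are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for example.csv: one text column whose quoted
# cells contain embedded newlines.
csv_text = (
    "text\n"
    '"Hello, this is first line\nAnd this is the second line\nThird and final line"\n'
    '"Today is 3rd June\nThe year is 2020\nSummer weather"\n'
)
data = pd.read_csv(io.StringIO(csv_text))


def first_lines(cell, n):
    """Return the first n lines of a multi-line string, rejoined with newlines."""
    return "\n".join(str(cell).split("\n")[:n])


n = 2  # number of lines to keep per row

# Apply to every row of the first column and keep the result as a new DataFrame.
result = data.iloc[:, 0].apply(lambda cell: first_lines(cell, n)).to_frame(name="first_lines")
```

Here each cell is rejoined into a single string; drop the "\n".join(...) if a list of lines per row is preferred. Note that str(cell) would turn a missing value into the text "nan", so empty cells may need separate handling.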

Does this help?

3 Comments

Thanks arun. This is exactly what I was looking for. One other thing: how can I apply your general syntax to an entire dataframe? For example, I have 50 rows of data, and with your syntax I can get the first 'n' lines of a specific row 'x'. But what if I want to do it for all rows and store the results in a new dataframe?
@dmorgan I have added code to answer your question. If it answers your question, can you mark it as helpful?
Thanks arun. Will do

If TiTo's answer is not what you are looking for, maybe the iloc indexer is. You can store the first 20 rows by doing firstRows = df.iloc[:20].

However, if you only ever need the first 20 rows, you shouldn't load the whole file into memory. As you mentioned, this can be achieved with the nrows parameter.
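As a rough illustration of the nrows approach, the sketch below parses only the first 20 data rows. The in-memory CSV and the names csv_text and first_rows are assumptions for the example; with a real file you would pass its path instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV with 100 data rows, standing in for a large file.
csv_text = "id,text\n" + "".join(f"{i},row {i}\n" for i in range(100))

# nrows limits how many data rows are parsed; the header line is not counted.
first_rows = pd.read_csv(io.StringIO(csv_text), nrows=20)
```

This limits the number of rows loaded, not the number of lines inside each row, so it complements rather than replaces the split-based approach.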
