replaced http://stackoverflow.com/ with https://stackoverflow.com/

edited May 23, 2017 at 11:50

URL Rewriter Bot

Agreeing with Ramon, Pandas is definitely the way to go; its filtering and subsetting capabilities are extraordinary once you get used to them. It can be tough to wrap your head around at first (or at least it was for me!), so I dug up some examples of the subsetting you need from some of my old code. The variable itu below is a Pandas DataFrame with data on various countries over time.

# Subsetting with a boolean (True/False) mask.
# Column labels are case-sensitive; use the spelling your DataFrame actually has.
subset = itu['CntryName'] == 'Albania'  # boolean Series: True where the name is Albania
itu[subset]  # returns 1x144 DataFrame of only data for Albania
itu[itu['CntryName'] == 'Albania']  # one-line command, equivalent to the above two lines

# Pandas has many built-in methods like .isin() for specifying the values to filter on
itu[itu.cntrycode.isin(['USA','FRA'])]  # rows where itu['cntrycode'] is 'USA' or 'FRA'
itu[itu.year.isin([2000,2001,2002])]  # all rows of itu for years 2000-2002 only

# Advanced subsetting can combine conditions with logical operators:
itu[itu.cntrycode.isin(['USA','FRA']) & itu.year.isin([2000,2001,2002])]  # both of the above at the same time

# Use .loc (labels) or .iloc (integer positions) with two elements to
# select by row/index & column simultaneously:
itu.loc['USA','CntryName']  # assumes the country codes are the index
itu.iloc[204,0]
itu.loc[['USA','BHS'], ['CntryName', 'Year']]
itu.iloc[[204, 13], [0, 1]]

# Can do many operations at once, but this reduces "readability" of the code
itu[itu.cntrycode.isin(['USA','FRA']) &
    itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','cntryname','year','mpen','fpen']]

# Finally, if you're comfortable with using map() and lambda functions,
# you can do some advanced subsetting that includes evaluations & functions
# to determine what elements you want to select from the whole, such as all
# countries whose name begins with "United":
criterion = itu['CntryName'].map(lambda x: x.startswith('United'))
itu.loc[criterion, 'CntryName']  # gives us UAE, UK, & US
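
If you want to try these patterns without the original data, here is a minimal, self-contained sketch. The toy `itu` frame below is a hypothetical stand-in I made up for illustration (the real ITU data and its full column set are not shown here), and the variable names are my own. It also uses the vectorized `.str.startswith()` accessor as an alternative to `map()` with a lambda.

```python
import pandas as pd

# Hypothetical stand-in for `itu`; the rows and columns are invented for illustration.
itu = pd.DataFrame(
    {
        "cntrycode": ["USA", "USA", "FRA", "FRA", "ARE", "GBR", "ALB"],
        "CntryName": [
            "United States", "United States", "France", "France",
            "United Arab Emirates", "United Kingdom", "Albania",
        ],
        "year": [2000, 2005, 2000, 2001, 2000, 2001, 2002],
    }
)

# Boolean mask: keep only the rows where the comparison is True.
albania = itu[itu["CntryName"] == "Albania"]

# Combine two masks with & (element-wise AND). Python's `and` does not work here.
usa_fra_early = itu[
    itu["cntrycode"].isin(["USA", "FRA"]) & itu["year"].isin([2000, 2001, 2002])
]

# Label-based selection needs the labels in the index, so set it first.
by_code = itu.set_index("cntrycode")
alb_name = by_code.loc["ALB", "CntryName"]  # single label + single column
selected = by_code.loc[["USA", "FRA"], ["CntryName", "year"]]  # lists of labels

# Vectorized string test instead of map() with a lambda.
starts_united = itu["CntryName"].str.startswith("United")
united = sorted(itu.loc[starts_united, "CntryName"].unique())

print(albania)
print(usa_fra_early)
print(alb_name)
print(selected)
print(united)
```

Note that `.loc` with a single row label only returns a scalar when that label is unique in the index; with repeated labels (as with `'USA'` above) you get a Series back instead.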

answered Oct 7, 2014 at 18:10

TCAllen07
