Get Unique Values from a Column in Pandas DataFrame in Python

5 Jan 2025

Introduction

Pandas is one of the most powerful data manipulation libraries in Python, and it provides a wide range of functions for working with structured data. When working with DataFrames, you often need only the unique values in a particular column. This article examines several methods for obtaining them.

Understanding Pandas DataFrame

Before getting into the technical details of how to get unique values, let's quickly cover some basic facts about Pandas DataFrames. A DataFrame is a two-dimensional labeled data table with rows and columns. It is designed for data work and is built on top of NumPy.

The examples in this article use a small DataFrame with three columns (Name, Age and City). Printing it gives:

Output:

          Name  Age           City
    0    Alice   25       New York
    1      Bob   30  San Francisco
    2    Alice   25       New York
    3  Charlie   35    Los Angeles
    4      Bob   30  San Francisco

Method 1: Using 'unique()' Method

The Pandas unique() method is an efficient way to get the unique elements of a column. It returns an array containing only the unique values, in the order in which they first appear in the DataFrame.

Output:

    Unique Names: ['Alice' 'Bob' 'Charlie']

Here, unique() is called on the df['Name'] column. The result, unique_names, is an array of the distinct names in order of first appearance, and the print statement displays them.

Method 2: Using 'value_counts()' Method

In addition to giving the unique values, the value_counts() method also counts their occurrences. It is useful when you want to know how many times each unique element occurs in a given column.

Output:

    Name Counts:
    Bob        2
    Alice      2
    Charlie    1
    Name: Name, dtype: int64

Here, value_counts() extracts both the unique names and their counts from the Name column. The result, name_counts, is a Pandas Series giving the frequency of each unique name, sorted from most to least frequent. The order of tied values and the label on the final line may differ between pandas versions.
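The code listings for the steps above are not shown, so here is a minimal sketch of how the sample DataFrame, unique() and value_counts() might be written. The data values are taken from the output shown earlier; the variable name data is an assumption, while df, unique_names and name_counts follow the names used in the text.

```python
import pandas as pd

# Sample data reconstructed from the printed DataFrame above
# (the dictionary name `data` is an assumption for illustration).
data = {
    "Name": ["Alice", "Bob", "Alice", "Charlie", "Bob"],
    "Age": [25, 30, 25, 35, 30],
    "City": ["New York", "San Francisco", "New York",
             "Los Angeles", "San Francisco"],
}
df = pd.DataFrame(data)
print(df)

# Method 1: unique() returns the distinct values of the column
# in order of first appearance.
unique_names = df["Name"].unique()
print("Unique Names:", unique_names)

# Method 2: value_counts() returns a Series that maps each
# distinct value to the number of times it occurs.
name_counts = df["Name"].value_counts()
print("Name Counts:")
print(name_counts)
```

Individual counts can be looked up by label, for example name_counts["Alice"], which avoids relying on the order in which tied values are listed.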
Method 3: Using 'drop_duplicates()' Method

Another way to get unique values is the drop_duplicates() method. Unlike unique(), this method returns a new DataFrame containing no duplicates.

Output:

    DataFrame with Unique Names:
          Name  Age           City
    0    Alice   25       New York
    1      Bob   30  San Francisco
    3  Charlie   35    Los Angeles

Here, drop_duplicates() removes duplicate rows based on the 'Name' column and stores the result in unique_df. The resulting DataFrame keeps only the first occurrence of each unique name, along with its original index label, which gives a clean set of rows.

Method 4: Applying a Set

By definition, a Python set stores only unique elements. Converting a column to a set is therefore an easy way to find all of its distinct values.

Output:

    Unique Cities: {'San Francisco', 'Los Angeles', 'New York'}

This small piece of code turns the 'City' column into a set (unique_cities). Because a set cannot contain repeated elements, the result holds each distinct city name from the DataFrame exactly once. Sets are unordered, so the cities may be printed in a different order from one run to the next.

Method 5: Using 'nunique()' Method

The nunique() method returns the number of unique elements in a column. It is especially useful when you want a count of the unique values without having to enumerate them. By default, missing values (NaN) are not counted.

Output:

    Number of Unique Names: 3

Here, nunique() calculates the number of unique names in the column and returns a single number (num_unique_names), which the print statement displays.

Method 6: Custom Functions for Unique Values

In other cases, you will have to introduce custom logic of your own in order to determine unique values. This can involve a function that filters values according to certain criteria before uniqueness is checked.

Output:

    Unique Names based on Custom Logic: []

A custom function (custom_unique_check) is defined to test each value against a specific criterion, for example that the name has an even number of characters. The function is applied to the 'Name' column using the apply() method, and the result is used to keep only the values that meet the custom condition.
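Methods 3 to 6 could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the original listing: the sample data is rebuilt from the output shown earlier, the name unique_names_custom is hypothetical, and the even-length rule inside custom_unique_check follows the example criterion described in the text.

```python
import pandas as pd

# Same sample data as in the earlier output (reconstructed).
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Charlie", "Bob"],
    "Age": [25, 30, 25, 35, 30],
    "City": ["New York", "San Francisco", "New York",
             "Los Angeles", "San Francisco"],
})

# Method 3: keep the first row for each distinct name.
unique_df = df.drop_duplicates(subset="Name")
print("DataFrame with Unique Names:")
print(unique_df)

# Method 4: a set keeps each distinct value once (unordered).
unique_cities = set(df["City"])
print("Unique Cities:", unique_cities)

# Method 5: count the distinct values without listing them.
num_unique_names = df["Name"].nunique()
print("Number of Unique Names:", num_unique_names)


# Method 6: custom criterion (assumed here: even-length names).
def custom_unique_check(name):
    return len(name) % 2 == 0


# Build a boolean mask with apply(), filter the column with it,
# then take the distinct values that remain.
# `unique_names_custom` is a hypothetical name for illustration.
mask = df["Name"].apply(custom_unique_check)
unique_names_custom = df.loc[mask, "Name"].unique()
print("Unique Names based on Custom Logic:", unique_names_custom)
```

If a stable order is needed for the set result, sorted(unique_cities) returns the cities as an alphabetically sorted list.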
The unique names that meet the criterion are then printed. In this example none of the names has an even number of characters, so the result is empty.

Conclusion

In this guide we went over how to extract unique values from a column in a Pandas DataFrame. Whether your requirements call for built-in methods such as unique(), value_counts() and drop_duplicates(), or you choose to write custom functions, Pandas offers a range of options to suit different needs. Knowing these techniques is essential for data cleansing, preprocessing and analysis work. As you continue working with Pandas DataFrames, these methods will make it easier to break down and extract information from your data.