I have a pandas.DataFrame df and would like to add a new column col with one single value "hello". I would like this column to be of dtype category with the single category "hello". I can do the following.
df["col"] = "hello"
df["col"] = df["col"].astype("category")
- Do I really need to write df["col"] three times in order to achieve this?
- After the first line I am worried that the intermediate dataframe df might take up a lot of space before the new column is converted to categorical. (The dataframe is rather large with millions of rows and the value "hello" is actually a much longer string.)
Are there any other straightforward, "short and snappy" ways of achieving this while avoiding the above issues?
An alternative solution is
df["col"] = pd.Categorical(itertools.repeat("hello", len(df)))
but it requires itertools and len(df), and I am not sure how it behaves in terms of memory usage under the hood.
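One possible way around both concerns, offered as a sketch rather than a definitive answer, is to build the categorical directly from integer codes with pd.Categorical.from_codes, so that no full column of strings is materialized first. The small dataframe below is a hypothetical stand-in for the large one, and the int8 code dtype is an assumption chosen because a single category only ever needs code 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the large dataframe from the question.
df = pd.DataFrame({"a": range(5)})

# One small integer code per row; every code is 0, i.e. the first category.
codes = np.zeros(len(df), dtype="int8")

# Build the categorical column from codes plus the single category label.
df["col"] = pd.Categorical.from_codes(codes, categories=["hello"])
```

This still uses len(df), but df["col"] is written only once and the long string is stored a single time as the category label.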
Using df.assign() and then passing a dict to df.astype() would be a scalable way to go. The first creates the new columns and the second changes their dtypes, chained step by step in a single line of code. Check my answer for detailed examples.
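A minimal sketch of that chained approach, assuming a small hypothetical dataframe in place of the real one:

```python
import pandas as pd

# Hypothetical stand-in for the large dataframe from the question.
df = pd.DataFrame({"a": range(5)})

# assign() adds the constant column; astype() with a dict converts only
# the listed columns, leaving the others untouched.
df = df.assign(col="hello").astype({"col": "category"})
```

Note that assign() returns a new dataframe and the plain string column still exists before astype() runs, so this shortens the code but probably does not address the memory concern by itself.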