6

I'm trying to hash each value of a python 3.6 pandas dataframe column with the following algorithm on the dataframe-column ORIG:

HK_ORIG = base64.b64encode(hashlib.sha1(str(df.ORIG).encode("UTF-8")).digest())

However, the above mentioned code does not hash each value of the column, so, in order to hash each value of the df-column ORIG, I need to use the apply function. Unfortunatelly, I don't seem to be good enough to get this done.

I imagine it to look like the following code:

df["HK_ORIG"] = str(df['ORIG']).encode("UTF-8")).apply(hashlib.sha1)

I'm looking very much forward to your answers! Many thanks in advance!

1
  • 2
    define it as a function and then: df['ORIG'].apply(func) Commented Jul 4, 2018 at 18:46

1 Answer 1

6

You can either create a named function and apply it - or apply a lambda function. In either case, do as much processing as possible withing the dataframe.

A lambda-based solution:

df['ORIG'].astype(str).str.encode('UTF-8')\
          .apply(lambda x: base64.b64encode(hashlib.sha1(x).digest()))

A named function solution:

def hashme(x):
    return base64.b64encode(hashlib.sha1(x).digest())
df['ORIG'].astype(str).str.encode('UTF-8')\
          .apply(hashme)
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.