I'm hoping to get some assistance of debugging my script using fuzzywuzzy attribute joins.
I have housing permit data that needs to be joined to parcel data. Unfortunately, the housing permit data is not an exact match with the parcel data (Assign Parcel Number (APN): Housing permit APN = Parcel APN). It may have an extra zero depending the position.
Example Housing permit APN = 1000234 Parcel APN = 100234
So developed a script that would join two tables by fuzzywuzzy method. However I'm running into errors.
ArcGIS Pro Python Script:
# Import libraries
import arcpy
import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
# Define paths for input feature classes
table1_path = r'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb\SACOG_2020_Redux_testcase'
table2_path = r'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb\SACOG_Parcel_2023_Initial_Exptable_CLEANED'
# Load data into NumPy arrays and convert to pandas DataFrames
array1 = arcpy.da.FeatureClassToNumPyArray(table1_path, ['FOR_APN_JOIN_PERMITS'])
array2 = arcpy.da.FeatureClassToNumPyArray(table2_path, ['FOR_APN_JOIN_PARCEL'])
df1 = pd.DataFrame(array1)
df2 = pd.DataFrame(array2)
df1['FOR_APN_JOIN_PERMITS'] = df1['FOR_APN_JOIN_PERMITS'].astype(str).str[:101].fillna('')
df2['FOR_APN_JOIN_PARCEL'] = df2['FOR_APN_JOIN_PARCEL'].astype(str).str[:101].fillna('')
# Define fuzzy matching function
def fuzzy_match(x, choices, scorer, cutoff):
if x is None: return None
choices_list = choices.dropna().tolist()
if not choices_list: return None
result = process.extractOne(x, choices_list, scorer=scorer)
return result[0] if result and result[1] >= cutoff else None
# Apply fuzzy matching and merge DataFrames
df1['matched_field'] = df1['FOR_APN_JOIN_PERMITS'].apply(lambda x: fuzzy_match(x, df2['FOR_APN_JOIN_PARCEL'], fuzz.ratio, 75))
result_df = df1.merge(df2, left_on='matched_field', right_on='FOR_APN_JOIN_PARCEL', how='left')
# Convert merged DataFrame to structured NumPy array
structured_array = np.array(list(result_df.to_records(index=False)))
print(structured_array.dtype)
# Define temporary and final output table names and paths
temp_table_name = 'TempTable_2020_Joined_charlie'
output_table_name = 'Test_Joined_Table'
output_table_path = r'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb'
temp_table_path = f"{output_table_path}\\{temp_table_name}"
# Create a new table and add fields
arcpy.CreateTable_management(output_table_path, temp_table_name)
arcpy.AddField_management(temp_table_path, 'FOR_APN_JOIN_PERMITS', 'TEXT', field_length=101)
# Populate the new table with data
arcpy.da.NumPyArrayToTable(structured_array, temp_table_path)
print("Done")
Error encountered
RuntimeError Traceback (most recent call last) In [3]: Line 47: arcpy.da.NumPyArrayToTable(structured_array, temp_table_path)
RuntimeError: table 'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb\TempTable_2020_Joined_charlie' Exists
I understand the run time error but I can't figure out the line 47 error.
