0

I'm hoping to get some assistance of debugging my script using fuzzywuzzy attribute joins.

I have housing permit data that needs to be joined to parcel data. Unfortunately, the housing permit data is not an exact match with the parcel data (Assign Parcel Number (APN): Housing permit APN = Parcel APN). It may have an extra zero depending the position.

Example Housing permit APN = 1000234 Parcel APN = 100234

So developed a script that would join two tables by fuzzywuzzy method. However I'm running into errors.

ArcGIS Pro Python Script:

    # Import libraries
import arcpy
import pandas as pd
import numpy as np
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# Define paths for input feature classes
table1_path = r'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb\SACOG_2020_Redux_testcase'
table2_path = r'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb\SACOG_Parcel_2023_Initial_Exptable_CLEANED'

# Load data into NumPy arrays and convert to pandas DataFrames
array1 = arcpy.da.FeatureClassToNumPyArray(table1_path, ['FOR_APN_JOIN_PERMITS'])
array2 = arcpy.da.FeatureClassToNumPyArray(table2_path, ['FOR_APN_JOIN_PARCEL'])
df1 = pd.DataFrame(array1)
df2 = pd.DataFrame(array2)
df1['FOR_APN_JOIN_PERMITS'] = df1['FOR_APN_JOIN_PERMITS'].astype(str).str[:101].fillna('')
df2['FOR_APN_JOIN_PARCEL'] = df2['FOR_APN_JOIN_PARCEL'].astype(str).str[:101].fillna('')

# Define fuzzy matching function
def fuzzy_match(x, choices, scorer, cutoff):
    if x is None: return None
    choices_list = choices.dropna().tolist()
    if not choices_list: return None
    result = process.extractOne(x, choices_list, scorer=scorer)
    return result[0] if result and result[1] >= cutoff else None

# Apply fuzzy matching and merge DataFrames
df1['matched_field'] = df1['FOR_APN_JOIN_PERMITS'].apply(lambda x: fuzzy_match(x, df2['FOR_APN_JOIN_PARCEL'], fuzz.ratio, 75))
result_df = df1.merge(df2, left_on='matched_field', right_on='FOR_APN_JOIN_PARCEL', how='left')

# Convert merged DataFrame to structured NumPy array
structured_array = np.array(list(result_df.to_records(index=False)))
print(structured_array.dtype)

# Define temporary and final output table names and paths
temp_table_name = 'TempTable_2020_Joined_charlie'
output_table_name = 'Test_Joined_Table'
output_table_path = r'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb'
temp_table_path = f"{output_table_path}\\{temp_table_name}"

# Create a new table and add fields
arcpy.CreateTable_management(output_table_path, temp_table_name)
arcpy.AddField_management(temp_table_path, 'FOR_APN_JOIN_PERMITS', 'TEXT', field_length=101)

# Populate the new table with data
arcpy.da.NumPyArrayToTable(structured_array, temp_table_path)

print("Done")

Error encountered

Error encountered


RuntimeError Traceback (most recent call last) In [3]: Line 47: arcpy.da.NumPyArrayToTable(structured_array, temp_table_path)

RuntimeError: table 'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb\TempTable_2020_Joined_charlie' Exists

I understand the run time error but I can't figure out the line 47 error.

0

1 Answer 1

0

There is only one error - the RuntimeError which occurs when you call arcpy.da.NumPyArrayToTable on line 47 of your script.

The output table 'I:\Housing\2020 Housing Contacts & Permits\Housing_2020_Redux\Housing_2020_Redux.gdb\TempTable_2020_Joined_charlie' exists and you need to either set arcpy.env.overwriteOutput = True or delete that table.

1
  • Thank you! I suppose my script works but my output is left to be desired. I'm not getting any joins even though some records come close. Housing permit: 123000123 = Parcel: 12300123 Commented Feb 6, 2024 at 18:24

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.