3
\$\begingroup\$

I use the below code to loop through a list of data and search through all the sub-folders in a given folder for files which match my requirements. Right now it takes this code about 22 minutes to run, and after a "quick" time study, I found it takes me about the same amount of time to do this step manually. I'm fairly green when it comes to VBA loops, and I'm sure the reason for the long runtime is all the looping.

I was hoping someone could review the code and find some ways for me to optimize the code to help it run quicker.

Sub loopAllSubFolderSelectStartDirectory()

Dim FSOLibrary As FileSystemObject
Dim FSOFolder As Object
Dim folderName As String
Dim rowNums As Variant

Debug.Print Time
lrow = ThisWorkbook.Sheets("240812").Range("D1048576").End(xlUp).Row
Debug.Print lrow
For i = 3 To lrow
rowNums = i

folderName = "C:\Users\jdesantis\Documents\temp240813\"

Set FSOLibrary = New FileSystemObject

LoopAllSubFolders FSOLibrary.GetFolder(folderName), rowNums
Next i
Debug.Print Time
End Sub

Sub LoopAllSubFolders(FSOFolder As Object, rowNum As Variant)

Dim FSOSubFolder As Object
Dim FSOFile As Object

Dim PartNumber As String
Dim PurchaseOrder As String

PartNumber = Cells(rowNum, 4).Value
PurchaseOrder = Cells(rowNum, 5).Value

For Each FSOSubFolder In FSOFolder.SubFolders
    LoopAllSubFolders FSOSubFolder, rowNum
Next

For Each FSOFile In FSOFolder.Files
    If FSOFile.Path Like "*" & PartNumber & "-*-20##-" & PurchaseOrder & ".xlsx" Then

        Cells(rowNum, 8).Value = "Inspected"
    End If
Next

End Sub

Thanks for the suggestion jkpieterse, I was able to cut the time down to only a few minutes using the below code:

Sub GetFiles()

    Dim oFSO As Object
    Dim oFolder As Object
    Dim oFile As Object, sf
    Dim i As Integer, colFolders As New Collection, ws As Worksheet
    Debug.Print Time
    Application.ScreenUpdating = False
    Set ws = Worksheets("Inspected")
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    
    Set oFolder = oFSO.GetFolder("C:\Users\jdesantis\Documents\temp240813\")
    
    colFolders.Add oFolder          'start with this folder
    
    Do While colFolders.Count > 0      'process all folders
        Set oFolder = colFolders(1)    'get a folder to process
        colFolders.Remove 1            'remove item at index 1
    
        For Each oFile In oFolder.Files
         If oFile.Name Like "######-*-20##-###.xlsx*" Then
            With ws
                'ws.Cells(i + 1, 1) = oFile.Name
                .Cells(i + 1, 1).Value = "'" & Mid(oFile.Name, 1, 6) & Mid(oFile.Name, Len(oFile.Name) - 7, 3)
                i = i + 1
            End With
         End If
        
        Next oFile

        'add any subfolders to the collection for processing
        For Each sf In oFolder.SubFolders
            colFolders.Add sf
        Next sf
    Loop
Debug.Print Time
Application.ScreenUpdating = True
End Sub
```
\$\endgroup\$
3
  • 1
    \$\begingroup\$ Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Feel free to edit and give it a different title if there is something more appropriate. \$\endgroup\$ Commented Aug 13, 2024 at 23:35
  • 4
    \$\begingroup\$ Have you tried to use Data, Get Data, From File, From Folder to get the list of files into Excel directly? You could then use MATCH using wildcards to see if the file you are looking for exists. That way, reading the file list only has to be done once. Which is something you could implement in your VBA code too: make sure it only reads the list of files once rather than n times. \$\endgroup\$ Commented Aug 14, 2024 at 13:17
  • 1
    \$\begingroup\$ This post inspired me to write: Evaluating WinAPI, PowerQuery, PowerShell, and FileSystemObject for File Retrieval Using VBA. \$\endgroup\$ Commented Aug 24, 2024 at 17:03

1 Answer 1

4
\$\begingroup\$

Code Optimizations

To improve performance when making multiple cell updates, set Application.Calculation to xlCalculationManual during the process. For even greater efficiency, store the new cell values in an array and update all the cells in a single operation.

Const folderName As String = "C:\Users\jdesantis\Documents\temp240813\"

Using Const for immutable values clearly indicates that the value is constant and should not change throughout the program. This enhances code clarity and reduces the potential for errors.

Clear Old Values The old data is never cleared. This leads to potential leftover data when the number of files is less than the last time the code ran.

Power Query

Jkpieterse's first recommendation was to "use Data, Get Data, From File, From Folder to get the list of files into Excel directly". He was referring to using Power Query (PQ) to get the list of files. This is the first time I used PQ to get files and WOW! The following code replicates the OP's GetFiles(). It returns the search the folders and subfolders of the target directory, and extracts the part numbers from the files that matches the pattern used in the original code.

Execution time for a folder containing +500k subfolders and files:

GetFiles(): 6 minutes and 11 seconds

AddPartNumberQueryAndTable(): 29 seconds

Refreshing the Query Table: 25 seconds

Sub AddPartNumberQueryAndTable() Debug.Print Time Const QueryName As String = "PartNumberQuery" Const TableName As String = "PartNumberTable" Const RootFolderPath As String = "C:\Users\jdesantis\Documents\temp240813"

    On Error Resume Next
    ActiveWorkbook.Queries(QueryName).Delete
    On Error GoTo 0
        
    ActiveWorkbook.Queries.Add Name:=QueryName, Formula:= _
        "let" & vbCrLf & _
        "    Source = Folder.Files(" & Chr(34) & RootFolderPath & Chr(34) & ")," & vbCrLf & _
        "    #""Removed Columns"" = Table.RemoveColumns(Source, {""Content"", ""Date accessed"", ""Date modified"", ""Date created"", ""Attributes"", ""Folder Path"", ""Extension""})," & vbCrLf & _
        "    #""Split Column by Delimiter"" = Table.SplitColumn(#""Removed Columns"", ""Name"", Splitter.SplitTextByDelimiter(""-"", QuoteStyle.Csv), {""Part1"", ""Part2"", ""Part3"", ""Part4""})," & vbCrLf & _
        "    #""Removed Other Columns"" = Table.SelectColumns(#""Split Column by Delimiter"", {""Part1"", ""Part2"", ""Part3"", ""Part4""})," & vbCrLf & _
        "    #""Removed Columns1"" = Table.RemoveColumns(#""Removed Other Columns"",{""Part2""})," & vbCrLf & _
        "    #""Remove Extension from Part4"" = Table.TransformColumns(#""Removed Columns1"", {{""Part4"", each Text.BeforeDelimiter(_, "".""), type text}})," & vbCrLf & _
        "    #""Filtered Rows"" = Table.SelectRows(#""Remove Extension from Part4"", each " & vbCrLf & _
        "        Text.Length([Part1]) = 6 and " & vbCrLf & _
        "        Text.Length([Part4]) = 3 and " & vbCrLf & _
        "        Text.Select([Part1], {""0""..""9""}) = [Part1] and " & vbCrLf & _
        "        Text.Select([Part4], {""0""..""9""}) = [Part4] and" & vbCrLf & _
        "        Text.StartsWith([Part3], ""20""))," & vbCrLf & _
        "    #""Added Custom Column"" = Table.AddColumn(#""Filtered Rows"", ""Part Numbers"", each [Part1] & [Part4])," & vbCrLf & _
        "    #""Removed Columns2"" = Table.RemoveColumns(#""Added Custom Column"",{""Part1"", ""Part3"", ""Part4""})," & vbCrLf & _
        "    #""Removed Duplicates"" = Table.Distinct(#""Removed Columns2"")" & vbCrLf & _
        "in" & vbCrLf & _
        "    #""Removed Duplicates"""


    Application.ScreenUpdating = False
    ActiveWorkbook.Worksheets.Add
    With ActiveSheet.ListObjects.Add(SourceType:=0, Source:= _
        "OLEDB;Provider=Microsoft.Mashup.OleDb.1;Data Source=$Workbook$;Location=" & Chr(34) & QueryName & Chr(34) & ";Extended Properties=""""" _
        , Destination:=Range("$A$1")).QueryTable
        .CommandType = xlCmdSql
        .CommandText = Array("SELECT * FROM [" & QueryName & "]")
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .PreserveColumnInfo = True
        .ListObject.Name = TableName
        .Refresh BackgroundQuery:=False
    End With
    Debug.Print Time
End Sub
\$\endgroup\$
6
  • \$\begingroup\$ Wow! I totally spaced on what he had been suggesting. Thanks for the code. This blows it out of the water. One caveat to add though, this code needs to be added to a module and not on the worksheet level. Found that out the hard way. \$\endgroup\$ Commented Aug 23, 2024 at 16:06
  • \$\begingroup\$ @J.DeSantis Good to hear that it was useful. Thanks for accepting my answer. \$\endgroup\$ Commented Aug 23, 2024 at 21:00
  • \$\begingroup\$ Off topic but it's just occurred to me that you can use PQ to query SharePoint data, so making such a VBA library to query SharePoint Online would be pretty useful. Otherwise you're stuck trying to authenticate using the REST API which is a nightmare in my experience. \$\endgroup\$ Commented Aug 27, 2024 at 22:14
  • 1
    \$\begingroup\$ @Sancarn cool idea. PQ can also be used to retrieve any the file attributes, which, would be nice features to add to the library. I'm actually play around with the idea of using it as a FileSystemwatcher by leveraging QueryTable.RefreshPeriod. \$\endgroup\$ Commented Aug 27, 2024 at 22:51
  • 1
    \$\begingroup\$ @Sancarn QueryTable exposes BeforeRefresh and BeforeRefresh and AfterRefresh events which is nice but has a minimum 1 minute RefreshPeriod. We could trigger an earlier refresh intervals using Application.Wait but that is messy. \$\endgroup\$ Commented Aug 27, 2024 at 23:59

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.