
1.5.1 patch version changes #1606

Merged
val-ms merged 14 commits into
Cisco-Talos:dev/1.5.1 from
val-ms:1.5.1-prep
Oct 15, 2025


@val-ms val-ms commented Oct 14, 2025

This pull request:

  1. Updates ClamAV to version 1.5.1 and updates the functionality level macro in libclamav/others.h to match the new release.
  2. Backports critical fixes from "Several fixes: ZIP alert-exceeds-max, OOXML properties, PE scan and other performance issues" (#1599) for the 1.5.1 version.
  3. Updates NEWS.md with 1.5.1 release notes.
val-ms and others added 11 commits October 14, 2025 13:06
Update the FunctionalityLevels enum, correcting the 220/230
inconsistency.

Add a section to NEWS.md for 1.5.1.

When scanning CL_TYPE_MSEXE files that have embedded file type signature
matches for CL_TYPE_MSEXE, the contained file incorrectly passes the PE
header check, resulting in excessive scan times.

The problem is that the `peinfo` struct needs to have the `offset` set
for the contained `CL_TYPE_MSEXE` match prior to the header check.
Without that, the header check was actually validating the PE header of
the original file, which would always pass when that's a PE, and would
always fail if it's an OLE2 file (the other type which we check for
contained PEs).
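
To illustrate why the offset matters, here is a minimal sketch, not the libclamav code. `looks_like_pe_at()` is a hypothetical stand-in that only checks for the "MZ" signature, whereas the real header check validates much more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the PE header check:
 * returns 1 when an "MZ" DOS signature sits at `offset` within `buf`. */
static int looks_like_pe_at(const uint8_t *buf, size_t len, size_t offset)
{
    assert(buf != NULL); /* same spirit as requiring a non-NULL ctx */
    if (offset > len || len - offset < 2) {
        return 0; /* not enough bytes left for a signature */
    }
    return buf[offset] == 'M' && buf[offset + 1] == 'Z';
}
```

If the caller always passes offset 0, the check validates the outer file, so every embedded match inside a PE appears valid. Passing the offset of the embedded match lets false positives be rejected.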

The additional code change in this commit requires that the `ctx`
parameter never be NULL and removes the `map` parameter because,
in practice, that is always `ctx->fmap`. This safeguards
against future changes to the function that might accidentally use `ctx`
without a proper NULL check.

CLAM-2882

The function which indexes a ZIP central directory is not advancing
to the next central directory record, thus exceeding the max-files scan
limit for many ZIPs.
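
A minimal sketch of walking central directory records, assuming the standard ZIP layout (46-byte fixed header, with the name, extra field, and comment lengths at byte offsets 28, 30, and 32). `count_cd_records()` is a hypothetical helper, not the function changed in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CDH_SIGNATURE 0x02014b50u /* bytes "PK\1\2" read as little-endian */
#define CDH_FIXED_SIZE 46u

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Counts central directory records in buf[0..len), stopping at max_files. */
static size_t count_cd_records(const uint8_t *buf, size_t len, size_t max_files)
{
    size_t pos = 0, count = 0;

    assert(buf != NULL);
    while (count < max_files && len - pos >= CDH_FIXED_SIZE &&
           read_le32(buf + pos) == CDH_SIGNATURE) {
        size_t record_size = (size_t)CDH_FIXED_SIZE
                           + read_le16(buf + pos + 28)  /* file name length */
                           + read_le16(buf + pos + 30)  /* extra field length */
                           + read_le16(buf + pos + 32); /* file comment length */
        if (record_size > len - pos) {
            break; /* truncated record */
        }
        count++;
        pos += record_size; /* advance to the next record */
    }
    return count;
}
```

Without the `pos += record_size` step, the loop would keep re-reading the first record until it hit `max_files`, which matches the symptom described above.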

CLAM-2884

If csize (and usize) are 0, like with a directory or other empty file
entry, then the functionality that records file record information when
indexing the central directory and each associated file record will
neglect to store the `local_header_offset` or `local_header_size`.
That causes problems later, after sorting the file records, when
checking for overlapping files.
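
A minimal sketch of a sort-then-check-overlap pass, to show why every record needs its offset and header size recorded, including empty ones. The struct and helper names are hypothetical; only the two field names come from the message above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; not the libclamav structure. */
struct zip_record {
    uint32_t local_header_offset;
    uint32_t local_header_size;
    uint32_t csize; /* 0 for directories and other empty entries */
};

static int cmp_by_offset(const void *a, const void *b)
{
    const struct zip_record *ra = a, *rb = b;
    return (ra->local_header_offset > rb->local_header_offset) -
           (ra->local_header_offset < rb->local_header_offset);
}

/* Sorts records by offset; returns 1 if any neighbouring pair overlaps. */
static int has_overlap(struct zip_record *recs, size_t n)
{
    if (n < 2) {
        return 0;
    }
    assert(recs != NULL);
    qsort(recs, n, sizeof(*recs), cmp_by_offset);
    for (size_t i = 1; i < n; i++) {
        uint64_t prev_end = (uint64_t)recs[i - 1].local_header_offset +
                            recs[i - 1].local_header_size + recs[i - 1].csize;
        if (recs[i].local_header_offset < prev_end) {
            return 1;
        }
    }
    return 0;
}
```

An empty entry still occupies its local header, so its offset and header size take part in the comparison even though `csize` is 0.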

CLAM-2884

Uncompressed ZIP-based TNEF message attachments, like OOXML office
document attachments, get double-extracted because of embedded file type
recognition.

To prevent excessive scan times, disable embedded file type recognition
for TNEF files and rely on TNEF parsing to extract attachments.

CLAM-2885

The ZIP single record search feature is used to find specific files when
parsing OOXML documents. I observed that the core properties for a
PowerPoint file were missing in a test as compared with the previous
release.

The unzip search returns CL_VIRUS when there is a match, not CL_SUCCESS!
The error handling check after the search needs to account for that.
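
A minimal sketch of that calling convention. The status names mirror the ones mentioned above but are defined locally with illustrative values, and `search_names()` and `found_entry()` are hypothetical helpers, not the libclamav unzip API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins; names follow the commit message, values are illustrative. */
typedef enum { CL_SUCCESS = 0, CL_VIRUS = 1 } sketch_status_t;

/* Hypothetical search: reports CL_VIRUS when `wanted` is among `names`. */
static sketch_status_t search_names(const char *const *names, size_t n,
                                    const char *wanted)
{
    assert(names != NULL && wanted != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(names[i], wanted) == 0) {
            return CL_VIRUS; /* here this means "found" */
        }
    }
    return CL_SUCCESS; /* searched everything, no match */
}

/* Caller-side check: a hit is signalled by CL_VIRUS, not CL_SUCCESS. */
static int found_entry(const char *const *names, size_t n, const char *wanted)
{
    return search_names(names, n, wanted) == CL_VIRUS;
}
```

A caller that tests for `CL_SUCCESS` instead would treat every hit as a miss, which is consistent with the missing core properties described above.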

CLAM-2886

Previously for documents containing VBA projects, the VBA was treated
as an object within the document and not as a normalized version of
the document. I apparently switched it to say that the VBA is a normalized
version of the document. This kind of makes sense in that presently
JavaScript extracted from HTML is treated as a normalized version of the
HTML. But it probably shouldn't.

Normalized layers are treated as the same file as the parent.
So now those older signatures that match on VBA projects using
"Container:CL_TYPE_MSOLE2" are failing to match.

So this commit switches it back. VBA project bits written out to a temp
file for scanning will be treated as being contained within the document.

CLAM-2896

Extracted XLM macros had the same issue.

In regression testing against a large sample set, I found that strictly
disallowing any embedded file identification if any previous layer was
an embedded file resulted in missed detections.

Specifically, I found an MSEXE file which has an embedded RAR, which in
turn had another MSEXE that itself had an embedded 7ZIP containing... malware.
sha256: c3cf573fd3d1568348506bf6dd6152d99368a7dc19037d135d5903bc1958ea85

To make it so ClamAV can extract all that, we must loosen the
restriction and allow prior layers to be embedded, just not the current
layer.

I've also added some logic to prevent attempting to extract an object at
the same offset twice. The `fpt->offset`s appear in order, but if you
have multiple file type magic signatures match on the same address, like
maybe you accidentally load two different .ftm files, then you'd get
duplicates and double-extraction.

As a bonus, I found I could also skip over offsets within a valid ZIP,
ARJ, or CAB, since we now have the length of those from the header check
and we don't want to extract embedded contents in that way.
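
A minimal sketch of the two skip rules just described, assuming matches arrive sorted by offset. `struct type_match` and `select_matches()` are hypothetical and are not the structures used in libclamav.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical match record: where a file type signature hit, plus the
 * archive length learned from a header check (0 when unknown). */
struct type_match {
    size_t offset;
    size_t valid_len;
};

/* Walks matches sorted by offset and writes the indexes worth extracting
 * to `out`, which must have room for n entries. Returns how many were kept. */
static size_t select_matches(const struct type_match *m, size_t n, size_t *out)
{
    size_t kept = 0, skip_until = 0, last_offset = 0;
    int have_last = 0;

    assert(m != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (have_last && m[i].offset == last_offset) {
            continue; /* second signature match at the same offset */
        }
        if (m[i].offset < skip_until) {
            continue; /* inside an archive that already validated */
        }
        out[kept++] = i;
        last_offset = m[i].offset;
        have_last = 1;
        if (m[i].valid_len > 0) {
            skip_until = m[i].offset + m[i].valid_len;
        }
    }
    return kept;
}
```

For matches at offsets 100, 100, 200 (a 50-byte archive), 220, and 250, this keeps the first, third, and fifth.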
I am seeing missed detections since we changed to prohibit embedded
file type identification when inside an embedded file.
In particular, I'm seeing this issue with PE files that contain multiple
other MSEXE as well as a variety of false positives for PE file headers.

For example, imagine a PE with four concatenated DLL's, like so:
```
  [ EXE file   | DLL #1  | DLL #2  | DLL #3  | DLL #4 ]
```

And note that false positives for embedded MSEXE files are fairly common.
So there may be a few mixed in there.

Before limiting embedded file identification we might interpret the file
structure something like this:
```
MSEXE: {
  embedded MSEXE #1: false positive,
  embedded MSEXE #2: false positive,
  embedded MSEXE #3: false positive,
  embedded MSEXE #4: DLL #1: {
    embedded MSEXE #1: false positive,
    embedded MSEXE #2: DLL #2: {
      embedded MSEXE #1: DLL #3: {
        embedded MSEXE #1: false positive,
        embedded MSEXE #2: false positive,
        embedded MSEXE #3: false positive,
        embedded MSEXE #4: false positive,
        embedded MSEXE #5: DLL #4
      }
      embedded MSEXE #2: false positive,
      embedded MSEXE #3: false positive,
      embedded MSEXE #4: false positive,
      embedded MSEXE #5: false positive,
      embedded MSEXE #6: DLL #4
    }
    embedded MSEXE #3: DLL #3,
    embedded MSEXE #4: false positive,
    embedded MSEXE #5: false positive,
    embedded MSEXE #6: false positive,
    embedded MSEXE #7: false positive,
    embedded MSEXE #8: DLL #4
  }
}
```

This is obviously terrible, which is why we don't allow detecting
embedded files within other embedded files.
So after we enforce that limit, the same file may be interpreted like
this instead:
```
MSEXE: {
  embedded MSEXE #1:  false positive,
  embedded MSEXE #2:  false positive,
  embedded MSEXE #3:  false positive,
  embedded MSEXE #4:  DLL #1,
  embedded MSEXE #5:  false positive,
  embedded MSEXE #6:  DLL #2,
  embedded MSEXE #7:  DLL #3,
  embedded MSEXE #8:  false positive,
  embedded MSEXE #9:  false positive,
  embedded MSEXE #10: false positive,
  embedded MSEXE #11: false positive,
  embedded MSEXE #12: DLL #4
}
```

That's great! Except that we now exceed the "MAX_EMBEDDED_OBJ" limit
for embedded type matches (limit 10, but 12 found). That means we won't
see or extract the 4th DLL anymore.

My solution is to lift the limit when adding a matched MSEXE type.
We already do this for matched ZIPSFX types.
While doing this, I've significantly tidied up the limits checks to
make them more readable, and removed duplicate checks from within the
`ac_addtype()` function.
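
A minimal sketch of a limit check with that exemption. The type codes are local stand-ins rather than the libclamav enum, `may_add_match()` and `count_recorded()` are hypothetical, and counting exempt matches toward the total is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EMBEDDED_OBJ 10 /* limit quoted in the message above */

/* Local stand-ins for file type codes; not the real libclamav values. */
typedef enum { TYPE_MSEXE, TYPE_ZIPSFX, TYPE_RARSFX } sketch_type_t;

/* Returns 1 when a new match of `type` may still be recorded. */
static int may_add_match(sketch_type_t type, size_t already_recorded)
{
    if (type == TYPE_MSEXE || type == TYPE_ZIPSFX) {
        return 1; /* exempt from the embedded object limit */
    }
    return already_recorded < MAX_EMBEDDED_OBJ;
}

/* Counts how many of the given matches end up recorded. */
static size_t count_recorded(const sketch_type_t *types, size_t n)
{
    size_t recorded = 0;

    assert(types != NULL);
    for (size_t i = 0; i < n; i++) {
        if (may_add_match(types[i], recorded)) {
            recorded++;
        }
    }
    return recorded;
}
```

With twelve MSEXE matches, as in the example above, all twelve are recorded, so the fourth DLL is no longer dropped. Twelve matches of a non-exempt type would still stop at ten.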

CLAM-2897

Mismatched declaration and definition.

Large range testing identified some files where image fuzzy hashing
produces different hashes with ClamAV 1.5 vs 1.4.

With my investigation, I found the issue is with changes in Rust library
dependencies, though it actually wasn't any change with the 'image' or
'jpeg-decoder' crates. After running a simple `cargo update` to update
all non-pinned versions, I confirmed that this does not affect the
minimum supported Rust version (MSRV).

CLAM-2899

If the current layer has a file descriptor, ClamAV is passing the path
for that file to the UnRAR module, even if the RAR we want to scan is
just some small embedded bit (e.g. detected by a RARSFX signature).

We need to drop the RAR portion to a new file for the UnRAR module
because it does not accept file buffers to be scanned, only file paths.
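
A minimal sketch of dropping one region of a buffer to its own file so that a path-only consumer can open it. `dump_region()` is a hypothetical helper built on plain stdio; the actual change presumably uses ClamAV's own temp file handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes buf[offset .. offset+size) to `path`.
 * Returns 0 on success, -1 on a bad range or an I/O error. */
static int dump_region(const unsigned char *buf, size_t len,
                       size_t offset, size_t size, const char *path)
{
    FILE *out;
    int ok;

    assert(buf != NULL && path != NULL);
    if (offset > len || size > len - offset) {
        return -1; /* region does not fit inside the buffer */
    }
    out = fopen(path, "wb");
    if (out == NULL) {
        return -1;
    }
    ok = fwrite(buf + offset, 1, size, out) == size;
    if (fclose(out) != 0) {
        ok = 0;
    }
    if (!ok) {
        remove(path); /* do not leave a partial file behind */
        return -1;
    }
    return 0;
}
```

The path handed to the path-only module is then the new file, which contains just the embedded region rather than the whole parent file.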

CLAM-2900

Note this commit also changes `scanners.c` to use `access()` on Windows
instead of `_access_s()`. ClamAV defines `access()` to map to a custom
`access_w32()` function on Windows. We already use it everywhere else.
@val-ms val-ms force-pushed the 1.5.1-prep branch 2 times, most recently from 5e94637 to 7e3cf58 October 14, 2025 22:37
@val-ms val-ms marked this pull request as ready for review October 14, 2025 22:38
@val-ms val-ms merged commit 0a6802e into Cisco-Talos:dev/1.5.1 Oct 15, 2025
22 of 24 checks passed
@val-ms val-ms deleted the 1.5.1-prep branch October 15, 2025 20:29
