microsoft / RecursiveExtractor

About

Recursive Extractor is a Cross-Platform .NET Standard 2.0 Library, Progressive Web App and Command Line Program for parsing archive files and disk images, including nested archives and disk images.

Supported File Types


7zip+	ar	bzip2
deb	gzip	iso
rar^	tar	vhd
vhdx	vmdk	wim*
xzip	zip+

Details

* Windows only
+ Encryption Supported
^ Rar version 4 Encryption supported

Variants

Browser

You can run Recursive Extractor directly in your browser and install it as a Progressive Web App.

It runs entirely locally using WebAssembly with no connectivity requirement.

Cli

Installing

Ensure you have the latest .NET SDK.
Run: dotnet tool install -g Microsoft.CST.RecursiveExtractor.Cli

This adds RecursiveExtractor to your path so you can run it directly from the shell.

Running

Basic usage is: RecursiveExtractor --input archive.ext --output outputDirectory

Detailed Usage

input: The path to the archive to extract.
output: The path to a directory to extract into.
passwords: A comma-separated list of passwords to try for encrypted archives.
allow-filters: A comma-separated list of regexes that each extracted file is required to match.
deny-filters: A comma-separated list of regexes that each extracted file is required not to match.

For example, to extract only ".cs" files:

RecursiveExtractor --input archive.ext --output outputDirectory --allow-filters "\.cs$"

Run "RecursiveExtractor --help" for more details.
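The effect of allow and deny regexes can be sketched in a few lines of C#. This is only an illustration of the matching rules, not the Cli's implementation: `FilterSketch` and `Passes` are hypothetical names, and the sketch assumes a file passes when it matches at least one allow regex (or none were given) and matches no deny regex. It also shows why escaping the dot matters: an unescaped `.` matches any character, so `.cs$` also accepts a name such as `docs`.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterSketch
{
    // Hypothetical helper for illustration; not part of RecursiveExtractor.
    // Assumption: a path passes when it matches at least one allow regex
    // (or no allow regexes were given) and matches no deny regex.
    public static bool Passes(string path, string[] allow, string[] deny)
    {
        bool allowed = allow.Length == 0 || allow.Any(pattern => Regex.IsMatch(path, pattern));
        bool denied = deny.Any(pattern => Regex.IsMatch(path, pattern));
        return allowed && !denied;
    }

    public static void Main()
    {
        var none = new string[0];
        Console.WriteLine(Passes("src/Program.cs", new[] { @"\.cs$" }, none));           // True
        Console.WriteLine(Passes("docs", new[] { ".cs$" }, none));                       // True: "." matches "o"
        Console.WriteLine(Passes("docs", new[] { @"\.cs$" }, none));                     // False
        Console.WriteLine(Passes("obj/Temp.cs", new[] { @"\.cs$" }, new[] { "^obj/" })); // False: denied
    }
}
```

When passing a regex on the command line, quoting it keeps the shell from altering characters such as `\` and `$`.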

Library

Recursive Extractor is available on NuGet as Microsoft.CST.RecursiveExtractor.

Usage

This code, adapted from the Cli, extracts the contents of the archive located at options.Input to the directory located at options.Output.

using Microsoft.CST.RecursiveExtractor;

var extractor = new Extractor();
var extractorOptions = new ExtractorOptions()
{
    ExtractSelfOnFail = true,
    Parallel = true,
};
extractor.ExtractToDirectory(options.Output, options.Input, extractorOptions);

Async Usage

This example uses the async API to print the full path of every file found in the archive located at the given path.

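using System;
using System.Collections.Generic;
using Microsoft.CST.RecursiveExtractor;

// Run this inside an async method so that await foreach is allowed.
var path = "/Path/To/Your/Archive";
var extractor = new Extractor();
try {
    // The async API yields entries as an IAsyncEnumerable, consumed with await foreach.
    IAsyncEnumerable<FileEntry> results = extractor.ExtractFileAsync(path);
    await foreach(var found in results)
    {
        Console.WriteLine(found.FullPath);
    }
}
catch(OverflowException)
{
    // This means Recursive Extractor has detected a Quine or Zip Bomb
}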

The FileEntry Object

The Extractor returns `FileEntry` objects. These objects contain a `Content` Stream of the file contents.

public Stream Content { get; }
public string FullPath { get; }
public string Name { get; }
public FileEntry? Parent { get; }
public string? ParentPath { get; }
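Because `Content` is an ordinary `Stream`, it can be handed to any stream-based API. As a minimal sketch, the helper below hashes a stream with SHA-256. `ContentHashSketch` and `Sha256Hex` are hypothetical names introduced for this example, and the demo uses a `MemoryStream` as a stand-in so that it runs without an archive. Rewinding a seekable stream before reading is a defensive assumption here, not documented library behavior.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class ContentHashSketch
{
    // Hypothetical helper: returns the lowercase hex SHA-256 of a stream.
    // Inside an extraction loop it could be called as Sha256Hex(found.Content).
    public static string Sha256Hex(Stream content)
    {
        // Rewind if possible, in case the stream was already read.
        if (content.CanSeek)
        {
            content.Position = 0;
        }
        using (var sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(content);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    public static void Main()
    {
        // Stand-in for FileEntry.Content so the sketch runs on its own.
        using (var demo = new MemoryStream(Encoding.UTF8.GetBytes("abc")))
        {
            // Prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
            Console.WriteLine(Sha256Hex(demo));
        }
    }
}
```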

Extracting Encrypted Archives

You can provide passwords for decrypting archives. Each list of passwords is paired with a Regex that is matched against the name of the archive.

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Microsoft.CST.RecursiveExtractor;

var path = "/Path/To/Your/Archive";
var extractor = new Extractor();
try {
    IEnumerable<FileEntry> results = extractor.ExtractFile(path, new ExtractorOptions()
    {
        // Each Regex is matched against the archive name; verbatim strings keep "\." a literal dot.
        Passwords = new Dictionary<Regex, List<string>>()
        {
            { new Regex(@"\.zip"), new List<string>(){ "PasswordForZipFiles" } },
            { new Regex(@"\.7z"), new List<string>(){ "PasswordFor7zFiles" } },
            { new Regex(".*"), new List<string>(){ "PasswordForAllFiles" } }
        }
    });
    foreach(var found in results)
    {
        Console.WriteLine(found.FullPath);
    }
}
catch(OverflowException)
{
    // This means Recursive Extractor has detected a Quine or Zip Bomb
}

Exceptions

RecursiveExtractor protects against ZipSlip, Quines, and Zip Bombs. Calls to Extract will throw an OverflowException when a Quine or Zip Bomb is detected.

Otherwise, invalid files found while crawling will emit a logger message and be skipped. RecursiveExtractor uses NLog for logging.
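For readers unfamiliar with ZipSlip: the attack relies on archive entry names such as `../evil.txt` that resolve outside the extraction directory. The sketch below shows one common way to guard against it, by resolving the target path and checking that it stays under the output root. It illustrates the general technique only and is not RecursiveExtractor's implementation; `ZipSlipSketch` and `StaysInside` are hypothetical names.

```csharp
using System;
using System.IO;

public static class ZipSlipSketch
{
    // Hypothetical helper: true when entryPath, resolved against
    // outputDirectory, still lies inside outputDirectory.
    public static bool StaysInside(string outputDirectory, string entryPath)
    {
        string root = Path.GetFullPath(outputDirectory);
        // A trailing separator stops "out" from matching a sibling such as "out2".
        if (!root.EndsWith(Path.DirectorySeparatorChar.ToString(), StringComparison.Ordinal))
        {
            root += Path.DirectorySeparatorChar;
        }
        // GetFullPath collapses ".." segments; a rooted entryPath replaces root entirely.
        string target = Path.GetFullPath(Path.Combine(root, entryPath));
        return target.StartsWith(root, StringComparison.Ordinal);
    }

    public static void Main()
    {
        Console.WriteLine(StaysInside("out", "a/b.txt"));     // True
        Console.WriteLine(StaysInside("out", "../evil.txt")); // False
        Console.WriteLine(StaysInside("out", "/etc/passwd")); // False
    }
}
```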

Feedback

If you have any issues or feature requests you can open a new Issue.

If you have an archive you are having trouble parsing, please include it in your feedback.

Dependencies

Recursive Extractor uses a number of libraries to parse archives.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
