"Word Lists" for Software Security Test Cases
Word lists, Dictionary Files, Attack Strings, Miscellaneous Datasets and Proof-of-Concept Test Cases With a Collection of Tools for Penetration Testers
- Brief Introduction to werdlists
- Inspiration Taken from Similar Projects
- Repository Directory Hierarchy and Structure
- Folder Names and Description of Contents
Brief Introduction to werdlists ✂️
This project is a collection of word lists, most of them whitespace-delimited
or line-based. Although the passes-dicts folder contains inputs for password cracking,
the files amassed here are primarily intended to help drive a program
into an insecure state with the help of a black-box fuzzer or scanning
tool. The vast majority of files are plain ASCII with UNIX-style
newlines. Beware that this project makes no attempt to be minimalist or concise!
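Since the lists are mostly whitespace-delimited or line-based, a consumer may want one entry per line before handing a file to a fuzzer. The following is a minimal sketch in bash; `normalize_wordlist` is a hypothetical helper name and is not part of the repository.

```shell
# Hypothetical helper (an assumption, not a werdlists script):
# print the entries of a whitespace-delimited word list, one per line.
normalize_wordlist() {
    # 1. Drop carriage returns so DOS-style files end up with UNIX newlines.
    # 2. Turn every run of whitespace into a single newline.
    # 3. Remove the empty line that leading whitespace would leave behind.
    tr -d '\r' < "$1" | tr -s '[:space:]' '\n' | sed '/^$/d'
}
```

Call it as `normalize_wordlist some-list.txt`. Line-based lists whose entries contain spaces (attack strings, for instance) should skip the whitespace split and be read line by line instead.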
Inspiration Taken From Similar Projects 💭
werdlists is very similar to fuzzdb and
SecLists. SecLists is maintained by my former colleague at IOActive, Daniel Miessler.
Admittedly, werdlists shares their mission as a centralized resource for attack strings
and input data. Nevertheless, werdlists expands on a number of concepts: it has its own unique style, organization,
original hand-crafted contents, dataset creation/management/validation scripts, scanner springboards, etc.
Unique Features Only Available With werdlists 💯
werdlists cross-references the code repositories of third-party scanners with the datasets of its own that each tool can benefit from.
Moreover, there are specialized parsing scripts exclusive to werdlists that extract results produced by pairing test tools with its own data. Output strings are gathered from those results and fed back into the test tools. In other words, a number of interactive and/or
tunable feedback loops are implemented. Quite a few of the werdlists data files were created this way.
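One step of such a feedback loop can be sketched as merging strings harvested from a tool's output back into the word list that produced them. This is only an illustration under stated assumptions: `feed_back` and its two file arguments are hypothetical names, and the actual werdlists parsing scripts may work differently.

```shell
# Hypothetical helper (an assumption, not a werdlists script):
# merge harvested output strings into an existing word list, leaving the
# list sorted and free of duplicates so it can be fed to the tool again.
feed_back() {
    local wordlist="$1" harvested="$2" merged
    merged=$(mktemp) || return 1
    # LC_ALL=C gives a byte-wise sort order that does not depend on locale.
    LC_ALL=C sort -u "$wordlist" "$harvested" > "$merged" &&
        mv "$merged" "$wordlist"
}
```

Running `feed_back list.txt found.txt` after each scan grows `list.txt` only by entries it did not already contain, which keeps repeated iterations from inflating the file.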
Repository Directory Hierarchy and Structure 🔩
The scripts folder consists of shell scripts used for repository maintenance.
There is a sub-directory of scripts called init, where scripts that initialize data files are stored. If a script filename stored in init contains
two dashes, then its output should reflect the contents of the associated data file. For example, compare manpages-environ
and clib-package-names. All scripts were written using bash syntax.
The contrib folder is for storing scripts contributed via pull request and the utils
folder contains utilities that aren't necessarily specific to the werdlists project, such as scripts for managing any wordlist file.
Other data files were composed by hand, and a small handful were created by recycling output strings back into input parameter lists, e.g. dirbdirs-feedback.
The tools folder lists security tools that the datasets contained in this repository can be provided as input for.
Individual folders are detailed in the Folder Names and Description of Contents section below.
All files in each dataset directory are detailed in the local README.md file for that folder (as opposed to the global README.md in the root directory, which is the file being read now).
Naming Scheme, Syntax and Meaning 💬
Most files have the *.txt extension, signifying the text/plain MIME type.
Other frequently used formats include Comma-Separated Values (text/csv),
Extensible Markup Language (application/xml),
HyperText Markup Language (text/html), etc.
Any file that is larger than 1MB uncompressed will be compressed with xz
according to the commands in the scripts/xzlarge-files bash script. Other file extensions in use are:
*.ans, *.asc, *.bin, *.c, *.conf, *.cpp, *.csv, *.html, *.inf, *.ini, *.json, *.md, *.rpz, *.rst, *.sh, *.txt, *.xml, *.yaml, *.yml, *.zip, and *.zone.
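The size rule above can be illustrated with `find` and `xz`. This sketch is not the contents of scripts/xzlarge-files, which remains the authoritative source. The function names are hypothetical, and reading "1MB" as 1 MiB (1048576 bytes) is an assumption.

```shell
# Hypothetical sketch (an assumption; see scripts/xzlarge-files for the
# commands the repository really uses).

# Print regular files under a directory (default: current directory) that
# are larger than 1048576 bytes and not already xz-compressed.
list_large_files() {
    find "${1:-.}" -type f ! -name '*.xz' -size +1048576c
}

# Compress each of those files in place: file.txt becomes file.txt.xz.
xz_large_files() {
    find "${1:-.}" -type f ! -name '*.xz' -size +1048576c -exec xz -- {} +
}
```

Running `list_large_files` first gives a dry run of what `xz_large_files` would compress. By default `xz` removes each original after writing its `.xz` counterpart.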
Folder Names and Description of Contents 📋
| Folder Name | Description of Contents |
|---|---|
| apple-paths | |
| apple-data | |
| arpa-headers | |
| ascii-art | |
| biology-info | |
| browser-data | |
| cert-data | |
| char-encodes | |
| char-sequence | |
| chat-data | |
| cipher-data | |
| cmd-usage | |
| code-keywords | |
| cpu-arch | |
| crypt-output | |
| database-strs | |
| dns-domains | |
| dns-hostnames | |
| dns-records | |
| dns-servers | |
| dns-toplevel | |
| environ-vars | |
| exploit-info | |
| file-extens | |
| file-specs | |
| ftp-data | |
| glibc-data | |
| html-words | |
| http-agents | |
| http-headers | |
| http-methods | |
| http-params | |
| http-security | |
| http-servers | |
| http-status | |
| inet-addrs | |
| inet-routes | |
| inet-services | /etc/services |
| infosec-people | |
| iso-codes | |
| java-data | |
| linux-data | |
| linux-paths | |
| malware-iocs | |
| mobile-devs | |
| net-attacks | |
| net-ifaces | |
| ntfs-paths | |
| owasp-data | |
| passes-dicts | |
| passes-sites | |
| perl-data | |
| php-data | |
| postal-data | |
| python-data | |
| radio-data | |
| regex-data | |
| ruby-data | |
| search-dorks | |
| smtp-messages | |
| soap-messages | |
| social-data | |
| software-strs | |
| string-enums | |
| system-admin | |
| system-notices | |
| telco-data | |
| text-files | |
| text-words | |
| top-secret | |
| unicode-data | |
| unix-data | |
| unix-paths | |
| uri-attacks | |
| uri-schemes | |
| uri-data | |
| vuln-data | |
| webapp-attacks | |
| webapp-data | |
| webapp-dirs | |
| webapp-files | |
| webapp-paths | |
| webapp-words | |
| web-sites | |
| wifi-networks | |
| windows-data | |

