Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
It is not surprising that deep and shallow scans show different results. A shallow scan only looks at column names. A deep scan looks at a sample of the data. I've even noticed that two different runs of a deep scan show different results because the sampled rows are different. This is the challenge with not scanning all of the data. It's a trade-off between performance/cost and accuracy. There is no right answer.
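The difference between the two scan modes can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function names `shallow_scan` and `deep_scan`, the keyword list, and the email-only pattern are all assumptions made for the example.

```python
import random
import re

# Hypothetical, simplified detector: only looks for email-like values.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# Hypothetical keyword list for name-based detection.
NAME_HINTS = ("email", "ssn", "phone")


def shallow_scan(column_name):
    """Flag a column using its name only; the data is never read."""
    lowered = column_name.lower()
    return any(hint in lowered for hint in NAME_HINTS)


def deep_scan(values, sample_size, seed=None):
    """Flag a column if any value in a random sample looks like an email."""
    rng = random.Random(seed)
    sample = rng.sample(values, min(sample_size, len(values)))
    return any(EMAIL.fullmatch(str(v)) for v in sample)


# A column with an innocent name and one PII value among many rows.
notes = ["n/a"] * 999 + ["jane@example.com"]

name_based = shallow_scan("notes")           # name gives no hint
full_scan = deep_scan(notes, len(notes))     # sample covers every row
small_sample = deep_scan(notes, 10)          # may or may not hit the one row
```

With a sample of 10 rows out of 1000, the single email is usually missed, and whether it is found changes from run to run. Only a sample that covers every row gives a stable answer, at the cost of reading all the data.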
This package features data-science related tasks for developing new recognizers for Presidio. It is used for the evaluation of the entire system, as well as for evaluating specific PII recognizers or PII detection models.
PureKit SDK allows developers to protect users' passwords and sensitive personal information in a database from data breaches and from both online and offline attacks, making stolen passwords useless even if the database is breached.
Case study using dotfurther's Open Discover Platform with the RavenDB document store to rapidly create a full-text search/eDiscovery/information governance capable demonstration application.
PureKit PHP SDK allows developers to protect users' passwords and sensitive personal information in a database from data breaches and from both online and offline attacks, making stolen passwords useless even if the database is breached.
A Bash script that creates a random CloudApp short URL (i.e. http://cl.ly/xxxx), checks that URL and, if it finds content, downloads it. Rinse and repeat.
Redacts PII. This package uses the Stanford NER package to identify and scrub PII data. It redacts email addresses, SSNs, driver's license numbers, and passport numbers. It aggressively removes any number with more than 4 consecutive digits. Use AddToWhitelist to whitelist any pattern.
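The "more than 4 consecutive digits" rule and the whitelist idea can be sketched with plain regular expressions. This is a minimal Python sketch and not the package's own code: the `redact` function, its `whitelist` parameter, the `[REDACTED]` marker, and the simplified email pattern are assumptions for illustration, and the NER step is left out entirely.

```python
import re

# Five or more consecutive digits, i.e. "more than 4".
LONG_DIGITS = re.compile(r"\d{5,}")
# Simplified email pattern, assumed for this sketch.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text, whitelist=()):
    """Replace emails and long digit runs unless a whitelist pattern matches."""
    allowed = [re.compile(pattern) for pattern in whitelist]

    def mask(match):
        token = match.group(0)
        # Keep the token only if a whitelist pattern matches it in full.
        if any(pattern.fullmatch(token) for pattern in allowed):
            return token
        return "[REDACTED]"

    text = EMAIL.sub(mask, text)
    return LONG_DIGITS.sub(mask, text)


plain = redact("call 12345 or mail a@b.com, pin 1234")
kept = redact("order 20240101", whitelist=[r"2024\d{4}"])
```

The four-digit `1234` survives because the rule only fires above four digits, and the whitelisted order number is left untouched. A rule this aggressive will also remove harmless values such as ZIP codes or invoice numbers, which is why a whitelist mechanism is needed.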
An example demonstrating how Very Good Security can secure a Rails application without any code changes and instantly make it PCI DSS Level 2 compliant.
This ready-to-use solution could make your web service or website compliant with Russian Federal Law FZ-152. The example will help you set up a reverse proxy to catch private data and dump it in a local database.