trawler
=======

Getting started
---------------

    mkdir ~/.trawler/
    cp example_token_file.yaml ~/.trawler/default.yaml
    vim ~/.trawler/default.yaml

Place your Twitter API tokens in ~/.trawler/default.yaml, then run:

    ./trawler -h
    ./trawler -sn -sn example_screen_names.txt

Notes
-----

Useful scripts
--------------

The scripts whose names start with `save` demonstrate various other functionality.

Rate Limits
-----------

Most of the interesting functionality is in the class RateLimitedTwitterEndpoint. The class is a wrapper around the (Twython wrapper around the) Twitter API that handles all of the details of rate limiting. It also robustly handles errors that occur when the Twitter servers are temporarily misbehaving.
Once you've created an instance of RateLimitedTwitterEndpoint, call:

    endpoint.get_data(twitter_api_parameters)

and get_data() will return the data from the Twitter API as soon as possible without violating the Twitter rate limits (and thus the TOS). This means that get_data() may block for up to 15 minutes. All of the classes used by RateLimitedTwitterEndpoint are thread safe.
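
To illustrate why a call can block for up to 15 minutes, here is a minimal sketch of a windowed rate limiter. It is not trawler's actual code: the names `WindowedRateLimiter` and `get_data`, the fixed per-window budget, and the injected `clock` and `sleep` parameters are all assumptions made for this example. A real client would more likely read the remaining budget from the Twitter API rather than hard-code it.

```python
import threading
import time


class WindowedRateLimiter:
    """Hypothetical sketch: allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit, window=15 * 60, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self._clock = clock          # injectable so the sketch can be tested
        self._sleep = sleep
        self._lock = threading.Lock()
        self._window_start = None    # start time of the current window
        self._used = 0               # calls made in the current window

    def acquire(self):
        """Block until one more call is allowed, then record it."""
        # Holding the lock while sleeping makes other threads wait too,
        # which is what we want: nobody may call until the window resets.
        with self._lock:
            now = self._clock()
            if self._window_start is None or now - self._window_start >= self.window:
                self._window_start = now
                self._used = 0
            if self._used >= self.limit:
                # Budget exhausted: sleep until the current window ends.
                self._sleep(self._window_start + self.window - now)
                self._window_start = self._clock()
                self._used = 0
            self._used += 1


def get_data(limiter, fetch, **params):
    """Wait for rate-limit budget, then call `fetch` (a stand-in for an API call)."""
    limiter.acquire()
    return fetch(**params)
```

With `limit=2`, the first two calls return immediately and the third sleeps until the 15-minute window has elapsed, which mirrors the worst-case blocking behaviour described above.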

