ROUGE scoring for See et al. (2017)
Repository to replicate the ROUGE scores reported in See et al. (2017).
We find that the reported scores correspond to those produced by the Python re-implementation py-rouge, rather than those produced by pyrouge, the wrapper around the official ROUGE-1.5.5 Perl script.
The evaluate.py script accepts a 'hypothesis' folder and a 'reference' folder, computes the ROUGE scores with both py-rouge and pyrouge, and prints them to standard out.
The test_output folder contains the test outputs from See et al. (2017), which can be downloaded via the README.md of the official repository.
Setup
pip install py-rouge pyrouge
pyrouge prerequisites
Ensure the required Perl XML libraries are installed:
On Arch Linux: sudo pacman -S perl-xml-xpath
On Ubuntu: sudo apt-get install libxml-parser-perl
ROUGE 155 install tips/debugging:
https://stackoverflow.com/questions/47045436/how-to-install-the-python-package-pyrouge-on-microsoft-windows
Evaluate
Note that pyrouge evaluates roughly 4x slower than py-rouge, so some patience is required.
# Evaluate Pointer Generator
$> python evaluate.py test_output/pointer-gen test_output/reference
Python (py-rouge) scores:
ROUGE-1 (F1): 36.43
ROUGE-2 (F1): 15.66
ROUGE-L (F1): 33.42
Perl (pyrouge) scores:
ROUGE-1 (F1): 36.16
ROUGE-2 (F1): 15.61
ROUGE-L (F1): 33.21
# Evaluate Pointer Generator + Coverage
$> python evaluate.py test_output/pointer-gen-cov test_output/reference
Python (py-rouge) scores:
ROUGE-1 (F1): 39.53
ROUGE-2 (F1): 17.28
ROUGE-L (F1): 36.38
Perl (pyrouge) scores:
ROUGE-1 (F1): 39.24
ROUGE-2 (F1): 17.22
ROUGE-L (F1): 36.15
