X-Git-Url: https://git.cworth.org/git?p=notmuch-wiki;a=blobdiff_plain;f=corpus.mdwn;h=89821b980f96f1f26af209bb19038cf46f50754d;hp=af99c238f4029d23bacde42da505bd9758684ed6;hb=HEAD;hpb=8a192eff3de0432916828607bdc042fa55ef2115

diff --git a/corpus.mdwn b/corpus.mdwn
index af99c23..8834572 100644
--- a/corpus.mdwn
+++ b/corpus.mdwn
@@ -1,6 +1,7 @@
-## Notmuch Email Corpus
+[[!img notmuch-logo.png alt="Notmuch logo" class="left"]]
+# Notmuch Email Corpus
 
-A corpus of about 108k messages is available for performance testing of
+A corpus of about 209k messages is available for performance testing of
 notmuch (or other uses).
 
 The contents are as follows
@@ -14,18 +15,19 @@ The contents are as follows
 
 - `Mail/enron`: selected data from the EDRM v2 enron data set
     - CC Attribution: "ZL Technologies, Inc. (http://www.zlti.com)"
-
+
     - Downloaded via bittorrent http://www.searchdaimon.com/community/dataset/
-
-    - massaged with scripts/unpack-enron.sh
 
-Because of the size of the archive, it is not currently available from
-http://notmuchmail.org, but can be downloaded from:
+    - massaged with scripts/unpack-enron.sh (in the corpus tarball)
+
+- `Mail/lkml`: lkml messages 1000000 to 1100000 from the gmane archive
+
+The corpus is gpg signed by David Bremner with key fingerprint:
 
-- [UNB](http://tesseract.cs.unb.ca/notmuch/notmuch-email-corpus-0.1.tar.gz)
+    7A18 807F 100A 4570 C596 8420 7E4E 65C8 720B 706B
 
-A signature from key "815B 6398 2A79 F8E7 C727 86C4 762B 57BB 7842 06AD"
-can be found [here](http://tesseract.cs.unb.ca/notmuch/notmuch-email-corpus-0.1.tar.gz.asc)
+You can download the corpus from
 
+- [notmuchmail.org](https://notmuchmail.org/releases/notmuch-email-corpus-0.5.tar.xz) [signature](https://notmuchmail.org/releases/notmuch-email-corpus-0.5.tar.xz.asc)
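
The new page text gives a tarball URL, a detached signature URL, and a key fingerprint, but does not spell out how to check one against the other. The following is a minimal sketch of one way to do that, not the wiki's documented procedure. It assumes `gpg` is installed, the signing key has already been imported, and both files have been downloaded side by side. The helper names (`corpus_urls`, `verify_command`, `signed_by`, `verify`) are hypothetical and chosen only for illustration.

```python
import subprocess

# Values copied from the page; everything else below is illustrative.
BASE_URL = "https://notmuchmail.org/releases/"
TARBALL = "notmuch-email-corpus-0.5.tar.xz"
FINGERPRINT = "7A18 807F 100A 4570 C596 8420 7E4E 65C8 720B 706B"


def corpus_urls(base=BASE_URL, tarball=TARBALL):
    """Return (tarball_url, signature_url); the signature is the tarball name plus '.asc'."""
    return base + tarball, base + tarball + ".asc"


def verify_command(tarball_path):
    """Build a gpg call that checks the detached signature and prints
    machine-readable status lines on stdout."""
    return ["gpg", "--status-fd", "1", "--verify", tarball_path + ".asc", tarball_path]


def signed_by(status_output, fingerprint=FINGERPRINT):
    """True if a VALIDSIG status line names the expected fingerprint.

    The first VALIDSIG argument is the signing key's fingerprint and the
    last is the primary key's; either is accepted here (an assumption,
    since the page does not say which one it lists).
    """
    wanted = "".join(fingerprint.split()).upper()
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "[GNUPG:]" and fields[1] == "VALIDSIG":
            if wanted in (fields[2].upper(), fields[-1].upper()):
                return True
    return False


def verify(tarball_path):
    """Run gpg and require both a zero exit status and the expected fingerprint."""
    result = subprocess.run(verify_command(tarball_path), capture_output=True, text=True)
    return result.returncode == 0 and signed_by(result.stdout)
```

Calling `verify("notmuch-email-corpus-0.5.tar.xz")` in the download directory would then return `True` only if gpg accepts the signature and it was made by the key listed on the page. Checking the fingerprint explicitly matters because a bare `gpg --verify` succeeds for any valid signature from any key in the keyring.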