X-Git-Url: https://git.cworth.org/git?p=notmuch-wiki;a=blobdiff_plain;f=howto.mdwn;h=584d89df2e26b5087d1f19fcacb93485abc0e02e;hp=23fde44aab3b3d725dbe2fbf73ecb5856c1e0aaa;hb=HEAD;hpb=788381deb11b1d352c73dfe673071c59d7e294cc

diff --git a/howto.mdwn b/howto.mdwn
index 23fde44..584d89d 100644
--- a/howto.mdwn
+++ b/howto.mdwn
@@ -12,7 +12,7 @@ Notmuch does not fetch mail for you. For that, you need to use an
 external mail syncing utility. Some recommended utilities are listed
 below.
 
-Notmuch requires that every individual message be in it's own file.
+Notmuch requires that every individual message be in its own file.
 The well-supported [maildir](http://cr.yp.to/proto/maildir.html) or
 "mh"-style storage formats are compatible with notmuch. Basically any
 setup in which each mail is in a file of its own will work. The older
@@ -37,7 +37,14 @@ utilities support these formats:
 * [muchsync](http://www.muchsync.org/) - replicate and synchronize
   your notmuch database (mail and tags) across machines
 
-* [gmailieer](https://github.com/gauteh/gmailieer) - Fast email-fetching and two-way tag synchronization between notmuch and GMail
+* [lieer](https://github.com/gauteh/lieer) - Fast email-fetching and two-way tag
+  synchronization between notmuch and GMail (Note that lieer was formerly known
+  as gmailieer.)
+
+* [mujmap](https://github.com/elizagamedev/mujmap/) - synchronize
+  notmuch mail with a JMAP server, i.e. synchronizing tags with keywords
+  and mailboxes. Analogous to lieer, but for [JMAP](https://jmap.io)
+  supporting mail hosts.
 
 See more exhaustive list of [[software]] notmuch works with and the
 [[initial_tagging]] page for more info on initial tagging of messages.
@@ -145,6 +152,33 @@ in a scenario where you have encrypted your hard disk anyway and are
 comfortable with the security implications (and until notmuch can
 index encrypted email itself).
 
+## **Index and search emails written in CJK scripts**
+
+CJK (Chinese, Japanese and Korean) languages do not use spaces for word
+separation, so the full-text indexer (Xapian) must first segment the text
+into words in its TermGenerator. Otherwise, a large number of long terms
+end up in the database, leading to extremely slow indexing and ineffective
+searching with CJK search terms.
+
+Xapian has supported an [N-gram](https://xapian.org/docs/sourcedoc/html/classXapian_1_1TermGenerator.html)
+term generator [since 2011](https://u7fa9.org/memo/HEAD/archives/2012-06/2012-06-01.rst)
+as a simple substitute for word segmentation. It can be turned on by
+setting the environment variable `XAPIAN_CJK_NGRAM`:
+
+    $ export XAPIAN_CJK_NGRAM=1
+    $ notmuch new
+
+For an existing database, reindex all messages with the variable set
+(requires notmuch 0.26 or later):
+
+    $ export XAPIAN_CJK_NGRAM=1
+    $ notmuch reindex '*'
+
+Xapian has an ongoing [pull request](https://github.com/xapian/xapian/pull/114)
+that adds real CJK word segmentation based on the ICU library. Once it is
+merged, it will probably give better indexing and search results than the
+N-gram approach.
+
 ## Translations
 
  - A translation of this page into [[Russian|howto-ru]]
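Because `XAPIAN_CJK_NGRAM` only affects processes that can see it, it may be simpler to set it for every notmuch invocation rather than exporting it by hand each time. The following is a minimal sketch for a POSIX shell. It assumes, based on our reading of Xapian, that the query parser also honours `XAPIAN_CJK_NGRAM`, so CJK search terms are split into the same N-grams as the indexed text. The function name `nm_cjk` is hypothetical and not part of notmuch.

```shell
# Hypothetical helper: the name nm_cjk is illustrative, not a notmuch command.
# Runs any notmuch subcommand with Xapian's CJK N-gram mode switched on.
# Assumption: Xapian's query parser reads XAPIAN_CJK_NGRAM as well, so
# searches should run with the same setting that was used for indexing.
nm_cjk() {
    XAPIAN_CJK_NGRAM=1 notmuch "$@"
}
```

Usage would then be `nm_cjk new`, `nm_cjk reindex '*'`, or `nm_cjk search` followed by a CJK term. Setting the variable per command keeps it out of the rest of the shell session. Exporting it from a shell profile is an alternative if every notmuch call should use N-gram mode.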