Adding Namazu search engine to Mailman archive with user authentication
Introduction
Mailman is one of the most
popular mailing list managers.
Unfortunately, its archiver has no built-in search capabilities.
Namazu is a full text search engine
by Satoru Takabayashi et al.
Tom Morrison successfully added Namazu to Mailman, making archives
searchable
(http://mail.python.org/pipermail/mailman-users/2004-June/037584.html).
Lindsay Haisley improved Tom's work and created
Nmzproc.
Nmzproc works well, but it lets unauthorized persons look
into non-public lists.
I installed Nmzproc on our list server and added basic
authorization as well as some I18N code,
so that archives of private lists can be searched by members only.
This case study walks through that work step by step.
Preparation
Our list server host runs Debian Sarge with the mailman package
already installed.
I installed the namazu2, namazu2-common and
namazu2-index-tools packages from
http://www.namazu.org/#download.
I created a new user called namazu that also belongs to the list
group, so Mailman archives are readable by namazu.
Indexing processes and the resulting index files are owned by namazu.
Layout
The figure below shows the location of existing and new files and directories.
(It is far from complete;
most components of Mailman are omitted for clarity.)
The sample lists are called foo and bar.
Each item is explained in detail in the next section.
/--+--home/namazu/--+--bin/--+--mailman_index
   |                |        |
   |                |        +--nmzproc
   |                |        |
   |                |        +--search.py
   |                |
   |                +--etc/templates/
   |
   +--usr/--+--lib/--+--cgi-bin/--+--namazu.cgi
   |        |        |            |
   |        |        |            +--mailman/search
   |        |        |
   |        |        +--mailman/Mailman/Cgi/search.py
   |        |
   |        +--share/namazu/template/
   |
   +--etc/mailman/*/--+--archtoc.html
   |                  |
   |                  +--archtocnombox.html
   |
   +--var/lib/--+--namazu/mailman/--+--foo/--+--namazurc
                |                   |        |
                |                   |        +--mknmzrc
                |                   |        |
                |                   |        +--NMZ.*
                |                   |
                |                   +--bar/--+--namazurc
                |                   |        |
                |                   |        +--mknmzrc
                |                   |        |
                |                   |        +--NMZ.*
                |                   ...
                |
                +--mailman/archives/private/--+--foo/
                                              |
                                              +--bar/
                                              |
                                              ...
Items explained
- /home/namazu/bin/mailman_index
- A new script that refreshes the Namazu index files. Put it in
namazu's crontab, for example:
44 23 * * * ls /var/lib/namazu/mailman | xargs $HOME/bin/mailman_index
- /home/namazu/bin/nmzproc
- A Python script written by Lindsay Haisley and modified by me.
It adds a new Mailman list to the search engine.
Run it manually as the namazu user:
namazu@myhost:~/bin$ ./nmzproc --uselower foo
- /usr/lib/cgi-bin/mailman/search
- A setgid list wrapper that calls
/usr/lib/mailman/Mailman/Cgi/search.py.
Create it yourself from
rmlist (or any other wrapper in this directory whose name is six
characters long, so that the substitution leaves the size of the
binary unchanged):
root@myhost:/usr/lib/cgi-bin/mailman# perl -p -e 's/rmlist/search/g' rmlist > search
root@myhost:/usr/lib/cgi-bin/mailman# chown root:list search
root@myhost:/usr/lib/cgi-bin/mailman# chmod 2755 search
- /usr/lib/mailman/Mailman/Cgi/search.py
- A symlink to /home/namazu/bin/search.py. Create it yourself.
- /home/namazu/bin/search.py
- This wrapper script performs authorization and sets up
the environment of the search engine,
then calls /usr/lib/cgi-bin/namazu.cgi. It is a
modified version of Lindsay Haisley's nmz_wrapper.cgi.
- /usr/lib/cgi-bin/namazu.cgi
- Off-the-shelf search engine as installed from the
namazu2 package.
- /usr/share/namazu/template/
- Directory of original HTML templates as installed from the Debian package.
These are currently unused. Listed for completeness.
- /home/namazu/etc/templates/
- Directory of HTML templates. nmzproc
copies NMZ.* files from here to /var/lib/namazu/mailman/foo/.
- /var/lib/namazu/mailman/
- A new directory writable by the namazu user. Create it yourself.
The mailman_index and
nmzproc scripts put their output here.
- /var/lib/namazu/mailman/foo/namazurc
- Search configuration file for mailing list foo.
Created by mailman_index.
- /var/lib/namazu/mailman/foo/mknmzrc
- Indexing (mknmz) configuration file for mailing list foo.
Created by nmzproc.
- /var/lib/namazu/mailman/foo/NMZ.*
- Two kinds of files are mixed here.
NMZ.head*, NMZ.body*, NMZ.foot*, NMZ.result* and NMZ.tips*
are customized, language-dependent
HTML snippets created once by nmzproc
from the templates in /home/namazu/etc/templates/.
The rest of the NMZ.* files are the indices of the search engine. They
are created and refreshed periodically by mailman_index.
- /var/lib/mailman/archives/private/foo
- Archive of the Mailman list foo. It must be readable by
mailman_index when that script is run by the namazu
user.
- /etc/mailman/*/archtoc.html (en and hu versions)
- /etc/mailman/*/archtocnombox.html (en and hu versions)
- Language-dependent HTML templates of Mailman. A search form is added, as
done by Tom and Lindsay. Edit your templates manually as needed.
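The perl one-liner in the /usr/lib/cgi-bin/mailman/search item above works
because rmlist and search have the same length, so the copied wrapper keeps
its size. The following is a minimal Python sketch of the same idea, not the
method used on our server. The function name rename_wrapper and the sample
bytes are made up for illustration; a real wrapper is a compiled executable.

```python
def rename_wrapper(binary: bytes, old: bytes, new: bytes) -> bytes:
    """Return a copy of a wrapper binary with every `old` name replaced by `new`."""
    # Equal lengths keep every offset inside the binary unchanged.
    if len(old) != len(new):
        raise ValueError("old and new names must have the same length")
    if old not in binary:
        raise ValueError("old name not found in the wrapper")
    return binary.replace(old, new)


# Made-up stand-in bytes (assumption): not a real Mailman wrapper.
fake_wrapper = b"\x7fELF\x00\x00rmlist\x00usage: rmlist\x00"
patched = rename_wrapper(fake_wrapper, b"rmlist", b"search")
```

The result would still have to be written to a file, owned by root and the
list group, and given mode 2755, exactly as the shell commands above do.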
Operations
Adding a new list
Run nmzproc (see above)
once for each list you want to make searchable.
The modified script looks into the Mailman configuration to retrieve all
allowed languages of the list foo,
then creates the necessary /var/lib/namazu/mailman/foo/NMZ.*
files as well as /var/lib/namazu/mailman/foo/mknmzrc.
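The template-copying step can be pictured with the short sketch below. It is
not the code of nmzproc; the function name copy_snippets is hypothetical, and
the file naming NMZ.<kind>.<lang> (for example NMZ.head.hu) is an assumption
made for the example. The demonstration runs in a temporary directory instead
of the real paths.

```python
import shutil
import tempfile
from pathlib import Path

# Snippet kinds named in the text: NMZ.head*, NMZ.body*, NMZ.foot*, ...
SNIPPET_KINDS = ("head", "body", "foot", "result", "tips")


def copy_snippets(template_dir: Path, index_dir: Path, languages: list[str]) -> list[str]:
    """Copy the NMZ.<kind>* snippets of the given languages into index_dir."""
    index_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for kind in SNIPPET_KINDS:
        for src in sorted(template_dir.glob(f"NMZ.{kind}*")):
            # Assumed naming: the file name ends in ".<lang>", e.g. NMZ.head.hu
            if src.suffix.lstrip(".") in languages:
                shutil.copy(src, index_dir / src.name)
                copied.append(src.name)
    return copied


# Demonstration with made-up template files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    templates = Path(tmp, "templates")
    templates.mkdir()
    for name in ("NMZ.head.en", "NMZ.head.hu", "NMZ.foot.de"):
        (templates / name).write_text("<!-- snippet -->")
    result = copy_snippets(templates, Path(tmp, "foo"), ["en", "hu"])
```

Here only the en and hu snippets are copied, because the hypothetical list
foo allows just those two languages.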
Indexing
Run mailman_index a few times a day, once an hour, or as often as you wish
for each list to be indexed. The script ultimately calls mknmz,
which reads the mails archived since the last indexing and updates the
/var/lib/namazu/mailman/foo/NMZ.* files.
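As a rough illustration of this step, the sketch below builds one mknmz
command line per list from the directory layout described earlier. It is not
the mailman_index script itself; the function name mknmz_command is
hypothetical, and the choice of options is an assumption (-f for the
configuration file and -O for the output directory, as I understand mknmz).
The commands are only built here, not executed.

```python
from pathlib import Path

# Paths taken from the layout described in this document.
ARCHIVE_ROOT = Path("/var/lib/mailman/archives/private")
INDEX_ROOT = Path("/var/lib/namazu/mailman")


def mknmz_command(listname: str) -> list[str]:
    """Build an (assumed) mknmz command line for one mailing list."""
    index_dir = INDEX_ROOT / listname
    return [
        "mknmz",
        "-f", str(index_dir / "mknmzrc"),  # per-list indexer configuration
        "-O", str(index_dir),              # directory holding the NMZ.* indices
        str(ARCHIVE_ROOT / listname),      # archive tree to be indexed
    ]


# One command per sample list; pass each to subprocess.run() to execute it.
commands = [mknmz_command(name) for name in ("foo", "bar")]
for cmd in commands:
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell
quoting problems if a list name ever contains unusual characters.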
Search
When the user fills in the search form on a Mailman archive web page and
clicks the Submit button, the web server (typically running with UID
www-data) starts the CGI program /usr/lib/cgi-bin/mailman/search.
The search engine must read the archives and index files, so this
program is just a setgid list wrapper that calls
/usr/lib/mailman/Mailman/Cgi/search.py. The latter is just
a symlink to /home/namazu/bin/search.py. For
private lists, this program checks whether the user is authorized to access
the archive content. (Note: authorization is based on regular membership
only. Server and list administrator access rights are disregarded. Enabling
admin staff to search archives may be the subject of further development.)
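The access rule described here can be summarized in a few lines. The sketch
below is only an illustration of the decision, not the code of search.py. The
class FakeList, its is_member method and the function may_search are
hypothetical stand-ins, and how the user's address gets authenticated is left
out entirely.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeList:
    """Hypothetical stand-in for a mailing list object (not the Mailman API)."""
    archive_private: bool
    members: set = field(default_factory=set)

    def is_member(self, address: str) -> bool:
        return address.lower() in self.members


def may_search(mlist: FakeList, authenticated_address: Optional[str]) -> bool:
    """Decide whether a search request may proceed."""
    # Public archives are searchable by anyone.
    if not mlist.archive_private:
        return True
    # Private archives: only an authenticated regular member may search.
    # Administrator rights are deliberately not considered, as noted above.
    if authenticated_address is None:
        return False
    return mlist.is_member(authenticated_address)


public_list = FakeList(archive_private=False)
private_list = FakeList(archive_private=True, members={"alice@example.org"})
```

With these sample objects, may_search(private_list, "alice@example.org")
allows the search, while an anonymous or non-member request to the private
list is refused.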
A live example
Bonetools
is an English language mailing list of archaeologists.
Archive can be found
here.
Go to the bottom and search "bone".
License
All new programs as well as modifications of existing ones are licensed
under the GNU GPL.
Contact
If I was too terse, if something is unclear, or if you have
any problem with the installation, send a mail to <kissg@ssg.ki.iif.hu>.
I hope I can help.
Gábor
March, 2007
See also the enhancements by Ferenc Wágner.
February, 2022