Preparation and make

You need the following software to build Namazu 2.0.

Perl: Perl language. Required. Current version 5.10.0 (>= 5.004 required). File: perl5.005_03.tar.gz. Developer: Larry Wall. Distribution: GNU, CPAN.

make: maintains groups of programs. Required only when the make bundled with your system cannot build Namazu. Current version 3.81. File: make-3.81.tar.gz. Developer: FSF. Distribution: GNU.

gettext: message translation. Required only for multilingual messages; required on Solaris. Current version 0.17 (>= 0.13.1 required). File: gettext-0.17.tar.gz. Developer: FSF. Distribution: GNU.

nkf: Network Kanji Filter. For Japanese processing only. Current version 2.0.8 (>= 1.71 required); avoid versions 1.90, 1.92, and 2.0.0 through 2.0.3 (see notes). File: nkf207.tar.gz. Developers: Shinji Kono, Rei FURUKAWA. Distribution: nkf_utf8.

NKF: nkf Perl module. For Japanese processing only (++). Current version 2.0.8 (>= 1.71 required).

KAKASI: Japanese/Romaji conversion. For Japanese processing only (**). Current version 2.3.4 (>= 2.x required). File: kakasi-2.3.4.tar.gz. Developer: KAKASI Project. Distribution: namazu.org.

Text::Kakasi: KAKASI Perl module. For Japanese processing only (++). Current version 2.04 (>= 1.05 required). File: Text-Kakasi-2.04.tar.gz. Developers: NOKUBI Takatsugu, Dan Kogai. Distribution: CPAN dist.

ChaSen: Japanese morphological analyzer. For Japanese processing only (**). Current version 2.3.3 (>= 2.0x required). File: chasen-2.3.3.tar.gz. Developer: Nara Institute of Science and Technology. Distribution: see the Distribution Policy. For libchasen.a in ChaSen 2.02 or earlier, see below.

Text::ChaSen: ChaSen Perl module. For Japanese processing only (++). Current version 1.04. File: Text-ChaSen-1.04.tar.gz. Developer: NOKUBI Takatsugu. Distribution: Text::ChaSen.

MeCab: Yet Another Japanese morphological analyzer. For Japanese processing only (**). Current version 0.97 (>= 0.6 required). File: mecab-0.97.tar.gz. Developer: Taku Kudo. Distribution: MeCab. Supported from Namazu 2.0.15 (MeCab 0.90 and later are supported from Namazu 2.0.16).

mecab-perl: MeCab Perl module. For Japanese processing only (++). Current version 0.97 (>= 0.76 required). File: mecab-perl-0.97.tar.gz. Developer: Taku Kudo. Distribution: MeCab. Supported from Namazu 2.0.15 (MeCab 0.90 and later are supported from Namazu 2.0.16).

File::MMagic: file type detection. Included: it is packaged in the Namazu distribution. Current version 1.27 (>= 1.20 required). File: File-MMagic-1.27.tar.gz. Developer: NOKUBI Takatsugu. Distribution: CPAN dist.
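Before building, you may want to check installed versions against the required versions above. The following is a minimal POSIX sh sketch, not part of Namazu: version_ge and nkf_version_ok are hypothetical helper names, dotted components are compared numerically, and suffixes such as "_03" are only partially handled (awk reads each component's leading digits).

```sh
# Hypothetical helpers (not part of Namazu) for checking versions against
# the required versions above.

# version_ge A B: succeed if version A >= version B, comparing the dotted
# components numerically (so 5.10.0 > 5.004 and 2.0.10 > 2.0.8).
# awk converts each component to its leading number, so "005_03" reads as 5.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# nkf_version_ok V: succeed if nkf version V is >= 1.71 and is not one of
# the versions to avoid (1.90, 1.92, 2.0.0 through 2.0.3).
nkf_version_ok() {
    version_ge "$1" 1.71 || return 1
    case $1 in
        1.90 | 1.92 | 2.0.0 | 2.0.1 | 2.0.2 | 2.0.3) return 1 ;;
    esac
    return 0
}
```

For example, with Perl 5.6 or later:

    version_ge "$(perl -e 'printf "%vd", $^V')" 5.004 && echo "Perl OK"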

(Notes listed below are for Japanese processing only.)

Japanese Environment

Since 2.0.6, the handling of environment variables has changed, and a new command-line option has been added to mknmz.

Environment variables

To use Namazu 2.0 in a Japanese environment, you may need to set environment variables for language selection.

In 2.0.5 and earlier, the same environment variables controlled both message translation and internal text processing.

Environment variables for language selection (priority from left to right):
Message translation    LANGUAGE LC_ALL LC_MESSAGES LANG
Text processing        LANGUAGE LC_ALL LC_MESSAGES LANG

Starting with 2.0.6, this was changed as follows.

Environment variables for language selection (priority from left to right):
Message translation    LANGUAGE LC_ALL LC_MESSAGES LANG
Text processing        LC_ALL LC_CTYPE LANG
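The 2.0.6 lookup order can be expressed as a small sh sketch, which may help predict which language a given environment selects. It mirrors the table only and is not Namazu's code; the function names are hypothetical, and the "C" fallback when no variable is set is an assumption.

```sh
# Sketch of the 2.0.6+ lookup order from the table above (not Namazu's code).
# first_set VAR...: print the value of the first non-empty variable named,
# or "C" if none is set (the "C" fallback is an assumption).
first_set() {
    for var in "$@"; do
        eval "_nmz_val=\${$var-}"
        if [ -n "$_nmz_val" ]; then
            printf '%s\n' "$_nmz_val"
            return 0
        fi
    done
    printf 'C\n'
}

# Language used for message translation.
message_lang() { first_set LANGUAGE LC_ALL LC_MESSAGES LANG; }

# Language used for text processing (LANGUAGE and LC_MESSAGES are not consulted).
text_lang() { first_set LC_ALL LC_CTYPE LANG; }
```

For example, with LANGUAGE=en and LANG=ja, message_lang prints en while text_lang prints ja: messages appear in English, but text is processed as Japanese.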

To process Japanese, you typically set the following values, depending on your system:

System     Example value
Unix       ja
Windows    ja_JP.SJIS

The exact command to set the value depends on your shell:

C shell               setenv LANG ja
Bourne shell, etc.    LANG=ja; export LANG

In this example, LANG is set to ja, so all processing is done for Japanese. Some systems require ja_JP, ja_JP.eucJP, ja_JP.EUC, or ja_JP.ujis instead of just ja.
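Which of these names works depends on the locales installed on your system. A sketch that picks the first available candidate from "locale -a" output; the function name pick_ja_locale and the candidate order are assumptions. Matching ignores case because some systems (for example glibc) list normalized names such as ja_JP.eucjp.

```sh
# Hypothetical helper: read locale names (one per line, as printed by
# "locale -a") on stdin and print the first available Japanese locale,
# trying the names mentioned above in an assumed order of preference.
# The name is printed as the system lists it.
pick_ja_locale() {
    avail=$(cat)
    for cand in ja ja_JP ja_JP.eucJP ja_JP.EUC ja_JP.ujis; do
        match=$(printf '%s\n' "$avail" | grep -Fxi -- "$cand" | head -n 1)
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1
}
```

Usage in a Bourne-style shell:

    if ja=$(locale -a | pick_ja_locale); then LANG=$ja; export LANG; fi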

If these variables are not set correctly when mknmz runs, the resulting index files are broken. For example, NMZ.w should contain one (Japanese) word per line; instead, you will find long, unsegmented sentences on each line. In that case, namazu and namazu.cgi will not return correct results.
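A rough way to spot this problem is to look at the longest line in NMZ.w. The sketch below is a heuristic, not a Namazu tool: nmzw_check is a hypothetical name, and the 40-byte default limit is an arbitrary assumption that you should adjust for your data.

```sh
# Heuristic check (not a Namazu tool): NMZ.w should hold one word per line,
# so an unusually long line suggests the text was not segmented.
# nmzw_check FILE [LIMIT]: print the longest line length in bytes and fail
# if it exceeds LIMIT (default 40, an arbitrary assumption).
nmzw_check() {
    file=${1:-NMZ.w}
    limit=${2:-40}
    # LC_ALL=C makes awk count bytes, so a Japanese character in EUC-JP
    # typically counts as 2.
    LC_ALL=C awk -v limit="$limit" '
        length($0) > max { max = length($0) }
        END {
            printf "longest line: %d bytes\n", max
            if (max + 0 > limit + 0) exit 1
            exit 0
        }' "$file"
}
```

For example:

    nmzw_check /tmp/index/NMZ.w 40 || echo "NMZ.w looks unsegmented"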

--indexing-lang command line option (mknmz)

Since 2.0.6, mknmz has an --indexing-lang=LANG option.

Use it to specify the language processing type, for example --indexing-lang=ja. The command-line option overrides the environment variables. Some systems require ja_JP, ja_JP.eucJP, ja_JP.EUC, or ja_JP.ujis instead of just ja.
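If you build Japanese indexes from scripts or cron jobs, passing the option explicitly keeps the result independent of the caller's environment. A minimal wrapper sketch; mknmz_ja and NMZ_INDEXING_LANG are hypothetical names.

```sh
# Hypothetical wrapper: always pass --indexing-lang so the index does not
# depend on LC_ALL/LC_CTYPE/LANG in the calling shell. NMZ_INDEXING_LANG
# (also hypothetical) selects the value; it defaults to ja.
mknmz_ja() {
    mknmz --indexing-lang="${NMZ_INDEXING_LANG:-ja}" "$@"
}
```

For example, "mknmz_ja -O /tmp/index ~/Mail" runs "mknmz --indexing-lang=ja -O /tmp/index ~/Mail".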

Test before "make install"

To test mknmz before running "make install", change to the directory where you unpacked the *.tar.gz:

    cd namazu-2.0.x

Then run, with csh/tcsh:

    env pkgdatadir=`pwd` scripts/mknmz

or with sh/bash:

    pkgdatadir=. scripts/mknmz

This makes mknmz use the adjacent pl, filter, template, etc. directories in the source tree rather than any existing ones under /usr/local/share/namazu.

(For details, see the $PKGDATADIR variable in mknmz and related scripts.)
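The same recipe can be wrapped in a sh function so that it works from any directory. A sketch, assuming NAMAZU_SRC (a hypothetical variable) names the unpacked namazu-2.0.x tree; the path is made absolute because the function does not cd, so relative target arguments keep working.

```sh
# Hypothetical wrapper around the pkgdatadir recipe above.
# NAMAZU_SRC must name the unpacked namazu-2.0.x directory.
mknmz_uninstalled() {
    if [ -z "${NAMAZU_SRC-}" ]; then
        echo "mknmz_uninstalled: set NAMAZU_SRC to the namazu-2.0.x directory" >&2
        return 1
    fi
    # Resolve to an absolute path; we do not cd, so relative target
    # arguments such as ./docs still refer to the current directory.
    src=$(cd "$NAMAZU_SRC" && pwd) || return 1
    pkgdatadir=$src "$src/scripts/mknmz" "$@"
}
```

For example (the source path is only an illustration):

    NAMAZU_SRC=$HOME/src/namazu-2.0.x
    mknmz_uninstalled -C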

To get started, try the following commands. They show the configuration, show the help, and generate an index of ~/Mail in /tmp, respectively.

    ./mknmz -C
    ./mknmz --help
    ./mknmz -O /tmp ~/Mail

Help Menu

Running mknmz or namazu with no arguments displays a short usage message, and --help displays a long one. The -C option displays the current configuration. These three options are worth remembering.

Getting help on the command line
Argument    Meaning           Other arguments
(none)      Short usage       No other arguments are given
--help      Long usage        Other arguments are ignored
-C          Configuration     Other arguments still take effect

Running mknmz

First, create an index. (To run mknmz before "make install", see Test before "make install".)
The format has changed slightly since version 1.4.0.8. URI replacement is specified with the --replace option. Alternatively, URI replacement can be done when namazu/namazu.cgi runs: in that case, run mknmz without the --replace option and set up .namazurc so that namazu/namazu.cgi performs the replacement.

Run mknmz as follows.

mknmz [options] target directory

This creates the index in the current directory. Use the -O option to specify a different output directory.

For example,

      mkdir /tmp/index
      mknmz -O /tmp/index \
      --replace='s#/foo/bar/doc/#http://foo.example.jp/software/#' \
      /foo/bar/doc
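To check a --replace expression before indexing, you can apply it to a sample path. mknmz evaluates the expression as Perl; this sketch (preview_replace is a hypothetical name) uses sed as a stand-in, which behaves the same for simple literal patterns like the one above but not for Perl-specific regular expression syntax.

```sh
# Hypothetical helper: apply a --replace expression to sample paths.
# sed stands in for Perl here, so only simple patterns behave identically.
preview_replace() {
    code=$1
    shift
    for path in "$@"; do
        printf '%s\n' "$path" | sed -e "$code"
    done
}
```

For example, this prints http://foo.example.jp/software/index.html:

    preview_replace 's#/foo/bar/doc/#http://foo.example.jp/software/#' /foo/bar/doc/index.html

Since Namazu requires Perl anyway, replacing "sed -e" with "perl -pe" in the function evaluates the expression with Perl's own regular expressions.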

mknmz prints messages like the following while it creates the index. To display these messages in Japanese, see Japanese Environment.

    14 files are found to be indexed.
    1/14 - /foo/bar/acrobat3.pdf [application/pdf]
    2/14 - /foo/bar/excel97.xls [application/excel]
    3/14 - /foo/bar/html.html [text/html]
    4/14 - /foo/bar/mail-multipart.txt [message/rfc822]
    5/14 - /foo/bar/mail.txt [message/rfc822]
    6/14 - /foo/bar/man.1 [text/x-roff]
    7/14 - /foo/bar/msg00000.html [text/html; x-type=mhonarc]
    8/14 - /foo/bar/plain.txt [text/plain]
    9/14 - /foo/bar/plain.txt.Z [text/plain]
    10/14 - /foo/bar/plain.txt.bz2 [text/plain]
    11/14 - /foo/bar/plain.txt.gz [text/plain]
    12/14 - /foo/bar/rfc0000.txt [text/plain; x-type=rfc]
    13/14 - /foo/bar/tex.tex [application/x-tex]
    14/14 - /foo/bar/word97.doc [application/msword]
    Writing index files...
    [Base]
    Date:                Thu Mar 16 22:14:01 2000
    Added Documents:     14
    Size (bytes):        58,701
    Total Documents:     14
    Added Keywords:      95
    Total Keywords:      95
    Wakati:              module_kakasi -ieuc -oeuc -w
    Time (sec):          14
    File/Sec:            1.00
    System:              linux
    Perl:                5.00503
    Namazu:              2.0.X

Customizing mknmz

Although Namazu was originally developed for processing HTML documents, it can now handle various document formats. You will find useful scripts in /usr/local/share/namazu/filter, and a detailed explanation in the "Document filters" section of the Namazu manual.

Mails in MH format: run mknmz on the folder, for example

    % mknmz ~/Mail/foobar

MHonArc: Namazu applies special processing to HTML generated by MHonArc.

hnf: a .mknmzrc for hnf, along with a guide, is available from Hyper NIKKI System.

Documents stored on other machines: Namazu alone cannot search them. Use it in combination with other tools that make the documents available locally (e.g. wget, NFS).
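For documents on a web server, one approach is to mirror them with wget and map the local paths back to URLs with --replace. A sketch under these assumptions: index_remote_site is a hypothetical name, wget run with -P DIR stores SCHEME://HOST/PATH as DIR/HOST/PATH, the mirror directory is an absolute path, and the index directory already exists (as with the mkdir example above).

```sh
# Hypothetical helper: mirror a site with wget, then index the copy.
# index_remote_site URL MIRROR_DIR INDEX_DIR (MIRROR_DIR should be absolute)
index_remote_site() {
    url=$1
    mirror=$2
    index=$3
    scheme=${url%%://*}          # "http" or "https"
    # wget exits non-zero (status 8) if any request gets a server error
    # such as 404, so this check may be too strict for sites with broken links.
    wget --mirror --no-parent -P "$mirror" "$url" || return 1
    # With -P DIR, wget stores SCHEME://HOST/PATH as DIR/HOST/PATH, so
    # replacing the DIR/ prefix with SCHEME:// recovers the original URL.
    mknmz -O "$index" --replace="s#^$mirror/#$scheme://#" "$mirror"
}
```

For example (the URL is only an illustration):

    index_remote_site http://www.example.com/docs/ /tmp/mirror /tmp/index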

For mknmz command-line arguments, run mknmz --help to get usage information. The -C option shows the current configuration, for example:

    Loaded rcfile: /home/foobar/.mknmzrc
    System: linux
    Namazu: 2.0.X
    Perl: 5.00503
    File-MMagic: 1.27
    NKF: module_nkf
    KAKASI: module_kakasi -ieuc -oeuc -w
    ChaSen: module_chasen -i e -j -F "%m "
    MeCab: module_mecab -Owakati -b 8192
    Wakati: module_kakasi -ieuc -oeuc -w
    Lang_Msg: C
    Lang: C
    Coding System: euc
    CONFDIR: /usr/local/etc/namazu
    LIBDIR: /usr/local/share/namazu/pl
    FILTERDIR: /usr/local/share/namazu/filter
    TEMPLATEDIR: /usr/local/share/namazu/template
    Supported media types:   (42)
    Unsupported media types: (2) marked with minus (-) probably missing application in your $path.
      application/excel: excel.pl
      application/gnumeric: gnumeric.pl
      application/ichitaro5: taro56.pl
      application/ichitaro6: taro56.pl
      application/ichitaro7: taro7_10.pl
      application/macbinary: macbinary.pl
      application/msword: msword.pl
      application/pdf: pdf.pl
      application/postscript: postscript.pl
      application/powerpoint: powerpoint.pl
      application/rtf: rtf.pl
      application/vnd.kde.kivio: koffice.pl
      application/vnd.kde.kpresenter: koffice.pl
      application/vnd.kde.kspread: koffice.pl
      application/vnd.kde.kword: koffice.pl
      application/vnd.oasis.opendocument.graphics: ooo.pl
      application/vnd.oasis.opendocument.presentation: ooo.pl
      application/vnd.oasis.opendocument.spreadsheet: ooo.pl
      application/vnd.oasis.opendocument.text: ooo.pl
      application/vnd.sun.xml.calc: ooo.pl
      application/vnd.sun.xml.draw: ooo.pl
      application/vnd.sun.xml.impress: ooo.pl
      application/vnd.sun.xml.writer: ooo.pl
      application/x-apache-cache: apachecache.pl
      application/x-bzip2: bzip2.pl
      application/x-compress: compress.pl
    - application/x-deb: deb.pl
    - application/x-dvi: dvi.pl
      application/x-gzip: gzip.pl
      application/x-js-taro: taro7_10.pl
      application/x-rpm: rpm.pl
      application/x-tex: tex.pl
      application/x-zip: zip.pl
      audio/mpeg: mp3.pl
      message/news: mailnews.pl
      message/rfc822: mailnews.pl
      text/hnf: hnf.pl
      text/html: html.pl
      text/html; x-type=mhonarc: mhonarc.pl
      text/html; x-type=pipermail: pipermail.pl
      text/plain
      text/plain; x-type=rfc: rfc.pl
      text/x-hdml: hdml.pl
      text/x-roff: man.pl
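When the list is long, the entries marked with "-" are easy to miss. A sketch that extracts them from mknmz -C output on standard input; unsupported_types is a hypothetical name, and the pattern assumes the "- type: filter" layout shown above.

```sh
# Hypothetical helper: print the media types that "mknmz -C" marks with a
# leading "-" (unsupported), reading that output on stdin.
unsupported_types() {
    sed -n 's/^[[:space:]]*- \([^:]*\):.*/\1/p'
}
```

Usage:

    mknmz -C | unsupported_types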

Targets of index creation

Short   Long                   Description
-F      --target-list=FILE     Read the list of target files for indexing from FILE
-t      --media-type=MTYPE     Specify the media type (document format) of the target files
        --allow=PATTERN        Regular expression for file names to include
        --deny=PATTERN         Regular expression for file names to exclude
        --exclude=PATTERN      Regular expression for path names to exclude
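As an example of -F, you can generate the target list with find and pass it to mknmz. A sketch assuming the list holds one path per line; make_target_list is a hypothetical name and the *.html pattern is only an example.

```sh
# Hypothetical helper: list files under DIR whose names match a shell glob,
# one path per line, for use with mknmz -F (--target-list).
make_target_list() {
    dir=$1
    pattern=$2              # e.g. '*.html'
    find "$dir" -type f -name "$pattern" | LC_ALL=C sort
}
```

Usage:

    make_target_list /foo/bar/doc '*.html' > /tmp/targets.txt
    mknmz -F /tmp/targets.txt -O /tmp/index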

Running namazu

To search documents, do

      % namazu query index

If you omit index, namazu uses /usr/local/var/namazu/index.

The namazu command is configured in namazurc. A sample configuration is provided in the Namazu distribution as /usr/local/etc/namazu/namazurc-sample.

To use namazu.cgi on the web, your web server needs some configuration. For Apache, the relevant directives are:

ScriptAlias /cgi-bin/ /usr/local/apache/cgi-bin/    Alias the directory to /cgi-bin/ in URIs
AddHandler cgi-script .cgi                           Execute files ending in ".cgi" as CGI
AllowOverride All                                    Allow configuration in .htaccess (Web administrator)
Options ExecCGI                                      Allow CGI execution
DirectoryIndex index.html                            File to display when a URI names a directory

Directives other than the one marked (Web administrator) can also be set in .htaccess. (Note that the Apache configuration may forbid this.)

What you can do with Namazu

What is written here is not a guarantee; it only introduces advanced usage that the developers have in mind.

Information provided by http://www.namazu.org/
