Perlfect Search is an integrated, general-purpose site indexer and search engine. It consists of two distinct scripts: the indexer and the search engine. The indexer automatically scans and indexes a Web site, and the search engine is a CGI script that answers keyword queries against the index and displays the results as HTML pages in a standard format, showing the title, description, and relevance ranking of each matching document. Advanced features include stopword support, a powerful exclude mechanism, and a handy automatic installation and configuration utility.
Indexing your site
Once the scripts are installed, you need to index your site before you can perform any searches.
By default, the indexer reads the files as they are on the server's disk, which is what most people want. If your pages are generated dynamically (e.g. via PHP), you should index them via HTTP instead, so that the indexer sees the generated pages rather than their source code. This also matters for security: the source of a dynamic page may contain passwords that must not end up in the index. To index dynamic pages, open conf.pl in an editor and set $HTTP_START_URL.
Indexing Using ssh/telnet
Indexing Using a Web Browser
Depending on the size of your site, you may have to wait a while for the indexer to process all of its content. If you interrupt the indexer (e.g. with Ctrl-C in a shell), your index is not updated, and Perlfect Search continues to use the old index.
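Keeping the old index when a run is interrupted is the kind of all-or-nothing behavior you get by building the new index in a temporary file and swapping it in only at the very end. The following Python sketch illustrates that general pattern; it is not Perlfect Search's code, and the functions `build_index` and `reindex` and the file name `index.db` are hypothetical.

```python
import os
import tempfile


def build_index(documents):
    """Hypothetical indexer: map each word to the sorted names of documents containing it."""
    index = {}
    for name, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(name)
    return "\n".join(f"{w}\t{' '.join(sorted(docs))}" for w, docs in sorted(index.items()))


def reindex(documents, index_path):
    """Write the new index to a temporary file, then replace the old index in one step.

    If the run is interrupted (e.g. Ctrl-C raises KeyboardInterrupt) or fails
    before os.replace() is reached, the old index file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(index_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(build_index(documents))
        # Atomic when source and target are on the same filesystem.
        os.replace(tmp_path, index_path)
    except BaseException:
        # Remove the partial file; the previous index survives.
        os.unlink(tmp_path)
        raise
```

Creating the temporary file in the same directory as the index keeps both on one filesystem, which is what makes the final rename atomic: searches see either the complete old index or the complete new one, never a half-written file.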
Putting a Search Box on Your Pages
The setup utility installs the search script in a subdirectory called search inside your cgi-bin directory. If your cgi-bin is at the URL http://yourdomain.com/cgi-bin/, the search script will be at http://yourdomain.com/cgi-bin/search/search.pl. Point your browser to this URL to check that it works. If the script has been installed correctly and indexer.pl has successfully created an index, this URL returns a results page for an empty query (i.e. a page telling you that there are no results). You can then use the following HTML code to insert the search box into any of your pages (or use search_form.html, which contains this code):
<form method="get" action="/cgi-bin/search/search.pl">
<input type="hidden" name="p" value="1">
<input type="hidden" name="lang" value="en">
<input type="hidden" name="include" value="">
<input type="hidden" name="exclude" value="">
<input type="hidden" name="penalty" value="0">
<select name="mode">
<option value="all">Match ALL words</option>
<option value="any">Match ANY word</option>
</select>
<input type="text" name="q">
<input type="submit" value="Search">
</form>
You might have to change the form's action attribute to fit your local setup. The defaults of the other fields are fine for most people, so you probably don't need to change anything else.
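Because the form uses method="get", submitting it simply requests the search script with the form fields encoded in the query string. The Python sketch below builds such a URL using the field names and default values from the form; the host yourdomain.com and the helper name `search_url` are placeholders, and the sketch makes no claims about what each hidden field does.

```python
from urllib.parse import urlencode, urljoin

# Default values copied from the search form, in the order a browser sends them.
FORM_DEFAULTS = {
    "p": "1",
    "lang": "en",
    "include": "",
    "exclude": "",
    "penalty": "0",
    "mode": "all",  # "all" = Match ALL words, "any" = Match ANY word
}


def search_url(base, query, script="/cgi-bin/search/search.pl", **overrides):
    """Return the URL a browser would request when the search form is submitted.

    base: site root such as "http://yourdomain.com/" (placeholder).
    overrides: replacement values for any form field, e.g. mode="any".
    """
    params = dict(FORM_DEFAULTS, q=query)
    params.update(overrides)
    return urljoin(base, script) + "?" + urlencode(params)
```

For example, `search_url("http://yourdomain.com/", "perl cgi", mode="any")` produces the URL for a "Match ANY word" search; urlencode turns the space in the query into a `+`, just as a browser does for a GET form.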
Customizing the Results Page
Inside the directory where Perlfect Search was installed, you will find a directory called templates, which contains the files search.html and no_match.html. Open these files in your favorite text editor and edit them to customize the look of the results pages. Each template is a regular HTML file, except that it contains some special comments that tell Perlfect Search where to insert the dynamic results.
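To illustrate how a comment in a template can act as a placeholder, here is a minimal Python sketch that replaces a marker comment with generated result entries. The marker `<!--results-->` and the function `render` are hypothetical and are not Perlfect Search's actual placeholders; check the comments in your own search.html for the real ones.

```python
import html

# Hypothetical placeholder comment; the real template comments may differ.
RESULTS_MARKER = "<!--results-->"


def render(template, results):
    """Replace the marker with one entry per (url, title, description) tuple.

    All values are HTML-escaped, and <br /> keeps the output valid XHTML.
    """
    entries = []
    for url, title, description in results:
        entries.append(
            '<p><a href="{}">{}</a><br />{}</p>'.format(
                html.escape(url, quote=True),
                html.escape(title),
                html.escape(description),
            )
        )
    return template.replace(RESULTS_MARKER, "\n".join(entries))
```

Since the marker is an ordinary HTML comment, the template still renders harmlessly when opened directly in a browser, which makes it easy to edit its layout.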
The result pages are valid XHTML. Please support web standards and test the pages for correctness at validator.w3.org if you make changes to them.
NOTE: Template files themselves are not valid XHTML, but the generated pages that show the result of a search are. To test a template, search for something, save the result page and upload that file to the validator.
Highlighting Matched Terms
Perlfect Search can display matching documents with all search terms highlighted; each search result has a "highlight matches" link for this purpose. This feature is limited to HTML pages that follow some simple restrictions. For example, JavaScript code must be enclosed in an HTML comment, like this:
<script>
<!--
here comes the javascript
// -->
</script>
If your documents don't follow these restrictions, the highlighted pages may be displayed garbled. In that case, disable this feature by setting $HIGHLIGHT_MATCHES = 0; in conf.pl. You can use @HIGHLIGHT_EXT to set which files get a "highlight matches" link. Usually these are just HTML files, including HTML generated by PHP and similar (only if $HTTP_START_URL is set), but not PDF files or other non-HTML formats.
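Highlighting has to change a page's text without touching its markup. The Python sketch below shows one simple way to do that, assuming a regex-based split of the page into comments, tags, and text; it is not Perlfect Search's implementation, and the function `highlight` and the `<b>` wrapper are illustrative choices. It also shows why scripts matter: code inside an HTML comment is skipped as a whole, whereas a script whose `<` or `>` characters sit outside a comment would confuse this kind of naive tag splitting.

```python
import re

# Split points: HTML comments (which may hold scripts) and ordinary tags.
MARKUP = re.compile(r"(<!--.*?-->|<[^>]*>)", re.DOTALL)


def highlight(page, terms):
    """Wrap each whole-word, case-insensitive match of a term in <b>...</b>.

    Comments and tags are copied unchanged; only the text between them is modified.
    """
    if not terms:
        return page
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    # With one capturing group, re.split alternates text (even) and markup (odd).
    parts = MARKUP.split(page)
    for i in range(0, len(parts), 2):
        parts[i] = pattern.sub(r"<b>\1</b>", parts[i])
    return "".join(parts)
```

The replacement `\1` reuses the matched text, so the original capitalization on the page is preserved even though matching ignores case.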
The "highlight matches" feature takes a URL as a parameter, but it refuses to work on any URL that was not actually indexed. This is a security measure: it prevents people from using your server to load arbitrary files from it or to view arbitrary URLs on the web.
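This check amounts to a whitelist lookup: a requested URL is served only if it appears among the indexed documents. A minimal Python sketch, assuming the indexed URLs are available as a set; the names `INDEXED_URLS` and `may_highlight` are hypothetical.

```python
# Hypothetical set of URLs recorded by the indexer.
INDEXED_URLS = {
    "http://yourdomain.com/index.html",
    "http://yourdomain.com/docs/faq.html",
}


def may_highlight(url, indexed=INDEXED_URLS):
    """Return True only for URLs that were actually indexed.

    Anything else, such as a local path like /etc/passwd or an arbitrary
    external URL, is refused, so the highlighter cannot be abused as a
    file viewer or open proxy.
    """
    return url in indexed
```

An exact-match lookup is the conservative choice here: prefix matching or lenient URL normalization would widen the set of URLs the highlighter accepts, which is exactly what the check is meant to prevent.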
Excluding Directories or Files from the Index
Local filesystem
Inside the directory where Perlfect Search was installed, you'll find a directory called conf containing a file called no_index.txt. Open it in your favorite text editor and add the paths of any files you want to exclude from indexing, one per line. The wildcard character * is supported: for example, a line containing /dir1/dir2/file.* matches any file in /dir1/dir2/ whose name starts with "file." (such as file.html or file.txt). To exclude a whole directory, add a line such as /dir1/dir_to_exclude/* to the file.
You need to run indexer.pl again after making changes to this file.
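The wildcard rule can be approximated by translating each line into a regular expression in which only * is special. The Python sketch below does this; it is an approximation of the documented behavior, not Perlfect Search's own matching code, and `load_patterns` and `is_excluded` are hypothetical helper names. In this sketch * also matches /, so /dir1/dir_to_exclude/* covers files in subdirectories too.

```python
import re


def load_patterns(lines):
    """Turn no_index.txt lines into compiled regexes; only * acts as a wildcard."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Escape every literal piece, then let each * match any run of characters.
        regex = ".*".join(re.escape(part) for part in line.split("*"))
        patterns.append(re.compile(regex + r"\Z"))
    return patterns


def is_excluded(path, patterns):
    """Return True if the whole path matches any exclusion pattern."""
    return any(p.match(path) for p in patterns)
```

Escaping everything except * keeps characters such as the dot in file.* literal, so the pattern does not accidentally match file names like "fileXhtml".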
Files fetched via http
If you use the $HTTP_START_URL option to fetch your files via HTTP, you can also exclude individual files from the index by adding this meta tag to their head section: <meta name="robots" content="noindex">. The robots.txt file in the document root of your web server is also taken into account.
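Both exclusion mechanisms are standard, so you can check their effect on a given page with Python's standard library. The sketch below uses urllib.robotparser to test a URL against robots.txt rules and a simple regular expression to detect a robots noindex meta tag. The helper `should_index`, the default user agent "*", and the sample rules are assumptions for illustration, and the regex only handles the common attribute order (name before content).

```python
import re
from urllib.robotparser import RobotFileParser

# Matches e.g. <meta name="robots" content="noindex"> (name before content only).
NOINDEX_META = re.compile(
    r"""<meta\s+name\s*=\s*["']?robots["']?\s+content\s*=\s*["'][^"']*noindex""",
    re.IGNORECASE,
)


def should_index(url, page_html, robots_lines, agent="*"):
    """Return False if robots.txt disallows url or the page has a noindex meta tag."""
    rp = RobotFileParser()
    rp.parse(robots_lines)  # robots.txt content, already split into lines
    if not rp.can_fetch(agent, url):
        return False
    return NOINDEX_META.search(page_html) is None
```

For instance, with the rules `["User-agent: *", "Disallow: /private/"]`, any URL under /private/ is rejected before the page content is even examined.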
Searching