Perlfect Search

Perlfect Search is an integrated, general-purpose site indexer and search engine. It comes as a pair of distinct scripts: the indexer and the search engine. The indexer automatically scans and indexes a Web site, and the search engine is a CGI script that answers keyword queries against the index and displays the results pages in HTML. Results appear in a standard format that includes the title, description, and relevance ranking of each matching document. Advanced features include stopwords, a powerful exclude mechanism, and a handy automatic installation and configuration utility.

Installing Perlfect Search

For installation instructions, see the separate installation guide.

Using Perlfect Search

Indexing your site

After the script has been installed, you will need to index your site in order to be able to perform searches.

Most people want to index the files as they are on the server's disk, and this is what happens by default. If your pages are generated dynamically (e.g. via PHP), you will want to index them via HTTP instead. This also matters for security, since the source of dynamic files might contain passwords that should not be indexed. To index dynamic pages, open conf.pl in an editor and set $HTTP_START_URL.

Indexing Using ssh/telnet

  1. Log in to your account using ssh/telnet if it is on a remote machine.
  2. Go to the directory where the script was installed. The setup utility will have installed the script in a directory perlfect/search/ inside your cgi-bin directory.
  3. Run the indexer program with the command: perl indexer.pl and wait until it's finished.

Indexing Using a Web Browser

If you cannot log in to your server via telnet/ssh, you can start the indexing process with your browser. This is less secure than logging in via ssh to start the indexer, so it should only be used if absolutely necessary.

  1. Set a password at $INDEXER_CGI_PASSWORD in conf.pl.
  2. Open index_form.html in an editor and change the action attribute of the <form> tag so that it points to your server.
  3. Load index_form.html in a browser, enter your password, and submit the form. For this to work, indexer.pl must be executable by your server and the data and temp directories must be writable, so set the appropriate permissions with your FTP program.

Depending on how large your site is, you will need to wait for some time while the indexer digests all of your site's content. If you stop the indexing (e.g. with Ctrl-C if you are in a shell), your index will not be updated. Perlfect Search will continue to use the old index.

Putting a Search Box on Your Pages

The setup utility will have installed the search script inside your cgi-bin directory, in a subdirectory called /search. If your cgi-bin is at the URL http://yourdomain.com/cgi-bin/, the location of the search script will be http://yourdomain.com/cgi-bin/search/search.pl. Point your browser to this URL to check that it works. If the script has been installed correctly and an index has been successfully created using indexer.pl, this URL should return a results page for an empty query (i.e. a page that tells you there are no results). You can then use the following HTML code to insert the search box in any of your pages (or use search_form.html, which contains this code):

<form method="get" action="cgi-bin/search/search.pl">
<input type="hidden" name="p" value="1">
<input type="hidden" name="lang" value="en">
<input type="hidden" name="include" value="">
<input type="hidden" name="exclude" value="">
<input type="hidden" name="penalty" value="0">
<select name="mode">
<option value="all">Match ALL words</option>
<option value="any">Match ANY word</option>
</select>
<input type="text" name="q">
<input type="submit" value="Search">
</form>
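
Because the form above uses the GET method, a search is simply a request to search.pl with the form fields as query parameters. As a rough sketch (it is not part of Perlfect Search), the following Python builds such a URL from the fields shown in the form. The function name build_search_url and the base URL are illustrative assumptions; adjust them to your own setup.

```python
from urllib.parse import urlencode

# Assumed location of the search script; adjust to your own installation.
SEARCH_URL = "http://yourdomain.com/cgi-bin/search/search.pl"


def build_search_url(query, mode="all", base=SEARCH_URL):
    """Return a GET URL carrying the same fields as the HTML search form."""
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    # Field names and default values are copied from the form above.
    fields = [
        ("p", "1"),
        ("lang", "en"),
        ("include", ""),
        ("exclude", ""),
        ("penalty", "0"),
        ("mode", mode),
        ("q", query),
    ]
    return base + "?" + urlencode(fields)


if __name__ == "__main__":
    print(build_search_url("perl search", mode="any"))
```

A link built this way can be placed anywhere an ordinary hyperlink is allowed, which can be handy for predefined searches.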

You might have to change the form's action attribute to fit your local setup. The remaining fields in the form above carry their default values, which are okay for most people, so you probably don't need to change anything else.

Customizing the Results Page

Inside the directory where Perlfect Search was installed, you will find a directory called templates. Inside it are the files search.html and no_match.html. You can open these files with your favorite text editor and edit them to customize the look of the results page. Each one is like a regular HTML file, but it contains some comments that tell Perlfect Search where to insert the dynamic results.

The result pages are valid XHTML. Please support web standards and test the pages for correctness at validator.w3.org if you make changes to them.

NOTE: Template files themselves are not valid XHTML, but the generated pages that show the result of a search are. To test a template, search for something, save the result page and upload that file to the validator.

Highlighting Matched Terms

Perlfect Search allows you to display the documents with all search terms highlighted. Each search result has a "highlight matches" link for that. This feature is limited to HTML pages that follow some simple restrictions:

<script>
<!-- here comes the javascript // -->
</script>

If your documents don't follow these restrictions, the pages may be displayed garbled. In that case, disable this feature by setting $HIGHLIGHT_MATCHES = 0; in conf.pl. You can use @HIGHLIGHT_EXT to set which file types get a "highlight matches" link. Usually these are just HTML files, including HTML generated by PHP etc. (only if $HTTP_START_URL is set), but not PDF files and the like.

The "highlight matches" feature takes a URL as a parameter, but it will refuse to work on any URL that was not actually indexed. This is a security measure so people cannot just load any file from your server or view any URL on the web via your server.
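
The idea behind this security measure is an allowlist check: a requested URL is accepted only if it appears among the URLs recorded at indexing time. The Python below is a minimal sketch of that idea, not Perlfect Search's own code; the function name may_highlight and the example set INDEXED are hypothetical.

```python
def may_highlight(url, indexed_urls):
    """Allow highlighting only for URLs that are present in the index."""
    return url in indexed_urls


# Hypothetical set of indexed URLs, for illustration only.
INDEXED = {
    "http://yourdomain.com/index.html",
    "http://yourdomain.com/docs/faq.html",
}

if __name__ == "__main__":
    for candidate in ("http://yourdomain.com/docs/faq.html",
                      "http://example.org/other.html"):
        verdict = "highlight" if may_highlight(candidate, INDEXED) else "refuse"
        print(candidate, "->", verdict)
```

Checking against an exact set of known URLs, rather than against a pattern, means that a request cannot be steered to a file or site that was never indexed.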

Excluding Directories or Files from the Index

Local filesystem

Inside the directory where Perlfect Search was installed, you'll find a directory called conf. Inside it there's a file called no_index.txt. Open it with your favorite text editor and add the paths of any files you want to exclude from indexing, one per line. The wildcard character * is supported, so for example a line containing /dir1/dir2/file.* will match any file in /dir1/dir2/ whose name starts with file. If you want to exclude a whole directory, use /dir1/dir_to_exclude/*.

You need to run indexer.pl again after making changes to this file.
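
To get a feel for which paths a set of patterns would exclude, the wildcard rule can be approximated in Python with the standard fnmatch module. This is only an approximation under stated assumptions, not Perlfect Search's own matcher: fnmatch also gives special meaning to ? and [...], which no_index.txt may not, and the helper name is_excluded is hypothetical.

```python
from fnmatch import fnmatchcase


def is_excluded(path, patterns):
    """Return True if path matches at least one exclusion pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Example patterns taken from the text above.
PATTERNS = ["/dir1/dir2/file.*", "/dir1/dir_to_exclude/*"]

if __name__ == "__main__":
    for path in ("/dir1/dir2/file.html",
                 "/dir1/dir2/other.html",
                 "/dir1/dir_to_exclude/sub/page.html"):
        print(path, "->", "excluded" if is_excluded(path, PATTERNS) else "indexed")
```

In this approximation * also matches the / character, so a directory pattern covers files in nested subdirectories as well.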

Files fetched via http

If you are using the $HTTP_START_URL option to fetch your files via HTTP, you can also exclude certain files from the index by adding a robots meta tag, such as <meta name="robots" content="noindex">, to their head. The robots.txt file in the document root of your web server is also taken into consideration.
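
If you are unsure which URLs a robots.txt file blocks, you can check its rules offline with Python's standard urllib.robotparser module. The sketch below is an illustration only: the helper name allowed_by_robots and the sample rules are hypothetical, and the user agent "*" is an assumption, since the agent name used by the indexer is not stated here.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, url, agent="*"):
    """Return True if the given robots.txt lines allow agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
""".splitlines()

if __name__ == "__main__":
    for url in ("http://yourdomain.com/private/notes.html",
                "http://yourdomain.com/public/index.html"):
        print(url, "->", allowed_by_robots(ROBOTS_TXT, url))
```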

Searching

  1. Type one or more words into the search field and click "Search" (or press Return).
  2. If Match ALL words is selected, only documents that contain all of your search terms are returned. With Match ANY word, every document that contains at least one of your search terms is returned. Alternatively, you can put a plus sign (+) directly in front of one or more words to get only those files that contain all of those words. Putting a minus sign (-) directly in front of a word restricts the results to documents that do not contain that word.

    NOTE: Phrase searches are not supported, so it does not work to put quotes around your query "like this."

  3. The results are ordered by relevance with the most relevant documents listed first. Relevance depends on the number and position of matched words in the documents.
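
The matching rules above can be modeled in a few lines of Python. This is a simplified sketch, not Perlfect Search's implementation: it ignores stopwords and ranking, and the way + and - terms combine with the ALL/ANY mode is an assumption (here, + and - terms are always enforced, and the mode applies only to unprefixed words). The names parse_query and matches are hypothetical.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-), and plain terms."""
    required, excluded, plain = set(), set(), set()
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.add(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.add(term[1:])
        else:
            plain.add(term)
    return required, excluded, plain


def matches(doc_words, query, mode="all"):
    """Return True if a document (a list of words) satisfies the query."""
    required, excluded, plain = parse_query(query)
    words = {w.lower() for w in doc_words}
    if not required <= words:   # every +term must be present
        return False
    if excluded & words:        # no -term may be present
        return False
    if not plain:
        return True
    if mode == "all":
        return plain <= words   # Match ALL words
    return bool(plain & words)  # Match ANY word


if __name__ == "__main__":
    doc = "perl search engine".split()
    print(matches(doc, "perl python", mode="all"))
    print(matches(doc, "perl python", mode="any"))
    print(matches(doc, "search -engine", mode="any"))
```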