The Global Proteome Machine Organization
  GPM case studies
The case studies review and analyze problems that individuals have encountered when trying to install the GPM on their servers. They can serve as instructive models for anyone having installation issues or with limited experience configuring a web server to run web-based programs.

case 1 - A university lab wants to set up an implementation of GPM on a Linux server running Red Hat 7.2.
case 2 - A group of researchers from a private lab visited a conference that showcased GPM and decided to run their own copy of GPM on their server.

Case 1

Hi,
Sorry to bother you, but I really need some pointers on installing "the GPM". I just installed the GPM on a Linux system (RH 7.2). Since other programs use the Apache server, I could not make the GPM directory the root directory as the installation instructions say. So I placed the "thegpm" folder inside my root directory and added the following to my httpd.conf:
#The GPM specific requirements are as follows

ScriptAlias /thegpm-cgi/  "/var/www/html/thegpm/thegpm-cgi/"
AddHandler cgi-script  .pl
<Directory "/var/www/html/thegpm/thegpm-cgi">
    AllowOverride ALL
    Options ExecCGI
    Order allow,deny
    Allow from all
</Directory>

Also, in the Directories section:

<Directory "/var/www/html/thegpm">
    AllowOverride ALL
    Options ExecCGI FollowSymLinks
    Order allow,deny
    Allow from all
</Directory>

<Directory "/var/www/html/thegpm/tandem">
    AllowOverride ALL
    Options ExecCGI FollowSymLinks
    Order allow,deny
    Allow from all
</Directory>

I also had to add a few symlinks to the Apache root directory pointing to the following directories: pics, tandem, archive and fasta, since some of the links on the web forms were not working properly. However, I am still getting "Internal Server Error" messages. This is from the Apache error log:

1 at /var/www/html/thegpm/thegpm-cgi/thegpm.pl line 34.
../tandem/archive/52db749.xml not found at /var/www/html/thegpm/thegpm-cgi/plist.pl line 38.
[Thu Feb 19 10:54:48 2004] [error] [client 99.99.99.99] Premature end of script headers: /var/www/html/thegpm/thegpm-cgi/plist.pl

I read the FAQ and checked the permissions of all the directories involved and still cannot get it to fly. I would really appreciate any help/pointers you can give me to get this program to work.

Hi,
No need to apologize, we are here to help.
Rather than repairing your Apache configuration, I have included a new one that you can copy and paste into your configuration file. Remove what you added and put the following in the appropriate sections:

Alias /tandem/ "/var/www/html/thegpm/tandem/"
<Directory "/var/www/html/thegpm/tandem">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Alias /pics/ "/var/www/html/thegpm/pics/"
<Directory "/var/www/html/thegpm/pics">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Alias /cache/ "/var/www/html/thegpm/cache/"
<Directory "/var/www/html/thegpm/cache">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

ScriptAlias /thegpm-cgi/  "/var/www/html/thegpm/thegpm-cgi/"
<Directory "/var/www/html/thegpm/thegpm-cgi/">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>

AddHandler cgi-script  .pl

I'm not sure you need the AddHandler line, as it is not required on my setup, though I don't think it will hurt.

Hope this helps. If not, please feel free to contact me.

Thanks.

Hi,
Thank you very much for your help.
I changed my httpd.conf according to your instructions (and restarted Apache), but I still get the same error.
I looked more carefully and discovered that there was another message on the browser before the "Internal Server Error". It happens so quickly that it is hard to notice.
The message on this web page is:

Cannot open file input 12cdd8d4.xml Parameter transmission failed.

The page's URL ends with ../thegpm-cgi/thegpm.pl
I guess my input is the problem?
I hope this will give you a clue to my problem, thanks a lot again.

Hi,
This looks to me like a permissions problem. The Perl script (thegpm.pl) is trying to create a file (input 12cdd8d4.xml) for writing in the current folder (thegpm-cgi). This file contains, in XML format, the input parameters that are passed to the tandem executable. Make sure this folder has 755[1] permissions. The file is deleted at the end of the Perl script (thegpm.pl), so it will no longer appear in that folder once the tandem program has finished running and control has returned to the Perl script.
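To check what the web server itself sees, here is a hypothetical Perl diagnostic (it is not part of the GPM distribution, and the default path is assumed from the configuration in this case). It prints the directory's permission bits and whether the current process can write there. Running it as the web server's user (typically apache on Red Hat, e.g. sudo -u apache perl check_dir.pl, where check_dir.pl is a hypothetical file name) shows whether thegpm.pl will be able to create its input file.

```perl
use strict;
use warnings;

# Hypothetical diagnostic, not part of the GPM: report a directory's
# permission bits and whether the current process can write to it.
sub describe_dir {
    my ($dir) = @_;
    my @st = stat($dir) or return "$dir: cannot stat ($!)";
    my $mode   = sprintf("%04o", $st[2] & 07777);   # e.g. 0755
    my $access = -w $dir ? "writable" : "NOT writable";
    return "$dir: mode $mode, $access by effective uid $>";
}

# The default path is an assumption taken from Case 1; pass your own.
print describe_dir($ARGV[0] // "/var/www/html/thegpm/thegpm-cgi"), "\n";
```

A result such as "mode 0755, NOT writable" when run as the web server user points to the same fix described in this case.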

If this is not the problem, the file name may be created incorrectly. The error message refers to a file called "input 12cdd8d4.xml", but this file name should not have a space in it. The line that creates the name is in the Perl script "thegpm.pl":

my $input_xml = "input" . sprintf("%x",int(rand(time|$$)*rand())) . ".xml";

As a test, you could try changing this to:
my $input_xml = "inputTest.xml";
I don't think this is the problem, but I guess it's worth a try if the first fix doesn't work.
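To illustrate the file-name logic quoted above, the sketch below reuses that expression and shows that it produces names of the form input<hex digits>.xml with no space. The helper write_input_file is hypothetical; the point is that including $! in the die message makes the error log show the operating-system reason (for example "Permission denied") instead of a bare "Cannot open file".

```perl
use strict;
use warnings;

# Same expression as the thegpm.pl line quoted above:
# "input" + hexadecimal digits + ".xml", no spaces.
sub make_input_name {
    return "input" . sprintf("%x", int(rand(time | $$) * rand())) . ".xml";
}

# Hypothetical helper: write the XML parameters to a fresh input file in
# $dir and die with the OS error ($!) if the file cannot be created.
sub write_input_file {
    my ($dir, $xml) = @_;
    my $path = "$dir/" . make_input_name();
    open(my $fh, '>', $path) or die "Cannot open file $path: $!\n";
    print $fh $xml;
    close($fh) or die "Cannot close $path: $!\n";
    return $path;
}
```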

Let me know how it goes, or if you need any more help.

Thanks.

Hi, you were 110% right! Permissions, permissions, permissions... The CGI directory had 755, but for some reason it did not work, so I changed it to 777 and voilà! Then I hit another problem: my libraries were too old (RH 7.2).[2] So I cheated: I took an old PC, installed RH 9.0 on it and installed the GPM. Bingo, it runs like a charm.
However, I have one more question: if I want to install a new database, what format do I need? I looked at the provided databases and they have something called x-bang-pro-fasta-format, but to be honest I am not familiar with this format.

Thanks again!

Hi,

Glad to hear this has worked out!
The databases are plain FASTA format, but they have been optimized with a command-line program called fasta_pro.exe[3]. This program can be found in the fasta folder. It is run as follows:

fasta_pro.exe filename [ens|nr]

where filename is the name of a FASTA-format file and [ens|nr] is the database type (Ensembl or NCBI nr).

It is not necessary to run this on your FASTA files for them to work with tandem, but it will increase the speed at which searches run. If you are unable to convert your FASTA file, send me a few lines from it so I can look at the format and hopefully find a fix.

Take care.

notes

[1] The permissions should have been set to 777. With 755, only the directory's owner can create files in it, and CGI scripts run as the web server's user, which usually does not own thegpm-cgi.
[2] Another way to solve this problem would be to download the latest tandem source from our FTP site and recompile it to create an executable that will run on Red Hat 7.2.
[3] fasta_pro.exe is a command-line program that optimizes a regular FASTA file for use by tandem. It reads each line of the FASTA file, counts the number of bytes on that line and stores the value for each line. It writes the result to a file with the same name plus a .pro extension. This increases speed because tandem's fread calls can be told exactly how many bytes to read for each line, instead of testing each character in a line for the newline character.
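The idea in note [3] can be illustrated with a short Perl sketch: record the byte length of every line once, then fetch any line later with a single seek and a fixed-size read, with no scan for the newline. This is only an illustration of the principle; it does not reproduce the .pro file layout written by fasta_pro.exe, which is not documented here, and the subroutine names are hypothetical.

```perl
use strict;
use warnings;

# Record the byte length of every line (newline included).
sub line_lengths {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    my @lengths;
    while (my $line = <$fh>) {
        push @lengths, length($line);
    }
    close($fh);
    return \@lengths;
}

# Read line $n (0-based) using only the stored lengths: seek to the
# summed offset, then read exactly that many bytes.
sub read_line_at {
    my ($path, $lengths, $n) = @_;
    my $offset = 0;
    $offset += $lengths->[$_] for 0 .. $n - 1;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!\n";
    seek($fh, $offset, 0) or die "seek failed: $!\n";
    read($fh, my $buf, $lengths->[$n]) == $lengths->[$n] or die "short read\n";
    close($fh);
    return $buf;
}
```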


Case 2

Dear Sir/Madam,

I'm a bioinformatics analyst. Thanks very much for providing GPM. I'm in the process of setting up a web-based tool using GPM for our researchers. I'm just curious why you designed your interface to run as the root directory of a web server instead of using relative paths (i.e. why it uses /tandem rather than ../tandem, etc.). I'm finding myself going into individual Perl scripts and editing the paths in them! Since the web server is used for several other purposes as well, I can't just install the 'thegpm' directory as the root directory, and I assume the same is true for many others. I would greatly appreciate it if you would consider this suggestion in future versions.

Thanks a bunch.

Hello,

It is not necessary to make thegpm the root directory of the server for thegpm to run without any changes to the Perl scripts. What you need to do is add Alias directives to the httpd.conf file (if you are using Apache) or set up virtual directories (if you are using IIS) for the tandem, cache and pics folders within thegpm folder. For Apache, add the following lines to the httpd.conf file:

Alias /tandem/ "C:/Program Files/Apache Group/Apache2/htdocs/thegpm/tandem/"
<Directory "C:/Program Files/Apache Group/Apache2/htdocs/thegpm/tandem">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Alias /pics/ "C:/Program Files/Apache Group/Apache2/htdocs/thegpm/pics/"
<Directory "C:/Program Files/Apache Group/Apache2/htdocs/thegpm/pics">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Alias /cache/ "C:/Program Files/Apache Group/Apache2/htdocs/thegpm/cache/"
<Directory "C:/Program Files/Apache Group/Apache2/htdocs/thegpm/cache">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>

Change the paths to wherever thegpm folders are located on your server.

You may have to roll back the changes you made to the Perl scripts for this to work.

If you have any more problems, feel free to contact us.

Thanks.

Hi,

Thanks very much for the advice. By making changes in the Apache conf file I'm able to get away without modifying the scripts. However, now the raw Perl script (thegpm.pl) is printed in my browser when I run it. It is in my CGI directory and is set to executable. I'll need to look into it a little more. However, when I run it from IIS on one of our Windows servers it runs fine! If you have come across a similar situation before, please advise.

By the way, thegpm is a great tool and it looks like it can cut down our processing time significantly too.

Thanks again.

Hi,

Could you send me your httpd.conf file? Also, is it possible for me to view the site from here? If so, please let me know the IP address so I can take a look first-hand.

Thanks.

Hi,

The site address is http://999.99.99.99/thegpm
It is hosted on my desktop Linux box, and I'm setting up interfaces on it for the researchers in the lab.

The Apache conf file is attached. I'm pretty sure it's something in the conf file, but I'm still learning Apache, so I can't pin it down yet.

Thanks very much for the help.

Hi,

I attached your httpd.conf with a couple of modifications.
I added a ScriptAlias directive for thegpm-cgi and commented out the Alias directive for thegpm-cgi. For thegpm to work without any changes to the script paths, the thegpm folder must keep its structure, with the tandem and thegpm-cgi folders at the same level. Judging by the Alias you set up for thegpm-cgi, you had moved it into the cgi-bin folder. The raw script is printed to the browser because thegpm-cgi had an Alias instead of a ScriptAlias: an Alias only maps a URL to a directory, so unless CGI execution is enabled some other way, Apache serves the .pl files as plain documents, whereas ScriptAlias also marks the files in that directory as CGI scripts to be executed.
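Because the scripts locate data through paths relative to thegpm-cgi (the Case 1 log shows ../tandem/archive/...), moving thegpm-cgi into cgi-bin breaks those paths. A small hypothetical Perl check can confirm the layout from a given CGI directory; it assumes only the ../tandem and ../tandem/archive paths seen in this document, and the real scripts may rely on others.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical layout check: resolve the relative paths the GPM scripts
# are assumed to use from $cgi_dir and return the ones that are missing.
sub check_layout {
    my ($cgi_dir) = @_;
    my @missing;
    for my $rel ('../tandem', '../tandem/archive') {
        my $path = File::Spec->catdir($cgi_dir, $rel);
        push @missing, $rel unless -d $path;
    }
    return @missing;
}
```

Called with the path of a thegpm-cgi that was moved into cgi-bin, it would report both paths as missing, because cgi-bin/../tandem does not exist.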

I put a small note in the file where the changes were made.

Let me know if this works for you.

Thanks.

Hi,

Thanks very much for the help. The processing program works fine now.

I feel guilty bothering you with this, but since we've gotten this far I figured I'd mention one last problem I'm having. The final list printout (plist.pl parsing and printing the XML output file) hangs after printing a certain number of entries. After that the browser keeps saying 'fetching data', but nothing happens. The same thing works fine from the IIS server. Could be yet another Apache conf thing. If you've come across this before, please advise. If you're busy, please don't bother, you've helped enough!

Thanks a bunch for your time and attention.

Hi,

No problem. We are here to help.
Please send me the spectra file that is causing this problem and I will see if I can reproduce this behavior on my Linux machine. My email can accept files up to 10 MB, though an intermediate server may complain. If it is too big for you to send, could you put it on an FTP site from which I can download it?

Thanks.

Hi,

I think you are right about the timeout issue. I tried running this on my server and it works fine, though it does take a while, as this is a fairly large file. I have attached my httpd.conf file so you can take a peek at it. Hope this helps. Feel free to ask if it doesn't.

Good luck.

Hi,

Thanks very much for the help. I compared your conf file with mine and they are exactly the same except for my Aliases and ScriptAliases. It's really strange: I can run plist.pl from the command line with the same XML file, save the output as an HTML file and view it just fine in a browser. I can also process smaller spectra files from the browser without problems. So I'm at a loss to explain why this happens with large files. Anyway, it all works fine from the IIS server, so I can rely on that for now.

Thanks very much for your time, help and suggestions.

Hi,

It is possible that the version of Perl or CGI.pm on the Linux machine needs to be updated. The versions I am running are Perl 5.8.0 and CGI.pm 3.04. The newest version of CGI.pm can be found at http://search.cpan.org/~lds/CGI.pm-3.04
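To compare your installation against the versions quoted above, a short Perl check like the one below can help. It is a sketch, not part of the GPM. It loads CGI.pm at run time rather than with use, because CGI.pm may simply be missing (it was removed from the Perl core in 5.22).

```perl
use strict;
use warnings;

# Report the running Perl version and the installed CGI.pm version.
sub versions {
    my $cgi = eval { require CGI; CGI->VERSION } // 'not installed';
    return sprintf("perl %vd, CGI.pm %s", $^V, $cgi);
}

print versions(), "\n";
```

Run it with the same perl binary that Apache uses for CGI scripts (the one named on the #! line of thegpm.pl), since a machine can have more than one Perl installed.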

You could also try adding some print statements like the ones below. They need the flush_buffer() function as well, because Apache buffers your output until a certain number of bytes has accumulated.

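print "Got here - line 50<br>";
flush_buffer();

# Print 8,000 spaces (400 x 20) so the buffered output is pushed on to the browser.
# Uses $count rather than $a, which Perl reserves for sort.
sub flush_buffer {
	my $count = 0;
	while ($count < 400) {
		print "                    ";
		$count++;
	}
}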
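A complementary step, sketched below as an assumption rather than a known fix for this case, is to turn off Perl's own STDOUT buffering so each progress message leaves the script as soon as it is printed. Whether the text then appears in the browser immediately still depends on buffering in the web server, which is what the padding approach above works around. The progress helper is hypothetical.

```perl
use strict;
use warnings;
use IO::Handle;

# Flush STDOUT after every print instead of when Perl's buffer fills.
STDOUT->autoflush(1);

# Hypothetical helper for progress messages while debugging plist.pl.
sub progress {
    my ($where) = @_;
    return print "Got here - $where<br>\n";
}

print "Content-type: text/html\n\n";
progress("line 50");
```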

Hi,

Thanks very much for the suggestions. I'm using Perl 5.8, and I downloaded the latest version of CGI.pm from Lincoln Stein's website. It didn't say the version, so I'll need to check. I'll also flush the buffer as you suggested.

I was just looking at some of the code, and I don't know whether I'm looking in the right place, but I don't see anything for fetching data for Swiss-Prot entries. I see the PostEnsembl, PostNcbi, etc. subroutines in get_info.pl, but nothing for Swiss-Prot (the sp|XXXXX labels). Am I looking in the right place?

Just thought I'd ask. Please get back to me at your convenience.

Thanks

Hi,

Unfortunately, thegpm will not be supporting Swiss-Prot databases. They have proven to be a problem for us in the past. Good luck with your Perl issue!

Hi,

I guess it's the frequently changing formats that have kept you from supporting Swiss-Prot. But we have a lot of Swiss-Prot entries in our databases, so I added some code to the GPM suite to handle the Swiss-Prot/TrEMBL formats. It seems to be working fine for now. I'm sure you've been through this before and probably even have code for it, but if you want I can send you what I added. I just duplicated a few existing functions and changed the regexes to match the new format.
I still haven't figured out the Perl issue I had! But it seems to work for the smaller files, and most users still use the IIS web server, where everything works fine, so I haven't pursued it further.
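As an illustration of the kind of regex change described here (not the contributor's actual code, which is linked at the end of this case), the hypothetical Perl helper below pulls the database tag, accession and entry name out of a UniProt-style description such as sp|P69905|HBA_HUMAN. It assumes the pipe-delimited sp|/tr| layout; older or differently formatted Swiss-Prot and TrEMBL files may need another pattern.

```perl
use strict;
use warnings;

# Hypothetical parser for "sp|ACCESSION|ENTRY_NAME ..." or "tr|...|..."
# descriptions; returns undef when the line does not match that layout.
sub parse_uniprot_id {
    my ($desc) = @_;
    $desc =~ s/^>//;    # tolerate a leading FASTA '>'
    return undef unless $desc =~ /^(sp|tr)\|([^|\s]+)\|(\S+)/;
    return { db => $1, accession => $2, entry => $3 };
}
```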

Thanks very much for all the help. I really appreciate it.

Hi,

Glad to hear you got the Swiss-Prot entries working. If you would like to send your script modifications to us, I will take a look.

I haven't come up with a solution for the timeout problem, but I'm sure you are right in thinking it is a web server issue of some kind.

Hope things continue to go well with the GPM. I will let you know if we have any updates for you, as we are looking at April for our next release date. Feel free to contact us if there are any more questions or concerns.

Take care.

Hi,

Sorry for the delay in getting back to you. I'm sending you a text file [1] in which I noted all the modifications I made for Swiss-Prot (I test the modifications on my Linux box and then make the corresponding changes on our Windows server that hosts the site). They are not in order, since I noted them down as I made them, but all of them are there. I hope it is useful.

best regards

notes

[1] The Swiss-Prot code can be found here.


Copyright © 2004-2011, The Global Proteome Machine Organization