NAME
InSilicoSpectro - Open source Perl library for proteomics

INSILICOSPECTRO PROJECT DESCRIPTION
This is the description of the entire InSilicoSpectro project. The InSilicoSpectro.pm module itself is described further below, under MODULE DESCRIPTION.

InSilicoSpectro is an open-source proteomics project intended to cover common operations: mass list file format conversions, protein sequence digestion, theoretical mass spectrum computations, matching of theoretical and experimental MS data, text/graphic display, peptide retention time prediction, etc. Raw data processing, storage, and database searching are not addressed by the InSilicoSpectro project.

InSilicoSpectro is released under the LGPL and is available from a dedicated web site at http://insilicospectro.vital-it.ch.

The modules generally follow the object-oriented programming (OOP) model, and most of them are in fact class definitions. The module that implements most of the theoretical mass computation routines supports both an OOP and a procedural programming model.

InSilicoSpectro modules use some Perl modules that are not part of the standard Perl distribution, such as Statistics::Regression, XML::Twig, GD, and AI::NNFlex.

We have developed a simple, minimal hierarchy to represent protein sequences and peptides (as digestion products) in a way that, on the one hand, fits the needs of the computations we perform and, on the other hand, stays relatively neutral in its design. It should therefore be possible to combine these classes with existing projects at users' sites, e.g. via multiple inheritance, or to use them as the basis of more sophisticated objects.

InSilicoSpectro Perl code is documented mainly via pod and a wide collection of simple, focused examples. The introduction below is meant to guide new users and to give them enough understanding of the library that the pod and the examples should be the only documentation they need.
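Because these modules are not part of the core distribution, it can help to check that they load before going further. The following is a minimal sketch using only core Perl; the module list is copied from the paragraph above, and the helper name missing_modules is hypothetical, not part of InSilicoSpectro.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Non-core modules named in the project description (versions are not checked).
my @deps = qw(Statistics::Regression XML::Twig GD AI::NNFlex);

# Hypothetical helper: return the names of the modules that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $mod (@_) {
        # require needs a bareword (or a file path), so a string eval is used
        # to load a module whose name is held in a variable.
        push @missing, $mod unless eval "require $mod; 1";
    }
    return @missing;
}

my @missing = missing_modules(@deps);
if (@missing) {
    print "Missing modules: @missing\n";
}
else {
    print "All listed modules can be loaded\n";
}
```

Any module reported as missing can then be installed from CPAN before installing InSilicoSpectro.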
Installation

Library organization
InSilicoSpectro modules (lib/InSilicoSpectro) are organized according to their function. At the most general level there is a module named InSilicoSpectro.pm (this one) that provides general functionality for initializing all other modules. More specialized modules are grouped in three folders:
- Spectra, for mass list-related modules;
- InSilico, for computational modules;
- Utils, for a few utility modules.

In addition, illustrative examples can be found in three folders:
- scripts, which contains a set of tools implemented with InSilicoSpectro modules;
- cgi, which contains scripts implementing a simple web-based set of tools;
- t, which contains test programs that also serve as examples.

Below, we go through the main topics covered by InSilicoSpectro one after another and introduce the main modules and examples that users should try and look at to become autonomous with the whole library.

Mass list file format conversion
A general-purpose conversion program, convertSpectra.pl in folder scripts, converts one mass list format into another. A CGI version exists in the cgi folder: cgiConvertSpectra.pl. convertSpectra.pl is a good starting point to see high-level usage of the basic methods implemented in the underlying modules.

InSilicoSpectro::Spectra::ExpSpectrum is the basic class for representing spectra, i.e. a list of peaks (namely, a list of references to peaks). Peaks are represented as lists of attributes such as mass, intensity, SN, etc. The order of the attributes in these lists is given by an object of class InSilicoSpectro::Spectra::PeakDescriptor. See t/Spectra/testExpSpectrum.pl and t/Spectra/testPeakDescriptor.pl.

The classes InSilicoSpectro::Spectra::MSSpectra, InSilicoSpectro::Spectra::MSMSSpectra, InSilicoSpectro::Spectra::MSMSCmpd, and InSilicoSpectro::Spectra::MSRun represent PMF (MS) spectra, MS/MS spectra, and HPLC runs. See t/Spectra/testSpectra.pl.
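The peak representation described above, a spectrum as a list of references to peaks whose attribute order is given by a separate descriptor, can be illustrated with plain Perl data structures. This is only a sketch of the idea: it does not use the InSilicoSpectro::Spectra::ExpSpectrum or PeakDescriptor API, the helper peak_attr is hypothetical, and the peak values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a peak descriptor: an ordered list of attribute
# names (not the InSilicoSpectro::Spectra::PeakDescriptor interface).
my @descriptor = qw(mass intensity SN);
my %index;
@index{@descriptor} = 0 .. $#descriptor;    # attribute name -> column position

# A spectrum is a list of references to peaks; each peak lists its attribute
# values in descriptor order. The values are illustrative only.
my @spectrum = (
    [  842.51, 1200, 15.2 ],
    [ 1045.56,  800,  9.7 ],
    [ 2211.10,  300,  4.1 ],
);

# Hypothetical helper: fetch one attribute of a peak by name.
sub peak_attr {
    my ($peak, $name) = @_;
    die "unknown attribute '$name'\n" unless exists $index{$name};
    return $peak->[ $index{$name} ];
}

# Masses of the peaks whose intensity exceeds 500.
my @strong = map  { peak_attr($_, 'mass') }
             grep { peak_attr($_, 'intensity') > 500 } @spectrum;
print "@strong\n";    # prints "842.51 1045.56"
```

Keeping the attribute order in a single descriptor lets every peak stay a compact array while code still refers to attributes by name.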
Utils
The module InSilicoSpectro::Utils::IO contains miscellaneous utilities for accessing compressed files, defining a common verbose variable, etc.

pI estimation
scripts/computePI.pl is a tool that exemplifies the usage of the class InSilicoSpectro::InSilico::IsoelPoint. Examples of how to use it can be found in t/InSilico/examples_rt_pi. See also the example in t/InSilico/testIsoelPoint.pl. A CGI version of computePI.pl can be found in the cgi folder.

Retention time prediction
scripts/computeRT.pl is a tool that exemplifies the usage of the class InSilicoSpectro::InSilico::RetentionTimer. Examples of how to use it can be found in t/InSilico/examples_rt_pi. See also the examples in t/InSilico/testPetritis.pl and t/InSilico/testHodges.pl. A CGI version of computeRT.pl can be found in the cgi folder.

Enzymes
Enzymes are modeled by the class InSilicoSpectro::InSilico::CleavEnzyme. See t/InSilico/testCleavEnzyme.pl.

PTMs and other modifications
Residue modifications are modeled by the class InSilicoSpectro::InSilico::ModRes. See t/InSilico/testModRes.pl.

Protein and peptide sequences
The basic class for biological sequences is InSilicoSpectro::InSilico::Sequence. We then define InSilicoSpectro::InSilico::AASequence to represent protein sequences together with their modifications. The class InSilicoSpectro::InSilico::Peptide is used for enzymatic digestion products, since these need special data that are not part of a standard protein model. Examples can be found in t/InSilico: testSequence.pl, testAASequence.pl, testPeptide.pl.

Protein digestion and mass computations
The main module for digestion and mass computations is InSilicoSpectro::InSilico::MassCalculator. Examples of digestion and protein/peptide mass computations, including in the presence of fixed/variable modifications, are found in t/InSilico: testCalcDigest.pl, testCalcDigestOOP.pl, and testCalcVarpept.pl. The OOP suffix marks an example written with the OOP model, since MassCalculator supports both an OOP and a procedural interface.
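To illustrate what digestion and mass computation involve, here is a self-contained sketch in plain Perl. It assumes the common simplified trypsin rule (cleave after K or R, but not before P) and standard monoisotopic residue masses. The names digest and peptide_mass are hypothetical and do not reflect the interfaces of InSilicoSpectro::InSilico::CleavEnzyme or MassCalculator, which handle many more enzymes, modifications, and options.

```perl
use strict;
use warnings;

# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
my %RESIDUE = (
    G => 57.02146,  A => 71.03711,  S => 87.03203,  P => 97.05276,
    V => 99.06841,  T => 101.04768, C => 103.00919, L => 113.08406,
    I => 113.08406, N => 114.04293, D => 115.02694, Q => 128.05858,
    K => 128.09496, E => 129.04259, M => 131.04049, H => 137.05891,
    F => 147.06841, R => 156.10111, Y => 163.06333, W => 186.07931,
);
my $WATER  = 18.010565;    # H on the N terminus + OH on the C terminus
my $PROTON = 1.007276;

# Hypothetical sketch of trypsin-like digestion: cleave after K or R unless
# the next residue is P; also report up to $missed missed cleavages.
sub digest {
    my ($seq, $missed) = @_;
    $missed //= 0;
    # Zero-width split: lookbehind/lookahead mark the cleavage sites
    # without consuming any residue.
    my @frags = split /(?<=[KR])(?!P)/, $seq;
    my @peptides;
    for my $i (0 .. $#frags) {
        for my $j ($i .. $i + $missed) {
            last if $j > $#frags;
            push @peptides, join '', @frags[ $i .. $j ];
        }
    }
    return @peptides;
}

# Hypothetical sketch: neutral monoisotopic mass, with optional fixed
# modifications given as residue => mass shift.
sub peptide_mass {
    my ($pep, $fixed) = @_;
    $fixed //= {};
    my $m = $WATER;
    for my $aa (split //, $pep) {
        die "unknown residue '$aa'\n" unless exists $RESIDUE{$aa};
        $m += $RESIDUE{$aa} + ($fixed->{$aa} // 0);
    }
    return $m;
}

for my $pep (digest('AKPRGKLR', 1)) {
    # [M+H]+ = neutral mass + one proton
    printf "%-8s %10.4f\n", $pep, peptide_mass($pep) + $PROTON;
}
```

With one missed cleavage allowed, digest('AKPRGKLR', 1) yields AKPR, AKPRGK, GK, GKLR and LR. A fixed modification such as carbamidomethylation of cysteine can be passed as peptide_mass($pep, { C => 57.021464 }).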
PMF
The matching between theoretical peptide masses and experimental PMF data is performed by functions in InSilicoSpectro::InSilico::MassCalculator. In the OOP model, PMF matches can be represented as objects of class InSilicoSpectro::InSilico::PMFMatch. See t/InSilico/testCalcPMFMatch.pl and t/InSilico/testCalcPMFMatchOOP.pl.

Peptide fragmentation
Theoretical fragment masses are computed by functions in InSilicoSpectro::InSilico::MassCalculator. In the OOP model, theoretical MS/MS spectra can be represented as objects of class InSilicoSpectro::InSilico::MSMSTheoSpectrum, which in turn represents the various ion series as InSilicoSpectro::InSilico::InternIonSeries and InSilicoSpectro::InSilico::TermIonSeries objects. The matching between experimental and theoretical masses is also computed by InSilicoSpectro::InSilico::MassCalculator; in the OOP model, InSilicoSpectro::InSilico::MSMSTheoSpectrum can store the match in addition to the theoretical spectrum. See in t/InSilico: testCalcFrag.pl, testCalcFragOOP.pl, testCalcMatch.pl, testCalcMatchOOP.pl, getIonIntensities.pl, ionStat.R.

Graphical display of MS/MS spectra/matches
The class InSilicoSpectro::InSilico::MSMSOutput instantiates objects that render MS/MS spectra and matches in various formats. See in t/InSilico: testMSMSOutText.pl, testMSMSOutLatex.pl, testMSMSOutHtml.pl, testMSMSOutPlot.pl, testMSMSOutLegend.pl.

Mini web site
In folder miniweb we provide a Perl script, build-miniweb.pl, that builds from the CGI scripts in folder cgi a simple web site for protein digestion, mass computations, and pI and retention time estimation.

MODULE DESCRIPTION
The module InSilicoSpectro.pm comprises generic functions that are useful for the whole project.

FUNCTIONS
saveInSilicoDef([$out])
Saves all registered definitions into the configuration file named $out, e.g. insilicodef.xml.

getInSilicoDefFiles()
Returns the list of configuration files given by the operating system environment variable whose name is stored in $InSilicoSpectro::DEF_FILENAME_ENV (default "INSILICOSPECTRO_DEFFILE"). The environment variable can point to more than one file (separated by ':') or be a glob ('...*...' expression).

init([@files])
Loads the configuration files given as parameters or, if none are given, the default configuration files returned by getInSilicoDefFiles.

SEE ALSO
InSilicoSpectro::InSilico, InSilicoSpectro::Spectra, InSilicoSpectro::Utils

COPYRIGHT
Copyright (C) 2004-2005 Geneva Bioinformatics (www.genebio.com) & Jacques Colinge (Upper Austria University of Applied Science at Hagenberg)

This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.

This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

AUTHORS
Jacques Colinge, www.fhs-hagenberg.ac.at
Alexandre Masselot, www.genebio.com