'''Study Title''' Proteomic Characterization of Cell Lines 

This page: [https://nciphub.org/groups/nci_physci/wiki/PSON0008]

'''Study Contact'''  Parag Mallick; email: paragm@stanford.edu

'''Overview'''

The Janmey lab at the University of Pennsylvania grew each of the 9
cell lines on each of the 7 growth matrices, for a total of 63
conditions. For each condition, approximately 9 million cells were
initially plated (across 6-9 plates) and grown for 24 hours. The cells
were then isolated, pelleted, flash frozen and transferred to Stanford
for proteomic analysis.
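The sketch below enumerates the study design and checks its arithmetic: every cell line crossed with every matrix gives 9 × 7 = 63 conditions. The cell line and matrix labels are hypothetical placeholders, because the actual names are not listed on this page.

```python
from itertools import product

# Hypothetical placeholder labels: the real cell line and matrix names are not given here.
CELL_LINES = [f"cell_line_{i}" for i in range(1, 10)]  # 9 cell lines
MATRICES = [f"matrix_{j}" for j in range(1, 8)]        # 7 growth matrices


def build_conditions(cell_lines, matrices):
    """Return every (cell line, matrix) pair; each pair is one growth condition."""
    return list(product(cell_lines, matrices))


conditions = build_conditions(CELL_LINES, MATRICES)
print(len(conditions))  # 63
```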
	
Protein was extracted from the pellets, and a small portion of the
extract was precipitated and used for QC analysis. Specifically,
protein amount was estimated by !MicroBCA, and a Coomassie-stained gel
was run to verify that there was no significant contamination or
degradation. Allocations for each of the downstream assays were then
determined and the protein was aliquoted accordingly. Each aliquot was
precipitated and then directed to its assay (iBAQ, !Phospho, TMT), as
detailed below.

'''iBAQ Analysis Description'''

The goal of the iBAQ analysis was twofold. First, we wanted to verify
that each sample was of sufficient quality to support broader
proteomic analysis. Second, we wanted to generate data that could be
used for either spectral count or iBAQ analysis to support absolute
quantification.
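For readers unfamiliar with iBAQ, the value for a protein is its summed peptide intensity divided by the number of theoretically observable peptides. The sketch below assumes a simple in silico trypsin digest (cleave after K or R unless followed by P, no missed cleavages) and an observable length window of 6 to 30 residues; both are common choices but assumptions here, not settings taken from this study.

```python
import re

# Assumption: trypsin cleaves C-terminal to K or R except before P; no missed cleavages.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_peptides(sequence):
    """In silico tryptic digest of a protein sequence."""
    return [p for p in _TRYPSIN_SITE.split(sequence) if p]


def observable_peptide_count(sequence, min_len=6, max_len=30):
    """Count tryptic peptides whose length falls in the assumed observable window."""
    return sum(1 for p in tryptic_peptides(sequence) if min_len <= len(p) <= max_len)


def ibaq(peptide_intensities, sequence, min_len=6, max_len=30):
    """iBAQ = summed peptide intensity / number of theoretically observable peptides."""
    n = observable_peptide_count(sequence, min_len, max_len)
    if n == 0:
        return None  # no observable peptides, so iBAQ is undefined
    return sum(peptide_intensities) / n
```

For example, `ibaq([100.0, 300.0], "AAAAAAKCCCCCCRPDDDDDDKEE")` returns 200.0: the digest yields AAAAAAK, CCCCCCRPDDDDDDK and EE, and only the first two fall in the 6 to 30 residue window.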

'''iBAQ Sample Preparation and Mass Spectrometric Analysis Workflow'''

Our overall iBAQ process begins with the precipitated protein
generated as described above. This precipitate was resuspended and
then split into three aliquots. To facilitate iBAQ analysis, UPS2, a
standard protein mixture (Sigma), was spiked into one of the aliquots,
which we denote plusUPS2. Because downstream analysis of the three
aliquots was performed independently, they serve as sample preparation
triplicates, which we refer to as preplicates. As shown on Page 3,
each preplicate was processed using our iBAQ protocol to generate
tryptic peptides.
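The page does not say how the plusUPS2 aliquot is used downstream. A common approach, sketched here as an assumption, is to fit a log-log line between the known amounts of the spiked UPS2 proteins and their measured iBAQ values, then apply that line to endogenous proteins. The amounts carry whatever unit you supply (for example fmol).

```python
import math


def fit_loglog(ibaq_values, known_amounts):
    """Least-squares fit of log10(known amount) = slope * log10(iBAQ) + intercept."""
    xs = [math.log10(v) for v in ibaq_values]
    ys = [math.log10(a) for a in known_amounts]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two UPS2 proteins to fit a line")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_amount(ibaq_value, slope, intercept):
    """Map an endogenous protein's iBAQ value to an absolute amount via the UPS2 fit."""
    return 10 ** (slope * math.log10(ibaq_value) + intercept)
```

The fit is done in log space because UPS2 spans several orders of magnitude; a linear-space fit would be dominated by the most abundant standards.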

Duplicate injections of each preplicate's peptides were analyzed on a
Thermo LTQ Orbitrap Velos, generating a family of approximately 6 .RAW
files per condition (3 preplicates × 2 injections).
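Because the family is only "approximately" 6 files, it can help to list the expected files and report any that are missing. The preplicate labels and the file-name pattern below are hypothetical; adapt them to the actual naming in the data release.

```python
# Hypothetical labels and file-name pattern; the real file names may differ.
IBAQ_PREPLICATES = ("prepA", "prepB", "plusUPS2")
INJECTIONS_PER_PREPLICATE = 2


def expected_ibaq_raw_files(condition):
    """List the RAW files expected for one condition: 3 preplicates x 2 injections."""
    return [
        f"{condition}_{prep}_inj{i}.RAW"
        for prep in IBAQ_PREPLICATES
        for i in range(1, INJECTIONS_PER_PREPLICATE + 1)
    ]


def missing_raw_files(condition, observed):
    """Return expected files that are absent from the observed list."""
    return sorted(set(expected_ibaq_raw_files(condition)) - set(observed))
```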

'''iBAQ Computational Analysis Workflow'''

Raw instrument files (*.raw) are converted to an intermediate file
format (mzXML) using !ProteoWizard msConvert
[http://proteowizard.sourceforge.net]. The mzXML files are then
uploaded to !LabKey Server [http://www.labkey.org], which uses a
standard Trans-Proteomic Pipeline (TPP) workflow to match MS data to
peptides and proteins from the Human2015.fasta FASTA database. Results
are stored as open-format pepXML and protXML files. We have
additionally used !LabKey Server to convert these results into a simple
.tsv file.
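The conversion step can be scripted. This sketch only builds and runs an msconvert call using its `--mzXML` and `-o` options; it assumes msconvert is installed and on PATH, and that msconvert keeps the input base name for the output file (its default behavior). The function names are hypothetical helpers, not part of ProteoWizard.

```python
import shutil
import subprocess
from pathlib import Path


def msconvert_command(raw_file, out_dir):
    """Build a ProteoWizard msconvert call that writes mzXML into out_dir."""
    return ["msconvert", str(raw_file), "--mzXML", "-o", str(out_dir)]


def convert_raw_to_mzxml(raw_file, out_dir):
    """Run msconvert and return the expected output path (assumes default naming)."""
    if shutil.which("msconvert") is None:
        raise FileNotFoundError("msconvert (ProteoWizard) was not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(msconvert_command(raw_file, out_dir), check=True)
    return Path(out_dir) / (Path(raw_file).stem + ".mzXML")
```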
					
Brief descriptions of the pepXML and protXML files are given below.
NOTE: !PepXML and !ProtXML files are generated only for the iBAQ analysis workflow.
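As an illustration of what a pepXML file holds, the sketch below pulls the rank-1 peptide hit for each spectrum using the standard library XML parser. It assumes the usual TPP element names (`spectrum_query`, `search_hit`, `peptideprophet_result`) and ignores namespaces so it works whether or not the pepXML namespace is declared. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Drop the namespace: '{http://...}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def top_hits_from_root(root):
    """Collect the rank-1 search_hit of every spectrum_query under a pepXML root element."""
    rows = []
    for query in root.iter():
        if _local(query.tag) != "spectrum_query":
            continue
        for hit in query.iter():
            if _local(hit.tag) != "search_hit" or hit.get("hit_rank") != "1":
                continue
            probability = None  # stays None if PeptideProphet was not run
            for el in hit.iter():
                if _local(el.tag) == "peptideprophet_result":
                    probability = float(el.get("probability"))
            rows.append({
                "spectrum": query.get("spectrum"),
                "peptide": hit.get("peptide"),
                "protein": hit.get("protein"),
                "probability": probability,
            })
    return rows


def top_pepxml_hits(path):
    """Parse a pepXML file from disk and return its rank-1 hits."""
    return top_hits_from_root(ET.parse(path).getroot())
```

For very large pepXML files, `ET.iterparse` would avoid loading the whole tree into memory; the full parse keeps this sketch short.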

'''Phospho Sample Preparation and Mass Spectrometric Analysis Workflow'''

Our overall phosphopeptide process begins with the precipitated protein
generated as described above. This precipitate was resuspended and
then split into between zero and three aliquots, depending on the
amount of sample available. As above, downstream analysis of these
aliquots was performed independently, so when more than one aliquot
was available they serve as sample preparation replicates, which we
refer to as preplicates. Each preplicate was processed using our
phospho-enrichment protocol to generate an enriched pool of
phosphorylated tryptic peptides. Briefly, between 250 and 750 µg of
protein was digested with trypsin. The resulting peptides were run
through a !TiO2 column to enrich for phosphopeptides and then through a
graphite column to clean and desalt them. Enriched phosphopeptides
were analyzed on a Thermo Orbitrap Fusion, generating a family of up
to 3 preplicate raw files per condition.

'''Phospho Computational Analysis Workflow'''

Phospho data were analyzed with Proteome Discoverer software
[http://www.thermoscientific.com/en/product/proteome-discoverer-software.html]
to match MS data to peptides and proteins from the
Uniprot_HUM_031914_cRAP.fasta FASTA file. Proteome Discoverer produces
an .msf (Magellan Storage File) file, which is converted to an easily
readable .tsv file. An .msf file is an SQLite database generated by
Proteome Discoverer; like pepXML and protXML, the format captures all
the relevant output of the Proteome Discoverer search.
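Since an .msf file is an SQLite database, Python's built-in `sqlite3` module can open it directly. The sketch below lists the tables and dumps one table to a .tsv file. It deliberately hard-codes no table names, because the schema differs between Proteome Discoverer versions; the helper names are hypothetical. The file is opened read-only so the original search result is never modified.

```python
import csv
import sqlite3
from contextlib import closing
from pathlib import Path


def _connect_read_only(msf_path):
    """Open an .msf (SQLite) file in read-only mode via an SQLite URI."""
    uri = Path(msf_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_msf_tables(msf_path):
    """List table names stored in the .msf file."""
    with closing(_connect_read_only(msf_path)) as con:
        rows = con.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
        return [name for (name,) in rows]


def export_table_tsv(msf_path, table, out_path):
    """Write one table to a tab-separated file with a header row."""
    if table not in list_msf_tables(msf_path):
        raise ValueError(f"no table named {table!r} in {msf_path}")
    quoted = '"' + table.replace('"', '""') + '"'  # SQL identifier quoting
    with closing(_connect_read_only(msf_path)) as con:
        cur = con.execute(f"SELECT * FROM {quoted}")
        with open(out_path, "w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(col[0] for col in cur.description)
            writer.writerows(cur)
```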

'''TMT Analysis Description'''

The goal of Tandem Mass Tag (TMT) analysis is to provide deep relative
quantification across conditions. Unlike the phospho and iBAQ studies,
where there are RAW files for each condition, each TMT RAW file spans
multiple substrates.

'''TMT Sample Preparation and Mass Spectrometric Analysis Workflow'''

Our overall TMT workflow begins with the precipitated protein generated
as described above. The precipitate for each condition was resuspended
and then split into 3 preplicate aliquots.

For each preplicate, each of the 7 substrate conditions is digested
into peptides and then labeled with a different TMT reagent. For
example, in the figure, substrate S1 is labeled with TMT-127, substrate
S2 with TMT-128, and so on.
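Together with the reference lysate and two internal replicates described in the next paragraph, the 7 labeled substrates fill all 10 channels of a TMT10plex. The sketch below encodes one possible channel layout and checks that every channel is used exactly once. The ten channel names are the standard TMT10plex reagents, but the sample-to-channel assignment is hypothetical: it was chosen only so that S1 and S2 fall on nominal 127 and 128 as in the example above.

```python
# Standard TMT10plex reagent channels.
TMT10_CHANNELS = ("126", "127N", "127C", "128N", "128C",
                  "129N", "129C", "130N", "130C", "131")

# Hypothetical sample-to-channel assignment for one preplicate.
EXAMPLE_LAYOUT = {
    "S1": "127N", "S2": "128N", "S3": "129N", "S4": "130N",
    "S5": "127C", "S6": "128C", "S7": "129C",
    "reference": "126",
    "internal_rep_1": "130C", "internal_rep_2": "131",
}


def validate_layout(layout, n_substrates=7):
    """Check that a sample->channel map uses each TMT10plex channel exactly once."""
    if sorted(layout.values()) != sorted(TMT10_CHANNELS):
        raise ValueError("layout must use each of the 10 TMT10plex channels exactly once")
    substrates = [name for name in layout if name.startswith("S")]
    if len(substrates) != n_substrates:
        raise ValueError(f"expected {n_substrates} substrate channels, found {len(substrates)}")
    return True
```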

Next, the 7 vials of labeled peptides are combined with a reference
lysate (a pool of 4 cell lines) and two internal replicates to make a
TMT 10-plex mixture. To achieve greater depth, the 10-plex mixture is
separated by high-pH reversed-phase fractionation into 8 reversed-phase
(RP) fractions. For cost and time reasons, these 8 fractions are then
recombined into 3 pooled samples: RP fractions 1, 4 and 7 form the
first pool; RP fractions 2, 5 and 8 form the second; and RP fractions
3 and 6 form the third. These pooled samples were analyzed on a Thermo
Orbitrap Fusion, generating three raw files per cell line preplicate.
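The recombination scheme above is a strided (concatenated) pooling: fraction f goes to pool ((f - 1) mod 3) + 1, which spreads early, middle and late fractions across every pool. The sketch reproduces the pools and the resulting raw file count per cell line (3 preplicates × 3 pools, derived from the text rather than stated in it).

```python
def concatenate_fractions(n_fractions=8, n_pools=3):
    """Pool RP fractions by stride: fraction f goes to pool ((f - 1) mod n_pools) + 1."""
    pools = {p: [] for p in range(1, n_pools + 1)}
    for f in range(1, n_fractions + 1):
        pools[(f - 1) % n_pools + 1].append(f)
    return pools


pools = concatenate_fractions()
print(pools)  # {1: [1, 4, 7], 2: [2, 5, 8], 3: [3, 6]}

n_preplicates = 3
raw_files_per_cell_line = n_preplicates * len(pools)  # 3 preplicates x 3 pools = 9
```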

'''TMT Computational Analysis Workflow'''

TMT data were analyzed with Proteome Discoverer software
[http://www.thermoscientific.com/en/product/proteome-discoverer-software.html]
to match MS data to peptides and proteins from the
Uniprot_HUM_031914_cRAP.fasta FASTA database. As for the phospho data,
Proteome Discoverer produces an .msf (Magellan Storage File) file, an
SQLite database capturing all the relevant output of the search, which
is converted to an easily readable .tsv file.
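The page does not describe how the reference lysate channel is used during quantification. A common approach, shown here only as an assumption, is to express each channel as a log2 ratio to the reference channel, so that values from different 10-plex runs share a common denominator. The reference channel name "126" is hypothetical.

```python
import math


def log2_ratios_to_reference(intensities, reference="126"):
    """Return log2(channel / reference) for every non-reference channel.

    Channels with a non-positive intensity, or a missing/non-positive reference, map to None.
    """
    ref = intensities.get(reference, 0.0)
    ratios = {}
    for channel, value in intensities.items():
        if channel == reference:
            continue
        ratios[channel] = math.log2(value / ref) if value > 0 and ref > 0 else None
    return ratios
```

For example, with a reference intensity of 100, a channel at 200 gives +1.0 and a channel at 50 gives -1.0.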


'''OTHER REFERENCES'''

Proteome Discoverer User Guide
[https://tools.thermofisher.com/content/sfs/manuals/Man-XCALI-97506-Proteome-Discoverer-14-User-ManXCALI97506-A-EN.pdf]

Quantification with Proteome Discoverer
[https://tools.thermofisher.com/content/sfs/manuals/Quantification-with-Proteome-Discoverer-1-2.pdf]

Descriptions of the Proteome Discoverer result columns for phospho samples can be found at:
[https://sites.psu.edu/msproteomics/tag/psm/]