Main
GStream is a method that combines SNP and CNV genotyping with unprecedented accuracy. This new method outperforms previous well-established SNP- and CNV-genotyping software for the Illumina platform.
With GStream, genetic researchers will have a fast and powerful tool to leverage Illumina genotyping microarrays and find new genetic variants and associations of interest.
The GStream method and its corresponding software were developed at the Grup de Recerca de Reumatologia (GRR), a research group at the Institut de Recerca de l'Hospital Universitari Vall d'Hebron (VHIR).
GStream software
Software that applies our genotyping method to a set of SNP probes can be freely downloaded from this link, either as precompiled binaries or as source code. To compile the code you need a C++ compiler, the standard libraries, and both the Armadillo and IT++ C++ libraries properly installed.
As input, the program uses a data file format that can be extracted directly from GenomeStudio projects (tab-separated columns):
| Name | Chr | Position | Samp1.X | Samp1.Y | Samp2.X | Samp2.Y | ... |
|---|---|---|---|---|---|---|---|
| rs2106940 | 7 | 88560825 | 0.335 | 0.385 | 0.578 | 0.010 | ... |
| rs2106943 | 7 | 24824508 | 0.030 | 0.915 | 0.508 | 0.509 | ... |
| rs2106949 | 16 | 18032061 | 1.489 | 0.117 | 1.549 | 0.207 | ... |
| rs2106966 | 7 | 117796542 | 1.032 | 0.063 | 0.650 | 0.778 | ... |
| ... | ... | ... | ... | ... | ... | ... | ... |
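
As an illustration of the layout above, the following is a minimal sketch of how such a file could be read in Python. It assumes the header shown (three annotation columns followed by one `.X`/`.Y` pair per sample); the function name `read_intensities` and the returned dictionary structure are hypothetical and not part of GStream.

```python
import csv
import io


def read_intensities(handle):
    """Parse a tab-separated intensity export into a dict keyed by probe name."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Columns after Name/Chr/Position are assumed to come in <sample>.X, <sample>.Y pairs.
    samples = [col[:-2] for col in header[3::2]]
    probes = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[3:]]
        pairs = list(zip(values[0::2], values[1::2]))
        probes[row[0]] = {
            "chr": row[1],
            "pos": int(row[2]),
            "xy": dict(zip(samples, pairs)),
        }
    return probes


# Two rows copied from the table above, joined with tabs.
example = (
    "Name\tChr\tPosition\tSamp1.X\tSamp1.Y\tSamp2.X\tSamp2.Y\n"
    "rs2106940\t7\t88560825\t0.335\t0.385\t0.578\t0.010\n"
    "rs2106943\t7\t24824508\t0.030\t0.915\t0.508\t0.509\n"
)
probes = read_intensities(io.StringIO(example))
print(probes["rs2106940"]["xy"]["Samp2"])
```

For a real export, pass an open file handle instead of the in-memory string.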

| Option | Description |
|---|---|
| --goutput <file> | Defines the SNP genotypes output file (default=GStream.snp) |
| --cnvoutput <file> | Defines the CNV scores output file (default=GStream.cnv) |
| --outliers <file> | Defines the sample outliers file |
| --og | Computes only SNP genotypes |
| --noz | Turns off homozygous deletion detection (enabled automatically by --og) |
| --gqt X | Sets the quality score threshold for genotyping. X is a floating-point number (default=0) |
| --stdcnp X | Minimum distance (in standard deviations) required to call amplifications/deletions in one-component mode. X is a floating-point number (default=8) |
| --res N | Sets the number of BAF bins used for density estimation. N is an integer (default=40) |
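
The options can be combined on one command line. The sketch below shows one possible way to assemble such a call from Python. The helper `gstream_command` and the file names `run1.snp` and `run1.cnv` are hypothetical; the `--input` flag is taken from the example case further down this page, and the executable name is assumed to be `gstream.exe`.

```python
def gstream_command(input_file, executable="gstream.exe", **options):
    """Build an argument list; keyword names map to the --flags listed above."""
    cmd = [executable, "--input", input_file]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            cmd.append(flag)  # switch without a value, e.g. --og or --noz
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])  # flag followed by its value
    return cmd


cmd = gstream_command("GStream.txt", goutput="run1.snp", cnvoutput="run1.cnv", gqt=0.5)
print(" ".join(cmd))

# To launch the run (requires the executable to be present):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.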
SNP genotypes output file
| SNP | CHR | BP | QC | Samp1 | Samp2 | ... |
|---|---|---|---|---|---|---|
| rs12905389 | 15 | 20071673 | 0.871 | 3 | 2 | ... |
| rs12909397 | 15 | 20071765 | 0.816 | 3 | 2 | ... |
| rs10163108 | 15 | 20151610 | 0.888 | 3 | 3 | ... |
SNP genotypes quality scores output file
| SNP | Samp1 | Samp2 | ... |
|---|---|---|---|
| rs12905389 | 0.993 | 0.966 | ... |
| rs12909397 | 0.950 | 0.885 | ... |
| rs10163108 | 0.998 | 0.995 | ... |
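
The two files above can be combined to discard individual calls after a run. The following is a minimal sketch, assuming both outputs are tab-separated with the headers shown; the helper names, the `NA` placeholder and the 0.9 cutoff are illustrative choices, not GStream conventions.

```python
import csv
import io


def read_table(handle, n_meta):
    """Read a tab-separated table into {snp: {sample: value}}, skipping n_meta leading columns."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[n_meta:]
    return {row[0]: dict(zip(samples, row[n_meta:])) for row in reader if row}


def mask_low_quality(genotypes, scores, threshold, missing="NA"):
    """Replace every call whose per-sample quality score is below threshold."""
    return {
        snp: {
            sample: call if float(scores[snp][sample]) >= threshold else missing
            for sample, call in calls.items()
        }
        for snp, calls in genotypes.items()
    }


# Rows copied from the two tables above.
snp_file = (
    "SNP\tCHR\tBP\tQC\tSamp1\tSamp2\n"
    "rs12905389\t15\t20071673\t0.871\t3\t2\n"
    "rs12909397\t15\t20071765\t0.816\t3\t2\n"
)
score_file = (
    "SNP\tSamp1\tSamp2\n"
    "rs12905389\t0.993\t0.966\n"
    "rs12909397\t0.950\t0.885\n"
)
genotypes = read_table(io.StringIO(snp_file), n_meta=4)  # SNP, CHR, BP, QC
scores = read_table(io.StringIO(score_file), n_meta=1)   # SNP
masked = mask_low_quality(genotypes, scores, threshold=0.9)
print(masked["rs12909397"])
```

With this cutoff only the Samp2 call of rs12909397 (score 0.885) is replaced by the placeholder.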
CNV scores output file
| ID | CHR | BP | NC | MaxRat | Samp1 | Samp2 | ... |
|---|---|---|---|---|---|---|---|
| rs12905389 | 15 | 20071673 | 1/1 | 0.559/1.181 | 1.91 | 2 | ... |
| rs12909397 | 15 | 20071765 | 2/2 | 0.612/0.679 | 1.777 | 2 | ... |
| rs10163108 | 15 | 20151610 | -1/1 | -1/1 | 1.915 | 2.141 | ... |
GWAS-based CNV SCAN on human traits
To identify putative causal CNVs, we analyzed the linkage disequilibrium patterns between all the trait-associated SNPs reported in the NHGRI catalog of published genome-wide association studies and the CNV microarray markers detected on the HumanOmni1-Quad platform. Trait-associated SNP genotypes were extracted from the 1000 Genomes Project (1KGP) data, and CNV genotypes were called with GStream. All HumanOmni1-Quad markers with a non-diploid frequency greater than 1% (CNV markers) were included in the analysis.
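
The 1% selection criterion can be sketched as follows. This is only an illustration: it assumes that a sample counts as non-diploid when its CNV score, rounded to the nearest integer, differs from 2, which may not match the exact rule used in the analysis. The function name and the second list of scores are hypothetical.

```python
def non_diploid_frequency(scores, diploid=2):
    """Fraction of samples whose rounded copy-number score differs from the diploid state."""
    # Assumption: the integer copy number is the score rounded to the nearest
    # integer (ties follow Python's round-half-to-even rule).
    calls = [round(score) for score in scores]
    return sum(call != diploid for call in calls) / len(calls)


def is_cnv_marker(scores, min_frequency=0.01):
    """Apply the 'non-diploid frequency greater than 1%' criterion to one marker."""
    return non_diploid_frequency(scores) > min_frequency


# Scores from the CNV table above: every sample rounds to 2 copies.
print(is_cnv_marker([1.91, 2, 1.777, 2]))

# Hypothetical marker in which two of six samples depart from 2 copies.
hypothetical = [1.91, 2.0, 2.14, 1.05, 2.02, 3.1]
freq = non_diploid_frequency(hypothetical)
print(is_cnv_marker(hypothetical))
```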
The following link provides access to regularly updated results of new CNV associations within known human risk loci identified with this method:
Downloads
| GStream for Windows | Download |
|---|---|
| GStream source | Download |
| Visualization script | Download |
Example case
Running GStream
An example dataset can be downloaded from this link. To run GStream with the default parameters, download the dataset to the same directory as the executable gstream.exe and run:
gstream.exe --input GStream.txt
Visualizing the output
The output files created by GStream, together with the input file, can be passed to the script plotGTCN.py to visualize the results. Because all these files share the same prefix (“GStream”), the option “-p GStream” selects them all:
plotGTCN.py -p GStream -d figures -f 0
which will create a plot per probe within the ./figures directory. The remaining options are available in the script help:
plotGTCN.py -h
Usage: plotGTCN.py [options]
This program generates SNP and CNV genotype plots from GStream output data.
Options:
| Option | Description |
|---|---|
| -h | show this help message and exit |
| -p PREF | prefix for intensity, SNP, CNV and log files |
| -i FXY | intensity file (required if prefix not provided) |
| -s FSNP | SNP genotype file (required if prefix not provided) |
| -c FCNV | CNV genotype file (required if prefix not provided) |
| -L FLOG | GStream log file (required if prefix not provided) |
| -d DIR | directory to save figures (required) |
| -f F | lower frequency threshold for plotting SNP probe (optional) |
| -A TA | score threshold for amplifications (optional) |
| -D TD | score threshold for deletions (optional) |