Individual Statistics Pipeline
Individual Statistics Pipeline
This repository provides a unified interface for performing SNP-level, window-based, and gene-centric association inference using knockoff-based or SCANG-based methods. It supports both uncorrelated and correlated samples, integrates SAIGE for mixed-model GWAS, and provides automated FDR control.
Supported Methods
| Method | Input Data | Sample Type | Description |
|---|---|---|---|
| KnockoffScreen | SNP genotypes | Uncorrelated | SNP-level and sliding-window inference |
| GeneScan3DKnock | SNP genotypes | Uncorrelated | Gene-centric inference via multiscale aggregation |
| BIGKnock | SNP genotypes (&GRM) | Correlated | Gene-centric inference with GLMM models |
Installation
1. Install SAIGE (with conda environment)
conda env create -f environment-RSAIGE.yml
conda activate RSAIGE
FLAGPATH=`which python | sed 's|/bin/python$||'`
export LDFLAGS="-L${FLAGPATH}/lib"
export CPPFLAGS="-I${FLAGPATH}/include"2. Install Dependencies
Rscript install_packages.RInput Requirements
Required Inputs
| Argument | Description |
|---|---|
--pheno_file | Phenotype file (CSV/TSV) containing ID column |
--geno_file | Genotype file prefix in PLINK bed/bim/fam format |
--phenotype | Column name of phenotype |
--genome_build | “hg19” or “hg38” |
--sample_uncorrelated | TRUE/FALSE |
Optional Inputs
| Argument | Description | Default |
|---|---|---|
--sliding_window_length | Comma-separated list of window sizes (in bp) for window-based inference. Only used when --test_type = "Single_Window". | "1000,5000,10000" |
--M | Number of knockoff copies. | 5 |
--geno_missing_imputation | Missing genotype imputation method (fixed or mean). | "fixed" |
--plink_path | Path to PLINK executable. | "plink" |
--genome_build | Genome build (hg19, hg38) used for annotation and window mapping. | "hg19" |
--sample_uncorrelated | Whether samples are uncorrelated (TRUE/FALSE). | TRUE |
--fdr | Target false discovery rate. | 0.1 |
--grm_file | GRM matrix (.grm + .grm.id) required only for correlated sample methods, e.g., BIGKnock. Ignored otherwise. | NULL |
--pheno_id | Column name of sample ID in phenotype file. Required when phenotype table contains ID-like columns. | NULL |
--covariates | Comma-separated covariate names. Optional for all methods; used only when covariates are included in association models. | NULL |
--user_cores | Number of CPU threads used. | 1 |
Pipeline Usage
SNP-level and window-based inference (Uncorrelated Samples)
Rscript pipeline.R \
--outdir "result/" \
--test_type "Single_Window" \
--pheno_file "$pheno_file" \
--grm_file "$grm_file" \
--geno_file "$geno_file" \
--phenotype "Y" \
--pheno_id "id" \
--covariates "X1" \
--user_cores 4 \
--sliding_window_length "1000,5000,10000" \
--geno_missing_imputation "fixed" \
--plink_path "plink" \
--M 5 \
--genome_build "hg19" \
--sample_uncorrelated TRUE \
--fdr 0.1Gene-Centric Inference (Uncorrelated Samples)
Rscript pipeline.R \
--outdir "result_gene/" \
--test_type "Gene_Centric_Coding" \
--pheno_file "$pheno_file" \
--geno_file "$geno_file" \
--phenotype "Y" \
--pheno_id "id" \
--covariates "X1,X2" \
--sliding_window_length "20000,50000" \
--genome_build "hg19" \
--sample_uncorrelated TRUE \
--M 10 \
--fdr 0.1Gene-Centric Inference (Correlated Samples)
If --grm_file is not provided, the pipeline will generate GRM using SAIGE automatically.
Rscript pipeline.R \
--outdir "result_bigknock/" \
--test_type "Gene_Centric_Coding" \
--pheno_file "$pheno_file" \
--grm_file "$grm_file" \
--grm_id_file "$grm_id_file" \
--geno_file "$geno_file" \
--phenotype "Y" \
--pheno_id "id" \
--covariates "age,sex,PC1,PC2" \
--genome_build "hg38" \
--sample_uncorrelated FALSE \
--M 5 \
--fdr 0.1Output Files
Each pipeline run outputs:
For SNP-level and Window-level inference
Single_Window_results_chr*.csv- Manhattan/Q–Q plots
For gene-centric inference
Gene_results_chr*.csv- Manhattan/Q–Q plots
Example Directory Structure
- genotype.bed
- genotype.bim
- genotype.fam
- phenotype.txt
- pipeline.R
- install_packages.R
- Single_results_chr1.csv
- Window_results_chr1.csv
- Gene_results_chr1.csv