Haplotype networks in R

Haplotype networks are useful for genomic analyses, especially when you’re looking at phylogeography (Shannon et al. 2015) or at a region of special interest (Huerta-Sánchez et al. 2014). A quick Google search will turn up a number of blog posts that show how to use pegas to create a haplotype network, but many of them use the provided wood mouse dataset, which is already nicely formatted. If you’re using whole-genome data (e.g. Illumina), it turns out to be a little more complicated than just importing a fasta file. In this post, I’ll show you how to get from a whole-genome VCF to a haplotype network.

Pegas requires a fasta file (there are other input methods, but I found fasta to be the most straightforward) containing your locus of interest (e.g. mitochondria), and every individual in it must have the same sequence length. The data I’m starting with is a whole-genome, all-sites VCF file. To extract the mitochondria, I wrote a Python script available here. To use it, supply a VCF file (gzipped or not is fine), an output filename, and a chromosome of interest:

python vcf2fasta.py -v individuals.vcf.gz -g -o mitochondria.fasta -c mitochondria

The script goes through the VCF file, appends the appropriate reference or alternate base for each individual at each site, and outputs a valid fasta file.
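To make the conversion step concrete, here is a minimal sketch of the idea. It is not the script linked above. It assumes haploid genotype calls (e.g. `0` or `1`), a tab-separated VCF with a `#CHROM` header line, and it writes `N` for missing calls and for alleles longer than one base. The function names `open_vcf`, `vcf_to_sequences`, and `write_fasta` are made up for this illustration.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzipped VCF in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def vcf_to_sequences(lines, chrom):
    """Build one sequence per sample from haploid calls on `chrom`."""
    samples = []
    bases = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            bases = {name: [] for name in samples}
            continue
        if fields[0] != chrom:
            continue
        ref, alt = fields[3], fields[4]
        # Allele 0 is the reference; alleles 1..n are the ALT entries.
        alleles = [ref] + ([] if alt == "." else alt.split(","))
        for name, entry in zip(samples, fields[9:]):
            call = entry.split(":")[0]  # GT is the first per-sample subfield
            base = "N"
            if call.isdigit() and int(call) < len(alleles):
                allele = alleles[int(call)]
                if len(allele) == 1:  # keep one character per site
                    base = allele
            bases[name].append(base)
    return {name: "".join(seq) for name, seq in bases.items()}


def write_fasta(seqs, handle):
    """Write sequences as fasta records, one sequence line per sample."""
    for name, seq in seqs.items():
        handle.write(f">{name}\n{seq}\n")
```

With these helpers, something like `write_fasta(vcf_to_sequences(open_vcf("individuals.vcf.gz"), "mitochondria"), open("mitochondria.fasta", "w"))` would produce the fasta file. Because this sketch emits exactly one character per site, it sidesteps indels entirely, which may or may not be what you want.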

In theory, we should be able to plug this file into pegas now: it’s a fasta file containing all individuals, and each individual should have the same mitochondrial sequence length, especially since the reads were already aligned to the reference. In practice, however, this is not the case. Some individuals will be missing bases, and sometimes the reference has odd issues, such as sites that show up twice. To fix this, I use clustal to align the fasta file again. Either clustal omega or clustalw should work, but clustal omega is supposed to be faster. Have clustal write its output as a fasta file, in which every individual has the same number of bases. Once this is done, we can finally use pegas to build a network.
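Before handing the aligned file to pegas, it can be worth confirming that every record really has the same length. The following is a small sketch of such a check; the helper names `read_fasta` and `is_aligned` are hypothetical and are not part of pegas or clustal.

```python
def read_fasta(lines):
    """Parse fasta lines into {record name: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # record name up to the first space
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def is_aligned(records):
    """True when all sequences (gap characters included) share one length."""
    return len({len(seq) for seq in records.values()}) <= 1
```

For example, `is_aligned(read_fasta(open("aligned.fasta")))` should return `True` for the clustal output, where `aligned.fasta` is a placeholder filename. If it returns `False`, printing `len(seq)` for each record shows which individuals are off.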

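library(ape)    # read.dna(), dist.dna()
library(pegas)  # haplotype(), haploNet()

input <- "MChloroplast.fasta"  # the aligned fasta file
d <- ape::read.dna(input, format = "fasta")
e <- ape::dist.dna(d)  # pairwise distances; not used by the plot below
h <- pegas::haplotype(d)
h <- sort(h, what = "label")
(net <- pegas::haploNet(h))
# Tabulate which individuals (rows of d) carry each haplotype.
# utils::stack is called explicitly so that raster::stack cannot mask it.
ind.hap <- with(
  utils::stack(setNames(attr(h, "index"), rownames(h))),
  table(hap = ind, pop = rownames(d)[values])
)
plot(net, size = attr(net, "freq"), scale.ratio = 0.2, pie = ind.hap)
legend(-8, 0, colnames(ind.hap), col = rainbow(ncol(ind.hap)), pch = 19, ncol = 2)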

If everything goes right, you should get something like this:

[Figure: the resulting haplotype network plot]


13 thoughts on “Haplotype networks in R”

  1. Min Huang

    Hello, I used your Python script, but it told me “No module named argparse”. Why? I need your help. Thank you very much!

  2. Min Huang

    Hi, I have installed the argparse package, but when I run the script I get this error at line 29: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?). I then changed ‘r’ to ‘rt’ and it ran, but the output contains only headers like “>XXH_XXH8-1” and no sequence. Why? Thanks.

  3. Hi,

    I’m running into problems with the python script. I phased my vcf file through ShapeIt and I’ve been using the following command:

    python vcf2FASTA.py -v Cadam.phased.vcf -o CadamAllPhasedSamples.fasta -c Cadamanteus_PLA2region_BAC_29M24

    Every time I try to run the script, I get this error message:

    Traceback (most recent call last):
      File "vcf2FASTA.py", line 52, in <module>
        base = alt_list[int(call)-1]
    ValueError: invalid literal for int() with base 10: '0|0'

    I’m not entirely familiar with coding, but I’m guessing that it’s rejecting my format. Otherwise, I’m not sure what I’m doing wrong.

    Any suggestions?

    1. Hi,

      Could you post a few lines of your VCF? I suspect this has to do with the fact that the script expects haploid calls (e.g. 0) instead of diploid calls (e.g. “0|0”). It should be straightforward to add in another line that splits the genotype call at “|”.
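
      A minimal sketch of that kind of change, assuming the genotype is the first colon-separated subfield of each sample column and that alleles are separated by “|” (phased) or “/” (unphased). The names `split_genotype` and `haplotype_bases` are hypothetical and do not come from the script itself, and the allele list here includes the reference at index 0, unlike the `alt_list` in the traceback above.

```python
import re


def split_genotype(gt):
    """Split a GT string such as '0', '0|1', or '1/1' into allele indices.

    Missing alleles ('.') are returned as None.
    """
    return [None if a == "." else int(a) for a in re.split(r"[|/]", gt)]


def haplotype_bases(sample_field, ref, alt):
    """Return one allele string per haplotype for a single sample column."""
    # Index 0 is the reference allele; 1..n are the comma-separated ALTs.
    alleles = [ref] + ([] if alt == "." else alt.split(","))
    gt = sample_field.split(":")[0]
    return ["N" if i is None else alleles[i] for i in split_genotype(gt)]
```

      For a phased call such as `0|1`, the first element belongs to the first haplotype and the second to the second haplotype, so each individual would contribute two fasta records. For unphased calls such as `0/1`, the order of the two alleles carries no meaning, so per-haplotype sequences built this way are only reliable for phased data.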

  4. marino

    Hi,
    I am having the same problem as abigelowsite. This is what my VCF file looks like for some individuals:
    GT:DP:DPR:RO:QR:AO:QA:GL 0/0:9:9,0:9:347:0:0:0,-2.70927,-31.6156 1/1:4:4,4:0:0:4:141:-13.0425,-1.20412,0 0/1:6:6,4:2:70:4:151:-12.1613,0,-4.84382

    It was not clear to me how to add in another line that splits the genotype call at “|”. It would be great if you could explain that in a bit more detail.
    Thank you so much!

  5. J

    Hi, thanks for the post! Is there a way to change the colors of the pie charts? I am using the col argument, but that doesn’t change them.

  6. Christian Anderson

    Note that Pegas is unable to create loops in networks even where it is appropriate to do so. For example:
    AAAAA
    AAAAT
    AAAAC
    should create a triangular network, but instead creates a V.

  7. José

    Thanks for the post, but I get this error that I do not fully understand

    > ind.hap<-with(
    + stack(setNames(attr(h, "index"), rownames(h))),
    + table(hap=ind, pop=rownames(d)[values])
    + )
    Error in (function (classes, fdef, mtable) :
    unable to find an inherited method for function ‘raster’ for signature ‘"integer"’

  8. Dargwin

    Hi, I get the following error:

    unable to find an inherited method for function ‘raster’ for signature ‘"integer"’

    In the step:

    stack(setNames(attr(h, "index"), rownames(h))),
    table(hap=ind, pop=rownames(d)[values])
    )

    I suspect that this is because I also have the raster package installed (but not loaded).

    Does anybody know how I can fix this error?

    Thanks!

  9. Zach

    Hey Arun,

    This is a great little script. It was extremely easy to use. Thank you so much for posting it. I’m using the figure in an upcoming paper, and I would like to give you credit.

    Did you use this script in a publication or a thesis/dissertation or something, so I can cite it? I’d feel a lot better if I knew you were able to get credit for it.

    Good luck with your work.

    Zach
