Genomes
The Genomes controlled vocabulary contains the name, description, and chromosome names and sizes of each genome assembly, e.g., hg19.
DeepBlue initially contains the hg19 assembly, but new genome assemblies can be inserted using the add_genome command.
The following code example shows how to insert a new genome assembly with its chromosome names and sizes.
The variable genome_data contains text with one chromosome name and its size on each line.
genome_data = """chr1 1000000
chr2 900000
chr3 500000
chrX 100000"""
print server.add_genome("hgX", "Example of Genome for the Manual",
                        genome_data, user_key)
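Since genome_data is free text, it can be worth checking it locally before sending it to the server. The following is a minimal sketch, assuming the whitespace-separated "name size" layout shown above; parse_genome_data is a hypothetical helper and not part of the DeepBlue API.

```python
def parse_genome_data(text):
    """Parse 'name size' lines into a list of (name, size) tuples.

    Hypothetical local helper; it does not contact DeepBlue.
    """
    chromosomes = []
    seen = set()
    for line_number, line in enumerate(text.splitlines(), 1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("line %d: expected 'name size', got %r"
                             % (line_number, line))
        name = fields[0]
        size = int(fields[1])  # raises ValueError if the size is not a number
        if size <= 0:
            raise ValueError("line %d: size must be positive" % line_number)
        if name in seen:
            raise ValueError("line %d: duplicate chromosome %s"
                             % (line_number, name))
        seen.add(name)
        chromosomes.append((name, size))
    return chromosomes


genome_data = """chr1 1000000
chr2 900000
chr3 500000
chrX 100000"""

chromosomes = parse_genome_data(genome_data)
print(chromosomes)
```

If the text is malformed, the helper raises ValueError with the offending line number, so the problem is caught before add_genome is called.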
Please be aware that not all users have permission to insert a new genome into DeepBlue.
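Because a call can be refused, for example for lack of permission, it may help to check the status of each response instead of only printing it. The examples later in this section unpack every response as a (status, value) pair; the sketch below additionally assumes that the status is the string "okay" on success, which this section does not state. The names unwrap and DeepBlueError are hypothetical.

```python
class DeepBlueError(Exception):
    """Raised when a response does not report success (hypothetical helper)."""


def unwrap(response, ok_status="okay"):
    """Return the value of a (status, value) response or raise on failure.

    ok_status is an assumption about the success marker used by the server.
    """
    status, value = response
    if status != ok_status:
        raise DeepBlueError(value)
    return value


# Stand-in responses for illustration; real ones would come from the server.
print(unwrap(["okay", "result"]))
try:
    unwrap(["error", "permission denied"])
except DeepBlueError as error:
    print("request failed: %s" % error)
```

With such a helper, a call like server.add_genome(...) could be wrapped as unwrap(server.add_genome(...)), so that a refused request stops the script instead of passing unnoticed.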
The chromosomes command lists the names and sizes of all chromosomes of a genome assembly:
print server.chromosomes("hgX", user_key)
The add_genome command also creates an annotation containing all chromosomes and their sizes.
The annotation has the same name as the genome.
The annotation can be found using the list_annotations command:
print server.list_annotations("hgX", user_key)
Use the select_annotations and get_regions commands to obtain the genome's annotation:
(s, chromosomes_annotation) = server.select_annotations("hgX", "hgX", None,
                                                        None, None, user_key)
(s, regions) = server.get_regions(chromosomes_annotation, "CHROMOSOME,START,END",
                                  user_key)
print regions
It will print:
chr1 0 1000000
chr2 0 900000
chr3 0 500000
chrX 0 100000
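The regions are returned as text, so they usually need to be parsed before further processing. A minimal sketch, assuming one region per line with whitespace-separated CHROMOSOME, START, and END columns as in the output above; parse_regions is a hypothetical helper, and the regions string below is a stand-in for the server response.

```python
def parse_regions(text):
    """Turn 'chromosome start end' lines into (chromosome, start, end) tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        chromosome, start, end = line.split()
        rows.append((chromosome, int(start), int(end)))
    return rows


# Stand-in for the text returned by get_regions in the example above.
regions = "chr1\t0\t1000000\nchr2\t0\t900000\nchr3\t0\t500000\nchrX\t0\t100000"

rows = parse_regions(regions)
genome_length = sum(end - start for _, start, end in rows)
print(genome_length)  # 2500000
```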
The select_annotations and get_regions commands will be explained in more detail in the Exploring the Data section.
Genomic Sequences
DeepBlue also stores the genomic sequences of the genome assemblies, which can be used for data filtering and analysis.
The chromosome sequences of the hg19 genome assembly are already included in DeepBlue.
The upload_chromosome command is used to upload the genomic sequences of other genome assemblies:
# Example sequence!
data = "ACTGACTGCG" * 100000
print server.upload_chromosome("hgX", "chr1", data, user_key)
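The example sequence has 10 * 100000 = 1000000 characters, which equals the size registered for chr1 of hgX. A local check before uploading could look like the sketch below. It assumes that a sequence should match the registered chromosome size and use only the letters A, C, G, T, and N; whether DeepBlue enforces either rule is not stated here, and check_sequence is a hypothetical helper.

```python
# Chromosome sizes as registered for the example genome hgX.
chromosome_sizes = {"chr1": 1000000, "chr2": 900000,
                    "chr3": 500000, "chrX": 100000}


def check_sequence(name, sequence, sizes, alphabet="ACGTN"):
    """Return a list of problems found in a sequence (empty if none).

    The alphabet and the length rule are assumptions, not DeepBlue rules.
    """
    problems = []
    expected = sizes.get(name)
    if expected is None:
        problems.append("unknown chromosome: %s" % name)
    elif len(sequence) != expected:
        problems.append("length %d does not match size %d"
                        % (len(sequence), expected))
    unexpected = set(sequence.upper()) - set(alphabet)
    if unexpected:
        problems.append("unexpected characters: %s"
                        % "".join(sorted(unexpected)))
    return problems


data = "ACTGACTGCG" * 100000
problems = check_sequence("chr1", data, chromosome_sizes)
print(problems)  # []
```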
The Working with Sequences section discusses how to access and use the genomic sequences.
A list of all commands available for the Genomes controlled vocabulary can be found at DeepBlue API - Inserting and listing Genomes.