Aggregation

DeepBlue can aggregate groups of regions into larger regions. The aggregate command requires three parameters: query_data_id, query_regions_id, and the name of the data column that will be used as the aggregation pivot.
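For example, using the query ids created further down in this page's example, an aggregation call looks like this (a minimal sketch; data_id and regions_id are query ids returned by select_regions and tiling_regions):

# data_id: query with the regions to be aggregated (query_data_id)
# regions_id: query with the aggregation boundaries (query_regions_id)
# "SCORE": the column whose values are summarized in each aggregation region
(status, aggr_id) = server.aggregate(data_id, regions_id, "SCORE", user_key)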

The aggregation results can be retrieved through the following aggregation metafields:

Aggregation metafield   Description
@AGG.MIN                Minimum value of the aggregated regions
@AGG.MAX                Maximum value of the aggregated regions
@AGG.MEDIAN             Median value of the aggregated regions
@AGG.MEAN               Mean value of the aggregated regions
@AGG.VAR                Variance of the aggregated regions
@AGG.SD                 Standard deviation of the aggregated regions
@AGG.COUNT              Count of the aggregated regions
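The metafields are requested like regular columns in the get_regions output format, for instance (a minimal sketch; aggr_id is a query id returned by the aggregate command):

# Request aggregation metafields together with the genomic coordinates
columns = "CHROMOSOME,START,END,@AGG.MEAN,@AGG.COUNT"
(status, request_id) = server.get_regions(aggr_id, columns, user_key)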

Please remember that DeepBlue does not aggregate regions that cross aggregation region boundaries: only the regions that are fully overlapped by a region of query_regions_id are aggregated.

In the following example, we aggregate the retrieved data into tiling regions of length 100000. At the end, we remove the aggregated regions that do not contain any region:

import xmlrpclib
import time

url = "http://deepblue.mpi-inf.mpg.de/xmlrpc"
server = xmlrpclib.Server(url, allow_none=True)
user_key = "anonymous_key"

# List the ENCODE RRBS DNA Methylation experiments for GM19240 on hg19
(status, experiments) = server.list_experiments("hg19", None,
                                                "DNA Methylation", "GM19240", "",
                                                "RRBS", "ENCODE", user_key)

# Select the chromosome 22 regions of all listed experiments
experiments_name = [e[1] for e in experiments]
(status, data_id) = server.select_regions(experiments_name, None, None,
                                          None, None, None,
                                          "chr22", None, None, user_key)

# Create tiling regions of 100000 bases over chromosome 22 of hg19
(status, regions_id) = server.tiling_regions(100000, "hg19", "chr22", user_key)

# Aggregate the selected regions over the tiling regions, using SCORE as the pivot
(status, aggr_id) = server.aggregate(data_id, regions_id, "SCORE", user_key)

# Keep only the aggregated regions that contain at least one region
(status, aggr_filter) = server.filter_regions(aggr_id, "@AGG.COUNT",
                                              ">", "0", "number", user_key)

# Request the aggregated regions together with the aggregation metafields
(status, request_id) = server.get_regions(aggr_filter,
                                          "CHROMOSOME,START,END,@AGG.MIN,@AGG.MAX,@AGG.MEDIAN,@AGG.MEAN,@AGG.SD,@AGG.COUNT",
                                          user_key)

# Poll the request status until the server finishes processing (or fails)
(status, info) = server.info(request_id, user_key)
request_status = info[0]["state"]
while request_status != "done" and request_status != "failed":
  time.sleep(1)
  (status, info) = server.info(request_id, user_key)
  request_status = info[0]["state"]
  print request_status


# Download the processed regions
(status, regions) = server.get_request_data(request_id, user_key)

print regions
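The regions are returned as plain text. Assuming one region per line and tab-separated columns in the same order as requested in get_regions, the values can be processed like this (a minimal sketch):

# Parse the result, assuming one region per line and tab-separated
# columns in the requested order
for line in regions.split("\n"):
  if not line:
    continue
  (chrom, start, end, agg_min, agg_max,
   agg_median, agg_mean, agg_sd, agg_count) = line.split("\t")
  print chrom, start, end, agg_mean, agg_count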
