The complete 1000 Genomes Project is now available on Amazon Web Services (AWS) as a public data set. AWS and the U.S. National Institutes of Health (NIH) made the announcement at the White House Big Data Summit at the end of March.
The announcement makes the largest collection of human genetic data accessible to researchers worldwide, completely free of charge. The project is an international research effort, coordinated by a consortium of 75 companies and organizations, to establish a detailed catalog of human genetic variation.
The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals, which researchers can now access on AWS for use in disease research. The 1000 Genomes Project aims to include the genomes of more than 2,600 individuals from 26 populations around the world. The NIH serves as one of the data coordinators for the project and will continue to add the remaining genome samples to the public data set.
Public Data Sets on AWS provide a centralized repository of public data that is stored, and can be analyzed, on remote servers (i.e., in “the cloud”). This eliminates the need for researchers to move the data in-house and then procure enough technology infrastructure to analyze it effectively. Although the data itself is free and can be downloaded, AWS computing services used to analyze it remotely may incur fees.
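
To give a rough sense of why moving the data in-house is a burden, the short Python sketch below estimates how long it would take to transfer the 200 terabytes over a network link. The link speeds are hypothetical values chosen for illustration, the function name `transfer_days` is ours, and the calculation assumes decimal terabytes and a fully sustained transfer rate with no protocol overhead, so real transfers would likely take longer.

```python
def transfer_days(size_terabytes: float, link_megabits_per_s: float) -> float:
    """Days needed to move size_terabytes over a link sustaining the given rate."""
    bits = size_terabytes * 1e12 * 8              # decimal terabytes -> bits
    seconds = bits / (link_megabits_per_s * 1e6)  # megabits per second -> bits per second
    return seconds / 86_400                       # seconds -> days


DATASET_TB = 200  # data set size quoted in the article

# Hypothetical sustained link speeds, for illustration only
for mbps in (100, 1_000, 10_000):
    print(f"{mbps:>6} Mbit/s: {transfer_days(DATASET_TB, mbps):7.1f} days")
```

Under these assumptions, a sustained 1 Gbit/s connection would need roughly 18.5 days for the transfer alone, before any storage or compute hardware is provisioned to analyze the data.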