The Pistoia Alliance Sequence Squeeze Competition

Congratulations!

James Bonfield of the Sanger Institute was the overall winner. Read full details on the Pistoia Blog.

The volume of next-generation sequencing data is a big problem. Data volumes are growing rapidly as sequencing technology improves. Individual runs are providing many more reads than before, and decreasing run times mean that more data can today be generated by a single machine in one day than a single machine could have produced in the whole of 2005.

Storing millions of reads and their quality scores uncompressed is impractical, yet current compression technologies are becoming inadequate. There is a need for a novel method of compressing sequence reads and their quality scores that preserves 100% of the information whilst achieving much-improved linear (or, even better, non-linear) compression ratios.
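The requirement to preserve 100% of the information can be checked with a simple round trip: compress, decompress, and compare byte for byte. The sketch below is illustrative only. It uses bz2 from the Python standard library as a stand-in general-purpose compressor (not any entrant's method), a made-up FASTQ-style record, and it assumes that a compression ratio means compressed size divided by original size, so lower is better.

```python
import bz2


def round_trip(data: bytes) -> tuple[bytes, bool]:
    """Compress and decompress `data`; report whether every byte survived."""
    compressed = bz2.compress(data)
    restored = bz2.decompress(compressed)
    return compressed, restored == data


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Compressed size / original size (assumed definition; lower is better)."""
    return len(compressed) / len(original)


# A made-up FASTQ-style record: header, bases, separator, quality scores.
record = b"@read1\nGATTACAGATTACA\n+\nIIIIHHHHGGGGFF\n"
sample = record * 1000  # highly repetitive, so it compresses unrealistically well

compressed, lossless = round_trip(sample)
print(f"lossless={lossless} ratio={compression_ratio(sample, compressed):.4f}")
```

Real sequencing data is far less repetitive than this sample, so the ratio printed here says nothing about what a general-purpose compressor achieves on actual reads.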

The Pistoia Alliance, in the interests of promoting pre-competitive collaboration, is putting forward a prize fund of US$15,000 for the best novel open-source NGS compression algorithm submitted before the closing date of 15 March 2012.

Follow us on Twitter: @SeqSqueeze

Current leaderboard

Times are in seconds; memory is the average total use in Kbytes. Both were measured using the time command with the %e and %K options. All valid entries are shown (valid = ran successfully and produced at least one correct, fully matching result).
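As a rough sketch of how such a measurement could be reproduced, the code below wraps a command in GNU time with a format string of %e (elapsed seconds) and %K (average total memory in Kbytes) and parses the result. The path /usr/bin/time, the helper names, and the example command are assumptions for illustration; the competition's actual harness is not described here.

```python
import subprocess


def build_timed_command(command, time_binary="/usr/bin/time"):
    """Prefix `command` with GNU time, reporting '%e %K' when it finishes."""
    return [time_binary, "-f", "%e %K", *command]


def parse_time_output(stderr_text):
    """Read 'elapsed_seconds average_kbytes' from the last line of stderr."""
    last_line = stderr_text.strip().splitlines()[-1]
    elapsed, kbytes = last_line.split()
    return float(elapsed), int(kbytes)


def measure(command):
    """Run `command` under GNU time and return (seconds, Kbytes).

    GNU time writes its report to stderr after the command's own stderr,
    so the report is taken from the final line.
    """
    result = subprocess.run(
        build_timed_command(command),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return parse_time_output(result.stderr)


if __name__ == "__main__":
    # Hypothetical usage; requires GNU time and bzip2 to be installed.
    print(build_timed_command(["bzip2", "-k", "reads.fastq"]))
```

Calling measure(["bzip2", "-k", "reads.fastq"]) would return the two figures for one run; the file name is a placeholder.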



Columns, in order: name, institute & entry ID (link to source); compression ratio; compression time; compression memory; decompression time; decompression memory; header mismatches; sequence mismatches; quality mismatches.
Matt Mahoney, Dell Inc.(96) 0.0287 905.79 5293280 394.63 5294704 0 0 24033620
Armando J. Pinho, IEETA / Universidade de Aveiro(78) 0.0536 3035.17 5200 3010.02 5200 16629547 16629548 16629548
Armando J. Pinho, IEETA / Universidade de Aveiro(79) 0.0536 3018.11 5200 3026.36 5200 16628724 16628724 16628724
Armando J. Pinho, IEETA / Universidade de Aveiro(69) 0.0546 3404.17 68986880 3389.73 68984480 16512460 16512461 16512461
James Bonfield, Sanger Institute(101) 0.1141 3280.24 13235808 325.25 4381552 0 0 0
James Bonfield, Sanger Institute(104) 0.1142 10299.99 13272736 342.79 4381552 0 0 0
James Bonfield, Wellcome Trust Sanger Institute(17) 0.1154 208.42 932976 293.05 932880 0 0 24030818
James Bonfield, Sanger Institute(97) 0.1162 3276.83 13240864 318.46 4186912 0 0 0
Matt Mahoney, Dell Inc.(99) 0.1166 1218.88 5398224 769.26 5399808 0 0 0
Matt Mahoney, Dell Inc.(90) 0.1183 1062.8 5463456 610.63 5464128 0 0 0
James Bonfield, Sanger Institute(86) 0.1245 11585.89 13239776 334.89 1635408 0 0 0
Matt Mahoney, Dell Inc.(85) 0.1254 930.87 5716624 653.64 5756576 0 0 0
Matt Mahoney, Dell Inc.(82) 0.1255 924.38 5738432 660.85 5769200 0 0 0
James Bonfield, Sanger Institute(62) 0.1273 10476.97 13231488 335.6 2063344 0 0 0
James Bonfield, Sanger Institute(60) 0.1273 22063.57 13516544 123.86 4198048 23834062 23834062 23834062
Davide Cittaro, Center for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute(34) 0.1323 3194.87 228512 5398.99 193584 0 0 24032852
Davide Cittaro, Center for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute(32) 0.1328 5020.04 123504 6250.11 124320 1319193 0 24032852
Armando J. Pinho, IEETA / Universidade de Aveiro(108) 0.1693 11820.43 21999248 11245.79 21998992 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(102) 0.1695 8107.2 17505408 8272.33 17505184 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(98) 0.1695 8123.18 33889264 8091.45 33889056 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(100) 0.1696 7814.66 17112192 7903.5 17112000 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(103) 0.1698 4819.01 16980272 4887.3 16980448 0 0 0
James Bonfield, Sanger Institute(72) 0.1709 525.41 22240640 600.29 22231504 0 0 0
Daniel Jones, University of Washington(105) 0.1718 306.67 2747120 333.76 1631600 0 0 0
Daniel Jones, University of Washington(92) 0.172 244.89 2472832 284.79 1638800 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(80) 0.1726 8184.44 8725168 8282.31 8724928 0 0 0
James Bonfield, Sanger Institute(52) 0.1727 429.94 21285824 483.35 21284640 0 0 0
Daniel Jones, University of Washington(64) 0.1729 392.08 2097280 484.24 1543568 0 0 0
James Bonfield, Sanger Institute(66) 0.1729 362.7 1937056 434.2 1935472 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(71) 0.173 8509.41 8725168 8573.91 8724944 0 0 0
Daniel Jones, University of Washington(47) 0.1743 250 1699328 495.06 1699104 0 0 0
Daniel Jones, University of Washington(91) 0.1743 154.58 1581120 291.66 1580960 0 0 0
James Bonfield, Sanger Institute(36) 0.1744 347.64 1877504 472.03 1877456 0 0 0
Daniel Jones, University of Washington(44) 0.1748 270.42 2080960 1.28 2206160 24040369 24040699 24040699
Daniel Jones, University of Washington(45) 0.1748 272.66 2015696 491.61 2121696 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(87) 0.175 4949.94 4325680 5101.93 4325440 0 0 0
Seth Hillbrand, Columbia University(30) 0.1754 741.79 3685616 726.04 3621744 0 0 0
Matt Mahoney, Dell Inc.(70) 0.1756 1149.7 5790848 1275.28 5937152 0 0 0
Daniel Jones, University of Washington(38) 0.1758 234.57 1866304 448.68 1958336 0 0 0
James Bonfield, Wellcome Trust Sanger Institute(16) 0.1762 228.55 933248 328.48 933200 0 0 0
Seth Hillbrand, Columbia University(25) 0.1767 792.24 1545808 739.96 1514176 0 0 0
Matt Mahoney, Dell Inc.(67) 0.177 1065.51 5863568 1203.09 5933024 0 0 0
James Bonfield, Sanger Institute(83) 0.177 112.69 244736 142.05 236032 0 0 0
Daniel Jones, University of Washington(37) 0.1774 207.7 838592 396.21 889680 0 0 0
James Bonfield, Sanger Institute(51) 0.1775 128.25 334800 150.26 333616 0 0 0
Seth Hillbrand, Columbia University(24) 0.1779 596.33 1423024 544.76 1391296 0 0 0
Seth Hillbrand, Columbia University(18) 0.1781 796.83 3750080 851.56 3718256 0 0 0
Daniel Jones, University of Washington(29) 0.1793 203.58 1063712 571.36 1276576 0 0 0
James Bonfield, Sanger Institute(61) 0.1797 109.9 75664 156.98 74480 0 0 0
Seth Hillbrand, Columbia University(14) 0.1797 859.58 1686176 827.45 1659056 0 0 0
Yongwook Choi, J. Craig Venter Institute(77) 0.1803 1445.22 2348160 1591.64 2348160 0 0 0
James Bonfield, Sanger Institute(35) 0.1803 164.52 98768 219.13 98704 0 0 0
Seth Hillbrand, Columbia University(13) 0.1813 794.66 2744944 765.04 2690816 0 0 0
James Bonfield, Wellcome Trust Sanger Institute(15) 0.1818 137.56 149888 187.31 149840 0 0 0
Seth Hillbrand, Columbia University(12) 0.1824 518.82 10897392 474.45 10639488 0 0 0
Daniel Jones, University of Washington(23) 0.1841 274.99 1010736 395.09 1147136 0 0 0
James Bonfield, Sanger Institute(33) 0.1847 135.95 35760 189.27 35680 0 0 0
James Bonfield, Sanger Institute (+denizens of encode.ru)(8) 0.1878 141.2 423360 164.19 411936 0 0 0
Armando J. Pinho, IEETA / Universidade de Aveiro(93) 0.1884 2322.17 68736 2394.99 68480 0 0 0
Ibrahim Numanagic, Simon Fraser Uni(107) 0.1993 691.24 25973392 414.91 6067872 153675 0 0
James Bonfield, Wellcome Trust Sanger Institute(7) 0.2206 188.38 36512 100.91 16016 0 0 0
Markus Krisch, i-novation.de(109) 0.2241 449.62 24651072 454.91 5526784 0 0 0
Inbal Landsberg, Or Peled, Barak Yacov, Yonatan Amir, Dan Benjamin and Ron Reiter, Interdisciplinary Center (IDC) Herzliya(95) 0.2289 5565.23 6594288 3850.37 5973312 198799 2403 2403
Ron Reiter, Interdisciplinary Center (IDC) Herzliya(9) 0.24 738.27 27664 351.21 16032 0 0 0
Ryan Braganza, Intersect Australia(39) 0.2563 6753.88 14272 296.78 54900656 23834062 23834062 23834062
Competition Baseline, SequenceSqueeze(6) 0.3007 1020.97 5200 104.5 5008 0 0 0
Ryan Braganza, Intersect Australia(28) 0.301 1299.66 15040 331.02 13472 0 0 0
Ryan Braganza, Intersect Australia(43) 0.3072 1595.34 16144 493.9 53856 0 0 0
Ryan Braganza, Intersect Australia(40) 0.3072 1555.38 18960 1120.29 59433232 23834062 23834062 23834062
Ryan Braganza, Intersect Australia(41) 0.3072 1540.58 17152 1118 59433792 23834062 23834062 23834062
Ryan Braganza, Intersect Australia(42) 0.3072 1550.46 18976 997.72 17120 23834062 23834062 23834062
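The rows above can be read programmatically. The following sketch parses a row into named fields and keeps only entries with zero header, sequence, and quality mismatches, then sorts them by compression ratio. The field names and helper functions are hypothetical, and sorting by ratio alone is an illustration, not the panel's judging method, which is not stated here.

```python
import re
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    entry_id: int
    ratio: float
    comp_time: float
    comp_mem: int
    decomp_time: float
    decomp_mem: int
    header_mm: int
    seq_mm: int
    qual_mm: int


def parse_row(line: str) -> Entry:
    """Split a leaderboard row into its label and eight numeric columns."""
    # The label contains spaces, so split from the right: 8 numeric fields.
    label, *fields = line.rsplit(None, 8)
    match = re.search(r"\((\d+)\)$", label)  # trailing "(entry ID)"
    if match is None:
        raise ValueError(f"no entry ID in {label!r}")
    ratio, c_time, c_mem, d_time, d_mem, h_mm, s_mm, q_mm = fields
    return Entry(
        label[: match.start()].strip(), int(match.group(1)),
        float(ratio), float(c_time), int(c_mem),
        float(d_time), int(d_mem), int(h_mm), int(s_mm), int(q_mm),
    )


def is_lossless(entry: Entry) -> bool:
    """True when headers, sequences, and qualities all matched exactly."""
    return entry.header_mm == 0 and entry.seq_mm == 0 and entry.qual_mm == 0


# Three rows copied from the table above.
rows = [
    "Matt Mahoney, Dell Inc.(96) 0.0287 905.79 5293280 394.63 5294704 0 0 24033620",
    "James Bonfield, Sanger Institute(101) 0.1141 3280.24 13235808 325.25 4381552 0 0 0",
    "Matt Mahoney, Dell Inc.(99) 0.1166 1218.88 5398224 769.26 5399808 0 0 0",
]

entries = [parse_row(row) for row in rows]
ranked = sorted((e for e in entries if is_lossless(e)), key=lambda e: e.ratio)
for e in ranked:
    print(e.entry_id, e.name, e.ratio)
```

In this three-row sample, entry 96 has the smallest ratio but is filtered out because its quality scores did not match.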

The judging panel

Yingrui Li is the Duty Operation Officer of the Science & Technology Department of BGI-Shenzhen. His research focuses on genomics and related bioinformatics analysis, and he was instrumental in building BGI's bioinformatics analysis platform for next-generation sequencing. He has organized or taken part in a number of national projects, including the "Yan Huang Project", the "1000 Genomes Project", "Yan Huang Whole Genome Methylation" and the "Cancer Genome Project". He has authored 28 high-impact papers, 11 of them as first or co-first author. Of those 28, 16 were published in Nature (including the Nature series) and Science.

One of the founders of the Pistoia Alliance and president of the Alliance since 2009, Dr. Nick Lynch works with the board on strategy and leads the operational team that focuses on project delivery and daily maintenance of the Alliance.

Guy Coates leads the Informatics Systems Group at the Wellcome Trust Sanger Institute. His team is responsible for designing and maintaining the high-performance compute (HPC) environment that supports the Institute's high-throughput DNA sequencing platforms. As well as traditional HPC, his group also has an interest in promoting cloud and grid technologies to enable investigators to manage and share their data effectively.

Tim Fennell is the Assistant Director for Sequencing Pipeline Informatics at the Broad Institute. His team developed and maintains the Java language bindings for the popular SAM and BAM file formats, as well as the Picard suite of tools for working with next-generation sequencing data. In his role at the Broad, Tim and his team are responsible for all aspects of production pipeline development and operations for the Broad's fleet of next-generation sequencing instruments, which are capable of producing several terabases of sequence per day.