The first immortal human cell line ever produced, HeLa, originated from a cervical adenocarcinoma taken from Henrietta Lacks. The cell line grew so well that it was used in many laboratories and soon was found to contaminate other cell lines. Now HeLa RNA has made its way into human sequence databases.
Although the cause of Henrietta Lacks’ cervical tumor was not known in her lifetime, we now understand that it was triggered by infection with human papillomavirus (HPV) type 18. When this virus infects the cervical epithelium, the viral DNA may integrate into the host genome, causing the cells to become transformed and eventually malignant. HeLa cells are known to contain integrated HPV18 DNA.
There are many different types of cancer, each caused by errors in DNA. The Cancer Genome Atlas (TCGA) is a database for collecting the DNA sequences of diverse cancers from many different individuals. It was established to help understand what mutations cause various types of cancer. As viruses are known to be responsible for about 20% of human cancers, searching this database for viral sequences can advance our understanding of their role in this disease. For example, almost every genome from patients with cervical cancer contains HPV DNA.
A recent search of the TCGA for viral sequences revealed that, in addition to cervical cancer, HPV18 sequences were found in many other cancers, including colon, head and neck, kidney, liver, lung, ovary, rectum, and stomach. The HPV18 sequences in non-cervical cancers resembled the viral sequence found in HeLa cells, both in integration site and in single nucleotide variations. In other words, the HPV18 in these cancers closely matches the viral genome integrated into HeLa cells, and its presence in these samples is most likely due to contamination.
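The logic of that comparison can be illustrated with a small Python sketch. It is not the authors' pipeline: the function names, the 90% threshold, the minimum-coverage rule, and every position and base below are invented for illustration. The only idea taken from the study is that HPV18 reads from a tumor sample are compared with the single nucleotide variations that characterize the HPV18 copy in HeLa cells.

```python
def snv_match_fraction(sample_calls, reference_variants):
    """Return the fraction of reference variant positions at which the
    sample carries the same base, counting only positions the sample covers.
    Returns None if the sample covers none of the positions."""
    covered = [pos for pos in reference_variants if pos in sample_calls]
    if not covered:
        return None
    matches = sum(
        1 for pos in covered if sample_calls[pos] == reference_variants[pos]
    )
    return matches / len(covered)


def looks_like_hela(sample_calls, hela_variants, threshold=0.9, min_covered=3):
    """Flag a sample whose HPV18 bases agree with the HeLa-specific variants.
    Both threshold and min_covered are arbitrary, illustrative choices."""
    covered = sum(1 for pos in hela_variants if pos in sample_calls)
    if covered < min_covered:
        return False  # too little overlap to make a call either way
    return snv_match_fraction(sample_calls, hela_variants) >= threshold


# Hypothetical toy data: HPV18 genome position -> base observed.
# These positions and bases are made up; they are NOT real HeLa variants.
hela_variants = {104: "T", 287: "G", 549: "C", 751: "A"}
tumor_a = {104: "T", 287: "G", 549: "C", 751: "A", 900: "G"}  # HeLa-like
tumor_b = {104: "C", 287: "G", 549: "T", 751: "G"}            # a different HPV18

for name, calls in [("tumor_a", tumor_a), ("tumor_b", tumor_b)]:
    print(name, snv_match_fraction(calls, hela_variants),
          looks_like_hela(calls, hela_variants))
```

In this toy example, tumor_a shares all four of the made-up HeLa variants and is flagged, while tumor_b shares only one and is not. A real analysis would also have to compare integration sites, as the study did, and account for sequencing errors and read depth.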
Further analysis revealed that the contaminated samples originated from only two genome sequencing centers: the University of North Carolina Lineberger Comprehensive Cancer Center and the Michael Smith Genome Sciences Centre of the British Columbia Cancer Agency. All of the contamination took place in 2011 and 2012 and was limited to 18 (6%) of the sequencing machines.
The contamination with HeLa nucleic acid was observed only in datasets derived from sequencing of RNA, not DNA. I asked the senior author Jim Pipas how he thought this contamination might have taken place:
‘I can think of two possibilities. One is that the RNA isolated from the tumor was somehow contaminated with HeLa sequences. The other is that HeLa cell RNA was sequenced on the same machine as the tumors and the contamination is from the sequencing machine itself.’
It is well known that nucleic acids can become contaminated during their manipulation in the laboratory. The use of sensitive techniques such as PCR and deep sequencing reveals contamination that previously went unnoticed. High-profile examples of nucleic acid contamination include the retrovirus XMRV, which was associated with chronic fatigue syndrome, and a virus believed to cause hepatitis (a contaminant from laboratory plasticware).
As virus discoverer Eric Delwart noted on TWiV 86, ‘DNA is a real problem. It’s everywhere’. Apparently so is HeLa cell RNA.