The first immortal human cell line ever produced, HeLa, originated from a cervical adenocarcinoma taken from Henrietta Lacks. The cell line grew so well that it was used in many laboratories and soon was found to contaminate other cell lines. Now HeLa RNA has made its way into human sequence databases.
Although the cause of Henrietta Lacks’ cervical tumor was not known in her lifetime, we now understand that it was triggered by infection with human papillomavirus (HPV) type 18. When this virus infects the cervical epithelium, the viral DNA may integrate into the host genome, causing the cells to become transformed and eventually malignant. HeLa cells are known to contain integrated HPV18 DNA.
There are many different types of cancer, each caused by errors in DNA. The Cancer Genome Atlas (TCGA) is a database for collecting the DNA sequences of diverse cancers from many different individuals. It was established to help understand what mutations cause various types of cancer. As viruses are known to be responsible for about 20% of human cancers, searching this database for viral sequences can advance our understanding of their role in this disease. For example, almost every genome from patients with cervical cancer contains HPV DNA.
A recent search of the TCGA for viral sequences revealed that, in addition to cervical cancer, HPV18 sequences were found in many other cancers, including colon, head and neck, kidney, liver, lung, ovary, rectum, and stomach. The HPV18 sequences in non-cervical cancers resembled the viral sequence found in HeLa cells, both in integration site and single nucleotide variations. In other words, the HPV18 in these cancers closely matches that of the viral genome integrated into HeLa cells, and their presence is likely due to contamination.
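The logic of matching single nucleotide variations can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the variant positions and bases, the function names, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
# Hypothetical (position, base) pairs standing in for HeLa-specific HPV18 variants.
# These are made-up values, not real HeLa data.
HELA_HPV18_VARIANTS = {(104, "T"), (287, "A"), (549, "C"), (751, "T")}


def hela_match_fraction(sample_variants, reference_variants=HELA_HPV18_VARIANTS):
    """Return the fraction of reference (HeLa-like) variants also seen in the sample."""
    if not reference_variants:
        raise ValueError("reference_variants must not be empty")
    shared = set(sample_variants) & set(reference_variants)
    return len(shared) / len(reference_variants)


def looks_like_hela(sample_variants, threshold=0.9):
    """Flag a sample whose HPV18 variants largely coincide with the HeLa-like set.

    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    return hela_match_fraction(sample_variants) >= threshold
```

A sample sharing nearly all of the HeLa-like variants would be flagged as a probable contaminant, while a sample carrying an unrelated HPV18 isolate would share few of them. A real analysis would also compare integration sites, as described above.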
Further analysis revealed that the contaminated samples originated from only two genome sequencing centers, the University of North Carolina Lineberger Comprehensive Cancer Center, and the Michael Smith Genome Sciences Centre of the British Columbia Cancer Agency. All the contamination took place in 2011 and 2012, and was limited to 18 (6%) of the sequencing machines.
The contamination with HeLa nucleic acid was observed only in datasets derived from sequencing of RNA, not DNA. I asked the senior author Jim Pipas how he thought this contamination might have taken place:
I can think of two possibilities. One is that the RNA isolated from the tumor was somehow contaminated with HeLa sequences. The other is that HeLa cell RNA was sequenced on the same machine as the tumors and the contamination is from the sequencing machine itself.
It is well known that nucleic acids can become contaminated during their manipulation in the laboratory. The use of sensitive techniques such as PCR and deep sequencing reveals contamination that previously went unnoticed. High-profile examples of nucleic acid contamination include the retrovirus XMRV associated with chronic fatigue syndrome, and a virus believed to cause hepatitis (a contaminant from laboratory plasticware).
As virus discoverer Eric Delwart noted on TWiV 86, ‘DNA is a real problem. It’s everywhere’. Apparently so is HeLa cell RNA.
Miguel Romero says
The contamination reported by Drs. Cantalupo, Katz, and Pipas in regard to HeLa nucleic acid and HPV18 appears to also apply to HIV. In 2014 I was able to find HIV-1 DNA in a surprisingly wide variety of unrelated taxa. Like Professor Racaniello, I assumed “their presence is likely due to contamination”. However, it is a mystery why I am not able to find any contamination with the far more prevalent hepatitis B and hepatitis C viruses (which are associated with HIV infections). My results can be found in reference 2 below.
Similar cases of contamination may occur when laboratories use “lab strains” such as JR-CSF or HXB2 to validate their PCR primers. Then, when they amplify and sequence virus from patients, some sequences turn out to be the “lab strain”. On the other hand, labs may harbor sequences from one patient contaminated by another patient. In situations such as these, it will not be possible to be certain about the true origin of the sequences.
1. Cantalupo PG, Katz JP, Pipas JM. HeLa nucleic acid contamination in The Cancer Genome Atlas leads to the misidentification of human papillomavirus 18. J Virol 2015; 89: 4051-7.
2. Romero Fernández-Bravo M. Contamination of genomic databases by HIV-1 and its possible consequences. A study in Bioinformatics. 2014. http://openaccess.uoc.edu/webapps/o2/handle/10609/31361