
What’s in a DNA database?

What’s a DNA database and how is it used?

The US has expanded its collection of DNA information. How do these systems work?

US Senator Ron Wyden is requesting that the US Department of Homeland Security and Department of Justice explain details about the collection and storage of DNA information from immigrants. Wyden showed data indicating DHS has stored information on samples from 133,000 migrants younger than 18 in the FBI’s CODIS database, which is used to identify suspects in violent crimes. What can and can’t you tell from this kind of information?

Back to the basics, what is DNA?

DNA is present in almost every cell in our body, and we leave cells behind all the time: dead skin, hair follicles, blood, saliva and other bodily fluids. If you’ve watched any CSI show, you’ll know that this “left behind” DNA is used by forensic scientists to identify people and human remains, and to establish paternity. It looks simple on TV but is more complex in real life!

DNA first needs to be extracted from the sample; generally 1 nanogram is enough. It is then amplified using PCR.

How do we use DNA to tell people apart?

The human genome is largely identical between individuals: we all have genes for bones, skin, etc. But there are small regions of variation (e.g. those affecting hair color or height) that differ from person to person. There are also non-coding regions of our DNA that are unique to each individual.

One specific type of these “non-coding regions” is the short tandem repeat, or STR, which contains a short sequence (3-4 nucleotides) repeated multiple times; the number of repeats differs between people. STRs are relatively easy to measure and compare between individuals. The FBI established the frequency at which each STR occurs across individuals, and the frequency within different ethnic backgrounds, which narrows down individual identities and creates “DNA fingerprints”. The probability of two people within the same ethnicity sharing the same DNA fingerprint is 1 in 575 trillion.
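To make the idea of "measuring" an STR concrete, here is a minimal sketch that counts how many times a motif repeats back-to-back in a stretch of sequence. The motif `GATA`, the flanking letters, and the repeat counts are invented for illustration, and real labs measure STRs by fragment length or sequencing rather than by string matching.

```python
import re

def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

# Hypothetical sequences: same flanking letters, different repeat counts.
person_a = "TTCG" + "GATA" * 7 + "CCAT"
person_b = "TTCG" + "GATA" * 11 + "CCAT"

print(longest_tandem_run(person_a, "GATA"))  # 7 repeats
print(longest_tandem_run(person_b, "GATA"))  # 11 repeats
```

The repeat count at each STR location is the piece of information that ends up in a DNA profile.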

To ID a suspect, for example from a crime scene, you have to match the STRs between the suspect and the sample taken from the scene. If they do match, you then do a probability calculation, based on the frequency of each STR in the individual’s ethnic group, to estimate how likely it is that an unrelated person would share the same profile.

Does this always work at 100%?
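As a rough sketch of that probability calculation, the example below uses the textbook "product rule": estimate how common the genotype is at each STR location, then multiply across locations. The locus names and allele frequencies are made up, and the calculation assumes the locations are independent and ignores the corrections used in real casework.

```python
from math import prod

# Hypothetical allele frequencies per locus (repeat count -> frequency).
# Real analyses use published population tables; these numbers are invented.
ALLELE_FREQS = {
    "LOCUS_1": {7: 0.20, 9: 0.10, 11: 0.30},
    "LOCUS_2": {12: 0.25, 14: 0.05},
    "LOCUS_3": {6: 0.15, 8: 0.40},
}

def genotype_frequency(freqs, alleles):
    """p^2 for two identical alleles, 2pq for two different ones."""
    a, b = alleles
    p, q = freqs[a], freqs[b]
    return p * p if a == b else 2 * p * q

def random_match_probability(profile):
    """Multiply per-locus genotype frequencies (assumes independent loci)."""
    return prod(
        genotype_frequency(ALLELE_FREQS[locus], alleles)
        for locus, alleles in profile.items()
    )

def profiles_match(sample, suspect):
    """True if both profiles carry the same alleles at every locus."""
    return sample.keys() == suspect.keys() and all(
        sorted(sample[locus]) == sorted(suspect[locus]) for locus in sample
    )

scene = {"LOCUS_1": (7, 11), "LOCUS_2": (14, 14), "LOCUS_3": (8, 6)}
suspect = {"LOCUS_1": (11, 7), "LOCUS_2": (14, 14), "LOCUS_3": (6, 8)}

if profiles_match(scene, suspect):
    rmp = random_match_probability(scene)
    print(f"Match; about 1 in {1 / rmp:,.0f} unrelated people would share it")
```

With only three locations the number is unimpressive; each extra location multiplies in another small fraction, which is how real profiles reach odds in the trillions.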

In the legal system, DNA is considered more reliable than other forms of evidence, but it is not 100% reliable. DNA can be poorly preserved, or the quantity can be too small for analysis. With degraded DNA in particular, the probability of a false positive is higher because certain regions can be amplified more than others. Overall it is still a probability calculation, but a rather rigorous one.

Once DNA is collected, where is the information stored?

In the US, the Combined DNA Index System, or CODIS, is an FBI database where DNA profiles are stored. It has historically been used to identify criminal suspects, the idea being that samples are taken from criminals, who may be likely to reoffend, and those samples can later be matched to crime scene samples.

CODIS includes software that uses supercomputers to identify matches between stored DNA profiles and evidence samples.

But what can’t a DNA database tell you?

DNA is good at paternity testing, but not great at other relationships. For example, it is terrible at matching siblings, and even worse at cousins. It can only do an exact match. For example, DNA from your toothbrush and from your coffee mug would be easy to match because they come from the same person.

Besides the possibility of false positives discussed above, you also can’t match a sample against a profile that doesn’t exist in the database.
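The exact-match limitation can be illustrated with a toy lookup. This is only a sketch: the names, locus labels, and profiles are hypothetical, and it is not how CODIS itself is implemented. A second sample from the same person is found, while a sibling shares some alleles but never produces a hit.

```python
from collections import Counter

def normalize(profile):
    """Make allele order irrelevant so (7, 11) and (11, 7) compare equal."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}

def find_exact_matches(database, evidence):
    """Return the names of stored profiles identical to the evidence profile."""
    target = normalize(evidence)
    return [name for name, stored in database.items() if normalize(stored) == target]

def shared_alleles(a, b):
    """Count alleles the two profiles have in common, locus by locus."""
    total = 0
    for locus in a.keys() & b.keys():
        overlap = Counter(a[locus]) & Counter(b[locus])
        total += sum(overlap.values())
    return total

# Hypothetical database and samples.
database = {
    "person_x": {"LOCUS_1": (7, 11), "LOCUS_2": (14, 14), "LOCUS_3": (6, 8)},
    "person_z": {"LOCUS_1": (9, 9), "LOCUS_2": (12, 14), "LOCUS_3": (8, 8)},
}
toothbrush = {"LOCUS_1": (11, 7), "LOCUS_2": (14, 14), "LOCUS_3": (8, 6)}
sibling_of_x = {"LOCUS_1": (7, 9), "LOCUS_2": (12, 14), "LOCUS_3": (6, 8)}

print(find_exact_matches(database, toothbrush))    # same person: one hit
print(find_exact_matches(database, sibling_of_x))  # sibling: no hit
print(shared_alleles(database["person_x"], sibling_of_x))  # partial overlap only
```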

Are there any problems with these practices?

Currently, outside of convicted criminals, it is illegal to collect DNA samples without consent. Note that when you voluntarily provide your DNA to companies like 23andMe, your data is not necessarily protected, because it is not covered by HIPAA.

Lawyers also argue that by including DNA from these sampled minors in CODIS, their data gets queried every time the database is searched, which treats them as suspects. A recent analysis by the Georgetown Law Center on Privacy & Technology reveals that over a quarter of a million DNA samples have been added to CODIS in the past four months alone. They call this practice genetic surveillance and assert that it normalizes the use of genetic profiling, making it a slippery slope.

Source: https://www.wired.com/story/dhs-and-doj-face-new-pressure-over-collecting-childrens-dna/

This topic was covered in a segment on the Daily Tech News Show on July 19th, 2025. For a more detailed discussion of the topic, listen here:

