The M and M example
Steps
For this example, I had two types of bags: one had pictures from the movie Frozen and the other from Star Wars. You don't have to use those pictures. Marshmallows are our spike-ins. Therefore, the one thing I tell the students is that the number of marshmallows is the same in both bowls.
- Have half of the room use a Frozen bag and fill it with one cup of candy from Bowl 1. Have the other half of the room fill a Star Wars bag with 2 cups of candy.
- Have everyone count each color of M and M and each size of marshmallow. Each person then enters their counts into the Google Doc under the "master numbers" tab. Each individual gets to decide how they "count" the Skittles.
- Sample size: How similar are biological replicates? It depends on sequencing depth and the expression level of the gene.
- Each person calculates fold change with a partner who has the same bag type. The fold change is partner_1_value/partner_2_value. For example, if partner 1 has 4 blue M and Ms and partner 2 has 2 blue M and Ms, the fold change is 2. Put those numbers in the partner fc tab. What should the fold change be? How variable are the fold changes across the colors and the partner_sets?
- Now divide each M and M count by the total number of M and Ms you collected in your cup. Calculate the fold change again. What should the fold change be? How variable are the fold changes across the colors and the partner_sets?
- Copy one of the two bags (I used Frozen) into a new tab on the spreadsheet and calculate the fold change for one half of the group vs. the other half. Is it near 1? (Do this with both the sample-size-corrected data and the uncorrected data.) Why does sample size correction change the numbers? Should we correct for sample size?
- Two-sample comparison
- Now have each person get a partner with a different bag. Calculate fc. Calculate fc while correcting for sample size. What happened?
- Now divide by marshmallows instead of sample size. Does it give a different answer?
- Now calculate the final values across the class three ways: not correcting, correcting for sample size, and correcting for marshmallows. Which is most accurate and why?
- Oddness in "Skittles". Skittles represent reads in our FASTQ file that we don't know what to do with.
- What are they: Skittles can be multiply mapping reads. What do we do with multiply mapping reads? Sometimes we put them in multiple places, sometimes we ignore them. Skittles can also be reads from contamination. Again, what do we do with them?
- Answer: there is no real right answer. We just have to make a choice and hope they are present in both of our samples, so that the "noise" is not that important.
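
The three calculations in the steps above (no correction, correcting for sample size, and correcting for marshmallows) can be sketched in Python. This is only an illustration: the counts, the color names, and the function names are all made up and do not come from the class spreadsheet. Skittles are left out, since how to count them is each person's choice.

```python
# All counts are hypothetical, invented for illustration; they are not class data.
SPIKE_IN = "marshmallow"

# One Frozen bag (1 cup) and one Star Wars bag (2 cups).
frozen = {"red": 8, "blue": 4, "green": 4, "marshmallow": 4}
star_wars = {"red": 16, "blue": 8, "green": 40, "marshmallow": 8}


def candy_only(counts):
    """Drop the spike-in so only M and M colors remain."""
    return {color: n for color, n in counts.items() if color != SPIKE_IN}


def normalize_by_total(counts):
    """Sample size correction: divide each color by the total number of M and Ms."""
    candy = candy_only(counts)
    total = sum(candy.values())
    return {color: n / total for color, n in candy.items()}


def normalize_by_spike_in(counts):
    """Spike-in correction: divide each color by the marshmallow count."""
    spike = counts[SPIKE_IN]
    return {color: n / spike for color, n in candy_only(counts).items()}


def fold_change(partner_1, partner_2):
    """Fold change per color, defined as partner_1_value / partner_2_value."""
    return {color: partner_1[color] / partner_2[color] for color in partner_1}


raw_fc = fold_change(candy_only(star_wars), candy_only(frozen))
total_fc = fold_change(normalize_by_total(star_wars), normalize_by_total(frozen))
spike_fc = fold_change(normalize_by_spike_in(star_wars), normalize_by_spike_in(frozen))

print("uncorrected:         ", raw_fc)
print("sample size corrected:", total_fc)
print("marshmallow corrected:", spike_fc)
```

In these made-up numbers, the Star Wars bag has the same red and blue per cup as the Frozen bag but five times as much green per cup. The uncorrected fold changes also carry the 2-cup scoop, dividing by the total pulls red and blue below 1 because the extra green inflates the total, and dividing by marshmallows gives 1, 1, and 5.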