Introduction to Bayesian Statistics - A Beginner's Guide – YouTube Dictation Transcript & Vocabulary
Interactive transcript and highlights
Hello and welcome to this introduction to Bayesian statistics. My name is Woody, and together in this course we're going to go on a journey exploring Bayesian statistics from scratch. This course is for anyone interested in probability and statistics, whether you're a student who wants to know more about probability theory or you're interested in using this professionally as a data scientist or a programmer.

We're going to begin by asking some pretty profound questions, like what probability even means. We'll think about it in the context of simple events like dice rolls, but also one-off events like horse races. We'll then look at conditional probability and how it can help us solve some pretty unintuitive problems. We'll use things like tree diagrams and also consider distributions like the normal distribution. We then tackle Bayes' theorem head on, trying to keep things as visual, colorful and intuitive as possible throughout. Finally, we have a look at the puzzle that kicked off this whole thing 250 years ago, when Thomas Bayes proposed a puzzle involving a billiards table. I really hope you enjoy this course, that it helps you understand Bayesian probability deeply and intuitively, and that it helps you understand the world a little bit better.

Hello and welcome to this first lesson on Bayesian statistics. We're beginning with what might seem like an obvious question: what is probability? Now, it's worth remembering that probability is actually a relatively new area in mathematics. It was only really thought about in the 17th century in the way that we think about it now. So what I want to do in this lesson is establish how we can make sense of what it means, for example, to say the probability of, say, rolling a dice and rolling a six is one in six. We're going to think about two different ways that you might characterize this, and only one of them is really going to be compatible with what we want to achieve in this course.

The first way we could think about this is from an objective standpoint. Why, for example, do we say that the probability of rolling a six on a dice is one-sixth? Well, one way to characterize this is that we all think it's one-sixth on a fair dice, and that's because this reflects some objective reality. Now, one way to cash this out is from a frequentist standpoint. Let me explain what that means. Let's say we're considering dice. You might roll the dice once, twice, a million times, maybe even an infinite number of times. What you can then do is take a random sample, maybe even an infinite sample, from those rolls and ask in what proportion of them a six landed face up. We know that as the number of rolls tends to infinity, the proportion is going to tend towards one-sixth. So one way to explain what the probability is is in terms of frequencies: if the event was to happen infinitely many times, what would the proportion be of the thing we're interested in?

OK, so what could be a different point of view? Well, a different way to think about it is a subjective approach, which is to say that each person has their own reasons for believing a certain probability. This subjective stance is the one that's going to form the foundation for what's going to become our Bayesian model.

So which is it, then? Is it the subjective Bayesians or the objective frequentists? I think a lot of people's intuition at first is to say that the subjective one seems a little bit too wishy-washy, a little bit not set in stone enough. As you can probably guess, since you've signed up to do a course in Bayesian statistics, I think there are some very good reasons for taking a subjective Bayesian model. It's not subjective in the sense that you can think whatever you want, but it's subjective in the sense that each person might have a different answer to the question. Now I'm going to give you two reasons for thinking this. One is to attack the frequentist position and show that it doesn't work in all cases, and then, in a different situation, I'm going to show you a great reason for thinking the Bayesian subjective model is actually really good.

This frequentist model is perfectly good when we're thinking about dice. No problem: we could imagine rolling the dice many times, and we could at least hypothesize rolling infinitely many times. But what about when we've got a specific one-off event? Take, for example, a horse race. Let's say it's the Grand National, and here's our horse, and, if you'll excuse the pun, let's say his name is Bayes Camp. So what's the probability that Bayes Camp is going to win? It's the sort of thing that anyone who's going to bet on the horse race wants to know. It's the sort of thing that statisticians want to know. Now, there are some pretty good reasons for thinking that the frequentist account doesn't really work here. Here's what the frequentist is going to have to say about this horse race. We're going to have to hypothesize an infinite set of horse races, so we're going to have to imagine this single horse race happening infinitely many times. From this we're going to have to draw out a random sample of these horse races, just like we might consider a random sample of dice being thrown, and maybe this sample is infinite. Then we're going to count how many times Bayes Camp won and divide that by how many races there were, and whatever number that is, that's going to tell us the probability that Bayes Camp will win the Grand National.

Now, a lot of people think there's quite a lot wrong with this model, for two reasons. The first is: what does it even mean to imagine an infinite set of a single event like the Grand National happening? It's not something that can happen more than once; it's pure imagination to speculate all of these races. And then, what does it even mean to draw a random sample from a hypothetical set? So it seems like there are some pretty good reasons for thinking that the frequentist model is no good in situations that are just one-off events.

But let's look at some reasons why the Bayesian model might actually be really good. Suppose I told you that my partner was expecting a baby and we didn't know if it was going to be a boy or a girl. Just for the sake of simplicity, we're going to ignore some biology and assume that everyone is either male or female. So what can we say? Well, here's Alan, and he's my uncle. Alan thinks that the probability that the child will be male is 50%. He has no reason to think it should be male or should be female, so he's going with 50%. That seems a pretty reasonable thing for him to say. Would we say that this probability is correct? I think a lot of people would.

Now here's our friend Anna. Anna is a doctor, and she has just written an article for the World Health Organization showing that in my area 51.2% of all children born are male. So is Anna's view more correct than Alan's? She certainly has more information, but is it fair to say that she's right and he's wrong? It would be unfair on Alan, I think, to say this. Anna has more information, and no doubt if Alan read her article he might change his mind, but as things stand it would be irrational for Alan to say 51.2%, because he has no reason for thinking that the probability is higher than 50%.

Now let's throw another character into the mix. Here is Sarah. Sarah is our midwife, and she has just performed a scan. The scan isn't 100% accurate, but it looks to her like it's male. She thinks the probability the child is male is 95%. So what do we say now? Is it right to say that Sarah's probability is correct and Anna's and Alan's are incorrect? Most people's intuition is that this is not the right thing to say. Sarah certainly has more information, but if Alan was to suddenly say "no, I think it's 95% likely the child will be male" without that information, that would be irrational.

So the Bayesian approach is to say that actually all three might be correct, and really the probabilities represent each individual's degree of belief, or a measure of their uncertainty. That's the sense in which probability is subjective. It's not that you can think whatever you want: Sarah can't look at her scan and say she still thinks it's 50%; that would be irrational. Alan can't, in the absence of any other evidence, say he thinks it's 95% likely to be male; that would be irrational as well. Each person has a certain amount of evidence which forms their degree of belief, or the measure of their uncertainty. So that's the Bayesian approach to this, and degrees of belief are going to form the underpinning of everything that we're looking at in this course. I hope you found that first lesson interesting, and I hope it gave you a framework with which to think about probability for the rest of the course. See you in the next lesson.

We're now going to look at conditional probability, and we're going to begin by looking at a really simple example, but one that helps build some intuitions which will help us build up to looking at Bayes' theorem in full later on in the course. So let's start with a super simple and kind of silly case. We've got these 12 characters here, and there are loads of different probability questions we could ask if we were thinking about randomly selecting one of them. We could categorize these guys in loads of different ways, but one way we could do it is according to whether they're a dog and whether they wear glasses. That guy in the middle is the only one who is both a dog and wears glasses, and we've got some dogs who don't wear glasses, some non-dogs who do wear glasses, and so on.

Now that we've got this visualized, we can ask some probability questions about it. They range from the really simple, such as "what's the probability that a randomly selected character is a dog?" (there are no tricks here: there are three dogs out of 12 characters, so 3 out of 12), to a slightly more complex question, such as "what is the probability that a randomly selected character is a dog, given that they wear glasses?" For this we need to think about conditionality. The condition here is that the character is wearing glasses, which means we're going to restrict ourselves to only thinking about these five guys who wear glasses. So we're zooming in on that and ignoring everything else. "Given that they wear glasses" means we are looking at a probability out of five. There are only five characters who wear glasses, and the question is how many of them are dogs, and the answer is of course one. So one out of five is the probability that a character is a dog given that they wear glasses.

OK, so far so simple. How can we formalize what we've just done and make a general rule about it? Well, in this example we're just counting, and really we want our rule to relate to probabilities, and that's what we're going to practice in the next video. But for now we can say that the one guy in the middle, who represents being a dog and wearing glasses, that region can represent the probability of a character being a dog and wearing glasses. So that's the probability of dog and glasses. The whole region is the probability of a character wearing glasses, since that was the condition. Formalized in this way, we've now got it wrapped up in terms of probabilities, which is going to be useful for us later on.

Now, I'll call this, for now, "baby Bayes' theorem". It's most of the way to formulating the full Bayes' theorem, but it's not quite there yet. But it's really useful; we can already solve some pretty complicated stuff with it. Just a little note on the letters that we use: typically I'll be using H and E rather than "dog" and "glasses" or D and G, and the reason for that is that we're often thinking about the probability that a hypothesis is true given some evidence. So in general we're going to be using H's and E's in most cases.

So now that we've derived this baby Bayes' theorem, let's apply it directly to an example that uses probabilities and not just counting. It's a pretty simple example, and I do encourage you to pause the video and try to solve it yourself, just to check you understand what's going on here. All right, so we're trying to work out the probability of H given E. It's often easiest to begin in these cases with the probability of E, so what's been given. What is the probability of E? Well, it's not just 0.2; it's the entire bubble, so it's everything included in it: 0.2 plus 0.15, so we've got 0.35 here. And the probability of H and E, well, that's just this intersection here, so that's 0.15. There's our answer, and you can do this in your head if you want, or you can use a calculator, but it's 3/7.
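The rule the lesson calls baby Bayes' theorem, P(H given E) = P(H and E) / P(E), can be checked against both examples so far. What follows is only a minimal sketch in Python: the helper name `conditional` is made up for illustration, and exact fractions are used so the answers come out as 1/5 and 3/7 rather than rounded decimals.

```python
from fractions import Fraction


def conditional(p_h_and_e, p_e):
    """Baby Bayes' theorem: P(H given E) = P(H and E) / P(E)."""
    if p_e == 0:
        raise ValueError("P(E) must be non-zero to condition on E")
    return p_h_and_e / p_e


# Counting example: 12 characters, 5 wear glasses, 1 is a dog with glasses.
dog_given_glasses = conditional(Fraction(1, 12), Fraction(5, 12))

# Venn-diagram example: the intersection is 0.15 and the rest of the
# E bubble is 0.2, so the whole E bubble is 0.35.
p_h_and_e = Fraction(15, 100)
p_e = Fraction(20, 100) + p_h_and_e
h_given_e = conditional(p_h_and_e, p_e)

print(dog_given_glasses)  # 1/5
print(h_given_e)          # 3/7
```

Dividing by the size of the condition is the code version of "zooming in" on the characters who wear glasses.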
So there's a pretty simple example of "what's the probability of H given E?" when we've got the information presented to us as probabilities in a Venn diagram. Now let's take a look at a much harder example. In this one, the challenge at the beginning is to actually just visualize this, which will involve drawing a Venn diagram for ourselves. Here's the question: Amira and Jane are sometimes late for school. 70% of the time neither of them is late. Amira is late 20% of the time, while Jane is late 25% of the time. Last Monday Jane was late. Find the probability that Amira was late.

All right, so the first thing to do is use a bit of terminology to formalize what we've actually been told. What's our evidence here? Well, our evidence is that last Monday Jane was late, so let's say that E represents Jane being late. Our hypothesis, the thing that we are interested in testing, is whether Amira is late, so H is going to be "Amira is late".

Let's now try to write down, using this notation, some of the information we've been given. 70% of the time neither of them is late. There are actually different ways to write down that something hasn't happened. If we wanted to write that Jane isn't late, you'll sometimes see it written as E prime or E dash (E′), you'll sometimes see it written as E^c, which stands for the complement, and you'll sometimes see a sort of hook E (¬E). I'm going to use the E dash. So what we've been told is that the probability that Jane isn't late (E dash) and Amira isn't late (H dash) is 0.7. We've also been told that the probability that Amira is late, the probability of H, is 20%, so that's 0.2, and that the probability that Jane is late, the probability of E, is 0.25.

Now, can we represent this all on a Venn diagram? Really this is the main challenge. This bubble here can represent H, so that's Amira being late, and this bubble here can represent E, so that's our evidence, Jane being late. The first thing we can do is put a 0.7 on the outside, because that's the condition that neither of them is late. Next, what can we say? Well, I know that the entirety of this H bubble is 0.2, because the probability that Amira is late is 0.2, so this entire bubble has got to sum to 0.2. So if all of this bubble sums to 0.2 and we've got 0.7 here, the only part unaccounted for is this section here. If I've got 0.2 here, let's just keep track of this, and 0.7 on the outside, that takes us to 0.9. So the only unaccounted-for region is this, and it's got to be 0.1, because all of the probabilities need to sum to 1.
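The region-by-region bookkeeping for this Venn diagram can be sketched in Python. This is an illustrative sketch, not something from the video: the function name `venn_regions` and its argument names are made up here, and the remaining two regions are filled in by the same "whatever is left over" reasoning that produced the 0.1.

```python
from fractions import Fraction


def venn_regions(p_h, p_e, p_neither):
    """Split two overlapping events into H-only, both, and E-only regions."""
    p_union = 1 - p_neither   # everything inside at least one bubble
    e_only = p_union - p_h    # what is left once the whole H bubble is counted
    both = p_e - e_only       # the rest of the E bubble is the overlap
    h_only = p_h - both       # the rest of the H bubble
    return h_only, both, e_only


# Amira late (H) 20%, Jane late (E) 25%, neither late 70%.
h_only, both, e_only = venn_regions(
    p_h=Fraction(20, 100),
    p_e=Fraction(25, 100),
    p_neither=Fraction(70, 100),
)

print(e_only)                       # 1/10, the region found above
print(both, h_only)                 # 3/20 1/20
print(both / Fraction(25, 100))     # 3/5, i.e. P(H given E)
```

The four regions (including the 0.7 outside) sum to 1, which is the check the lesson relies on.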
Well, I can now work out the rest quite easily. I know the probability of E is 0.25, and I've got 0.1 here, so there is 0.15 remaining. And I know that the probability of H is 0.2, and I've got 0.15 here, so 0.05 remains. OK, so there's our picture: we've got the information represented in the Venn diagram. We're now in a position to go for the main question: what is the probability that Amira is late (H), given our evidence that Jane was late? Well, using our baby Bayes' theorem, it's the probability of the intersection, so that's the probability that they are both late, which we now know is 0.15, divided by the probability of our evidence E, so that's the probability that Jane is late, and that's all of this here, so that's 0.25. That's going to simplify down to three-fifths, or alternatively 60%.

So we now know that when Jane is late, Amira has a 60% chance of being late. In general her probability of being late is 20%, but if Jane is late there is a 60% chance that Amira will be late as well. One way to think about that is that if we learn that Jane is late, we update our idea about the probability that Amira is late. In fact, Jane being late seems to make Amira more likely to be late. And it's that kind of thinking which is going to help us sort out Bayes' theorem later on: this idea of updating our probability given some new evidence.

Let's now have a look at an example where we apply our theorem to examples that can't easily be solved using simple Venn diagrams. Here's a question: Tanya loves to play tennis, but especially so when the weather is good. When it is sunny, the probability that she plays tennis is 80%. When it is not sunny, the probability is just 35%. There is a 60% chance that it is sunny on any given day. Last Saturday she played tennis. What is the probability that it was sunny last Saturday?

OK, so obviously we think that there is a 60% chance that it is sunny on any given day normally, but we've got some new evidence, which is that she played tennis, and that's going to be relevant and could lead us to update our opinion of what the probability was. Let's try to model this. Let's first decide what's our hypothesis and what's our evidence. The hypothesis, the thing that we are investigating, is whether or not it was sunny last Saturday. Our evidence is that Tanya played tennis last Saturday.

So can we represent this on a tree diagram? Well, either it was sunny last Saturday or it wasn't. The probability that it was sunny is 60%; that's before we know any other evidence. This diagram is going to represent a full picture of what could happen, so there's therefore a 40% chance, or 0.4 probability, that it was not sunny last Saturday. Now, if it was sunny last Saturday, we know that she might have played tennis (we're calling that E) with probability 80%, because when it's sunny there's an 80% chance that she plays tennis. There's also a 20% chance, therefore, that she does not play tennis, so E dash. And we could also look at when it was not sunny, so the branch below. Well, she might have played tennis; the probability of that is 35%, and that means the probability that she didn't play tennis that day is 65%.

Now that we've got our tree diagram, we can apply our baby Bayes' theorem. Let's try to work out the numerator first: the probability of H and E, so that's the probability that it was sunny last Saturday and she played tennis. Hopefully you remember how to work with tree diagrams; we're just multiplying along the branches. The probability that it was sunny and she played tennis is 0.6 times 0.8, so 0.48. Now what about the denominator? It's a little bit trickier to work out here. What's the probability of our evidence in general, so what's the probability that she played tennis? Well, if she played tennis, either it was sunny and she played tennis, and we already know the answer to that, or it might not have been sunny and she played tennis, which I'm representing by the blue branch of that tree. So that's the probability that E is true, the probability that she played tennis: either it was sunny and she played tennis, or it wasn't sunny and she played tennis. We can calculate these (0.48, and 0.4 times 0.35, which is 0.14), sum them together, and we get 0.62 as our probability that she played tennis.

Right, we now have the numerator and the denominator, so it's easy to work it out from here. We'll just plug it into our formula above: 0.48 is our probability that it was sunny and she played tennis, and 0.62 is our probability that she played tennis, and therefore there was a 77% chance that it was sunny given that she played tennis. So once again you can see that this theorem has helped us update our idea of what the probability is. We think in general that there is a 60% chance that it's sunny, but if you learn that Tanya played tennis on a certain day, you can update your idea about what the probability is: it's now 77% likely that it was sunny that day. So that's what Bayes' theorem is ultimately going to do: it's going to help us update our idea of what a probability is given some information.

Let's now look at an example where you can put this into action, so do pause the video and try this before me if you're feeling confident with it. All right, so what's the information given here? During recessions there is a 40% chance that Tom will lose his job; otherwise there is a 5% chance. In any given year there is a 10% chance of a recession, and last year Tom lost his job. Find the probability that there was a recession. As usual, let's formalize this by saying that the thing we're investigating, our hypothesis, is "was there a recession?", so H is that there was a recession, and our evidence that we can bring to bear on this is that Tom lost his job.

OK, so let's try to represent this. We know that in any given year either there is a recession or there's not. So either there is a recession, and the chance of that is 10%, so 0.1, or there is not a recession, which is 0.9. That's our sense of what it is before we have any evidence. It's actually called a prior; it's like our prior thinking. More on that later. Now, if there is a recession, so if that hypothesis is true, then the probability that Tom loses his job is 40%, and there is therefore a 60% chance that Tom does not lose his job. If there isn't a recession, then Tom might lose his job, but it's very unlikely: 0.05. And if there isn't a recession, we therefore know that there's a 95% chance that he will not lose his job.

OK, so we're looking for the probability that H is true (there was a recession) and he lost his job, divided by the probability that he lost his job. The probability that there was a recession and he lost his job, that's simple: that's just this branch here, so that's 0.1 times 0.4. And the probability of E, the probability he lost his job, is either that there was a recession and he lost his job or that there wasn't a recession and he lost his job. So that is going to be the probability that there was a recession and he lost his job plus the probability that there was not a recession and he lost his job. Now we can fill in all of these values. Up here it's 0.04, and down here it's 0.04 plus 0.045. So in total we have 0.04 divided by 0.085, and our final probability, well, there are lots of different ways you could write this down, but let's just express it as a percentage: we get 47%.

OK, so that's quite a big jump, isn't it? We needed to find the probability that there was a recession last year. Well, we know that there's normally a 10% chance of a recession, so this is an unlikely event. But if you've learned that Tom has lost his job, and you haven't learned anything else, then you know that you've got to update your idea of what the probability is, and it's now 47% likely that there was a recession last year.

Let's have a look at how what we've learned about conditional probability applies to normal distributions. Here's a little girl who loves to run, and let's say that for girls her age the time it takes them to run 100 meters has a normal distribution with these properties. Now, don't worry if you don't know about normal distributions. The terminology looks a little strange, but all it means is that the average time is 15.5 seconds, and that 1.2 squared, all that means is that most girls will run it within 1.2 seconds of that average.

Now here's an initial question that we want to answer: what's the probability that this girl runs 100 meters in less than 13.5 seconds? To answer that we just use a computer or a calculator, and it turns out as 0.048, so 4.8%, and that's modeled by areas under the curve. But suppose that we had some more information. Suppose that we also knew that to get into the school running team you need to be able to run 100 meters in less than 14 seconds, and let's say that we also know that this girl, whose name is Sophia, is in the school running team. So we've got some new evidence to bring to bear. We know that in general the probability that she can run 100 meters in less than 13.5 seconds is about 4.8%. How can we change that if we know she's in the school running team?

Well, we can use our baby Bayes' theorem. Our hypothesis is that Sophia runs 100 meters in less than 13.5 seconds, and our evidence is that she is in the school running team. So let's see how this is going to work. We're looking for the probability of H given E, the probability that our hypothesis is true given the evidence, and we've got our formula for this. So can we investigate these things? Well, the numerator is this slightly weird expression: it's the probability that she runs 100 meters in less than 13.5 seconds and that she's in the school running team. But being in the school running team means that she can run it in less than 14 seconds, because that's the condition to be in the school running team. So both of those have to be true, and we're dividing by the probability that she's in the school running team, which is just the probability that she can run it in a time that's less than 14 seconds.

OK, this numerator needs sorting out. What does it mean for her time to be less than 13.5 and less than 14? Well, if her time is less than 13.5 and less than 14, it's just less than 13.5. There's nothing more to say there: if it's less than 13.5, then we already know it's less than 14, so we don't need to bother with that. And we're dividing that by the probability that it's less than 14.
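The simplification above, P(T < 13.5) divided by P(T < 14), can be evaluated with Python's standard library. This is a small sketch under the transcript's stated model (running times normally distributed with mean 15.5 s and standard deviation 1.2 s); the variable names are illustrative only.

```python
from statistics import NormalDist

# Running times for girls of this age, as described in the lesson.
times = NormalDist(mu=15.5, sigma=1.2)

# Hypothesis H: time under 13.5 s. Evidence E: in the team, i.e. time under 14 s.
p_h = times.cdf(13.5)
p_e = times.cdf(14.0)

# "Under 13.5 and under 14" is just "under 13.5", so P(H and E) = P(H).
p_h_given_e = p_h / p_e

print(f"P(T < 13.5)          = {p_h:.3f}")
print(f"P(T < 14)            = {p_e:.3f}")
print(f"P(T < 13.5 | T < 14) = {p_h_given_e:.2f}")
```

`NormalDist.cdf` returns the area under the curve to the left of its argument, which is the "area" the lesson refers to.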
Now, with probabilities represented as areas with normal distributions, we can say that the denominator is this area: the probability that her time is less than 14 seconds, which you can compute using a calculator or computer as just over 10%. And the probability that she can run it in less than 13.5 seconds is our 4.8% from before. Now do that calculation and you get around 45%. So again the evidence made us significantly update our thoughts about whether Sophia can run 100 meters in less than 13.5 seconds. We thought it was about 4.8% before. Now that we know that she's in the school running team, we've got some evidence which forces us to update our view, and we've now got to say that the probability that she can run it in less than 13.5 seconds is 45%.

We're now going to have a look at some really counter-intuitive results that can follow from a good understanding of conditional probability. So let's think about IQ. IQ is famously normally distributed. It's got a mean of 100, so that's the average IQ score, and a standard deviation of 15, meaning that most people have an IQ within 15 points of 100. Now, let's say that to count as a genius you need an IQ of above 171, and I picked this because it actually turns out that's roughly one in a million. Other people might have some other definition of a genius, but let's just say that this one-in-a-million IQ above 171 is the definition of a genius. Now suppose there was a pill that you could take, and all this pill did was shift the average IQ for those who took it up by two points. So a tiny little improvement in average IQ for the people who take the pill. Could we answer this question: how much more likely is it that your child will be a genius if they take this pill? I think most people's intuition is that it's not going to have a particularly big effect. Adding two points on average feels like it's not going to have a significant effect on making them more likely to be a genius. It's not going to hurt, but it seems like it won't help too much.

Let's investigate this claim. So what can we say so far? Well, we could imagine conducting an experiment. An immoral experiment, but it's a thought experiment, so don't worry. We're going to take a large group of children and we're going to give half of them the pill, and this red pill is going to shift the average IQ for those children up by two points. We'll let them grow up, and then we will just see which of them turned out to be geniuses. This thought experiment will help us answer this question.

All right, so let's investigate. Here's the conditional probability claim: what is the probability that they took the pill, given that they are a genius? That's actually the question we want to answer. So, given that they're a genius: we look at all the kids who grow up to be geniuses and we ask how many of them actually took this red pill. All right, so let's use our baby Bayes' theorem to answer this question. Our hypothesis is that they took the red pill as a child; that's the thing we're interested in. And our evidence is that they are a genius.

All right, let's work through the numbers now. Oh, by the way, I should say that I normally like to represent these things graphically, you know, pointing out where you need to be to be a genius, but to be honest it is so far up that way that we're just not going to bother with graphical representations, because the regions are just too small. What we need to do first is find the probability of H and E, so that's the probability that they took the red pill and they became a genius, and then we're going to divide that by the probability that they are a genius in general. OK, so the probability they took the red pill and were a genius: well, half of the kids were given that red pill, and then of those who were given the red pill, what's the probability that they became a genius? Well, remember the distribution for those guys: our kind of "IQ plus" had a normal distribution with a mean of 102 and a standard deviation of 15.
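The tennis, recession and pill examples all use the same two-branch tree: multiply along the branch where the hypothesis is true, then divide by the sum over both branches that end in the evidence. The Python sketch below is an illustration only. The helper name `posterior` is made up, and for the pill example it assumes, as the setup above suggests, that children who did not take the pill follow the ordinary IQ distribution (mean 100, standard deviation 15). The transcript stops before the video's own working for that example, so no result from the video is quoted here.

```python
from statistics import NormalDist


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """P(H given E) from a two-branch tree diagram."""
    h_and_e = prior * p_e_given_h                # H true, then E
    not_h_and_e = (1 - prior) * p_e_given_not_h  # H false, then E
    return h_and_e / (h_and_e + not_h_and_e)


# Tennis: P(sunny) = 0.6, P(plays | sunny) = 0.8, P(plays | not sunny) = 0.35.
sunny = posterior(0.6, 0.8, 0.35)

# Jobs: P(recession) = 0.1, P(loses job | recession) = 0.4, otherwise 0.05.
recession = posterior(0.1, 0.4, 0.05)

# Pill: half the children take it; "genius" means IQ above 171.
# Assumption: the other half follow the ordinary N(100, 15) distribution.
p_genius_pill = 1 - NormalDist(mu=102, sigma=15).cdf(171)
p_genius_no_pill = 1 - NormalDist(mu=100, sigma=15).cdf(171)
took_pill = posterior(0.5, p_genius_pill, p_genius_no_pill)

print(f"P(sunny | played tennis) = {sunny:.2f}")      # 0.77
print(f"P(recession | lost job)  = {recession:.2f}")  # 0.47
print(f"P(took pill | genius)    = {took_pill:.2f}")
```

Writing the denominator as the sum of both branches is the same step as adding the two "played tennis" or "lost his job" paths in the tree diagrams.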
Key vocabulary (CEFR C1)
presented
B1: To bring (someone) into the presence of (a person); to introduce formally.
Example:
"what's the probability of h given e when we've got the information presented"
establish
B2: To make stable or firm; to confirm.
Example:
"establish how we can make sense of what it means for example to say the probability of say rolling a dice and"
imagination
B2: The image-making power of the mind; the act of mentally creating or reproducing an object not previously perceived; the ability to create such images.
Example:
"imagination to speculate all of these races and then what does it even mean to draw"
accurately
B1: In an accurate manner; exactly; precisely; without error or defect.
Example:
"accurately so here's what the frequencies would do the frequentist would have no problem answering this question what's the"
selecting
B1: To choose one or more elements of a set, especially a set of options.
Example:
"and there are loads of different probability questions we could ask if we were thinking about randomly selecting"
encourage
B1: To mentally support; to motivate, give courage, hope or spirit.
Example:
"encourage you to pause the video and try to solve it yourself just to check you understand what's going on here"
significantly
B2: In a significant manner or to a significant extent.
Example:
"significantly update our thoughts about whether sofia can run 100 meters in less than 13.5 seconds we thought it was"
fractionally
B2: By a very small amount; slightly.
Example:
"remember that was one in a million um actually i lied it's just fractionally"
background
B1: One's social heritage, or previous life; what one did in the past.
Example:
"being a genius from taking this pill we now have enough background"
previously
B1: (with present-tense constructions) First; beforehand, in advance.
Example:
"previously that was your prior understanding but now you've had the test which tells"
Grammar and pronunciation tips for dictation
Chunking
Notice the speaker's pauses after certain phrases; they make the speech easier to follow.
Linking
Listen for connected speech, where words run together.
Intonation
Pay attention to changes in intonation that highlight important information.