Google system design interview: Design TikTok (with ex-Google EM) – YouTube Dictation Transcript & Vocabulary

Welcome to FluentDictation, your YouTube dictation site for practicing English. Master this B2-level video with our interactive transcript and shadowing tools. We have split "Google system design interview: Design TikTok (with ex-Google EM)" into short segments, ideal for dictation exercises and pronunciation practice. Read the annotated transcript, learn essential vocabulary, and improve your listening comprehension.

Interactive transcript and highlights

1.[Music] Hello, welcome to another system design mock interview. This is where we get one of our top coaches to play the role of candidate and take on a system design question for you to watch and learn from, and I'm very pleased to say that today we have Mark with us again. Mark, how are you doing? Great, thanks, Tom. Great to have you with us. Quickly, do you want to just introduce yourself and tell people a bit about your background? Sure, yeah. So I'm a former engineering manager, mostly at Google, some time at Uber, and I spent a lot of time on large-scale distributed systems, which I think is what we're going to be doing here. And of course I've been doing coaching; I'm available as an interview coach, but I hear the tables have turned, so here we go again. All right, well, let's turn those tables and crack on with the question. Mark, the question I want to ask you today is: how would you design TikTok?

2.Okay, TikTok. All right: a popular vertical-format video sharing application, popular with lots of people around the world, I guess. So it's kind of a social media platform, and pretty interesting and pretty complex, I think. So tell me a little bit about the use cases we should be thinking about here. Yeah, sure. Let's not spend time on the mobile app and content creation in the app. I'd like you to focus on a back-end distributed system that supports uploads and... Okay, that's good. Yeah, because certainly the application is the most central point of the whole thing, and you could do lots of cool things in terms of being a creator, so I'm glad we don't have to worry about that; that's also not my forte. In terms of sign-in and sign-up and things like that, can we skip that too? That's going to be pretty standard, I think. I think you're right, that's the less interesting part, so let's skip over that. Yeah, okay. So you want to focus on the video uploads into the system and then streaming the videos, or consuming the videos, I guess? Absolutely. Okay, so tell me a little bit about how many users and how many videos uploaded: what's the scale of this thing, what are we thinking about here? Okay, yeah. Well, we've got, let's say, one billion users in around 150 countries. Okay. And let's assume one billion video views a day, and I think I saw that in a recent year there were 10 billion videos uploaded in one year, so let's use that figure as well. Okay, all right. So let me go to my little diagram here; it's a TikTok system design. Let me capture some of this stuff to make sure I get it right. So we've got, I think you said, a billion users across 150 countries? Yeah. And I think you said a billion video views per day?
That's right. Is that right? Yes. And you said 10 billion videos uploaded last year, or something like that? Is that what I said? Yeah, yeah. Okay, per year. All right, that makes sense; that sounds reasonable. I mean, that's big, and that's good, because we want it to be a big distributed system. So if I think about this app, though (uploading, streaming, viewing), even though we're putting the app aside, we probably should think about some success metrics. Do you have any thoughts on success metrics? I can think of some, but do you have any? Yeah, I'd prefer to hear your thoughts on it. Give me some suggestions. Okay, all right. So in a classic web application, or even a phone app, daily active users makes sense. For TikTok, I think the simple metric would probably be, let's see, something like time in app: how long are people spending in the app? I'm not a heavy TikTok user at all, but I imagine that people use it in different ways. Some maybe just scroll through the videos, some watch the entire thing and react and share and do lots more interesting things, and then there are the creators. But time in app is probably the simplest metric to think about, so maximizing time in app. And you might know this: do you have a sense of how long people spend in the app? I think let's say the average is around an hour a day. Okay. Does that sound about right? Sure, so let's assume that. Okay, that sounds fine. I'm going to try and just... oops, I'll just leave this. Okay, so all right, so
time in app. Okay, so having had kids and thinking about screen time and all this kind of stuff, I can imagine enhancing this if I were thinking more about online health. Maybe even as a working professional I wouldn't want to maximize the time in my app during business hours. I can imagine some future version that maximizes the time in the app outside of working hours or school hours, but that would be a future enhancement. I don't know that I would... Yeah, for now let's just focus on this; time in app sounds reasonable. It's always a good idea to narrow down the scope of the question to make it achievable within the hour. Here Mark did this well, confirming that he could focus on TikTok's back-end distributed system rather than the TikTok app. He made another good call in suggesting we skip designing the login process: covering that very standard part of the system might have wasted time that Mark wanted to spend on more interesting aspects, which gave him greater opportunities to impress the interviewer. Of course, always confirm your suggestions with the interviewer to make sure they're on board with the direction you're taking. Right, so time in app, currently about an hour per day. Sounds good. All right, let me see if I can make some more detailed assumptions here. I need to make this a little bit smaller so I have some room. All right, detailed assumptions: this is a vertical form factor video sharing app, as I mentioned, and typically these videos are, what's it called, 1080p, but rotated, so 1080 pixels by 1920 pixels or something like that. I think the typical time range for these clips is about three seconds to a minute, but
just for the sake of simplification, maybe we can assume a 10-second video on average. Does that seem okay? Yeah, I think that works. And I said 1080 by 1920 pixels, which for a 10-second video I think we can assume is about a megabyte per video, if that seems all right. Okay, so a megabyte per video. Let me just see if there are any other things I'm thinking about here. All right, with those assumptions maybe I can do some really rough calculations about the scope of this thing, to see how big it's going to get. Let's look at uploads first, because that translates to storage in addition to some traffic, and it looks pretty heavy. So if we say 10 billion videos per year, and we do the rough math and divide by 365 days in a year, that comes out to approximately, let's see, 10 billion divided by 365,
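
For readers following the arithmetic, the back-of-the-envelope estimates in this interview can be reproduced with a short Python sketch. The inputs are the figures agreed on in the conversation: 10 billion uploads per year, 1 billion views per day, about 1 MB per 10-second clip, about 1 KB of metadata per video, a 10x factor for replication and device formats, and a 3x peak factor. The names `Assumptions` and `estimate` are hypothetical. The rounding shortcuts (about 100,000 seconds per day and about 10 bits per byte) are illustrative labels for Mark's approximations, not part of any real TikTok system.

```python
"""Back-of-the-envelope capacity estimates for the TikTok design interview.

A sketch only: every input is an assumption stated in the interview, and the
rounding shortcuts mirror the ones used there rather than exact conversions.
"""
from dataclasses import dataclass

KB, MB = 10**3, 10**6      # decimal byte units, as used in the interview
TB, PB = 10**12, 10**15
GBIT = 10**9               # bits


@dataclass(frozen=True)
class Assumptions:
    uploads_per_year: int = 10 * 10**9     # 10 billion uploads per year
    views_per_day: int = 10**9             # 1 billion video views per day
    video_size_bytes: int = 1 * MB         # ~10 s, 1080x1920 vertical clip
    metadata_size_bytes: int = 1 * KB      # ID, creation time, length, likes...
    copies_factor: int = 10                # replication + per-device formats
    seconds_per_day: int = 100_000         # rounded up from 86,400
    peak_factor: int = 3                   # peak traffic vs. daily average
    bits_per_byte: int = 10                # "a megabyte is about 10 megabits"


def estimate(a: Assumptions = Assumptions()) -> dict[str, float]:
    """Return storage (per year) and traffic (per second) estimates."""
    uploads_per_day = a.uploads_per_year / 365
    upload_qps_avg = uploads_per_day / a.seconds_per_day
    upload_qps_peak = upload_qps_avg * a.peak_factor
    view_qps = a.views_per_day / a.seconds_per_day
    bits_per_video = a.video_size_bytes * a.bits_per_byte
    raw_bytes = a.uploads_per_year * a.video_size_bytes
    return {
        "raw_video_pb_per_year": raw_bytes / PB,
        "stored_video_pb_per_year": raw_bytes * a.copies_factor / PB,
        "metadata_tb_per_year": a.uploads_per_year * a.metadata_size_bytes / TB,
        "uploads_per_day_millions": uploads_per_day / 10**6,
        "upload_qps_avg": upload_qps_avg,
        "upload_qps_peak": upload_qps_peak,
        "ingress_gbps_peak": upload_qps_peak * bits_per_video / GBIT,
        "view_qps": view_qps,
        "egress_gbps": view_qps * bits_per_video / GBIT,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name:>26}: {value:,.1f}")
```

Computed without rounding, the storage, metadata, viewing, and egress figures match the interview exactly: 10 PB and 100 PB per year, 10 TB of metadata, 10,000 views per second, and 100 Gbps. The upload figures come out a little lower: about 27 million per day, about 274 per second on average, about 820 per second at peak, and about 8 Gbps of ingress. That is because Mark rounds 10 billion / 365 up to 30 million and the peak up to 1,000 per second. For sizing purposes the order of magnitude is what matters.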

3.I'm going to say 30 million per day: 30 million videos uploaded per day. And if I say... actually, let me hold that thought. I mentioned storage, and I want to finish the storage calculation first. So it's 10 billion a year times a megabyte per video. 10 billion bytes is basically 10 gigabytes; times a thousand would be 10 terabytes; and times a million, which is a megabyte per video, takes us to petabytes. So this is going to be 10 petabytes of videos per year. That's just doing simple math. Now, in reality we have to factor in replication for redundancy, and possibly different formats for different types of devices. So if we do a super simple calculation here, I might say devices and replication, times 10, and then we'll say 100 petabytes per year of storage requirements. Okay, that's a lot of storage, but I think these days you can scale. The reason I'm doing this calculation now is that it helps me think about systems that could support this kind of scale. Naturally, videos are blobs, or objects, so I'm already leaning towards a blob storage solution of some kind, and modern solutions scale quite well in that way. So that gives me an idea of what kind of video storage solution I might use. Does that seem reasonable so far in terms of calculation? Okay, yeah, that seems okay. Okay, and then... I need to clarify this just a little bit. Let me think also about... that was the raw video data that I was talking
about. Let me also briefly do a quick calculation on what I would call video metadata. That might be an ID for the video, when it was created, how long it is, and maybe the number of likes, because ultimately we want to be able to track how many people like this video. So if I think about that kind of metadata and say it's a kilobyte of data per video, then instead of 10 petabytes it's a factor of a thousand less, so we're talking about 10 terabytes of video metadata. And again, I'm doing this math just to get an idea of where I would put this metadata. It's information that may be used for figuring out which videos to show, for identifying the videos, and for counting stats like likes. So where would I put this kind of thing? I don't think I need to do super complex calculations across the videos; they're not connected to each other, so they don't need to relate to each other. So I probably would use some sort of non-relational, or NoSQL, storage solution for this kind of thing. This would be basically key-value pair storage, and there are lots of solutions out there for that, but that's probably where I would put this. So does that make sense so far? Yeah, makes sense. So I initially started off by doing a calculation where I was trying to figure out the incoming traffic, and I put that aside because I wanted to look at storage first. So now let me look at traffic. If I say that there are again 10 billion videos uploaded per year, and I divide that by 365 days in a year, that gets me approximately 30
million videos per day. And if I then divide that by the approximate number of seconds in a day, which is about a hundred thousand, that gets me to 300 videos per second. So we're talking about uploading 300 videos per second. If you assume there's some unevenness during the day, let's pick a factor of three just to make it easy, and say roughly a thousand per second at peak, because the 300 per second is an average. All right, so we've got a thousand videos uploaded per second, and that's not a huge deal: if you think about servers in the cloud that can handle uploads, that's a very, very small amount. Now, if I take a thousand videos per second and multiply by a megabyte each, where a megabyte is approximately 10 megabits, since we measure network traffic in megabits, not megabytes, then 10 million bits times a thousand is 10 billion bits. That's 10 gigabits per second of traffic coming in, basically the ingress. That's not a lot: modern network cards on servers are 10 gigabits per second, so at least we understand that this is not a big bottleneck. Now I should do the same thing on the way out. I realize we're spending a lot of time on math here, but I think you said a billion videos viewed per day, versus the 30 million uploads per day we just calculated. So if I take a billion videos viewed per day and divide that by, again, approximately a hundred thousand seconds, then that's going to be, let's see... a billion is a thousand million, or a million
thousand videos per day, which divided by a hundred thousand gives us ten thousand per second. So we're viewing ten thousand videos per second, and if we do similar math, you can see this is a factor of ten over the other number: instead of 10 gigabits, we would have 100 gigabits of egress, which is also not a lot. Okay, I think we're done with the high-level math, and these are all numbers that can be worked with and sustained. The one other calculation I didn't really do, but could, is for users. If there are a billion users, you could think about the user profile and how much it would cost to store. You could imagine that could be maybe 10 kilobytes or something like that, so that would be maybe 10 terabytes, or something of that nature. We could talk about that later; I think we can skip it for now. It can be useful to do some high-level maths to inform your thinking and your solution, so make sure you practice multiplying and dividing big numbers like these as part of your interview preparation; if you're not used to it, it's very easy to get lost in all the zeros. Mark did some solid calculations to work out the storage for the raw data and metadata. However, he didn't do the maths for the user data storage requirements, instead moving on to the traffic and queries-per-second calculations. It would have been a better idea to finish the storage calculations first, as these requirements were more likely to influence the design. As a general rule, when you're working on storage calculations, finish them. Yeah, okay, so I think we should probably move on. Let me just draw a couple of boxes here. I'm going to draw a little TikTok app, and then of course we're going to do the typical thing. I don't really want to spend too much
time on load balancers here. You're going to have some sort of... I'm going to call this a TikTok app server, or something like that. Yeah, TikTok app service. This is the thing that would run in the cloud; it would be one of my servers, and there'd be multiples of these. Well, actually, let me take that back. I think we're going to separate these two, because you asked me to talk about uploads and about viewing and streaming, right? I think those are the two use cases we want to spend time on. Yes. So there's going to be some sort of TikTok content service, or application service. Maybe that's all right, because it's a service that generates content for the application. So let's leave that, and then create another one, the TikTok upload service, because we're going to be uploading videos. I might keep these separate just because they serve slightly different purposes, and following microservices architecture best practices you might separate these things out. Take this app service: when I first open my TikTok app, I come through the load balancer and hit one of these app services. The very first thing I might see is all of the stuff: the videos, the followers, and all these kinds of things. But if I click on upload, or create a new video, then I might get redirected to this upload service. Does that make sense? Okay, yeah. These are kind of the front ends, if you will, to the services. Okay, so let's talk a little bit more about the databases. I'd like to talk about the video database first. And I realize I'm already
giving myself not enough room on this screen, so I'm going to move stuff around a little to make things more palatable. Interview candidates, I would recommend you don't spend too much time making things pretty, but being able to see stuff is good. All right, here we go. The first thing I talked about was the video blob storage, and that's where the actual raw videos would go. This has to be almost like a file system, but I probably would use something like Amazon S3, because it scales really well and lends itself nicely to this kind of stuff; it's purpose-built for it. Okay, so that would be the video blob storage. By the nature of Amazon Web Services, you can do a couple of really cool things. You can replicate the data for better availability, and you can have regional replicas so that it's more aware of regions, or you can use the naming for that. The other thing you can do is tier the storage. I think I said 100 petabytes of raw data per year, and that can accumulate quickly. With TikTok videos, I'm making the assumption that a video's lifespan is not more than maybe a couple of months for most videos. You don't go back and watch three-year-old videos much; it's all about the most recent stuff, the influencers, creators, whatever it is. If that assumption is true, then you can tier your data so that older data, or data that hasn't been looked at as often, goes into cold storage. So I'm going to put a keyword in here: tiered AWS S3. What that means is that if a video is, let's say, a year old and nobody's really looked at
it, it could go into really slow offline storage, which doesn't cost as much and is more cost-effective than storage for the most recently uploaded stuff. Does that make sense? Yeah, makes sense. Okay, so that's the blob storage. I'm going to use yet another one of these shapes, because we're talking about two different things: next is video metadata. Let me draw another box here, because I think it's two. I want to get into that a little more later and talk about the tables, what's actually inside this. But whatever the video metadata is, I think I said it's going to be 10 terabytes, and I said NoSQL storage, so it doesn't need to be relational. But it's not really blob storage either, because there are some fields in there that you might want to query. So what I probably would do is something like... you could do Cloud Spanner, although that's more of a SQL thing, or you could do AWS DynamoDB, as an example. And just to be clear, for an initial solution to get this off the ground, I'm biased toward hosted solutions like the ones Amazon provides, because they take a whole bunch of maintenance load off the engineering team. As an engineering manager, I recognize that there's a cost to having engineers write code, and I don't want them to reinvent wheels like blob storage and indexed key-value storage. So I'd rather use something existing, even if initially it means we have to pay a little more; maybe we can eventually move off it. But right now this would be
a really good solution, because DynamoDB scales really nicely in terms of the number of rows you can have and so on. So these two things together, if I draw yet another box around them (and yes, of course, that's what happens), would make up basically the video storage database, if you want to call it that. Okay, so this is for the actual videos. Good so far? Yeah, we're good so far. Okay, go ahead. Yeah, I wanted to ask how you think the different regions and countries could be reflected in your design. Are you taking them into account? Yeah, good question. Another nice feature of some of these hosted solutions is that they have regional features, but that's not really answering your question. If I think about it, I can imagine having different data centers in different locations, because you want responsiveness. If you're in India, you want the videos to load fast; if you're in Europe, you want them to be fast; if you're in China or the US, same thing. Having the data local, near to where you are, is what you have to do as a global application. One way to do that is by partitioning the data, possibly based on region or language, and locating the data in different places. So that could be one aspect of how regions and countries play into this. Another, related aspect comes when we get to the serving, or streaming, side. In order to not overload these storage systems, you really need to have (and I'd better go ahead and draw this so we have it) some sort of CDN, a content delivery network,
something like this and what this  does is it's basically a set of caches in uh near   the the closer close to the user that cash data  that is typically uh popular let's say so videos   that are popular in a particular region would be  in this CDN stored in the CDN and thus made to be   available faster and uh the I think the regional  the regionality might happen automatically meaning   that you probably would get videos created in  India to be more commonly located in the cdns   in India versus Europe U.S China whichever it  might be so I think that there there's some   there's definitely some Regional aspects to  this I'm not sure whether I would explicit how   explicitly I would partition the data based on a  region or language or something like that that I   don't know yet that's a good question but I think  you would get some you can do it and you would get   some benefits by having this this CDN which is  kind of necessary for this kind of kind of stuff   so I'm not sure if I answered your question yeah  more or less yeah please uh please continue okay   all right so the piece that we're missing the  piece that I haven't talked about yet and I'll   just mention briefly is well as brief as I can  uh let me draw and yet another database because   we also mentioned uh let's see this is a video  database I I mentioned kind of like a uh a user   user data I'm just going to call that and if  I I should just think about this for a moment   a billion users across 150 countries but they have   friends and followers and connections and  this is where this whole social graph kind   of comes into play that you know a social  network kind of a thing so because of those   connections and those connections may very well  span regions and countries and things like that   you uh you probably want to have this user data  not be explicitly partitioned out into separate   separate databases that can't be talked to each  other so I I probably would want to put this into   some 
sort of SQL storage solution, and I still would want it to scale. So if I'm going to stick with Amazon here, Amazon RDS, the Relational Database Service, basically lets you use SQL solutions to store your data. I'm hand-waving a little bit about the amount of data here. At a billion users and a kilobyte each... is that right? A billion times a kilobyte is a terabyte. A kilobyte might be small depending on what's in there, but let's say it's a kilobyte, so a terabyte of data. That fits; you can do that with modern SQL solutions, but we're starting to get into some potential bottlenecks there, so that might be something we need to look out for. I'm going to hand-wave over that for the moment and say that we can store the user data in a relational database. Oh, and something like Google Cloud Spanner actually scales much more broadly than that and gives you some relational features, so you could use that too. But anyway, that's user data; that's another big aspect of this. So if I talk about the upload path... I probably should draw some arrows; that's probably a good idea. The app uploads a video: let's say I create a video. And by the way, again, I'm glad you said we don't have to worry about the app, because there are so many cool features in the TikTok app for creating a video. There's Duet and other features, you can clip and trim, there are filters, all sorts of cool features. But once you have the video, you upload it. So the TikTok app would upload it and send it to the upload service, and from there it has to go to two places. It really
needs to go to the blob storage, and it needs to go to the metadata store, because you need to add the information about the video so that you can figure out whether it can be shown and whether it makes sense to show it to people. So that's roughly the flow of an upload, a very simplified flow. What I probably should think about for a moment is what's in that data, but let me think about what would be better to talk about. Let me finish the flows. Let me talk about the video streaming flow, if that's okay. Yeah, go for it. Because now this is going to get much more complicated, I think, because there's some magic that TikTok does that I've completely hand-waved over. I'm going to draw a box here... I think the feed is called For You. It's a curated, customized feed, the video stream just for you. So I'm going to call this the For You generator; I think that's it. So let me see if I can figure this out. When I first go into my TikTok app, I'm already seeing a bunch of information, like friends I should follow, maybe some explore options, but the most important thing, the centerpiece of it all, is that I'm already seeing videos. There's no lag and no clicking on things: as soon as a video is in focus, it starts playing. That means my app has already gotten multiple videos by the time I start it. So let's see. If I go to the app service, one of the first things, or one of the important things I
should say, that the app service has to do is talk to this generator. And by the way, there will have to be multiples of this generator too, lots of them, because we're scaling. This app service needs to be able to talk to some back-end algorithm that decides what to show the user: what are the cool videos, what's the For You list I need to show? I think this is actually rather complicated and sophisticated. If I were doing this today, I would probably employ some sort of machine learning system to figure out what to show the user, because there are just too many variables to build a handcrafted, rule-based system ("if a video has this many likes, then show it to them", and so on). I think that would get out of hand pretty quickly and would be really hard code to maintain. So I would think about an algorithmic solution: either a neural network or some similar learned, online learning solution. A machine learning solution, I guess, is really the overview. So this TikTok app service needs to contact this, and then the question is what this thing is doing: how is it actually figuring out which videos to show? Let's assume it's a black box for now and it does something magic. Ultimately, it's going to return to this application service a set of... let me draw a little text box here and make it small so I can fit stuff... among other things, a list of videos, probably actually video IDs. So yes, it might return a list of users, maybe the user profile, maybe some other types of things, but I think the most important thing that's
going to return back is it's going to return a list of video IDs that the application ultimately should show to the user and so uh good good does that make sense so far okay so the application contacts talks you know reaches the service the service uh builds up what it needs what needs to be shown in terms of content but but the most important thing is that it gets a list of video IDs to show that are the the uh the best videos to show for this user the For You list of videos and actually maybe that's I could be super explicit about this For You video IDs if I really want to be explicit okay so then what happens with that so the the application service I think would wind up using uh oh sorry I'm trying to draw another arrow here okay so the application service I think would utilize the would get the from those video IDs would get metadata information about the videos how long they are what is the ID the URL for where to find this this this video so that I actually can go and fetch it maybe some other things like who created the the video how many likes it has things like that so this this app service would go straight to this database to get that uh metadata information so that it could pass that on to the application okay and that again I'm going to go and uh you know draw too many arrows here probably don't need that many but it would return that list so this would be the metadata now the key the reason I'm drawing this is because the metadata is only one aspect of course the biggest part of this is that you actually want the actual videos to be to be returned to the to the application so how is that happening and so that that's really happening in a kind of a direct way so oops need to draw better arrows here so from the uh the application is going to get back a as part of the metadata the list of videos so that let me see if I can draw this better here or give a little bit of a I'm going to put a I
need to put somewhere put some text in here and I apologize this is again kind of small here but this is video uh uh info or maybe I'll call I'm going to call this video URLs these are like the blob the IDs of where to fetch the the videos because these are actual like HTTPS or some sort of URLs and I am also hand waving over authentication here by the way so you know we're assuming HTTPS this is you know secure login et cetera et cetera so there's a lot of hand waving going on here so this app service among other things returns these video URLs and then the TikTok app for each of those app video URLs goes and fetches them and it fetches them basically by just going to those URLs and it turns out that that URL is going to land uh let me draw another little uh shape here to make this super super explicit pretend there's a little cloud magic cloud but the TikTok application is just going to go go to the cloud and specify this URL get me this URL that will hit a CDN a content delivery network now if that content delivery network has the video because it's a popular video uh in its in its local cache then it it will just go ahead and uh and deliver that and that's just how CDNs work this is nothing special I'm not you know I'm not trying to recreate uh reinvent the wheel here either so if this if the CDN has it it'll just serve it and it'll be fast and for popular videos that's super fast if not then it'll wind up going to actually going to the back-end storage service uh like S3 and going and fetching it and then delivering it and then it'll decide based on whatever its logic is whether to store it in its cache or not and that's I'm going to again hand wave over that so uh the the the key thing I wanted to just be clear about here is that from a video serving perspective that the application there's a two two parts to it there's the actual video data which goes through the CDN and the blobs and then there's
the metadata which will which would come through the uh the application service and all of the content that it returns back to the app here's here's everything you need to show that's small and and structured data that you need to to make your make your phone application look look good so I'm going to stop there does that make sense that makes sense Mark yep Mark drew a good high-level diagram he did an excellent job talking the interviewer through the upload and download flow and then for the video streaming flow 2
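The read path Mark describes (the app service asks the For You generator for ranked video IDs, looks up each video's metadata, and hands the app URLs rather than video bytes) can be sketched roughly as follows. This is a minimal illustration under assumptions, not TikTok's actual design: the generator and the metadata database are replaced by in-memory stand-ins, and `build_for_you_response`, the field names, and the `cdn.example.com` URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class VideoMetadata:
    # A subset of the video metadata fields from the interview (algo features omitted).
    video_id: str
    video_url: str   # where the app fetches the video bytes, via the CDN
    creator_id: str
    created_at: int  # creation timestamp (assumed: Unix seconds)
    duration_s: int
    likes: int


def build_for_you_response(
    user_id: str,
    generate_for_you_ids: Callable[[str], List[str]],  # hypothetical For You generator client
    metadata_db: Dict[str, VideoMetadata],             # in-memory stand-in for the metadata DB
) -> List[VideoMetadata]:
    """Turn the generator's ranked video IDs into metadata the app can render.

    The response carries URLs, not video bytes; the app fetches each URL itself.
    """
    ranked_ids = generate_for_you_ids(user_id)
    # Keep the generator's order and skip IDs with no metadata (e.g. a deleted video).
    return [metadata_db[vid] for vid in ranked_ids if vid in metadata_db]


# Usage with hypothetical data:
db = {
    "v1": VideoMetadata("v1", "https://cdn.example.com/v1.mp4", "c7", 1_700_000_000, 15, 120),
    "v2": VideoMetadata("v2", "https://cdn.example.com/v2.mp4", "c8", 1_700_000_500, 30, 45),
}
feed = build_for_you_response("u1", lambda uid: ["v2", "v1", "v404"], db)
urls = [m.video_url for m in feed]
```

Keeping the response small and structured is the point Mark makes: the heavy video data never passes through the app service.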

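The CDN behaviour Mark hand-waves over (serve the video from the edge cache if it is there, otherwise fetch it from back-end blob storage such as S3 and then decide whether to keep a copy) is a read-through cache. The toy sketch below always caches on a miss and evicts the least recently used video; both are simplifying assumptions for illustration, since real CDNs apply their own caching rules. `EdgeCache` and `fake_blob_store` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, List, Tuple


class EdgeCache:
    """Toy model of one CDN edge node acting as a read-through cache.

    LRU eviction is an assumption for illustration; real CDNs use their own policies.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 2):
        self._fetch_from_origin = fetch_from_origin
        self._capacity = capacity
        self._cache: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Tuple[bytes, bool]:
        """Return (video bytes, cache_hit)."""
        if url in self._cache:
            self._cache.move_to_end(url)  # mark as most recently used
            return self._cache[url], True
        data = self._fetch_from_origin(url)  # miss: go to back-end blob storage
        self._cache[url] = data
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict the least recently used video
        return data, False


# Usage with a fake origin store that records every request it receives.
origin_requests: List[str] = []


def fake_blob_store(url: str) -> bytes:
    origin_requests.append(url)
    return b"bytes of " + url.encode()


edge = EdgeCache(fake_blob_store, capacity=2)
_, first_hit = edge.get("/videos/v1.mp4")   # miss: fetched from origin
_, second_hit = edge.get("/videos/v1.mp4")  # hit: served from the edge
```

Popular videos stay hot at the edge and are served fast, while rarely watched ones fall back to origin storage, which matches the trade-off Mark describes.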
4.you might have noticed that Mark drew lots of arrows to illustrate the flow of data you don't have to do this talking it through with the interviewer is sufficient if you prefer don't feel that your diagram has to be totally self-explanatory that said it's really important to be comfortable with the diagram drawing tool that you're using so that you can use it to help and enhance your explanation rather than having to struggle with it Mark also covered the download flow for video streaming and how the For You feed is generated algorithmically even though he did this in a very simplified way it was important to touch on because it's key to what makes TikTok unique this way he prepared the ground to drill down into it later in the interview alright so if that seems good so far then there's two things I'd like to do if we still have time one of them is I'd like to just talk a little bit more about the the data that I think that's the schema of the data what are the elements of the different the two different bits of data metadata really the video metadata and the user metadata I want to talk about those because I think that they they are related to the second thing I want to talk about which is a little bit more thinking around how I might do this this generator if I were to treat it a little bit less as a black box and try to dive a little bit deeper although that's not really my forte but I but I want to try only kind of take an attempt make an attempt at that so if that's okay then then that's that's kind of where I'd like to go from here I'd love to see you drill down in detail so let's talk about the video metadata uh what's in this video metadata so I think there's so I mean there's there's standard stuff that I don't know how much of this I want to spend time on but but I'll but I'll just kind of mention it so uh there's there's going to be a video ID which is just a unique identifier for the video there's going
to be also a video uh URL which is this thing that I mentioned up here that you would go and fetch and I'm going to uh I'm gonna mention this but I'm not going to write it just because I don't want to use the screens this the space here I mentioned briefly earlier that you might want different encodings depending on you know for a particular video so during the upload phase and possibly offline asynchronously uh if we want to store differently encoded forms of the same video then we probably need to have different URLs or different blobs for those things and so we might have the original raw uploaded video and we might have a a you know iPhone optimized video and we might have a MacBook and a Windows and a you know something like that so it's so that means that you might have video URLs as a list instead of just a single one but I'm going to hand wave over that for now and just say that there's one video URL and a pardon the capital oops pardon the capitalization I think that's just the the way the this this autocompletes here okay so video ID video URL what else do we want to have in here maybe like a creation timestamp when was this thing created uh oh probably uh like a creator ID who created this video who uploaded this video right we want to know who it is we probably want a uh I'm gonna say like likes maybe we want the number of of likes or maybe more generically it would be sort of reactions or something like that on this video like how many you know how many people liked it how many people didn't like it etc things like that and there's something else that I'm gonna that I want to put in here and uh but I'm gonna put just a placeholder in here for now uh because I want to get back to it which is related to the the uh algo uh features I'm just gonna call this for now and and let's just leave it at that so so I'll put a pin in that and then get back to it so this these are some of the things I could
think of for this video uh there might oh there   might be something like length or duration of  the video you know something like that right   some other metadata so that that's what I would  think about for the video metadata and uh uh let   me now talk a little bit about what I think would  be in the user data so I'm going to move make   some space here copy this put this up here and  apologies for the type tight spacing here but uh uh all right so user metadata so what are the  things I could think about so there's the typical   the typical stuff about users like a user ID  some you know credentials uh you know login   type of stuff maybe there's some  things like you know name age I don't know   other other bits of information like that I don't  want to get into too much of that that's going   to be pretty typical for any real application but  then I think the other thing that we mentioned is   because this is such a social network uh graph  you probably want to have following user ID IDs   uh so these are you know who who are the  people that you're following that each user   is following you probably want to know that that's  probably something that is important to understand   uh there's probably also watch I'm gonna say   uh video Maybe video history ID or something  like that so this is the the watch like your   watch History like what are the things that  you've watched in the past videos that you've   looked at that you've seen or or have been maybe  shown so shown or watched not not really sure   and then for if you're a Creator you probably  also have uh you know uploaded video IDs right   like what are the videos that you up that you've  uploaded so that those to me are things that are   important and then let's let's not forget we  talked at the very beginning like what would   be a success metric here and something that I  probably want to track in here is uh time in app   so how long has this user been spending in the  in the TikTok app because I 
probably need to aggregate that and I need to look at that and be able to look at that offline and so on so that's probably something I need to store and keep track of okay so let me see here is there is there what else am I thinking about oh yeah yeah so okay so let's get back to this thing that I mentioned uh into the second point that I wanted to make which is the For You algo profile I'm going to call it so and and I'm gonna I should probably change the video one here as well For You algo features so what I'm thinking here so here's how my how I'm thinking about this and again I'm not an ML expert by any means I have limited sort of background in this kind of stuff but in order to have a machine learning system be trained and figure out uh and be able to make decisions one of the uh one of the things that it needs to take in is a set of features and so features could be things like uh the the video ID itself the number of followers the entire following the the the entire network graph it could be it could be content preferences it could be language it could be I don't know yeah some some features features like this so things that are sort of specific to this user that help the algorithm figure out what types of videos to show to the user so that profile is something that I would want to store per user have the the uh For You generator reference this profile and utilize this this uh profile along against the algo features that were generated for a particular video and that could be length of video it could be content it could be location it could be language it could be a bunch of different features so that the this generator can essentially uh input the features from the user and based on that and based on all of the entire database of [Music] algorithmically feature identified videos make a determination as to what the best match is for videos to show to the user so that's a lot of
hand waving   but the idea is that this is essentially uh  it's a neural network or some sort of other   system uh and I keep referring to neural networks  it's kind of you know 80s and 90s here but it's   it's a machine learning mechanism that has been  trained offline with all the videos that have   been uploaded they've all been feature extracted  for each of the videos and then those features   are fed into the system to train it uh and  then there's a there's a matching step where   on the Fly we say Okay given this user profile  what are the videos that that we should show that's still kind of black boxy still kind of  hand wavy but I think the the key thing there   is that this profile there's two things here that  I want to think about first off I think the the   and there's probably some sort of a I don't  know if it's a database I don't really exactly   know what the right right mechanism is is  here but uh oh maybe I should put it on its   side but anyway there's some sort of a uh ml  database that's super super hand wavy super   you know over overly simplified but there's  there's that database but and the input into   this generator is the database and all of  the uh the the various uh features and so on   but when making the determination the uh  the for you generator just looks at this   profile and all of the information the one  thing that I think it this is me being naive   I think that this will this process of updating  the profiles and showing stuff that's relevant to   the profile including what have you seen before  etc etc I think that's a slow process by slow I   mean it's not real time it doesn't get updated  right away so if I for example see a video that   I don't like or a follower that I don't want  anymore and I don't want to see any more videos   from a particular user as a user I expect that to  take effect immediately I don't want to see any   videos from that user I don't want to see anything  like that immediately but if the you 
know this ml   system is an offline asynchronous system that's  churning churning churning so it's gonna it might   take a while to catch up to to that so I probably  would want a little bit of rule-based boundaries   guard rails maybe I could would call it that would  be like exclusion profile I'm going to call this   it's super hand wavy too but exclusion profile  might have in it like followers not to follow   anymore follower IDs or user IDs or something like  that and so in my in my generator my uh this could   be ML and I could make do the exclusion filtering  stuff out in this app service here or I could push   that all the way down into this generator here  but one way or another a combination of some   machine learning stuff which is not going to be  perfect this sometimes it's hard to reason about   how did it figure out make a decision and some  rule-based logic simple rule-based logic that is   that is unambiguous would ultimately filter down  the set of video URLs that the user would see yeah   that makes sense to me so let's see I've I I've  talked a lot about this and and again I haven't   drilled down on this and I know that people are  who are watching this who are ml or AI uh you know   have have much more knowledge please I mean you're  you're absolutely right if I've gotten this wrong   and that's that's on me it's definitely not my  expertise uh and I'm oversimplifying here greatly   but I think ultimately there's got to be some  mechanism that generates this stuff based on all   of the attributes of this user including some sort  of a profile so that's kind of just my the high   level uh mechanism that I'm that I'm suggesting  here Mark's explanation for the databases was   well structured first going high level and then  drilling down into the schema afterwards Mark   did well talking about the algorithmic generator  for the for you feed even though machine learning   is outside his area of expertise if the system  requires including things 
that you don't know much about don't just ignore them you can mention them in a simplified way while being upfront with the interviewer about the limits of your expertise as Mark was here let me stop here and just kind of ask you uh what other things are you thinking about or whatever things are you wondering about here with this design yeah well I'm wondering uh I'm wondering if there'd be any bottlenecks and yeah I'd also like to hear if if you have any any enhancements you could make yeah good questions okay so I mean I one bottleneck which is not so much a so when I when I think of the term bottleneck I typically think about a performance bottleneck but I I think I already mentioned there could be just a possibly a storage bottleneck in terms of keeping all of the user data in one single table uh one one database just because of size that could be an issue don't don't know for sure how I might resolve think about that or think about solving that is I might have to partition the data for the largest regions or something which maybe isn't ideal but maybe I'd have to do something like that so that might be that but that's a different question than I think what uh what you're asking about and so I think a bottleneck an actual like a bottleneck in terms of performance that I can think of is this generator here that is generating on the fly you know in near real time here when this app service is called every time the app sends it a message saying hey give me give me and what's next what's next that could be a bottleneck because it has to respond quickly and it has to run in basically in real time or it has to read ahead so so let me let me just make a quick detour here one of the one of the uh the components really cool features of TikTok and I think I mentioned it but I I I just I find it fascinating is that uh you're in the app and everything is there immediately there's no lag there's no hesitation
there's no clicking there's no spinning wheel it's all just right there as you're scrolling in the app your videos are scrolling and they're starting to play as soon as they come into focus how do you make that happen if you're you know sending a request to some server for a megabyte video uh well the answer is you don't uh I think the answer is that as the TikTok app is running and I think it's probably running in the background some it's actually constantly talking to the app service and saying what are the 10 next videos or the 20 next videos or whatever it might be and so it's it's doing a what's called a read-ahead so it's pre-computing basically the stuff so that it's ready to go and it can send it to back to it right away that's not perfect and and there's some issues with that but that's one way to make this this seem extremely alive and uh responsive application but that doesn't solve necessarily the bottleneck of the uh this generation here on how you might generate this list this this uh list of videos to recommend and so one thing I could think of doing is it over time when this app service and the generator start to maybe mature I could imagine moving this logic some of this logic into the app onto the popular platforms so if we've got you know uh iOS Android Windows Mac and uh maybe maybe web then I could imagine that on the most popular platforms let's let's say iOS let's say iPhone and Android on just mobile devices that you move this generator code into the actual application sorry into the phone onto the phone so that it can take advantage of maybe even the GPU on the phone because if this is like a you know it's a math heavy or you know a kind of your phones are the smartphones are super powerful these days so you have a lot of compute power to use there and that could reduce the bottleneck of having these things sort of running out in the cloud so that could be one way to solve that
so that that would be a bottleneck that I would be thinking about is this is this compute thing because I've oversimplified I'm sure it's much more sophisticated complicated than I'm making making out to be so that's that's on the bottleneck side yeah that's an interesting answer not to mention the data bottleneck for the user data but the more interesting one was around the algorithmic For You generation and Mark's solution moving it from the cloud into the application on the device for mobile applications this can be a powerful option to offload compute from your distributed system and onto users' devices you you asked about enhancements so I mean I guess maybe that would be an enhancement too is to move it to move this move this logic to into the into the phone I think an enhancement that is probably more of a product enhancement or an application enhancement is that I could imagine a right now you get a For You feed and it's it's customized to your likes and your preferences and I could imagine in order to test out new algorithms and test out new feeds and so on having an algorithm the the the the generator occasionally pepper the the list of videos with something that is outside of your uh outside of your uh domain or your interests to kind of give you a taste of something you don't want a lot of that because you don't want people to get a feel like okay I'm getting something that I don't like or I'm you know I have no idea where this is not somebody I'm following right you don't want that to be super super in your face but you could imagine peppering the results every now and then with a little bit of uh of something that is new and unusual uh you know sort of a a surprise unsolicited uh idea for a video or something like that and that would still want I would still want that to go through that filter that I was talking about those guard rails of saying well if this is from a follower that I explicitly said don't
look at then you know I wouldn't want to see that but but if it's something that you know hasn't explicitly been uh disallowed then maybe it's something that you know you you get to see and uh uh is an interesting interesting sort of feature maybe broadens your perspective a little bit opens up your mind to some other other ideas you know maybe it's baking or something like that and you haven't watched a lot you know you're not interested in baking I I don't I don't really uh have have a good example here but that could be some that could be kind of more of a product enhancement to to think about and I think from a system perspective I think I mentioned earlier one enhancement I could imagine making is that if you if some of this architectures as this architecture stabilizes you could pick out certain components or aspects that are costly that are expensive uh so if the Dynamo turns out to be super expensive or RDS maybe you decide you know what I'm going to go ahead and pay the engineering costs and get my engineers to do our own hosting of a MySQL solution or or do it on a different platform or something like that just to kind of reduce costs because now it's worth it at the scale to to build our own solution so that could be something to think about but that's always a trade-off between the uh you know the the engineering cost and the the sort of service costs that you wind up paying for these kinds of solutions okay so okay that's [Music] that's kind of uh uh I I guess that's kind of where I would leave it at Mark's idea that you could pepper the For You feed with a different type of content is more of a product enhancement than a system enhancement if you're interviewing for an engineering role you'd probably want to come up with a system enhancement if possible keep in mind the role and level that you're interviewing for as an individual contributor position more technical depth might be
expected while if you're interviewing for say an engineering manager role you'll want to consider things such as cost trade-offs as Mark did here overall a great interview from Mark he showed some clear thinking and superb communication skills on what was a pretty tricky design yep let's wrap it up now yeah do you want to just have a quick look over your design are you are you happy with it do you feel that you you met the the objectives that we laid out at the beginning yeah so you're I think so because I think you asked about sort of the upload and and download uh stream uh or streaming aspects of it and we did some capacity there we have sort of a we have the databases we've got the rough idea of how that's going to flow through the system I've got a little bit of hand wavy stuff around uh the For You feed which is sort of the what we show so I feel like this is this is a decent start again I there's so many things and I I you know I imagine the real TikTok architecture is significantly more complex I mean people have spent engineering decades building this thing and it's an amazing app so uh there's no way I'm gonna reproduce that in in any amount of time so uh uh yeah kudos to those to those engineers that's very good uh well I think I think you made a pretty good attempt uh let's finish the interview there and yeah thanks Mark uh you can relax how was it was it was it enjoyable that was fun hard difficult but fun it's a tricky one but uh yeah it was it was good fun uh watching you watching you do it and uh yeah I think hopefully people will have learned a lot I'm sure we'll get lots of comments on it so yeah Mark thanks very much and uh yeah I hope we'll see you again here sometime soon thank you thanks Tom hello I really hope you found that useful if you did you can like and subscribe and why not come visit us at IGotAnOffer.com there you can find more videos useful frameworks and question guides all
completely free and you can also book expert feedback one-to-one with our coaches from Google Meta Amazon etc thank you and good luck with your interview [Music]
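Mark's "guard rails" idea is a simple rule-based exclusion filter layered over the slower, offline-trained ML ranking, so that blocking a creator takes effect immediately. It combines naturally with his later suggestion to occasionally pepper the feed with out-of-profile videos that must still pass the same filter. The sketch below is minimal and rests on assumptions: the ML output is taken to be an already ranked candidate list, and `ExclusionProfile`, `pepper_with_exploration`, and the one-in-`every_n` insertion rule are hypothetical choices, not how TikTok actually mixes its feed.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    video_id: str
    creator_id: str


@dataclass
class ExclusionProfile:
    # Hypothetical per-user rules that must apply immediately, unlike the ML profile.
    blocked_creator_ids: Set[str] = field(default_factory=set)
    hidden_video_ids: Set[str] = field(default_factory=set)


def apply_guard_rails(candidates: List[Candidate], rules: ExclusionProfile) -> List[Candidate]:
    """Deterministic, rule-based filter run after the ML ranking."""
    return [
        c for c in candidates
        if c.creator_id not in rules.blocked_creator_ids
        and c.video_id not in rules.hidden_video_ids
    ]


def pepper_with_exploration(
    ranked: List[Candidate],
    exploratory: List[Candidate],
    rules: ExclusionProfile,
    every_n: int = 5,
    rng: Optional[random.Random] = None,
) -> List[Candidate]:
    """Insert one out-of-profile video after every `every_n` personalized ones.

    Exploratory videos go through the same guard rails as personalized ones.
    """
    if rng is None:
        rng = random.Random()
    pool = apply_guard_rails(exploratory, rules)  # a new list, so shuffling it is safe
    rng.shuffle(pool)
    feed: List[Candidate] = []
    for i, candidate in enumerate(apply_guard_rails(ranked, rules), start=1):
        feed.append(candidate)
        if i % every_n == 0 and pool:
            feed.append(pool.pop())
    return feed


# Usage with hypothetical IDs: creator "c_blocked" was blocked a moment ago.
rules = ExclusionProfile(blocked_creator_ids={"c_blocked"})
ranked = [Candidate(f"v{i}", "c_blocked" if i == 2 else "c_ok") for i in range(1, 8)]
exploratory = [Candidate("x1", "c_blocked"), Candidate("x2", "c_new")]
feed = pepper_with_exploration(ranked, exploratory, rules, every_n=3, rng=random.Random(0))
feed_ids = [c.video_id for c in feed]  # ["v1", "v3", "v4", "x2", "v5", "v6", "v7"]
```

Because the filter is plain set membership, it is easy to reason about, which is exactly the property Mark wants from the rule-based layer.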

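The read-ahead Mark describes, where the app keeps asking the app service for the next 10 or 20 videos so that each swipe starts playing instantly, can be modelled as a client-side buffer with a low-water mark. This is a synchronous toy under stated assumptions: `fetch_next_batch` stands in for a hypothetical app-service call, and a real client would refill on a background thread and also prefetch video data from the CDN.

```python
from collections import deque
from typing import Callable, Deque, List


class ReadAheadFeed:
    """Client-side read-ahead buffer for the For You feed (illustrative only)."""

    def __init__(self, fetch_next_batch: Callable[[int], List[str]],
                 batch_size: int = 10, low_water_mark: int = 3):
        self._fetch = fetch_next_batch  # hypothetical app-service call: "give me n more"
        self._batch_size = batch_size
        self._low_water_mark = low_water_mark
        self._buffer: Deque[str] = deque()
        self._refill()  # warm the buffer before the first swipe

    def _refill(self) -> None:
        self._buffer.extend(self._fetch(self._batch_size))

    def next_video(self) -> str:
        """Return the next video ID, topping up the buffer before it runs dry."""
        if not self._buffer:
            self._refill()  # fallback: buffer drained, so fetch while the user waits
        video_id = self._buffer.popleft()  # IndexError if the service returned nothing
        if len(self._buffer) < self._low_water_mark:
            self._refill()  # read ahead so the next swipe does not wait on the network
        return video_id


# Usage with a fake app service that hands out sequential IDs.
fetch_calls: List[int] = []


def fake_app_service(n: int) -> List[str]:
    start = len(fetch_calls) * n
    fetch_calls.append(n)
    return [f"v{start + i}" for i in range(n)]


reader = ReadAheadFeed(fake_app_service, batch_size=4, low_water_mark=2)
shown = [reader.next_video() for _ in range(3)]  # ["v0", "v1", "v2"]
```

As Mark notes, this hides latency from the user but does not remove the generator bottleneck itself; it only moves the waiting off the critical path.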

Key vocabulary (CEFR B2)

interests

B1

Things a person enjoys or wants to learn more about; in this context, the kinds of videos a user likes to watch.

Example:

"your uh domain or your interests to kind of give  you a taste of something you don't want a lot of  "

maintenance

B2

Actions performed to keep some machine or system functioning or in service.

Example:

"Amazon provides because they take a whole bunch  of main a mono or maintenance and operational load  "

operational

B2

Relating to the day-to-day running of a system or service.

Example:

"Amazon provides because they take a whole bunch  of main a mono or maintenance and operational load  "

comfortable

B2

Confident and at ease with something, for example a tool you use often.

Example:

"important to be comfortable with the diagram  drawing tool that you're using so that you can   use it to help and enhance your explanation rather  than having to struggle with it Mark also covered  "

operation

B2

The method by which a device performs its function.

Example:

"Amazon provides because they take a whole bunch  of main a mono or maintenance and operational load  "

downloads

B1

A file transfer to the local computer.

Example:

"like you to focus on on a back-end distributed  system that supports uploads and downloads"

cloudfront

B1

Amazon CloudFront, the content delivery network (CDN) service from Amazon Web Services.

Example:

"cloudfront or something like this and what this  does is it's basically a set of caches in uh near  "

encodings

B1

The different formats in which the same video is compressed and stored, for example versions optimized for different devices.

Example:

"encodings depending on you know for a particular  video so during the upload phase and possibly  "

timestamp

B1

A variable containing the date and time at which an event occurred, often included in a log to track the sequence of events.

Example:

"timestamp when was this thing created uh oh  probably uh like a Creator ID who created this  "

credential

B1

(chiefly in the plural) documentary or electronic evidence that a person has certain status or privileges

Example:

"the typical stuff about users like a user ID  some you know credentials uh you know login  "



Grammar and pronunciation tips for dictation

1

Chunking

Notice where the speaker pauses after certain phrases; it makes the speech easier to follow.

2

Linking

Listen for connected speech, where words run together.

3

Intonation

Pay attention to changes in intonation that highlight important information.

Difficulty analysis and video statistics

Category: people-&-blogs
CEFR level: B2
Duration: 4158
Total words: 11706
Total sentences: 519
Average sentence length: 23 words
