Google system design interview: Design TikTok (with ex-Google EM) – YouTube Dictation Transcript & Vocabulary
Interactive Transcript & Highlights
1.[Music] hello welcome to another system design mock interview this is where we get one of our top coaches to play the role of candidate and take on a system design question for you to watch and learn with and I'm very pleased to say that today we have Mark with us again Mark how you doing great thanks Tom great to have you with us uh quickly do you want to just introduce yourself and tell people a bit about your background sure yeah so I am a former engineering manager at Google mostly at Google sometime at Uber and uh spent a lot of time on large-scale distributed systems so that's I think what we're going to be doing here and of course I've been doing coaching I'm available as a coach and interview coach but I hear the tables have turned so here we go again all right well uh let's turn those tables let's crack on with the question and yeah Mark the question I want to ask you today is how would you design TikTok
2.okay TikTok all right uh popular vertical format video sharing application uh popular with lots of people around the world I guess and so uh it's kind of a social media platform and uh pretty pretty interesting and pretty complex I think so uh tell me a little bit about maybe uh use cases here that we should be thinking about yeah sure let's not spend time on the mobile app and and content Creation in the app I'd like you to focus on on a back-end distributed system that supports uploads and okay that's good uh yeah because certainly the the application is the uh is the the the most central point of the whole thing and you could do lots of cool things in terms of generating your being a Creator so I'm glad we don't have to worry about that that's also not my forte so in terms of uh sign in and sign up and things like that can we kind of skip that too that's going to be pretty standard I think I think that's that's you're right that's the less interesting part so let's skip over that yeah okay so you want to focus on sort of the video uploads into the system and then streaming the videos or consuming the videos I guess absolutely okay so tell me a little bit about you know how many users uh how many videos uploaded what's the scale of this of this thing what are we thinking about here okay yeah well we've got uh let's say one billion users uh in around 150 countries okay and let's assume one billion video views a day and I think there were I think I saw in a recent year there were 10 billion videos uploaded in one year so so let's use that figure as well okay all right so let me change my little and then go to my little diagram here it's a TikTok system design all right okay so let me let me capture just capture some of this stuff so make sure I get this right here so we've got I think you said a billion users uh uh across 150 150 you said 150 countries yeah I think you said uh a billion videos per day that's right is that right yes and you said uh 10 10 billion 
videos uploaded uh last last year or something is that something I say yeah yeah okay per year all right okay uh all right that makes sense um um all right that sounds reasonable I mean it's that's big and that's good because we want it to be big we want it to be a big distributed system so if I if I think about this app though so uploading down streaming viewing even though we're not going to talk about the app too much or you know we're kind of putting that aside uh we probably should think about some uh success metrics so would you do you have any thoughts on success metrics do you I mean I can think of some but do you have any yeah I'd prefer to hear your thoughts on it yeah give me some suggestions uh okay all right so I mean I guess in a classic sort of web application kind of thing daily active users or even an application phone app daily active users make sense I think for TikTok the simple metric I guess would probably be let's see so success metric something like time in app uh so how long are people spending in the app I mean I'm not a heavy TikTok user at all but I imagine that people use it in different ways some maybe just scroll through the videos some watch the entire thing and react and share and do lots more interesting things and then there's the creators but time in app is probably the simplest metric to think about so maximizing time in app and uh I think there was some I don't know you might know this do you have a do you know how long people spend in the app do you have a do you have a sense of that um I think let's say like around I think let's say the average is around an hour a day okay does that sound okay sound about right sure so let's let's assume that uh okay that sounds fine here I'm gonna try and just uh where's my trying to oops trying to see if I can out well I'll just leave this okay so all right so time in app okay so having had uh kids and thinking about screen time and all this kind of stuff I can imagine enhancing 
this if I weren't you know sort of thinking more about the the the online Health maybe maybe I'd want to even as a as a working professional maybe I wouldn't want to maximize the time in my app during business hours I can imagine some future thing maybe to maximize the time in the app outside of working hours or school hours or something like that but that could be a that would be a future enhancement I don't I don't know that I would yeah for now let's just focus on this time in app sounds sounds reasonable it's always a good idea to narrow down the scope of the question to make it achievable within the hour here Mark did this well confirming that he could just focus on TikTok's back end distributed system rather than the TikTok app he made another good call in suggesting we skip designing the login process covering that very standard part of the system might have wasted time that Mark wanted to spend on more interesting aspects that gave them greater opportunities to impress the interviewer of course always confirm your suggestions with the interviewer to make sure they're on board with the direction you're taking right so okay so time and time and app uh currently about an hour per day sounds good all right so let me see if I can make some more detailed assumptions here some of this stuff I need to make this a little bit smaller so I have some room here all right so detailed assumptions I'm going to assume that so this is vertical it's a vertical form factor video sharing thing as I as I mentioned and typically these videos are like you know they're was it called 1080p but they're rotated so like 1080 pixels by 1900 pixels or something like that so those kinds of videos the the typical I think the video range time range for these things is Clips is like three seconds to a minute but uh just for sake of simplification maybe we can assume uh 10 second video on average if that's does that seem okay yeah yeah I think that's it works and so uh and I think I said 1080 
times 1920 pixels which you know in a video uh a video for 10 seconds I think we can probably assume a megabyte per video if that seems it seems all right uh okay so megabyte per video let me just see if there's any other things I'm I'm thinking about here all right let me so with those assumptions maybe I can do some really rough calculations here about in terms of the the scope of this thing and and see if let's let's test this and see expand this a little bit to see how big this is going to get so if we say that we've got a billion let's look at uploads first let's look at uploads because that translates to storage in addition to some traffic and it looks uh looks pretty heavy so if we say 10 billion videos per year uh then if we do the rough math and we divide by 360 well we divide by 365 days in a year and that comes out to approximately let's see 10 million 365
3.I'm going to say 30 million per day uh 30 million videos uploaded per day and if I if I say uh actually let me let me back it let me put that let's hold hold that thought I mentioned storage and I want to finish the storage calculation first so 10 billion a year times a megabyte I think I said per video so if you do a million bytes times 10 billion what you wind up with is so 10 billion bytes is basically 10 gigabytes times a thousand would be 10 terabytes times a million which would be a mega a million would be uh petabytes so this is going to be 10 petabytes of uh of videos per year that's just doing doing simple math now in reality uh we have to factor in possibly some replication to have redundancy also possibly we have to have different formats for different types of devices so you know you could I mean if we just sort of do a super super simple calculation here I might say so devices and replication uh times 10 so then we'll say 100 petabytes per year of storage requirements okay so if if that and that's a lot that's a lot of storage but I think that uh these days you can you can scale so this the reason I'm mentioning this now when I'm doing this calculation now is because it uh helps me think about uh systems that could support this kind of scale and just naturally megabyte videos are blobs or objects and so I'm already leaning towards a storage solution like a blob storage solution of some some kind and modern solutions out there scale uh quite well in that in that way so I I think this could that gives me sort of an idea of what kind of uh you know video storage solution I might I might utilize if I think about so is that does that seem reasonable so far in terms of calculation okay yeah that seems okay and then okay and then so now uh I I'm gonna put here I need to clarify this just a little bit let's let me think also about the that was the raw video data that I was talking about let me also just briefly talk about or do a quick 
calculation on like video what I would call video metadata which might be this might be like uh an ID for a video when was it created how long is it maybe a number of likes ultimately because we want to be able to track how many people like this video so so something like that so if I think about that kind of a thing metadata and I say let's say it's a kilobyte of data uh for for metadata so now instead of 10 uh 10 petabytes it's a factor of that's a factor of a thousand less so we're talking about 10 terabytes of metadata uh actually I should put metadata here video metadata and so 10 terabytes of video metadata and so again I'm mentioning this I'm doing this math just to kind of get an idea of where would I put this metadata this information that needs to be potentially utilized for purposes of of uh figuring out which videos to show and and identifying the videos and so on and counting stats like likes and so on and so where would I put this kind of a thing and so I probably because these videos are not I don't need to do I think I probably don't need to do super complex calculations across the videos they're not connected to each other so they don't need to relate to each other so I probably would use some sort of uh non-relational or or NoSQL storage solution for this kind of thing this would be like key value basically key value pair storage and there's lots of solutions out there for that sort of thing but that's probably where I would put this so does that make sense so far yeah makes sense so I initially started off by doing a uh calculation where I was trying to figure out the traffic the incoming traffic and I put that aside because I wanted to look at storage first so now let me look at traffic so if I say that there's again 10 billion videos uploaded per year and I divide that by 365 days in a year then that gets me approximately 30 million videos per day and if I then uh divide that by the approximate approximate number of seconds in a day which is a 
hundred thousand then uh that gets me to 30 sorry 300 videos per second so we're talking about uploading 300 videos per second if you assume that there's some unevenness during the day and that might actually translate to more like you know you could say a factor of let's pick a factor of three just to make it easy and let's just say uh a thousand per second at peak because the 300 per second is an average so all right so we've got a thousand videos uploaded for uh per per second and what we that's that's not a huge deal because if you think about servers that can handle uploads out in the cloud that's that's a very very small amount if I think about a thousand now per second uh videos per second and I multiply that by a megabyte each and a megabyte is approximately 10 megabits and we I mentioned I think I mentioned that we do network traffic in megabits not megabytes so then that would give me uh let's see 10 million times a thousand would be 10 billion or 10 gigabits per second would be the the the traffic uh coming in ingress basically that's not a lot uh modern network cards on on servers are 10 gigabits per second so this is this is not a bottleneck so at least we understand that this is not a big big bottleneck now uh I should probably I should do the same thing on the on the way out meaning and I realize we're spending a lot of time on on math here but uh what we can assume is that with the uh I think you said a billion videos viewed per day and we just did the calculation of 10 billion sorry 30 30 million per day but if I say a billion videos viewed per day and I divide that by again a hundred thousand seconds approximately uh then that's uh going to be let's see 100 thousand million no sorry uh no no it's a thousand million so this is billion uh million so it's it's a a million thousand videos per day divided by a hundred thousand is going to give us a ten thousand uh per second so we're viewing ten thousand videos per second and again if we do that uh 
similar math you can kind of see this is a factor of ten of the of the other number so instead of 10 gigabits ultimately what this means is that we would have 100 gigabits egress which is which is also not a lot so okay I think we've done done with some high level math I think these are all numbers that can be worked with and sustained the one other math that I didn't really do but I could I could also do is that you know users if there's a billion users you could think about another profile user profile how much would that cost to store and you could imagine that that could be maybe 10 kilobytes or something like that and so that would be uh maybe 10 terabytes or or something of that nature so we could talk about that later I think we can skip that for now it can be useful to do some high level maths to inform your thinking and your solution so make sure you practice multiplying and dividing big numbers like these as part of your interview preparation as if you're not used to it it's very easy to get lost in all the zeros Mark did some solid calculations to work out the storage for the raw data and metadata however he didn't do the Maths for the user data storage requirements instead moving on to make the traffic calculations and queries per second calculations it would have been a better idea to finish the storage calculations first as these requirements were more likely to influence the design as a general rule when you're working on storage calculations finish them yeah okay so I think we should we should probably move on so let me let me just draw a couple boxes here some some shapes here I'm going to draw a little uh TikTok app and then of course we're we're going to do the typical thing I don't really want to spend too much time on uh you know load balancers here or you know you've got your you're going to have some sort of uh I'm going to call this maybe this is a TikTok uh and like an app server or something like that uh yeah application yeah TikTok app 
service this is the thing that would run in the cloud this would be one of my servers and there'd be multiples of these and you might have well actually let me see I take that I think I take that back I think TikTok we're going to separate these two because we you asked me to talk about uploads and and viewing and streaming right I think those are the two use cases that we want to spend time on yes so there's going to be some sort of a TikTok uh yeah like a Content uh service I guess application Services maybe is maybe okay maybe that's all right because it's it's a service that generates for the application all right so let's go ahead and leave that and then let's go ahead and create another one which is the TikTok upload service because we're going to have we're going to be uploading videos and I think I might keep these separate separate just because they serve slightly different purposes and you know microservices architecture best practices you would you might separate these things out uh and this app service when you when I first opened my application and my my TikTok app and I come through the load balancer and I hit one of these app Services the very first thing that I might see is all of the stuff the videos Etc and the followers and all these kinds of things but I might be able to if I click on upload or create a new video then I might get it redirected to this upload service does that make sense okay yeah these are kind of the front ends if you will to the to the services okay okay so let's talk about uh let's talk a little bit more about the databases so I think I would like to let's talk about the video database first um and I'm I'm I realize I'm already making giving myself not enough room here in this uh in this screen so I'm going to move stuff around a little bit make things a little bit more palatable and interview candidates you know I would recommend don't spend too much time on on making things pretty but uh being able to see stuff is good all 
right so uh here we go so we've got uh this is our so the the one the first thing I talked about was the video blob storage and uh that's where the actual raw videos would go and so this has to be sort of a kind of like a file system almost but I probably would use something like Amazon S3 for this because that scales really well and it's it lends itself nicely to this kind of stuff and it uh you know it's purpose-built for this kind of thing okay so that would be the video blob storage and by nature of uh Amazon Web Services you can nicely you can do a couple of really cool things you can replicate the data for better availability you can have regional uh replicas so that it it knows it can be sort of more aware of regions or you can you can utilize the naming for that and the other thing that you can do is you can tier the storage data so we talked about I think I said 100 petabytes of raw data for a year and so that can accumulate quickly and so uh you know with TikTok videos I'm making an assumption that a video's life span is not more than maybe a couple of months for most videos that you don't go back and watch you know three-year-old videos too much that it's all about the most recent stuff the influencers creators whatever it is which means if that's true assuming that assumption is true then that means that you can tier your data such that older data or data that hasn't been touched or looked at as often can go into cold storage so I'm going to put a keyword in here which is this tiered AWS S3 so what that means is that if a video is let's say a year old and nobody's really looked at it it could go into really slow offline storage which means it doesn't cost as much and it's uh it's more cost effective versus stuff that is most recently uploaded does that make sense yeah makes sense okay okay so that's kind of like a blob storage I'm going to use yet another one of these shapes because we're talking about two different things so then 
video metadata and uh video metadata I think I mentioned this is things so what would I think about for video metadata actually let me draw another box here because I think it's two and we can talk about I want to get into that in a little bit more but uh video metadata uh I'll get into that later I want to talk about the tables the type of what's actually inside this but video metadata I think whatever it might be I think it's going to be I think I mentioned that this is what do we say this is going to be 10 terabytes I think I said so I and this is not and I said NoSQL storage uh NoSQL so it doesn't need to be relational but it's not really blob storage because there are some some fields in there that you might want to might want to look at and so on so what I probably would do is something like uh you could do Cloud Spanner although that's more like a SQL thing or you could do AWS you know DynamoDB as an example uh and and I'm using just to be clear here I am biased for purposes of an initial solution to get it off the ground to hosted solutions like Amazon provides because they take a whole bunch of maintenance and load off of off of uh the engineering team and so as an engineering manager I recognize that there's a cost to the engineers that need to be writing code and I don't want them to reinvent wheels like blob storage and and uh indexed key value storage and things like that so I'd rather use something existing even if initially it means we have to pay a little bit more maybe we can eventually move that but right now this would be a really good solution because DynamoDB scales really nicely in terms of the the rows that you number of rows you can have and so on so these two things together uh if I if I draw yet another box around this and yes that's of course that's what's going to happen uh so these two things would make up basically the the the uh the video storage database if you want to call it that okay so this is for the actual videos 
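The sizing figures that drive the choice of a blob store for raw video and a key-value store for metadata can be re-derived in a few lines. The sketch below is only a sanity check, assuming the round numbers quoted in the interview: 10 billion uploads per year, 1 billion views per day, about 1 MB per video, about 1 KB of metadata per video, a 10x multiplier for device formats and replication, roughly 100,000 seconds per day, a 3x peak factor, and 10 bits per byte as a rounding of 8. The constant names, the `estimate` function and its dictionary keys are illustrative and do not come from the interview.

```python
# Back-of-envelope sizing using the round numbers quoted in the interview.
# All names below are hypothetical, chosen only for this illustration.

UPLOADS_PER_YEAR = 10_000_000_000   # 10 billion videos uploaded per year
VIEWS_PER_DAY = 1_000_000_000       # 1 billion video views per day
VIDEO_BYTES = 1_000_000             # ~1 MB per 10-second video (assumption)
METADATA_BYTES = 1_000              # ~1 KB of metadata per video (assumption)
FORMAT_AND_REPLICA_FACTOR = 10      # device formats plus replication
SECONDS_PER_DAY = 100_000           # rounded up from 86,400
PEAK_FACTOR = 3                     # rough allowance for uneven load
BITS_PER_BYTE = 10                  # rounded up from 8, as in the interview

PB = 10**15    # bytes in a petabyte (decimal)
TB = 10**12    # bytes in a terabyte (decimal)
GBIT = 10**9   # bits in a gigabit


def estimate():
    """Return the storage and traffic estimates as a dictionary."""
    raw_pb = UPLOADS_PER_YEAR * VIDEO_BYTES / PB
    avg_uploads_per_s = UPLOADS_PER_YEAR / 365 / SECONDS_PER_DAY
    peak_uploads_per_s = avg_uploads_per_s * PEAK_FACTOR
    views_per_s = VIEWS_PER_DAY / SECONDS_PER_DAY
    bits_per_video = VIDEO_BYTES * BITS_PER_BYTE
    return {
        "raw_video_pb_per_year": raw_pb,
        "stored_video_pb_per_year": raw_pb * FORMAT_AND_REPLICA_FACTOR,
        "metadata_tb_per_year": UPLOADS_PER_YEAR * METADATA_BYTES / TB,
        "avg_uploads_per_s": avg_uploads_per_s,
        "peak_uploads_per_s": peak_uploads_per_s,
        "views_per_s": views_per_s,
        "peak_ingress_gbit_per_s": peak_uploads_per_s * bits_per_video / GBIT,
        "egress_gbit_per_s": views_per_s * bits_per_video / GBIT,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

Computed without intermediate rounding, the upload rate and peak ingress come out slightly below the interview's rounded figures of 300 uploads per second, 1,000 per second at peak and 10 gigabits per second, while the storage, metadata and egress numbers match the quoted 10 petabytes, 100 petabytes, 10 terabytes and 100 gigabits per second.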
okay good so far yeah we're good so far okay go ahead yeah I wanted to I wanted to ask how you think the different regions and countries could be reflected in your design are you are you taking them into account yeah good question so um yeah so I I think I so the nice thing about another nice feature of some of these existing hosted solutions is that you can include regional they have regional features and so that's not really answering your question I think uh if I think about it that you I can imagine having different data centers in different locations because you want the you want the response so if you're in India uh you want the responsiveness to be fast if you're in Europe you want the responsiveness to be fast of the videos if you're in China US same kind of thing and so having the data local to near to where you are can be uh super well is what you kind of you have to do this as a global application and so one of the ways to do that is by by partitioning the data possibly based on region or language or things like that and locating the data in different places so that could be one of the aspects of a kind of a regional you know how would regions and countries play into this uh another one would be kind of related to this is uh we when we get to the serving side or the streaming side in order to not overload these these storage systems you really need to have and I guess I better draw this go ahead and draw this so we have it uh some sort of a a CDN a content delivery network like or something like this and what this does is it's basically a set of caches in uh near the the closer close to the user that cache data that is typically uh popular let's say so videos that are popular in a particular region would be in this CDN stored in the CDN and thus made to be available faster and uh the I think the regional the regionality might happen automatically meaning that you probably would get videos created in India to be more commonly located in the CDNs in India 
versus Europe U.S China whichever it might be so I think that there there's some there's definitely some regional aspects to this I'm not sure whether I would explicit how explicitly I would partition the data based on a region or language or something like that that I don't know yet that's a good question but I think you would get some you can do it and you would get some benefits by having this this CDN which is kind of necessary for this kind of kind of stuff so I'm not sure if I answered your question yeah more or less yeah please uh please continue okay all right so the piece that we're missing the piece that I haven't talked about yet and I'll just mention briefly is well as brief as I can uh let me draw and yet another database because we also mentioned uh let's see this is a video database I I mentioned kind of like a uh a user user data I'm just going to call that and if I I should just think about this for a moment a billion users across 150 countries but they have friends and followers and connections and this is where this whole social graph kind of comes into play that you know a social network kind of a thing so because of those connections and those connections may very well span regions and countries and things like that you uh you probably want to have this user data not be explicitly partitioned out into separate separate databases that can't talk to each other so I I probably would want to put this into some sort of a SQL storage solution and I still would want it to scale so I might use if I'm if I'm going to stick with Amazon here Amazon RDS Relational Database Service allows you to use SQL basically use SQL solutions to to store your data and I'm I'm hand waving a little bit about the amount of data here I think at a billion users and a kilobyte gig terabyte that's a that's is that right a billion is a gig uh times a kilobyte is a terabyte and if 10 a kilobyte might be small depending on what's in there but let's say it's a kilobyte so a terabyte 
of data that fits you can do that with with modern uh SQL Solutions that's possible but we're starting to get into some some bottlenecks there potentially so that might be something we need to look out for but I'm going to hand wave over that for the moment and say that we can store the user data in a relational database oh something like Google Cloud spanner actually scales you know much more broadly than that and gives you some relational features so you could use you could use that too but anyway so user data so that that would be that's another big aspect of this so uh if I talk about if I talk about the upload path and I don't know if I need to draw arrows here I probably should probably good idea to draw some arrows uh you know the the the upload that you know you you the app uploads a video let's say I create a video and by the way I'm again I'm glad that you mentioned that we don't have to worry about the app because there's so many cool features in The TikTok app how to create a video I mean you've there's there's this uh duet and uh a feature and there's there's there's you know you can clip and trim and you can bring in there's filters there's all sorts of cool features but once you have the video now you you know upload it and so the TikTok app would upload that and send that to the upload service and then that that has to go from there to two places it really needs to go to the uh to the The Blob storage whoops it needs to go to the blob storage and it needs to go to the to the metadata store because you need to be able to to to add the information about the video uh so that you can you can figure out can it be shown does it make sense to show to people there's a yeah so so that's kind of that's roughly the flow of an up that's a very super simplified flow of an upload but what I probably should think about for just a moment is uh what's in that data but I I can I can also talk about the let me think about this for a moment what would be better to talk 
about let me finish the flows let me let me talk about the uh the video uh streaming flow if that's okay yeah go for it because now I'm gonna get this is going to get much more complicated I think because I think there's some magic that I've completely hand waved over that TikTok does and I'm going to just draw a box here and I'm going to call this uh I think the feed is called for you so it's a curated customized feed that you get it's the video stream just for you uh so I'm going to call this the for you generator maybe I'll call it that I I think that's yeah and so let me see if I can I can figure this out so if I'm when I load my TikTok when I first go into my TikTok app I'm already seeing a bunch of information like friends who who I should follow uh I don't know maybe some explore some some options in the in my in my app but the most important thing and the center piece of all is I'm already seeing videos so I there's there's no lag there's no clicking on things you I'm already as soon as a video is in focus it starts playing and so that means that my app is actually uh getting multiple has already gotten multiple videos by the time I start the app so let's see here if I so if I go to the app service the one of the first things that I think the app service has to or one of the important things I should say that the app service has to do is it has to talk to this generator and by the way this generator I this is also going to have to be multiples there's got to be lots of these because we're scaling but this this app service needs to needs to be able to talk to some back-end uh algorithm that says what should I show the user what are the cool videos that I what's this for you list that I need to show and I think that this is actually rather complicated and sophisticated and if I were doing this uh today I would probably employ some sort of machine learning system to figure out what to show the user because there's just too many variables to try and 
build a rule-based system a handcrafted you know if if this this many likes then show to them if they're you know I think that that would get out of hand pretty quickly and I think would be really hard code to maintain so I think I would I would I would think about a an algorithmic solution to this a sort of a either neural network or some sort of a you know a similar similar type of learned online learning solution and so machine learning solution I guess is that is really that's the over overview uh and so this TikTok app service needs to contact this and then the question is what is this thing doing how is it actually figuring out what these what these videos are that it needs to show because ultimately let's assume it's a black box for now and it does Something Magic ultimately what it's going to do is it's going to return back to this application service a set of let me draw do a little text box here uh and I'm going to make this small so I can fit stuff but uh this is going to be sort of among other things a list of videos uh sort of probably actually video IDs so yes it's going to return back the list of users it's going to return back you know uh the user profile maybe some some other types of things but I think the most important thing that's going to return back is it's going to return a list of video IDs that the application ultimately should show to the user and so uh good good does that make sense so far okay so the application contacts talks you know reaches the service the service uh builds up what it needs what needs to be shown in terms of content but but the most important thing is that it gets a list of video IDs to show that are the the uh the best videos to show for this user the for you list of videos and actually maybe that's I could be super explicit about this for you video IDs if I really want to be explicit okay so then what happens with that so the the application service I think would wind up using uh oh sorry I'm trying to draw another 
arrow here okay so the application service I think would utilize the would get the from those video IDs would get metadata information about the videos how long they are what is the ID the URL for where to find this this this video so that I actually can go and fetch it maybe some other things like who created the the video how many likes it has things like that so this this app service would go straight to this database to get that uh metadata information so that it could pass that on to the application okay and that again I'm going to go and uh you know draw too many arrows here probably don't need that many but it would return that list so this would be the metadata now the key the reason I'm drawing this is because the metadata is only one aspect of course the biggest part of this is that you actually want the actual videos to be to be returned to the to the application so how is that happening and so that that's really happening in a kind of a direct way so oops need to draw better arrows here so from the uh the application is going to get back a as part of the metadata the list of videos so that let me see if I can draw this better here or give a little bit of a I'm going to put a I need to put somewhere put some text in here and I apologize this is again kind of small here but this is video uh uh info or maybe I'll call I'm going to call this video URLs these are like the blob the IDs of where to fetch the the videos because these are actual like HTTPS or some sort of URLs and I am also hand waving over authentication here by the way so you know we're assuming HTTPS this is you know secure login et cetera et cetera so there's a lot of hand waving going on here so this app service among other things returns these video URLs and then the TikTok app for each of those app video URLs goes and fetches them and it fetches them basically by just going to those URLs and it turns out that that URL is going to land uh let me draw another little uh shape here to make
this super super explicit pretend there's a little cloud magic cloud but the TikTok application is just going to go go to the cloud and specify this URL get me this URL that will hit a CDN a content delivery network now if that content delivery network has the video because it's a popular video uh in its in its local cache then it it will just go ahead and uh and deliver that and that's just how CDNs work this is nothing special I'm not you know I'm not trying to recreate uh reinvent the wheel here either so if this if the CDN has it it'll just serve it and it'll be fast and for popular videos that's super fast if not then it'll wind up going to actually going to the back end storage service uh like S3 and going and fetching it and then delivering it and then it'll decide based on whatever its logic is whether to store it in its cache or not and that's I'm going to again hand wave over that so uh the the the key thing I wanted to just be clear about here is that from a video serving perspective that the application there's a two two parts to it there's the actual video data which goes through the CDN and the blobs and then there's the metadata which will which would come through the uh the application service and all of the content that it returns back to the app here's here's everything you need to show that's small and and structured data that you need to to make your make your phone application look look good so I'm going to stop there does that make sense that makes sense Mark yep Mark drew a good high-level diagram he did an excellent job talking the interviewer through the upload and download flow and then for the video streaming flow too
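The two-part serving flow described above, where small structured metadata comes from the app service while the video bytes come from a CDN that falls back to blob storage on a miss, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OriginStore` and `EdgeCache` names, the capacity of two videos, and the least-recently-used eviction are all hypothetical, since the interview deliberately hand waves over how the CDN decides what to keep in its cache.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OriginStore:
    """Stand-in for the backing blob store (the interview mentions S3)."""
    blobs: dict[str, bytes]
    reads: int = 0  # counts how often the slow path is taken

    def get(self, url: str) -> bytes:
        self.reads += 1
        return self.blobs[url]


@dataclass
class EdgeCache:
    """Toy CDN edge node: serve from the local cache, else fetch from origin."""
    origin: OriginStore
    capacity: int = 2  # hypothetical limit on the number of cached videos
    cache: dict[str, bytes] = field(default_factory=dict)

    def fetch(self, url: str) -> tuple[bytes, bool]:
        """Return (video bytes, True if served from the edge cache)."""
        if url in self.cache:
            data = self.cache.pop(url)
            self.cache[url] = data  # re-insert to mark as most recently used
            return data, True
        data = self.origin.get(url)  # cache miss: go back to blob storage
        if len(self.cache) >= self.capacity:
            oldest = next(iter(self.cache))  # dicts preserve insertion order
            del self.cache[oldest]  # assumed LRU eviction policy
        self.cache[url] = data
        return data, False
```

A popular video is fetched from the origin once and then served from the edge, which is why the second request in a sequence such as `edge.fetch(url); edge.fetch(url)` is the fast one.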
4.you might have noticed that Mark drew lots of arrows to illustrate the flow of data you don't have to do this talking it through with the interviewer is sufficient if you prefer don't feel that your diagram has to be totally self-explanatory that said it's really important to be comfortable with the diagram drawing tool that you're using so that you can use it to help and enhance your explanation rather than having to struggle with it Mark also covered the download flow for video streaming and how the for you feed is generated algorithmically even though he did this in a very simplified way it was important to touch on because it's key to what makes TikTok unique this way he prepared the ground to drill down into it later in the interview alright so if that seems good so far then there's two things I'd like to do if we still have time one of them is I'd like to just talk a little bit more about the the data that I think that's the schema of the data what are the elements of the different the two different bits of data metadata really the video metadata and the user metadata I want to talk about those because I think that they they are related to the second thing I want to talk about which is a little bit more thinking around how I might do this this generator if I were to treat it a little bit less as a black box and try to dive a little bit deeper although that's not my really my forte but I but I want to try only kind of take an attempt make an attempt at that so if that's okay then then that's that's kind of where I'd like to go from here I'd love to see you drill down in some detail so let's talk about the video metadata uh what's in this video metadata so I think there's so I mean there's there's standard stuff that I don't know how much of this I want to spend time on but but I'll but I'll just kind of mention it so uh there's there's going to be a video ID which is just a unique identifier for the video there's going to be also a video uh URL which is this thing that I
mentioned up here that you would go and fetch and I'm going to uh I'm gonna mention this but I'm not going to write it just because I don't want to use the screens this the space here I mentioned briefly earlier that you might want different encodings depending on you know for a particular video so during the upload phase and possibly offline asynchronously uh if we want to store differently encoded forms of the same video then we probably need to have different URLs or different blobs for those things and so we might have the original raw uploaded video and we might have a a you know iPhone optimized video and we might have a MacBook and a Windows and a you know something like that so it's so that means that you might have video URLs instead of as a list instead of just a single one but I'm going to hand wave over that for now and just say that there's one video URL and a pardon the capital oops pardon the capitalization I think that's just the the way the this this autocompletes here okay so video ID video URL what else do we want to have in here maybe like a creation timestamp when was this thing created uh oh probably uh like a creator ID who created this video who uploaded this video right we want to know who it is we probably want a uh I'm gonna say like likes maybe we want the number of of likes or maybe it's more generically would be sort of reactions or something like that on this video like how many you know how many people liked it how many people didn't like it et cetera things like that and there's something else that I'm gonna that I want to put in here and uh but I'm gonna put just a placeholder in here for now uh because I want to get back to it which is related to the the uh algo uh features I'm just gonna call this for now and and let's just leave it at that so so I'll put a pin in that and then get back to it so this these are some of the things I could think of for this video uh there might oh there might be something like length or duration of the video you know
something like that right some other metadata so that that's what I would think about for the video metadata and uh uh let me now talk a little bit about what I think would be in the user data so I'm going to move make some space here copy this put this up here and apologies for the type tight spacing here but uh uh all right so user metadata so what are the things I could think about so there's the typical the typical stuff about users like a user ID some you know credentials uh you know login type of stuff maybe there's some things like you know name age I don't know other other bits of information like that I don't want to get into too much of that that's going to be pretty typical for any real application but then I think the other thing that we mentioned is because this is such a social network uh graph you probably want to have following user ID IDs uh so these are you know who who are the people that you're following that each user is following you probably want to know that that's probably something that is important to understand uh there's probably also watch I'm gonna say uh video Maybe video history ID or something like that so this is the the watch like your watch History like what are the things that you've watched in the past videos that you've looked at that you've seen or or have been maybe shown so shown or watched not not really sure and then for if you're a Creator you probably also have uh you know uploaded video IDs right like what are the videos that you up that you've uploaded so that those to me are things that are important and then let's let's not forget we talked at the very beginning like what would be a success metric here and something that I probably want to track in here is uh time in app so how long has this user been spending in the in the TikTok app because I probably need to aggregate that and I need to look at that and be able to look at that offline and so on so that's probably something I need to store and keep track of okay 
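The two records Mark lists above can be written down as plain Python dataclasses. This is a minimal sketch rather than a real schema: the field names and types are my own rendering of what he says out loud, credentials and other standard account fields are left out, `algo_features` is only the placeholder he puts a pin in, and the `record_watch` helper is a hypothetical convenience for keeping the watch history and the time-in-app metric up to date.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class VideoMetadata:
    video_id: str              # unique identifier for the video
    video_url: str             # single URL; one per encoding is hand waved over
    creator_id: str            # who uploaded the video
    created_at: datetime       # creation timestamp
    duration_seconds: float    # length of the video
    likes: int = 0             # could be generalized to reactions
    algo_features: dict[str, float] = field(default_factory=dict)  # placeholder


@dataclass
class UserMetadata:
    user_id: str
    name: str
    following_user_ids: set[str] = field(default_factory=set)
    watched_video_ids: list[str] = field(default_factory=list)   # shown or watched
    uploaded_video_ids: list[str] = field(default_factory=list)  # for creators
    time_in_app_seconds: float = 0.0  # the success metric to aggregate offline

    def record_watch(self, video: VideoMetadata, seconds: float) -> None:
        """Append to the watch history and accumulate the time-in-app metric."""
        self.watched_video_ids.append(video.video_id)
        self.time_in_app_seconds += seconds


# Example with made-up values.
video = VideoMetadata(
    video_id="v1",
    video_url="https://cdn.example.com/v1",
    creator_id="u2",
    created_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    duration_seconds=15.0,
)
viewer = UserMetadata(user_id="u1", name="Ada")
viewer.record_watch(video, seconds=12.5)
```

Keeping the ID lists on the user record mirrors the whiteboard; at the scale discussed, the follow graph and watch history would more plausibly live in their own tables keyed by user ID.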
so let me see here is there is there what else am I thinking about oh yeah yeah so okay so let's get back to this thing that I mentioned uh into the second point that I wanted to make which is the for you algo profile I'm going to call it so and and I'm gonna I should probably change the video one here as well for you algo features so what I'm thinking here so here's how my how I'm thinking about this and again I'm not an ML expert by any means I have limited sort of background in this kind of stuff but in order to have a machine learning system be trained and figure out uh and be able to make decisions one of the uh one of the things that it needs to take in is a set of features and so features could be things like uh the the video I video ID itself the number of followers the entire following the the the entire network graph it could be it could be content preferences it could be language it could be I don't know yeah some some features features like this so things that are sort of specific to this user that help the algorithm figure out what types of videos to show to the user so that profile is something that I would want to store per user have the the uh for you generator reference this profile and utilize this this uh profile along against the algo features that were generated for a particular video and that could be length of video it could be content it could be location it could be language it could be a bunch of different features so that the this generator can essentially uh input the features from the user and based on that and based on all of the entire database of algorithmically feature identified videos make a determination as to what the best match is for videos to show to the user so that's a lot of hand waving but the idea is that this is essentially uh it's a neural network or some sort of other system uh and I keep referring to neural networks it's kind of you know 80s and 90s here but it's it's a machine learning mechanism
that has been trained offline with all the videos that have been uploaded they've all been feature extracted for each of the videos and then those features are fed into the system to train it uh and then there's a there's a matching step where on the Fly we say Okay given this user profile what are the videos that that we should show that's still kind of black boxy still kind of hand wavy but I think the the key thing there is that this profile there's two things here that I want to think about first off I think the the and there's probably some sort of a I don't know if it's a database I don't really exactly know what the right right mechanism is is here but uh oh maybe I should put it on its side but anyway there's some sort of a uh ml database that's super super hand wavy super you know over overly simplified but there's there's that database but and the input into this generator is the database and all of the uh the the various uh features and so on but when making the determination the uh the for you generator just looks at this profile and all of the information the one thing that I think it this is me being naive I think that this will this process of updating the profiles and showing stuff that's relevant to the profile including what have you seen before etc etc I think that's a slow process by slow I mean it's not real time it doesn't get updated right away so if I for example see a video that I don't like or a follower that I don't want anymore and I don't want to see any more videos from a particular user as a user I expect that to take effect immediately I don't want to see any videos from that user I don't want to see anything like that immediately but if the you know this ml system is an offline asynchronous system that's churning churning churning so it's gonna it might take a while to catch up to to that so I probably would want a little bit of rule-based boundaries guard rails maybe I could would call it that would be like exclusion profile I'm 
going to call this it's super hand wavy too but exclusion profile might have in it like followers not to follow anymore follower IDs or user IDs or something like that and so in my in my generator my uh this could be ML and I could make do the exclusion filtering stuff out in this app service here or I could push that all the way down into this generator here but one way or another a combination of some machine learning stuff which is not going to be perfect this sometimes it's hard to reason about how did it figure out make a decision and some rule-based logic simple rule-based logic that is that is unambiguous would ultimately filter down the set of video URLs that the user would see yeah that makes sense to me so let's see I've I I've talked a lot about this and and again I haven't drilled down on this and I know that people are who are watching this who are ml or AI uh you know have have much more knowledge please I mean you're you're absolutely right if I've gotten this wrong and that's that's on me it's definitely not my expertise uh and I'm oversimplifying here greatly but I think ultimately there's got to be some mechanism that generates this stuff based on all of the attributes of this user including some sort of a profile so that's kind of just my the high level uh mechanism that I'm that I'm suggesting here Mark's explanation for the databases was well structured first going high level and then drilling down into the schema afterwards Mark did well talking about the algorithmic generator for the for you feed even though machine learning is outside his area of expertise if the system requires including things that you don't know much about don't just ignore them you can mention them in a simplified way while being upfront with the interviewer about the limits of your expertise as Mark was here let me stop here and just kind of ask you uh what other things are you thinking about or whatever things are you wondering about here with this design yeah well I'm 
wondering uh I'm wondering if there'd be any bottlenecks and yeah I'd also like to hear if if you have any any enhancements you could make yeah good questions okay so I mean I one bottleneck which is not so much a so when I when I think of the term bottleneck I typically think about a performance bottleneck but I I think I already mentioned there could be just a possibly a storage bottleneck in terms of keeping all of the user data in one single table uh one one database just because of size that could be an issue don't don't know for sure how I might resolve think about that or think about solving that is I might have to partition the data for the largest regions or something which maybe isn't ideal but maybe I'd have to do something like that so that might be that but that's a different question than I think what uh what you're asking about and so I think a bottleneck an actual like a bottleneck in terms of performance that I can think of is this generator here that is generating on the fly you know in near real time here when this app service is called every time the app sends it a message saying hey give me give me and what's next what's next that could be a bottleneck because it has to respond quickly and it has to run in basically in real time or it has to read ahead so so let me let me just make a quick detour here one of the one of the uh the components really cool features of TikTok and I think I mentioned it but I I I just I find it fascinating is that uh you're in the app and everything is there immediately there's no lag there's no hesitation there's no clicking there's no spinning wheel it's all just right there as you're scrolling in the app your videos are scrolling and they're starting to play as soon as they come into focus how do you make that happen if you're you know sending a request to some server for a megabyte video uh well the answer is you don't uh I think the answer is that as the TikTok app is running and I think it's probably running in
the background some it's actually constantly talking to the app service and saying what are the 10 next videos or the 20 next videos or whatever it might be and so it's it's doing a what's called a read ahead so it's pre-computing basically the stuff so that it's ready to go and it can send it to back to it right away that's not perfect and and there's some issues with that but that's one way to make this this seem extremely alive and uh responsive application but that doesn't solve necessarily the bottleneck of the uh this generation here on how you might generate this list this this uh list of videos to recommend and so one thing I could think of doing is over time when this app service and the generator start to maybe mature I could imagine moving this logic some of this logic into the app onto the popular platforms so if we've got you know uh iOS Android Windows Mac and uh maybe maybe web then I could imagine that on the most popular platforms let's let's say iOS let's say iPhone and Android on just mobile devices that you move this generator code into the actual application sorry into the phone onto the phone so that it can take advantage of maybe even the GPU on the phone because if this is like a you know it's a math heavy or you know a kind of your phones are the smartphones are super powerful these days so you have a lot of compute power to use there and that could reduce the bottleneck of having these things sort of running out in the cloud so that could be one way to solve that so that that would be a bottleneck that I would be thinking about is this is this compute thing because I oversimplified I'm sure it's much more sophisticated complicated than I'm making making out to be so that's that's on the bottleneck side yeah that's an interesting answer Mark mentioned the data bottleneck for the user data but the more interesting one was around the algorithmic for you generation and Mark's solution moving it from the cloud into the application on the
device for mobile applications this can be a powerful option to offload compute from your distributed system and onto users' devices you you asked about enhancements so I mean I guess maybe that would be an enhancement too is to move it to move this move this logic to into the into the phone I think an enhancement that is probably more of a product enhancement or an application enhancement is that I could imagine a right now you get a for you feed and it's it's customized to your likes and your preferences and I could imagine in order to test out new algorithms and test out new feeds and so on having an algorithm the the the the generator occasionally pepper the the list of videos with something that is outside of your uh outside of your uh domain or your interests to kind of give you a taste of something you don't want a lot of that because you don't want people to get a feel like okay I'm getting something that I don't like or I'm you know I have no idea where this is not somebody I'm following right you don't want that to be super super in your face but you could imagine peppering the results every now and then with a little bit of uh of something that is new and unusual uh you know sort of a a surprise unsolicited uh idea for a video or something like that and that would still want I would still want that to go through that filter that I was talking about those guard rails of saying well if this is from a follower that I explicitly said don't look at then you know I wouldn't want to see that but but if it's something that you know hasn't explicitly been uh disallowed then maybe it's something that you know you you get to see and uh uh is an interesting interesting sort of feature maybe broadens your perspective a little bit opens up your mind to some other other ideas you know maybe it's baking or something like that and you haven't watched a lot you know you're not interested in baking I I don't I don't really uh have have a good example here but that could be some that
could be kind of more of a product enhancement to to think about and I think from a system perspective I think I mentioned earlier one enhancement I could imagine making is that if you if some of this architecture as this architecture stabilizes you could pick out certain components or aspects that are costly that are expensive uh so if the Dynamo turns out to be super expensive or RDS maybe you decide you know what I'm going to go ahead and pay the engineering costs and get my engineers to do our own hosting of a MySQL solution or or do it on a different platform or something like that just to kind of reduce costs because now it's worth it at the scale to to build our own solution so that could be something to think about but that's always a trade-off between the uh you know the the engineering cost and the the sort of service costs that you wind up paying for these kinds of solutions okay so okay that's that's kind of uh uh I I guess that's kind of where I would leave it at Mark's idea that you could pepper the for you feed with a different type of content is more of a product enhancement than a system enhancement if you're interviewing for an engineering role you'd probably want to come up with a system enhancement if possible keep in mind the role and level that you're interviewing for for an individual contributor position more technical depth might be expected while if you're interviewing for say an engineering manager role you'll want to consider things such as cost trade-offs as Mark did here overall a great interview from Mark he showed some clear thinking and superb communication skills on what was a pretty tricky design yep let's wrap it up now yeah do you want to just have a quick look over your design are you are you happy with it do you feel that you you met the the objectives that we laid out at the beginning yeah so you're I think so because I think you asked about sort of the upload and and download uh stream uh or streaming aspects of it and
we did some capacity there we have sort of a we have the databases we've got the rough idea of how that's going to flow through the system I've got a little bit of hand wavy stuff around uh the for you feed which is sort of the what we show so I feel like this is this is a decent start again I there's so many things and I I you know I imagine the real TikTok architecture is significantly more complex I mean people have spent engineering decades building this thing and it's an amazing app so uh there's no way I'm gonna reproduce that in in any amount of time so uh uh yeah kudos to those to those engineers that's very good uh well I think I think you made a pretty good attempt uh let's finish the interview there and yeah thanks Mark uh you can relax how was it was it was it enjoyable that was fun hard difficult but fun it's a tricky one but uh yeah it was it was good fun uh watching you watching you do it and uh yeah I think hopefully people will have learned a lot I'm sure we'll get lots of comments on it so yeah Mark thanks very much and uh yeah I hope we'll see you again here sometime soon thank you thanks Tom hello I really hope you found that useful if you did you can like and subscribe and why not come visit us at IGotAnOffer.com there you can find more videos useful frameworks and question guides all completely free and you can also book expert feedback one-to-one with our coaches from Google Meta Amazon et cetera thank you and good luck with your interview
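For readers who want something concrete to go with the for you discussion in the interview, the sketch below combines the two ideas Mark describes: a learned matching step between a per-user profile and per-video features, followed by unambiguous rule-based guard rails that take effect immediately. Everything here is an assumption made for illustration. The dot-product `score` is only a stand-in for whatever trained model a real system would use, and the `Candidate` type, the feature names, and the `limit` parameter (playing the role of the read-ahead batch size) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    video_id: str
    creator_id: str
    features: dict[str, float]  # per-video algo features (hypothetical names)


def score(profile: dict[str, float], features: dict[str, float]) -> float:
    """Dot product over shared feature names; a stand-in for a trained model."""
    return sum(weight * features.get(name, 0.0) for name, weight in profile.items())


def for_you_video_ids(
    profile: dict[str, float],
    excluded_creator_ids: set[str],
    candidates: list[Candidate],
    limit: int = 10,
) -> list[str]:
    """Return up to `limit` video IDs, best match first, after guard rails."""
    # Rule-based exclusion runs first so that a block applies right away,
    # even if the offline-trained model has not caught up yet.
    allowed = [c for c in candidates if c.creator_id not in excluded_creator_ids]
    ranked = sorted(allowed, key=lambda c: score(profile, c.features), reverse=True)
    return [c.video_id for c in ranked[:limit]]


# Example with made-up data.
candidates = [
    Candidate("v1", "alice", {"baking": 1.0}),
    Candidate("v2", "bob", {"dance": 1.0}),
    Candidate("v3", "carol", {"dance": 0.5, "baking": 0.2}),
]
profile = {"dance": 0.9, "baking": 0.1}
feed = for_you_video_ids(profile, {"bob"}, candidates, limit=2)
```

Whether the exclusion filter lives in the app service or inside the generator is left open in the interview; keeping it as a separate step, as here, makes the rule easy to reason about regardless of what the model does.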
Key Vocabulary (CEFR B2)
interests
B1 Subjects or activities that a person enjoys or wants to know more about.
Example:
"your uh domain or your interests to kind of give you a taste of something you don't want a lot of"
maintenance
B2 Actions performed to keep some machine or system functioning or in service.
Example:
"Amazon provides because they take a whole bunch of main a mono or maintenance and operational load"
operational
B2 Of or relating to the day-to-day running of a system or service.
Example:
"Amazon provides because they take a whole bunch of main a mono or maintenance and operational load"
comfortable
B2 At ease and confident when using or doing something.
Example:
"important to be comfortable with the diagram drawing tool that you're using so that you can use it to help and enhance your explanation rather than having to struggle with it Mark also covered"
operation
B2 The method by which a device performs its function.
Example:
"Amazon provides because they take a whole bunch of main a mono or maintenance and operational load"
downloads
B1 File transfers to the local computer.
Example:
"like you to focus on on a back-end distributed system that supports uploads and downloads"
cloudfront
B1 Amazon CloudFront, a content delivery network (CDN) service from Amazon Web Services.
Example:
"cloudfront or something like this and what this does is it's basically a set of caches in uh near"
encodings
B1 The formats into which data, here video, is converted for storage or playback.
Example:
"encodings depending on you know for a particular video so during the upload phase and possibly"
timestamp
B1 A variable containing the date and time at which an event occurred, often included in a log to track the sequence of events.
Example:
"timestamp when was this thing created uh oh probably uh like a Creator ID who created this"
credential
B1 (chiefly in the plural) Documentary or electronic evidence that a person has certain status or privileges.
Example:
"the typical stuff about users like a user ID some you know credentials uh you know login"
Grammar & Pronunciation Tips for Dictation
Chunking
Notice where the speaker pauses after groups of words; these pauses make the speech easier to follow.
Linking
Listen for the links where words run together.
Intonation
Follow the changes in intonation that emphasize important information.