
Episode 25 - Talking data with Oliver Graeser of Parcel Perform – YouTube Dictation Transcript & Vocabulary

Welcome to FluentDictation, the best YouTube dictation site. Master the C1-level video "Episode 25 - Talking data with Oliver Graeser of Parcel Perform" with our interactive transcript and shadowing tools. We have split the video into short segments that are ideal for dictation and pronunciation practice. Read the highlighted transcript, learn the key vocabulary, and improve your listening skills. 👉 Start dictation

Join thousands of learners improving their English listening and writing skills with our YouTube dictation tool.


Interactive Transcript & Highlights

Perfect. Welcome to the Pinpo Asia podcast. Today we have Oliver Graeser with us from Parcel Perform. What we're going to do today is understand all things relating to data. Oliver is a data expert, and of late there are all these terminologies, right: data lakes, data analytics, even artificial intelligence, and how they relate to data. So what we want to do is dissect the whole data element and ask all of these questions, so that we can position data in a way anybody will be able to understand. With that, let me welcome Oliver to the podcast. Welcome, Oliver.

Thank you for having me.

All right, let's start off with your background. I'm looking at your LinkedIn: you've got a doctorate in physics from CUHK, right? Tell us how that came about.

Physics simply was my undergrad, and then I wanted to see the world a bit, and doing a PhD is the one easy way, I guess, that you can get from Iran to Israel if you want to, because the physics community really doesn't care much about borders and it's easy to get a visa. I looked around; back then I actually took a train ride all the way from Germany to Hong Kong, then a plane to Singapore, stopping along the way and checking out universities. I met my adviser at CUHK. He was very open to the idea, which was a fairly unusual one there, because for the entire time I was there I was the only grad student in the faculty of science who was not Chinese. I spent three amazing years there; I wouldn't want to miss it for the world.

Hang on, run through this real fast: you took a train ride from Germany to Hong Kong? How did you do that?

I guess the part from Kyiv to Moscow doesn't work that well anymore these days, but: Cologne to Berlin, Berlin to Kyiv, Kyiv to Moscow, Moscow to UT. I spent a few days at Lake Baikal, which is an absolutely fascinating place, then Beijing, and Beijing to Hong Kong.

Wow. That's, back then, a bit of a podcast in
itself, yeah.

Okay, what got you interested in data?

Very bluntly speaking, the first thing I did after graduating was ask: how can I make money in a city where basically everything is about commercial transactions? I'm a numerate person, a quantitative person; I'm not a person who understands much about money or commerce. So the first instinct was: who is using mathematical models? Quants, investment banking, risk analytics, all these things. That's where I got in. Then I quickly noticed, hey, what they do is actually very similar to what we learned in undergrad. Half of what I do today is trying to reduce entropy somewhere, which is what you spend two years studying in a physics undergrad degree. Then I found it plays to my strengths, it's a fairly nice field of work (maybe unlike investment banking, from what I experienced when I tried to get into that field), and it keeps my mind busy, so I'm still not bored after years. And of course the world has also pivoted quite a bit towards AI and machine learning, so there have always been opportunities.

Entropy: what is that?

The degree of disorder. In many use cases, what you want to do is distinguish: someone who will buy a product from someone who will not, a parcel that will be late from a parcel that will not be late, someone who will default on a loan from someone who will not. If you look at the original population, it's chaos; you find all kinds in there. If you can separate them, that is basically what you want to achieve, and the measure for this (now we're getting into formulas) is entropy. The less ordered something is, the higher the entropy. So if you find hypotheses that allow you to separate the people who default from those who don't, the parcels that will be late from those that won't, you can measure that the entropy is going down. That's basically what you're trying to do. In physics it's more about temperatures, but the orderedness, like "it's cold over there in the corner, it's hot over here", means less entropy. In physics, generally, if you don't touch something you always know the entropy is going to increase; there's going to be more chaos. The whole math behind it, all the stuff that is sometimes a bit tough when you get into machine learning: it was tough when I was 22 and had to take statistical mechanics in my undergrad, but it's something you pick up along the way.

Okay, let me try to understand that. From a physics perspective, entropy essentially says the world is always going to tend towards a disorderly state?

It basically says that any isolated system, from whatever state it's in right now, will either stay as it is (if it's already completely chaotic, it will stay that way) or, if it's not completely chaotic, it will become more chaotic by itself. There is no way an isolated system can become more orderly by itself; you have to make an effort. Take an air conditioner: you start with the same temperature everywhere, you switch it on, it gets cold here and hot over there. That is more order, you have separated things, and it costs you energy, which you probably notice at the end of the month on your electricity bill. You need to put energy in to reduce the chaos. That's the physics side.

Okay, now let's tie this into what you do. Parcel Perform: what does the company do?

The company started basically just trying to give merchants and marketplaces a way to see what was happening with their shipments: where are they, which carrier has them, where are problems occurring. We went from there into the post-purchase experience, so updating you, as an end consumer, on where your parcel is. But by now we're trying to be the one-stop solution for any merchant or marketplace that wants to handle everything around the post-purchase experience and the data about what will happen. This is a lot of what I'm working on: which parcels will be late, which parcels will have problems, which products you can sensibly offer to a consumer who already bought from you (so we use a lot of data and AI), all the way to returns. You want to return something, and the merchant uses the same solution to trigger the return, and maybe to offer the consumer something so they can at least keep the revenue. Instead of just "okay, we give you a refund": here's a voucher, we incentivize you in some way to keep the revenue with us.

Okay, let's go through a practical example. Say I use Shopify, and I buy something, say an iPhone. After I buy it, the merchant in charge of shipping it will somehow contact you and say "I've shipped this parcel"?

With Shopify they can contact us directly, but actually we support the Shop app. So the Shop app, directly on your phone, would send us a tracking number. They wouldn't necessarily know which carrier it shipped with, honestly also because merchants are not very good at telling Shopify the correct thing. So it's up to my team to determine from the tracking number which carrier this is with: is it right now with DHL, is it with FedEx? We find the parcel there, we poll the carrier: do you have updates, has something happened with this? If we find an update, we send it to the Shop app, and the Shop app will inform you: hey, your parcel is now out for delivery. From a Singapore perspective that's probably not a big thing.
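Oliver's entropy example from earlier, separating parcels that will be late from those that won't, can be put in numbers. Here is a minimal sketch with made-up figures; the group sizes and late-rates are illustrative, not Parcel Perform data:

```python
import math

def entropy(p: float) -> float:
    """Shannon entropy (in bits) of a yes/no outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Mixed population: 30% of all parcels end up late -> high disorder.
before = entropy(0.30)

# A separating hypothesis splits parcels into two purer groups:
# 60% of parcels where only 5% are late, 40% where 67.5% are late
# (0.6 * 0.05 + 0.4 * 0.675 = 0.30, so the overall late rate is unchanged).
after = 0.6 * entropy(0.05) + 0.4 * entropy(0.675)

print(f"entropy before: {before:.3f} bits, after: {after:.3f} bits")
# The drop (the "information gain") is what a good hypothesis buys you.
```

The weighted entropy after the split is lower than before it; that drop is exactly the "entropy going down" Oliver describes measuring.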
But if you know the term "porch piracy": in the US, getting this timely update, "today your stuff is going to come, now it's on your front porch," is super, super important, because otherwise you ordered an iPhone and then someone else has that iPhone. When I speak to clients in the US who emphasize this, from a Singapore perspective it's quite alien. I go on holiday for two weeks and come back to all kinds of e-commerce orders in front of my door, nothing touched. That's not going to happen in the United States.

I see. So is it fair to think of Parcel Perform as some sort of app that links together all these different updates from different carriers through APIs?

That is the basic product. Beyond that, you also want to know when it's coming, not only when it's out for delivery; ideally I should tell you three or four days before, and ideally I want to tell you on the merchant side before you even order something. Because what are the criteria when you order something? In some cases it's purely the price; in some cases only this merchant actually has the thing. But in many cases, say you buy a pair of AirPods: even if it's $2 cheaper somewhere, I probably don't want to wait five days to get them. Who will get them to me first? This is the prediction we offer to merchants: you can use us to show on your website when the product is going to reach the customer.

I see. So say a customer buys something off Amazon, you're able to predict and say it's going to reach you in five days' time?

Amazon does it the other way around; we do it for smaller merchants. Amazon does this themselves, but instead of predicting, they control. They basically say: we told you it's going to come in two days, now we're going to make that happen, whatever it takes. I've had cases where I still don't understand how it made sense for Amazon. I bought a bicycle trailer for my kids from Amazon SG, which was shipped from Amazon US. They said it's coming in ten days, and for seven days it didn't even dispatch, and I thought, I hope there's nothing wrong; it was a really good price, with free shipping.

So after that, let's talk about data. There are many elements in which this can go wrong. Could you talk about some of the complications: what happens if a parcel doesn't go through, or a customer has questions you're not able to answer?

Part of our own positioning is 80/20. 80% of parcel journeys are happy journeys: everything works exactly as planned, and everyone can handle those. The 20% is where something goes wrong, and all kinds of stuff can. One of the first things we did, and this is unique to us, no one else in the market does it: we integrate with more than a thousand carriers, and each of these carriers sends us thousands of different types of events. So we have multiple millions of strings in which a carrier says "this is what just happened," and we categorize these strings. We have our own ontology with 130-something different event types, which distinguish, say: this is ready for pickup, but is it ready for pickup at your own home, in a locker, at a parcel locker somewhere in public, at a community locker? All of those have different implications. So when we get the raw string from a carrier, we have a mapping at the back end that says which type of event this is. And if it actually is "sorry, but someone drove over your parcel with the forklift and you're not getting it," we have this defined as an exception and can inform the consumer right away. Are we the ones who do the refund or send a new item? No, that's the merchant's job, but they know right away what happened, and that is the most important thing. At scale you always want to find the places where you need to intervene. If you try to look at everything, your customer-service payroll becomes unaffordable; if you just say "you'll get it when you get it," you won't have customers anymore. That's what we're looking at, but it's not so much a data thing; it's more a business-logic thing.

Got it. Now from a data standpoint, let's go straight to the nitty-gritty. We hear all these terminologies; "data lake" is one of them, and then data analytics. Could you paint a picture of raw data and when it becomes information? What is the flow?

For us: we're kind of the mass-manufacturing equivalent of data, because we poll the carriers repeatedly for millions of parcels every day: do you have an update? We use data-streaming technologies there. This is purely on the part of the engineering team; there's no analytics or insight in it. We need to process all these incoming events, categorize them, and store them in our database. Our systems are primarily based on Kafka and Flink, which are big queuing and stream-processing frameworks. At that point it's just about making sure the application works: any of our clients can look up what's going on with their parcels, we send out notifications, all these things. That, if you so want, stops at the database.
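The raw-string-to-event-type mapping described above can be sketched as a simple lookup table. The carrier strings and event-type names below are invented for illustration; the real ontology has 130-odd types and far messier inputs:

```python
# Hypothetical raw carrier strings -> canonical event types (invented names).
EVENT_MAP = {
    "shipment ready for collection at parcel shop": "READY_FOR_PICKUP_PUBLIC",
    "delivered to locker": "READY_FOR_PICKUP_LOCKER",
    "parcel damaged, returning to sender": "EXCEPTION_DAMAGED",
}

def classify(raw: str) -> str:
    """Normalise one raw status string to a canonical event type."""
    return EVENT_MAP.get(raw.strip().lower(), "UNKNOWN")

print(classify("Parcel damaged, returning to sender"))  # an EXCEPTION_* type
print(classify("driver enjoyed a coffee break"))        # unmapped -> UNKNOWN
```

The EXCEPTION_* bucket is what makes the "forklift" case actionable: the merchant can react the moment the event lands, instead of reading a million raw strings.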
"Database" in this case means transaction processing: something has just happened, we know what it is, please update the status of this parcel, and so on. Analytics is then the question: how do you make sense of this, or more, how do you find ways to operationalize insights from it for the users? Ideally you don't want data scientists or data analysts messing with production systems, so what you typically have is a data warehouse. We have one too, but we primarily rely on our data lake, simply due to the scale; basically all the data we process gets stored there. And we have different projects, for example the predictions I mentioned before. Each of our customers gets their own dedicated models. We look at their data (where do you ship from, where do you ship to, which carriers do you use), decide what metrics we want to create around this, and build our machine-learning models, all without affecting production in any way. When we have that model artifact built, one that can tell you that a parcel at this stage of its journey will reach you tomorrow with 85% probability, we hand it back to data engineering, who put it into the stream-processing flow. So whenever this parcel gets updated from the carrier side, we ask: does that change anything about our predictions? And if we see, say, that the parcel went into this warehouse and hasn't come out for three days, and that really shouldn't be happening, we need to update our prediction, update the consumer, and update the merchant. Because what you don't want to do, and a lot of carriers unfortunately do it, is send an update saying your parcel is coming tomorrow, and then the next day it again says it's coming tomorrow, and again. You're just pushing it out; there's no information in that. For us it's important to always give a reliable estimate. So we model it, put it back, operationalize it, and make sure you get a very timely update when something actually happens.

What's the difference between, say, a data warehouse and a data lake?

It depends a little on who I am: someone who tries to use it, or someone who tries to sell you one. But the fundamental thing that happened in the last 15 years is a latency-versus-bandwidth tradeoff. Latency is how fast you get a small task finished; bandwidth is how fast you get something really, really big processed. In a traditional database, when you transfer money or buy something online, you want that transaction done within a second; you don't want to wait three minutes until the merchant says yes, you actually bought this. That's the importance of latency, and that's databases. For analytics we always care more about bandwidth. It's okay if a more complex query runs 15 minutes or so, but I don't want to restrict what I look at. We have billions of events coming into our system every month, and I want to be able to look at all of them: say, show me, for all carriers in the last month, their on-time performance based on their own initial promise, and crunch through millions of records. The first step towards this was the data warehouse, something tailored towards processing big chunks of data in big batches. Then (you asked me before about Hadoop) came technologies like Hadoop that asked: instead of a single box that's always there, always has the same capacity, and always costs the same money, can we
use cheaper boxes, off-the-shelf computers, across which we distribute it all? That was the first step. Now the data lake concept tries to separate this completely. On the one hand you have super-cheap storage: Amazon S3 (our data lake sits on S3), or the blob-storage equivalents from Google or whoever. The data is just there, and it's cheap; retaining terabytes of data does not cost much money. If I don't do anything right now (our tech team is based in Asia, so in the middle of the night in Asia nothing is happening in terms of big manual analytics), I don't pay anything beyond storage. If I had a data warehouse, I'd still pay for the box: either once up front for my own box, or all the time, "yes, this is my data warehouse." If I then want to run a big analysis, Amazon basically throws everything and the kitchen sink at the computation. I have no problem with Athena (most vendors have something like this) crunching through several terabytes of data. A classic question, really: for all my carriers, show me, for each parcel, the first time they said "this is when you'll get it," and how often they actually met that promise. Then I need to go through all parcels, look at every time I checked on the carrier side, and find when they made the promise for the first time, because it can vary widely, and then aggregate it all. I get all the compute power I need, and I pay only for this one task. That's generally more efficient; if I don't need the compute, Amazon can put it elsewhere, which is a more efficient way for a cloud vendor to operate. But it also makes a lot of sense for us, in that I don't have to restrict myself.
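That on-time-performance question, first promise versus actual delivery per carrier, is at heart one aggregation. In production this would be SQL over the lake (via Athena or similar); here is a toy in-memory version with invented rows:

```python
from collections import defaultdict
from datetime import date

# (carrier, first promised date, actual delivery date) -- invented sample rows.
parcels = [
    ("DHL",   date(2024, 5, 1), date(2024, 5, 1)),
    ("DHL",   date(2024, 5, 2), date(2024, 5, 4)),
    ("FedEx", date(2024, 5, 3), date(2024, 5, 3)),
    ("FedEx", date(2024, 5, 3), date(2024, 5, 2)),
]

on_time = defaultdict(int)
total = defaultdict(int)
for carrier, promised, delivered in parcels:
    total[carrier] += 1
    on_time[carrier] += delivered <= promised  # a bool counts as 0/1

for carrier in sorted(total):
    print(carrier, f"{on_time[carrier] / total[carrier]:.0%} on time")
```

The equivalent declarative query would be along the lines of `SELECT carrier, AVG(delivered <= promised) FROM parcels GROUP BY carrier`; the pay-per-query engine does the crunching.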
I never get into a situation where I have five analysts trying to get something out of a Redshift instance and the instance says: sorry, I'm overloaded, I'm going to stop. We still keep a data warehouse; it can serve smaller, pre-aggregated queries much faster, so we retain an instance primarily for our clients' BI. It has its use case, but if you really want to dig deep, the data lake is just a lot nicer.

I see. So with a data warehouse, irrespective of whether you're using the data or not, you've got to pay for it?

A data warehouse is a box. Really, it's a server, and on the server runs a software. The data lake is purely storage, the same way you have funny cat pictures on Imgur; it's just naked storage by itself. There are a few other things: if I try to sell you one, I will talk a lot about how you don't need to define a schema, you can just dump your data there, you don't even need to know what it is by the time you store it, you can define this later when you try to analyze it. Which is basically turning kicking the can down the road into a sales argument; I don't have a very high opinion of that approach. But the general idea (completely separating compute, having compute entirely on demand, having only storage otherwise, and the cheap, network-connected kind of storage rather than the attached kind) is what makes it a lot more powerful. You get a lot more bang for the buck for the type of heavy insight generation that we do.

I see. And when someone says "big data," do they essentially mean lots of data?

You need to ask them; no one ever properly defined it. Big data came up in the context of Hadoop, and I think what people meant back then was stuff that you can't
really process on a single machine; at the time it was more buzz, from like 15 years ago. The other thing often associated with it was that the only type of such data would be log messages, or, say, tracking your mouse while you're browsing Netflix: this kind of semi-structured message where you need to filter each message to see whether it's the type you want. In my opinion it was a very poorly chosen word, because you should go into something with a use case. You see someone has something that could qualify as big data, you have a solution that can deal with big data, you put both together, and it absolutely doesn't fit. It was a huge boom for the likes of Accenture and co., because they sold tons of Hadoop big-data projects; in my opinion it wasn't very impactful. It's a term that has fallen out of use. Nowadays you would much rather talk about document-based data, streaming data, relational data, and then just quantify: okay, we stream log records, about three terabytes per day. That's not quite as snazzy, but it makes very clear what problem you have and what you want to do, and there's much more refined tooling for it. For different kinds of use cases you get the right tool by being a bit more specific.

You touched upon this before: Hadoop. Is it like an SQL thing for querying large amounts of data? What exactly is it?

SQL is actually really cool because it's one of the few declarative programming languages that people come in touch with regularly, which means all you do is define a problem. You don't tell the computer what to do; you say, "I want data that follows these rules," and under the hood something turns that into instructions for how to give you the data. Hadoop is one way of turning such a declarative problem statement into work that is not run on a single machine, but where lots of individual computers get their own piece of work, crunch through it, send it back (or share it), and then put all the results together and condense them into what's needed.

Sorry, what does "declarative" mean?

The primary paradigm distinction, I think, is declarative versus imperative. Declarative means: I tell you what the output should be, and you find a way to do it. I define the problem; the computer finds the steps to the solution. Imperative means I tell the computer exactly what steps to follow. The typical thing for a database is that you have a query planner that looks at your declaration of the problem (what is the data that you want) and turns it into steps: do a full table scan of this table, join it via a hash join to that table. That part is imperative, but those steps come from the program itself; the user just says "this is the type of data I want to see." SQL is the most common language where an end user uses this declarative paradigm. I personally really liked it when I first learned about it; it makes a lot of sense. Hadoop was probably not the first technology, but the first widely adopted one, that allowed you to define a problem and turned it into instructions to distribute the work massively. Nowadays, honestly, I don't see it used much anymore; for the same kind of use case there's Spark, which solved a lot of problems Hadoop had. And like I said, on our data lake I have no idea what Amazon does when it distributes work for Athena, but by and large it works, I can see exactly how much we're spending, and it's very easy. Technology has moved on a bit.

I see. Let me try to understand this. Say there is a school's progress-report data sitting on an SQL database somewhere. If I go as an end user and say SELECT * FROM class_12_results and get back a table, that's essentially an SQL statement querying one database, right? Whereas Hadoop is a system where that sort of database sits on many different computers?

In many cases it's not really a database; imagine it as a plain file. Every computer has its own slice, say a CSV file. Honestly, I don't know too much about how Hadoop does it exactly, but with the data lake that's how it works: you distribute. Here's a bit of a CSV file, there's another, and another, or you use a Parquet file; there are different structured file formats. Then you just say what you need. Say you have a filter: give me everyone born in 1979. Then every computer can, for itself, pull out its records from 1979 and send them back to a central node, which puts them all together. The point is that as a programmer, or as a data analyst in particular: no data analyst would be able to come up with an algorithm to distribute work. That's just something you're not trained to do; these are fairly deep algorithms. It's usually the algorithms-and-data-structures course, the course where computer-science students decide to leave if they're not going to finish. Hadoop was the first technology that largely abstracted this away and left you to just define what kind of data you want back, because all you have to know is: logically, I have a lot of user records, they all have this attribute, and I want to filter on this attribute.
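The "born in 1979" example above can be sketched both ways: the declarative SQL you would write, and the scatter-gather work a Hadoop- or Spark-style engine derives from it. This is a toy single-process imitation with invented records, not a real distributed runtime:

```python
# Declarative: you state WHAT you want --
#   SELECT * FROM users WHERE birth_year = 1979;
# The engine derives HOW: split the data, filter locally, gather centrally.

records = [("Ann", 1979), ("Bob", 1983), ("Cho", 1979), ("Dee", 1990)]

# 1. Each worker holds its own slice of the file.
slices = [records[:2], records[2:]]

# 2. Each worker filters its slice locally (the "map" step).
partials = [[r for r in s if r[1] == 1979] for s in slices]

# 3. A central node concatenates the partial results (the gather step).
result = [r for partial in partials for r in partial]
print(result)  # [('Ann', 1979), ('Cho', 1979)]
```

The analyst only ever writes the declarative line at the top; steps 1 to 3 are exactly the part Hadoop abstracted away.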
got it relational database I guess we all know this is related to that right when when would one need a non-relational database um so a good question um because I personally don't need it too much but it's actually from my perspective I see it where I see it having the biggest impact is actually from a very odd point of view in terms of engineering empowerments right if you think about let's say If you think about financial institution all the entities they're dealing with uh Financial transactions payments they all there's actually not that many of them right I mean there's many transactions but there's not that many entities so putting them in relations is fairly simple now when we started seeing Web 2.0 which was like like Social Web Facebook Twitter and so on there was suddenly a lot more variety right and if you want to map you can still go and find a way to take all these objects that you're dealing with as a programmer object is it what they think of first typically to map them to into a database they are like it's called an orm object relational mapping um exists for but it's it's honestly it's fairly heavy lifting mentally right you you like making sure you create San structures in your database to cover all the I mean just think about what you post what what what things you have on Facebook right um it becomes very very complex already and the I think the the standard know SQL database is a document database where you just say okay look I have one document that in a typically a Json or XML format has information about you right it could just have a section where it lists all your runs other people don't run could have a section with the the uh uh that deal with your cats every time you want to go actually someone goes to your page all the information about you is it's easy to retrieve you don't need to retrieve a giant relational schema you just retrieve the document for you it's there it reduces so my experience has been it reduces the cognitive overhead 
on developer me mely and makes them a lot more independent because if they go and say okay uh know we want to support cat pictures in our app right adding cats to that document as a section okay cats you own as a section under you right super easy to add like they don't need any they don't need to speak to their database admin they don't need to be fine IND this cats now right and um that generally helps a lot with with more complex uh um data structures that are not static and there you would say like we're both people right but the thing that we would probably put into a document about us is vastly different right there a few things you probably I mean you you work in recruitment few things that goes on everyone CV right but the moment you go into private stuff like there is like so someone would say no of course I want to have all my music in my in my profile like back in my space that's what people put put there right and uh uh uh other people would say of course I want I'm in stra I want to have all these these these runs and so if you want to have the ability to easily add this it's a relational database makes it quite hard I see is every non-relational database a document database I I would not pretend that I know how many even are out there like document is one another one is a key value store where you basically is uh like red is is a very popular thing like to cach things quickly to basically say okay I need to I basically figured out okay this the section I want to display on the web page is about this person about this event or whatsoever I want to quickly just rece retrieve all that information Q Value Store there are not sure if you would even call an Vector database a relational database or not not sure how how they figured that out but there's lots of other types but I think the the one that's the easiest to understand uh that's graph database right which is yet another thing the one that's easiest to understand okay why do you need something else 
than SQL — is the document database, because you can make the visualization of it an everyday problem: just try to write down for yourself how you would structure all your life information in tables, and count the number of tables. If we instead just say, okay, I want to write one document about me — isn't that easier? Got it. You mentioned JSON and XML. XML I understand — it's a bunch of tags to describe whatever — but what's the difference between a JSON result and an XML result? XML is for people who like slashes, JSON is for people who like curly brackets. In the end it's the same — I'm not sure you can describe something in one of them that can't be described in the other. It boils down to the same thing; it's just a different syntactic way of providing a certain object notation. XML came basically out of SGML, towards the start of Web 2.0. JSON — JavaScript Object Notation — does the same thing, but it's syntactically very close to JavaScript, and people often find it a little bit easier to read. If you were someone very used to HTML, though, you'd probably say the same thing about XML. In day-to-day work you could basically use both. SQL we kind of know; NoSQL — so is everything that's not SQL NoSQL, or how does that work? Yeah, another word, like big data, that no one ever should have come up with. Honestly, they probably coined it when explaining how they were going to use document databases differently from relational databases, but then NoSQL quickly proliferated, because there are other ways to build data stores that don't use SQL. In the end it really doesn't mean much anymore, except that, no, we're not talking about a relational database. By itself, someone telling me they're using a NoSQL database — it's like asking how the
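The "same information, different punctuation" point about JSON and XML can be shown directly. A tiny made-up record, parsed with the standard library in both notations:

```python
import json
import xml.etree.ElementTree as ET

# The same record in both notations: different syntax,
# identical information. The record itself is invented.
as_json = '{"person": {"name": "Oliver", "city": "Singapore"}}'
as_xml = "<person><name>Oliver</name><city>Singapore</city></person>"

j = json.loads(as_json)["person"]
x = ET.fromstring(as_xml)

same = j["name"] == x.find("name").text and j["city"] == x.find("city").text
print(same)  # -> True
```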
weather is, and you answer: well, there is no hurricane right now. Yeah — that's helpful. Good. Kafka — what's that? Kafka is a streaming framework. When you work in any kind of IT infrastructure, there are two primary ways data gets exchanged. One is a RESTful API call, where you basically say, hey, I want this from you, and you give me an answer back. The other way is: look, I want you to always do the same thing with these things that I give you here. For example, all our events that come in — before we map them, we figure out exactly what they mean. Every event gets packaged as a JSON message with what we have retrieved, and someone has to crunch it, so we put it onto a Kafka topic. It's a queue: we push everything in here, and at the other end there's a Flink processor that pulls messages one after the other from there. Generally it's a fairly nice way, in my opinion — also as a mental model — to process a lot of data. For example, you could just ask, hey, how much backlog do we have right now? You just look at how many messages are in the queue, and that makes an easy trigger to scale — which is all a bit more tricky when you have a standard web service. Kafka, I would say, is probably the predominant open-source technology for such a messaging framework. Good. Now I want to move on to analytics. We hear terms like predictive analytics; my question is, what is the difference between predictive analytics and AI? Okay, AI is a third word that no one should ever have used. Analytics per se is insights from data: descriptive, you describe what's there; predictive, you basically say, okay, I believe this is what's going to happen next — which in most cases is solved as a supervised machine
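The producer/topic/consumer pattern described above can be caricatured with a standard-library queue. This is not the Kafka API — a `queue.Queue` stands in for the topic, and the event fields are invented — but the shape (JSON-packaged events in, a consumer pulling them one by one, backlog = messages waiting) is the same.

```python
import json
import queue

# Toy stand-in for a Kafka topic: producers push JSON-packaged
# events, a consumer (Flink, in the setup described above) pulls
# them in order, and the backlog is just the queue depth.
topic = queue.Queue()

def produce(event):
    topic.put(json.dumps(event))            # package and enqueue

def consume():
    return json.loads(topic.get_nowait())   # pull the oldest message

for i in range(3):
    produce({"parcel_id": i, "status": "in_transit"})

print(topic.qsize())        # backlog before consuming -> 3
first = consume()
print(first["parcel_id"])   # FIFO: oldest event first -> 0
print(topic.qsize())        # backlog after one pull -> 2
```

Watching `qsize()` here mirrors the scaling trigger Oliver mentions: when the backlog grows, you add consumers.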
learning problem. You basically say: in the past I had, let's say, hundreds of millions of parcels; I can say which of them arrived after two days and which arrived after three days. I train a machine learning algorithm to determine what the influencing factors were, and it will make such a prediction. AI — look, when I did my first Udacity course in AI, what it taught me was A* search, which is the way Google Maps tells you, when you say I want to go from here to Jurong, what the fastest way is. That's a search; no one would call this AI today. And why is AI such a horrible word? Because no one ever came up with a proper definition of intelligence. If you go back 100 years, you'd probably say someone who can multiply and divide numbers very quickly is very smart; by that definition, the very first computer was smarter than all humans. Peter Norvig, who's a big shot at Google, defined AI as basically everything where we don't understand how the computer is currently doing it. And what we're currently seeing tossed around as AI is largely around interaction using human language, because we never expected a computer to be able to do that so easily. What lots of people don't know: what we have right now, the primary AI thing, is generative AI — either large language models, models where you can send language in and get an answer out, or diffusion models that can create images or even videos from a description. The latter I'm not really an expert on, so I can't exactly explain how they work. The former, however, is quite simple: what happens under the hood is pure predictive analytics, because what an LLM does is take a humongous corpus of text and build a machine learning model that says, okay, I look at all the text that happened in the past, and I make a prediction of what
is going to be the next word; then I look at all of this together and predict another one. Now, those algorithms have evolved a lot in the last 10 years, and of course the results are absolutely amazing, but what happens under the hood is: predict a new token, put the whole thing back into my model, predict the next token. This is why, with ChatGPT, when you see the answer appearing bit by bit on the screen, that's not an animation — that's actually how it works, actually how slowly it processes. So I would say nowadays, work-wise, predictive analytics typically means predicting a number or a class or an event based on tabular data, because that's something where people also often work out themselves what features to use, what algorithm to use, how to tune the algorithm — it's a different skill set. Whereas with LLMs, the architectures have gotten so humongous that there are basically 50 companies on the globe who really have the money to throw at these problems — foremost of course Google, OpenAI, and Facebook, with FAIR being very prominent — and who have the resources to acquire the humongous text corpus you need for this. The way you use it is actually as a box where you just send text in and you get new text out. I would still say lots of people working in gen AI probably follow what has been built inside, to sometimes understand why some things hallucinate more or less, but there are very few people who directly build something like that up from the ground — it's a very, very narrow discipline by now. Yeah, here's my question: you say even ChatGPT — I don't know, via some vector database or whatever — predicts what's going to come next. Isn't that the same as predictive analytics then? It is — you just don't see it that way. I mean, if you go to the website and
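The predict-a-token, feed-it-back-in loop described above can be shown with a deliberately tiny caricature: a bigram "model" that just counts which word followed which in a made-up corpus. Real LLMs replace the counting with a huge neural network, but the generation loop has the same shape.

```python
import random

# Toy corpus and bigram counts -- all text invented for illustration.
corpus = "the parcel is late the parcel is lost the parcel is late".split()

model = {}
for prev, nxt in zip(corpus, corpus[1:]):
    model.setdefault(prev, []).append(nxt)

def generate(start, n, seed=0):
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        candidates = model.get(out[-1])
        if not candidates:
            break
        # predict the next token, append it, feed the result back in
        out.append(rng.choice(candidates))
    return " ".join(out)

print(generate("the", 3))
```

Because the output is appended and then used as the next input, text appears one token at a time — exactly the streaming effect seen in the ChatGPT interface.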
you say, hey ChatGPT, give me a recipe for gingerbread, you don't see that this is happening. But for people who understand a little bit where this is all coming from — the big jump happened in 2015 when Google released TensorFlow. Back then you could still build smaller language models; we had some back then, at a job where we built them in production, all by ourselves. When you look at this, you think: no, there is never going to be anything like artificial general intelligence coming out of this — it's one tool. The only thing that makes it so incredibly impressive is that we have always associated being able to answer a verbal question with expert-level knowledge and confidence with high intelligence. So it appears to us to be intelligence, but in fact it's just matrix multiplication. I don't want to diminish this in the slightest — the amount of productivity gains we got from there, the amount of enablement we've seen, is nothing short of amazing. But keep in mind the moment when we suddenly had computers that could do accounting and add up columns of numbers — that was also amazing. You don't want to go back to doing your accounting with someone actually going through the ledger and manually adding up all the digits. So there is nothing happening there that you would, from a purely human perspective in terms of cognition, consider intelligence. Wow, fascinating. A couple of questions before we finish. One is: where do you see all this going? Data, AI — what should future technologists be prepared for? Say someone doing their computer engineering degree right now — what do you think is going to happen to their careers in the next two, three years? Well, they're going to be just fine. Look, a lot of people are saying, oh, ChatGPT can write code for me, so
computer science students would need to worry. But look at the problems our engineers are trying to crack — it's not about code. It's never about writing code; it's about the architecture. We recently had a discussion internally where we were worried about something — basically a case where your database runs out of a certain type of ID. Ask ChatGPT about that: it has no clue; you actually have to think about what it could mean. Anyone who studies computer science will be fine. What's important is that you stay on top of the game. I started my career in consulting, and one thing people liked about me as a junior consultant on their projects was that I knew how to write SQL and VBA — I could whip up a small Excel tool with a form and clicks and whatnot very quickly, and people paid good money for this. You can probably still make that money today, but at some point this easy stuff is going to run out. So I think everyone needs to focus on using the new technologies to improve their own abilities. But there's no chance that someone who graduates in computer science at NUS now will find their job has been replaced by AI — it's not going to happen, not in the next 10 years. I'm very curious to see where the journey is going with AI. There are a few very prominent people with impeccable academic credentials and history, like Yann LeCun — who basically came up with the underlying convolutional networks that still power much of what we're currently using — who say an AGI will work differently, though I haven't heard him say how it will work. The thing is, we basically figured out one model, these large language models, that gives us so many benefits that I don't see that much else happening right now. I'm very curious,
but we also have to realize the reason it's so successful is that the internet has provided an insane amount of training data to these models. Where would you get anything remotely like that if it's not just text? Where would you get something that actually helps with making decisions? So yeah, I have a pretty big question mark there myself. And I see a bunch of hurdles on the horizon, because obviously, when people saw how much money is suddenly flowing around, everyone gets a bit jealous. You could ask: is it okay for any of these companies to use Stack Overflow, to use Reddit — which is where they get a lot of their high-quality training data from? Now those companies want their slice of that, and I'm sure they all have lawyers who will in some shape or form try to tell other companies: you're not allowed to use this, it's our IP. GitHub — a lot of the code there is publicly visible; if it's my personal code for some pet project on GitHub, should OpenAI be allowed to use it? There are going to be a lot of open questions where, in the past, someone just ingested it all and it gave us amazing results — is it going to stay quite that easy? I'm honestly not sure. So you don't think, you know, like Elon Musk says, robots are going to kill humans or take over the world and humanity is going to end? I once had very, very deep admiration for Elon Musk, but — maybe my brain is too weak to comprehend his genius — he has done very few things in the last three, four years where I would say they make any sense at all. So no. Look, if I'm afraid of anything, it's the climate zones of the planet changing in a way that makes it uninhabitable in places where a lot of people live, because that is truly dangerous. Machines — you can carry this very quickly into a philosophical discussion by asking: hey, what makes you sure I'm not a
machine? Wow — I'm not 100% sure; you could be, and I could be too. No — you should know about yourself that you're not, because you have your own consciousness. I know for myself that I have consciousness, therefore I'm not a machine; but I can't know this for sure about you, because all I do is observe you from the outside. Could there at some point be a machine that behaves exactly like you? Yes, absolutely. Would that make that machine in any form intelligent? Not necessarily, because it's imitating: if we at some point have cameras around us for our entire lives and use that as training data to train a system to behave exactly like a human — yes, that's entirely possible, and it would still not be intelligence. And especially, I think for every human there are underlying motivating desires. You can program those into a computer — optimize for something; maybe at some point there's a robot who's optimized for earning money, I don't know, maybe Musk is one, maybe he's running out of batteries. But the core thing — having a self-interest that does not come from the outside — that doesn't come with anything we're currently even envisioning. Because, as I said, it's all matrix multiplication — matrix multiplication plus a certain amount of random number generation. If you go into your ChatGPT interface, there's something called temperature that changes the impact of the random numbers; if you turn it to zero and put the same thing in 10 times, you get 10 times the same result. Which human would do that? Fascinating conversation. Oliver, where can our audience learn more about you? About me? Jeez, I'm still trying to figure some stuff out about myself — not sure there's that much more for other people to learn. You didn't give me that question beforehand. Maybe — and this is going to be a good end for this — the fact that I'm one of the few
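The temperature knob mentioned above can be sketched in a few lines. The token scores are made up, and this is only the sampling step, not a language model — but it shows why temperature zero makes the output deterministic while higher temperatures let the random numbers in.

```python
import math
import random

# Invented scores for three candidate next tokens.
scores = {"late": 2.0, "lost": 1.0, "early": 0.5}

def sample(scores, temperature, rng):
    if temperature == 0:                       # deterministic: always argmax
        return max(scores, key=scores.get)
    # softmax with temperature: flatter distribution as temperature rises
    weights = [math.exp(s / temperature) for s in scores.values()]
    return rng.choices(list(scores), weights=weights)[0]

rng = random.Random(42)
runs = {sample(scores, 0, rng) for _ in range(10)}
print(runs)  # temperature 0: ten identical results -> {'late'}
```

With a nonzero temperature the same ten calls would occasionally return "lost" or "early" — that injected randomness is the only non-deterministic part of the pipeline.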
people who remembers how you were standing outside of Chong Center with a sign asking Shing to tell you how to make money. Those are the things — if there's something about me, it's that I find such stuff incredibly intriguing and memorable, because the world is way too normal, and I'm personally always a huge fan of anything that makes it extraordinary in ways that I don't understand. Thank you very much, Oliver — great chat. Thank you.


Key Vocabulary (CEFR C1)

importantly

B2

(sentence adverb) Used to mark a statement as having importance.

Example:

"importantly how do you take this uh uh and and find ways to operationalize"

operational

B2

Of or relating to operations, especially military operations.

Example:

"importantly how do you take this uh uh and and find ways to operationalize"

processor

B1

A person or institution who processes things (foods, photos, applications, etc.).

Example:

"Kafka topic right there a queue where like we push everything in here at the other end it's a Flink processor who pulls one"

understanding

B2

To grasp a concept fully and thoroughly, especially (of words, statements, art, etc.) to be aware of the meaning of and (of people) to be aware of the intent of.

Example:

"understanding what will happen so this is lot of what I'm working on how which Parcels will be late which Parcels will"

background

B1

One's social heritage, or previous life; what one did in the past.

Example:

"background right so I mean I noticed I'm I'm looking at your LinkedIn right so you've got a doctorate in in physics yes"

adventure

B1

The encountering of risks; a bold undertaking, in which dangers are likely to be encountered, and the issue is staked upon unforeseen events; a daring feat.

Example:

"adventure back then a podcast in itself yeah okay what got you interested in"

originally

B1

As it was in the beginning.

Example:

"originally and someone has to Crunch so we put it onto like onto Kafka onto"

encounter

B1

A meeting, especially one that is unplanned or unexpected.

Example:

"encounter problem which products you can offer sensibly to Consumer who already"

afterwards

B1

(temporal location) At a later or succeeding time.

Example:

"afterwards I come back with all kind of e-commerce orders in front of my door nothing has been touched it's not going"

hierarchical

B2

Pertaining to a hierarchy.

Example:

"hierarchical distinguish okay this actually is ready for pickup is it ready for pickup at your own home in a locker"

Want more YouTube dictation practice? Visit our practice hub.

Want to translate multiple languages at once? Visit our multilingual translator.

Dictation Grammar & Pronunciation Tips

1

Chunking

Notice where the speaker pauses after phrases to aid comprehension.

2

Linking

Listen for the linking sounds when words run together.

3

Intonation

Watch for intonation changes that highlight key information.

Video Difficulty Analysis & Statistics

Category
people-&-blogs
CEFR Level
C1
Duration (seconds)
2999
Total Words
9359
Total Sentences
480
Average Sentence Length
19 words

Download Dictation Materials

Download Study Materials

Download these resources to practice offline. The transcript helps with reading comprehension, SRT subtitles work with video players, and the vocabulary list is perfect for flashcard apps.

Ready to practice?

Start your dictation practice now with this video and improve your English listening skills.