Episode 3

Is MLOps just hard mode for Data Engineering?

Apr 18, 2023

About this episode

Welcome back to AI or Die, episode 3! In this episode, we cover:

  • ChatGPT's GDPR violations and data breaches

  • Why companies shouldn't wait to implement AI

  • Emerging tutorial patterns for documentation

  • Is MLOps mostly data engineering? (hot take)

Transcript

[0:00] We should kick this off by talking about the weather. Yeah, why not, why not? Our favorite subject. I mean, I'm just happy we're all feeling better. I know Rick and I at least were really down and out with a cough for a couple of weeks. Still unconfirmed whether it was allergies - no, it was not COVID for sure, but it was not allergies.

[0:25] I'm still trying to chalk it up - maybe it's a really bad allergy bout. You never know. If Keyla got sick, I don't - I don't think allergies are contagious. I think that - yeah, we're all living in the pollen though, Brendan, that's true. All I can say is I just started feeling sick, like, only at night - I wake up and I'm coughing and my throat's messed up. So I started taking Flonase and now I'm better, so that's allergies, right? Flonase works - it's allergies.

[0:48] Yeah, we should at least have a litmus test for that. Man, that was bad. I'm just happy to be feeling better. Mine was confirmed not allergies - sick as hell. So for the folks listening, we were planning to do all of this in person for the last episode when we were all in town, and because we were so sick, we just had to cancel and move it to this time. But next time - next time, Brendan, you're in town - let's definitely do this one in person.

[1:17] Yeah, are you sure? You didn't even have a voice. No, I - I couldn't even like - I wanted to do it, couldn't even say anything that you guys would hear. So yeah, did not have a voice at all. Happy to have it back. Anyway, we're all better now. Yeah, here we are.

[1:40] Intended for the human to support the machine, and the machine to support the human, working together. AI means something different to anybody you talk to, which is why this is AI or Die - Episode Three.

So just want to touch base with each of you guys now that we're on episode three. How we doing, Reagan? You know, how are you doing, what are you doing today, what are you looking forward to, what's on your mind?

[2:08] Probably really annoyed about all the stuff coming out. Yeah, but I do think it's some really interesting thought-provoking discussion - like philosophical discussions, societal discussions. I think this is just an interesting - I don't know - accelerator for some of those conversations.

I - I've been entertained. I - I don't feel - I know there's overhype. Yeah, there always is for stuff like this because people get really excited. But I just feel like there's a lot of really interesting dialogue happening, and people who I thought would be on one side are on the other side, and it's - it's just an interesting time.

[2:51] It does feel like AI is a lot more public now. Just we've been doing this for a few years now, it just feels like it's really in the public light. There's so many more news stories daily coming out about this stuff too. So yeah.

[3:03] The timing is so crazy. I always talk about this to people - I'm like, "Yeah, we started this company over two years ago," so well before this most recent hype. And before that, you know, Brendan and I were working on models like six years ago, on the deployment problem. So it - it - it's just so funny. And then I was talking to someone yesterday too, and they're like, "Well yeah, it hasn't been around for that long," and I was like, "Well yeah, it actually has."

[3:41] You shared that article, Brendan, where Google searches for MLOps didn't really show up on the radar until 2019 or so. ML's been around as a discussion point for so long, but even topics like MLOps are just a few years old.

[3:53] Yeah, definitely. I think it's - it's - it's great to be able to go now and talk about it, and they're like, "Well, our enterprise is actually doing AI," and then you can list off, you know, a large selection across verticals that are doing either initial piloting or running a very large number of models deployed at scale. So it's pretty crazy just to see how quickly that has changed. And I'm looking forward to - given the hype around LLMs and ChatGPT - when that comes into the enterprise, how big that's going to be, because it's already happening now. So that splash is going to be huge.

[4:21] I agree. It is less abstract. And Brendan, how are you doing? What's on your mind? What are you doing today? What are you looking forward to? How are you?

[4:33] It's good, it's good. I started a new project today and I'm very excited about it because it's a lot of like taking a very complex data engineering process and then simplifying it and communicating it. And I'm just very excited to like learn some new tooling kind of configurations that they're using with like DBT and Databricks and Azure. So just like a new tool stack, as we always see in every client - they always have their different kind of mix and matches of that.

But I just realized how much I like taking something complicated and trying to simplify it. Like that's what I really enjoy about the work that we get to do - taking these complex analytical, emerging processes and then simplifying them to the point that people can understand them. So that's what I enjoy. So I'm pretty excited about that project.

[5:13] The engineering - you know, once an engineer, always an engineer. You're always going to love unpacking this stuff. It's a good time, it's a good time. Agree, agree.

So today's episode, we really have a couple big topics we want to talk about. So one - I called this on the last episode - data breaches, especially with the use of ChatGPT. So there's been a few different data breaches either driven by the employees or accidentally at companies. So we're going to talk through a few of those.

There's also a super interesting article that you shared, Brendan, just around MLOps and how ML engineering relates to data engineering. So the hot take essentially is: "Is MLOps just glorified data engineering in some way?" I think we should definitely talk about that.

[5:53] But then two, another topic is companies just feeling like they're not ready yet in terms of scaling some sort of knowledge, some sort of central repository around how they work with AI and the processes related to AI as well. And just the patchwork that we're seeing of SharePoints, Confluence, and PowerPoints all coming together to make a bit of a Frankenstein for folks. And the analysis paralysis - or not feeling ready - along with just the tool mix confusing folks for adoption. It's just something that we had to talk about. It's so - it's so confusing and we're seeing it so consistently too.

So let's get into the first topic just for the next 20 minutes or so, guys. We'd like to unpack the data breaches. So it's interesting because Samsung had employees who were submitting company information to ChatGPT to help them with data engineering problems - inherently private Samsung data leaked to ChatGPT, that's one. But then ChatGPT itself also had its own data breach around customers' last four digits of their credit card information, their name, you know, personal information like that as well. And then of course, everything going on in Twitter-land with the switching of teams and a lot of stuff getting leaked after folks were let go as well.

[7:04] What are you guys' reactions to it? Reagan, want to start with you? I mean, did we see this coming? Obviously we talked about it on the last podcast episode, but to me, I think it's more rampant than I ever expected. What are your thoughts on this?

[7:10] Yeah, I think when I was digging into it a little bit, it looked like it was an issue with an open source library that was being utilized that they had patched and fixed. There was a certain - a very specific circumstance, at least explained by their CEO, of course - in which an end user would experience this particular breach.

It did expose first name, last name, email, billing address, and the last four digits of the credit card that people used to actually sign up for ChatGPT and pay for it. It's - I - I think a little bit different than some of the other breach conversations that are happening.

[7:53] But this seemed to be an isolated incident - not necessarily the AI turning on everyone and having issues. It was more of a product problem. And I think this is such an important thing for us to distinguish as well when we're talking about this - OpenAI has turned into a product company. They have switched to being a for-profit, and they are now productizing a lot of this. And we've mentioned this on previous episodes multiple times, but there is a very big difference between the kind of R&D that happens on the back end on these large language models that they're iterating on and the product interface that is exposing this and allowing people to actually engage with these large language models.

[8:45] And so this, to me, seemed like more - that specific incident seemed more like a product issue than the R&D side of it. But, you know, I also didn't spend a ton of time digging into it. So I - I just - yeah, I think this is bound to happen.

I was actually digging more into the ban of ChatGPT because of GDPR concerns, which I think is actually more of an issue than this isolated incident. But those are just kind of like my high-level thoughts.

[9:21] Yeah, that's interesting. And Brendan, what are some of your thoughts on this as well?

[9:26] Yeah, I think it's a really interesting time. I was actually - I had a good conversation with two like product leaders here in Denver, and both of them are startup founders. And they were talking about the role of ChatGPT. So it's really cool to hear the two ends of the spectrum because the one was very cautious, right? She's in the mental health space and she's like, "There's too much - you need more controls around this stuff to be able to expose it really to the broad public."

And the other perspective that the guy was bringing is like, "This is like the heyday of the internet where it's super unregulated." Like there's a lot of risk, but there's a lot of reward of this lack of kind of putting it into a box and constraining it.

[10:02] So I think we're kind of in that period right now where we're figuring that out with AI - like how regulated, how controlled does this need to be? How much do individuals need to wrap and contain ChatGPT or LLMs as they're using them? I'd say overall, the biggest piece is just making sure people understand that this is an external system that you're sending data to. So if anything is proprietary, you are sending that to an external system, right?

And I think that's the biggest piece that we've even seen like startup clients we work with putting out policies of don't use it and don't feed our data into it because you don't know where it's going. So I think that's going to be a really interesting piece of like how do we control this stuff? And in the beginning, it's like be cautious about what you're putting into it because it is going who knows where, right? Like it's out of your hands as soon as you send it to that endpoint. So yeah.

[10:44] Yeah, I was gonna say there's a lot more pressure on the user right now to be cautious, you know? Think about any of the kind of early technologies that we utilize - there, you know, there's a lot more push for the user to be educated and to understand what guard rails exist.

And so I think reading through that ban that Italy put on ChatGPT was kind of interesting, you know? They were pointing out very specific components in which they were uncomfortable or that ChatGPT was in violation of core fundamentals of GDPR.

[11:25] So an example was the right to be forgotten. That is a core component of GDPR that says if there is inaccurate information about me, I have the right to request that it be removed. And so how do you do that for a model that learns and incorporates and maybe doesn't have a mechanism to forget, you know? We - we talk about like, "Okay, do we - do we delete the data from the training set for the model and retrain the model completely every time a request comes in from someone?" There's kind of an impractical component to that. Or do we try to control the alignment part of the model and the human-in-the-loop element that kind of guides the model in different directions to fix or correct some of those errors?
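
To make the "delete and retrain" option concrete, here's a minimal sketch of that mechanism. The data, column names, and function are hypothetical, and it assumes a scikit-learn-style tabular workflow rather than anything resembling how a large language model is actually trained; it mostly illustrates why honoring this per request gets expensive fast:

import pandas as pd
from sklearn.linear_model import LogisticRegression

def handle_erasure_request(training_df: pd.DataFrame, user_id: str):
    # Naive "right to be forgotten": drop the user's rows, then retrain from scratch.
    remaining = training_df[training_df["user_id"] != user_id]

    X = remaining.drop(columns=["user_id", "label"])
    y = remaining["label"]

    # Full retrain on every erasure request - the costly part described above.
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)
    return model, remaining

A real system would also have to purge backups, caches, derived features, and previously trained model artifacts, which is part of why the alternative - steering the deployed model with alignment and human-in-the-loop corrections - comes up at all.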

[12:14] The other piece of this was utilizing data without being given explicit permission to utilize it. So that's another core fundamental that Italy is claiming ChatGPT is in conflict with. They also don't seem to have controls over minors utilizing the tool or any kind of age restrictions, which is another big fundamental that Italy had highlighted.

So it's just a very interesting element here. You know, we were talking to a lawyer in the AI space and she made a very good point - there are going to be new rules and policies that are associated with AI, but there are a lot of existing rules and regulations that can be broken by AI that can also be broken in non-AI scenarios. And so we need to understand what the implications of that are, what new bounds need to be in place, and how to react to that.

[13:25] So I just - I thought it was really interesting. I understand - I think the logic behind the ban - I don't know, I have no opinion of whether or not it was the right decision to make. But it is a pretty fascinating scenario.

[13:39] And just to confirm - the ban covers both companies and individual users? It's country-wide no matter whether it's a large enterprise or an individual, right?

I believe so, yeah. Which is two very different use cases, because we're seeing it on a smaller scale - companies, you know, banning the use of ChatGPT within their company. But I just think it's so interesting how the country is saying, "Everybody, no - no Italy data in or out, nobody actually using it," instead of letting users pick at the individual level and then letting companies pick at the company level as well.

[14:10] Yeah, I think they're looking at it from more of a "you're in breach of GDPR and therefore you cannot do business in our country." You know, there - there's obviously some pretty strict regulations around that. So yeah.

I was just going to say, to your prior point around it's really on the user now - at least in a country where it's not banned - it's on the user to really check themselves and understand, "Am I putting my company at risk, or am I putting my own personal information at risk, through using this tool or not?"

[14:30] It's so much burden to put on them. It's just hard to imagine how they can protect themselves. And then also, what's the risk to the employee if they do an "oopsie" - if they actually use ChatGPT and they accidentally leak data there? I haven't read anything about like what the employee has to do at the end of the day or what happens to them, but you can only imagine it can't be great, you know?

[14:53] Yeah, there's going to be a lot of cyber teams focused on this. We had a great conversation earlier today where we're collaborating with a - with a partner, Virus, who's kind of in the cyber education space - to try to create a foundations course in AI and some policies and best practices and just a general conceptual understanding for cyber teams to get a grasp on what's happening and some of the core concepts.

Part of what we talked about was data ownership and its use in - in the model. And then the other element of that is how do we determine the quality of the outputs and results the models are producing for us? On the other side, how do we understand that and how do we interpret it? And you know, what are the - what are the differences between these black box models and these, you know, models that have a lot more transparency around how they came to a result or came to an output?

[15:38] So yeah, I do think cyber teams are going to have to get up to speed very quickly, start putting some policies in place, start doing some more educational campaigns for the companies to get the everyday employee and associate to understand what this is, what to do with it, what not to do with it, and start putting some guard rails in place at a company level.

[16:08] Yeah, Brendan, what are your thoughts on that? Especially for companies that are looking to navigate this?

[16:14] Yeah, I think there's a lot of interesting ramifications around security and AI. I know from my and Reagan's personal experience working with the big banks and financial insurance companies - they had very strict policies already established in IT and software development. And those were essentially what the data engineering and machine learning and data science functions needed to really adhere to and respect.

So we spent a lot of time, especially because we were working with Docker and a number of new technologies that we were bringing in, to make sure things were locked down and adherent at that software development level. And then there's a whole other layer on top - like what Reagan mentioned - around the performance, the monitoring, a lot of other ramifications that come through when you start working with machine learning and AI.

[16:55] So I think there is a very large uphill battle for teams as they're leveraging AI, scaling AI, and trying to make sure they do that securely and also build user-level trust. Just because it is a whole different type of technology - it learns and adapts over time compared to a traditional software algorithm or software system. So I think it's a very interesting problem that people are facing.

And I think there's also the question of "Who do I involve within my company to help me solve this problem?" Because of course there's legal folks involved, there's IT - and obviously the advanced technical teams involved too.

[17:23] Reagan, I love your point earlier around there being existing structure around issues like this that have happened in the past. So it's not completely brand new and specific to AI, and I think people may have that misconception. But would you guys mind expanding on that? Reagan, specifically around, you know, how this is not necessarily new? How maybe there are some lessons that we could take from prior software, or even from the legal side of the internet boom?

[17:46] Well, I think it's more so looking at it from a lens of what rights and freedoms do we have as an individual inside of our country, and how is this potentially in breach of that? So like the example with Italy was that right to be forgotten in a very specific circumstance, or the ability to have - have the record corrected about you is fundamental for them and non-negotiable.

And so it's not that it's AI-specific, right? It's that AI can enable a scenario in which that right isn't honored. And so I think we have to look at it from that lens as well - what fundamentals are we in agreement on, such that if we're in breach of them, it's an issue?

[18:23] We keep hearing about - I mean, even in that interview with Sam Altman, the CEO of OpenAI, that Lex Fridman posted - they kept talking about this alignment problem with AI, which is silly because our company name is Illini, and we can talk about the fun connection there too. But like this big - yeah, this big alignment problem is, you know, how do we get this system to behave in the way that it was designed and intended by the developer and the designer?

And so that kind of understanding requires you to have a fundamental baseline of what your intention is to begin with, and that we agree upon that intention. That was the one thing that was like most fascinating to me. I think that's the conversation - that's why this conversation is so big. It's because we're saying that there are some fundamentals in how the system should behave, and that we agree on that, and that the system doesn't adhere to it.

[19:53] So problem number one is that we all agree on what alignment means, and yeah, good luck with that. Problem number two is that we have to ensure that the system is behaving according to that agreement of what we align on. And that seems to be, to me, almost like an easier problem to solve than the first one.

And so that's why it's turning into this larger kind of philosophical and societal discussion that a lot of people have a lot of opinions on because what is acceptable? And - and then again, how do we get the system to adhere to that?

[20:29] I think the second problem is where people are afraid that AI takes on a mind of its own. It's, you know, turned into AGI, which is artificial general intelligence, where it's able to do multiple tasks. There's a big debate on what that even means. I mean, you get down to the fundamental of what is intelligence? Who is intelligent?

I think one of the other massive issues - because Andrew Ng got onto a YouTube Live, I think it was like yesterday or the day before, with the Chief AI Scientist at Meta - and they were talking about how the six-month pause or petition that was signed is not going to be helpful. Like we should continue to do R&D in the AI space, and we should not slow down, and we should not, you know, stop the research part of it.

[21:10] I think they made a case that we might want to be more cautious about the productization component of it and the commercialization of it. We should not stop researching it. And I think what is super interesting and - and fundamentally disagreed on is what is this general intelligence? What is the risk to everyone else? And I think there's an assumption that intelligence can cause harm.

Well, there's a lot of really stupid people who have caused harm. So why does something need to be ultra-intelligent to cause harm? It's just - it's a very fascinating debate, you know, when you break it down to its - to its fundamentals. Some of it's a technical challenge that I think is really interesting - how do we get into the mind of these kind of black box models and try to predict what they're going to do? And then some of it's more of this philosophical stance. And I think the fear may be that, with the intelligence, it's, you know, harm at scale.

[22:14] And Brendan, I want to get your thoughts on the pause, as well as the technical side versus the almost philosophical debate we need to have around what to do about it?

[22:20] Yeah, and I think pausing is going to be hard to enact. I've heard some of this framed at the philosophical level of consciousness - like if artificial intelligence reaches consciousness, what would that look like? And then all the way down to the more practical of what are the ramifications if this, you know, goes out of control, right?

So I think it would be very hard for us to pause it just given all the capital already existing behind it, all the momentum behind it. Like realistically, it would be difficult to do a full halt. I think what might be a more palatable and, you know, realistic approach is to really invest heavily in the control research at the philosophical level, and then, as companies are bringing this in, implementing a lot of that governance and ethics - because we're already kind of seeing that.

[23:06] Like this is such a common concern that there is a lot of focus here already, which I think is wonderful. Maybe - I don't know if this existed in the beginning of the internet age. So I think it's wonderful that we're having these conversations. And to Reagan's point, I think one of the interesting ramifications I've seen is like the AI alignment problem is aligning to the values, and a lot of that comes into ethics and bias on a practical side.

When we talk to teams about this and about using AI at scale, the Catch-22 there is that they need to be able to have access to data that would show whether or not it was biased. And I think this is one of the most interesting practical ramifications in these conversations, because the data science team that produces the model doesn't have access to that data - they can't use it as features.

[23:41] So going back to your initial question there - like what do you need inside the organization? You basically need a third-party auditing function inside of the organization that has access to protected features, but isn't building models, to regulate the people who are building models. And so I think the more that we just start AI with this - you know, you have to have control and you have to have development, right? - I think we'll be able to kind of find a middle path between the two extremes of all-out uncontrolled AI and, you know, full stop on AI. I think there's a middle path where we can figure it out as we work through it.

[24:16] It's a really interesting point. Is there a future where there's a third party kind of government body going around and checking kind of the ethics and the bias that are built into models or not? And - and I - I bet there's teams out there who are trying to draft kind of what that looks like as a first kind of, you know, set in stone, you know, let's try and start here and then we can move forward with additional amendments to it.

[24:33] Yeah, we were just talking to the - the Responsible AI Institute, and they're - they're really hyper-focused on this. Like they're working with all of the official and unofficial kind of governing bodies that are thinking about this from a legal perspective but also from an ethical perspective and putting together a set of policies that - that companies can audit against.

The thing is, you just need a set of rules. You need a set of rules in which it's very clear when something is out of line and when something isn't. And we went through this with the data age - the data privacy age, right? We wanted to collect all the things about all the people on all the tools. We still do, and we want to capture it and we want to store it and we want to use it.

[25:10] And so, you know, there's - there were massive implications to that. People trying to understand what data do you have about me, you know? And that's this whole movement of GDPR about data and data privacy and data ownership. And I think it's a very similar revolution.

Which brings me to my kind of next point, which is: where are companies realistically on this - on this journey to AI? Where are they actually, and what does this look like for them in the next five, ten years?

[26:00] Yeah, a lot of the teams I've talked to are just standing up initial governance programs and getting along in that. "Hey, I'm a governance program lead and maybe I have a focus on just educating the wider general population around how to apply governance in your day-to-day work." And then there are other, much more centralized - less federated - central governance teams: a handful of folks who are just focused on defining that for their own company too.

Wondering what you guys are seeing - what are the most mature, least mature, and somewhere-in-between companies doing related to governance?

[26:30] Quick disclaimer - we have a little bit of a glitchy AI Nick today. Oh no, is it bad? Should I -

[26:53] No, it's pretty consistent. Yeah, maybe move the router.

[26:59] Yeah, I'm literally five feet from my router. I moved down here because it's right next to it. That's so unfortunate. I mean, your audio is coming across great, which is why I haven't stopped it. Okay, so for listeners - okay, for your listeners, this is great for you, but for anyone who's watching on video, I apologize. We can just do an image over me or something like that - or, again, AI-generate my video for this entire duration.

Well, you know - I think it was Saudi Arabia or Kuwait - the very first AI news reporter rolled out, actually reporting the news live now, fully AI-generated, which I could not believe. Anyway, sorry for the tangent.

[27:30] Brendan, back over to you. Yeah, Brendan, back to you. Just in terms of governance teams and what you're seeing in reality - where are teams at in organizations?

[27:43] Yeah, I'm definitely seeing a lot more operational governance inside of the tech stack. That seems to be a large piece - especially with these centralized architecture teams that are becoming more and more prevalent around data and AI - they are building the framework into the tech stack, which I think is a very strong pattern and a very good way to do it.

For teams that are starting out, I would just give the advice to track everything - do any form of lineage tracking, logging, whatever you can do to make sure you have a very clear, comprehensive view of the lineage. Because then you can at least evolve your policies over time based on previous mistakes or, you know, emerging issues. As long as you have that data trail of metadata behind you, it's much easier to build out those policies.
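
As a minimal sketch of what "track everything" can look like in practice - the field names and file layout here are hypothetical, and it assumes nothing beyond Python's standard library - even a simple append-only log of each pipeline or training step gives you the metadata trail being described:

import getpass
import hashlib
import json
from datetime import datetime, timezone

def log_run(run_log_path: str, step_name: str, inputs: list, outputs: list,
            code_version: str, params: dict) -> dict:
    # Append one lineage record per pipeline/model step to a JSON-lines file.
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "step": step_name,            # e.g. "feature_build" or "train_model"
        "user": getpass.getuser(),
        "code_version": code_version, # e.g. a git SHA
        "params": params,             # hyperparameters, config flags
        "inputs": inputs,             # upstream tables/files this step read
        "outputs": outputs,           # artifacts this step produced
    }
    # Fingerprint the record so later audits can spot duplicates or tampering.
    record["record_id"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()[:12]
    with open(run_log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

Even this much is enough to answer "which data fed which model, when, and under what config" later, which is the raw material governance policies get built on.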

[28:22] And then on the more mature end - teams that are operating at scale with AI but also data - they have this very operationalized. There's also a cultural stance around it that we're seeing enacted, around making sure that everybody's aware of the governance. There are, you know, teams dedicated to really keeping that governance framework up to date and making sure it evolves as new things are learned or as the, you know, organization faces new challenges.

So kind of seeing a wide spectrum here, but I do see a very strong like concern and therefore action around governance as teams are even initializing AI and data functions inside the organization.

[28:49] Are these teams pretty large, Brendan? Just as a quick follow-up to that - do you see like giant multi-dozen person teams, or do you see them as like under ten? Like what - what is the general size of teams that we're seeing?

[29:07] Yeah, I've typically seen it be more federated across the business units or across the spokes. And there's typically less centralization there. Sometimes it is coordinated through a center of excellence or the hub in the hub-and-spoke model or some central entity. But I - I am not seeing - I haven't personally seen as much of very large-scale governance teams. It seems like it's more people focusing on governance in those spokes.

[29:30] Yep, yep, that makes sense. And Reagan, back over to you. What do you - what have you seen at companies? How far along are they? How big are the teams?

[29:37] Yeah, I think we see a spectrum of companies. It's honestly wild how some have just culturally figured it out and moved really quickly, and how some are just really struggling.

So on one end, you've got companies that might not be as financially or competitively motivated to put their foot on the gas around this, or maybe unclear why or how some of these use cases would be accelerated through AI. But they - but all of them are focused on data. All of them are spending money on data.

[30:21] I think I saw - there were a couple of studies done in terms of budget at the enterprise level in this space, and I think right now about ten times more budget is spent on data initiatives - data management, governance, quality improvement, access controls - than on AI initiatives or data science initiatives. So that's a huge difference.

Every big company I talk to, especially if they're in a rather complicated or regulated industry, or they don't have a really strong history of leveraging and utilizing data - they're still working on centralizing data, they're still working on getting access to it, they're still dealing with a lot of quality issues.

[31:09] And so for them, these AI initiatives are like way, way in the future, or they're super isolated into very specific domains because that domain has it together on data and data quality and accessibility and availability. So, you know, there's that end.

And then there's the other end where you've got 60-plus data scientists that are building models - sometimes very simple models, sometimes very complicated models. And again, this is that build versus interface idea of like we're going to build the intelligence and the models internally and have our own research groups and - and iterate on that and implement that versus leveraging external models and interfacing with external models in a clever way that enables our organization to be competitive.

[31:54] Either way, they're both still trying to figure out how to move quickly and how to move along that maturity curve as fast as possible. And I think there's an inflection point where they have the right leadership in place, the right budget structure, the right cultural elements - and that's when they see significant acceleration. So it's like a very slow crawl up until this kind of pivotal moment where they can start moving faster.

[32:17] And just as you're describing that - it feels very elastic too, like a crawl of two steps forward, one step back: forward in terms of making big advancements in the data science capabilities we have, but then one step back to "oh, we need to fix the data quality that's actually supporting a lot of these models and go clean that up on the data engineering side." And then we can go and make further progress forward. So it does feel like a little bit of elasticity - going ahead, then coming back and cleaning up, going ahead, coming back, cleaning up. It's not fully unidirectional.

[32:44] And I think that's the big miss that a lot of these companies are doing. They think they have to get this huge foundational piece in place before they can move forward. And to your point, Nick, like they can grab little pockets and they can start experimenting and they can start leveraging more advanced techniques in these pockets and showing the ROI and showing that it's feasible and it's worth the investment.

Like that is the right way to do it. It's not this massive like "once we have our data in the right structure or high quality" - like that just will never happen. I - I don't understand why companies are waiting for this moment where they feel ready to start moving.

[33:21] Yeah, Brendan, your thoughts on that?

[33:23] Yeah, I think it's the classic issue of a hammer looking for a nail with new technology, right? If we stay problem-oriented, we work up through the initial descriptive stuff, get the data cleaned up, and then start putting predictive on top - that will be a slower but, you know, ultimately more aligned and typically more beneficial approach to building out some of these models.

Obviously depending on the use case, but I'd say just for AI practitioners everywhere, we need to remember to focus on the problems so we don't get caught over-spending on the solution.

[34:00] Totally agree. I think that's why it's important to be use case focused and again, not do this bottom-up approach - like get all of our data centralized, get all of it high quality, fix all the problems, and then - it's just not obtainable. And it's also, to your point, not - not problem focused. Like we're not solving problems in the business. We're just making these massive incremental improvements to this like platform that we have, which nobody in the business actually gives a shit about realistically.

[34:32] It's inherently risk aversion, and I think that that's why a lot of companies keep this capability within an innovation team because they can have a budget set aside for the innovation side of things, and it just feels inherently less risky when it's not live in operations impacting a wide employee base. And I think making that flip over out of the innovation team into "this part of our operations that we live with" is a really, really hard kind of paralysis point that a lot of companies feel.

[35:01] And you need it in production to show the ROI. So like data innovation team also needs to cross that threshold, otherwise it's hard to come back and say, "What was the actual impact of this spend and this technology that we brought in?"

That's, I think, where a lot of teams are struggling in the beginning part of that maturity curve. And I think it's a really good time for us to double-click into those companies saying, you know, "We're trying to do this today, but we don't feel ready yet." And a lot of times we're seeing kind of a smorgasbord of different tools that they're bringing together - obviously leveraging SharePoint, PowerPoint, and Confluence just to try and get all of their documentation around this new capability that they're trying to wrap their arms around and do it consistently.

[35:34] I want to talk to you guys and get your thoughts around what you're seeing in terms of companies just not feeling ready yet and what they're trying so far just to kind of formalize this function in their org. Reagan, start with you.

[35:46] Yeah, I think the biggest problem here is that there's a huge difference between designing a program or a capability inside of a company and rolling it out to the masses. Like I understand not wanting to fatigue people on change - that is super important. You don't want to waste a moment to get somebody to change their behavior or learn something new. It - people get super fatigued and they - they don't like change.

And so I get that part. The part I can't wrap my head around is why not iterate in small pockets and design that program and try to figure out where the holes are? You're never too early to start this. Like you could start this tomorrow.

[36:22] A lot of companies do - they, to your point, they start up a PowerPoint deck, they start up a SharePoint site, they start up a document, and they just start. "What do we need to do? What are our policies? What - let me reference this big, you know, framework that has been published. Let me research what other companies are doing in terms of policies." And like you can just start ideating and forming that and then poke holes in it and - and test it into smaller teams and pilot things.

So like there is no "it's too early to start" on that piece. Now the rollout - getting everybody to play along, putting the big stamp of approval on it, and saying "this is our golden target" - I get that there should be some caution around that and there should be thoughtfulness around that. But you can't ever get there if you don't start building it.

[37:11] Yeah, pilot - that's the word of the day, folks. Although pilots require a relationship with a partner in the business or somewhere in operations who is willing to go forward with you in this high-risk space and give up their time. I also have complete empathy for making that flip from pilot to wide scale. It's again a bit of a networking exercise to build partnerships with a lot of the folks in operations to trust you and want to work with you to go and move forward with bringing it into their team.

So maybe I work with Team A to pilot with because we're very close and they're already very mature. But Team B may not be as mature - they're not at a point where they're ready to pick this up, and they don't know you guys as well as maybe Team A does. So that's a bit of a conflict too.

[37:38] Brendan, want to get your thoughts on this?

[37:40] Yeah, to me it comes down to: when we make these investments in technology, we expect them to scale, and we need them to scale to really justify it and to be competitive, right? Because that's usually what the goal is. We're working backwards from ten years from now, five years from now, when we need to go head-to-head with the newest AI-native, tech-native companies.

So from that perspective, I think starting early like Reagan mentioned is really important to be able to wrap your technical capabilities with the communication layer, the documentation layer, the knowledge management layer - you know, whichever term we use for that - but really getting those people and the processes around that technology.

[37:52] And just to share an insight from my recent work - a lot of it has been working with this company around enabling a big data platform - they're using the communication layers to really justify or validate the process and the design of the system that they're building. So working backwards from how users are going to come into your platform or your capability is also a really good way to increase the effectiveness of what you're investing in, right?

Because our goal is to have many, many models running on systems or many, many data pipelines, many dashboards running on a system. So if we design with that intent up front of having multiple people come into it, we can save a lot of time, save a lot of effort, and be much more successful with these investments.

[38:24] Yeah, one thing I want to note that I started thinking about this a lot because it's like you look at these well-funded tech companies in the data and AI space building platforms and technology - all of them have robust technical documentation, and they have implementation teams, and they have educational resources, and they are trying to optimize that end user experience to adopt their tool.

How is that any different than building a platform internally, or building tools or products internally off of data that other people in your company are going to use? But we treat that so differently. We're like, "Oh well, you know, I'm going to build this thing, but I'm going to have no documentation around it and no thought about how the end user is going to use it." It's such an afterthought.

[40:04] It's always: there's this business opportunity, here's a project, we're going to build a model or build out the dashboard and call it a day, and it's going to be managed in Jira, and then we're going to release it and maybe check a little later to see if people are using it. But we're going to work on another thing now.

It's like, where was all of the thought around making sure that those people - the end users - know how to use it? That they know the risks of using it? That they understand it? It's that important. You're developing solutions and products and tools internally, you're configuring all of these tools together in unique ways, you know, that other companies aren't. There's got to be some guide or playbook for people internally. And I think this is just a massive missing piece.

[40:50] And I think like it's so hard for engineers to focus on that, right? Because we are always focused on the solution and like the classic fallacy is "if we build it, they will come," right? Like this solves a really important problem. So to your point, we need to make sure that there's champions on the business side to like understand and like embark on this change, right?

And then on our engineering side, or on the people building up these capabilities and these platforms, it's thinking working backwards from where we want to see that scale, right? And that does require a little bit of like shift in mindset of like, "Hey, I'm building an internal product," right? I'm not just assembling a big technology stack together. I'm building a product that people are going to interact with to solve problems and to leverage the latest innovative technology.

[41:26] So you kind of need to work backward from that user perspective, be problem-oriented, and that can be kind of the opposite skill set of build out the latest greatest technology. And that's where I think a lot of teams struggle, but we are seeing more and more kind of focus there as they are trying to scale this out across the organization.

[41:42] It reminds me of my best friend who works in accounting. He's super diligent and organized around the product he puts out because it's for a customer - it's for his company - but then he doesn't touch his own finances at all because it's an at-home thing. And I draw a parallel to these companies: when you're developing a customer-facing product, so much work goes into it. But what about for their own employees? It's like "at home" for them, and they don't think about putting as much diligence into controlling it and treating it as a product that's rolled out to their internal employees. I just think that's so funny.

[42:06] Yeah, it is bizarre. And I - I think we're also drastically underestimating the workforce transformation that's happening right now and getting people to come along and getting people to understand the core fundamentals. I just - I - I know I harp on this a lot, but there's a huge difference between like "why am I doing this? what is this?" versus "how do I do this?"

And so many people are like, "Well, why can't you just lean on these tech docs or, you know, the tooling documentation or tooling training or certifications?" It's like great, you want to teach people how to do stuff all day long, but there will not be adoption of something new if they don't understand why they're doing it and the core concepts underneath it and how it applies to their day-to-day.

[42:43] You know, a lot of these people are getting certifications in tools - that's cool - but then you go back to your company and you're like, "Well, how is our company doing it?" You know, it is different. I - I - that's why there are so many consultants out there. That's why they make so much money. It's because it's so different. So they can go into a company and they can sell them millions of dollars of services to configure it to their company and get it to work.

And I - you know, it's just - I think we're drastically underestimating how hard that is.

[43:13] Yeah, and one thing I would say too about relying on technical documentation is legacy systems - you're always going to have legacy source systems that you're pulling from. They're going to create a lot of chaos in your tech stack. So this project right now - DBT, Databricks, cutting-edge stuff - is pulling out of SAP and all the messes there. So that's, you know, multiple systems stacked together, and there are a lot of different variations that can happen there.

Like, to Reagan's point on the configuration, you can't rely on just general training or documentation. You really need to design and build the system and then communicate it out as you release it to get that scale that you want.

[43:45] The biggest pain point I keep hearing is "we send folks out to Python training, we send folks out to Azure training," but the "what's in it for me" is missing in terms of: what do I do next week in my job? And the connection back to exactly how that links into my environment and everything else around the tool - my day-to-day work and how I need to do an action differently, as a habit, Thursday morning when I go and do this task.

That, you can't inherently get from a massive open online course or from a standalone course like that. We keep saying "in the flow of work," we keep saying things like "micro learning" and "just-in-time learning" as well. It's all centered around trying to crack that nut of changing a person's habits and getting them to do a thing in their work very differently.

[44:19] I want to drop a little bit of an "Illiniism" in here this episode. Yeah, and one more thing I want to throw in there too: people hear "documentation" and they kind of puke in their mouth a little bit, right? Because it's like, "Why would I waste my time if no one's going to use it?"

And I've seen an emerging pattern which I'm super happy about around creating tutorials and exercises. So instead of having pages and pages of long-form text, it's "here is step by step how to go through it, here's a recipe that you can copy for your next project" - engineers like that. Like the people who are actually going to be using this stuff - that's much more aligned to them and is very practical, because it also shakes out your design, shakes out your environment, makes sure that it is actually scalable.

[44:57] So if you hear all that and you go "that's gross," just think about hands-on tutorials, exercises that get people up to speed because you can send that and save yourself a lot of time and -

[45:03] It's so bizarre too because usually when we talk to folks, they think we work with just data scientists. And I get it because of the name, but I think one of the biggest - biggest misconceptions is that this is one of the most multi-disciplinary, cross-functional types of work happening inside of companies today.

So you've got data coming from all of these different domains that adds more context, and you're building these solutions that can impact different parts of the business. And, you know, everybody - you got data coming in from all these places and then solutions going out to all these places. And so there's an immense amount of collaboration that needs to happen. There's tons of handoff points from all of these teams that don't report to the same person, don't fall into the same budget - makes it super complicated.

[45:46] And on top of that, you've got highly technical teams serving non-technical teams, and the translation is not just important - it is the most important. It is the most critical element of it. Otherwise, you're going to fall on your face when it comes to ROI, which we've seen over and over again.

So I think there - to that point - like sure, engineers love tech - tech documentation. They love reading research, you know, papers and - and long-form text and articles, and that's fine. But I'm telling you that that is not the case for everyone else. And so we've got to figure out a way to create multiple mediums of - of information that's relevant to the in person that gives them what they need when they need it and shows them how to interface with all of these different teams.

[46:29] I will say, because I get the opportunity to work with so, so many teams, and across the board we collect our own data just around teams changing habits and what they really relate to - to Brendan's point, the hands-on element gets the most overwhelmingly positive customer feedback: "I loved how it felt real, I loved how it felt like it was something that I could do tomorrow."

And it's - it's - again, it's - it's very important because that makes it relevant, but it also just makes it much more collaborative. And again, we're building connections across teams that may have never worked together before, who may have different educational backgrounds. And a lot of this ends up being a lot of a networking exercise too - introducing each other, how do we hand off work to each other, how do we collaborate and build this thing?

[47:07] And just as you were talking about, Reagan, it does feel like these teams are building a rocket and it's about ready to launch, and there are so many different pieces and parts and people that come together to actually make it happen. It's - it's an event. It is a true transformation at such a scale that I don't think people have imagined.

[47:21] Yeah, I think the interesting element to this too is everyone's thinking about AI through this really small lens, and it's so much bigger than that. When we talk about the capability areas inside of a company, we're talking about, you know, data literacy and fluency - what is it, why is it important, where do I find data, what does it mean? We're talking about enablement - how do I access it, build solutions on top of it, try to understand baselines of critical operations and elements of the company? And then building sophisticated models that help from a predictive perspective on driving more efficiency inside of the company, and interpretability of all that, and all the technical support for that, and all of the business use cases on top of that.

[47:59] I mean, it is so interconnected, except we talk about it in such different terms and lenses. Like, "Oh, why are you focused on this and that?" It's so interesting how people have isolated these concepts. "Well, I'm talking about MLOps, so that's super specific." Okay, but how does that matter in the grand scheme of, you know, the company strategy or the business? It's just really interesting to see how people are framing all of these concepts up.

[48:22] And on how the roles are engaged - Brendan, want to see if you have thoughts on that? Because I know you - you see a ton of different role definitions. We've talked about this in prior episodes too, but wanted to get your thoughts around just how people are breaking up the units of work, as we've heard it called before, and how the handoff points are looking today.

[48:35] Yeah, there's such a variety that it almost feels like it's less about trying to define these roles and more about trying to define these processes. I think sometimes people can get hung up on self-identifying where they fall in this broad spectrum of messy data to AI in production, or messy data to dashboard - because there is so much that goes on between those two extremes.

So I almost feel like today it's more about defining these processes, making it clear how we interact in those processes, and then using that to really define our roles - the real work that we're going to be doing - rather than trying to point to arbitrary titles, just given how ambiguous they are today.

[49:11] They're ambiguous, they're different at every company. Even if they're called the same thing in two different companies, they have different roles and responsibilities listed out underneath. So yeah, there's such variance today - I keep saying Wild West.

[49:20] Yeah - I was gonna say, I'm curious to hear about this whole MLOps data engineering discussion.

[49:25] Yeah, can we take ten minutes just to touch on that? I think now's a good time to talk about MLOps. We've talked before in prior episodes around how data engineering is almost the belle of the ball this year - it seems like a lot of companies are focusing on modernizing their data engineering practices. But I really like this hot take, and I just want to spend ten minutes on it: MLOps being mostly data engineering, according to this article that we looked at. And - and Brendan, wanted to start with you around your reflection on that article.

[49:51] Yeah, I think it's great because it points to a really tangible problem around MLOps: you need data engineering systems to be able to be very effective in MLOps. So when we worked with the big banks, a lot of times we were working with data teams that were doing more modeling - just isolated data science teams. They didn't actually have a lot of access to scalable, production-grade data pipeline tools, and we struggled a lot because that wasn't available. So we had to go find it, we had to implement that, and then we had to add a lot of stuff on top, right?

[50:19] And so to this point of like 98% data engineering, I definitely resonate with that, because once you've designed the system for MLOps and kind of worked backwards, the way you implement it is a lot of data engineering, right? Because it's taking data from point A to point B and putting metrics on top of it that solve for the unique problems of MLOps or machine learning in particular.

The counter-argument I would make is that there is so much that is unique about ML compared to a traditional data pipeline. So again, once you've done the design, the implementation is a lot of data engineering. But the design is a very tight collaboration between ML engineering and data science. So depending on how you look at it, 98% of the implementation may be data engineering, but a lot of that design - that 2% - is really what's going to make or break your model in production.

[51:05] And just to clarify some of the terms for folks that might not be as deep into this problem as we are: ML Ops is more about how we deploy models in production, how we monitor them, and how we manage them effectively. And data engineering is a lot of the running of data pipelines underneath that. So a subset of ML Ops is a lot of data engineering work to just get this implemented. The other piece I always want to call out is architecture, which is kind of a thing that's rolled into data engineering but probably should be split out.

Because a lot of that ML Ops work is architecture. It's how do we deploy these models, how do we manage trained model artifacts, how do we do feature stores - stuff like this that are new and unique data engineering chunks. They're not just Python code; they're Docker plus this tool plus that tool plus image data in an S3 bucket plus this plus that. So I just want to call out that there is also architecture represented in that 2%, and it's probably more than 2%.
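As one small illustration of the "trained model artifacts in an S3 bucket" piece of that architecture work, here's a hedged sketch of publishing a versioned artifact - the bucket name, key layout, and metadata fields are assumptions, and many teams would lean on a model registry tool instead of hand-rolling this:

```python
# Minimal sketch: push a trained model artifact to object storage with
# enough metadata to find it again later. Bucket name and key scheme
# below are hypothetical.
import json
from datetime import datetime, timezone

import boto3
import joblib

def publish_model(model, bucket: str, model_name: str, version: str) -> str:
    """Serialize the model, upload it, and upload a small metadata file alongside it."""
    local_path = f"{model_name}-{version}.joblib"
    joblib.dump(model, local_path)

    key = f"models/{model_name}/{version}/model.joblib"
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)

    metadata = {
        "model_name": model_name,
        "version": version,
        "published_at": datetime.now(timezone.utc).isoformat(),
    }
    s3.put_object(
        Bucket=bucket,
        Key=f"models/{model_name}/{version}/metadata.json",
        Body=json.dumps(metadata).encode("utf-8"),
    )
    return key

# Hypothetical usage:
# publish_model(trained_model, "example-ml-artifacts", "churn", "2023-04-18")
```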

[51:56] Yeah, Reagan, what are your thoughts on the 2% versus 98%, as well as just what's all involved under ML Ops today?

[52:02] Yeah, I feel, from a modeling perspective, there's a portion of modeling where you do feature generation, where you create features that the model will learn off of and/or predict against, right? So that means we're taking raw data and then generating features from that raw data.

And so the ML part of that is the design piece that - that Brendan mentioned - which is what features should we be generating and how do we generate them and how do we incorporate that into a model and build a kind of statistical technique that references these features and attributes?

[52:40] The data engineering piece of that makes it real. So how do we get all of the data together and manipulated in a way that represents these features that the model needs? And so the design piece, as Brendan mentioned, is like very ML forward. The data engineering piece is the implementation of a lot of that.
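To ground that split, here's a toy sketch of the implementation half - turning raw rows into the features a data scientist specified. The column names and feature definitions are made up for illustration:

```python
# Minimal sketch: turn raw transaction rows into per-customer features
# a model might learn from. The columns and the feature definitions
# themselves are hypothetical - the point is the raw-data-to-features step.
import pandas as pd

raw = pd.DataFrame(
    {
        "customer_id": [1, 1, 2, 2, 2],
        "amount": [20.0, 35.5, 5.0, 12.0, 7.5],
        "timestamp": pd.to_datetime(
            ["2023-04-01", "2023-04-10", "2023-04-02", "2023-04-05", "2023-04-15"]
        ),
    }
)

as_of = pd.Timestamp("2023-04-18")  # hypothetical scoring date

features = (
    raw.groupby("customer_id")
    .agg(
        txn_count=("amount", "count"),
        total_spend=("amount", "sum"),
        avg_spend=("amount", "mean"),
        days_since_last_txn=("timestamp", lambda s: (as_of - s.max()).days),
    )
    .reset_index()
)

print(features)
```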

So I can be a data engineer, I can get a set of requirements from a data scientist of what needs to be generated, and I can put all of that together. I did not create the design for that, though. And I think that's kind of the unique element to this. It's not just this logical layer - like I started my career as a data engineer.

The logic that I was working against was somewhat complicated - kind of multi-step logical instructions over data and, you know, pretty simple business rules, basically. And then, how do I create a pipeline that efficiently moves and manipulates data according to those business rules? Those business rules aren't so simple anymore. They're more complicated, and that kind of goes beyond the skill set of a typical data engineer.

So I agree in terms of the execution and implementation of it. The harder part is where those requirements came from, making them a reality, and then testing for it and ensuring that it stays that way over time. Because coming up with all the tests - so you can monitor and observe that it's behaving properly over time, and that it won't break if you get new data you didn't think existed before - that takes some more thought in design as well.
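As a rough sketch of the kind of checks being described here - catching data the original design never anticipated before it silently breaks things - a few hypothetical validation rules are shown below; in practice teams often reach for a dedicated data validation framework rather than hand-rolling this:

```python
# Minimal sketch: guard a feature pipeline against data it wasn't
# designed for. The expected schema, ranges, and categories are
# hypothetical assumptions.
import pandas as pd

EXPECTED_COLUMNS = {"customer_id", "avg_spend", "txn_count"}
KNOWN_SEGMENTS = {"retail", "small_business"}  # categories seen during design

def validate_features(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable problems; an empty list means the batch looks sane."""
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    if df["customer_id"].isna().any():
        problems.append("null customer_id values")
    if (df["txn_count"] < 0).any():
        problems.append("negative txn_count values")
    if "segment" in df.columns:
        unseen = set(df["segment"].dropna().unique()) - KNOWN_SEGMENTS
        if unseen:
            problems.append(f"segments never seen during design: {sorted(unseen)}")
    return problems

# Hypothetical usage: fail the pipeline run, or alert someone, if the list is non-empty.
```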

[54:18] Yeah, and I think that's the big epiphany I've had as I work on ML Ops stuff now: these models learn and evolve. So you could also think of ML Ops as data engineering on hard mode, because you need to design the system to be very dynamic. If the camera you're using changes for an image model - an image-based model - or if there's a drastic change in weather or something like that that impacts the data you're working with - that happens in data engineering too, you need to be responsive, and that's why data ops is such an important thing.

But ML Ops just adds in a whole other layer of complexity that you want to account for, because you are working with machine learning models that learn and evolve over time.
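One concrete version of that extra layer is drift detection - noticing when live data no longer looks like the data the model was trained on. A rough sketch follows, where the test choice and threshold are arbitrary assumptions a real team would tune:

```python
# Minimal sketch: a crude drift check comparing live feature values
# against a training-time baseline. The threshold is an arbitrary
# assumption, not a recommendation.
import numpy as np
from scipy.stats import ks_2samp

def drifted(baseline: np.ndarray, current: np.ndarray, p_threshold: float = 0.01) -> bool:
    """Flag drift when a two-sample KS test says the distributions differ."""
    stat, p_value = ks_2samp(baseline, current)
    return p_value < p_threshold

# Hypothetical usage: baseline captured at training time, current from live traffic.
rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)
current = rng.normal(loc=0.4, scale=1.0, size=5_000)  # simulated shift in the data
print(drifted(baseline, current))  # True for this simulated shift
```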

[54:55] And just thinking through the amount of quality you need to focus on in data engineering - which we're hearing from teams - and, as we mentioned before, the relationship with the business to do not only technical stewardship but also business stewardship around the quality of that data: is it not only accurate, is it not only complete, but when you talk to the business, does it actually make sense in the context of the past trends they've seen in their area too?

[55:17] That was good. And just a softball question - do we ever see teams transitioning from data engineering into data ops, and into ML engineering and ML Ops? Or are those very distinct career paths - ML engineering and ML Ops versus the data engineers that we see?

[55:31] I think I've seen a lot of data engineers do the ML engineering and ML Ops work, especially the more data architect-type folks that are higher up, I'd say, in the skill set and are familiar with a wide variety of tools. And we work with one guy who's really strong in data architecture, and he's worked a lot with ML, so he can work very fluently - very easily - across a traditional BI stack versus an ML/AI stack.

So I think we are seeing more and more like data engineers cross-skilling into the ML Ops and data ops skill sets, but it definitely does require a little bit more like experience working in these systems.

[55:58] Yeah, I think it's about knowing enough to be dangerous - enough to interface with it appropriately. I always harp on the fact that I'm not a data scientist, although I get misclassified as that often. I don't build models. I know enough about it to interface with it in the appropriate way from an architecture perspective, from a data engineering perspective, from a business perspective.

I could probably spend more of my time, you know, diving deep into some of the core elements of being able to build models, and it's getting easier because more tools are providing more features for individuals to build models. I just think it's this point of intersection that is really important. And, you know, if somebody knows enough, they can do their function around it and enable data scientists to actually get their work into production systems and provide value for the company.

[56:39] And I think there are a lot of those points of intersection. You know, there are product managers that know enough to be able to understand the system design. There are architects that know enough. Data engineers that know enough. Business folks that know enough. Business leaders, P&L owners, you know - there are all of these other roles that, if you know enough, you can interface with that type of work.

[56:57] I mean, back to the larger companies we brought up - you can have these little pockets of knowledge where these different roles know enough, but how do they know how to engage with each other, and how much the other person knows? Around product management, for example - if a data scientist has kind of a product management feather that they put in their hat, how do they know they can go speak to a product manager in that way and share the same terminology too?

[57:15] Yeah, I think that's why centralization is so important too, because otherwise you have ten people solving the same problem ten different ways, and you have to support ten different solutions to the same problem. So I think that is why if you ever need to justify some form of centralization or communities of practice or center of excellence - whatever that looks like in your organization - just remember how annoying it's going to be to undo that down the road because eventually you will need to have one standard way that you manage it. Otherwise, you can't keep up with the governance frameworks, you can't keep up with the quality frameworks - like you need to have that centralization to some degree in order to be successful.

[57:47] Yeah, you won't have a single tip of the spear. You'll have multiple, at different maturities, which can be messy too.

This is good, guys. I know we're coming up on time, so I want to see if you had any additional points on the ML Ops and the data engineering aspect of it before we go and wrap up for today. This is really good.

[58:04] Nope, just see a huge - huge wave of people trying to figure out this operational element inside of companies. I think we're seeing "let's figure out, you know, models that might work, let's get them into production systems providing value to the business." And I think the next wave we'll see is "let's create better use cases." I think we'll see enough where it's interesting, then we're gonna double down on getting better at designing systems and finding the right use cases and getting better, you know, better quality in place. Fill that hopper up and fix the quality too.

[58:33] Brendan, what excites you as you look ahead?

[58:35] Yeah, I think the thing that excites me the most is data engineers crossing the threshold into ML. Like, "ML Ops is data engineering on hard mode" is probably the most insightful thing I've said, because it's true. And I think if you're looking for a deeper challenge - which I know a lot of data engineers are - ML Ops is a great realm to dig into, because there are so many new and novel and interesting problems coming out of that space, and it's so much fun to work on. So that would be my closing thought: if you're a data engineer looking to step up to the next challenge, ML engineering and ML Ops are a great way to go.

[59:03] Well, it could be. All right, well, thank you guys. Listeners, this is AI or Die. We discussed the news, the trends, and the great debates of AI. You can listen or die by going to our website illiniAI.com, and you can subscribe or die on any of the streaming services out there. As always, we had a blast on this one - episode three in the books. More episodes are coming up, and this one will be posted soon. And thank you guys, have a good rest of your day. Thank you.