Robin Sutara on Responsible AI, Governance, Diversity, and People Behind Data

In this episode, we talk to Robin Sutara, Chief Data Strategy Officer at Databricks. From Apache helicopters in the U.S. Army to leading global data strategy, Robin brings a truly unique perspective. She shares sharp insights on enterprise AI adoption, regulation’s impact on innovation, and why responsible AI starts with people. Robin unpacks the real-world use of RAG, data platforms, and the crucial role of diversity and workforce enablement in the AI era.

About Speaker

From repairing Apache helicopters near the Korean DMZ to the corporate battlefield, Robin has demonstrated success in navigating the high-stress, and sometimes combative, complexities of data-led transformations. She has consulted with hundreds of organisations on data strategy, data culture, and building diverse data teams. Robin has had an eclectic career path across technical and business functions, with more than two decades in tech companies, including Microsoft and Databricks. Her academic accomplishments range from a juris doctorate to a master’s in law to engineering leadership. From her first technical role as an entry-level consumer support engineer to her current role in the C-suite, Robin has supported creating inclusive workplaces. She is currently Chair of Women in Data US, as well as an advisor for several early-stage startups. In 2023 she was recognized as one of the Top 20 Women in Data and Tech, in the DataIQ 100 Most Influential People in Data, and as a WomenTech Speaker of the Year Finalist.

Full Episode Transcript

Raja Iqbal:

Hello, everyone, and welcome to Future of Data and AI. I’m your host, Raja Iqbal. My guest today is Robin Sutara. Robin is the Chief Data Strategy Officer at Databricks. Robin has previously worked in key roles like Chief Data Officer for Microsoft UK and Chief Operating Officer of Azure Data Engineering. It is my pleasure to have Robin on the show.

Welcome to the show, Robin.

Robin Sutara:

Thank you, Raja. So glad to be here.

Raja Iqbal:

Yeah. So I was looking at your background. You started your career as a technician for Apache helicopters, and now you are in data, in a key role at Databricks. I would not say it is unusual, but it is interesting, right? From Apache helicopters to Databricks, and in a very important role.

So tell us about the journey.

Robin Sutara:

Yeah, I think maybe it isn’t unusual these days for people to come from differing backgrounds into data. I think I am a bit eclectic in that I didn’t start with a traditional undergraduate STEM background, with the caveat that I actually did start university studying computer engineering. At the time, I think, you know, random number generators and chipsets were really what people were focused on.

I always tease that I sort of coded. My first coding projects were in Ada and Fortran, and I don’t even know that people use those languages anymore. So I’ve always had this interest in technology; that’s sort of my background. Unfortunately, due to a lack of funding, I had to figure out an alternative route to enter the career field.

After two years of school, I ran out of funding for university, and I ended up enlisting in the US Army, where, based on testing, they place you into roles. I was fortunate to land, at the time, with the Apache AH-64 helicopters, doing the electrical and weapons systems, so I was still very much gravitating toward my passion for technology.

But how was I going to be able to turn this into a career? I was really just turning screwdrivers, right? Loading Hellfire missiles, loading the 50-millimeter machine guns. We were stationed in Korea on the DMZ for a while. So I really just focused on how I was going to serve my time in the military so that I could use my GI Bill and eventually go back to school for technology.

Interestingly, while I was stationed at Fort Campbell, Kentucky, Microsoft came out with this amazing product called Excel, which I’m sure everybody has at some point used as their database of choice. Because it was still relatively novel and new, we were trying to figure out, particularly at my duty station, how we could use things like Excel to track maintenance records and Apache helicopter parts.

And because it was a computer typing job, they thought it fell to the girl. I think now they all sort of regret making that decision. But it was wonderful for me to have the opportunity. That was my first footstep into data, where I said, oh, wow, we can really start to think about how we optimize our processes and how we make sure we have the right parts in the right place at the right time to do these repairs as efficiently as possible.

So I told everybody that when I got out of the Army, I was going to go work for Microsoft, and they really never thought that would happen. When I got out, I went to night school while I did computer hardware repair during the day. Then I was super fortunate to get an opportunity to interview at Microsoft to come in and do IE5 support on Windows 3.1. I got hired into Microsoft in the late 90s, and then I had a fabulous 20-plus-year career in various roles, starting in technology and moving back and forth between business and technical roles. As you mentioned, my last two roles there were really about helping Microsoft think about its internal transformation and how it was going to use data and AI to deliver on the transformational goals that Satya Nadella came in with for the company.

So I had the opportunity to be Chief Operating Officer for Azure Data Engineering, the group that owned everything from SQL Server on-prem up to, at the time, the point of visualization: all the databases, the warehouse, ingestion tools, governance, Purview, etc.

Those were very exciting times, as they were looking to do exponential growth. How could I help them drive the business to be more data-driven, as opposed to just being conversational in their decision-making? Could we really ground those decisions in data? Then, based on that role, I got asked to move to London, in the United Kingdom, and serve as the Chief Data Officer, where my role was half internal-facing: helping the organization be more data-driven, focused on data and AI and the capabilities the platform had to help the company operate internally and externally.

How could we represent that to the organization and get feedback into the product group on what customers were trying to do? And then, as you mentioned, for the last two and a half years I’ve been with Databricks, in this role and as the field CTO, which essentially means I get to travel the world and advise organizations on how to partner with Databricks as they think about using our data platform as the foundation for the data and AI transformations they’re looking to do internally.

How do they think about not just the technology, but the people, the process, the organizational design, operating models, etc.? Can I bring my 20-plus years of experience at Microsoft, and some best practices from the 12,000 customers at Databricks, to really help our customers be successful?

Raja Iqbal:

That is great. Thank you for the overview, Robin. As you were describing it, I was almost thinking of a tagline for your career: from Excel to Data Lakehouse, right? You started with Excel in your first job, and now you’re advising people on how to scale beyond those million rows.

Robin Sutara:

I always tell people I’ve gone full circle now: Apache to Apache, right? From the Apache helicopter to Apache Spark.

Raja Iqbal:

That’s very interesting. I love that. So you worked as the chief data officer for Microsoft UK, and now you’re working in the US, so you’ve been on both sides of the ocean, right? What similarities and differences do you see in terms of enterprise adoption of data in Europe versus the United States?

You know, I think regulations are the biggest difference.

Robin Sutara:

Yeah.

Raja Iqbal:

And maybe any other things that you see in terms of enterprise adoption of AI, or for that matter data, and any challenges that come up along the journey.

Robin Sutara:

I think that’s a great question. Because my experience is relatively limited to developed countries, as opposed to developing ones, I would probably say my point of view is maybe narrow in that aspect. If I think about the UK and many of the EU countries, as well as Canada, the US, and Mexico, which are the primary areas where I’ve worked, most of them are developed countries.

And so the problems that they’re facing are actually relatively similar. You brought up regulatory requirements, right? If I think about the AI Act and GDPR and all of those things, and I look at the US, we don’t have the equivalent of the EU AI Act, right? We do have state legislation that emulates similar regulatory requirements at this point.

Raja Iqbal:

CCPA? Are you referring to CCPA?

Robin Sutara:

Delaware has a version. I think even right now you’ll see there are multiple states, like California and Delaware, that tend to lead in this space. The states have created some level of AI regulation while waiting for federal legislation or regulation to come into place. And so while they are different countries, there are different cultural expectations and backgrounds in organizations, which I would say are the biggest differences.

British organizations are definitely different culturally from American organizations, which are different from the Canadian or Mexican organizations and companies that we work with. Even between Germany and the UK, right, there is a difference, I think, with the people when you think about a transformation.

But when I think about the technical requirements, for many of those countries they’re very similar. The regulatory requirements may be slightly different, but for the most part they’re all talking about explainability, transparency, lineage, and the impact it’s having on consumers. Can you articulate that all the way from the data sets that you use, to the models, to the algorithms within the models, the weightings, etc., to the data products and services? And under things like GDPR and CCPA, if the consumer wants to be forgotten, do you have the technical capabilities to deliver against that expectation, whether it’s a consumer, citizen, or patient expectation?

That tends to be very similar in developed countries, regardless of where you reside. But I do find that, for most organizations, the biggest differences they’re struggling with are the expectations of their employees and the expectations of their consumers for the data and AI products and services they’re delivering.

Let me give you an example. When I moved with Microsoft from the US to the UK, it was during the pandemic, and it was one week after Brexit. So I landed in London and immediately had to go into lockdown. There were very few products on the shelves at the time, because most of the lorries were being stopped at the border; they hadn’t figured out the whole EU-UK movement of the supply chain.

So it was fascinating to think: despite the fact that I now lived in the UK, where I had much broader protection as a consumer, how much personal or health information was I willing to give up to the NHS, or to Tesco as a retailer, to get things like my groceries delivered, because there was such a shortage of supply?

I think examples like that show us that, yes, there are regulatory requirements, but situational requirements may create an environment where you’re willing to rethink what information you’re willing to disclose and what you’re willing to let that information be used for. Now that we’ve come out of the pandemic, you’re seeing maybe a re-hardening of some of those GDPR requirements, particularly being enforced out of Europe, less so in the US.

As we continue to go through regulatory decision-making with the new presidential cabinet and the members that exist today, there is still a little bit of uncertainty for us.

So I think it’s always interesting to watch those global dynamics, what you’re willing to opt into or out of, and how that impacts an organization’s decisions on how it’s going to use data or AI to deliver.

Raja Iqbal:

In that sense, because... I mean, I have a technical glitch. Just give me one moment.

My apologies. Sincere apologies. Let me see. Can you see me now? Okay. Sounds good. So let’s continue the discussion here. There’s the regulatory environment. Don’t you think that differences in the regulatory environment can make a difference in how adoption happens? The EU tends to be very conservative when it comes to how to govern AI and data.

Do you think that imposes some barrier on how enterprises are going to use AI?

Robin Sutara:

For some organizations, it can slow the pace of innovation. But to be honest with you, over the last 18 months, since the generative AI hype started, most organizations are actually trying to figure out practical applications of how to leverage AI in some capacity, whether it’s internal-facing process optimization or employee empowerment, whatever they’re looking to deliver, even in the EU. Maybe they’re less likely to do a customer-facing application because of those risks or regulatory requirements, but I don’t think that means they’re slowing down the pace of innovation. I think they’re just rethinking the application of those AI capabilities.

So how do they do it more internally, where they can minimize their risks? To be honest with you, GDPR has been around for a significant period of time, and the EU AI Act was communicated relatively early. So for many organizations, until they see litigation around the EU AI Act and how it’s actually being enforced, I don’t see it actually slowing down their innovation.

I do think they are being cautious about making sure they have the right components, the right reporting, and the right capabilities in place, should they be questioned or should the regulator come back to ask about a specific AI application or implementation they’re running across their environment.

I think they are making sure that they’re taking that into account. But in the work I have done, I don’t actually see a difference between the US and the EU or the UK in their pace of innovation, or in their drive or desire to leverage AI to innovate just as quickly as the other side of the pond.

So I don’t see it being prohibitive for organizations to actually execute against it. In fact, most US organizations are slower than what I see in the UK or the EU, because there’s still a lot of uncertainty in the US about what the regulatory requirements are going to be.

Particularly in regulated industries like financial services, healthcare, and public sector government, where there are higher levels of scrutiny, there’s maybe an even slower pace of innovation for the customer-facing applications that we saw 18 months ago in the UK or the EU. But I don’t see a big difference between organizations in how they’re executing, what they’re executing, or their pace of innovation, to be honest.

Raja Iqbal:

That’s a very interesting viewpoint. Please correct me if I’m interpreting it incorrectly, but what I hear from you is that the absence of regulation is actually slowing things down rather than accelerating them. Sometimes we hear that the EU is creating a lot of regulations and that this is slowing down how innovation can happen, but what I hear from you is slightly different, right?

Because they are clear on what the regulations are, that’s actually allowing them to adopt, especially in more regulated industries like healthcare and finance.

Robin Sutara:

Yeah. In broad strokes, yes. There will always be differences between organizations in their appetite for risk versus innovation. I think every company is trying to decide what that balance is and what the right thing is for them to do: how much risk are they willing to take on?

Databricks has done some phenomenal work, actually. We have a field CISO who has put together an entire security framework that takes into account 68 attributes of AI that organizations should think about and come to agreement on across legal, compliance, the business, IT, etc., establishing the risk-versus-innovation balance they’re willing to take on in an effort to leverage and execute against their AI strategies. So as much as I’m making a broad statement, I would say every organization has to determine for itself how much risk it is willing to take on to allow for a pace of innovation, because it is a give and take, right?

But I do find that for most EU organizations, because they have those standards they’re going to be held accountable to, and the monetary implication of not complying could be relatively significant, that almost gives them a basis to build on.

Whereas in the US, we had an executive order, and that executive order is no longer in effect, so there’s a lot of uncertainty. For many organizations, unless they have state legislation or regulation they’re trying to comply with based on their business operations within that state’s boundaries, it is a lot more about how to balance innovation and the pace of innovation while minimizing their risk of exposure to whatever the legislation will be at the time it comes out.

But again, if you look at the legislation, regardless of whether it’s state legislation, the EU AI Act, or the UK version of it, for most of them the fundamentals are the same, right? There has to be traceability, explainability, transparency.

There are just fundamental things that are required. So those organizations that have a bit more risk appetite to innovate are still executing with those foundations in place to protect themselves as regulation and legislation start to get decided outside of the EU. So I don’t see either side of the pond completely slowing down on the pace of innovation.

If anything, I think the pace of innovation in the technology space is now creating an environment where organizations can leverage technology in better ways to execute at whatever pace of innovation they want.

Raja Iqbal:

So is it safe to say that, more than the regulatory environment, it is the industry, perhaps the culture of the company, the company size, and the risk appetite that dictate how quickly they adopt?

Robin Sutara:

Right. And probably the one other factor I would add would be the culture of the organization, like you said, the company itself, because I have worked with some organizations that have created some amazing innovation, but they can’t get the business users to actually use the data product, service, or AI that they’ve created within the organization.

For many of them, it’s because they forgot about the people, and they forgot about the change management required to bring the organization along on the evolution of the innovation they’re trying to execute. So, yeah, it’s a complex formula. I would love to just give it to you in broad terms, but I don’t see any particular country, region, or area of the world, at least in my travels and interactions, that is moving significantly slower than anybody else.

Raja Iqbal:

Okay, that’s great. Thanks for elaborating on this. When you look at Databricks, it is interesting that Databricks started as a machine learning company. And then, for a long time, people almost forgot that they started as a machine learning company, and they became, you know, the data platform.

Right? The data platform of choice, a significant player in that space. And now it’s going full circle, right? So tell us, how is Databricks adapting to this? Because they really are seen as a data platform at the moment; at least that’s how I look at it.

Maybe there are others who look at it as an ML company, but for a long time they have been the data platform.

Robin Sutara:

Yeah. When I walk into an organization, I can always tell how long they’ve been with us as a customer, depending on which team is using the Databricks platform. As you mentioned, the company was founded 11 years ago, five PhDs out of UC Berkeley, the creators of Spark. So really, they were trying to solve the big data and ML issues that organizations were struggling with.

I think early on, they realized that it wasn’t just unstructured data in the lake they needed to be concerned with; it was also the structured data. So they went from "how do we solve this big data ML problem" to "how do we help customers have a data platform that allows them to bridge the gap between their structured and unstructured data?"

And that was the creation of the Lakehouse, right, eight years ago, and we’ve been able to evolve since then so that we are the data platform of choice for many, many organizations. So it’s been interesting: if you walk into a company where it’s primarily used by data science, they’ve probably been with us since the beginning, the big data days. If it’s the data engineering team that’s the strongest group using the Databricks platform, they tend to be Lakehouse-era users. And today, if I think about it, we have actually thought about how we continue to evolve the platform, leveraging the technical capabilities that exist today that didn’t exist when the company started 11 years ago.

A lot of that has been about how we apply AI, generative AI, agentic systems, etc. within the platform. So there are two problems we’re looking to solve. We did the acquisition of MosaicML about two years ago, when that was announced, and we really had to rethink what the data platform of the future looks like.

It’s no longer this separation between AI and BI, engineering versus data science, etc. While we had broken down some of those barriers with the lakehouse, what we’re thinking about now is a data intelligence platform. And there are two factors to that, I think. One is how do we help companies with the AI that they’re looking to build?

Right. And how do we make sure that they can do that on their structured and unstructured data? So it builds on the Lakehouse foundations. But how do we think about the pace of technology that’s happening in AI now? The top model of today isn’t necessarily going to be the top model of tomorrow. So for many organizations the question was: can we create a platform that allows you to do data science in a way that supports things like MLOps or LLMOps, or swapping models in and out, depending on which has the best return on your investment for the solution you’re trying to build? Or how do we do things like compound or agentic systems? How can the platform support that natively, building it in? Because, again, the intent at Databricks has always been: how do we help companies minimize the number of data copies they have to create? How do we help them break down those silos? It doesn’t do any organization any good if they constantly have to copy data in and out to create a new model, a new AI solution, or a new data product.

So I think, foundationally, we have thought about how we build on the lakehouse and help companies with their own AI goals, objectives, and missions, and how we also leverage AI within our platform. When I started with the company two and a half years ago, it was very much about how we make Databricks simple.

How do we make sure it stays an open platform, so that we’re not locking organizational data into our platform, and they can move it as they need or use it with other tools and solutions built on open systems? How do we make sure we’re doing that cost-effectively?

Now, it’s a lot about how we make the platform smarter. How do we actually leverage AI inside the platform? How do we disrupt ourselves? Our CEO essentially stood up, you know, a year and a half or two years ago, when ChatGPT and large language models were really at the precipice, and said: if we had to recreate Databricks, how would we disrupt ourselves?

How would we do things differently? How would we rethink how we actually built some of these products? It’s been a phenomenal pace of innovation for Databricks over the last 18 months, really thinking about how we would have done things differently, right? How do we make sure that we’re creating a platform that bridges the gap between your structured data and your unstructured data, with some level of governance and control, so that you have full end-to-end lineage, explainability, policy enforcement, etc., across not just structured data but also your AI assets: your notebooks, your models, etc.? For us, it’s really been about how we disrupt in that space, and how we leverage AI in the platform to do things like understand business semantics. For example, Databricks employees are called Databricksters.

So if I type "Databrickster" into a natural language interface, I’m not looking for, you know, a Lego. I’m looking for a Databricks employee. So how do we leverage the platform to understand that? What does revenue mean? What is our fiscal year? What is a quarter?

How do we define, you know, EMEA or the Americas, etc.? The other thing is, how do you actually leverage the platform to make it more cost-effective? How do we make sure that we’re helping organizations? One of the biggest issues continues to be just infrastructure management, right: turning clusters or servers on and off, setting up workspaces, optimizing or prioritizing jobs, optimizing your queries, etc.

So how do we leverage AI in the platform to automate some of those things, so that organizations can focus their talent, the limited talent that exists, right? How do they make sure they’re focusing it on the right thing, which is solving problems for the business, not managing infrastructure? I think you’ll continue to see us innovate and disrupt in that space.

How do we help companies create AI, and then how do we use AI in the platform so that it’s intelligent about the organization, about, sort of the things that matter?

Raja Iqbal:

Yeah. So let me expand on this example, Databrickster. There can be company-specific jargon, company-specific terminology, and for that matter a company’s intellectual property, its proprietary data, sitting inside the platform. And hopefully OpenAI and other models don’t have access to it.

So that knowledge is not built into the models right now, whether they are closed source or open source. And that’s why RAG, or as we call it, retrieval-augmented generation, and those kinds of approaches exist. You fine-tune models, and sometimes you build domain-specific models, very specific to your company: your own custom models.

So does Databricks... and for that matter, even before I go to Databricks: for businesses, do you see an appetite for RAG being the approach of choice? Are enterprises fine-tuning their models, or are they building their own custom models?

What do you actually see? How are they dealing with their own proprietary data when it comes to building applications?

Robin Sutara:

Yeah. So maybe just one correction. Based on an organization’s usage of the platform, and leveraging the metadata that goes into Unity Catalog, we do know those things. Granted, it’s just within that organization, in that workspace. But because of that, we are able to extrapolate what we think a table does. Can we start to use AI to do things like row and column tagging?

Right. But again, all of that, like you said, is an organization’s intellectual property, so it exists within their instance of Unity Catalog, within their workspace, so that we can leverage the metadata in that way. The platform is then able to expose that to them in a way that allows them to actually apply it. That is an example, though, of what we essentially call compound systems, right?

I do think that, right at the very beginning, everybody thought, oh, we’re going to leverage these proprietary models and just do prompt engineering, so how do we make everybody a prompt engineer? And then they realized, oh, wait, it’s really expensive to leverage something like OpenAI for every use case, and those models don’t necessarily have the domain knowledge of our organization.

And we don’t want to give access to our intellectual property, which potentially feeds back into the model, right? I think the example of Samsung is maybe one of the most famous, where employees who were really trying to do the right thing for the company put trade secrets into ChatGPT and essentially exposed them outside of their organization.

And so I think there’s still, again, that tension between risk and innovation. I still think there is a level of risk aversion to leveraging these big open source models, even if you’re doing a RAG implementation against them. For many organizations, there’s still this question of how much control do we have, and how much are we exposing things that truly are our intellectual property, that are essentially the foundation of our company, why we exist, and what makes us unique or valuable compared to our competitors.

And so for most organizations, it is some level of compound AI system that they’re creating. They’re leveraging OpenAI or other proprietary models to solve some part of the problem, and, you’re right, they’re creating their own models, doing net-new model creation via open source capabilities or even things that they’re building in-house.

Can they execute a piece of it with a small language model, or leverage RAG against a piece of it? Then the question becomes, how do you leverage the platform to tie all those pieces together so that you’re actually able to deliver some value proposition? Insurance is a great example: how do you actually process claims?

It sounds very simple, but it’s really not, right? It’s made up of several smaller problems they’re trying to solve. Can you do some level of document extraction, and can you use OCR to read handwritten notes from the insurance adjuster in the field?

How do you tie that together with some level of computer vision over unstructured data, like the pictures that get uploaded? How do you actually validate that those pictures are not deepfakes and that they’re legitimate? I think we saw a rise of insurance companies dealing with false claims using AI-generated pictures of traffic accidents that they were paying out on.

So I think that’s a great example: insurance companies have started thinking about how to detect fraud and solve for those cases. And it’s a compound system. It’s leveraging OpenAI for the right component. They’re creating some capability internally, leveraging RAG on top of that, and then pulling those pieces together into an end-to-end compound system that uses AI capabilities to automate some of it.

And then there’s a human in the loop at the end to validate, etc. But it’s literally saving claims adjusters hundreds of thousands of hours in processing that volume of information.

Raja Iqbal:

Yeah. And I’ve heard agentic AI and multi-agent systems a few times in the conversation so far. Is Databricks more about bringing in the existing ecosystem? You have frameworks like LangChain and LlamaIndex, the most notable ones.

And there are others too. Do you have your own frameworks for multi-agent collaboration, or does the platform have it built in? I’m just curious, because I work on the technical side of it; we have a bootcamp as well, right? We teach people. How is Databricks approaching it?

Robin Sutara:

Yeah. So we have frameworks built into the platform to deliver some of that. But again, like I mentioned, it’s an open system. So if you want to bring LangChain, you can. I think there are lots of organizations that have capabilities they’ve already built or delivered.

And so our intent has always been for the platform to support an open ecosystem, so that organizations can bring in the capabilities they need and we can support them. But we are thinking about how we help organizations automate some of that. How much can we create as a result? There’ll be some big, exciting announcements at our summit in June of this year in San Francisco.

If you can’t attend in person, I’d highly recommend that you join us virtually, because I think this will be a big moment for announcements and innovation, for what the teams have been working on over the past year.

Raja Iqbal:

And in terms of open source versus closed source: you can use OpenAI, or you can use Llama or any of the open source models. Where do you see the future? Do you think it is going to be open source, closed source, or something else?

Robin Sutara:

I think that ties back to the regulatory and legislative requirements that we’re going to see. I do think there will be some proprietary models that solve very niche problems as part of a broader compound or agentic ecosystem. But lots of organizations are starting to think: if I have to explain to the regulators what this model is, what weightings were used, what data went into it, etc., how much of that can I make open so that I have more control, more visibility, and more explainability?

But there are definitely proprietary models that deliver efficiently and effectively. So, provided that the vendors are super clear in helping organizations explain to the regulators specifically what part of the compound system that proprietary model is solving for, I think we’ll continue to see a combination of both. And I think that’s the power of something like the Databricks platform.

Can you tie it all into a compound system? Can you use components of a proprietary model and an open model to solve for the business problem or output that you’re looking for?

Raja Iqbal:

Okay. And when you see enterprises on their adoption journey, what do you think is a common myth, or where do they commonly get it wrong? And what is the biggest barrier? In general, what holds them back from becoming an AI-native company, in the sense of adopting AI for their business processes?

Robin Sutara:

In my observation, for almost every organization it turns into a people and process issue. Either you don’t have the right data talent, educated and enabled on the capabilities of the technology or the platform, or you haven’t thought beyond enabling your data personas to the business users who actually have to leverage the systems or tools.

We’ve talked about digital transformation or data transformation for decades now, but I think this is almost a new era where people are super afraid of what they don’t understand. And so for many organizations, the biggest hindrance is: how do we not just migrate legacy technical debt, process debt, or business operational debt?

We don’t want to bring that technical debt into a new format. For example, when cloud first came out, I remember lots of organizations doing almost a lift and shift from on-prem to the cloud. The problem is they never thought about modernization. Could you optimize the way that warehouse was constructed to take advantage of the cloud, rather than just bringing the current construct from on-prem into a different infrastructure?

I think we’re seeing the same thing with AI now. The business process works in steps A, B, and C, so they’re not thinking about how to change the business process to be X, Y, and Z, and really rethinking internal processes to take advantage of the technical capabilities. For many organizations that’s hindering their ability to innovate as fast as they want, because all they’re doing is moving their business, process, or enablement debt from one format or technology to another.

And so we really have to think about how we bring an organization along on that journey: what could you do if you weren’t doing the manual task of rationalizing 100 Excel spreadsheets every week to report on revenue? What are the things that you never have time to do?

And how do we use technology to get rid of some of that work, when people are really afraid: Will I have a job at the end of it? How much is going to be automated? What part of me is AI going to displace or replace? It’s no different than the Industrial Revolution, where machines started to take over manual processes that people were doing.

We have to take people along on that journey and say, how are we going to enable you? Because you have the domain knowledge, the understanding of the processes, the understanding of the business, the understanding of our customers or patients or clients, whatever that might be. And so I think for most organizations, it is about unlocking the domain expertise of the organization and making sure that the technology is an enabler, not something to be feared by the organization.

Raja Iqbal:

Yeah, that’s great. Robin, you mentioned jobs, and I’ll use that as a segue to talk about society. At the end of the day, we are humans, right? We live in society. Our jobs, our physical health, our mental health, how we work, where we work, all of that is also important to us.

So where do you see things going? No one knows exactly what is going to happen, but how do you see AI adoption shaping the future of the workforce?

Robin Sutara:

Yeah. So, like I said, for many organizations that I work with, it is actually about improving productivity. But I think for many organizations it comes back to culture: how do you make sure that the organization understands the value proposition of that productivity increase? It’s always amusing when people say, oh, we saved 15 minutes for each of our 20,000 employees, and I say, okay, so how did you translate that into something else?

What were they able to do as a result of saving those 15 minutes a day? Did you say, okay, now there’s the opportunity for you to deliver thought leadership, or the next project that you haven’t had time to do? I think lots of organizations were missing that step of helping the organization understand, because otherwise people just see pieces of their current role or functions slowly slipping away as we start to automate or leverage AI.

And so I think the organizations that are doing it really well are taking the entire company and thinking about enablement. You’re right, absolutely. Organizations like yours are helping build up data science capabilities and the next generation of data scientists.

But how are we thinking about the business user in the finance department who’s been doing the same role or function for the last 25 years? That domain knowledge is invaluable, and so is the ability to translate it. So how do we sit with them, understand their pain points, and show them the value? I think sometimes we miss that opportunity.

And so the organizations that are going to innovate and truly transform themselves at the pace they want will have to think about every persona across the organization, and how to create a path that allows each of them to go on that transformation journey.

Raja Iqbal:

Yeah. That’s a very interesting point. We work with companies as well, and one of the things we have seen internally is reluctance, because some of the workers think they are going to be replaced. Do you see that? Have you heard of this being a barrier to adoption, where workers intentionally do not want to adopt?

Robin Sutara:

Yeah, it happens at almost every organization. You’re going to have some persona that just can’t see the future. They can’t see the art of the possible. They can’t see what their job would look like. Maybe that is because a majority of their current function could be automated, or we could leverage AI for such a capability.

And so for those people, I really think about pivoting. For example, I think data is probably the best space in the world for us to bring diverse perspectives and diverse points of view. That requires us to think about how we enable somebody who has domain expertise or industry knowledge.

How do we give them the tools and capabilities to do things differently, so that they can rethink their job, or what it would look like, based on the knowledge they have about the industry or the process? And I would love to say that that’s an instantaneous thing.

But if you think about it, I worked on digital transformation at Microsoft for 15 years, and when I left they were still transforming. They’re still transforming today. And so for these organizations, I think it’s going to be an ongoing thing. It’s all about the quick wins you’re able to execute.

But it’s also about the people across your organization, particularly those who are worried about what their job of tomorrow looks like. Think now about the enablement and the training you can give them. Where people have some technical exuberance or a desire to learn new capabilities,

how do you grow and foster that today? And for those who don’t necessarily have the same technical background or aptitude, what does the future look like for them, and how do you recreate their function? It’s a long process to take them on that journey. So make sure that you’re investing in the employees where it makes sense to take them along on that journey with you.

Raja Iqbal:

Yeah. And as humans, we know what these tools and technologies are capable of, and we also know their limitations. Speaking of large language models, they are built on data sets that are inherently biased, right? These companies gathered data that has been generated by humans.

And humans inherently have our own biases. So, as a human, does this worry you? Or are you optimistic that there is eventually going to be some self-correction? That’s the first part of my question: bias, and in general, the overreliance of humans on these tools.

Does this worry you? I hear all sorts of opinions. Some people are worried. Some people say, no, I’m an optimist when it comes to technology. I would love to hear from Robin Sutara, not the field CTO at Databricks, but the human.

I mean, how do you feel about this?

Robin Sutara:

Yeah. For anyone who has never read the book Invisible Women by Caroline Criado Perez, I think it’s a fascinating read on the impact that biased data can have on everything from city planning to job definition to how seatbelts in cars are designed. Those things are designed based on data that essentially represents the average man, who is 5'8" and 160 pounds.

If you look at society, there are very few average men, right? There are very few men who are exactly 5'8" and 160 pounds. So it was such a fascinating book to read, because it says, hey, we are leaving out a majority of society if we limit ourselves to the data sets that we have.

And that’s why I think there has to be some level of intentionality across the data teams and the data product teams being developed: how do you make sure that you have diverse representation on that team? Because you’re right, we all have inherent biases. If I only create a team of all women, or all veterans, or only Americans, I think I will be inherently biased in the products or services, whether data or AI, that I’m creating as a company to deliver to society.

And so how do we make sure that we are creating organizational teams and structures that give us representation? Because there’s going to be somebody in that room who says, hey, wait a minute, if we use that, we’re missing out on this perspective from Bangalore, or from somebody who didn’t graduate from university, whether the difference is socioeconomic or cultural or whatever it might be.

And so I really think when we talk about human in the loop, it has to be a diverse team of humans. How do we set up our data teams, our organizational structure, and our enablement plans to make sure that we have diverse representation across all of those? Because data in and of itself is inherently biased, right?

It was created with biases built in. And unless you have somebody on the team who helps you recognize where those biases might exist, you might do something like plan cities that don’t take into account people who don’t own cars. Now you’re creating a socioeconomic bias against those who have to walk to work or take public transport.

Or take snow. There’s an example in that book about clearing snow, where they only cleared the roads and not the sidewalks. So now you’re essentially creating, based on data, algorithms or AI models that prioritize snow clearance in a way that puts anybody who takes public transportation, who can’t afford to drive themselves to work, at a disadvantage, because they’re unable to get to work as a result of the way you prioritized snow clearing. Things like that, you just don’t think about, right?

If I had been a city planner, of course I would have prioritized getting the roads cleared over the sidewalks to get people to work, etc. And so it’s just really fascinating. If we understand those biases in the first place, and somebody can point them out, we can start being mindful and planful and take those things into account for the products that we’re going to deliver.

You see it in healthcare. I think there are some great examples in there. I’ve done some work with Women in Data around women’s safety, and how we leverage data to help in the women’s safety arena, etc. And so I just think there’s so much opportunity for us as a society. I think data is phenomenal.

We can solve some amazing problems with data and AI, but it does require us to be deliberate about creating diverse systems, meaning not just the data being diverse, but the people being diverse that are working on the data products and services to better make sure that we can deliver.

Raja Iqbal:

Yeah. That reminds me of something on this diversity aspect. Mitigating bias can be a tough problem. I’m not sure if you remember one of the recent Google Gemini releases. They were trying to mitigate bias, right? So: show me a roomful of CEOs, and all white men show up in a meeting room.

Show me the Founding Fathers. The AI can sometimes overcorrect things, and now you’re showing a mix of all colors and races. We are trying to mitigate bias for the right reasons, but in some cases there is historical, factual correctness that you have to worry about, right? So it can actually be very tough.

Because, inherently, models learn from the bias in the data. Fundamentally, they learn from what we call signal, and technically that is bias, right? So it’s a fascinating problem, and it’s actually a very complex problem to solve.

Robin Sutara:

But if you think about it, if the Gemini team had had diverse enough teams working on that release, would they have caught that overcorrection before it went public, before all of a sudden they were creating these, you know, incremental racial backgrounds for the Founding Fathers?

It’s definitely a balance. There is a huge amount of risk that we won’t be able to mitigate all bias, but it does take work. This is why I don’t think we will ever get to the point of not having some level of human intervention in these things. It requires people to be part of the system.

And this is why I think, for organizations, it’s key to make sure you’re enabling the people who have the domain expertise of your business and your processes. If you think you’re going to displace them, you are essentially introducing incremental bias into your company and into the AI solutions you’re creating, because you’re getting rid of the people with the domain expertise to say, oh, wait a minute, that’s not right.

Or, that’s not how we would expect that result to be, etc. And so whether it’s organizational or societal, we have to think about the feedback loop: how do we make sure that we’re allowing people to give feedback, so that we can constantly work on optimizing and improving the system? It’s always interesting: my twenty-something kid says, well, ChatGPT said this happened in the ’80s.

And I was like, baby, I was there in the ’80s. I can tell you for sure that did not happen that way. So, like you said, there are people who have complete and absolute confidence in the results they’re getting out of the system. We have to make sure that we’re creating structures that allow people to say, that’s not right.

I bring my knowledge, my expertise, my insight. And I want to have a team and an organization that can bring other points of view, because I only have a limited, narrow view based on my life experiences, my upbringing, my education, etc. So how do we make sure that we’re creating organizations, teams, and structures that support the people part of it?

Raja Iqbal:

And most probably having the right guardrails on these systems, right? I think that’s very important. We’re coming to a close, so let’s wrap up quickly. In the next phase, I will ask you some rapid-fire questions, and you give short answers.

Elaborate if you like, but short answers would be fun, right? So, if resources were limited, what would you address first in AI: bias mitigation, or improving model correctness or performance?

Robin Sutara:

Bias mitigation. Absolutely.

Raja Iqbal:

Okay. Different industries are going to be disrupted by AI in different ways. Which industries and use cases do you think will be disrupted as a result of the AI solutions we are seeing?

Robin Sutara:

I think ultimately every industry will be disrupted at some point. Right now, I think it’s professional services: anything that depends on aggregating knowledge into a strategy, capability, etc. So professional services immediately, and at some point all industries are going to be disrupted.

Raja Iqbal:

So, without naming names, no more of those consulting and services companies, is that what you’re telling me?

Robin Sutara:

I think they’ll still be around, but they’ll have to innovate new business models.

Raja Iqbal:

So no more business built on PowerPoints alone.

Robin Sutara:

Exactly.

Raja Iqbal:

Yeah. In terms of jobs, will AI result in job elimination, job creation, or job displacement?

Robin Sutara:

Job evolution.

Raja Iqbal:

Okay. Can you elaborate? I know I asked you for short answers, but when you say job evolution, what do you mean?

Robin Sutara:

Well, I think for most jobs or roles, there is the opportunity to think about how the role evolves or changes, as opposed to being completely replaced or displaced by AI. Employees bring domain knowledge and understanding of their current role, and there’s value there, particularly as we think about bias mitigation, etc.

And so how do we make sure that we’re helping them evolve their roles and functions, as opposed to displacing or replacing them with AI?

Raja Iqbal:

Yeah. And what is the biggest challenge to enterprise adoption of AI? Is it technology? Is it skill gap? Is it regulation? Is it culture? Something else?

Robin Sutara:

I think it’s still primarily culture, right? It’s helping people see the power and the value, and then taking them along on that change journey.

Raja Iqbal:

Okay. In terms of, open source versus closed source models, which of the these are going to be which one is going to be the winner? In enterprise, I.

Robin Sutara:

I think most organizations are going to use a combination of both, but I do see open models becoming more prevalent than closed models because of regulatory and legislative requirements, explainability, and transparency.

Raja Iqbal:

Is it time for me to sell my stock in closed-source companies like OpenAI? I don’t think it’s traded, right? So I should not.

Robin Sutara:

No, they’ll continue to solve amazingly big, complex problems; those will definitely still need proprietary models. But for most organizations, you don’t need that kind of power to solve every business problem. And so I think we’ll start to see more and more open sharing of models and capabilities across the ecosystem.

Raja Iqbal:

And perhaps an extension of the same question, large language models or domain specific small language models. Which one do you see being used?

Robin Sutara:

Again, it’s going to be a combination of both, depending on the use case. We are seeing more and more domain-specific models being created right now, because large language models have been around much, much longer, and organizations have figured out the limitations on what business problems can and can’t be solved with them. And so we’re definitely seeing an uptick in domain-specific models.

Raja Iqbal:

Okay. And my last one here: if you had to mention one book, paper, thought leader, or talk that I should go and watch to understand how this revolution is going to unfold, what would it be?

Robin Sutara:

I think there are so many phenomenal books being written, because the space is moving so quickly. For me, I enjoy reading things like CDO Magazine to get the top issues facing executives today. I enjoy things like The Data Chief podcast, and Databricks has some phenomenal blogs that continue to come out, not just about the technology and the platform, but about how organizations are leveraging those platforms and capabilities and the real business value they’re able to drive as a result.

And for many organizations, the problem that you’re trying to solve is probably not unique to you. So, looking across industries, how do you take something from retail or supply chain and apply it to a pharma company to help solve its supply issues? I think we’re going to see a lot more knowledge sharing of these best practices so that we can continue to push the pace of innovation.

Raja Iqbal:

Okay. And my last question, which is not a rapid-fire question, just a closing thought: what are you excited about, as a human and as a technologist, when you look at everything that is happening around us?

Robin Sutara:

I think what I’m most excited about is that access to technology is no longer limited to technologists. As I mentioned, my last coding language was Fortran, right? So it’s not super helpful these days. I can get by in SQL. But what I love is that even my parents and grandparents have access to information that, you know, typically...

Raja Iqbal:

And they can write Python now. Even grandparents can actually write Python code.

Robin Sutara:

So if I think about the power now that that represents and sort of the impact, I think that we can have on society, I think I’m super excited to see what that uncovers for us.

Raja Iqbal:

Well, thank you so much, Robin, for your time. It was a pleasure having you.

Robin Sutara:

Thank you so much. I really appreciate it.
