
data science and law

Data Science Dojo
Carol Elefant
| October 27

Is there a relation between data science and law? Here’s what a lawyer learned from a 50-hour data science Bootcamp at Data Science Dojo.

With an increased focus on the growing role of data science and data analytics in the future of law, I decided that it was high time to learn what all the fuss is about.

How do data science and law work together?

Initially, I considered taking a course on data analytics geared for lawyers, but surprisingly, I couldn’t find much beyond a couple of classes focused on e-discovery, where predictive coding is a hot topic. One new player in the legal data space, LexPredict, also offers a bunch of training for lawyers, but the company seemed geared towards big law and, in any event, didn’t list dates or prices for its classes.

Unable to find data engineering classes for lawyers, I decided to get at the subject from another angle: start with the data science tech and work my way back to the law. That approach gave me a plethora of options, from low-cost classes at Udemy and Coursera to 12-week bootcamps costing $10k or more.

However, because I didn’t have the luxury of giving up my day job, I knew that I’d need a compact course since any program that dragged out over weeks or months increased the chances that I’d drop out once my caseload and client emergencies presented a conflict.

Likewise, given that I’d have to take time out of my practice for a class that would cause some financial loss, I didn’t want to shell out several thousand dollars for a class.

Based on my criteria, Data Science Dojo’s data science bootcamp fit the bill: it’s a reasonably priced 5-day, 50-hour onsite program that didn’t have any prerequisites (though there were about 10 hours of pre-class prep).

The class covered broad ground: in one week I learned coding tools like basic R, MS Azure, Hadoop, and Hive, along with concepts like data mining and visualization, predictive modeling, ensemble methods like bagging and boosting, random forests, the importance of cross-validation, the difference between training and test data, A/B testing basics, building a recommendation system, and handling real-time and streaming data (we hacked a quick IoT solution using Azure tools, though truth be told, I was pretty much lost by then).
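For readers curious what two of those concepts look like in practice, here is a minimal sketch of a training/test split and k-fold cross-validation. It is written in plain Python rather than the R used in class, and it is not the bootcamp's material: the toy dataset, the threshold "model", and every function name below are hypothetical, chosen only to illustrate the idea.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as test data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_rows, k):
    """Yield (train_idx, validation_idx) pairs, one pair per fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def fit_threshold(rows):
    """Toy 'model': the cutoff on x that best separates the training labels."""
    candidates = sorted(x for x, _ in rows)
    return max(candidates,
               key=lambda t: sum((x >= t) == label for x, label in rows))

def accuracy(threshold, rows):
    """Share of rows whose label matches the rule x >= threshold."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)

def cross_validate(rows, k=5):
    """Fit on k-1 folds, score on the held-out fold, repeat k times."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(rows), k):
        model = fit_threshold([rows[i] for i in train_idx])
        scores.append(accuracy(model, [rows[i] for i in val_idx]))
    return scores

# Hypothetical toy data: the label is True whenever x is at least 10.
rows = [(x, x >= 10) for x in range(20)]
train, test = train_test_split(rows)

print("cross-validation scores:", cross_validate(train, k=5))
print("held-out test accuracy:", accuracy(fit_threshold(train), test))
```

The point of the split is that the test rows are never seen while fitting, so the final score is an honest estimate; cross-validation repeats that idea inside the training data so that no single lucky or unlucky split drives the result.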

Below are some of my takeaways on big data, especially as it relates to the legal profession and what it’s like for a lawyer to learn a new skill at an advanced age.

Lesson 1

The mechanics of building a predictive model aren’t particularly difficult; understanding what features to include and how to approach the problem is – and that’s where domain knowledge is important.

One of the underlying themes of the class is that data science (itself a buzzword) is a collection of skills in which intuition and domain knowledge matter as much as coding a predictive model. Yet oddly, when data science is discussed in the legal profession, we downplay the importance of legal expertise and its value in creating effective models.

Predictive models are iterative and constant questioning is a good thing.

Although most lawyers will argue a legal principle ad nauseam, when it comes to data, we’re surprisingly passive.  For the past two years, Clio has released a Trends Report that produced interesting, albeit counter-intuitive results. Yet the results are reported as is, with no questions as to the methodologies used, what the data means, or how it was gathered.  That’s not true data science: it’s groupthink.

Big legal data isn’t all that big

Our instructor shared with us the Five V’s (Volume, Velocity, Variety, Veracity, and Value), which are used to evaluate whether data rises to the level of big data. For volume, we’re talking about massive amounts of data (not terabytes, but exabytes and beyond) that are too large to be stored and processed on traditional machines.

For example, on Facebook, 10 billion messages are exchanged each day. It’s hard to imagine many sources of legal data that approach that volume. Our instructor’s point was that we shouldn’t make a data problem into a big data problem unless necessary. So, I wonder whether lawyers are using the term “big data” for small data or treating ordinary data problems as big data problems.

Kaggle competitions are way cool

I didn’t know much about Kaggle before my class. Although our involvement in Kaggle was limited to an in-class competition over who could build the most accurate model to predict survival on the Titanic, more broadly, Kaggle serves as a platform where companies can crowdsource the creation of data models.

Many of the contests attract large numbers of participants because the sponsors pony up substantial cash prizes as an incentive. Lawyers are often criticized for not crowd-sourcing or sharing information like other professions, but I’ve not seen a single platform that offers any financial reward to lawyers for creating content that might be used as the equivalent of case notes.

If any of the companies adding blog content to supplement caselaw (as Fastcase, in collaboration with Lexblog, is doing now) offered a thousand-dollar award every week for the best content, I think we’d see an explosion of high-quality crowd-sourced materials.

All practicing lawyers, not just millennials, need to understand new technology

Most of the conversation about the importance of learning about big data, AI, or other new tools comes in the context of advice as to what millennials need to learn. But I think it’s even more important for us mid-career and older lawyers to keep pace with the future if we want to have control over how the last decade or two of our careers plays out.

After 50 hours of bootcamp, I’ve had to catch up on client work, and I’m not sure how soon I’ll be able to apply all the fancy new tricks and knowledge that I’ve learned. For now, I’m satisfied that at least I’ve taken the first step. When will you do the same?
