When things scale faster than you expect, chaos follows.
A couple of months ago, I thought it would be fun to run a “tiny” data science experiment. The plan? Scrape hundreds of product reviews, run sentiment analysis, and visualize the results in a tidy dashboard.
It began innocently enough. Yet three hours later, my laptop fans were shrieking like a Boeing 747. Chrome froze. Spotify stuttered. And Jupyter Notebook looked like it was running on dial-up internet.
What went wrong? Well, I underestimated just how quickly bad code scales into disaster.
Here’s the story, and the three lessons I’ll never forget.
Mistake 1: Pandas Everywhere
My first step was loading all reviews (2.3M rows) straight into Pandas. Because, naturally, that’s what every tutorial does.
import pandas as pd
df = pd.read_csv("reviews.csv")
sentiments = df["review"].apply(analyze_sentiment)
The problem? Pandas doesn’t care that your laptop has 8 GB of RAM. It will happily eat everything, and then some.
Within minutes, I was staring at a frozen screen. Lesson learned: Pandas is powerful, but it’s not built for raw scale on consumer …