Recently I’ve heard the question asked more and more often: when should we use Jupyter vs. Excel? My answer is always the same: we might as well start using Jupyter as much as possible, because it’s only a matter of time before it dominates Excel. I know, it’s hard enough to train people up on Excel. You’re going to have them learn Python? Really? Yes, really. My prediction: within the next 5–10 years Jupyter will overtake Excel. While there have already been some great posts comparing Excel and Jupyter use cases, here is why I think Jupyter will overtake Excel:
A Low Barrier to Entry
Let’s look at simple economics here: a Microsoft Office suite for an individual (which includes Excel) costs $149.99. How much does Jupyter cost? $0. It’s free to download with Anaconda. With GSuite catching up to Office, I can see a future where companies skip Office altogether (by the way, Google has its own version of Jupyter, Colaboratory, which is also, you guessed it, completely free).
There’s also the simple usability argument. When I first got started with Python, I was daunted by the idea of loading and accessing Python from the command line. Jupyter notebooks remove that hurdle: you write and run Python in the browser. While this seems trivial, I think it’s one of the greatest benefits of the Jupyter platform, and I can’t overstate it. Not only that, but Jupyter is significantly more powerful than Excel. Because a notebook runs ordinary Python, it offers not only a range of data processing and visualization options but also lets you send emails, query databases, and scrape the web. Which brings me to my next point:
Automation and Replicability
Anyone who’s ever worked in an operations role, particularly at a large company, understands that the most time-consuming, mentally taxing activities often involve performing the same tasks over and over again, on a daily, weekly, or monthly basis.
These processes can involve pulling reports, which often means pulling data from a source like a SQL database (or through some manual operation) and then assembling that data in Excel. Alternatively, these processes can involve taking data from one data source and uploading or sending it somewhere else. Sound familiar?
Currently most people do this soul-crushing work manually, not knowing that many of these tasks can be automated using Jupyter notebooks. Unlike in Excel, it’s extremely easy to replicate a task, whether it’s editing the structure of a file or sending it from point A to point B. Simply point the notebook at a new file to manipulate, run it, and your soul is suddenly un-crushed. Weekly reports that would take an hour to process in Excel each week suddenly take five minutes.
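To make "point the notebook at a new file and run it" concrete, here is a minimal sketch of what such a notebook cell might look like. The file names and the `region` and `sales` column names are hypothetical, and the function `summarize_report` is invented for illustration. It uses only Python’s standard library; in practice many people would reach for pandas instead.

```python
import csv
from collections import defaultdict


def summarize_report(input_path, output_path, group_col="region", value_col="sales"):
    """Total value_col per group_col and write the totals to a new CSV file.

    The column names are assumptions for this example; change them to
    match the headers in your own report.
    """
    totals = defaultdict(float)

    # Read the raw export row by row and accumulate a total per group.
    with open(input_path, newline="") as src:
        for row in csv.DictReader(src):
            totals[row[group_col]] += float(row[value_col])

    # Write one summary row per group, sorted by group name.
    with open(output_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow([group_col, f"total_{value_col}"])
        for key in sorted(totals):
            writer.writerow([key, totals[key]])

    return dict(totals)
```

Next week’s report is then a one-line change: call `summarize_report("week_02.csv", "week_02_summary.csv")` (hypothetical file names) and rerun the cell.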
As the quantity of data continues to expand, we’ll need to arm ourselves with the proper tools to manipulate and analyze it. While Excel filtering and formulas may work for smaller datasets, analysts are commonly dealing with datasets in the millions if not hundreds of millions of rows, and they need a tool that will not slow them down. Regardless of processing time, the psychological toll of Excel crashing in the middle of analysis is cause enough to promote Jupyter notebooks for this task.
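As a rough illustration of why row counts matter less in a notebook, the sketch below computes an average over a CSV file by streaming it one row at a time, so memory use stays flat no matter how long the file is. The function name `streaming_mean` and the column name passed to it are hypothetical. This version sticks to the standard library; pandas offers a similar pattern through the `chunksize` argument of `read_csv`.

```python
import csv


def streaming_mean(path, column):
    """Return the mean of a numeric column, reading the file row by row.

    Only a running total and a count are held in memory, never the
    whole file. Blank cells are skipped. Returns None if no values
    were found.
    """
    count = 0
    total = 0.0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            value = row[column]
            if value == "":
                continue  # ignore missing values
            total += float(value)
            count += 1
    return total / count if count else None
```

The same loop shape works for sums, counts, minimums, and maximums. Statistics that need all the data at once, such as a median, call for a different approach.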
While it may take employees a while to catch on, it’s only a matter of time before business owners and executives recognize the competitive advantage of Jupyter. The question won’t be “Jupyter vs. Excel?” It’ll be “why wouldn’t we use Jupyter?” Jupyter is a better tool for analysis because it clearly lays out the steps you took to manipulate the data, and it also allows for greater automation and the manipulation of larger datasets (which are only going to become more prevalent over time). For these reasons, Jupyter will overtake Excel, sooner rather than later.