Every now and then I find myself in the situation where I need to analyze data coming varying sources, be it in the form of files (spreadsheets, CSV, etc.) or data spread across different servers and/or database platforms. The way that I often see people handle this is by loading the data into a spreadsheet (using Excel, LibreOffice or an alternative) and then start filtering and fidgeting away with spreadsheet formulas and functions, creating temporary columns and/or copying and pasting things around until they get their desired result. When you only have to examine a single table of data this is not much of an issue, it may in fact be the best option to start with: its low-barrier and using column filters is pretty intuitive in most spreadsheet software that I’ve used.
When your searching and filtering needs become more complicated and/or involve multiple tables of data that you want to combine in one way or another things tend to get a bit hairy (at least for me) and I personally prefer using SQL instead. In this article I will give an in-depth explanation on how you can do that using a contrived example, if you are experienced with SQL, command line tools and the shell you can probably skim large parts of this article, if most of this is new to you it hopefully contains enough detail for you to follow along.
Continue reading “Using SQL as an every day tool”