I was first exposed to large data sets when I worked for a Supply Chain software firm. After my first month it became apparent that Excel alone was not good enough to handle the large data dumps from our clients’ systems. I mucked around with MS Access to clean the data into a form I could use, but after MS Access reached 2 Gig file sizes, the program crashed. I stubbornly went on trying to make MS Access work by chopping the data up into different MS Access files and I quickly got super frustrated. I had a conversation with a colleague one lunch break and he asked me why I was not using SQL? I told him that I was more comfortable in MS Access and that I could make it work. He nodded and replied “sure you can also stand on one leg if you like, but SQL is the right tool for large data sets”. He was absolutely correct.
So what does this Big Data buzzword mean? I think about Big Data as the result of collecting information at its most granular level. Depending on the industry, size and structure of your Big Data, there are multiple tools of ranging complexity from distributed computing such as Hadoop and Cloudera to more traditional database solutions. For most Business Analytics applications, a traditional database solution will tackle your Big Data analytics requirements. The common technology with all of the database solutions is SQL.
So what the heck is SQL anyway? SQL (pronounced ess-que-el) stands for Structured Query Language and it is how you talk to a database to access and manipulate data. Learning some super basic SQL goes a long way in being able to handle Big Data and allows you to interrogate transaction data that often gets dumped out of an enterprise system. At first SQL can be a little scary, but once you get a few fundamentals down, you will quickly turn into a Big Data Badass.
So where do you start with a SQL database and what tool should you use? When you first move into crunching big data, SQL Server Express is the tool you want. SQL Server Express,the free version of SQL server, gives you access to almost all the functionality of the full version with just a constraint of 10 Gig per database. I’ll take you through some quick hacks to get you going. First we will figure out how to install SQL Server Express and then we will go through how to uses the tool Management Studio. After you are all set up with SQL Server, check out Use SQL Like a Badass for the key SQL commands that you need for Business Analytics.
1. How do I get SQL Server Express?
The hardest part of getting started in any SQL platform, is installing the software. The first thing you will need to do is to get a copy of SQL Server Express. In this example, I’ll walk you through the basics of getting 2008 R2, SQL Server Express 2008 R2. Note that other versions can be found at Microsoft.com.
1.1 Work out if you have a 32 bit or 64 bit system
Chances are you have a 64 bit machine, but to check go to Control panel->System and Security->System.
Download the 64 bit ‘SQLEXPR_x64_ENU.exe’ or the 32 bit ”SQLEXPR_x86_ENU.exe’.
1.3 Set up SQL Server
There is a youtube video on the install process that is helpful by Jason Roth Install SQL Express 2008 Video. I’ll walk you through the most important steps when you install your instance of SQL Express.
The first step when you launch the software is to do a new installation.
Next, make sure you have all features selected. Management tools are important, because most of your time when using SQL for analytics you will be in tool Management Studio, the graphical interface for SQL.
Keep going until you get to instance configuration, go with the default.
Keep clicking next until you get to Database Engine Configuration. The easiest is to use Windows authentication for your login in.
Hit a few more next arrows and that should be it for the install. The last thing to do is to check that you have SQL Management studio. Go to programs and look for SQL Server Management Studio.
If all went well you should see something as per the above above. Make sure you can see SQL Server Management Studio. If you cannot see SQL Management studio, check whether you installed the management tools in the steps above. In the next blog, Getting Started with SQL Server Express – 2/2, we will go through some Management Studio hacks.