Suzanne Eléonore June 7, 2021 Spreadsheet
As a general rule, data is most useful when text fields hold only names and meaningful, validated codes, categories and classifications. Notes and other free-form text should be isolated in a dedicated notes field, separate from numeric data. Numeric fields should hold only numeric values (numbers, dates, percentages) in the correct unit or magnitude, with no text prefixes, suffixes, spaces or embedded notes. You must also take care that numeric data is not stored as text, and that it is consistently formatted so that it can be used in calculations, comparisons and queries. Finally, addresses should be split into multiple fields such as street address, town/suburb, state/province, postal code and country to allow for geographic analysis and mail-outs if required. Fixing up a data set to meet these criteria is called data scrubbing, cleansing or massaging. This process can be very time-consuming, even for an experienced Microsoft Excel user, database engineer, business analyst or computer programmer.
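As a rough illustration of the "numeric data stored as text" problem, the following Python sketch converts a text cell into a number. The function name `parse_number` and the cleaning rules (strip everything except digits, decimal point and minus sign, and treat a trailing % as a percentage) are assumptions made for this example, not a standard recipe; real data will usually need rules tailored to its own quirks.

```python
import re
from typing import Optional


def parse_number(raw: str) -> Optional[float]:
    """Convert a text cell such as '$1,200.50' or ' 45 %' to a float.

    Returns None when no single clean number can be recovered, so the
    cell can be flagged for manual review instead of being guessed at.
    """
    text = raw.strip()
    is_percent = text.endswith("%")
    # Drop currency symbols, thousands separators, spaces and any other
    # character that is not a digit, a decimal point or a minus sign.
    cleaned = re.sub(r"[^0-9.\-]", "", text)
    try:
        value = float(cleaned)
    except ValueError:
        # Nothing numeric left (e.g. 'n/a') or an ambiguous value (e.g. '12-14').
        return None
    # Store percentages as fractions so they can be used in calculations.
    return value / 100 if is_percent else value


print(parse_number("$1,200.50"))
print(parse_number(" 45 %"))
print(parse_number("n/a"))
```

Returning None rather than zero keeps bad cells visible, which matters because a silent zero would distort any later totals or averages.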
Whilst Excel cannot clean or structure all of your data for you, it does come with useful functionality for manipulating and analysing clean, structured data sets. This built-in functionality includes pivot tables, sorting and filtering. Filtering alone is a powerful tool and can help to quickly isolate data based on specified criteria. But what happens if your data is clean but not very structured (a common problem)? For instance, what if you, a client or your team is using colours, fonts or some other kind of formatting to classify data in an Excel spreadsheet? In short, you won't be able to filter the data, because Excel's built-in filtering logic requires rules based on numbers, dates and text only. It will not filter based on formats. In addition, Excel filtering only applies down rows; it will not filter across columns.
Spreadsheets such as Microsoft Excel are well suited to tasks involving the manipulation of small amounts of related data. Working out a budget, producing visual reports, organizing lists and performing calculations that involve many variables are all tasks well suited to a spreadsheet. There are some data-related tasks, however, that spreadsheets are not suited to. Tasks involving the processing and combination of large sets of data, for example, are generally a poor fit. There is another technology, with a long history and theoretical background, that specializes in these sorts of tasks: the relational database. The most common way to insert data into and extract data from a relational database is Structured Query Language (SQL).
So why does data that finds its way into a Microsoft Excel spreadsheet so often suffer from the problems outlined above? The reasons are many. If the data is imported, it may have been sourced from a combination of other spreadsheets, databases, systems, reports, Word documents, emails or web pages. If the data has been entered manually, it may have been entered poorly by inexperienced computer users, such as administrative or junior staff with little understanding of data structures. Excel is easy to use and widely accessible, so an inexperienced colleague can quite easily update your spreadsheet with a false sense of confidence and inadvertently enter new data incorrectly. And finally, unlike a fully functional software system, data entry in Excel generally has no automatic validation rules unless they are carefully set up by the spreadsheet's creator.
Additionally, it would prevent software piracy: Microsoft would have a steady stream of income and could continually upgrade its software without having to worry about people stealing its code. Google, of course, is beta testing a similar project in order to head Microsoft off at the pass and compete for the new paradigm of computing on the Internet. If all your data were stored online, you would never have to worry about your computer crashing, because everything would be backed up somewhere else and safe.
Given this data set, imagine trying to find out on which Fridays you were busy with an appointment at noon while your partner was also busy with an appointment at noon, and the descriptions of both appointments contained the phrase "down town". If you are not familiar with relational databases and SQL, it might surprise you to know that this question can be answered by a single, simple SQL query. The database and SQL don't have it all their own way, however. Spreadsheets come into their own for tasks that benefit from a visual representation. Traditionally, databases do not provide a visual way to browse the data in tables without explicitly requesting it.
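To make the appointment question above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `appointments`, its columns (`person`, `appt_date`, `start_time`, `description`) and the sample rows are all invented for illustration, since the layout of the data set is not given here; the point is only that one self-join query can answer the question.

```python
import sqlite3

# In-memory database with a hypothetical appointments table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE appointments (
        person TEXT, appt_date TEXT, start_time TEXT, description TEXT)
""")
sample_rows = [
    ("me", "2021-06-04", "12:00", "Lunch down town"),
    ("partner", "2021-06-04", "12:00", "Dentist, down town"),
    ("me", "2021-06-11", "12:00", "Gym"),
    ("partner", "2021-06-11", "12:00", "Meeting down town"),
    ("me", "2021-06-07", "12:00", "Bank down town"),
    ("partner", "2021-06-07", "12:00", "Shopping down town"),
]
conn.executemany("INSERT INTO appointments VALUES (?, ?, ?, ?)", sample_rows)

# Join the table to itself: one side holds my appointments, the other my
# partner's, matched on the same date and start time.
query = """
    SELECT DISTINCT mine.appt_date
    FROM appointments AS mine
    JOIN appointments AS theirs
      ON theirs.appt_date = mine.appt_date
     AND theirs.start_time = mine.start_time
    WHERE mine.person = 'me'
      AND theirs.person = 'partner'
      AND mine.start_time = '12:00'
      AND strftime('%w', mine.appt_date) = '5'   -- SQLite: 0 = Sunday, 5 = Friday
      AND mine.description LIKE '%down town%'
      AND theirs.description LIKE '%down town%'
    ORDER BY mine.appt_date
"""
busy_fridays = [row[0] for row in conn.execute(query)]
print(busy_fridays)  # ['2021-06-04']
```

With the sample rows, 11 June is excluded because only one description mentions "down town", and 7 June is excluded because it is a Monday. The same question in a spreadsheet would typically need helper columns and several filtering steps.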