Imagine that your task is to write a function that parses a piece of text and detects any indication of time in it: timestamps, time words (today, tomorrow, etc.), numbers, numbers that follow a pattern, and so on.
This is a classic fuzzy problem. We can’t solve it by compiling a data set of every pattern that could indicate time; that is a tiring exercise that can never be exhaustive. Most solutions to such a problem depend on a readily available labelled data set that marks examples as right or wrong. And that’s data, not code. The problem can’t be solved perfectly: no function will produce the ideal output every time, and no test suite for such a function can be called all-encompassing.
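To make the limits concrete, here is a minimal sketch of the rule-based approach. The pattern list, the labels, and the name `find_time_hints` are all assumptions made up for illustration; a real list would be far longer and would still have gaps.

```python
import re

# Hypothetical, hand-picked patterns (illustration only).
# Any real list would be much longer and still incomplete.
TIME_PATTERNS = {
    "clock_time": re.compile(
        r"\b\d{1,2}:\d{2}(?::\d{2})?(?:\s*(?:am|pm))?\b", re.IGNORECASE
    ),
    "iso_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "relative_word": re.compile(
        r"\b(?:today|tomorrow|yesterday|tonight)\b", re.IGNORECASE
    ),
    # Deliberately naive: every number is treated as a possible time hint.
    "bare_number": re.compile(r"\b\d+\b"),
}


def find_time_hints(text):
    """Return (label, matched_text) pairs for every pattern hit in text."""
    hints = []
    for label, pattern in TIME_PATTERNS.items():
        for match in pattern.finditer(text):
            hints.append((label, match.group()))
    return hints


if __name__ == "__main__":
    for sentence in (
        "Let's meet tomorrow at 10:30 pm",
        "See you the day after next",   # a time reference the rules miss
        "I bought 3 apples",            # a number that is not a time
    ):
        print(sentence, "->", find_time_hints(sentence))
```

The sketch fails in both directions: "See you the day after next" yields no hints at all, while "I bought 3 apples" is flagged because of the bare number. Adding more patterns narrows such gaps but never closes them, which is why a labelled set of examples ends up mattering more than the code.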
What does all this mean? Developers need a “we don’t know” perspective. The hacker mindset of diving into code to get a quick prototype is outdated. User research for insights should be table stakes. Developers tend to work data-starved, even though collecting and analyzing data to solve a problem is a necessary skill.
Not all problems can be solved by explicit coding alone. The need for data-sensitive programmers is going to increase in the future. Making the effort to understand the large data sets that the code operates on is key. That data is unpredictable isn’t a valid reason to underplay its role. We don’t stop ideating about technology just because its consequences for society can be unpredictable, do we? Similarly, thinking about data is essential and will, in fact, be a rallying cry for programmers in the future.