Category Archives: Data Science and Analytics

What is Pattern Recognition? How is it useful?

Pattern Recognition is the technique of detecting a repetitive occurrence in a situation. It can be very useful in problem solving.
Say we want to count the number of x in the following diagram:
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
x x x x x x x x x x
We see a pattern in the way the ‘x’ is placed: it repeats uniformly across all columns in every row. What would an intelligent person do? Would they count every single ‘x’ one by one, beginning with the first row and ending with the last? As you have rightly guessed, the answer is NO. The person would count the number of ‘x’ in any one row and multiply it by the number of rows. In the figure above, that is 10 (the number of x in each row) multiplied by 5 (the number of rows), which gives 50.
For problems of much larger size and complexity, detecting patterns can help us reach a solution faster and more efficiently. That is why Pattern Recognition is so useful in computational problem solving.
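A minimal Python sketch of the two strategies, assuming the grid is stored as a list of rows. The function names are hypothetical, and `count_by_pattern` assumes, rather than checks, that every row is identical; that is the pattern we spotted by eye.

```python
def count_one_by_one(grid):
    """Brute force: visit every cell and count the 'x' marks."""
    return sum(1 for row in grid for cell in row if cell == "x")

def count_by_pattern(grid):
    """Use the pattern: count one row and multiply by the number of rows.

    Precondition (assumed, not verified): every row is identical.
    """
    if not grid:
        return 0
    return grid[0].count("x") * len(grid)

# The diagram above: 5 rows of 10 'x' each.
grid = [["x"] * 10 for _ in range(5)]
print(count_one_by_one(grid), count_by_pattern(grid))
```

Verifying that every row really is identical would mean scanning every cell again, so the saving comes from knowing the pattern in advance: counting one row takes 10 steps instead of 50, and the gap widens as the grid grows.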



Filed under artificial intelligence, Data Science and Analytics



Can you cite an example in real life where we might have to use the median value instead of the mean?

In most real-life situations, we use the mean (average) value to predict an unknown variable; e.g., a roll of a fair six-sided die has an expected (mean) value of 3.5, since each face appears with probability 1/6. However, using the mean may not be advisable in certain situations.

Let us consider a scenario of campus recruitment for an outgoing batch in an engineering college/university. Students may be allowed only a limited number of chances (often fixed by the institute) to appear for the selection processes of potential employers. Let us say that number is 3 companies. Also, the institute normally bars an ‘already selected candidate’ from appearing for the companies lined up after that. Suppose a student longs to work in healthcare informatics and analytics. It is quite possible that the best companies in this area come to campus towards the end of the recruitment season. Since a job is not guaranteed, how should the candidate decide which companies to sit for and which to avoid? Suppose the pay packages on offer vary greatly. A few very high (or very low) packages can pull the mean far away from what a typical company offers, whereas the median stays in the middle of the distribution. The best choice for the candidate would be to avoid the highest- and lowest-paying companies and appear for companies that offer a mid-range salary. In such situations, the median acts as a better measure than the mean.
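The reasoning can be checked with Python's standard `statistics` module. The offers below are invented, hypothetical figures (in lakh rupees per annum) with one outlier, and `mid_range_companies` is a hypothetical helper that picks the companies whose offers sit closest to the median; it is a sketch of the idea, not a recommended recruitment strategy.

```python
import statistics

# Hypothetical offers per company, in lakh rupees per annum (illustrative only).
offers = {"A": 4, "B": 5, "C": 5, "D": 6, "E": 6, "F": 7, "G": 30}

mean_offer = statistics.mean(offers.values())
median_offer = statistics.median(offers.values())
print(f"mean = {mean_offer}, median = {median_offer}")

def mid_range_companies(offers, k=3):
    """Return the k companies whose offers are closest to the median offer."""
    med = statistics.median(offers.values())
    return sorted(offers, key=lambda name: abs(offers[name] - med))[:k]

print(mid_range_companies(offers))
```

Here the single 30-lakh offer drags the mean up to 9, above six of the seven offers, while the median stays at 6, which is closer to what a typical company pays.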


Does strong association always unravel interesting causal relationships?

In Market Basket Analysis, we perform association rule mining to find items that sell together. Associations that cross the threshold values of ‘support’ (how often the items appear together across all transactions) and ‘confidence’ (how often the second item is bought when the first one is) usually uncover interesting relationships among items that a customer purchases together. For example, if a significant proportion of computer sales are accompanied by sales of security software, the relationship may well be causal.
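As a rough sketch, support and confidence can be computed directly from a list of baskets, each a Python set of items. The baskets and the threshold values below are hypothetical, chosen only to illustrate the rule computer → security software; real association rule mining (e.g., the Apriori algorithm) searches many candidate itemsets rather than testing a single rule.

```python
from fractions import Fraction

def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Share of baskets containing the antecedent that also contain the consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return Fraction(0)
    return support(antecedent | consequent, transactions) / base

# Hypothetical baskets, for illustration only.
baskets = [
    {"computer", "security software"},
    {"computer", "security software", "mouse"},
    {"computer", "mouse"},
    {"printer"},
    {"computer", "security software"},
]
antecedent, consequent = {"computer"}, {"security software"}
s = support(antecedent | consequent, baskets)
c = confidence(antecedent, consequent, baskets)

# Hypothetical thresholds; in practice the analyst chooses them.
MIN_SUPPORT, MIN_CONFIDENCE = Fraction(1, 2), Fraction(7, 10)
strong = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(f"support={s}, confidence={c}, strong={strong}")
```

The rule passes both thresholds (support 3/5, confidence 3/4), but the numbers only say that the items are bought together; they say nothing about why.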

At times, even though we may discover strong associations, a causal relationship can’t be claimed with confidence. Data from a particular city may indicate that ‘sale of umbrellas’ and ‘number of deaths’ have both trended upward in recent months. But can we see a causal relationship in this case? It may be noted that umbrella sales have risen owing to the rainy season, and the surge in deaths may be a consequence of floods or other such contributory factors. Both figures are driven by a common cause, the rains, rather than by each other.


Must the sample sizes be equal when we compare the potential of two regions for starting a business (to pick the one with better prospects)?

Let us consider a multinational keen on setting up a tissue-paper business and weighing the prospects of setting it up in an Indian metro city vis-à-vis a European city. In the European city, tissue paper is used widely across society, while in the Indian metro its use may be restricted to the upper segment of society. Since usage in the European city is fairly uniform, a smaller sample can suffice there; but because of the great diversity in the Indian metro, the sample size should be much larger to estimate the true potential of the business across its diverse customer segments. So the sample sizes need not be equal.
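One standard way to make this argument quantitative, swapped in here as an illustration rather than taken from the lecture, is the textbook sample-size formula for estimating a proportion under simple random sampling: n = z² p(1 − p) / E², where p(1 − p) is the variance of a yes/no "uses tissue paper" indicator and E is the margin of error. The usage shares below are hypothetical, and `sample_size_for_proportion` is a hypothetical helper name.

```python
import math

def sample_size_for_proportion(p, margin, z=1.96):
    """Simple-random-sample size needed to estimate a proportion p within +/- margin.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Hypothetical usage shares, for illustration only.
n_europe = sample_size_for_proportion(p=0.9, margin=0.05)  # near-uniform usage
n_metro = sample_size_for_proportion(p=0.5, margin=0.05)   # highly diverse usage
print(n_europe, n_metro)
```

With near-universal use (p = 0.9) about 139 respondents suffice, whereas a population split down the middle (p = 0.5) needs about 385 for the same ±5% margin. Because p is not known before the survey, p = 0.5 is the conservative choice.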







These questions were part of, or inspired by, a lecture by Mr. Gautam Bannerjee of Computer Brio, his talk on ANALYTICS WITH R delivered at Galgotias University, 9th-13th Dec, 2017.



Filed under Articles, Data Mining, Data Science and Analytics

The power of Social Media Analytics and Data Science …

I’m told a famous cricket commentator has been off the air for the last couple of years or so. I am not a cricket buff and can’t vouch for the authenticity of this fact. But it seems that about three or four years ago, a famous actor criticized this commentator in a tweet for showering lavish praise on foreign cricketers while not being as kind to Indian players. This set off a storm among the twitterati, and the negativity gradually snowballed until the commentator was axed from his job. His disappearance from all sports channels is a consequence of this.

Here lies the power of social media and social media analytics. How do you think the fee a celebrity charges for endorsing a product in a TV ad lasting a few seconds is determined? One of the prime determinants would be the kind of followers the star has on social media sites and the number of likes and retweets their posts generate …
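To make the last point concrete, here is a minimal sketch of one commonly used summary, an engagement rate: average likes plus retweets per post, divided by follower count. Definitions vary between platforms and agencies, the `Post` type and `engagement_rate` function are hypothetical, and the numbers are invented for illustration; nothing here describes how endorsement fees are actually set.

```python
from dataclasses import dataclass

@dataclass
class Post:
    """Hypothetical record of one social-media post's interactions."""
    likes: int
    retweets: int

def engagement_rate(posts, followers):
    """Average (likes + retweets) per post, divided by the follower count.

    One common definition among several; returns 0.0 when there is no data.
    """
    if not posts or followers <= 0:
        return 0.0
    interactions = sum(p.likes + p.retweets for p in posts)
    return interactions / len(posts) / followers

# Invented numbers: a star with many followers vs. one with fewer, more engaged followers.
star_a = engagement_rate([Post(20_000, 5_000), Post(30_000, 5_000)], followers=1_000_000)
star_b = engagement_rate([Post(15_000, 5_000)], followers=200_000)
print(f"star A: {star_a:.1%}, star B: {star_b:.1%}")
```

Star B has a fifth of Star A's followers but engages them more than three times as strongly, a distinction that follower counts alone would miss.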

As long as social networking continues to grow, data science will remain a challenging discipline …


Filed under Articles, artificial intelligence, Data Mining, Data Science and Analytics