Statistical Analysis of Deviation of Media and Entertainment Sector from Established Sources

The media industry is a prominent example of the globalization of data and the westernization of content across the country. Established sources of entertainment such as television have remained steady, yet dynamic in their own ways. Comparing today's scenario with that of a decade ago shows how many new methods and sub-branches the media industry has developed. Once viewers had flexibility in channels, the evolution of media brought variety in where content is viewed. The internet acted as a further catalyst to this movement and drove it forward into a lucrative sector of the economy. We grew up watching television throughout our childhood, yet today we barely watch it, so we wanted to make this shift quantitative rather than base it only on our intuitive observation of society. Further, the nature of our undergraduate study prompted us to look into real-life data and express it in the language of statistics, with the help of mathematics and economics. We were also motivated by the breadth of the topic, which let us study every age group and people from all sections of society. The main driving force behind the selection of this project was the scope of the topic. Studying and analyzing it gave us a solid idea of the contribution of the media sector to the economy and of the psychology behind the shift away from traditional sources of media. We were curious about a heterogeneous population watching an equally heterogeneous set of media sources. Even at an inchoate stage, the data we received varied widely and was open to many interpretations.


INTRODUCTION
In the age of instant gratification, we live by the rule "Here and Now". Be it ordering food online or hailing a cab when and where required, companies are working hard to fulfil our desires. Now we want entertainment on demand, because watching television shows at a fixed time is too mainstream for us. With the soaring number of users turning to online video streaming, we can only wonder whether the end of traditional cable TV is near.
One of the most obvious reasons why people are turning to online streaming is the ease and control they have over what they watch and when they watch it. They can watch whatever they like as many times as they want, and skip or repeat any part of a video. Unlike traditional television, which provides content based largely on geographical location, online video streaming gives the freedom to view video content from anywhere in the world, so viewers have a very wide choice of content.
Online streaming has become a big business in recent years, with services such as Netflix rapidly increasing in popularity. Improvements in streaming technology look set to continue this trend, allowing advertisers to engage with consumers in new ways. Looking at the possible future of online streaming can help you see how your business can interact with potential customers in future.
In the early decades of television, technological limitations meant that relatively few channels could be broadcast. As a result, these channels tended to carry a broad mix of programming, with something to suit everyone. In contrast, streaming allows virtually unlimited channels to be transmitted cheaply, with costs continuing to fall as the technology improves. These reduced costs mean that we may see an explosion in very specific channels aimed at niche markets, as content providers will no longer need a large audience to remain profitable.

DESCRIPTION AND COLLECTION OF DATA
• First, we examined the nature of the data to be collected and analyzed. The study concerns a trend and is based on observation of the present and past state of the data, so we decided to collect primary data.
• To collect the sample, we first categorized the population by where people belong or come from, that is, rural and urban.
• Once the nature of the sample was decided, we put together a questionnaire based on the needs of the study, taking care of the psychological aspects of individuals of every age group. The questions were further filtered and chosen to suit people of both urban and rural areas.
• The forms were divided into manual and digital collection, taking into account the ease of form filling for individuals.
• We spread the forms evenly across places where different age groups were readily available. Thus we put together a set of heterogeneous samples collected from varied places over Pune.

STATISTICAL THEORY AND DATA ANALYSIS
Proportion Test
In the proportion test we consider the case where two samples are taken from two distinct populations or materials. Suppose a sample is drawn from each of the populations. The test statistic is based on both samples. Suppose these samples give proportions of specific items as p1 and p2 respectively. We want to know whether the proportions in the populations from which these samples are drawn are the same.
Let,
n1 = size of the sample drawn from the first population
n2 = size of the sample drawn from the second population
p1 = proportion of specific items in the sample from the first population
p2 = proportion of specific items in the sample from the second population
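The test described above can be sketched in Python as follows. This is a minimal illustration and not the authors' computation: it assumes the standard pooled two-sample z statistic, z = (p1 - p2) / sqrt(p(1 - p)(1/n1 + 1/n2)) with p = (x1 + x2)/(n1 + n2), and the function name `two_proportion_z_test` and the counts in the example call are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled z-test for H0: the two population proportions are equal.

    x1, x2: counts of 'specific items' in each sample (hypothetical inputs)
    n1, n2: sample sizes
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal: P(Z > z)
    upper = 0.5 * math.erfc(z / math.sqrt(2))
    if alternative == "two-sided":   # H1: proportions are not the same
        p_value = 2 * min(upper, 1 - upper)
    elif alternative == "greater":   # H1: first proportion is greater
        p_value = upper
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    return z, p_value


# Hypothetical counts, for illustration only (not the survey data).
z, p = two_proportion_z_test(60, 100, 50, 100)
print(f"z = {z:.4f}, p-value = {p:.4f}")
print("reject H0 at 5%" if p < 0.05 else "do not reject H0 at 5%")
```

The `alternative="greater"` option corresponds to the one-sided hypotheses used for the Urban versus Rural comparisons, where H1 states that the Urban proportion is greater.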

Decision:
As the p-value is greater than both 0.05 and 0.10, we fail to reject H0 at the 5% and 10% levels of significance (95% and 90% confidence levels respectively).

Conclusion:
From the above data, there is no significant difference between the proportions of males and females watching online content.

Television Vs Gender
H0: The proportion of males and females watching television content is the same.
H1: The proportion of males and females watching television content is not the same.

Decision:
As the p-value is greater than both 0.05 and 0.10, we fail to reject H0 at the 5% and 10% levels of significance (95% and 90% confidence levels respectively).

Conclusion:
From the above data, there is no significant difference between the proportions of males and females watching television content.

Both Platforms Vs Gender
H0: The proportion of males and females watching content on both platforms is the same.
H1: The proportion of males and females watching content on both platforms is not the same.

Decision:
As the p-value is greater than both 0.05 and 0.10, we fail to reject H0 at the 5% and 10% levels of significance (95% and 90% confidence levels respectively).

Conclusion:
From the above data, there is no significant difference between the proportions of males and females watching content on both platforms.

Conclusion:
From the above data, there is no significant difference between Urban and Rural areas in the proportion of people paying more than 200 Rs. for online streaming.

Logistic Regression Analysis using R-software
In our project we came across situations where the outcome or response variable is a dichotomous (binary) variable that can assume only two mutually exclusive values. In our case, these values are coded as y = 1 (Yes) for success and y = 0 (No) for failure.
Since y has only two values, we can assume that y is a Bernoulli random variable. Suppose we have a single regressor X, and Y takes the values 1 and 0. In the logistic regression model we assume that
$$\pi(x) = \frac{e^{a+bx}}{1+e^{a+bx}}$$
where $\pi(x)$ is called the logistic function in X. Therefore,
$$\frac{\pi(x)}{1-\pi(x)} = e^{a+bx}$$
so that the regression model becomes
$$\log_e \frac{\pi(x)}{1-\pi(x)} = a + bx$$
where $\log_e \frac{\pi(x)}{1-\pi(x)}$ is called the logit transformation.

For Urban Population
H0: Based on time in hours of viewing, people don't find themselves addicted (Independent).
H1: Based on time in hours of viewing, people find themselves addicted (Dependent).
X: Total time spent on Television and Digital platforms in hours.
Y: Addicted (Yes = 1) or Not addicted (No = 0).
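The logistic function and the logit transformation are inverses of each other, which the short Python sketch below illustrates. The coefficient values `a = -1.0` and `b = 0.5` are hypothetical and chosen only for illustration; they are not estimates from the survey data.

```python
import math


def logistic(x, a, b):
    """pi(x) = exp(a + b*x) / (1 + exp(a + b*x)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def logit(p):
    """Logit transformation: log_e(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, for illustration only.
a, b = -1.0, 0.5

for hours in (0, 2, 4, 6):
    p = logistic(hours, a, b)
    # The logit of pi(x) recovers the linear predictor a + b*x.
    print(f"x = {hours}: pi(x) = {p:.4f}, logit = {logit(p):.4f}, a + b*x = {a + b * hours:.4f}")
```

Each unit increase in x changes the logit by b, which is why exp(b) is interpreted as an odds ratio.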

Interpretation:
The estimates of a and b are -0.8030542 and 0.0002499 respectively. The fitted regression equation on the logit scale is log_e[π(X)/(1 − π(X))] = a + bX, that is, log_e[π(X)/(1 − π(X))] = (-0.8030542) + (0.0002499)X. The standard errors of the corresponding estimates are 0.1327789 and 0.0070655 respectively.
To test H0: a = 0 vs H1: a ≠ 0. Observing the p-value, coefficient a is highly significant.
To test H0: b = 0 vs H1: b ≠ 0. Observing the p-value, coefficient b is not significant.
Odds ratio estimate: 1.000249931
Null deviance: 347.76
Residual deviance: 347.76
G = (Null deviance − Residual deviance) = 0

Interpretation:
The estimates of a and b are -1.6398 and 0.2779 respectively. The fitted regression equation on the logit scale is log_e[π(X)/(1 − π(X))] = a + bX, that is, log_e[π(X)/(1 − π(X))] = (-1.6398) + (0.2779)X. The standard errors of the corresponding estimates are 0.4822 and 0.1111 respectively.
To test H0: a = 0 vs H1: a ≠ 0. Observing the p-value, coefficient a is highly significant.
To test H0: b = 0 vs H1: b ≠ 0. Observing the p-value, coefficient b is significant.
Odds ratio estimate: 1.320354155
Null deviance: 107.41
Residual deviance: 100.45
G = (Null deviance − Residual deviance) = 6.96

Conclusion:
Based on time in hours of viewing, people find themselves addicted (Dependent).
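The quantities reported in the two interpretations can be rechecked from the printed estimates with the Python sketch below. It is an illustration under stated assumptions, not the R session used in the project: the odds ratio is taken as exp(b), the coefficient tests are assumed to be Wald z ratios (estimate divided by standard error), and G is compared with the 5% critical value 3.841 of the chi-square distribution with 1 degree of freedom. The function name `summarize_fit` is hypothetical.

```python
import math

CHI2_CRITICAL = 3.841  # 5% critical value, chi-square with 1 degree of freedom


def summarize_fit(a, se_a, b, se_b, null_dev, resid_dev):
    """Recompute summary quantities from reported logistic regression output."""
    g = null_dev - resid_dev  # likelihood-ratio statistic G
    return {
        "odds_ratio": math.exp(b),  # change in odds per extra hour
        "z_a": a / se_a,            # assumed Wald statistic for H0: a = 0
        "z_b": b / se_b,            # assumed Wald statistic for H0: b = 0
        "G": g,
        # Upper tail of chi-square(1 d.f.): P(X > g) = erfc(sqrt(g / 2))
        "p_value_G": math.erfc(math.sqrt(max(g, 0.0) / 2)),
        "reject_H0": g >= CHI2_CRITICAL,
    }


# Estimates, standard errors and deviances as reported in the text.
urban = summarize_fit(-0.8030542, 0.1327789, 0.0002499, 0.0070655, 347.76, 347.76)
rural = summarize_fit(-1.6398, 0.4822, 0.2779, 0.1111, 107.41, 100.45)

for name, fit in (("Urban", urban), ("Rural", rural)):
    print(name)
    for key, value in fit.items():
        print(f"  {key}: {value}")
```

With the reported deviances, G is below 3.841 for the urban model and above it for the rural model, which agrees with the two conclusions stated in the text.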

GRAPHICAL REPRESENTATION AND INTERPRETATION
Cord Cutting
Cord-cutting refers to cancelling or forgoing a pay TV subscription in favour of an alternative internet-based service. The young may be moving away from TV, but there may not be cord-cutting yet because, in a single-TV household, these youngsters may still not be the decision makers. However, these are the people who may not introduce the cord when they set up homes.

Interpretation:
Though viewers on digital platforms are increasing, people still enjoy watching television, due to which they wouldn't prefer cord cutting even for alternative streaming sources. Another reason why cord cutting is not preferred in Rural areas may be a lack of awareness of other sources.

Paid Services Vs Free Services
According to our data, we have obtained the following graphs:

Interpretation:
Most people prefer using free services. Among paid services, Amazon Prime is the most used, followed by Hotstar and then Netflix. Netflix is the least preferred paid service due to its high subscription cost.

Interpretation:
YouTube is the most preferred free service, followed by Voot, Torrent, Zee5 and then TVF.

MOST PREFERRED GENRE ON TV AND ONLINE STREAMING
Genre on TV
Interpretation:
Movies are the most preferred genre irrespective of age group, followed by News, which is more preferred by the older generation. Kids' channels are the least preferred among all age groups.

Interpretation:
Comedy is the most preferred genre on digital platforms irrespective of age group, followed by Knowledge and Science Fiction, which are the second most preferred amongst the younger generation. It is also observed that the Religious and Anime genres are the least preferred irrespective of age group.
Based on the average hourly data for online streaming, the age group 22-27 (younger millennials) spends the highest average time, i.e., 3.15714 hours on a daily basis, whereas the age group 35 and above spends the least average time on online streaming, i.e., 1.463333 hours on a daily basis.

Average time spent on visual entertainment Vs Age group
From the above graph, we can clearly observe that digital platforms are preferred over traditional media amongst all age groups.

Comparison of data with year 2013 and 2018.
Graph obtained from our data:

Interpretation:
We can observe that there is an immense growth in the viewers of digital platforms from the year 2018 to 2020 and an immense decrease in the viewers of television. The listeners of Radio have further decreased from the year 2013 to 2020 by about 50.0%.

Interpretations:
We can observe that television is mostly preferred by the older generation and digital platforms are mostly preferred by the younger generations.

$\chi^2 = 3.841$
Decision Criterion: Reject H0 at 5% l.o.s. if $G \geq \chi^2$.
Decision: Fail to reject H0.
Conclusion: Based on time in hours of viewing, people don't find themselves addicted (Independent).

For Rural Population
H0: Based on time in hours of viewing, people don't find themselves addicted (Independent).
H1: Based on time in hours of viewing, people find themselves addicted (Dependent).
X: Total time spent on Television and Digital platforms in hours.
Y: Addicted (Yes = 1) or Not addicted (No = 0).

Free online service users Vs Area
H0: The proportion of free online service users is the same in Urban and Rural areas.
H1: The proportion of free online service users is greater in Urban than in Rural areas.
Decision: As the p-value is greater than both 0.05 and 0.10, we fail to reject H0 at the 5% and 10% levels of significance (95% and 90% confidence levels respectively).
Conclusion: From the above data, there is no significant difference between Urban and Rural areas in the proportion of free online service users.

Watching TV and digital platforms more than two hours Vs Area
H0: The proportion of people watching TV and digital platforms for more than two hours is the same in Urban and Rural areas.
H1: The proportion of people watching TV and digital platforms for more than two hours is not the same in Urban and Rural areas.
Decision: As the p-value is greater than both 0.05 and 0.10, we fail to reject H0 at the 5% and 10% levels of significance (95% and 90% confidence levels respectively).
Conclusion: From the above data, there is no significant difference between Urban and Rural areas in the proportion of people watching TV and digital platforms for more than two hours.

Decision: As the p-value is less than both 0.05 and 0.10, we reject H0 at the 5% and 10% levels of significance (95% and 90% confidence levels respectively).
Conclusion: From the above data it is evident that the proportion of people who find their sleep affected by more than 2 hours is greater in Urban areas than in Rural areas.

H0: The proportion of people paying more than 200 Rs. for television is the same in Urban and Rural areas.
H1: The proportion of people paying more than 200 Rs. for television is greater in Urban than in Rural areas.
Decision: As the p-value is less than both 0.05 and 0.10, we reject H0 at the 5% and 10% levels of significance (95% and 90% confidence levels respectively).
Conclusion: From the above data it is evident that the proportion of people paying more than 200 Rs. for television is greater in Urban than in Rural areas.

H0: The proportion of people paying more than 200 Rs. for online streaming is the same in Urban and Rural areas.
H1: The proportion of people paying more than 200 Rs. for online streaming is greater in Urban than in Rural areas.

Average time spent on visual entertainment Vs Age group
Based on the average hourly data for television, the age group 10-15 spends the highest average time, i.e., 3.431818 hours on a daily basis, whereas the age group 16-21 spends the least average time on television, i.e., 1.756756 hours on a daily basis.