Migration patterns

ISB’s SRITNE (Srini Raju Centre for IT and the Networked Economy) is a research centre that focuses on the business and societal value of IT. Gramener collaborates with SRITNE to develop and promote visual analytics, and to help foster a culture of open data within the community.

As part of this collaboration, we jointly presented ‘Visualisation of Migration Patterns in India’ at the Bangalore Open Data Camp. ISB provided access to the source dataset for this analysis, and the visualisations were done primarily on the Gramener Visualisation Server. MS Excel and R were used for exploratory analysis. The aim of this exercise was to take an outsider’s analytics view of the migration patterns and explore alternative ways of representing and visualising them, rather than to view them through the lens of a demographics expert.

The Indian NSSO (National Sample Survey Office) conducted its 64th round survey on ‘Employment & Unemployment and Migration Particulars’ from July ’07 to June ’08, covering 1,25,578 households and 5,72,254 persons.

Of this sample, ~30% were found to be migrants, i.e. those whose last usual place of residence (UPR) was different from the present place of enumeration. In this survey, the usual place of residence of a person was defined as a village/town where the person had stayed continuously for six months or more. Amongst the migrants, a majority were found to be moving within the state (85%) as opposed to those moving across states (15%). Women formed a sizeable majority of this migrant population.

Intra-state migration patterns

image

The map on the left shows the intra-state migration pattern (excluding inter-state numbers) as the absolute number of migrants moving within each state/UT. Green indicates higher migration and red indicates lower. Based on this map, the 5 most populous states in India account for the top 5 intra-state movements, except for Bihar, which comes a close 6th. If we rescale the numbers by taking migrants as a percentage of the state/UT’s survey size, as shown in the right map, the results change completely. The top 5 states with the highest percentage churn are Andhra Pradesh, Himachal Pradesh, Kerala, Gujarat and Andaman & Nicobar Islands.

Inter-state migration patterns

image

If we now look at the inter-state migration pattern (excluding within-state movements) by plotting the net inflow of migrants into each state/UT (left-hand map), the states with the highest net outflow of migrants are Uttar Pradesh and Bihar, while those with the highest net inflow are Maharashtra and Delhi. If we rescale the numbers as a percentage of the state/UT’s survey sample, the story changes yet again. The Union Territories all show the highest net percentage inflow, with Chandigarh showing the highest value at 41%.
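The gap between the absolute and the rescaled view is simple arithmetic, and a small sketch makes it concrete. The flows and sample sizes below are made-up placeholders (states "A", "B", "C"), not NSSO figures, and the function names are hypothetical.

```python
# Illustrative only: made-up counts, not figures from the NSSO survey.
# flows[origin][destination] = surveyed migrants who moved origin -> destination
flows = {
    "A": {"B": 120, "C": 30},
    "B": {"A": 20, "C": 10},
    "C": {"A": 5, "B": 15},
}
# Hypothetical survey sample size per state
sample_size = {"A": 1000, "B": 500, "C": 50}


def net_inflow(flows):
    """Net inflow per state: migrants arriving minus migrants leaving."""
    net = {state: 0 for state in flows}
    for origin, destinations in flows.items():
        for destination, count in destinations.items():
            net[origin] -= count
            net[destination] += count
    return net


def as_percent_of_sample(net, sample_size):
    """Rescale each state's net inflow by its survey sample size."""
    return {state: 100.0 * net[state] / sample_size[state] for state in net}


net = net_inflow(flows)
percent = as_percent_of_sample(net, sample_size)
print(max(net, key=net.get))          # state with the largest absolute net inflow
print(max(percent, key=percent.get))  # state with the largest net inflow as % of sample
```

With these placeholder numbers the two rankings disagree: the large state leads in absolute terms, while the small state leads once the numbers are rescaled, which is the same effect the two maps show.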

Inter-state migration Heat-map

state-migration-heatmap

To get a sense of the exchange of migrants between the states, we plotted the numbers on a heat-map. The y-axis of the heat-map has the ‘From-State’ while the ‘To-State’ is on the x-axis. The height of each heat-map box is proportional to the net outflow from the contributor state, while the width of each box is proportional to the net inflow into the recipient state. The colour represents the number of people moving between the states: the darker the box, the more people moved.

As can be seen, the top destinations for people leaving UP are Delhi, Maharashtra and Uttaranchal, in that order. For Bihar and Rajasthan, the top destinations are highlighted accordingly. What is more interesting is the pattern of top destinations for each of the states. A clear trend is the consistent preference of people across regions to migrate into geographically nearby states. The survey also covered a set of international in-migrants; Bangladesh, the top contributing country, has a sizeable proportion of its migrants moving to West Bengal.

Migration across Rural-Urban areas

image

When migration was viewed from the perspective of movement across rural and urban areas, a surprising finding was the extent of movement within rural areas: more than half of migration in India happens between rural regions. About 40% of migration is towards urban areas. The Union Territories and North-Eastern states run counter to this trend: over 70% of migration in these areas is towards urban regions, unlike the rest of India.

Reasons for migration

migration-reason-age-gender

When the reasons for migration were analysed at the country level, some key trends emerged. Women, who form a sizeable majority of the migrants, primarily migrate on account of ‘Marriage’, and their typical age at marriage is between 15 and 24. For men, the key reason for migration is ‘Employment-related’, and this primarily happens in the age band of 18 to 40. Consequently, ‘Movement of Parent/Earning member’ forms another key reason. ‘Education’ is also a driver of migration, typically for men and women until the age of around 23 years.

When we looked at the reasons for migration by state, a few interesting patterns showed up. People in Tripura migrate mostly due to forced reasons/disasters, whereas UP witnesses marriage-related movement. Kerala and West Bengal witness migration for housing-related reasons, whereas many people in the scenic state of Himachal Pradesh migrate for post-retirement life.

migration-status-reason

It is evident from the above heatmap that a majority of the women who migrate for marriage end up doing domestic duties, while men who move for employment end up as wage employees/labourers.

migration-reason-year-gender

The survey sample had a good mix of people who had migrated over the years, dating as far back as the 1930s. When we analysed how the reasons for migration evolved, interesting trends emerged. Until Independence, migration was subdued and was restricted to women getting married. After Independence, migration numbers increased steadily over the next 60 years. After the 1970s, more and more people started moving for employment-related reasons. This was accompanied by the migration of dependent families. Only after the 1990s did people move in significantly larger numbers and for reasons such as business, education, housing, post-retirement and healthcare, more in line with the Indian economic development story over the past 60 years!

Composing data visualisations

How does one create new data visualisations? Apart from the art, is there a science to it?

Let’s explore a few popular charts. We have the vertical bar graph and the horizontal bar graph. The stacked bar. The variwide or Marimekko chart. The waterfall. The scatterplot. The treemap. And so on.

The first thing you’ll observe is that all of these are a series of rectangles. (We’re treating the dots on the scatterplot as little squares.) The only thing that varies across these charts is the position and size of the rectangles – and the colour as well.

That gives us a hint. Perhaps there are many ways of creating visualisations just by changing the position, size and colour of rectangles. For example, the horizontal bar graph can be constructed as follows:

  • The x position is constant for each rectangle. It starts at zero.
  • The width is proportional to the value of the series.
  • The y position is proportional to the index of the values (1,2,3,…)
  • The height is constant for each of the bars
  • The colour is constant too.

Whereas, if we look at a horizontal stacked bar, then:

  • The x position is proportional to the cumulative value of the series.
  • The width is proportional to the value of the series.
  • The y position is constant at zero
  • The height is constant for each of the bars
  • The colour is based on the index of the values (distinct colours labelled 1,2,3,…)
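The two recipes above can be sketched in a few lines of Python. This is a minimal illustration, not a rendering library: the `Rect` type, the function names and the default palette are all hypothetical, the index starts at 0 rather than 1, and "cumulative value" is taken to mean the sum of the values before the current one.

```python
from itertools import accumulate
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float
    colour: str


def horizontal_bar(values, bar_height=1.0, colour="grey"):
    """x constant at zero, width = value, y = index, height and colour constant."""
    return [
        Rect(x=0.0, y=i * bar_height, width=v, height=bar_height, colour=colour)
        for i, v in enumerate(values)
    ]


def horizontal_stacked_bar(values, bar_height=1.0, palette=("red", "green", "blue")):
    """x = running total of earlier values, width = value, y constant, colour by index."""
    starts = [0.0] + list(accumulate(values))[:-1]
    return [
        Rect(x=s, y=0.0, width=v, height=bar_height, colour=palette[i % len(palette)])
        for i, (s, v) in enumerate(zip(starts, values))
    ]


for rect in horizontal_stacked_bar([3, 1, 2]):
    print(rect)
```

The only differences between the two functions are which attributes are held constant and which follow the data, which is the point the lists make.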

Generalising this, we can construct a table like this that shows the structure of various visualisations:

Chart                | x          | width    | y          | height   | colour
---------------------|------------|----------|------------|----------|---------
Vertical bar chart   | index      | constant | constant   | value    | constant
Stacked bar          | index      | constant | cumulative | value    | index
Waterfall            | index      | constant | cumulative | value    | constant
Scatterplot          | value      | constant | value      | constant | index
Horizontal bar chart | constant   | value    | index      | constant | constant
Variwide             | cumulative | value    | constant   | value    | constant
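One way to read this table is as a lookup that drives a single generic routine. The sketch below assumes one data series feeds every attribute, so the scatterplot row (which needs separate series for x and y) is left out, and the variwide row ends up using the same series for width and height. The names `CHARTS`, `apply_rule` and `compose`, and the default constants, are hypothetical.

```python
from itertools import accumulate

ATTRIBUTES = ("x", "width", "y", "height", "colour")

# Assumed defaults for "constant": positions at 0, sizes at 1, a single colour id 0.
DEFAULT_CONSTANT = {"x": 0, "width": 1, "y": 0, "height": 1, "colour": 0}

# Rows copied from the table, in the order x, width, y, height, colour.
CHARTS = {
    "vertical bar": ("index", "constant", "constant", "value", "constant"),
    "stacked bar": ("index", "constant", "cumulative", "value", "index"),
    "waterfall": ("index", "constant", "cumulative", "value", "constant"),
    "horizontal bar": ("constant", "value", "index", "constant", "constant"),
    "variwide": ("cumulative", "value", "constant", "value", "constant"),
}


def apply_rule(rule, attribute, values):
    """Turn one table cell into a list with one entry per data point."""
    if rule == "constant":
        return [DEFAULT_CONSTANT[attribute]] * len(values)
    if rule == "index":
        return list(range(len(values)))
    if rule == "value":
        return list(values)
    if rule == "cumulative":
        # Running total of the values before each point.
        return [0] + list(accumulate(values))[:-1]
    raise ValueError(f"unknown rule: {rule!r}")


def compose(chart, values):
    """Build one rectangle (as a dict of attributes) per data point."""
    columns = [
        apply_rule(rule, attribute, values)
        for attribute, rule in zip(ATTRIBUTES, CHARTS[chart])
    ]
    return [dict(zip(ATTRIBUTES, row)) for row in zip(*columns)]


for rect in compose("waterfall", [3, 1, 2]):
    print(rect)
```

Adding a row to `CHARTS` is then enough to try out a new combination, without writing any new drawing logic.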

That leads to a line of thought: what if we tweaked this table? Would we get new visualisations that might be interesting?

Let’s experiment with a few.

What if we took the waterfall chart and made the widths proportional to value instead of constant? The waterfall chart shows a cumulative series of values (e.g. percentages). This new chart, a cascade chart, allows us to depict each bar’s relative importance as well as its value.

What if we kept the width, height and y constant, and just let x vary as the index? It would just be a row of boxes. But we’d have the option of colouring them by value. This could be useful when showing performance along a discrete series (e.g. attendance by weekday).

What if we allowed the x, y, width, height and colour to each vary with a different value? The graph looks like a scatterplot, but every dimension here (position, size, colour, even aspect ratio) indicates some informational measure.

This chart can show the position and spread of two metrics. For example, if the X-axis were sales and the Y-axis were price, each rectangle could be the distribution of price and sales in a branch, with the colour indicating the growth of the branch.

Just using the combinations discussed above, there are 75 possible types of visualisations – many of which are meaningful in different circumstances. And this is just using rectangles.

What we’ve done here is map data to the attributes of a visualisation. This is part of a generalised approach to graphics, similar to that covered by Leland Wilkinson’s Grammar of Graphics and implemented in libraries like ggplot2 or D3. Once we establish that basic concept, that a chart is a mapping of data to attributes, the variety of charts you’ll be able to create is unlimited, and you move from being a user of charts to a composer of data-driven visualisations.

Tracking computer usage

CIHL (a consortium of universities in Andhra Pradesh) offers a master’s course in information technology. As part of that, the computer usage of volunteering students was tracked for 7 weeks. The raw data shows how long each application was used.

We visualised the total usage of the top applications by student.

msit-computer-usage

Before we go on to the results, a few words about the visualisation.

  • Each row is one application. They are sorted by usage.
  • Each column is one student. The width of the column is proportional to their usage. They are sorted by the amount of time spent on computers.
  • Each cell shows the percentage of time spent on the application. For example, 20% means that the student spent 20% of her time on that application.

This is similar to the heatgrid we saw last month, but with a difference – the widths of the columns are not constant, and represent the hours of usage. This means that the colour represents not just the % usage by a student – it has an additional significance. The amount of purple ink used in each row is the total hours of usage of the application.
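The claim about ink can be checked with a little arithmetic. The sketch below uses made-up hours for two hypothetical students and assumes that colour intensity is linear in the percentage shown, that every row has the same height, and that every student logged some usage. Under those assumptions, the ink in a cell is its column width times its share, and summing that along a row gives back the total hours for the application.

```python
# Made-up hours, purely illustrative: hours[student][application]
hours = {
    "student_1": {"Browser": 30.0, "Word": 10.0},
    "student_2": {"Browser": 5.0, "Word": 5.0},
}


def heatgrid_cells(hours):
    """One cell per (student, application): column width and share of time."""
    cells = []
    for student, apps in hours.items():
        column_width = sum(apps.values())  # total hours logged by this student
        for application, spent in apps.items():
            cells.append({
                "student": student,
                "application": application,
                "width": column_width,
                "share": spent / column_width,  # the percentage shown in the cell
            })
    return cells


def ink_per_application(cells):
    """Sum width * share along each row (one row per application)."""
    ink = {}
    for cell in cells:
        key = cell["application"]
        ink[key] = ink.get(key, 0.0) + cell["width"] * cell["share"]
    return ink


print(ink_per_application(heatgrid_cells(hours)))
```

In this toy data the second student spends a larger share of time in Word than the first, yet contributes less ink to the Word row because the column is narrower.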

Now for what we found.

Browsers are clearly the most popular application, with people spending 25-50% of their time on the browser. Firefox is the most popular browser, followed by Chrome. Only 3 students used IE as their main browser.

Microsoft Word emerged as the second most popular application. This is what students submitted their assignments in.

VLC was the next most popular, ignoring the time spent on Windows Explorer. While their coursework did require them to view a number of videos, an analysis of the window titles showed that course-related videos were in a minority. This also provided us with a number of interesting movie recommendations that have kept us busy over the last month.

Two games made their way into the top applications list: Half Life and Warcraft III. While only 4 students were serious gamers, the time they spent on games was significant. The student who spent the most time on the PC spent almost 20% of it on games, with another 30% on movies. (We are yet to investigate whether this had a positive or negative effect on grades.)

Chat applications did not show significant usage. IPMsg was the most popular, with up to 0.5% of time being spent on it. Google Talk was used by fewer people, but those who used it spent up to 3% of their time on it.

But the strangest observation was regarding two students, both of whom spent about 10% of their time looking at screen savers. One of these was, in fact, a blank screen saver. We have still not been able to figure out what exactly they were up to.