Politics of non-reserved territories

In India, a large number of Parliament seats are reserved for SC/Dalits (84 seats, or 15.5%) and ST/Tribes (47 seats, or 8.7%).

In addition, SC/ST leaders are also elected from non-reserved constituencies. Though many political parties claim to espouse the cause of SC/STs, in reality only a few mainstream parties give tickets to SC/ST candidates in non-reserved constituencies.

Ratio of SC/ST candidates

On analysing non-reserved (general) assembly seats from 2004 to 2013, we noticed that regional heavyweights SAD, DMK, PMK, TRS, and INLD have not given a single non-reserved seat to any SC/ST candidate.

A closer look at the two ratio tables reveals that:

  • BSP and LJP gave more seats to SC candidates and fewer seats to ST candidates than the national average.
  • The two major national parties, INC and BJP, are way below the national average.
  • SDF, AGP and JKPDP, which topped the ST candidates list, didn't give a single non-reserved seat to SC candidates.

SC candidates have higher participation than ST candidates in general seats. However, bucking the general trend, the INC gives fewer tickets to SC candidates in non-reserved constituencies. And though SC candidates' participation is higher, ST candidates' winnability is much higher.

Winnability of SC/ST candidates

Note: We only considered parties that won at least 20 seats in total.

In our next post, we will see how candidates have fared at these places and compare the performance of SC/ST candidates.

Optimising the 2014 election results page

Once we had the design and back-end technology running, it was time to build the live election results page itself.

Counting day screenshot

Server-side or client-side?

The key considerations here were speed and load. We wanted users to get the results as quickly as possible without overloading our servers.

Generating pages on the server lets us cache the results across users. This is how most pages at http://ibn.gramener.com/ were built when analysing historical elections. But while this reduces the load on the client side, it increases the page size quite a bit. The alternative is to just send the data across and have the browser render it using JavaScript.

These visuals are in SVG, which already excludes IE8 and older browsers. For the rest of the browsers, as long as we kept the calculations simple, there would be no issue in terms of browser capability.

What libraries?

The easiest way for you to explore this is to look at its source. To begin with, here's a snapshot of the URLs loaded by the page. (This snapshot was taken on a mobile broadband connection to simulate relatively slower, modem-like speeds.)

URLs loaded and speed

After the main /live page is loaded, we load 6 CSS / JavaScript files in parallel:

  1. Bootstrap 3 CSS
  2. Bootstrap 3 JS
  3. jQuery 2
  4. Underscore.js
  5. Our custom stylesheets: style.css
  6. Our internal JavaScript utility belt: G.min.js

We did explore the possibility of keeping the page lighter and going without these libraries. But three things suggested otherwise. First, their combined size was only 54KB gzipped; in contrast, the rest of the page plus data was 285KB. Second, there are still too many browser inconsistencies for us to handle quickly without them. Lastly, most users of this page would be repeat visitors, so these libraries would be cached on the first visit and would not add to the load thereafter.

We chose Bootstrap many months ago for the election analysis page and continued with that choice here. Bootstrap requires jQuery.

Underscore.js, however, was chosen for a single reason: its templates. Ever since John Resig created his micro-templates, we've been fans of this approach. For example, to create this table of candidates in each constituency (and yes, there really were 11 Chandu Lal Sahus contesting)…

Table of constituency results

… and to have it updated every time the results change, this is the code required:

Table micro-template
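
A minimal sketch of the same micro-template approach looks roughly like this. The field names, table markup and the redraw helper are hypothetical stand-ins, not the production code:

    // Compile a micro-template once (field names here are hypothetical).
    var rowTemplate = _.template(
      '<% _.each(candidates, function(c) { %>' +
      '  <tr><td><%- c.name %></td><td><%- c.party %></td><td><%- c.votes %></td></tr>' +
      '<% }) %>'
    );

    // Re-render the table body whenever fresh results arrive.
    function redraw(candidates) {
      $('#results tbody').html(rowTemplate({ candidates: candidates }));
    }

Because the compiled template is just a function, updating the table on every data refresh is a single call.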

The key to the structuring of this visual was to load all static data upfront, and minimally refresh with the dynamic data.

Structuring the data

The only thing that changes during the elections is the number of votes for each candidate. So, theoretically, if we sent 8,000 numbers as updates each time, the browser could calculate the rest.

In practice, however, the votes are not available until much later. Only the winner / leading candidate is known. So we went for a single JSON file that captures the minimal data required for display.

Summary JSON

(Why JSON instead of CSV? Because it supports hierarchies, browsers parse it natively, and when gzipped, it's not much larger than CSV.)
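
A hypothetical shape for such a per-constituency summary might be the following; every key and value here is illustrative, not the real schema:

    {
      "updated": "2014-05-16T10:30:00+05:30",
      "constituencies": {
        "Constituency-1": { "leader": "Candidate A", "party": "Party X", "status": "leading", "votes": 123456 },
        "Constituency-2": { "leader": "Candidate B", "party": "Party Y", "status": "won", "votes": 234567 }
      }
    }

Keeping only the leader, status and (where available) votes is what keeps the file small enough to re-fetch frequently.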

This file grew from ~20KB (when all votes were 0) to ~65KB (after results were declared). When compressed, it is just 27KB, which makes the updates extremely light.

Summary JSON network

Filtering the data

Since all the data is available on the client side, all filtering also happens on the client side. To ensure that the filtered URL is shareable, we use history.pushState to change it, but do the computations in-browser without accessing the server.

The URL structure of the filter closely matches that of the data columns. For example, http://ibn.gramener.com/live?2014-Party=BJP filters those constituencies where the BJP is leading in 2014. These filters can be combined (so you can see the results of BJP + SS here) as well as applied independently (so you can see where the BJP won in rural India). This makes it intuitive for developers to add new filters as well.
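
A minimal sketch of this pattern: the 2014-Party parameter is from the example above, but allRows, redraw and the helpers themselves are hypothetical stand-ins:

    // Make the filtered URL shareable, then filter entirely in the browser.
    function setFilter(key, value) {
      var params = new URLSearchParams(window.location.search);
      params.set(key, value);
      history.pushState(null, '', '/live?' + params.toString());  // no server round-trip
      applyFilters(params);
    }

    function applyFilters(params) {
      var party = params.get('2014-Party');
      var rows = party
        ? allRows.filter(function (row) { return row['2014-Party'] === party; })
        : allRows;
      redraw(rows);  // re-render with the micro-template
    }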

It’s all in the browser

As a result of client-side rendering and filtering, the only time the user accesses the server is to fetch new data. We provided a refresh button on the top right, but for the anchors presenting on the PPI device in the CNN-IBN office, we needed an auto-refresh facility that would refresh at least once every 10 seconds.


So we built a secret ?refresh= parameter and applied it on this device. We did not apply auto-refresh for everyone in the morning, since we were worried about the traffic load the site could handle.

But once the traffic started decreasing at around 9:30am…

Traffic

… it became obvious that a refresh every 5 minutes wouldn't hurt, and would definitely improve the experience. We set it up to auto-refresh every 5 minutes by default.
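
A minimal sketch of this auto-refresh: the ?refresh= parameter is from the post, but the summary URL and the redraw call are hypothetical:

    // Honour a ?refresh=<seconds> parameter, defaulting to 5 minutes.
    var params = new URLSearchParams(window.location.search);
    var seconds = parseInt(params.get('refresh'), 10) || 300;

    setInterval(function () {
      // Only the small summary JSON is re-fetched; everything else is already loaded.
      $.getJSON('/live/summary.json', function (summary) {
        redraw(summary);
      });
    }, seconds * 1000);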

You can always dig deeper by exploring the source at view-source:http://ibn.gramener.com/live.

Tech behind the 2014 results page

On 16 May, we pulled off an interesting technical feat: serving 10 million pages' worth of election results in real time, staying ahead of many channels and websites. This is the technology behind ibn.gramener.com/live.

Data flow

We worked in partnership with CNN-IBN and Microsoft. CNN-IBN receives the data from Nielsen, who work with the Election Commission to push this data at regular intervals.

The latest results (specifically, whether a candidate is leading, trailing, has won or lost) are fed into an SQL Server database at CNN-IBN's data centre in Noida. During the early stages of the count, only this status is known. Later in the day, the actual vote counts start streaming in. This database is the single source of data across all CNN-IBN properties (including ibnlive.in.com) and Bing elections.

We took an old Windows laptop perched on a ledge at the (rather cold) data centre and installed the Gramener counting-day ETL scripts on it. The Python script pulls the data from the SQL Server every 10 seconds (0.25s to pull all the data we need) and converts it into 2 JSON files: one storing the candidates, and another storing their status / votes.

These files are copied via rsync every 10 seconds to ibn.gramener.com, an Azure VM in Singapore: a 4-core, 7GB RAM Ubuntu system.

This server has a copy of the Gramener visualisation server installed: an HTTP server with built-in analytics and visualisation capability. 4 instances of it were set up in a stateless, round-robin, load-balanced configuration. That is, when users request a file, they are sent to any one of these instances to balance the load.

We wrote a visualisation template that takes the data from the two files and renders it in real time into the visualisation that you see.

To reduce the load further, we set up nginx as a caching proxy in front. Repeated requests are cached by nginx and served directly. Again, we set up 4 worker processes.
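
A minimal sketch of such an nginx front, with 4 workers and the 4 upstream instances; the ports, cache path and timings here are our assumptions, not the production config:

    # Sketch only: ports, paths and timings are hypothetical.
    worker_processes 4;                       # one worker per core

    events {}

    http {
      proxy_cache_path /var/cache/nginx keys_zone=live:10m;

      upstream viz {                          # 4 stateless app instances, round-robin
        server 127.0.0.1:8001;
        server 127.0.0.1:8002;
        server 127.0.0.1:8003;
        server 127.0.0.1:8004;
      }

      server {
        listen 80;
        location / {
          proxy_pass http://viz;
          proxy_cache live;
          proxy_cache_valid 200 10s;          # data changes roughly every 10 seconds
        }
      }
    }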

You probably have several questions at this point about the choice of technologies. Here are some of the answers:

Why just one 4-core, 7GB RAM system? For the election campaign phase, we expected fewer than 100,000 hits every day. Further, the visuals were based on historical election data and would not change, making them highly cacheable. This configuration could (and did) handle that load with less than 1% CPU utilisation, which gave us a huge margin of comfort.

For counting day, however, the Bing team encouraged us to have another server in the US as a backup. We'd set that up, but never switched over the load balancing, thanks to last-minute running around.

But at 9am on counting day, we were very worried. The average CPU load was 50%. All 4 cores were maxed out periodically. If the traffic had kept growing the way it was, we would have been in trouble. But thanks to the decisiveness of the results, most people started losing interest by 10am. (Here's the hourly traffic.)

Traffic

Why nginx? Because, at the moment, it appears to be the server best able to handle a large number of concurrent requests. (We chose 4 workers because we had 4 processors.) Ubuntu was a logical extension of that choice: nginx runs best on Linux distributions. (Having said that, we have a number of instances running on Windows Server, both with and without nginx. But Microsoft, to their full credit, encouraged us to use whatever technology would give the page even the slightest edge.) Azure was easy: it was cheaper than Amazon.

Why Python? Mostly because we know and love Python. It also has a fairly efficient set of data-processing libraries that lend themselves to compact and readable code. The main function in the ETL script is under 150 lines, most of which are data mappings. Here's a sample of the code.

ETL code
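
A minimal sketch of such a polling ETL loop in Python; the connection string, table and column names are all hypothetical:

    # Sketch only: server, table and column names are hypothetical.
    import json
    import time

    import pyodbc

    conn = pyodbc.connect('DRIVER={SQL Server};SERVER=counting-db;'
                          'DATABASE=results;UID=reader;PWD=secret')

    while True:
        rows = conn.cursor().execute(
            'SELECT constituency, candidate, party, status, votes FROM live_results'
        ).fetchall()
        # Split into a static file (candidates) and a dynamic one (status / votes).
        candidates = [{'constituency': r[0], 'name': r[1], 'party': r[2]} for r in rows]
        status = [{'status': r[3], 'votes': r[4]} for r in rows]
        with open('candidates.json', 'w') as f:
            json.dump(candidates, f)
        with open('status.json', 'w') as f:
            json.dump(status, f)
        time.sleep(10)  # the post settled on a 10-second cycle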

Why every 10 seconds? We initially got this wrong. We thought 60 seconds would be fast enough for everybody. But when we tested the feed from Nielsen and compared it with TV, we found that a lag of even a few seconds was jarring. Since the script took under a second to run, we brought the frequency down to 10 seconds. After that, we were periodically even ahead of the TV results.

All of this was the engineering that went into the back-end to get the data to the browser quickly. On the front-end, another series of interesting engineering choices was made to ensure that the results stay as fresh as possible. We will follow up with a blog post on that later.