Documenting the life of this guy

How is everyone feeling?


Are stock investors bearish on Facebook?
Will the Australian dollar ever regain its strength?
How do Canadians feel about the upcoming elections?
What do Americans think about Donald Trump running for president?
What is the latest social trend amongst kids these days?
What about pop culture?

These questions are all about feelings of the population as a whole. But even though we live in this age where everyone has opinions on the internet, it is still really hard to find answers to higher level questions about sentiment. Individuals make posts on Twitter, communities have discussions on isolated forums, influencers are writing up articles for their blogs and thoughts from all over are constantly being uploaded into this online soup. But nobody ever gets a real taste of this online soup, because it’s so hard to get a complete picture.

Of course, there’s always the best search engine ever. We want to know how the Facebook stock will perform in the next month; here are some articles that might answer the question. But wait, those articles are written by individuals, who either formed opinions of their own or did some research to gather as much information as humanly possible to write those articles.

Regardless of how many articles we might read on a subject involving the population’s sentiment, we can only get a vague picture at best. If lucky, that vague picture extrapolates well into the actual picture and we make good decisions. If not so lucky, then we’ll just have to deal with it. No matter the outcome, to make any kind of decision based on population sentiment is nothing more than an educated gamble. This is of course, assuming that only technologies available today are used to survey the population.

Many companies employ market research teams. They want to make better decisions based on real data, so employees gather information from all over and do analysis to build a nice report. The same problem about population sentiment still exists if the question follows along the lines of “how many customers are we losing because we do not have a healthy salad option on our menu?”. But giant corporations like McDonald’s have less of a problem here since they have a nice big team researching that question. Not everyone that needs market intelligence can afford such a team.

We live in a world full of everyone else’s opinions. If we as individuals, groups, or organizations can get a good grasp of what all these opinions point toward, then we’ll start making more informed decisions and fewer gambles. Idea for my next project? I’ll try to prototype some kind of sentiment engine, starting with stocks and finance.

Maximum Business Value;
Minimum Effort


Dineapple is an online food delivery gig that I have been working on recently. In essence, a new food item is introduced periodically, and interested customers place orders online to have their food delivered the next day.

Getting down to the initial build of the online ordering site, I started to think about the technical whats and hows. For this food delivery service, a customer places an order by making an online payment. The business then needs to know of this transaction, and have it linked to the contact information of the customer.

Oh okay, easy. Of course I’ll set up a database. I’ll store the order details inside a few tables. Then I’ll build a mini application to extract this information and generate a daily report for the cooks and delivery people to operate on. Then I started to build these things in my head. But wait, there is a simpler way to make the operations people aware of orders. We could just send them an email on every successful transaction to notify them of a new incoming order. But this means the business loses visibility and data portability. Scraping relational data from a bunch of automated emails, although possible, would be a nightmare. The business needs to prepare to scale, and that means analytics.

Then I saw something that now looks so obvious I feel pretty embarrassed. Payments on the ordering service are processed using Stripe. When the HTTP request to process a payment is made, Stripe provides an option to submit additional metadata that will be tagged to the payment. There is a nice interface on the Stripe site that allows business owners to do some simple analytics on the payment data. There is also the option to export all of that data (and metadata) to CSV for more studying.
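To make this concrete, here is a minimal sketch of how an order could be tagged onto a Stripe payment from a Node.js backend. The order fields (customerName, deliveryAddress, item) are hypothetical stand-ins, not Dineapple’s actual schema:

```javascript
// Build the parameters for a Stripe charge, tagging the payment with
// order metadata so the Stripe dashboard and CSV export carry the
// operational details. Field names here are illustrative assumptions.
function buildChargeParams(order) {
  return {
    amount: order.amountCents,   // charge amount in cents
    currency: 'cad',
    source: order.stripeToken,   // card token from Stripe checkout on the client
    description: 'Dineapple order',
    metadata: {                  // free-form metadata stored with the payment
      customer_name: order.customerName,
      delivery_address: order.deliveryAddress,
      item: order.item
    }
  };
}

// Usage sketch (assumes a configured secret key):
//   const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY);
//   stripe.charges.create(buildChargeParams(order), handleResult);
```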

Forget about ER diagrams, forget about writing custom applications, forget about using automated emails to generate reports. Stripe is capable of doing the reporting for Dineapple; we just had to find a way to adapt the offering to fit the business’s use case.

Beyond operations reporting through Stripe, there are so many existing web services out there that can be integrated into Dineapple. To name a few: an obvious one would be using Google Analytics to study site traffic, and customers’ reviews of food and services could (and probably should) be handled through Yelp. Note that none of these outsourced alternatives, although significantly easier to implement, compromises on the quality of the solution for the business. Because at the end of the day, all that really matters is that the business gets what it needs.

So here’s a reminder to my future self. Spend a little more time looking around for simpler alternatives that you can take advantage of before jumping into development for a custom solution.

Engineers are builders by instinct, but that isn’t always a good thing.

Academics Complete

I’ve written many last exams before. But today I finish writing my last, last exam; graduation awaits. Life is pretty exciting right now. School has been an amazing experience, but being done feels much better.

Goodbye Waterloo; Hello World!

Shutdown: 4A study term

This term has been very unfruitful. I picked up League of Legends after an abstinence streak from DotA that lasted 4 good years. This kinda makes me sad. I’ve also lost a lot of motivation, especially with books and academia. It really isn’t the gaming that’s causing this. It is more just a lack of willpower to carry on doing something that seems so pointless. There’s a whole new post graduation world out there, with new and relevant things to learn.

I’ve really taken a liking to software development. It’s funny because in first year I remember believing that I could never picture myself sitting in front of a computer all day typing away. Yet here I am now, not knowing what else I would rather be doing.

I also remember having long-term plans for myself to run a self-grown start-up right after graduation. It’s not that I haven’t been trying. I have been working hard on these things over the past years, but nothing seems to have gained any valuable traction at all. With only 8 months left to graduation, this once long-term goal and deadline is suddenly approaching and colliding with the reality that it is unattainable. Such a realization kills the motivation to carry on pushing.

Visions of life after university used to be so bright and optimistic. But as the moment slowly approaches I realize how clueless I really am and that’s OK. Engineers are trained problem solvers; we figure things out, eventually.

Release Readiness Dashboard
Qb query Builder

The Release Readiness Dashboard chose Elastic Search as the source of Bugzilla data because it was fast and provided a convenient means of retrieving historic information for plotting trends. However, the native queries used to request data from Elastic Search clusters are long, ugly, and horrible JSON objects.

To solve the issue of dealing with ugly Elastic Search queries, the administrator of Bugzilla’s Elastic Search cluster has set up a helpful JavaScript library for web developers. It acts as a middle layer for requesting data from the Elastic Search cluster using Qb queries instead of native Elastic Search queries. The RRDashboard uses this Qb JavaScript library.

Qb queries, like Elastic Search queries, are JSON formatted objects. One might now ask what Qb queries are, and what advantage they have over native Elastic Search queries if both are just about as obscure as each other. For one, Qb queries are much shorter than native Elastic Search queries. They are also a lot more readable.
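As a purely hypothetical illustration (these are not the actual Qb or Elastic Search grammars, just a sketch of the difference in shape and verbosity):

```javascript
// A native-style Elastic Search query: deeply nested and noisy.
var esQuery = {
  query: { filtered: {
    query: { match_all: {} },
    filter: { and: [
      { term: { product: 'Firefox' } },
      { term: { status: 'NEW' } }
    ] } } },
  size: 0
};

// A Qb-style equivalent: flatter and closer to how people think.
var qbQuery = {
  from: 'bugs',
  select: { aggregate: 'count' },
  where: { and: [
    { term: { product: 'Firefox' } },
    { term: { status: 'NEW' } }
  ] }
};
```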

Even with that advantage, the problem of writing Qb queries still persists. The release management team will have significant difficulty adopting the RRDashboard if they are unable to form Qb queries that request the same data they currently pull using the Bugzilla search interface. I am now ready to introduce the Qb query builder.

When the release management team builds a query using the Bugzilla search interface, a long Bugzilla URL is generated containing the query parameters. As an input to the Qb query builder, this Bugzilla URL can be directly translated into a Qb query after specifying some other required parameters like the cluster to query against. In the background, here is what actually happens.

Upon submitting the form on the Qb query builder, a big JavaScript event handler is triggered. The search parameters are extracted from the submitted Bugzilla URL, then re-appended to the Bugzilla search interface’s URL. Using a cURL request, we receive an enormous HTML string containing the Bugzilla search interface’s page filled with the user-specified search parameters. Each individual page element is then scraped to extract the search parameters, which are appended into a JSON object formatted as a Qb query.
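The first step, pulling the search parameters out of the submitted Bugzilla URL, can be sketched roughly like this. This is a simplified stand-in for the builder’s actual code; the parameter names (product, bug_status) are real Bugzilla ones:

```javascript
// Extract the query-string parameters from a Bugzilla search URL.
// Bugzilla repeats keys for multi-valued fields, so values are
// collected into arrays.
function extractSearchParams(bugzillaUrl) {
  var params = {};
  var queryString = bugzillaUrl.split('?')[1] || '';
  queryString.split('&').forEach(function (pair) {
    if (!pair) return;
    var parts = pair.split('=');
    var key = decodeURIComponent(parts[0]);
    var value = decodeURIComponent(parts[1] || '');
    if (params[key] === undefined) params[key] = [];
    params[key].push(value);
  });
  return params;
}
```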

The benefit of such an implementation is that this is as simple as it will ever get for an end-user to work between Bugzilla and Qb queries. Unfortunately, there are also significant limitations to this implementation.

Scraping the HTML of the Bugzilla search interface implies that the Qb query tool is very dependent on how the views of Bugzilla are generated. If the HTML layout of the Bugzilla search interface changes, the Qb query builder will lose its accuracy and start missing search parameters in the output Qb queries. A possible alternative is directly parsing the Bugzilla URL without doing a cURL request to get the HTML. However, this can be a potentially complex task to handle and will require more time than I currently have as an intern.

Another limitation to the Qb query builder is that it can be buggy. Bugzilla’s search interface is a mature product that can handle many different combinations of search parameters. The Qb query builder, on the other hand, is new and may not be able to handle as many varieties of querying as Bugzilla can. As a result, until the Qb query builder matures enough to handle all the corner cases that Bugzilla may throw at it, users of the tool may occasionally have to modify the output Qb queries manually to get what they want.

Ideas for Web Applications

Once or twice in a 4 month term, I get a sudden rush of inspiration and ideas for cool things to take on as projects. This term at Mozilla there has been a lot of exposure to open sourcing, scraping the web for data, and software tooling and automation. This wave of ideas mostly revolves around those areas.

However, I usually get drained after the phase is done, which is probably why I don’t feel like doing anything right now. I’ve already created repositories for these projects on GitHub and will contribute to whichever interests me the most after this lazy phase is done.

  1. Tradester

    “Wouldn’t it be awesome to have a financial trading algorithm that anyone can write into and use freely?” This thought came to me a few years ago when I finished first year at university. I wanted to start an open-sourced script for an automated trading algorithm. The hope was for it to encompass the smartness of traders everywhere into one algorithm, for all who are smart enough to take advantage of it. Then I found MetaTrader4, a niche language which I never found the willpower to learn. So the idea died, or in terms of recent medical research, got put in suspended animation.

    Recently, I discovered that online brokers like Oanda and Robinhood (hopefully soon) are starting to offer REST APIs as one of their services to traders. Then it hit me that MT4 could be dropped completely by using common scripting languages like PHP instead. I.e. deploy the algorithm script on any web hosting service, set up CRON jobs, and start automated trading. It also helps that web development languages are more popular than the obscure MT4, which is important considering that this will be open sourced.

  2. StoryLine

    This started about a year ago. The idea was for a common social platform where story writers could come together and collaboratively create new stories that were open sourced: a GitHub for people who write in languages that were not meant for the computer.

    The project died mid-way when I realized that the product had taken a very bad approach to begin with. It was made to try and take on a lot of stories and writers at the same time, which led to a very confusing user experience because of all the empty “social” views. I was working through the version control (back-end) component of StoryLine, and Terry on the user interfaces, when the project was killed.

    I recently found inspiration from looking at the WordPress model. The code is open sourced, so anyone can deploy and install their own independent instance of WordPress. At the same time, non-technical users can simply use the hosted WordPress.com, where deployment is made simple. WordPress makes money from advertising on the sites and from writers who wish to export sites to their own hosting servers.

    Instead of a common platform that acts as a GitHub for writers, StoryLine is likely to be better off as a deployable web application for all end-users. Each instance of StoryLine hosts a single, independent story. At some point later, after maturity, a unified tool for users to create stories on easily can be set up, much like the existing WordPress model.

  3. Languify

    I love fontawesome. It’s open sourced, so simple, and adds so much value when used in the right context. Languify, like fontawesome, is an open-sourced CSS library that contains commonly used words and phrases in different languages. Languify enables developers to create views that can be adapted to any language they want, just by loading a .css file.

    For example, <span class="lf lf-hello"></span> shows “hello” when the “en.css” file is included on the page. When “en.css” is replaced with “zh.css”, the HTML page will display “你好” instead of “hello”. Aside from the Languify .css file being loaded, no other changes to an HTML page are needed to adapt it for a community that reads a different language.

    The repository currently contains some .css files with the words manually filled in. This is not scalable and I see it getting messy very quickly, especially if people start contributing to the library. For this project to get anywhere, a front-facing tool has to be made to manage the addition and editing of words in the language library. The tool will then automatically generate the [language].css files that will be pushed into GitHub.
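A sketch of what that generator could look like, assuming (as in the example above) that the library works by setting the CSS content property on .lf-* classes:

```javascript
// Given a map of keys to words for one language, emit the contents
// of that language's [language].css file. The .lf- prefix and the
// use of :before/content are assumptions about Languify's internals.
function generateLanguageCss(words) {
  return Object.keys(words).map(function (key) {
    // e.g. .lf-hello:before { content: "hello"; }
    return '.lf-' + key + ':before { content: "' + words[key] + '"; }';
  }).join('\n');
}

// Usage sketch: generateLanguageCss({ hello: 'hello' }) for en.css,
// generateLanguageCss({ hello: '你好' }) for zh.css.
```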

    Those are the project ideas that I will be building upon.
    If anything interests you, reach me!

Release Readiness Dashboard
Rules for Scoring

Looking back at groups of queries, we know that each individual group is a metric that the Release Management team cares about when deciding on the release readiness of that particular version. One of the key objectives of the dashboard is being able to automatically compute a release readiness score based on important metrics. In the screenshot above, we notice that two of the groups have titles with colored backgrounds (green), indicating the status of that group.

Statuses of each group on the dashboard are denoted by three colors: green, yellow, and red. To automatically determine the status of individual groups, computational rules must be scripted. In a future implementation, the individual group statuses in each version will also be aggregated to compute an overall release readiness score for that version.

To script a rule for any group, hover over the group and click on the tachometer icon as in the above screenshot. This will bring up a modal containing the boilerplate for scripting rules for that group, as shown below. The user copies the JavaScript in the text area and pastes it into a .js file in the specified directory. Scripting can begin immediately after that, with inline comments available to provide guidance if necessary. Note that if the user scripting a rule does not have deployment rights, the stack owner must be contacted at the end to upload the script file.

In the background, every time a version view is loaded the server checks for the existence of /assets/rules/rule_[group_id].js for each group that will be loaded. If found, the script file is loaded on the view and executed when data is returned from the Elastic Search cluster that mirrors Bugzilla. The resulting status color from executing the rule is then applied to its corresponding group’s title. With this architecture, we reap the benefits of:

  • Scripting of rules enabled for both default and custom groups.
  • Flexible scripting for defining rules with any level of complexity.
  • A rule exists for a group only when /assets/rules/rule_[group_id].js has been uploaded; deleting the file removes the rule. In other words, scripting a rule for a group is optional.
  • Anyone is free to access the boilerplate and start scripting a rule, but only the stack owner has deployment rights.
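As a rough idea of what a scripted rule might look like (the function name, argument, and thresholds below are all hypothetical; the real interface is defined by the boilerplate in the modal):

```javascript
// Hypothetical rule for a "tracked blockers" group: map the count
// returned from Elastic Search to one of the three status colors.
function computeGroupStatus(openBlockerCount) {
  if (openBlockerCount === 0) return 'green';
  if (openBlockerCount <= 5) return 'yellow';
  return 'red';
}
```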

Mouse / Moose
Taste of Good Old Times

Today I met Erasmus again after almost 4 long years. Back in Singapore, I remember we would hang out with the usuals every other day for lunch/dinner/12am meals. How time has gone by since then. I heard that everyone was doing well and we had a great time catching up.

Ever since moving to Canada I have often wondered whether old friends will meet again and, despite the years, still have a great time hanging out. I occasionally dream about meeting my old friends and feeling awkward in the moment, because we’ve all changed and been disconnected for so long. The image of these friends stays frozen in my mind exactly as they were when I last saw them years ago.

Yes, indeed we have changed. The friends we once thought were the best people ever have grown, matured, started new careers, gotten new experiences and built new lives. But friends will always be friends. Meeting with Erasmus today felt good, and sort of heartwarming. It’s like watching Toy Story 3 after many years since Toy Story 2, but way better because this is real life.

Release Readiness Dashboard
Groups of Queries

One of the key requirements for the Release Readiness Dashboard was to have multiple widgets of information. Each widget was to contain visuals for at least one data set. In addition, there was also a need for a numeric view that only displays the current count in cases when a view of historic data is not necessary. The above design was the result of those requirements.

Interpreted in terms of what is stored in the database, each grid on the view represents a group that can have multiple queries, i.e. a one-to-many relationship. A group can be specified to be a plot, a number, or both, which eventually affects how it will be loaded on the front view.

Queries are stored in the database as JSON formatted strings with the Qb structure, and are used to dynamically request data from Elastic Search each time a page is loaded. No Bugzilla data is stored locally on the dashboard’s server. After the data is retrieved from Elastic Search, a plot (or number, depending on the group’s type) is generated on the fly and appended onto the view.

Moving forward, another requirement was to have default groups that display for every version of a specific product. Being linked through group.version_id meant that a new group had to be created every time a new version was set up. The solution to this problem: polymorphic objects.

With each group as a polymorphic object, we are able to link rows in the same group table to different entities like product and version. In terms of changes to the schema, we simply remove group.version_id and replace it with a generic group.entity and group.entity_id. Note that an actual FOREIGN KEY constraint cannot be enforced here.

We are now able to define default groups by linking them to a product as polymorphic objects. However, because a single product group and its corresponding queries can now be used across different versions, some version-dependent text in fields like query.query_qb must be changed dynamically before it can be useful.

To solve this issue, soft tags like <version_tag> and <version_title> are stored in place of the version-dependent text and replaced with the corresponding version’s values when requested from the server.
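The substitution itself is simple. A sketch of what the server-side replacement might look like (the function is illustrative; the tag names are the ones described above):

```javascript
// Replace the soft tags in a stored Qb query string with the values
// of the version being requested. split/join replaces every occurrence.
function resolveSoftTags(queryQb, version) {
  return queryQb
    .split('<version_tag>').join(version.tag)
    .split('<version_title>').join(version.title);
}
```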

So here we have it; Multiple groups that each display multiple data sets:

  • With the option to display the group as a plot or a current count.
  • With the option to display the group for an individual version (custom group) or for all versions of a product (default group).



Release Readiness Dashboard
Running with the Train

Simply put, the release readiness dashboard is an overview of various versions of Mozillian products Firefox, Fennec, and Firefox OS. It provides a trend (or current number) of the various kinds of bugs that the release management team cares about when determining whether or not a particular version is on track to be ready before the scheduled release date. The product versions appearing on the dashboard are displayed automatically by following the train model described in the rapid release calendar.

My first attempt to capture the train model in the dashboard’s database worked but was very inefficient and had serious redundancy issues. The following tables and fields are a part of the initial database schema:

  1. product – id, title
  2. version – id, title, product_id, central, aurora, beta, release, deprecate

For example, if we want the version that is currently on central, the current timestamp is checked for being between version.central and version.aurora, which are both timestamps. Based on that logic, the scripts are able to automatically identify the currently active versions.
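That check can be sketched as a simple predicate. Field names follow the schema listed above; treating the window as inclusive of central and exclusive of aurora is an assumption:

```javascript
// A version is on central if the current time falls between its
// central and aurora timestamps.
function isOnCentral(version, now) {
  return now >= version.central && now < version.aurora;
}
```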

However, this means an awful lot of timestamp comparisons, and also a lot of the same dates being stored (in different columns of succeeding rows). How can we better capture the train model so that the database does not store redundant data?

After some brainstorming and discussions with both Bhavana and Lukas, a solution for a normalized data schema was found. See the above for the finalized solution in the form of an ER diagram.

A CRON job runs on every reasonable period. In the script, we check for a new train cycle by reading from an external data source. If a new cycle is found, it is inserted as a new row in the cycle table. Following that, new rows are made in the version table for each product that will have a version entering the central channel. Now, the mapping begins. The newly created versions are mapped to the newly created cycle as being in the central channel. Previous versions are also bumped up a channel and mapped to the newly created cycle. A row in version_channel_cycle can be interpreted in English as: this particular version is in this particular channel for this particular cycle.
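A sketch of the channel bump, assuming an ordered channel list and returning the new mappings rather than writing rows to version_channel_cycle directly as the real script would:

```javascript
var CHANNELS = ['central', 'aurora', 'beta', 'release'];

// Compute the version_channel_cycle rows for a newly found cycle:
// brand new versions enter on central, existing versions each move
// up one channel, and versions already on release drop off the train.
function bumpChannels(previousMappings, newVersionIds, newCycleId) {
  var mappings = [];
  newVersionIds.forEach(function (id) {
    mappings.push({ version_id: id, channel: 'central', cycle_id: newCycleId });
  });
  previousMappings.forEach(function (m) {
    var next = CHANNELS.indexOf(m.channel) + 1;
    if (next < CHANNELS.length) {
      mappings.push({ version_id: m.version_id, channel: CHANNELS[next], cycle_id: newCycleId });
    }
  });
  return mappings;
}
```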

When retrieving data, this means that we only have to do a between-timestamps comparison once, in the cycle table; we then get the matching cycle, which is used for the other database queries. Convenient. In terms of storing data, this also means inserting rows every time a new cycle is entered, as opposed to updating version rows over and over again, which can get messy. The same dates are also stored only once now, so the redundancy is gone.

All things considered I’m quite happy with the new database schema. Big thanks to Lukas for pushing me to rethink my initial design and walking me through when I was lost.

Perhaps at some point in future I will write more about the other aspects of the database that deal with the storage and use of groups and queries. This project just keeps getting more awesome. Excited to see it go live.