Sunday, August 21, 2016

As good as it gets: Measuring Product Quality

Working in Business Analytics, pretty much everything we measure can fit into one of three buckets: 


How much 𐄿 How Fast 𐄿 How Good




This post is all about measuring "how good".  While it is written in terms of software, you can apply the same framework to anything.

What is "Good"?


In software, "how good" is sometimes referred to as the "quality" of the software.  You can typically group any quality measure into one of two categories: objective quality and subjective quality.  First, an overview of each type...


Stolen from http://keydifferences.com/difference-between-objective-and-subjective.html

Objective Quality

Does the software do what its specifications dictate?

You order an orange from a restaurant and they return one that is rotten.  That orange is objectively wonky.  Don't eat that orange.

Subjective Quality

Does the thing meet the expectations of the person measuring quality (agent)? 

Note: it's really hard to measure subjective quality without first defining the "agent" who has the expectations.

Businesswoman compares fruit
You order an orange from a restaurant and the waiter gives you an apple.  It might be a fantastic apple, but you wanted an orange.  From what I read, those two fruits are incomparable.

Objective Quality


When measuring objective quality, you will start with some specification for what good software is. These specifications may be outlined in your scoping document or tracked in some other way.  However you keep track of the specifications, it's important they are explicitly documented and unambiguous.  


In software, there are two types of specifications: functional and nonfunctional.

Functional Specifications

Your product has specifications that tell you what it does and how it behaves. 

When I click on this button, the "edit your profile" page loads.  When the pH goes above 8, the water turns green.  The 3/4 inch screw is 3/4 inches long.

Nonfunctional specifications

Your product has specifications that describe how reliably it operates.

The uptime of the website is 99.999%.  The latency of the website is 200ms.  The car doesn't burst into flames when making a left turn. 
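A nonfunctional spec like 99.999% uptime is easier to reason about once it is converted into allowed downtime. Here is a minimal sketch of that arithmetic; the function name is my own and I'm assuming a 365-day year.

```python
def allowed_downtime_minutes(uptime_percent, period_days=365):
    """Return how many minutes of downtime an uptime target permits."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# 99.999% uptime leaves roughly 5.26 minutes of downtime per year.
print(round(allowed_downtime_minutes(99.999), 2))
```

Framing the spec this way gives you something you can actually check against an outage log.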

Subjective Quality


Unlike objective quality, subjective quality is a measure of how well the product meets the needs or expectations of the stakeholders.  The term "stakeholder" is very broad: it can be any party somewhere in the value-chain.  To keep it simple, we'll focus on the product user for now and call them "agents".


Oh, beautiful sandwich woman.  Who are you?
Example: There's a very disappointing restaurant I eat at.  The sandwiches I order meet their specs: 6 inches long and constructed with the correct ingredients.  Still, I'm usually disappointed when the sandwich doesn't look like the one in the ad with a beautiful woman taking a bite.  

Just like objective specifications, setting the right subjective specifications and documenting them is a critical step to achieving them.  Make sure you communicate with the agents early and often!

Now what?

If you're looking for ways to measure and improve the quality of your product, I've found a lot of success with organizing specs into the framework above. With a comprehensive way to think about quality, it is easier to discuss objectives, set targets, and achieve results.  

Try it out and let me know what you find!

If you'd like to read more about measuring "how good", I recommend this great article.

Monday, August 15, 2016

ETL: Friendly Robot or Ticking Time Bomb?

An ETL is a friendly robot that converts data into information
This friendly robot was drawn by HyperPunch84 on newgrounds

Overview


This post outlines a way I structure ETLs to keep them friendly and maintainable instead of ticking time-bombs, waiting to blow up my week.  Note that this is a fairly technical post.  If this isn't up your alley, you have my blessing to skip this one. Next week we'll get back to some non-technical riffraff. 


What is an ETL?



Despite what it looks like in the image, information is what you want
https://en.wikipedia.org/wiki/Digesting_Duck
If you remember my last post, data is the raw material that must be processed before it has any analytical value.  An ETL is the process that converts data to precious Information.  This is a critical step for business analytics because we need to do analysis on information, not data.

ETL: Extract, Transform, Load


  • Extract: the script takes in data from somewhere.
    This is kind of like downloading a report to your computer.  Maybe it's a cloud service like Salesforce. Maybe it's a log file, like the high quality, artisanal log files lovingly generated by FreeWheel (plug).
  • Transform: changes the data's format and applies business logic.
    This is kind of like automating a pivot table or calculations from an Excel report.
  • Load: deposits the formatted information into an accessible place.
    This is kind of like saving the results of your Excel "transformation" into a master Excel file you use for analysis.
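To make the three steps concrete, here is a minimal sketch using only the Python standard library. The sample report, its column names, and the function names are all made up for illustration; a real extract would pull from a service or a log file instead of a string.

```python
import csv
import io
from collections import defaultdict

# Made-up raw report standing in for a downloaded file.
RAW_REPORT = """region,revenue
east,100
west,250
east,50
"""


def extract(raw_text):
    """Extract: read raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: total revenue per region, like a one-field pivot table."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += int(row["revenue"])
    return dict(totals)


def load(totals, destination):
    """Load: write the summarized information to a file-like destination."""
    writer = csv.writer(destination)
    writer.writerow(["region", "total_revenue"])
    for region, total in sorted(totals.items()):
        writer.writerow([region, total])


output = io.StringIO()
load(transform(extract(RAW_REPORT)), output)
print(output.getvalue())
```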


How I structure ETLs


When first learning to write ETLs, I didn't know how to make them readable until someone else was hired to read them.  My most maintainable and extensible ETLs all now follow this 3-file format.  It's been effective for me and the other analysts I've shared it with.  


The gist: there is one file for the main flow of the program, another for all the functions that take more than one line, and a third for all predefined variables.
When something goes wrong, the main file acts as a table of contents to isolate the error. Using this structure, when the ETL needs to be extended I'm usually just a config change away.

Main

The main file is for the flow of your ETL.  This is where you'll call the functions in the right order and iterate over the objects being transformed.  It's ok for this file to be small and straightforward.  

Pseudocode main example:

# etl_object_list is an iterable list of things to extract
from configuration_file import etl_object_list

from function_file import extract_function, transform_function, load_function

for etl_object in etl_object_list:
    extracted_file = extract_function(etl_object)
    transformed_file = transform_function(extracted_file)
    load_function(transformed_file, etl_object)


Function

The function file is where I put all the supporting functions.  This makes my code MUCH easier to read.  When something is off, I can easily follow the program flow in the main file then jump into the details here.  

Pseudocode functions example:

# platform_sdk is a stand-in for whatever SDK your data source provides
import platform_sdk


def extract_function(etl_object):
    # get the request parameters to call the proper extract function
    sdk_request_parameters = etl_object.request_params

    # call the SDK with the right parameters to get the data
    extracted_file = platform_sdk.get_report(sdk_request_parameters)

    return extracted_file


def transform_function(csv_out):
    # add one to each value of csv_out for some reason
    # (assumes csv_out is a list of numbers)
    transformed_file = [value + 1 for value in csv_out]
    return transformed_file


def load_function(transformed_file, etl_object):
    # get the location where the file should be loaded
    # (assumes load_location is a file path)
    load_location = etl_object.load_location

    # put the transformed_file into load_location, one value per line
    with open(load_location, "w") as output:
        output.write("\n".join(str(value) for value in transformed_file))

    return True


# This is the class for etl_objects.  It stores all the object variables
# in a nice place.  This could probably live in the config file if you want.
class etl_object:

    def __init__(self, request_params, load_location):
        # the parameters for the SDK's request
        self.request_params = request_params

        # the location to load the transformed data
        self.load_location = load_location


Config


The config file is where you keep the defined variables that your program needs to run. Originally, I played around with throwing this all in the function file, but it got too messy. 

When done right, this is the only file I need to update.  Need to add more objects?  Define them and add them to the list.  Need to halt the ETL on an object?  Cut it from the list.  Ever need to handle a new data type?  If I've coded the functions correctly, I can manage it all from the config file.

Pseudocode config Example:
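from function_file import etl_object

# request_params_* and load_location_* are placeholders for your own values.
# Define each object first, then add it to the list.
object_1 = etl_object(request_params_1, load_location_1)
object_2 = etl_object(request_params_2, load_location_2)

etl_object_list = [object_1, object_2]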
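If you'd like to run the structure end to end, here is a single-file sketch with the three "files" marked by comments. Everything concrete in it is an assumption for illustration: FAKE_REPORTS stands in for a real SDK, WAREHOUSE stands in for a real load destination, and the report and table names are invented.

```python
from dataclasses import dataclass

# --- function file ---------------------------------------------------------

FAKE_REPORTS = {"daily": [1, 2, 3], "weekly": [10, 20]}  # stand-in for an SDK
WAREHOUSE = {}  # stand-in for wherever the results get loaded


@dataclass
class EtlObject:
    request_params: dict
    load_location: str


def extract_function(etl_object):
    """Look up the raw report the object asks for."""
    return FAKE_REPORTS[etl_object.request_params["report"]]


def transform_function(extracted):
    """Add one to each value, matching the toy transform above."""
    return [value + 1 for value in extracted]


def load_function(transformed, etl_object):
    """Store the transformed values under the object's load location."""
    WAREHOUSE[etl_object.load_location] = transformed
    return True


# --- config file -----------------------------------------------------------

etl_object_list = [
    EtlObject({"report": "daily"}, "daily_table"),
    EtlObject({"report": "weekly"}, "weekly_table"),
]


# --- main file -------------------------------------------------------------

def run(etl_objects):
    for etl_object in etl_objects:
        extracted = extract_function(etl_object)
        transformed = transform_function(extracted)
        load_function(transformed, etl_object)


run(etl_object_list)
print(WAREHOUSE)
```

Adding a third report here only means adding one more entry to etl_object_list, which is the whole point of keeping the config separate.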

BONUS: Some ETL "Things I Should Have Known Earlier" (TISHKE)

http://www.newyorker.com/cartoons/a16089

I'm not sure these will be applicable to anyone else, but here's a list of things that would have saved me a lot of time if I had known them when I started.

TISHKE 1: Pick a sensible programming language


Don't try to impress your friends with some fancy language like Perl.  Just pick something that works, like Python.  In fact, just pick Python.  

  1. Python code is human-readable.  Readability is always more important than you think.  Remember, your scripts will be read more times than they are written. 
  2. Python is very easy to learn.  Not only is it accessible, but there are a TON of valuable resources.
  3. Two Python libraries will probably do most of your work for you (numpy / pandas). I put off learning pandas because I always thought it would be faster to write it myself. I was wrong. The turning point was realizing every function that took me longer than an hour to write already existed in pandas. I was literally wasting time making a shittier version of something I already had.
  4. The Python community is passionate and friendly.
    This is Ravi.
    We became friends geeking out over Python at a Greek restaurant.
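As a small illustration of the pandas point, here is the kind of helper I used to write by hand next to the pandas version. The sample data and function names are made up; groupby and sum are standard pandas calls.

```python
import pandas as pd

# Made-up sample report.
report = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "revenue": [100, 250, 50, 25],
})


def total_by_region_by_hand(frame):
    """The hand-rolled version: loop over rows and accumulate totals."""
    totals = {}
    for region, revenue in zip(frame["region"], frame["revenue"]):
        totals[region] = totals.get(region, 0) + revenue
    return totals


def total_by_region_pandas(frame):
    """The same result using what pandas already provides."""
    return frame.groupby("region")["revenue"].sum().to_dict()


print(total_by_region_by_hand(report))
print(total_by_region_pandas(report))
```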

TISHKE 2: Stop being cute and get er done


When I started coding ETLs, I would take incredible pride in doing something in a clever way when a simpler way would have sufficed. More than once I got so hung up on doing something clever that I lost track of the goal entirely and had to throw away what I wrote once it was done. The point is, I'm much more productive when I stay goal oriented and there's always time for fun projects after work.  


TISHKE 3: Python Requests are easy, don't be afraid of them


After relying heavily on partner SDKs to handle my HTTP requests, I ran out of options one day and had to implement a GET request in Python. Turns out the requests library I was avoiding for so long is dirt simple. All said, the entire ETL took 4 hours from reading documentation to deployment; ~50% faster than simple ETLs typically take me.  
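For a sense of how little code a GET request takes, here is a minimal sketch. The URL and parameters in the usage comment are placeholders, and I'm assuming the endpoint returns JSON; requests.get, raise_for_status, and json are the library's real calls. The get argument exists only so the function can be tested without a network.

```python
import requests


def fetch_report(url, params, get=requests.get, timeout=30):
    """GET a report and return the parsed JSON body."""
    response = get(url, params=params, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of silently loading an error page.
    response.raise_for_status()
    return response.json()


# Usage (placeholder URL, not a real endpoint):
# rows = fetch_report("https://example.com/api/report", {"date": "2016-08-15"})
```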

TISHKE 4: Write readable, commented code you doofus!


Think you don't have time for comments? Well, you do. I'll tell you who doesn't have time: the you from the future who just caught a bug in your code.  Remember to comment your code and write unit tests while you still remember what the code does.
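Here is a minimal sketch of what that can look like, using the toy "add one to each value" transform from the ETL example and Python's built-in unittest module. The test names are my own.

```python
import unittest


def transform_function(values):
    """Add one to each value in the extracted data."""
    return [value + 1 for value in values]


class TransformFunctionTest(unittest.TestCase):
    def test_adds_one_to_each_value(self):
        self.assertEqual(transform_function([1, 2, 3]), [2, 3, 4])

    def test_empty_input_returns_empty_list(self):
        self.assertEqual(transform_function([]), [])


# Run the tests in-process and keep the result object around.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TransformFunctionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Future you gets a docstring that says what the function does and two tests that prove it still does.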

Sunday, August 7, 2016

Business Analytics: Getting Value From Data

Amazing view from Haozhi's Beijing apartment (PRISMA)

Business Analytics exists to...

On my current business trip to our Beijing office I was asked to give an overview of what Business Analytics does. It's a very good ask; Business Analytics is a relatively new function and means different things at different companies.

At FreeWheel, business analytics exists to...
Manage and analyze Enterprise Data to help decision makers make better decisions.[1] 
To most of you that probably sounds like white noise: a wave of words with very little information. The goal of my Beijing presentation is to make that statement meaningful; this blog post is an overview of part of that presentation.


I'll start by reviewing a framework that describes the phases of data refinement (how data gains value). Then I'll introduce an approach of applying knowledge to drive business results.

[1] Philosophically, we strive for a hybrid analytics culture where Bus. Analytics may do the analysis or may empower other departments to do their own by accessing our managed, central EDW (enterprise data warehouse). More on this in a future blog post.  


Data, Information, Knowledge Framework


Data, Information, and Knowledge are words that are colloquially used interchangeably, but have very specific meanings in Business Analytics.  They each describe a different phase in the transformation of facts into insight.  Yes, this sounds painfully pedantic.  Stay with me for a minute because it's quite useful for understanding how to refine raw measurements. 

Full disclosure: DIK is actually 3/4 of the DIKW pyramid (Data, Information, Knowledge, Wisdom), but I prefer just the first three for this purpose.
Here's a short story that describes the terms above (stolen from this video):
I think this is what a factory looks like.  Don't ask me, I make software.
Imagine at 8AM you are walking down a factory floor.  You are walking along a pipe.  On that pipe you come across a pressure gauge; it reads 15 PSI.  This is a fact.  That is data.
  • data:  Data is raw, unorganized facts[2].
    At 8AM, on that specific pipe, the pressure is 15PSI.

You continue walking for a while.  Then you go up some stairs and into a room.  This room is the control center and it has a monitor in it.  
The monitor shows a graph of pressure over time, and it shows the pressure in that pipe is rising very rapidly.  That is structured data.  That is information.
  • information:  Aggregated, organized, structured data presented with context[2].
    Between 7AM and 8AM, the pressure in the pipe you walked past has risen evenly from 1 PSI to 15 PSI.
Interesting, the pipe pressure is increasing.  What do you do?  Well, it's hard to say because you don't have any expectations or understanding of this pipe.  Does this pipe do this every morning?  Maybe this isn't normal, and the pipe is in danger of critical failure!  Maybe that's what you want because you're testing what happens under critical failure.  The point is, that information is only valuable if you have a conceptual model for the situation.  That is knowledge.
  • knowledge:  Knowledge is a collection of information, beliefs, and expectations that form understanding.  It is the most refined and useful of the three.  With knowledge, we know what actions will have the best outcome.  [4]
    The pipe shouldn't be doing this!  The pipe is in danger!  Emergency release!
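The pipe story maps neatly onto code, so here is a minimal sketch of the three phases. The date, the 5 PSI per hour threshold, and all the names are assumptions I made up for illustration; only the 7AM and 8AM readings come from the story.

```python
from datetime import datetime

# Data: raw facts, each a (timestamp, PSI) reading from the gauge.
readings = [
    (datetime(2016, 8, 7, 7, 0), 1.0),
    (datetime(2016, 8, 7, 8, 0), 15.0),
]

# Assumed expectation about this pipe; this is what turns information
# into knowledge. The number is invented for the example.
MAX_EXPECTED_RISE_PSI_PER_HOUR = 5.0


def pressure_trend(gauge_readings):
    """Information: PSI change per hour between the first and last reading."""
    (start_time, start_psi) = gauge_readings[0]
    (end_time, end_psi) = gauge_readings[-1]
    hours = (end_time - start_time).total_seconds() / 3600
    return (end_psi - start_psi) / hours


def recommended_action(rise_per_hour, max_expected=MAX_EXPECTED_RISE_PSI_PER_HOUR):
    """Knowledge: compare the trend against what we expect from this pipe."""
    if rise_per_hour > max_expected:
        return "emergency release"
    return "keep monitoring"


trend = pressure_trend(readings)
print(trend, recommended_action(trend))
```

Without the expectation constant, the trend is just a number; the decision only appears once a model of "normal" is written down.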

Data Refinement Flow

The take-away is: 
  1. We want knowledge because knowledge is the only thing that can influence decisions.  In other words, only knowledge has any value.
  2. Knowledge can be gained through analysis of data and the subsequent interpretation of information.  

To Find Answers, Start with a Question

The DIK framework outlines the technical approach of generating knowledge.  Before I wrap up this post, I wanted to quickly introduce how to practically generate business value from that knowledge.

The above DIK framework makes it sound like you can put a bunch of measurements into one end of a machine and shoot out decisions from the other (at least that's what it sounded like to me!).  However, the only way for those answers to deliver business value is if they inform some decision that drives some action with business value. A good way to ensure that happens is to start from the result we want to achieve and work backwards.
This image is borrowed from Peter Murray

  1. Start by identifying the desired results.  This may be a specific Operating Plan or goals from an initiative.  You may begin with a general idea like: do X better, but it's critical to be specific.  Being specific enables result tracking and ensures different departments are aligned on what success looks like.
  2. Next, consider the actions you believe are required to achieve those results.  These can be broad like: make people happier, or specific like: change our pricing to increase revenue.
  3. Then, consider questions that inform those actions.  Those are the questions you want to answer with knowledge.  These must be testable and quantifiable.  
There's a lot more to this approach.  Stay tuned for more in future blog posts.

Business Analytics in a Nutshell

I found this handsome guy in a Beijing 7-11 today
That's what Business Analytics does at a very high level.  On one side, we will work with decision makers to build understanding around decisions they need to make.  On the other side, we build and manage the enterprise's Information that helps inform those decisions.  By being data-informed, FreeWheel makes better decisions.  

Sorry for such an abrupt stop to a meaty topic.  Expect a future post on how you tactically do this.  


Monday, August 1, 2016

How it worked; Hacking in Seattle



At NBCU's hackathon, the Advengers assembled to build adHarmony: a second screen app for advertising. You can read about what it does in last week's blog post "What I built hacking in Seattle".



Revealing how the trick is done

This post is all about how adHarmony worked.  You will probably find this more technical than my last two posts, but I'll try to keep it accessible.


a VERY quick VERY high level recap of adHarmony


adHarmony is an app that runs on a phone and listens for ads being played on some other device (a TV, another computer, whatever).  When adHarmony hears an ad, it gives the viewer a prompt to provide feedback and engage with or skip the ad.  The app delivers information about the user's feedback to a backend service.  

adHarmony Project Scope

  1. adHarmony can hear and recognize nearby playing ads
  2. adHarmony gives the viewer options specific to that ad
  3. If the user chooses to skip the ad, skip the ad
  4. Whatever the user chooses, keep track of what was selected

Now that you're caught up, the rest of this post will make a lot more sense!

adHarmony can hear and recognize nearby ads




How does adHarmony know when an ad is playing, and not the content?  The secret is: the ads we used in the demo are special.  




LISNR: the real hero

Using LISNR, we put a high-pitched, inaudible tone into the ads we demoed.  Humans can't hear it, but adHarmony (and apparently dogs) can.  
Supposedly, dogs hear a low hum

In each tone is encoded a secret message that the app hears.  Our friend Jill from LISNR described it as "an intelligent audio QR code".[1]



adHarmony gives the viewer options specific to the ad



When the app hears the secret message, it tells LISNR the secret message.  LISNR then responds with a specific message we configured ahead of time. The message tells adHarmony what to show for this ad.  Namely...
  1. what the swipeable image should look like
  2. what web address to load when the user swipes left
  3. what web address to load when the user swipes right
You wouldn't notice, but the app loads a webview after the user swipes.
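To picture what that pre-configured message might contain, here is a minimal sketch. To be clear, this is not the LISNR API: the tone ID, the URLs, and every name below are hypothetical, and a plain dictionary stands in for the lookup that LISNR performed for us.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdPrompt:
    image_url: str        # what the swipeable image should look like
    swipe_left_url: str   # web address to load on a left swipe
    swipe_right_url: str  # web address to load on a right swipe


# Hypothetical stand-in for the messages configured ahead of time.
PROMPTS_BY_TONE = {
    "tone-001": AdPrompt(
        image_url="https://example.com/ads/demo.png",
        swipe_left_url="https://example.com/skip",
        swipe_right_url="https://example.com/learn-more",
    ),
}


def url_for_swipe(tone_id, direction):
    """Return the address the webview should load for a swipe."""
    prompt = PROMPTS_BY_TONE[tone_id]
    if direction == "left":
        return prompt.swipe_left_url
    if direction == "right":
        return prompt.swipe_right_url
    raise ValueError(f"unknown swipe direction: {direction}")


print(url_for_swipe("tone-001", "right"))
```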

If the user chooses to skip the ad, skip the ad


WARNING: below is some real inside baseball.  The gist is: for the demo, we replaced the normal program that handles ad playback on NBC's website with a custom program that waits for our server to tell it to skip an ad.

For the demo, we changed how an NBC video player loads on our computer using a web proxy called Charles. We actually changed the normal functionality in two ways:  
  1. First...ya know how a website might load any video ad when you load it?  For our purposes, we made sure a specific ad (the demo ad) was returned on each page load.  
  2. Second, we loaded a special version of ad manager we tweaked for the hackathon.  The ad manager is the piece of code that coordinates video ad playback on a website.  

How to force the ad you want to return is pretty basic testing functionality that FreeWheel supports.
The functionality we snuck into the special ad manager is noteworthy!

What makes this ad manager special is that it continually pings our dev server asking if it can skip the ad. 

If the dev server ever says "ok", the ad manager stops playing the ad and does whatever it's supposed to do next: play another ad or go back to content.  The dev server only says "ok" when the adHarmony app has received an ad skip request.
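The polling logic is simple enough to sketch. This is a Python illustration of the idea, not the ad manager code we ran in the browser; the function name, the half-second interval, and the ask_server_can_skip callable are all assumptions.

```python
import time


def play_ad_until_skipped(ask_server_can_skip, ad_length_seconds,
                          poll_interval_seconds=0.5, sleep=time.sleep):
    """Poll the server while the ad plays.

    Returns "skipped" if the server says ok before the ad ends,
    otherwise "completed".
    """
    elapsed = 0.0
    while elapsed < ad_length_seconds:
        if ask_server_can_skip():
            return "skipped"
        sleep(poll_interval_seconds)
        elapsed += poll_interval_seconds
    return "completed"


# Simulated server that says ok on the third ping; sleeping is stubbed out.
answers = iter([False, False, True])
print(play_ad_until_skipped(lambda: next(answers), 30, sleep=lambda seconds: None))
```

Polling is crude compared to having the server push a message, but for a hackathon demo it is hard to beat for simplicity.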


Tracking Usage

Whenever the user app hears an ad or has some interaction, the app sends that message to the dev server and it's logged in a database.  We built out the backend, but there's nothing too special to talk about.  Nothing flashy or over-designed.  Just a Proof of Concept with a pretty basic database schema.
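For a sense of how basic "pretty basic" can be, here is a sketch of an event log using SQLite from the Python standard library. The table name, columns, and event types are my own guesses for illustration, not the schema we actually built.

```python
import sqlite3


def create_event_log(connection):
    """Create a single table that records every app event."""
    connection.execute(
        """
        CREATE TABLE IF NOT EXISTS ad_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            ad_id TEXT NOT NULL,
            event_type TEXT NOT NULL,
            created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )


def log_event(connection, ad_id, event_type):
    """Record one event, such as the app hearing an ad or a user swipe."""
    connection.execute(
        "INSERT INTO ad_events (ad_id, event_type) VALUES (?, ?)",
        (ad_id, event_type),
    )
    connection.commit()


connection = sqlite3.connect(":memory:")
create_event_log(connection)
log_event(connection, "demo-ad", "heard")
log_event(connection, "demo-ad", "skip")
rows = connection.execute(
    "SELECT ad_id, event_type FROM ad_events ORDER BY id"
).fetchall()
print(rows)
```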

Putting it all together


  • 1&2. When the webpage loads, it gets the special ad manager and the forced ad response from "the FreeWheel ad server" (actually local via Charles web proxy).
  • 3. While the ad plays, the adHarmony app hears the LISNR code
  • 4. AdHarmony pings the LISNR service and renders what returns
  • 5. When adHarmony hears an ad, or whenever a user interacts, our dev server logs that event
  • 6. While playing an ad, the ad manager continuously checks with the dev server to see if it should skip the ad.  When the dev server says ok, the ad manager skips that ad.

Advengers! Assemble!


So that's how it works.  I was thinking of ending the blog post with where we could go from here or what I learned, but I already covered those things in my past two blog posts. I guess I sort of wrote this series of posts backwards, so let's end by introducing the team!
Ben Pelcyger aka The Face aka General Mischief
In charge of general mischief...and faces
Mengdi Chen aka Death Wish
Very nearly strangled the scrum master every 30 minutes
Xindong Wang aka the Pelican
Ate his weight in fish
Haijun Yang aka The Human Yawn
Apparently does not require sleep to code.  

Yan Sun aka Hawk-eye
He caught like a million Pokemon that weekend


Wei Wei aka Noble Bull the No Bull Bull
Drank more coffee and Red Bull this weekend than anyone on earth




[1] Footnote: How practical is this?

It may sound difficult to pull this off outside a lab environment because you would need to put a secret message into each ad.  However, embedding the tone into each ad isn't much of a hurdle.  It could be added to the master (mezzanine) file before it gets transcoded into its various forms.   

Fun fact: Nielsen has done something similar in the past to measure TV and radio viewership.  As far as I know, Nielsen tagged ads AND content, and they used these nifty pagers called "Portable People Meters" to track listeners.



Invented by a company named Arbitron!