Solve Your Data Access and Resource Capacity Constraints
LEARN HOW TO CONSTRUCTIVELY APPLY AN EXPERIMENTATION-BASED METHODOLOGY USING THE ECOSYSTEM.AI WORKBENCH OR NOTEBOOKS.
This lesson consists of step-by-step actions that can be taken in the ecosystem.Ai platform. Learn how to constructively apply an experimentation-based methodology, accurately configure settings in the Workbench or Notebook, and set up simulations to test hypotheses. Then follow the guide to analyze and monitor the in-process results of your live experiments, using Dashboards.
Experimentation should be an integral part of every Data Science job. The Data Science Experimentation Module is designed to allow you to overcome common difficulties associated with the data science process.
The most common difficulty encountered during the data science process is the time it takes to produce constructive, actionable outputs. This is often the result of a number of unavoidable restrictions:
DATA ACCESS – Waiting for data to be accessible for modeling, or to be available in production for scoring, is a serious bottleneck in most processes. There could also be a distinct lack of data for new use cases, forcing a cold-start scenario which might be uncharted territory. The data that is available from old use cases may also be ineffective for the needs of new ones.
RESOURCES – The resources needed for a particular use case may be allocated to higher-priority problems. Or there may not be enough data scientists to build all of the models needed across an organization.
TIME DEPENDENCE – If the customer context is time dependent, this places further constraints on the turnaround time.
The Data Science Experimentation Module is designed to allow you to add intelligence to your process: learn and rapidly iterate while the traditional data science process is being completed. It also has sophisticated functionality to take time dependence into account.
How it Works #
The Data Science Experimentation Module uses real-time feedback to learn how to more effectively rank vidgets (virtual items).
The real-time feedback learning system can be activated with or without data; how this is done depends on the availability of your data. You can start with none available, or use data on user context (demographic, behavioral, etc.) and historical behavior.
Each time a user interacts with the real-time feedback learning system, that interaction is logged. Every logged interaction then advances the state of knowledge of the system. This knowledge can be further used to enhance the effectiveness of the traditional data science process, if running in parallel, ensuring effective learning without the testing focusing too extensively on a single option.
The system uses an experimentation-based methodology, which is a testing approach to presenting vidgets to customers. Rather than selecting just one solution and missing the opportunity to exploit the rest, experimentation allows you to run multiple tests at the same time. The vidgets to experiment with could be in the form of products, customer engagement messages, design constructs, special offers, and more.
The Data Science Experimentation Module does not require any data to be available, but can incorporate additional data as and when it becomes available.
Starting without data does not affect the activation of the experiment; all that is required is a list of vidgets. Having no data could be due to a cold-start scenario, such as when a new product is being launched and there is no historical data available. It could also be due to capacity and/or technical constraints on access to the needed data.
If data is available in addition to the vidget list, it can help to improve the effectiveness of the system’s learning. If segmentation variables are available those can be used to add context to the learning. Context can also be set at a customer level for truly personalized predictions. If historical data is available it can be used to provide a more informed starting point for the experiment.
Systems involving human behavior will always be affected by time.
Time alters human behavior for a number of reasons: evolving trends, communal rituals, personal events and environmental changes. These changes are often not consistent, and will therefore happen at varying points in the time scale.
The Data Science Experimentation Module incorporates a range of functionality, allowing human changes to be captured and effectively taken into account:
- Real-time learning – In-the-moment capture of activity.
- Sophisticated forgetfulness – Vidget-level options that can be set per application, and adjusted while the application is running.
- Repeated customer interactions – Sophisticated options that account for human ritual, tuned per application and adjusted while the application is running.
Let’s Get Started! #
Setting up a data science experiment is quick and simple using the Data Science Experimentation Module.
Two main interfaces can be used to access the ecosystem.Ai functionality. The first is the Workbench graphical interface and the second is using Jupyter Notebooks. This lesson takes you through the Workbench configuration for recommenders.
If you would prefer to build your recommender in a Notebook, head to the Worker Ecosystem in your Workbench, and click on Jupyter Notebooks. From there, navigate to your Get Started folder and find “No Data Experiments”.
Project Definition #
Define, create and manage your Experimentation Projects from the Project Definition section of the Workbench. Projects allow you to keep track of all of the work linked to completing your No Data Experiments.
When you log into the workbench you will see example projects that have already been created for you:
You can edit these projects to set up your own pre-configured first example. The examples that have already been set up for you are:
- Basic A/B testing example
- Dynamic experiment, which uses no historical data
- Time experiment, which utilizes the full functionality for handling time dependence
Add a new project or edit an existing one:
When creating your project you must provide a name, type and description. You can also assign dates and people to the project, but this is more for administrative purposes than a necessity. As you progress with the project you will link items to the project as they are created, such as the models, frames, simulations and other completed elements.
Adding Data to Files and Feature Engineering #
The data used to configure your data science experiment can range from just a list of vidgets, to a full set of customer-level data with historical behavior.
Data can be added to the ecosystem platform using the + Add File functionality:
Files must be uploaded in either CSV or JSON format.
You can also use database connection strings using the Presto Data Navigator:
If you have your own database, you can connect it here. This database access option uses the Presto Worker in the platform. Add a Connection path, similar to this example: local/master?user=admin. Then write a SQL statement to extract the data you want, similar to this example: select * from master.bank_customer limit 2. Then click Execute.
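For reference, the two example values above would be entered like this in the Presto Data Navigator (the catalog, schema and table names are just the example values, so substitute your own):

```
Connection: local/master?user=admin
SQL:        select * from master.bank_customer limit 2
```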
NOTE: IN ORDER TO ADD DATA USING THE PRESTO FUNCTIONALITY, YOU MUST FIRST HAVE YOUR PRESTO CONNECTION ACCURATELY SET UP.
In the Feature Engineering tab you can use the + Ingest Collection functionality:
Once data has been added to the platform it must be ingested into a specified database and collection. You can either select or add a database at this point, and then use the Ingest Collection functionality to ingest a collection into the selected database.
Close the ingestion window, then you can begin the next step of creating a feature store using your uploaded or chosen dataset.
NOTE: IF THE NAME OF THE COLLECTION YOU ARE INGESTING ALREADY EXISTS, THE NEW DATA WILL BE APPENDED TO THE EXISTING DATA. IT WILL NOT BE OVERWRITTEN.
Dynamic Pulse Responder Experiment Configuration #
Experiment configuration allows you to specify the behavior of your experiment. This includes configuring the options you want to test, and how you want to balance exploration and exploitation in your learning approach. You can also configure how much detail from your data you are going to use, and how the experiment should handle changes in human behavior over time.
Most of the settings in this tab can be left at their default values.
3.1 Configurations and Settings
The configurations tab shows all of the existing experiment configurations:
Select one to view the contents. This is where you can edit existing configurations, or create new ones.
The Settings tab allows you to name and describe your configurations:
Give your configuration a unique Name and, if need be, a Description; the UUID field will be populated automatically when you save. This UUID will be used in your Deployment configuration step.
Batch is where you can specify whether an experiment will be run in real-time or batch mode. When Batch is set to false, a real-time experiment will generate results for one customer interaction at a time, and feedback, in the form of an action, is fed into the system as soon as it is available. When Batch is set to true, the experiment generates results for a number of customers at once, and does not incorporate feedback until the actions from those customers are loaded into the system at a later point.
When choosing between batch and real-time approaches, it is useful to know that real-time is the more effective approach. However, your ability to run a real-time experiment may be limited by technical constraints within an organization.
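To make the distinction concrete, here is a minimal toy sketch in Python of when feedback is incorporated in each mode. It is illustrative only, not the platform’s API; the “knowledge” is just a success count per option.

```python
import random

knowledge = {"offer_a": 0, "offer_b": 0}  # toy state of knowledge

def score():
    """Rank on current knowledge: pick the option with the most successes."""
    best = max(knowledge.values())
    return random.choice([o for o, v in knowledge.items() if v == best])

def real_time(n):
    """Batch = false: feedback updates knowledge before the next scoring."""
    for _ in range(n):
        offer = score()
        knowledge[offer] += random.random() < 0.5  # simulated customer action

def batch(n):
    """Batch = true: all customers scored first, feedback loaded afterwards."""
    scored = [score() for _ in range(n)]
    for offer in scored:
        knowledge[offer] += random.random() < 0.5
```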
Use the Feature Store: Training and Scoring dropdown:
Specify the location of the data that you will use to set up an experiment.
Use the Options Store: Real-time Scoring dropdown:
Specify the location where the options store will be created. An options store is a list of experiment vidgets, with information about the state of knowledge for each one.
3.2 Engagement and Variables
The engagement tab allows you to set up and control the behavior of your experiment:
This is where you can select the philosophy that an experiment uses.
Epsilon greedy splits interactions between random rankings and rankings based on what looks like the best option, given the current state of knowledge.
Binary Thompson Sampling combines how effective an offer seems to be with the level of uncertainty about that offer, in order to trade off between proposing the most effective option and increasing certainty about which option is the most effective.
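As an illustration of the difference, here is a minimal self-contained Python sketch of both philosophies, assuming a simple Beta-Bernoulli model of success and failure counts. It is a sketch of the concepts, not the platform’s implementation.

```python
import random

successes = {"offer_a": 4, "offer_b": 2}   # positive feedback per vidget
failures  = {"offer_a": 6, "offer_b": 3}   # negative feedback per vidget

def epsilon_greedy(epsilon=0.1):
    """Explore at random with probability epsilon; otherwise exploit the
    option with the best observed success rate."""
    if random.random() < epsilon:
        return random.choice(list(successes))
    return max(successes,
               key=lambda o: successes[o] / (successes[o] + failures[o]))

def binary_thompson():
    """Draw a plausible success rate for each option from its Beta posterior
    and rank on the draws; uncertain options win often enough to keep
    learning about them."""
    draws = {o: random.betavariate(1 + successes[o], 1 + failures[o])
             for o in successes}
    return max(draws, key=draws.get)
```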
The history interactions section:
allows you to specify how historical information is used to update the state of knowledge of the system. This is particularly useful when you know that the behavior of your system is time dependent. The fields are described below, followed by an illustrative sketch.
- Cache Duration is where you can specify whether customers are forced to see the same option for a specific period.
- Calendar is the field where you can input whether a calendar is used to capture the impact of shared human rituals.
- Processing Window and Historical Count specify how much historical information should be taken into account, in terms of time period and interaction count respectively.
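As a concrete illustration of these fields, a history-interactions configuration could look something like the snippet below. The key names and values are hypothetical, chosen only to make the concepts tangible; the actual metadata keys in the platform may differ.

```python
# Hypothetical field names and values, for illustration only.
history_interactions = {
    "cache_duration": 86400,   # seconds a customer is held on the same option
    "calendar": "default",     # calendar used to capture shared human rituals
    "processing_window": 30,   # days of history taken into account
    "historical_count": 1000,  # maximum number of past interactions used
}
```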
The repeated interactions section:
allows you to specify how repeated interactions from the same customer are handled. The number of times an individual customer’s interactions are allowed to impact your system can be specified in the Max Interaction Count field. Should each interaction be equally important? And if not, how should the relative importance decrease? This can be set in the Decay Parameter field.
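One plausible reading of these two fields, shown purely for illustration (the platform’s actual decay semantics may differ): each successive interaction from the same customer is down-weighted geometrically, and interactions beyond the maximum count are ignored.

```python
MAX_INTERACTION_COUNT = 5   # hypothetical value
DECAY_PARAMETER = 0.5       # hypothetical value

def interaction_weight(n: int) -> float:
    """Weight of a customer's n-th interaction (n starts at 1)."""
    if n > MAX_INTERACTION_COUNT:
        return 0.0
    return DECAY_PARAMETER ** (n - 1)

print([interaction_weight(n) for n in range(1, 8)])
# [1.0, 0.5, 0.25, 0.125, 0.0625, 0.0, 0.0]
```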
The engagement parameters section:
allows you to specify the impact that positive (Success Reward) and negative (Fail Reward) feedback results have on the state of knowledge of the system. This can be set differently for historical information using Prior Success Reward and Prior Fail Reward. These parameters are useful when success is much more likely than failure, or vice versa.
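To make the reward parameters concrete, here is a hedged sketch assuming a Beta-style state of knowledge like the Thompson Sampling example above. Down-weighting one side of the feedback is one way to keep rare outcomes informative; the update rule and values are illustrative assumptions, not the platform’s internals.

```python
SUCCESS_REWARD = 1.0         # weight of a positive live interaction
FAIL_REWARD = 0.1            # weight of a negative live interaction
PRIOR_SUCCESS_REWARD = 0.5   # weight of a historical positive interaction
PRIOR_FAIL_REWARD = 0.05     # weight of a historical negative interaction

alpha, beta = 1.0, 1.0       # uninformative starting point for one vidget

# Seed from historical data, e.g. 20 take-ups and 200 declines.
alpha += 20 * PRIOR_SUCCESS_REWARD
beta += 200 * PRIOR_FAIL_REWARD

def update(success: bool) -> None:
    """Fold one live interaction into the state of knowledge."""
    global alpha, beta
    if success:
        alpha += SUCCESS_REWARD
    else:
        beta += FAIL_REWARD
```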
The impact of these different parameters can be explored using the Simulation functionality.
The variables tab allows you to specify the data that will be used in your experiment configuration:
This data is stored in the location you specified in the settings tab. In the variables tab you can then specify whether you are using data on user context (demographic, behavioral, etc.) and historical behavior. This is done by either filling in each of the fields or leaving them blank.
The only required input is Offer Key:
This is where you will specify the name of the vidgets to be ranked in your data set. In addition to the Offer Key you can add a Take Up field, which is where you can specify whether your data includes historical behavior.
A tracking key can be added:
if you want to track behavior and learn at an individual customer level. This should only be used if there are regular repeated engagements with individual customers.
Finally there are two contextual variables that can be set:
This is where you will specify the data on user context. This data will create segments in which interactions will be tracked and learned from. For example, if one of the contextual variables is geographic location, then the data science experiment will produce different rankings for different geographic locations.
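For example, a feature store with one row per historical interaction might look like the sample below. All column names are hypothetical; you would map offer to the Offer Key, takeup to the take up field, customer_id to the tracking key, and region and age_band to the two contextual variables.

```
offer,takeup,customer_id,region,age_band
offer_a,1,10001,north,25-34
offer_b,0,10002,south,35-44
offer_a,0,10003,north,45-54
```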
Once all of the variables have been specified, click Generate:
This will create, store and display the options store.
3.3 Options, Graphs and JSON
The options tab allows you to view vidgets and the contexts to be considered in your experiment:
You can also manually edit or add options here.
The graph tab allows you to view a graphical depiction of your options store set up:
Use the various dropdowns to set the graph variables, in order to view factors such as Closeness, Cose, and more.
The JSON tab allows you to view your experiment configuration as stored in the platform metadata:
Once your experiment has been created, save it.
Take note of the UUID associated with your configuration in the Configurations tab:
Then head back to Projects and go to Deployment, in order to configure your deployment details.
The deployment tab allows you to set up your experiment for use in the Production, Quality Assurance or Test environment, and to push it into the desired environment.
Most settings in this tab can be left with their default values.
Set the case configuration for your experiment deployment:
Create a unique Prediction Case ID name. Add a Description that is relevant to the specific deployment you are doing. Add the Type and the Purpose of your deployment; you can leave these blank if you are unsure what to input.
Input the properties details:
Set the version of the deployment step. This version number should be updated every time you make changes to the deployment.
Specify the environment in which you will be deploying your configuration. Then input the performance and complexity settings for your set up.
Scroll down to the bottom of the page to open the New Knowledge dropdown:
Insert the UUID of the experiment configuration that you are going to push to production. You will find this UUID in the Dynamic Pulse Responder Settings and Configurations tabs.
Once that is done, click the Push button to set the configuration up in your specified environment; no downtime is required. The Generate and Build buttons are not needed for now, as they are designed for Enterprise and on-premise setups.
Once you have pushed your configuration for deployment you should do some testing to see if the results align with your expectations. There are two ways to test your deployment now that it has been created and Pushed.
1. Head to our Jupyter Notebooks to configure the simulation of your deployment. The steps for completing this part of the journey are laid out in the Notebooks.
2. Go to the Laboratory section of the Workbench in order to test your API:
If you have used one of the pre-configured examples to work through, click on the relevant deployment to view the details. If you have created your own, use the Create New button to make a new API. Provide the deployment name and click Next to add it to the list.
Select Configuration to view and edit the details of your API:
Select the one you want to test, fill in the relevant details of the campaign, then select the campaign to bring down the API test window.
Click Execute to bring back the API results and ensure your deployment is functioning:
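If you prefer to exercise the API from outside the Workbench, a hedged sketch in Python is shown below. The endpoint, port and parameter names are placeholders, not confirmed API details; copy the actual URL and parameters from the API test window in the Laboratory.

```python
import requests

RUNTIME_URL = "http://localhost:8091/invocations"  # placeholder endpoint

params = {
    "campaign": "my_experiment_case",  # hypothetical Prediction Case ID
    "customer": "10001",               # hypothetical customer key
    "numberoffers": 1,                 # how many ranked vidgets to return
}

response = requests.get(RUNTIME_URL, params=params, timeout=10)
print(response.status_code)
print(response.json())                 # ranked options and their scores
```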
Now that you have built, deployed and tested your recommender, it’s time to watch it in action.
Once your experiments are running, it is important to keep track of their behavior, and begin to examine the results.
Head to the Worker Ecosystem section of the Workbench and select the “Real-Time Dashboard” to go to Grafana, to set up and view the real-time results of your deployment. Alternatively, head to the Monitoring section of the Workbench to configure and view your Superset Dashboard.
Access the Grafana Dashboards for a real-time view of your experiment:
Our Grafana Dashboards illustrate the behavior of the recommender in production, showing which options are being recommended and which are successful, as well as providing information on performance and how the recommender is trading off between exploring and exploiting.
To set up your Grafana Dashboard, and link it to your chosen deployment, you will need to log in as an admin. We have already pre-built a dashboard for you to view all the most important elements of your real-time deployment. However, if you have experience with Grafana, or are looking to monitor something very specific, you can build your own dashboard: https://grafana.com/docs/grafana/next/getting-started/build-first-dashboard/.
Now that you have logged in, navigate to the left hand menu, click on the “Dashboards” icon and select “Manage”:
At this point, you will notice a list of folders.
Select the “Runtime2” folder and click on Scoring Dashboard: Client Pulse Responder:
This opens the pre-built dashboard configuration. The dropdown menu called “Prediction case” is where you can see all the deployments linked to this dashboard. Find your deployment there if you have used one of the pre-configured solutions.
To add a new deployment to be viewed on this dashboard, go to the “Dashboard Settings” icon in the top right corner:
This will take you to the settings page where you can manage elements of the dashboard.
Go to Variables in the menu on the left, and then click on Prediction:
You will notice in the “Custom Options” field that the deployments currently linked to this dashboard are listed, separated by commas.
Simply add your deployment case name in this field:
Then click Update. When this refreshes, click Save Dashboard on the left; this will open a popup where you can specify the details of your changes. This is not a compulsory step, but it is good practice to document all changes. Then click Save. Press the “back” button in the top left hand corner to go back to the dashboard, give it a minute to load, and then you will be able to view your new deployment in the “Prediction Case” list.
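To illustrate the comma-separated format, if the field currently lists two cases and you are adding a third (all names here are purely illustrative), the field would read:

```
existing_case_1, existing_case_2, my_new_deployment_case
```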
Go to the Superset Dashboard to view further illustrations of your experiment:
The Superset dashboards allow you to see which experiments are running. You can also view which experiments were run historically, and which users were impacted by multiple experiments. Finally, you can view and analyze the results of all of your experiments.