Overview

The goal of this analysis is to identify the key drivers of adoption and to predict which users will adopt in the future.

I analyzed 12,000 users, of whom 1,656 (13.8%) became adopted users. The remaining non-adopted users fall into two groups: those who never visited (26.5% of all users) and those who visited but did not adopt (59.7%). For users who never visited, we should encourage a first visit; for users who visited but did not adopt, we should improve their experience to increase adoption.
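As a quick consistency check on the figures above, the short sketch below recomputes the adopted share from the two reported counts and confirms that the three reported percentages cover all users. Only the numbers quoted in the text are used; the variable names are illustrative.

```python
# Counts reported in the text.
total_users = 12_000
adopted_users = 1_656

# Percentages reported in the text for the two non-adopted groups.
never_visited_pct = 26.5
visited_not_adopted_pct = 59.7

# Share of adopted users, rounded to one decimal like the text.
adopted_pct = round(100 * adopted_users / total_users, 1)

# Everything that is not adopted should match the two non-adopted groups.
non_adopted_pct = round(100 - adopted_pct, 1)
segments_total = round(adopted_pct + never_visited_pct + visited_not_adopted_pct, 1)

print(adopted_pct, non_adopted_pct, segments_total)
```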

Since the prediction should be made at the time the account is created, any information acquired after a user starts using Asana should be excluded; otherwise, those factors would leak information into the adoption prediction. After analyzing and preprocessing the data, I engineered five new variables to predict adoption: email_domain (the domain of the email address), adopted_refer (whether the user was referred by an adopted user), same_org (whether the user and the person who referred them are in the same organization), org_size (the size of the organization), and org_adopt_pct (the percentage of people in the organization who are adopted users).
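The five engineered variables could be derived roughly as follows. This is a minimal sketch in plain Python, not the code used in the analysis: the input keys (user_id, email, org_id, invited_by_user_id, adopted) and the function name engineer_features are assumptions made for illustration, and org_adopt_pct is returned as a fraction between 0 and 1 that counts every user in the organization, including the user being scored.

```python
from collections import Counter


def engineer_features(users):
    """Build the five engineered variables for each user.

    `users` is a list of dicts with hypothetical keys:
    user_id, email, org_id, invited_by_user_id (or None), adopted (bool).
    """
    by_id = {u["user_id"]: u for u in users}
    org_size = Counter(u["org_id"] for u in users)
    org_adopted = Counter(u["org_id"] for u in users if u["adopted"])

    rows = []
    for u in users:
        # None when the user was not referred or the referrer is unknown.
        referrer = by_id.get(u["invited_by_user_id"])
        org = u["org_id"]
        rows.append({
            "user_id": u["user_id"],
            "email_domain": u["email"].rsplit("@", 1)[-1].lower(),
            "adopted_refer": bool(referrer and referrer["adopted"]),
            "same_org": bool(referrer and referrer["org_id"] == org),
            "org_size": org_size[org],
            # Fraction of the organization's users who are adopted.
            "org_adopt_pct": org_adopted[org] / org_size[org],
        })
    return rows


# Toy records, invented for illustration only.
toy_users = [
    {"user_id": 1, "email": "a@gmail.com", "org_id": 10,
     "invited_by_user_id": None, "adopted": True},
    {"user_id": 2, "email": "b@Yahoo.com", "org_id": 10,
     "invited_by_user_id": 1, "adopted": False},
    {"user_id": 3, "email": "c@hotmail.com", "org_id": 20,
     "invited_by_user_id": 2, "adopted": False},
]
toy_features = engineer_features(toy_users)
```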

As the graph above shows, the data set is highly imbalanced: there are far more non-adopted users than adopted users. If the imbalanced data were used directly, the model would tend to predict non-adopted for most users to achieve higher accuracy, losing the ability to identify potential adopted users. Therefore, I chose undersampling to address the imbalance and used a state-of-the-art machine learning library (xgboost) to build the prediction model. The evaluation metric is AUC rather than accuracy, because the goal is to identify likely adopters and accuracy is misleading on an imbalanced data set.
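To illustrate the two ideas in this paragraph, the sketch below shows random undersampling of the majority class and a direct computation of AUC. It is written in plain Python and does not reproduce the xgboost model from the analysis. The function names, the "adopted" key, and the 1,000-row toy sample (built to mirror the 13.8% adoption rate) are assumptions for illustration.

```python
import random


def undersample(rows, label_key="adopted", seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key]]
    neg = [r for r in rows if not r[label_key]]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of rows, so it is only meant for small illustrations.
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy sample with the same class balance as the text: 13.8% adopted.
toy_rows = [{"adopted": i < 138} for i in range(1000)]
toy_labels = [r["adopted"] for r in toy_rows]

# A "model" that predicts non-adopted for everyone.
constant_scores = [0.0] * len(toy_rows)
accuracy = sum(not y for y in toy_labels) / len(toy_labels)  # 0.862
auc = roc_auc(toy_labels, constant_scores)                   # 0.5

balanced_rows = undersample(toy_rows)
```

The constant predictor reaches 86.2% accuracy on the toy sample while its AUC stays at 0.5, the same as random guessing, which is why accuracy alone would hide a model that never identifies an adopter.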

The graph below shows the importance of each factor according to the model. First, the organization adoption rate plays a critical role in adoption. Second, the size of the organization is also very important.

Based on the variable importance, I made the following four graphs. The top graphs show the connection between organization size and adoption rate: small organizations tend to have higher adoption rates than larger ones. The bottom-left graph shows that adoption rate differs by email domain, with hotmail the highest and yahoo the lowest. For referrals, a user referred by an adopted user is more likely to become an adopted user in the future.
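The comparisons behind these graphs are group-wise adoption rates. A minimal sketch of that calculation is shown here, assuming each row is a dict with an "adopted" flag and a grouping key such as email_domain or adopted_refer; the function name and the toy rows are hypothetical and do not reproduce the actual data.

```python
from collections import defaultdict


def adoption_rate_by(rows, key):
    """Return {group value: share of adopted users} for the given key."""
    totals = defaultdict(int)
    adopted = defaultdict(int)
    for r in rows:
        totals[r[key]] += 1
        adopted[r[key]] += bool(r["adopted"])
    return {group: adopted[group] / totals[group] for group in totals}


# Toy rows, invented for illustration only.
toy_rows = [
    {"email_domain": "hotmail.com", "adopted_refer": True, "adopted": True},
    {"email_domain": "hotmail.com", "adopted_refer": False, "adopted": False},
    {"email_domain": "yahoo.com", "adopted_refer": False, "adopted": False},
    {"email_domain": "yahoo.com", "adopted_refer": True, "adopted": True},
    {"email_domain": "yahoo.com", "adopted_refer": False, "adopted": False},
    {"email_domain": "yahoo.com", "adopted_refer": False, "adopted": False},
]
rate_by_domain = adoption_rate_by(toy_rows, "email_domain")
rate_by_referral = adoption_rate_by(toy_rows, "adopted_refer")
```

The same function can be applied to a bucketed org_size column to compare small and large organizations.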

Conclusion

The analysis shows that most adopted users come from relatively small organizations with a high adoption rate and were referred by an adopted user; meanwhile, users from certain email domains have lower adoption rates than others. This could be because the email provider blocks Asana ads, or because the domain reflects certain demographics. Overall, the adoption rate within the organization plays the most important role in determining whether a new user will adopt.

Due to the limited scope of this analysis, I did not dive into the factors that trigger the initial visit or the factors that keep users engaged. Moreover, the adoption prediction is made when the user creates the account; however, the estimated likelihood of adoption changes as more factors become available, such as days until first visit, number of visits, and days of Asana use. Further analysis is highly recommended.