In this post I focus on the approach to building a model, as I understand it. The steps covered here are basic ones that will help in building any model.
Step 1: Define Business Problem
Whenever we build a model, we are really trying to solve a business problem, so the first input to any model is the business problem that needs a solution. A simple example could be increasing sales, measuring the effectiveness of a marketing campaign, or predicting how customers will respond to a company's product. Defining the business problem is therefore the first step, or in other words the input to any model.
Step 2: Convert Business Problem into an Analytical Problem
In the next step we analyse this problem and convert it into an analytical problem, in discussion with the important stakeholders. The main points to consider when analysing a business problem are:
- Decide which variable we need to provide a solution for: is it revenue, net profit, number of customers, or something else? Basically we are trying to find our target variable. The most important point is not to think about any independent (predictor) variable at this stage. Just focus on what we need to solve for.
- Select the granularity level at which the problem needs to be solved. For example, if we are solving a retail sales problem, should we consider a single retail point, the retail points of a particular region, or all retail points?
- Decide the historical time period to consider for the target variable. It could be, say, the last 3 months, 1 year, or 3 years. How much historical data to use is up to the modeller.
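To make the three choices above concrete, here is a minimal sketch in plain Python. The transaction records, the `build_target` helper, and the window lengths are all made up for illustration. It sums revenue (the target variable) at a chosen granularity over a chosen historical window.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical transaction records: (store_id, region, sale_date, revenue)
transactions = [
    ("S1", "North", date(2023, 1, 15), 120.0),
    ("S1", "North", date(2023, 6, 10), 80.0),
    ("S2", "North", date(2023, 6, 12), 200.0),
    ("S3", "South", date(2023, 6, 20), 150.0),
    ("S3", "South", date(2022, 3, 1), 999.0),  # older than one year
]

def build_target(records, level, as_of, history_days):
    """Sum revenue per unit at the chosen granularity within the history window."""
    key_index = {"store": 0, "region": 1}[level]
    window_start = as_of - timedelta(days=history_days)
    totals = defaultdict(float)
    for record in records:
        sale_date, revenue = record[2], record[3]
        if window_start <= sale_date <= as_of:
            totals[record[key_index]] += revenue
    return dict(totals)

as_of = date(2023, 6, 30)
# Same target variable, two different granularity / history choices
by_region = build_target(transactions, "region", as_of, history_days=365)
by_store = build_target(transactions, "store", as_of, history_days=90)
print(by_region)
print(by_store)
```

The same raw data gives a different modelling dataset depending on the level and the window, which is why these choices are best agreed with stakeholders before any predictor is considered.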
Step 3: Generate Hypotheses for All Factors Related to Your Problem
Think of all possible hypotheses about what might affect our target variable. Hypotheses can be built from the following categories:
- Demographic variables: age, gender, social status, etc. For example, which gender purchases our products more, or which age group generates more revenue?
- Macroeconomic factors. For example, is inflation or the rate of interest affecting our sales?
- Industry-specific factors, such as the number of transactions or the value of each transaction.
There can be many more categories or subcategories to consider, depending on our target variable and business problem.
Step 4: Prepare Dataset
Generate independent variables based on the hypotheses framed in the previous step. Each hypothesis will give one independent variable. For some independent variables, such as age or gender, the data will be readily available. For others you will need to ask the business owners for the data specifically, because not all data is held in one place. Sometimes you may even need to generate the data by interviewing process owners or other concerned authorities.
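As a rough sketch of this step, the snippet below combines two hypothetical sources into one dataset and lists the values that are still missing, which are the ones you would have to request from the business owners. The source names, variable names, and the `build_dataset` helper are assumptions made for illustration only.

```python
# Hypothetical data sources, both keyed by customer id
demographics = {
    "C1": {"age": 34, "gender": "F"},
    "C2": {"age": 51, "gender": "M"},
    "C3": {"age": 27, "gender": "F"},
}
transaction_summary = {
    "C1": {"n_transactions": 12},
    "C3": {"n_transactions": 4},
}

# One independent variable per hypothesis (illustrative names)
variables = ["age", "gender", "n_transactions"]

def build_dataset(sources, variables):
    """Merge sources into one row per customer and report what is missing."""
    customer_ids = sorted(set().union(*[source.keys() for source in sources]))
    rows, missing = [], []
    for customer_id in customer_ids:
        row = {"customer_id": customer_id}
        for variable in variables:
            value = None
            for source in sources:
                if variable in source.get(customer_id, {}):
                    value = source[customer_id][variable]
                    break
            row[variable] = value
            if value is None:
                missing.append((customer_id, variable))
        rows.append(row)
    return rows, missing

rows, missing = build_dataset([demographics, transaction_summary], variables)
print(rows)
print("Still to be collected:", missing)
```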
Step 5: Data Audit
In this step you audit your dataset and apply the necessary treatments, such as missing value treatment, outlier treatment, and data type checks for each variable.
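The sketch below shows one possible way to do these three treatments on a single numeric column using only the Python standard library. The choices in it are assumptions, not the only options: non-numeric entries are treated as missing, missing values are filled with the median, and outliers are capped using the common 1.5 x IQR rule. The sample values and the `audit_column` helper are hypothetical.

```python
import statistics

# Hypothetical raw column with a missing value, a bad type, and an outlier
raw_ages = ["34", "51", None, "27", "abc", "45", "38", "29", "41", "300"]

def to_number(value):
    """Type check: return a float, or None if missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def audit_column(values):
    """Return the cleaned column and a small audit report."""
    numbers = [to_number(v) for v in values]
    observed = [x for x in numbers if x is not None]
    median = statistics.median(observed)

    # Outlier bounds from the interquartile range (1.5 * IQR rule)
    q1, _, q3 = statistics.quantiles(observed, n=4)
    low = q1 - 1.5 * (q3 - q1)
    high = q3 + 1.5 * (q3 - q1)

    cleaned = []
    for x in numbers:
        if x is None:
            x = median                          # missing value treatment
        cleaned.append(min(max(x, low), high))  # outlier treatment (capping)

    report = {
        "missing": len(numbers) - len(observed),
        "capped": sum(1 for x in observed if x < low or x > high),
    }
    return cleaned, report

cleaned, report = audit_column(raw_ages)
print(report)
print(cleaned)
```

Whether to cap, drop, or keep an outlier depends on the business context, so treat the rule above as a starting point rather than a fixed recipe.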
Step 6: Finding out Relationships between Variables
In this step we examine the relationships among the independent variables, and between the dependent and independent variables.
Methods for examining these relationships include:
- Between independent variables only: a correlation matrix, multicollinearity measures such as the variance inflation factor (VIF), etc.
- Between dependent and independent variables: bivariate analysis, chi-square tests, entropy, or information value.
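As an illustration of one method from each bullet above, the sketch below computes a Pearson correlation between two independent variables and an information value (IV) for a categorical predictor against a binary target. The data is made up, and the 0.5 floor used to avoid taking the log of zero is an assumption of this sketch, not a standard.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def information_value(categories, target):
    """IV of a categorical predictor against a binary target (1 = event)."""
    total_events = sum(target)
    total_non_events = len(target) - total_events
    iv = 0.0
    for category in set(categories):
        events = sum(t for c, t in zip(categories, target) if c == category)
        non_events = sum(1 - t for c, t in zip(categories, target) if c == category)
        # Assumption: floor empty cells at 0.5 so the log is always defined
        pct_events = max(events, 0.5) / total_events
        pct_non_events = max(non_events, 0.5) / total_non_events
        iv += (pct_non_events - pct_events) * math.log(pct_non_events / pct_events)
    return iv

# Hypothetical data
n_transactions = [1, 2, 3, 4, 5, 6]
total_spend = [10, 22, 29, 41, 52, 58]
gender = ["F", "F", "F", "M", "M", "M"]
purchased = [1, 1, 0, 0, 0, 1]

corr = pearson(n_transactions, total_spend)
iv_gender = information_value(gender, purchased)
print(round(corr, 3), round(iv_gender, 3))
```

A correlation close to 1 between two predictors, as here, suggests they carry much the same information, so you would usually keep only one of them. A predictor with an IV near zero tells you little about the target.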
Step 7: Model Building & Validation
In this step we simply build our model. It could be linear regression, logistic regression, a decision tree, etc., depending on the type of dependent variable and the type of problem. You also need to validate the model using the metrics appropriate to it. For a linear regression model, R-squared and adjusted R-squared will do, and you can also perform residual analysis. For logistic regression, consider the K-S statistic, the ROC curve, and gain/lift values. If there are multiple logistic models with the same dependent variable and almost the same independent variables, compare their AIC values (lower is better).
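To show what some of these validation metrics look like in practice, here is a small sketch using only plain Python. It fits a one-predictor linear regression by least squares and computes R-squared and adjusted R-squared, then computes the K-S statistic for a set of scores from a classifier. The data and function names are hypothetical, and in real work you would normally rely on a statistics library rather than hand-written formulas.

```python
def fit_simple_linear(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

def r_squared(actual, predicted):
    """Share of the variance in the target explained by the model."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalised for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def ks_statistic(scores, labels):
    """Largest gap between cumulative event and non-event shares,
    scanning scores from highest to lowest (tied scores move together)."""
    pairs = sorted(zip(scores, labels), reverse=True)
    n_events = sum(labels)
    n_non_events = len(labels) - n_events
    cum_events = cum_non_events = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            cum_events += 1
        else:
            cum_non_events += 1
        last_of_tie = i == len(pairs) - 1 or pairs[i + 1][0] != score
        if last_of_tie:
            gap = abs(cum_events / n_events - cum_non_events / n_non_events)
            best = max(best, gap)
    return best

# Hypothetical regression data
ad_spend = [1, 2, 3, 4, 5]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_simple_linear(ad_spend, sales)
fitted = [intercept + slope * x for x in ad_spend]
r2 = r_squared(sales, fitted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(sales), n_predictors=1)

# Hypothetical classifier scores and actual outcomes (1 = event)
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]
ks = ks_statistic(scores, outcomes)
print(round(r2, 4), round(adj_r2, 4), round(ks, 3))
```

Adjusted R-squared is never higher than R-squared, because it penalises every extra predictor. A K-S value closer to 1 means the scores separate events from non-events more cleanly.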
Step 8: Production
After validating the model, you need to run your model on new data, or on the data for which you need to predict values. If required, make some changes and then rerun the model.
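As a very small sketch of this step, the snippet below applies a stored linear model to new records. The coefficients, the variable name `ad_spend`, and the `predict` helper are hypothetical. In practice the new data must go through the same audit and preparation steps as the data used to build the model.

```python
# Hypothetical coefficients saved from a previously validated linear model
model = {"intercept": 0.05, "coefficients": {"ad_spend": 1.99}}

def predict(model, row):
    """Apply the stored linear model to one new record."""
    return model["intercept"] + sum(
        coefficient * row[name]
        for name, coefficient in model["coefficients"].items()
    )

# New records for which we need predicted values
new_data = [{"ad_spend": 6.0}, {"ad_spend": 7.5}]
predictions = [predict(model, row) for row in new_data]
print(predictions)
```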
In my view, the steps above are the basic, general steps required for most models.
Thanks, friends, for reading this post. Please share your feedback on it.