Study of Loans in the Bank Using Data Mining



It is widely known that the most damaging risk to banks is credit risk. The financial institutions and the whole country’s economic system will be affected if this risk is not mitigated.

It is crucial for any financial institution or bank (the terms are used interchangeably herein) that offers loans to individual or corporate customers to manage risk appropriately. This is especially true because loans are the biggest asset on a bank’s balance sheet. Frequent exposure to risk will result in failure, loss, and eventually insolvency.

Therefore, a bank should develop a strategy for identifying risk-free or credible customers so that potential problems are avoided. Some of the major bank risks include credit risk, market risk, operational risk, business risk, reputational risk, liquidity risk, and interest rate risk. This study will focus on the most widely known damaging threat to banks: credit risk.

Credit risk simply involves loans that are not paid back. Banks classify these loans as non-performing loans (NPLs). Such loans no longer provide income because repayments are delayed, in default, or near default; the maturity date has passed; or the principal and interest have not been paid in full as required by the signed contract.

As a result, a bank’s overall performance suffers when it holds a high share of non-performing loans. The bank’s reputation is jeopardized, and the whole economic system of a country may be affected. Hence, it is important for banks to maintain an organized credit appraisal system, identify the factors affecting their non-performing loans, and administer those loans prudently.

Consequently, banks facing credit risk usually allocate funds, or loan loss provisions, to cover the losses and write off bad debt in their profit and loss accounts. In some countries, banks with a high number of NPLs can also sell them to asset management companies to recover at least some of the amounts owed. However, banks suffered more severely in countries hit by the financial crisis, such as Greece, Portugal, Spain, and Italy, while in other regions of the world the issue is still manageable.

Problem Statement

As managing risk is very important, financial institutions and banks need to look for ways to reduce the problem of defaulted loans. Indeed, they have been looking for models that can distinguish between good and bad credit applicants. However, this process remains detailed and time-consuming, and it requires thorough human-based financial analysis.

Research Motivation/Justification

Based on the literature review, the non-performing loan ratio in the selected country's economic system is relatively high.

The objective of this study is to investigate the types of customers and loans that are inclined to default and thereby contribute to the credit risk of a bank.

Research Questions

  1. Who are the good and bad credit applicants?
  2. What are the causes of defaulted loans?
  3. How can NPLs be segregated and distinguished?
  4. Which types of loans are risky?

Research Aims/Objectives

The aims of this study are:

  • to understand the reason for the ongoing growth of the non-performing loans;
  • to be able to predict when non-performing loans will arise;
  • to distinguish NPL customers (corporate customers, individual customers, or SMEs) and the types of NPL-prone loans;
  • to classify the determinants of default customers;
  • to investigate the correlation between factors affecting defaulted loans;
  • to compare causal relations between performing and non-performing loans;
  • to provide indicators for a systematic credit appraisal;
  • to help understand the loans offered to customers better.

Summary of Research Methodology

This study gathers secondary data that is already available in the database of a selected financial institution.

The overall design of this research takes a quantitative approach, in which data are used to test and analyze patterns existing in the area of research. The proposed data analysis method is data mining using IBM SPSS Statistics 23.

Data mining refers to extracting knowledge, hidden trends, and patterns from large amounts of data. It is about explaining the past and predicting the future using data analysis. Data mining combines statistics, machine learning, artificial intelligence, and database technology.

There are different types of data mining techniques, such as classification, clustering, association rules, prediction, sequential patterns, neural networks, and regression. The data mining techniques to be used in this study are:

  1. Decision Trees

A decision tree-based classification algorithm is well suited to credit risk applications. In classification, a training set is used to build the model, or classifier, which assigns data items to their appropriate classes. A test set is used to validate the model (Gerritsen, 1999).

  2. Clustering

Clustering involves segmenting the records in a large database into clusters that share similar characteristics. It is widely used in target marketing. In this study, it is used to group the bank’s customers into clusters based on selected variables (Yap et al., 2011).
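The training-set and test-set idea described under Decision Trees can be sketched in a few lines of Python. This is only an illustration: the study itself proposes IBM SPSS, and the function name, the 30% test fraction, the seed, and the sample records are all assumptions made for the example.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Made-up loan records, for illustration only.
overdue = [5, 40, 95, 200, 12, 70, 130, 30, 181, 61]
loans = [{"loan_id": i, "overdue_days": d} for i, d in enumerate(overdue)]
train, test = train_test_split(loans)
print(len(train), "training records,", len(test), "test records")
```

The model is built on `train` only; `test` is held back so that the validation reflects records the model has not seen.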

Structure of the Dissertation

The structure of the dissertation is as follows. Chapter 2 reviews empirical research on the causes of non-performing loans. Chapter 3 describes the data and the methodology used in the paper. The results of the empirical analysis are presented and discussed in Chapter 4. Chapter 5 provides conclusions and gives some policy implications of the results as well as suggestions for future work.

Literature Review

Determinants of Non-Performing Loans (NPLs)

Ascertaining the determinants of credit risk for banks is a significant issue for regulatory bodies that are concerned with the management and financial stability of banking institutions. Credit risk for banking institutions occurs in the form of NPLs. Reinhart and Rogoff (2011) assert that NPLs are an indicator of the beginning of the crisis in the banking sector. The bulk of studies investigating NPL determinants use bank-specific determinants, macroeconomic determinants, and the aggregate NPL level as explanatory variables (Reinhart & Rogoff, 2011).

Few studies have combined microeconomic and macroeconomic factors in explaining aggregate NPL levels. Those that do place emphasis on the determinants of NPLs for savings and commercial banks and report that bank-specific factors can act as early indicators of future NPL changes (Ghosh, 2015). The majority of empirical research studies have explored the impact that the macroeconomic environment has on NPLs. In this respect, studies have established that monetary conditions, unemployment, and disposable income have a significant impact on NPLs (Beck et al., 2015; De Backer et al., 2015).

Berge and Boye (2007) reported that NPLs are particularly sensitive to unemployment as well as real interest rates in the case of the Nordic banking sector. Another school of literature places emphasis on the impacts of bank-specific variables on NPLs. Nkusu (2011) highlighted the relationship between NPLs and bank-specific variables and efficiency. In particular, Nkusu (2011) hypothesized likely bank-specific factors, which included capital adequacy, bad management of banks, bad luck, moral hazards, and skimping.

These hypotheses were tested empirically using a sample of commercial banks in the US from 1985 to 1994, and the conclusion was that a fall in cost efficiency is likely to increase NPLs in the future. Using the same variables, Podpiera and Will (2008) explored the association between NPLs and efficiency in the Czech banking sector between 1994 and 2005. Their study provided evidence that a negative association exists between cost efficiency and NPLs. Other studies have also examined the impact that institutional variables have on NPLs (Chaibi & Ftiti, 2015; Assaf et al., 2013).

Macroeconomic Determinants

Numerous studies in banking have explored the association between the quality of loans and the macroeconomic environment (Barra et al., 2016; Hassan et al., 2015; Çifter, 2015). With respect to this, some authors have proposed that an economy undergoing an expansionary phase is typified by low NPLs, since both banks and consumers have adequate revenues and income streams for servicing their debts. Nevertheless, as the boom continues, lower-quality debtors gain increased access to credit, and the recession that follows increases NPLs.

Rinaldi and Sanchis-Arellano (2006) maintain that the state of the economy is one of the most crucial macroeconomic factors influencing the loss rates of diversified debt portfolios. Other research studies have also affirmed the relationship between NPLs and the phase of the economic cycle. For instance, Quagliariello (2007) reported that the business cycle affected NPLs for a large sample of Italian banks from 1985 to 2002. Moreover, Cifter et al. (2009) offered evidence suggesting a lagged effect of industrial production on the volume of NPLs in the banking sector in Turkey from 2001 to 2007.

Gross Domestic Product (GDP) growth has also been found to have a negative contemporaneous effect on NPLs; hence, macroeconomic conditions affect economic agents’ ability to repay their loans. Empirical evidence also exists indicating that low-income borrowers are more likely to default on their loans because of the higher risk associated with unemployment (Cifter et al., 2009). As a countermeasure, banks impose higher interest rates on customers deemed riskier. Studies have also provided evidence to suggest that unemployment and income levels affect NPLs.

In addition, another study investigated the factors affecting the non-performing loan rate of the Eurozone’s banking systems before the beginning of the recession. The factors taken into consideration were macro-variables, such as public debt as a percentage of gross domestic product and unemployment, as well as micro-variables, such as the loans-to-deposits ratio, ROE (return on equity), and ROA (return on assets). The findings show that there is a strong correlation between macroeconomic variables, bank-specific factors, and non-performing loans (Makri et al., 2014).

Similarly, a study conducted on Turkish banks shows that a relationship exists between bank credit risk and macroeconomic factors. The set of variables identified includes the inflation rate, interest rate, the ISE-100 index, foreign exchange rate, growth rate, M2 money supply, unemployment rate, and the credit risk represented by the ratio of non-performing loans to total loans in Turkey between January 1998 and July 2012 (Yurdakul, 2014).

Sovereign Debt

The relationship between crises in the banking sector and sovereign debt crises was acknowledged following the 2008 financial crisis and the sovereign debt events that followed. Reinhart and Rogoff (2011) offered evidence indicating that a crisis in the banking sector usually occurs either together with or before a sovereign debt crisis. However, the authors reveal that a causal link between a banking crisis and a sovereign debt crisis cannot be lightly disregarded. A sovereign debt crisis affects the banking system in two ways. The first relates to the worsening of the public finances, which puts a ceiling on the market’s evaluation of national banks’ credibility, which in turn increases the liquidity pressure on such banks (Reinhart & Rogoff, 2011).

With respect to this, banks are compelled to lower their lending; therefore, debtors face a considerable challenge in refinancing their debts. In addition, an increase in public debt might result in fiscal measures, particularly involving cutting social expenditure and the government’s wage bill (Rinaldi & Sanchis-Arellano, 2006). This is likely to render a considerable portion of due loans unserviceable because of the negative shock affecting household loans.

Bank Specific Determinants

NPL determinants are not limited to macroeconomic factors exogenous to the banking sector; they also comprise the unique characteristics of the banking sector, including the policies adopted by each bank, particularly regarding efforts geared towards risk management and efficiency improvement. Some authors have explored the influence of bank-specific variables on bad loans. In this respect, Ennis and Malek (2005) reported causality between bad management practices, moral hazard variables, and NPLs.

Specifically, their research indicated that low cost efficiency, attributed to bad management in the form of poor credit scoring and borrower monitoring, increased NPLs (Ennis & Malek, 2005). The authors also showed that banks that commit less effort to ensuring higher loan quality suffer from high NPLs. Additionally, the low capitalization of banks has been linked to a higher incidence of bad loans (Ennis & Malek, 2005).

Customer Demographics

Despite the significance of customer demographics in determining the ability to repay a loan, little empirical attention has been paid to the relationship between a customer’s profile and NPLs. Only recently, a study by Asfaw, Bogale, and Teame (2016) explored the customer-specific determinants of NPLs in the Ethiopian banking sector. The research outlined borrower-specific factors contributing to NPLs, which include:

  • diversion of the borrowed funds to unintended purpose;
  • limited knowledge of borrowers;
  • intentional default by borrowers;
  • the underdeveloped credit culture.

Other customer-specific factors outlined in the research include misuse of the loan and investments in projects that lack viability. A study conducted by Munene and Guyo (2013) to investigate the borrower-specific variables contributing to NPLs showed that a lack of technical training for the borrowers and the performance of the borrower’s business are crucial factors driving the occurrence of loan delinquency.

Significance of NPLs on Bank Performance

NPLs are also used as an indicator to assess a bank’s performance. Boudriga et al. (2009) confirm in their study that ‘NPLs are frequently used as a variable that explains banking outcomes such as bank performance, failures, and bank crisis….’ They also claim that banks will suffer severe losses on their credit portfolios, which leads to bank failures. To control this issue, they suggest that financial organizations, together with international regulators (IMF, World Bank, and the BIS), present restructuring measures and programs to strengthen banking and financial systems all over the world. Bank performance is also negatively affected by a bad credit portfolio, which can put bank capital at risk and lead to liquidation (Curak et al., 2013). Therefore, it is essential for banks to have a good knowledge and understanding of the factors contributing to non-performing loans.

However, a study conducted in Japan shows different determinants from those found in studies of European crisis-hit countries. It shows that government assistance plays an important role. The study further confirms that non-performing loans lead to a decline in economic activity and that government bailouts trigger NPLs even more. An aggregation mathematical model is used to support these findings (Barseghyan, 2010).

It is also important for banks to look at the different loan products they offer and to identify which types of loans are more prone to default. Once this is established, banks can devise the necessary measures to control and improve the products offered before the loans become bad debt. Louzis et al. (2012) compare mortgage, business, and consumer loan portfolios and identify the determinants of non-performing loans for each in Greece. Kauko (2012), meanwhile, finds that non-performing loans increase when combined with a current account shortfall in the bank, which can act as an early warning signal during a financial crisis. NPLs can be handled by identifying the current account deficit before the financial crisis, and NPL development can be predicted (Sarlin & Peltonen, 2011).

Breuer (2006) provides an interesting insight into why non-performing loans exist. He states that banks have a dual role, which introduces conflicts of interest. This dual role is the main reason, as it can lead to bank mismanagement and therefore to the presence of non-performing loans. He further argues that legal, political, sociological, economic, and banking institutions contribute to the presence of loans that cannot be recovered (Breuer, 2006).

Use of Data Mining for Prediction of NPLs and Banking

The study by Sudhakar and Reddy (2016) confirms that credit risk and non-performing loans can be analyzed using data mining techniques. It shows that the banking industry deals with extensive data and that turning this data into useful knowledge is beyond the capability of a human. Therefore, data mining techniques are used to identify patterns, associations, and correlations. With data mining techniques, it is easier to analyze and accurately predict customers’ reactions to changes in interest rates, which customers will accept a particular product, which customers have a high risk of loan default, and so on.

Moreover, this technique helps the bank to identify credible customers so that it can prevent fraud and launch new products to retain those customers. In the banking industry, areas that utilize data mining are Risk Management, Customer Relationship Management, Marketing, Default Detection, Demand Forecasting, Non-Performing Loans Prediction, and Anti-Money Laundering (Sudhakar & Reddy, 2016).

Research Methodology

Initially, fact-finding was done in order to understand terms, concepts, and methods to be used in this research. It was done through various approaches such as reviewing academic journals and articles, past research papers on similar topics using the university’s library system and Internet search engines. Reading different readily available materials aids further comprehension of the topics. A table of key terms was developed while reviewing the reading materials so that the subject of interest can be viewed from different viewpoints and evaluation can be constructed.

In addition to researching the subject, preparation was also made to use the data analysis tool. Several statistical programs, such as Weka, R, and IBM SPSS, were studied concurrently through online tutorials, notes, and various textbooks. Given the researcher’s limited knowledge of the software, the pros and cons of each program were compared when choosing the data analysis tool. The key was to find a system that is user-friendly, reliable, and easy to learn in a short period of time.

Data and Data Collection

Researchers can use either primary or secondary data. Primary data refers to the information collected by the researcher on his/her own. While the use of primary data for this research was considered, it was deemed impractical in meeting the objectives of this research. The use of primary research would have resulted in some constraints, especially concerning cost and time. In terms of costs, conducting primary research might be an expensive undertaking because of the considerable involvement of the researcher in preparing and executing the research.

Moreover, primary research requires the researcher to develop and execute a research plan. Most importantly, primary research was not feasible for this research. Although primary research could have offered some valuable information for this study, it was not regarded as a possible alternative. Essentially, primary data was not within the reach of the researcher. For example, contacting customers who have defaulted on their loans would require banks to disclose customer contact information, which would present ethical and legal issues. Hence, the only viable option for this research was to conduct secondary research based on readily available data. Secondary research eliminated the hurdles associated with conducting primary research in this study.

Secondary data denotes data gathered by another individual or party. There are various sources of secondary data, including information gathered by governmental departments, company records, censuses, and information initially gathered for research purposes. The analysis of secondary data saved the researcher a considerable amount of time that would otherwise have been spent on collecting primary data regarding the demographic profiles of loan defaulters. Moreover, the use of secondary data, especially quantitative data, offers higher-quality and larger datasets that would be impractical to gather using primary research.

Despite its advantages, caution must be exercised when using secondary data. Braha (2013) recommends looking out for instances of inaccurate or outdated data. Moreover, when using data gathered for dissimilar study purposes, there is a possibility that the data might not cover samples from the target population (Braha, 2013). Insufficient detail is another issue associated with secondary research. In order to address these issues, the researcher took a number of steps to evaluate the data and determine its availability, relevance, accuracy, and sufficiency.

The availability criterion entails ensuring that the kind of data needed to answer the research question is available. In case secondary data is unavailable, Rokach and Maimon (2008) recommend using primary data. In this study, secondary data was available and accessible. The relevance criterion requires that the data be of help in addressing the research problem; hence, the measurement units should be the same as those of the research problem. Moreover, the concepts utilized in the secondary data ought to be the same. Again, these criteria were satisfied by the secondary data collected (Braha, 2013).

The third important criterion when using secondary data is accuracy, which requires examining the dependability of the source, the error margin, and the specification and methodology adopted. In this regard, the secondary data was obtained from the bank’s customer records; hence, dependability was not an issue of concern in this research. The secondary data was also sufficient, because it covered all of the bank’s bad loans during the 2011-2012 period.

Data collection was made by request to a bank with concurring criteria. The data consists of information on the bank’s customers’ demographics and details of their loans. A loan is considered NPL if it is overdue; however, a loan that is overdue by 90 days or more is considered a bad loan having little chance of recovery whereas a loan that is overdue for less than 90 days stands a better chance.
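The 90-day rule above can be written as a small Python function. This is a minimal sketch, assuming that overdue days are stored as whole numbers and that a value of zero means the loan is performing; the function name and the labels are illustrative and do not come from the bank’s data.

```python
BAD_LOAN_THRESHOLD_DAYS = 90  # threshold stated in the text

def classify_loan(overdue_days):
    """Label a loan by how long it has been overdue."""
    if overdue_days < 0:
        raise ValueError("overdue_days cannot be negative")
    if overdue_days == 0:
        return "performing"  # assumption: zero overdue days means no arrears
    if overdue_days >= BAD_LOAN_THRESHOLD_DAYS:
        return "bad loan"  # little chance of recovery
    return "overdue, better chance of recovery"

for days in (0, 45, 90, 200):
    print(days, "->", classify_loan(days))
```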

The data used in this research captures various information relating to the loan, including the amount, payment size, release date, overdue days, and loan type, as well as the demographic attributes of borrowers, including age, marital status, and income category.

Preliminary Analysis

Critical and careful data analysis is a crucial step when conducting statistical analysis (Shmueli et al., 2016). The conclusions reached by the study cannot be more reliable than the data analyzed. The data was compiled using a common commercial software package, MS Excel. With the spreadsheet package, data were filtered and sorted accordingly and later imported into SPSS for further analysis. SPSS was used for the analysis because the researcher’s comprehension and background knowledge of the software is better.

The format of the data provided is also suitable for use with SPSS. The variables in the dataset were the branch, the type of loan, the loan amount, the overdue days, the year the loan was released, whether the loan is secured or unsecured, and the borrower’s gender, age, income level, job, race, and marital status.

Information regarding the branch was identified by the branch codes, which included 1, 3, 4, 8, 9, 10, 11, 12, 14, 15, 18, and 20. The 39 categories for the loan type included in the dataset were:

  3. AR-RAHNU,
  20. BBA AOP,
  21. BBA AOR,
  22. BBA AOS,
  25. BBA Staff Al Falah,
  26. BBA Staff Commercial,
  27. BBA Staff Computer,
  28. BBA Staff Vehicle,
  29. BBA Staff Housing,
  30. BAA-3CP,
  35. AR-RAHNU,
  37. BBA AOP,
  38. BBA AOS,
  39. BBA HOS 1.

The loan amount was also included in the dataset. The categories for the overdue days were 1-30 days, 31-60 days, 61-90 days, 91-180 days, and everything above 180 days. The year in which the loan was released was also recorded. The gender of the loan applicant was a categorical variable grouped into male and female. The age was also documented. The income level was a categorical variable comprising various groups, including <500, >1000, 1000-11750, 1750-3000, 3000-4000, 4000-5000, and 500-1000. The profession of the loan applicant was a categorical variable divided into various groupings, including:

accountant assistant, accounting finance, administrator/assistance, ambassador, architect, assistant clerk, assistant operation, auditor, banker, bellhop, business/system analysis, businessman/woman, cabin crew, caretaker, carpenter, chef/cook/baker, chief executive officer, clerk, communication, consultant, coordinator, customer service, deck, dentist, deputy permanent secretary, deputy/assistant director, designer, director/managing director, district officer, doctor, engineer/assistant engineer, farmer, fore services personnel, gardener, general manager, government servant, graduate, helper, housekeeping/cleaner, housewife, HSE coordinator, human capital/human resources, insurance agent, investment personnel, IT specialist, kitchen helper, lab assistant, labourer, lawyer/solicitor, librarian/librarian assistant, logistic expediter, logistic manager, maid, maintenance lead, mechanic, medical personnel, military, minister, nurse, officer, operational representative, operator, other military personnel, overseer, pensioner, permanent secretary, personal assistant, pharmacist, pilot, photographer, postman, receptionist, religious personnel, researcher, restaurants and hotel, rigger, sales representative, secretary, security personnel, self-employed, special duty officer, steward, store keeper, student, supervisor, surveyor, teacher/lecturer, technical assistant, technician/electrician, tradesman, traffic assistant, trainer, and translator. Finally, the marital status was a categorical variable grouped into divorced, married, single, and widowed.

Since most of the variables obtained from the bank were string variables, they were recoded into a numeric data type to eliminate the limitations associated with conducting statistical analysis on string data.
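The banding and recoding described above can be sketched in Python as follows. The study performed these steps in MS Excel and SPSS, so this is an illustration only; the function names are hypothetical, and the choice to number categories in alphabetical order is an assumption (any consistent code assignment would serve).

```python
def overdue_band(days):
    """Map overdue days to the band labels listed in the text."""
    if days < 1:
        raise ValueError("loan is not overdue")
    for upper, label in ((30, "1-30"), (60, "31-60"), (90, "61-90"), (180, "91-180")):
        if days <= upper:
            return label
    return ">180"

def build_codebook(values):
    """Assign integer codes 1..n to the distinct values, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)), start=1)}

# Marital status categories from the text; the sample column is made up.
marital = ["married", "single", "divorced", "married", "widowed"]
codebook = build_codebook(marital)
coded = [codebook[value] for value in marital]
print(codebook)
print(coded)
```

Keeping the codebook alongside the recoded column allows the numeric codes to be translated back into their labels when the results are reported.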

Data Pre-processing

In the process of data mining, data pre-processing is an instrumental step. The process used for gathering real-world data often lacks appropriate controls, which might result in out-of-range data, impossible combinations of data, and missing values among other problems that might complicate the analysis process. Data pre-processing is often conducted on raw data to transform the data to facilitate its further processing. The analysis of data that has yet to be extensively scanned for such problems is likely to yield misleading results. Therefore, it is imperative to ensure that the data is of high quality before analyzing.

Data pre-processing also entails removing cases of data redundancy. Numerous techniques can be employed during data pre-processing, including sampling, de-noising, normalization, and feature extraction. Sampling entails selecting a representative sample from a large data population (Braha, 2013). In this study, sampling was not performed, as all the data collected was used for analysis. De-noising entails removing noise from data, which was performed in this research by looking for instances of incomplete and inaccurate data. For instance, the researcher looked for values that made no sense for their category. An example is finding the marital status filled in under the loan type. Such instances were eliminated from the data set. Normalization was performed by grouping some data types into categorical data types to facilitate the identification of patterns and trends in the data.
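The de-noising check described above, in which records with values that make no sense for their category are dropped, might look like the following in Python. It is a hedged sketch rather than the procedure actually used: the field names are hypothetical, the loan types shown are only a subset of those listed earlier, and the sample records are invented.

```python
# Allowed values taken from the categories listed in the text.
# The loan types here are only a subset; the field names are hypothetical.
VALID_MARITAL_STATUS = {"divorced", "married", "single", "widowed"}
VALID_LOAN_TYPES = {"AR-RAHNU", "BBA AOP", "BBA AOR", "BBA AOS"}

def is_clean(record):
    """Return True if every checked field holds a value valid for its category."""
    days = record.get("overdue_days")
    return (
        record.get("loan_type") in VALID_LOAN_TYPES
        and record.get("marital_status") in VALID_MARITAL_STATUS
        and isinstance(days, int)
        and days >= 0
    )

def denoise(records):
    """Return the clean records and the number of records dropped."""
    kept = [record for record in records if is_clean(record)]
    return kept, len(records) - len(kept)

raw = [
    {"loan_type": "BBA AOP", "marital_status": "married", "overdue_days": 45},
    # Marital status entered under the loan type, as in the example in the text.
    {"loan_type": "married", "marital_status": "married", "overdue_days": 120},
    # Missing value.
    {"loan_type": "AR-RAHNU", "marital_status": None, "overdue_days": 10},
    # Out-of-range value.
    {"loan_type": "BBA AOS", "marital_status": "single", "overdue_days": -5},
]
clean, dropped = denoise(raw)
print(len(clean), "kept,", dropped, "dropped")
```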

Each data type was then categorized into the various groups described in the preliminary analysis section above. Once the defaulted loans were identified during the preliminary analysis, the total NPLs were assessed to build a model.

Building Data Models

The key attributes selected to create the data mining models were Loan Type, Overdue Days, Gender, Age, Income Level, Job, Race, Marital Status, Mortgage (Secured or Unsecured), Loan Tenure, and Loan Amount. The primary focus of the model was on Overdue Days, which was used as a measure of NPLs: a loan that is overdue for 90 days or more is considered a bad loan with little chance of recovery, whereas a loan that is overdue for less than 90 days has a better chance of being recovered. Therefore, each attribute in the model was analyzed with Overdue Days as the reference point to determine the relationship between that attribute and Overdue Days.
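One simple way to analyze an attribute against Overdue Days is to compute, for each category of the attribute, the share of loans that cross the 90-day threshold. The Python sketch below assumes this cross-tabulation approach; the function name, field names, and records are hypothetical and do not reproduce the study’s SPSS output.

```python
from collections import Counter

def bad_loan_rate_by(records, attribute, threshold_days=90):
    """Share of loans overdue by threshold_days or more, per category of attribute."""
    totals, bad = Counter(), Counter()
    for record in records:
        group = record[attribute]
        totals[group] += 1
        if record["overdue_days"] >= threshold_days:
            bad[group] += 1
    return {group: bad[group] / totals[group] for group in totals}

# Made-up records, for illustration only.
sample = [
    {"mortgage": "secured", "overdue_days": 20},
    {"mortgage": "secured", "overdue_days": 95},
    {"mortgage": "unsecured", "overdue_days": 120},
    {"mortgage": "unsecured", "overdue_days": 200},
    {"mortgage": "secured", "overdue_days": 60},
    {"mortgage": "secured", "overdue_days": 10},
]
rates = bad_loan_rate_by(sample, "mortgage")
print(rates)
```

The same function can be called with any other attribute name, such as a hypothetical `"marital_status"` or `"income_level"` field, to compare categories one attribute at a time.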

Predictive Model: Decision Tree

The decision tree algorithm produces a structure comprising a root node, branches, and leaf nodes. Each internal node represents a test conducted on an attribute, whereas each branch represents an outcome of the test. In addition, each leaf node represents a class label. The topmost node is the root node. The decision tree helps in building regression or classification models using a tree structure. It works by breaking a dataset down into progressively smaller subsets. In this research, a decision tree was used to determine whether a customer with specific attributes will default on his/her loan. The predictors in the model were Loan Type, Gender, Age, Income Level, Job, Race, Marital Status, Mortgage (Secured or Unsecured), Loan Tenure, and Loan Amount.
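To make the node, branch, and leaf terminology concrete, the following Python sketch builds a very small classification tree on categorical attributes. It is not the algorithm used by SPSS in the study: choosing each test by the lowest weighted Gini impurity is an assumption made for the example, and the attribute names, class labels, and six records are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(rows, attr, target):
    """Weighted Gini impurity after splitting rows on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def build_tree(rows, attrs, target):
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf node: a class label
    best = min(attrs, key=lambda a: split_gini(rows, a, target))
    if split_gini(rows, best, target) >= gini(labels):
        return majority  # no attribute reduces impurity
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {row[best] for row in rows}:  # one branch per test outcome
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": best, "branches": branches, "default": majority}

def predict(tree, row):
    """Follow branches from the root until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree

# Made-up training records, for illustration only.
training = [
    {"mortgage": "secured", "income": "low", "status": "recoverable"},
    {"mortgage": "secured", "income": "high", "status": "recoverable"},
    {"mortgage": "unsecured", "income": "low", "status": "bad"},
    {"mortgage": "unsecured", "income": "high", "status": "recoverable"},
    {"mortgage": "unsecured", "income": "low", "status": "bad"},
    {"mortgage": "secured", "income": "low", "status": "recoverable"},
]
tree = build_tree(training, ["mortgage", "income"], "status")
print(tree["attribute"])
print(predict(tree, {"mortgage": "unsecured", "income": "low"}))
```

On this toy data the root node tests the mortgage attribute, and only the unsecured branch needs a second test on income before reaching a leaf.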

Descriptive Models: Clustering

A cluster represents a set of objects belonging to the same class. Simply stated, clustering entails grouping similar objects together and placing dissimilar objects in different groups. In clustering, the data set was first partitioned into groups by data similarity, after which labels were assigned to the various groups. Clustering was helpful in pattern recognition and in discovering distinct customer profiles for the bank based on customers’ ability to repay their loans.

The accuracy of the models was checked to establish whether each model is correct or wrong. Accuracy can be expressed as the percentage of cases classified correctly or the percentage classified wrongly. Additionally, the trained model is tested to check various accuracy measures.
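The partition-then-label process described above can be illustrated with k-means, one common clustering algorithm. The text does not state which clustering algorithm was run in SPSS, so k-means is an assumption here, as are the two features (age and overdue days), the choice of two clusters, and the sample points. Starting from the first k points is a deliberate simplification that keeps the sketch deterministic.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iterations=100):
    """Plain k-means: assign each point to its nearest centroid, then update."""
    centroids = list(points[:k])  # simplistic deterministic start
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the clusters are stable
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean_point(members)
    return assignment, centroids

# Made-up (age, overdue_days) pairs, for illustration only.
customers = [(25, 20), (52, 200), (27, 35), (50, 190), (24, 15), (55, 210)]
assignment, centroids = k_means(customers, k=2)
print(assignment)
print(centroids)
```

In practice the features would be put on a comparable scale before clustering, because a variable with a large range (such as loan amount) would otherwise dominate the distance calculation.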
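The percentage-correct and percentage-wrong measures mentioned above can be computed as follows. This is a minimal sketch; the function name and the example labels are made up for illustration.

```python
def accuracy_report(actual, predicted):
    """Percentage of predictions that match, and that do not match, the actual labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    percent_correct = 100.0 * correct / len(actual)
    return {"percent_correct": percent_correct, "percent_wrong": 100.0 - percent_correct}

# Invented labels for five test-set loans.
actual = ["bad", "good", "good", "bad", "good"]
predicted = ["bad", "good", "bad", "bad", "good"]
report = accuracy_report(actual, predicted)
print(report)
```

The measures are most informative when `actual` and `predicted` come from records held back from training, since accuracy on the training records alone tends to be optimistic.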

Research Design

In order to achieve the objective of this research, quantitative descriptive research was adopted to help with the identification of the customer-specific factors that contribute to NPLs from the selected bank. The research was geared towards distinguishing the type of customers who are likely to be loan delinquent. The descriptive research design focuses on describing the behavior of people concerning a specific phenomenon. A key strength of the descriptive design is that data is gathered without manipulating any variable (Gerritsen, 1999).

It is also referred to as correlational or observational design and is an invaluable research tool for demonstrating relationships between things. In this study, the descriptive design was used to describe the relationship between customer-specific factors and NPLs. Essentially, the research design was used to model customer-specific characteristics likely to be associated with bad loans. Descriptive research can be either cross-sectional (a single point in time) or longitudinal (over an extended period).

In this research, longitudinal descriptive research was conducted, with data collected for the period from 2001 to 2012 for the selected case bank. Figure 1 below shows the steps undertaken to execute the research design. The first step was the selection of the topic. This was followed by a review of past literature relating to the issue under investigation. The third step involved collecting data. In this regard, a request was made to a local bank to access and use its loan data for the purposes of this research.

  • Topic Selection
  • Review Past Research on Similar Topic
  • Understanding and Evaluating: Concepts, Key Terms, Background Research, and Analysis Tools
  • Study of Statistical Analysis Software: Weka, R, MS Excel, SPSS
  • Data Collection: Request Data
  • Data Clean-Up: Reduction, Sort, Filter
  • Test Analysis
  • Validate Result

Limitations of Methodology

The limitations of the research methodology are:

  • the study is limited to a certain area due to time constraints;
  • the data is limited: it is not very informative, and some of the required information is missing;
  • although the results can be said to be representative, more accurate results could be obtained if more areas were included in the study;
  • due to limited knowledge of the analysis software, only some analyses were carried out.

Conclusions and Recommendations

This study has revealed important insights regarding the predictors of NPLs using customer data and loan data. The study used data mining techniques to find patterns that distinguish customers who are most likely to pay back their debt from those who are not. These patterns can help managers identify customers who are likely to harm the bank’s reputation and its capability to provide credit.

The findings indicate that male applicants are more likely to default on their loans than their female counterparts. Moreover, male applicants account for a higher percentage of loans in every loan-overdue category. The results also indicate that married applicants are more likely to default on their loans, which makes them bad credit applicants. On the other hand, widowers are the group that is least likely to default on their loans.

The study also found that Malays comprise the majority of those with overdue loans, followed by Chinese and Natives, with Indians being the least likely to default on their loans. Moreover, the results suggest that those earning more than BND 5,000 per month comprise the majority of those with overdue loans. Surprisingly, those earning less than BND 500 per month are less likely to have their loans overdue for longer durations. Therefore, the data suggests that customers with high incomes are likely to have loans overdue for longer periods.

Furthermore, the data suggests that government servants comprise the majority of those who have overdue loans, followed by military personnel and teachers.

From these patterns, it can be inferred that males, married applicants, Malays, government servants, and those earning more than BND 5,000 per month are bad credit applicants. On the other hand, females, widowers, and those earning less than BND 500 per month are good credit applicants.

The study also investigated the types of loans that become NPLs. In this respect, it was found that unsecured loans comprise the majority of those overdue. Also, BBA AOS comprises the majority of the overdue loans, followed by BAA-3CP, BBA MANZIL, and BBA AOP. It can therefore be concluded that unsecured loans in the form of BBA AOS, BAA-3CP, BBA MANZIL, and BBA AOP are the most likely to become NPLs. These types of loans are risky, especially when issued to customers who fit the profile of being likely to default.

Evidence from data mining indicates that bad loans can be distinguished using two categories of attributes: the type of loan and customer attributes. Regarding the type of loan, the evidence presented earlier suggests that the riskiest loans are unsecured loans of types BBA AOS, BAA-3CP, BBA MANZIL, and BBA AOP. Regarding the profile of customers who have a higher risk of defaulting, the evidence indicates that males, married applicants, Malays, government servants, and those earning more than BND 5,000 per month are bad credit applicants. Therefore, bad loans are BBA AOS, BAA-3CP, BBA MANZIL, and BBA AOP loans issued to male, married, Malay government servants and to those earning more than BND 5,000 per month.

From the table, a significant positive correlation exists between age and loan amount, which means that older customers tend to borrow larger amounts. Another noteworthy trend is the significant negative relationship between overdue days and age, which means that younger customers tend to have loans overdue for longer durations and older customers for shorter ones.
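The sign of a correlation coefficient is what supports statements such as "older customers borrow more". The sketch below computes a Pearson coefficient by hand. The study does not specify which correlation measure it used, so Pearson is an assumption, and the values are invented purely to illustrate a positive and a negative relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


# Hypothetical values, invented for illustration; NOT the study's data.
ages = [25, 35, 45, 55, 65]
loan_amounts = [8, 15, 22, 25, 40]      # e.g. thousands of BND
overdue_days = [150, 90, 100, 40, 20]

r_age_amount = pearson_r(ages, loan_amounts)
r_age_overdue = pearson_r(ages, overdue_days)
print(round(r_age_amount, 2), round(r_age_overdue, 2))
```

A coefficient above zero corresponds to the positive age and loan amount relationship described in the text, and a coefficient below zero to the negative age and overdue days relationship. Whether either is statistically significant requires a separate test.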

The only statistically significant predictor variables in the model were Mortgage (Secured or Unsecured Loan), Gender, Job, Race, and the Type of Loan. Marital status and loan tenure were not significant coefficients in the model.

The recommendation for banks is to screen customers when determining their credit risk. The key emphasis when screening customers should be on their gender, job, and race. Also, the bank should reconsider providing unsecured loans, especially those in the form of BBA AOS, BAA-3CP, BBA MANZIL, and BBA AOP, which are risky. These recommendations can help the bank improve the management of its lending program and lessen the incidence of bad loans.
