Natural language is a rich source of data about complex phenomena, but it is difficult to quantify and measure. With advances in computer science, textual analysis lets researchers process large amounts of text and systematically identify its properties, for example by measuring the frequencies of the most used keywords to locate the more important structures in its content. Common measures include readability, similarity and sentiment.
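To make these measures concrete, here is a minimal sketch in Python using only the standard library. It is an illustration rather than the method of any particular study: the function names, the average-sentence-length proxy for readability, and the tiny negative word list are all assumptions chosen for brevity. Real work would normally use established readability indices and validated sentiment lexicons.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny word list; real studies use validated lexicons.
NEGATIVE_WORDS = {"lost", "debt", "late", "unable"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_frequencies(text, top_n=3):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def average_sentence_length(text):
    """Crude readability proxy (an assumption): mean words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words count vectors of two texts."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def negativity_share(text, lexicon=NEGATIVE_WORDS):
    """Share of tokens that appear in the (illustrative) negative lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in lexicon) / len(tokens)


if __name__ == "__main__":
    # Made-up example text, for demonstration only.
    description = "I lost my job last year. I need this loan to repay my debt."
    print(keyword_frequencies(description))
    print(average_sentence_length(description))
    print(negativity_share(description))
    print(cosine_similarity(description, "I need a loan to buy a car."))
```

Each function turns free text into a number, which is the sense in which text can be "measured": once a description is reduced to such features, it can be compared across documents or used alongside conventional variables.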
Qiang Gao and Mingfeng Lin recently connected textual information with P2P online lending activity. They examine whether the linguistic styles of texts can help mitigate information asymmetry and, more importantly, whether investors can “correctly” interpret the economic value of texts. Using data from online debt crowdfunding, they show that investors do take into account the “loan purpose” descriptions that borrowers provide in their loan requests, even though these texts are neither verified nor legally binding. They then analyse the linguistic features of these descriptions and show that well-established features related to creditworthiness (readability, objectivity, negativity, and deception cues) all meaningfully relate to loan repayment. Interestingly, however, investors do not correctly interpret the economic value of all linguistic features, most notably deception cues. Finally, they show that these automatically extracted features can improve the accuracy of loan default predictions. This suggests that even though “texts” are often considered “soft” or “non-standard” information in finance, they can be quantified and standardised for use in credit risk modelling.