Processed Kiva Data Set (Readable in Matlab)

Click here to download (1.4 GB).

If you use this data set, please cite the following paper:
Understanding and Promoting Micro-finance Activities in Kiva.org, Jaegul Choo, Changhyun Lee, Daniel Lee, Hongyuan Zha, and Haesun Park, ACM International Conference on Web Search and Data Mining (WSDM), pages 583-592, 2014

Description

The Kiva data set contains a massive set of heterogeneous information about the following types of entities:

  • lenders, or Kiva users (1,174,383 in total)
  • lending teams (25,481 in total)
  • loans (564,177 in total)
  • field partners (254 in total)
  • borrowers (1,099,997 in total)

Entities of each type carry both unstructured data, such as images, video, and text, and structured data, such as geo-spatial, numerical, categorical, and ordinal data. For example, each lender entity is represented by its essential web profile data, e.g., a profile image, a registration timestamp, a geo-location, a lending count, an occupation, and other fields. Each lending team entity has its own information, including a name, a team category (e.g., religious or common interest), a brief description, and a webpage URL. Finally, loan entities, which have the richest set of information, are described by a loan description, a loan sector (e.g., agriculture, food, or retail), a list of borrowers requesting the loan, a field partner, a geo-location, a loan amount, and posted/funded/paid timestamps.
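As a rough illustration of how one loan entity's mixed structured and unstructured fields might be held in memory, here is a minimal Python sketch. The class name, field names, and sample values are hypothetical and chosen only for illustration; they are not the variable names used in the distributed files.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Loan:
    """Hypothetical record for one loan entity (field names are illustrative)."""
    description: str                    # unstructured text
    sector: str                         # categorical, e.g. "agriculture"
    borrowers: List[str]                # borrowers requesting the loan
    field_partner: str                  # field partner managing the loan
    geo_location: Tuple[float, float]   # (latitude, longitude)
    amount: float                       # loan amount
    posted: datetime                    # posted timestamp
    funded: Optional[datetime] = None   # None until the loan is funded
    paid: Optional[datetime] = None     # None until the loan is paid

    def days_to_fund(self) -> Optional[float]:
        """Days between posting and funding, or None if not yet funded."""
        if self.funded is None:
            return None
        return (self.funded - self.posted).total_seconds() / 86400.0


# Made-up sample values, not taken from the data set.
example = Loan(
    description="Working capital for a small grocery stall.",
    sector="retail",
    borrowers=["borrower_1"],
    field_partner="partner_1",
    geo_location=(0.0, 0.0),
    amount=500.0,
    posted=datetime(2012, 1, 1),
    funded=datetime(2012, 1, 4),
)
```

Optional timestamps make it explicit that a loan may be posted but not yet funded or paid.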

In addition, the data set includes term-document matrices for textual fields, such as loans' descriptions, lenders' occupations, lenders' and teams' loan_because fields, and teams' descriptions.
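For readers unfamiliar with the term-document representation, the following is a minimal Python sketch of how such a matrix can be built from raw text. It assumes simple lowercase whitespace tokenization and raw counts; the preprocessing actually used to produce the matrices in the data set is not described here and may differ. The function name and toy documents are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def term_document_matrix(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return (vocabulary, matrix), where matrix[i][j] counts term i in document j."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}

    # Rows are terms, columns are documents.
    matrix = [[0] * len(docs) for _ in vocab]
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            matrix[index[term]][j] = count
    return vocab, matrix


# Toy "occupation" strings, made up for illustration.
toy_docs = ["farmer", "teacher", "farmer and teacher"]
vocab, tdm = term_document_matrix(toy_docs)
```

At the scale of the real data set, a sparse matrix format would be needed instead of nested lists, since most term counts are zero.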

A complex set of many-to-many relationships is also available in the data set. For example, a lender may participate in more than one lending team at the same time and contribute to multiple loans. Field partners manage loans within their local region, while borrowers request loans from their local field partners. These relationships can be represented as various graphs between different entities, and the following two important graphs are directly available from the data set:

  • a graph between lenders and loans, which indicates who funded which loans (12,355,814 edges in total)
  • a graph between lenders and lending teams, which indicates the team membership of lenders (313,040 edges in total)
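Both graphs are bipartite: every edge joins an entity of one type to an entity of another. The Python sketch below shows one way such an edge list could be indexed in both directions and queried, assuming the edges are available as pairs of integer IDs. The function names and the toy edge list are hypothetical, and the on-disk format of the graphs is not assumed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_bipartite(
    edges: Iterable[Tuple[int, int]]
) -> Tuple[Dict[int, Set[int]], Dict[int, Set[int]]]:
    """Index (lender, loan) edges as lender -> loans and loan -> lenders."""
    lender_to_loans: Dict[int, Set[int]] = defaultdict(set)
    loan_to_lenders: Dict[int, Set[int]] = defaultdict(set)
    for lender, loan in edges:
        lender_to_loans[lender].add(loan)
        loan_to_lenders[loan].add(lender)
    return lender_to_loans, loan_to_lenders


def co_funders(
    lender: int,
    lender_to_loans: Dict[int, Set[int]],
    loan_to_lenders: Dict[int, Set[int]],
) -> Set[int]:
    """Other lenders who funded at least one loan in common with `lender`."""
    others: Set[int] = set()
    for loan in lender_to_loans.get(lender, ()):
        others |= loan_to_lenders[loan]
    others.discard(lender)
    return others


# Toy edge list (lender ID, loan ID), made up for illustration.
toy_edges = [(0, 10), (1, 10), (1, 11), (2, 11)]
lender_to_loans, loan_to_lenders = build_bipartite(toy_edges)
shared_with_1 = co_funders(1, lender_to_loans, loan_to_lenders)
```

The same two functions apply unchanged to the lender-team graph by passing (lender, team) pairs, in which case `co_funders` returns teammates rather than co-funders.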