EB5 – Technology vs Real Estate Investments

Recently, some emerging technology companies have turned to EB5 capital raises to fund their growth strategies. Many investors do not realize that technology companies offer a much better job creation opportunity and have a much higher overall return potential.

Technology has traditionally been a powerful business enabler. Now, however, Artificial Intelligence, Blockchain, and Immersive Experience are disrupting industries exponentially by transforming industry architectures themselves. This fundamental shift is fueled by automation, machine cognition, and new C2E (Consumer to Everything) engagement models.

In this article we compare how real estate and technology investments stack up for an EB5 investor.

Return on Incremental Investment

Real estate projects have a target project development cost: without the full budget amount available, the project cannot be completed and revenue cannot be realized. Technology companies, on the other hand, can put any amount of investment to use immediately and generate a return from it. This gives every investor a clearer path to a return on their capital.

Risk of Green Card Approval

Investment in EB5 is expected to take a steep dip soon after the November 21, 2019 deadline. You would not expect the same investors to put down 90% more for the same green card soon after the deadline, but in due time we expect EB5 candidates to return to investing.

Real estate projects run the risk of not filling their full capital raise before the deadline, and thereby not reaching the finish line for job creation. This can put the green card approval at risk. Technology projects, on the other hand, can continue to provide returns and hire with incremental investments, weathering the dip in EB5 investments. It is part of the technology business model to adjust course easily based on available capital.

Valuation Potential

Companies are valued based on their revenue, income, assets, pipeline of customers, and market potential.

Real estate projects are valued with traditional valuation models based mostly on income and assets; the multiplier and the goodwill are very much tied to the location. Technology companies hold patentable intellectual capital that has long-term growth potential. Selecting a tech company with the right specialization or secret sauce can result in exponential valuation – the multiplier on revenue can be 10 to 15 times.

Convert your loan to equity

Your EB5 investment is usually loaned to the Job Creating Entity (JCE), and the returns come in the form of interest payments at a set annual rate. Investors do not get a share of the business. But that is different with technology companies.

Most real estate investments, given the limited valuation opportunity, may not offer the option to convert your investment into a share in the business. Technology companies enjoy a much higher valuation after the initial growth years, and since the initial investors were the enablers, it is common practice to offer them a conversion to shares in the company at a discounted price. For example, if after 5 years your capital is returned at $500K, in addition to all the interest payments, you may have the choice to buy shares of that company immediately valued at $625K or higher, based on the discount offered to you (at a 20% discount, $500K buys $500K / 0.8 = $625K worth of shares).
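
As a rough illustration of that conversion arithmetic (the 20% discount is an assumed figure for this example, not a term of any specific offering):

```python
def share_value_at_conversion(capital_returned: float, discount: float) -> float:
    """Market value of shares bought by converting returned capital at a discount.

    A discount of 0.20 means shares are offered at 80% of market price,
    so each dollar of capital buys 1 / 0.8 = 1.25 dollars of market value.
    """
    return capital_returned / (1.0 - discount)

# The article's example: $500K converted with an assumed 20% discount.
print(share_value_at_conversion(500_000, 0.20))  # 625000.0
```
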
Potential Market Size and Geographical Reach

This is important for the growth and resiliency of a business: a wider geographical reach means less dependency on local market shifts.

Real estate is by definition a local business and is limited to the market available in its area; even real estate at tourist attractions depends on the local tourism industry. Technology companies cater to the global marketplace across regions and countries, and the intellectual capital they create is usually applicable across the world. This is especially beneficial when economies take different turns in various parts of the world.

Job Creation Potential

This criterion matters especially when the green card processing dates are not current. Real estate projects hire at a high rate during the construction phase, but once construction is completed, operating the real estate does not require as many employees. Technology companies grow continuously as the number of customers increases, and the employees they hire are retained and developed for the long term.

More importantly, the skills that a tech company develops in its employees are longer lasting and very attractive to the US economy.

As we saw above, technology companies are undeniably a better EB5 investment opportunity. But real estate may still be attractive to investors who, despite its limitations, like the comfort of knowing that their money is invested in a physical asset.

However, in this new world where data is the new oil, Artificial Intelligence is augmenting our collective intellect, and our physical and digital worlds are merging, tech is the place to be for every EB5 investor! Learn more at https://eb5.propmix.io

PropMix introduces Data-in-a-Box – a unique data service ideal for high-performance analytics and machine learning

MANHASSET HILLS, N.Y., Sep. 2, 2019 – PropMix.io, a real estate data and insights company, introduced a brand new way to interact with data with its Data-in-a-Box offering. Data-in-a-Box is a cloud facility that provides easy and immediate access to very large amounts of property data to power analytics and deep learning platforms in the real estate industry. This data-as-a-service will especially help lenders, appraisers, and other real estate technology providers reduce their internal data operations and leverage the economies of scale that PropMix is bringing to the industry.

PropMix has been diligently assembling the dream database for the real estate industry over the past several years and curating it with artificial intelligence techniques. Its data quality improvement techniques include many patent-pending capabilities to extract information from natural language and from real estate photos. The data lake now includes data on over 151 million properties, tax and assessment records, deed and mortgage records, foreclosures, and a lot more.

Many large companies in the mortgage and appraisal management market have proprietary analytical and machine learning models that need access to large amounts of data. These companies may already have a model that is proven in a local market with limited data and are ready to scale it up for use across the country. With Data-in-a-Box, PropMix is enabling the growth strategies of such companies by offering cloud access to its proprietary data set. “Our customers can now focus on building value on top of the data instead of spending their time and money on gathering and standardizing data”, said Daniel Mancino, Vice President of Data Solutions and Sales at PropMix.

The real estate industry is undergoing a transformation with billions of dollars invested in PropTech each year. “With the Data-in-a-Box offering, our goal is to accelerate innovation in the real estate industry by creating an environment where a company of any size can focus on creating their best machine learning and analytical models with seamless access to nationwide curated data”, said Umesh Harigopal, CEO of PropMix.io. “We are excited to invite all industry participants to leverage our high quality data to accelerate the ongoing transformation of the industry driven by AI and Blockchain.”

About PropMix.io

PropMix.io LLC, an Innovation Incubator Inc portfolio company, offers a ground-breaking Real Estate Smart App Development Platform that enables the Real Estate ecosystem to easily consume and monetize data and insights and build Smart Solutions. PropMix’s platform and solutions are widely used by mortgage lenders, appraisers, realtors, and investors. Built on industry open standards for global scale, PropMix.io empowers users to engage with data, make decisions using insights, and build the real estate technology of the future. Headquartered in New York, we also have a presence in Boston, MA; Leesburg, VA; and Freehold, NJ in the USA, and in Trivandrum, Kerala, India.

Media Contact: Sakeer Hassan, PropMix.io, 7329799507, sakeer@innovationincubator.com

PropMix announces discounted pricing for Valuation Expo and Appraisal Buzz members

MANHASSET HILLS, N.Y., March 18, 2019 – PropMix.io, a real estate data and insights company, has announced, in conjunction with Valuation Expo, a unique offer to try out its Market Conditions Advisor (MCA) product for appraisers.

All participants and delegates at the Valuation Expo at Charleston, SC from March 19 to 2, 2019 will be eligible for a 20% discount on all contracts signed before March 30, 2019, for up to a duration of 12 months from the date of signing. “With this offer, we are providing significant value to independent appraisers as well as large AMCs to experience and adopt the seamless data and insights access platform for appraisers”, said Daniel Mancino, Vice President of Data Solutions and Sales at PropMix.

MCA was first released in February 2018, and a large number of appraisers have been leveraging its single point of access to data and insights nationwide. PropMix recently integrated its image recognition technology into MCA to automate and simplify certain mundane tasks for the appraiser.

About PropMix.io

PropMix.io LLC, an Innovation Incubator Inc portfolio company, offers a ground-breaking Real Estate Smart App Development Platform that enables the Real Estate ecosystem to easily consume and monetize data and insights and build Smart Solutions. PropMix’s platform and solutions are widely used by mortgage lenders, appraisers, realtors, and investors. Built on industry open standards for global scale, PropMix.io empowers users to engage with data, make decisions using insights, and build the real estate technology of the future. Headquartered in New York, we also have a presence in Boston, MA; Leesburg, VA; and Freehold, NJ in the USA, and in Trivandrum, Kerala, India.

Media Contact: Sakeer Hassan, PropMix.io, 7329799507, sakeer@innovationincubator.com

PropMix brings Image Recognition to the Real Estate Appraisal Industry

MANHASSET HILLS, N.Y., February 20, 2019 – Appraisal Vision is PropMix’s new addition to the suite of products and services it offers the real estate appraisal industry. Appraisal Vision is an image recognition solution using a deep learning engine that has been trained on terabytes of real estate image data over the past couple of years. It extracts information from images, which is used to enrich data and to improve and validate home valuations with information found in home photos.

Appraisal Vision can power many solutions such as fraud detection and appraisal validation, and can automate simple tasks for the appraiser such as ordering and labeling photos in an appraisal form. “We will continue to integrate Appraisal Vision into many applications under the Market Conditions Advisor brand,” said Umesh Harigopal, CEO of PropMix. “Our goal is to reduce an appraiser’s mundane tasks and help them focus on high-value activities.”

The core technology that powers Appraisal Vision is a chain of cascading neural networks, each a convolutional neural network. PropMix’s heuristic algorithms combine results from multiple deep learning engines to arrive at final predictions on a real estate photograph. The neural networks have been trained over the last 2 years on about 22 terabytes of image data, resulting in accuracy levels of about 93% that continue to improve.
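
For illustration only – PropMix’s production pipeline is proprietary, and the stub models below are hypothetical stand-ins – the final voting stage of such an ensemble could combine the engines’ class probabilities with a weighted heuristic like this:

```python
import numpy as np

class StubCNN:
    """Hypothetical stand-in for a trained CNN; a real engine would wrap
    an actual deep learning model's inference call."""
    def __init__(self, n_classes: int, seed: int):
        self.rng = np.random.default_rng(seed)
        self.n_classes = n_classes

    def predict_proba(self, image) -> np.ndarray:
        p = self.rng.random(self.n_classes)
        return p / p.sum()

def combine_predictions(models, image, weights):
    """Weighted average of class probabilities from several engines,
    mimicking a heuristic voting stage at the end of a cascade."""
    probs = np.stack([m.predict_proba(image) for m in models])
    combined = np.average(probs, axis=0, weights=weights)
    return int(combined.argmax()), combined

engines = [StubCNN(n_classes=5, seed=i) for i in range(3)]
label, scores = combine_predictions(engines, image=None, weights=[0.5, 0.3, 0.2])
```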

About PropMix.io

PropMix.io LLC, an Innovation Incubator Inc portfolio company, offers a ground-breaking Real Estate Smart App Development Platform that enables the Real Estate ecosystem to easily consume and monetize data and insights and build Smart Solutions. PropMix’s platform and solutions are widely used by mortgage lenders, appraisers, realtors, and investors. Built on industry open standards for global scale, PropMix.io empowers users to engage with data, make decisions using insights, and build the real estate technology of the future. Headquartered in New York, we also have a presence in Boston, MA; Leesburg, VA; and Freehold, NJ in the USA, and in Trivandrum, Kerala, India.

Media Contact: Sakeer Hassan, PropMix.io, 7329799507, sakeer@innovationincubator.com

Bradie – The Bar is Now Set Higher for Digital Engagement in Real Estate

Redefining digital in real estate

We are excited to announce Bradie – the Broker and Agent Digital Engagement platform – our suite of capabilities for the real estate marketplace. We are completely redefining the online real estate experience using AI and Machine Learning techniques. Bradie integrates our various existing products, which have been used by hundreds of agents and brokers.

We believe that the real estate agent has the opportunity to build lifelong relationships with homeowners as their trusted advisor on the largest investment of their lives. Bradie is our journey to help agents nurture that relationship and provide value to homeowners.

Bradie brings to market brand new ways of interacting with real estate information using computer vision. Home buyers using an IDX portal can now stop staring at loads of data about the homes they have shortlisted and instead see them side by side using pictures of each room, focusing on what makes the homes different from each other.


Stale, weeks-old Home Value Reports are a thing of the past. Our comparative market analysis engine – iCMALive – provides an engaging platform with live updates to a homeowner’s personalized value analysis as the market changes in their neighborhood – a new home on the market, a new sale, or a price change. All such updates can be screened by the agent in real time, and Bradie will communicate with the customer on the agent’s behalf, building a trustworthy relationship with customers. Visit bradie.propmix.io to get more details.

See a demo of Bradie today for a peek at many more groundbreaking features. Contact us at register.propmix.io or write to us at info@propmix.io.

Real estate data mining for your next business need using Public Records

Here are a few examples of how our public record data mining is leveraged

Comprehensive nationwide real estate public record data helps tackle various business needs in many industries beyond real estate. We mine terabytes of real estate data to find those needles, or patterns, in the haystack. This post covers a few real estate and non-real estate use cases we are actively supporting.

As previously announced, our public record data provides a comprehensive set of property attributes such as owner occupancy, last sale information, and more detailed tax assessment information along with full property details. It also provides property identification, seller/buyer information, tax exemption details, building information, and legal description of the property.

The valuation models and comparable similarity scoring are now based on authoritative property details and current market conditions from listing data.

Marketers in any industry

PropMix real estate data is well suited to finding the target customer base for many kinds of businesses. Here are a few examples:

  1. A skylight company recently needed information on all homes that have a skylight so that they could offer upgrades or servicing options
  2. A flooring company is able to provide an automated estimate of carpeting or hardwood flooring costs using our building area information
  3. An insurance company is able to target customers who have lived for more than 10 years in a home to consider modifying their insurance coverages

Real Estate Investors

Investors are interested in finding undervalued homes in good rental markets to buy and convert into income-generating rental properties.

  1. We identify tenant occupied properties in each neighborhood in the country and find areas where rental demand is increasing
  2. We then find owner occupied homes in these areas that can potentially be converted to investment properties.

Our data can also power a full investment pro forma including the total cost of ownership and return on your investment.
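
As a simplified sketch of such a pro forma (all figures, rates, and parameter names below are hypothetical inputs for illustration, not PropMix outputs):

```python
def rental_pro_forma(price, down_payment_pct, annual_rate, years,
                     monthly_rent, annual_taxes, annual_insurance,
                     vacancy_pct=0.05, maintenance_pct=0.10):
    """Very simplified annual pro forma for a rental property."""
    loan = price * (1 - down_payment_pct)
    r = annual_rate / 12
    n = years * 12
    monthly_mortgage = loan * r / (1 - (1 + r) ** -n)   # standard amortization formula

    gross_rent = monthly_rent * 12
    effective_rent = gross_rent * (1 - vacancy_pct)     # allow for vacancy
    operating = annual_taxes + annual_insurance + gross_rent * maintenance_pct
    noi = effective_rent - operating                    # net operating income
    cash_flow = noi - monthly_mortgage * 12
    cash_on_cash = cash_flow / (price * down_payment_pct)
    return {"noi": noi, "cash_flow": cash_flow, "cash_on_cash": cash_on_cash}

# Hypothetical deal: $300K purchase, 25% down, 4.5% 30-year loan, $2,200/month rent.
print(rental_pro_forma(300_000, 0.25, 0.045, 30, 2_200, 6_000, 1_500))
```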

Mortgage Industry

Appraisers and lenders need information to accurately assess the risk of a collateral property before underwriting a loan – purchase, refinance, or home equity.

  1. Appraisers improve the accuracy of their valuations using extensive assessor and recorder property data and comparable sales from public records – including new home sales and owner sales not in the Multiple Listing Services.
  2. Underwriters or lender reviewers can run their appraisal review and AVMs using:
    1. Transaction history on a property
    2. Comprehensive report of the property details

As we continue to solve additional business problems we will provide updates on this blog on new and creative ways in which our customers are mining our data.

PropMix launched Market Conditions Advisor – a recommendation & analytics platform for Appraisers

MANHASSET HILLS, N.Y., February 6, 2018 – PropMix.io, a real estate data and insights company, has announced the general availability of its Market Conditions Advisor (MCA) product for appraisers. MCA provides a single user interface to research and analyze property records from across the USA, including current and past sales information, to help appraisers generate the analytics required for their GSE forms.

MCA comes packed with numerous features developed using feedback from appraisers in the field. The product includes automated comparable recommendations powered by a customizable similarity scoring mechanism, and it further allows appraisers to find comparables using a number of methods that merge listing data and public records. All the research and analytics can be easily exported to formats that can be directly consumed by the most common appraisal forms software. “With the MCA integrations we are building, appraisers can now access all the data they need from within their favorite forms software without having to switch to the MLS portals,” said Daniel Mancino, Vice President of Data Solutions and Sales at PropMix. “A single access point makes it even more beneficial in cities where multiple MLSs serve the same location.”

MCA is powered by PropMix’s data and insights platform, built using decades of experience in AI and machine learning. MCA will grow in the coming months, both geographically, as more Multiple Listing Service (MLS) relationships are added, and in functionality, as more machine learning insights are introduced into the product. The application also provides property images as well as listing history. “We are developing MCA as a brand for the appraisal industry, and this is the beginning of our pursuit to complement and augment the appraiser’s capabilities with real-world decision making powered by the PropMix cognitive engine for real estate,” said Sakeer Hassan, CMO of PropMix.io.

About PropMix.io

PropMix.io LLC, an Innovation Incubator Inc portfolio company, offers a ground-breaking Real Estate Smart App Development Platform that enables the Real Estate ecosystem to easily consume and monetize data and insights and build Smart Solutions. PropMix’s platform and solutions are widely used by mortgage lenders, appraisers, realtors, and investors. Built on industry open standards for global scale, PropMix.io empowers users to engage with data, make decisions using insights, and build the real estate technology of the future. Headquartered in New York, we also have a presence in Boston, MA; Leesburg, VA; and Freehold, NJ in the USA, and in Trivandrum, Kerala, India.


Media Contact: Sakeer Hassan, PropMix.io, 7329799507, sakeer@innovationincubator.com

Improve the Quality of Your Real Estate Data

Part 2 – How to improve Real Estate Data Quality?


In Part 1 of this series we broadly covered why data quality is important in real estate, why real estate data quality has become a hard problem to solve, and presented a few examples of how to measure the quality of your real estate data. In this second and final part we will present a few ideas on how you could begin the practice of improving real estate data quality.

Data Quality Best Practices

As you would expect, data quality is a common problem in many industries, old or new. As a result, many best practices for managing and improving data quality already exist and can be readily adopted within real estate. Here are a few important areas to focus on.

Data Quality Assessment

Before we can start improving quality we need a solid understanding of the current state of the data. As we presented in the last section of Part 1, knowing how to measure the quality of your data is the first step. Data quality metrics are specific to the industry you are in, and we have provided a few good starting points.

In addition to knowing your current state, a good data quality assessment practice requires you to assess yourself periodically, both to measure improvements and to catch quality leaks from new data trickling into your platform. It is also a great way to show senior management the strides your organization is making.

The design of the quality metrics needs to be traceable directly to your company’s business objectives, which differ depending on where in the real estate market you play – lead generation, mortgage origination, appraisals, brokerage, etc. Such traceability is important for getting management buy-in to invest in data quality.

Data Governance
To secure a strong organizational commitment to data quality, and to continuously support the people, processes, and technologies that maintain it, a data governance board must be established with participants from both business and IT. Business participants should be those closest to the consumption and production of data; IT participants would be the data architects and modelers. The objectives of the governance board are to:

  • Establish data policies and standards
  • Define and measure data quality metrics
  • Discover data-related issues and provide resolution paths
  • Establish proactive measures to reduce data quality leakage

Data Stewards

One of the most important roles within a data governance board and the overall data management practice is the Data Steward. Data stewards are the ultimate owners of specific sections of the data – usually called subject areas – and they represent the business users and producers of data. The buck stops with the data steward for all data quality issues, and the steward takes the leadership role in resolving data accuracy, consistency, and integrity issues.

Data stewards are often the liaisons between the business and the IT department that manages the data for the business. In this role, they work with the business and IT to define relevant quality metrics, have them interpreted and implemented appropriately by the IT department, and ultimately showcase the quality improvements that drive better business outcomes.

Create a Data Quality “Firewall”

Most data originating within an organization is broadly traceable to two types of sources: applications where users enter data, and data feeds that are processed to load data into data stores. The idea of a data quality firewall is to catch and reject any data that violates data quality rules at the time it enters a data store. All data ingestion points have to pass through this one virtual firewall to be validated before data is processed and stored.

The key word above is “virtual”, because it is impractical to create a single system to act as a data quality firewall given the various subject areas of data and the departmental data ingestion points across the organization. The idea is not to create a choke point but a proactive mechanism that catches data quality issues for follow-up and resolution before they flow downstream into transactional or analytical systems.
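
A minimal sketch of such a validation gate, assuming a few illustrative per-field rules (the rule set, field names, and thresholds are examples, not a PropMix implementation):

```python
from typing import Callable

# Illustrative per-field rules: each returns True when the value passes.
RULES: dict[str, Callable] = {
    "BedroomsTotal": lambda v: v is not None and 0 <= v <= 20,
    "ListPrice": lambda v: v is not None and v > 0,
    "StandardStatus": lambda v: v in {"Active", "Pending", "Closed",
                                      "Cancelled", "Withdrawn"},
}

def firewall(record: dict) -> tuple[bool, list[str]]:
    """Validate a record at ingestion; reject or flag on any rule violation."""
    violations = [f for f, rule in RULES.items() if not rule(record.get(f))]
    return (len(violations) == 0, violations)

ok, issues = firewall({"BedroomsTotal": 4, "ListPrice": 350_000,
                       "StandardStatus": "Active"})  # -> (True, [])
```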


Data Standardization vs. Data Quality – What’s the difference

Does compliance with a data standard mean high data quality? In other words, if your data is Platinum-level certified against the RESO 1.5 data dictionary, would you also consider it to be of high quality? It turns out the answer is not that straightforward.

There are typically two different views on data quality: conformance to a standard specification, or usability of the data for a specific purpose. By the first definition, data quality would be very high if a data set is certified by RESO. On the other hand, as we discussed in Part 1, an agent could inadvertently enter erroneous listing data or purposefully tweak a listing for improved marketability. This can result in inconsistency between the public record and the listing record for the same property, leaving the user of the data to assign trustworthiness to the data sources before consumption. Since business objectives are driven by data use rather than conformance to a standard, we prefer the second definition of data quality, measured by usability.

Consider another example of standard vs. quality: assignment of a PropertySubType value of Condominium, Townhouse, or Single Family Residence is standards-compliant, but an erroneous assignment of this field can cause the property to be missing from IDX searches. It can also cause valuation issues if the field is not combined with and cleansed against other data sources.

Having said that, certain standards specifications include elements of data use as well, in which case conformance to standards and usability begin to mean the same thing. But given the varied uses of any particular data set, it is unfair to expect a standards organization to completely define usability specs for all of them; the result would be an unwieldy standard that may reduce its adoption.

Here are some typical data quality concerns to consider:

Completeness – Are we missing any values of critical fields?
Validity – Is the data in a field valid? Does the whole record match my rules?
Uniqueness – How much of our data is duplicated?
Consistency – Is information consistent within a single record, across multiple records, and across multiple data sets?
Accuracy – Does the data represent reality?
Temporal Consistency & Accuracy – Does a snapshot in time represent reality at that time, and are all data sets consistent with that snapshot?

As you can see, a data standard such as RESO cannot answer the above for every real estate ecosystem player. We could define detailed rules for each of the concerns above, and such rules will look different at a mortgage company than at a sales lead generation company.

Practical data quality for real estate

Now let us bring all this down to a few specific takeaways to improve the quality of data in your company. We will define these as a few steps to begin with, but stay tuned to our blog for future posts on this topic, where we will continue to provide specific rules and heuristics you can implement.

Many of the activities below must be driven by an appointed data steward for each major data set you are dealing with – assessment, listings, deeds, mortgages, permits, etc.

Identify critical fields

The first step in your data quality journey is to identify the most critical fields for your particular application. Out of the 639 fields contained in the RESO 1.6 data dictionary, you will want to identify the fields required for your computations. Some fields are commonly required for any application; they were listed in Part 1 of this article and are repeated here for quick reference:

Property and tax fields: Parcel Number, Address, PropertyType, PropertySubType, Lot Size, Zoning, NumberOfBuildings, BedroomsTotal, BathroomsTotal, LivingArea, Tax Year, Tax Value, Tax Amount, Land Value, Improvement Value, StoriesTotal, ArchitectureStyle

Listing fields: ListingContractDate, StandardStatus, OriginalListPrice, ListPrice, CloseDate, ClosePrice, DaysOnMarket, ListAgent Information, ListBroker Information, SellingAgent Information, SellingBroker Information, Public Remarks

Community and association fields: AssociationName, AssociationFee, Subdivision, School Districts, TotalActualRent

Define Data Quality Rules

The next step is to define a set of rules that consider two dimensions to begin with:

Data Quality Concerns: Completeness, Validity, Uniqueness, Consistency, Accuracy, and Temporal Consistency & Accuracy.

Extent of measurement: Single record, multiple history records of the same property, multiple history records of the same listing, multiple data sets (public records and listings)


You would end up with rules for each field, for each type of record, for a data set, and rules that cut across multiple data sets. These rules would validate the field, a record, a set of records, or the whole data set. Execution of these rules would result in either errors or warnings about the quality of your data.
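
As a sketch of how such layered rules might be expressed (field names follow the RESO examples above; this is illustrative, not PropMix’s rule engine):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    level: str       # "field", "record", "multi-record", or "data-set"
    severity: str    # "error" or "warning"
    check: Callable  # returns True when the rule passes

rules = [
    # Field-level completeness/validity check.
    Rule("ListPrice present and positive", "field", "error",
         lambda rec: rec.get("ListPrice", 0) > 0),
    # Record-level consistency: a closed listing must carry a close price.
    Rule("Closed listings have ClosePrice", "record", "error",
         lambda rec: rec.get("StandardStatus") != "Closed" or rec.get("ClosePrice")),
    # Cross-data-set consistency: listing bedrooms match the public record.
    Rule("BedroomsTotal matches public record", "multi-record", "warning",
         lambda rec: rec.get("BedroomsTotal") == rec.get("public_record_bedrooms")),
]

def run_rules(record: dict) -> list[tuple[str, str]]:
    """Return (severity, rule name) for every rule the record fails."""
    return [(r.severity, r.name) for r in rules if not r.check(record)]
```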

Discovery with Data Profiling

Data Profiling helps you run a statistical analysis on the data to discover hitherto unknown problems.

For example, we usually expect PropertySubType values to always be one of a known set. But as new data gets processed, we might discover that certain PropertySubType mappings are absent from our standardization routines, and as a result non-standard PropertySubTypes may be getting added to our database.

To catch such issues, a data profiling capability provides detailed stats on field populations, null counts, blank counts, and field value distributions. For PropertySubType, the field value distribution would reveal that there is a new value with over 100,000 entries, which means these values should be remapped as required.

Running a data profiler periodically will help identify issues that creep into the data. Note that a data quality firewall can only prevent “unclean” data when we have modeled the relevant cleansing or quality rules within that firewall. For previously unknown issues that get loaded via daily incremental data ingestions, we need to discover the issues and then model prevention rules into the firewall.
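
As a small illustration of these profiling stats, assuming listing data loaded into a pandas DataFrame (the sample values are invented):

```python
import pandas as pd

listings = pd.DataFrame({
    "PropertySubType": ["Condominium", "Townhouse", None, "SFR", "Condominium"],
    "ListPrice": [350000, None, 420000, 515000, 0],
})

# Field population: non-null counts and null counts per field.
print(listings.notna().sum())
print(listings.isna().sum())

# Field value distribution: surfaces unexpected values such as "SFR",
# which should have been mapped to a standard PropertySubType.
print(listings["PropertySubType"].value_counts(dropna=False))
```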


Establish Data Quality Metrics

Having defined the rules, it is time to measure your quality against them. Common quality metrics are:

  • Number of records that failed a particular quality rule
  • Field population thresholds and where we fall short
  • Field value distributions
  • Number of records with invalid data for each field
  • Number of records that failed a record level quality rule
  • Number of multi-record quality rule failures
  • Number of data-set level quality rule failures


For each of the above it is important to understand the trend, so run the data profiler at regular intervals – weekly or monthly – to see how your data quality is trending: improving, getting worse, or surfacing issues that did not exist before.

Enforce the rules at the data ingestion points

This is the first and most proactive step in improving and maintaining high-quality data.

Having defined the rules for measuring data quality, it is now important to maintain higher-quality data by enforcing these rules at the time data is created in the organization. The data steward should become the evangelist for the rules they have defined, working with each data origination point to implement the validation rules.

Define Heuristics for Quality Improvement

The reactive posture to data quality improvement is more of a data cleansing process, and it is a required element of a data quality practice. Most of the time you are not in control of the data origination points, and if rule enforcement at the origination point is too restrictive you might not have enough data for your applications. Hence the need for a reactive measure to clean up the data you have received.

There are broadly two alternatives: either perform the cleanup first and then put the data through a highly restrictive data quality firewall, or run a lenient firewall with a downstream cleansing process. The choice depends very much on your application and its ability to deal with imperfect data.

Any data quality improvement mechanism depends on a set of heuristics that the data steward and the data architects work together to define and develop. For example, you could correctly reclassify a mislabeled rental listing by comparing its listing price to the local median sale price and the local median rental price.
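
A sketch of that reclassification heuristic, assuming local median figures are available per neighborhood (the threshold logic is illustrative):

```python
def classify_listing(list_price: float, median_sale: float, median_rent: float) -> str:
    """Heuristic: a 'sale' listing priced near the local median monthly rent
    is almost certainly a mislabeled rental."""
    # Relative distance to each local benchmark; the closer benchmark wins.
    rent_ratio = abs(list_price - median_rent) / median_rent
    sale_ratio = abs(list_price - median_sale) / median_sale
    return "rental" if rent_ratio < sale_ratio else "sale"

# A $2,400 "sale" listing where homes sell for ~$450K and rent for ~$2,200/month.
print(classify_listing(2_400, median_sale=450_000, median_rent=2_200))  # rental
```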


It is also recommended that you maintain a list of all active and retired heuristics used for cleansing. Another need alongside data cleansing is data lineage: keeping track of the source of the cleansed data and the heuristics that caused the data to be modified.

Conclusion

Data quality is a cyclical process: establish rules, implement them to measure quality, profile the data, clean up the data as required, and finally go back to tweaking the rules to execute the cycle once more. The target metrics start small, but continue to tighten them with time.

We hope this article provided an overview and some key takeaways to implement a good data quality practice within your real estate technology platform. We will continue this conversation with more blog posts to provide you with:

  • Practical data quality rules and metrics
  • Data cleansing heuristics to implement
  • Machine learning techniques in real estate data cleansing


We are planning to release our Data QA Tool specialized for real estate data free to the community. Please sign up here to be notified when the tool is released.


Improve the Quality of Your Real Estate Data

Part 1 – The Real Estate Data Quality Problem

Introduction

Real estate data comprises many categories: characteristics of a property; the history of the property and how it changed during its lifetime through renovations, add-ons, permits, etc.; currently for-sale properties; history of sale records; history of tax assessments; current mortgage information; any outstanding liens; utility consumption; neighborhoods; schools; and the list goes on. You can see that there is data about a real property and a lot of additional data about how the property is influenced. As you read through that partial list of data categories, you will also have observed that each of those categories is created and maintained by a different company or government agency. Given these disparate sources of data and how the real estate industry has evolved, assembling all of it in one place to know everything about a single property has become a challenge. Before we explain why this is a challenge, let us briefly explain who uses this data and why it is so important.

Relevance of Data Quality in Real Estate

Housing alone contributes about 15-18% to the GDP of the US economy [1]. If you consider commercial real estate, the numbers climb to well over 20% [2]. The real estate ecosystem comprises numerous industries, and each of them is dependent on data. Here are a few of them in the table below.


Producers & Consumers of Real Estate Data

Local Municipalities
County Governments
Federal Agencies
Mortgage Lenders (Banks, Credit Unions)
Mortgage Brokers
Mortgage Servicers
Investment Banks
Appraisers
Home Inspectors
Title Companies
Real Estate Brokers and Agents
Home Buyers and Sellers
Home Improvement Companies
Home Improvement/Repair Contractors
Builders and Developers
Architects
Civil Engineers
ETF and Fund Managers
Retirement and/or Sovereign Funds
GSEs – Freddie Mac and Fannie Mae

Here is one reason why data quality matters across these players: consider the loan processing steps in home buying. The homebuyer applies for a mortgage at a lender, and the lender’s underwriter hires an appraiser to determine the actual value of the property before lending a percentage of that value (a maximum of 80% in most cases) to the buyer. Once the mortgage is issued it is often transferred to a mortgage servicer, and the mortgage itself is sold to another financial institution to enable securitization of the loan. Securitization enables other investors across the world to participate in the US mortgage market and, in turn, in the US real estate market. Each party in this chain of activities, and especially the investor in the security, needs to understand the security’s Value at Risk (VaR), which is directly dependent on the value of the home, among many other categories of risk such as borrower risk, market risk, and so on.

Home valuations depend on the property’s characteristics, recent sales in the market, current inventory of homes, neighborhood information, recent development and employment activity in the area, and many more such factors. As you can see, accurate and consistent real estate data is highly important for arriving at home valuations with a high degree of confidence for every player in the ecosystem.

For instance, consider a property with 4 bedrooms, 3 baths, and 2,500 sq. ft. of living area on a one-acre lot that is listed in the MLS as a 5 bedroom, 3 bath property because the agent counted an additional room in the basement as a bedroom. By comparing it to other 5 bedroom, 3 bath properties, the subject property could be overvalued; and if the list price of such a property is used as a comparable, other properties could be undervalued. Similarly, the subject property being compared to another in better condition, or missing out on improvements made to the kitchen or the basement, will reflect an inaccurate value in an appraisal. As a result, an appraiser tries not to depend solely on MLS listing data; she supplements it with onsite inspections to collect detailed information. Appraisals are thereby delayed, which further cuts into the profit margins of the appraisal business. Much worse, this has a direct bearing on the ability of the homeowner or the buyer to close the transaction. So, unreliable data sources inadvertently exert a strong influence on the whole process.

Why is it difficult to maintain data quality in Real Estate?

Of all the sources of information about a particular property, the most dependable data is that made freely available in most counties in the US via each state’s public records act. That covers tax assessments, deeds, mortgages, liens, etc. These data sets are themselves completely independent, typically tied together only by an APN (Assessor’s Parcel Number). But each county or municipality creates and maintains this data in its preferred model, even though conceptually they all cover the same types of information. Integrating data from over 3,000 counties across the country and unifying it into a single data model is one necessary step to ensure data consistency can be maintained across all properties.
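
As a tiny sketch of that unification step (the county field layouts and values here are invented for illustration):

```python
# Two counties expose the same concepts under different field names.
county_a = [{"apn": "123-45-678", "assessed_val": 410_000, "sqft": 2500}]
county_b = [{"ParcelNumber": "987-65-432", "AssessedValue": 365_000, "LivingArea": 1800}]

def normalize_a(rec: dict) -> dict:
    return {"APN": rec["apn"], "AssessedValue": rec["assessed_val"],
            "LivingArea": rec["sqft"]}

def normalize_b(rec: dict) -> dict:
    return {"APN": rec["ParcelNumber"], "AssessedValue": rec["AssessedValue"],
            "LivingArea": rec["LivingArea"]}

# Unified store keyed by APN, so assessments, deeds, and liens can be linked.
unified = {r["APN"]: r for r in map(normalize_a, county_a)}
unified.update({r["APN"]: r for r in map(normalize_b, county_b)})
```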


Real estate listings data gathering, on the other hand, has been a wild west, even with the Real Estate Transaction Standard (RETS) maintained by the National Association of Realtors (NAR), which only provides a protocol standard for data exchanges, not a payload standard for the data actually exchanged. Enter the Real Estate Standards Organization (RESO) with its standard data dictionary, which has immensely improved consistency in data representation across the various players. But RESO does not address the types of home valuation data issues discussed earlier (we present why RESO is justified in that position in Part 2 of this article). The MLS data capture platforms most often do not enforce any data consistency rules within the system or against the local county/municipality data. Even though a Board of Realtors or MLS may have a recommended format, there could be hundreds, if not thousands, of agents, brokers, and assistants submitting listings. Much as no two people are alike, their choice of words and descriptions of key features vary. The description of features is another common area where subjectivity is prevalent: for every person who calls a home a “fixer upper”, another will say it is “an incredible value, with lots of potential”.


Inaccuracies in the data can be introduced through other means as well, and property characteristics are largely affected by this. Real estate appraisers require the Gross Living Area (GLA) of a home to be the “above grade” square footage, which is how the assessor would report it; but when the property is listed, the living area is often inclusive of finished basements, which can be misleading. Even though the intent is not to create a wrong listing, misinterpretation of the data creates tricky situations during the appraisal process. Data entry errors can create a listing with the wrong number of bedrooms or bathrooms, or the wrong living area or lot size. When there are several hundred fields to update for a listing and time is limited, these errors tend to multiply.

Know Your Data – Measure its Quality

As we explained in the previous sections, data quality in real estate is much needed but hard to achieve given the integration complexities across the various players. Identifying the individual root causes and fixing them can take a long time, but in the meantime we can improve the quality of current data to achieve immediate business objectives.

Before we can “cleanse” the data to improve its quality, we need to be able to identify how bad the data at hand is, using a few applicable metrics. It is important to understand that the target quality, and the metrics to measure it by, depend a lot on the intended use of the data. For example, a selling agent is most interested in data related to property characteristics, financing terms, showing instructions, etc., while a home improvement company would be interested in property features, property improvements, etc. Here are a few suggested common metrics for measuring the quality of real estate data.

Field Population Statistics with specific focus on the following fields from the RESO standard data dictionary.

Property and tax fields: Parcel Number, Address, PropertyType, PropertySubType, Lot Size, Zoning, NumberOfBuildings, BedroomsTotal, BathroomsTotal, LivingArea, Tax Year, Tax Value, Tax Amount, Land Value, Improvement Value, StoriesTotal, ArchitectureStyle

Listing fields: ListingContractDate, StandardStatus, OriginalListPrice, ListPrice, CloseDate, ClosePrice, DaysOnMarket, ListAgent Information, ListBroker Information, SellingAgent Information, SellingBroker Information, Public Remarks

Community and association fields: AssociationName, AssociationFee, Subdivision, School Districts, TotalActualRent

Address Standardization measures the extent to which the address components for a property are usable to uniquely locate the property or to derive a high-accuracy geocode.

Geocode Accuracy is sometimes required to support accurate radius or polygon property searches. Rooftop accuracy may be required for certain applications, but a street-side geocode might suffice for many.
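
To illustrate why geocode accuracy matters for radius searches, here is a standard great-circle (haversine) filter; the coordinates below are invented:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance between two geocodes, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(a))  # Earth's radius is ~3958.8 miles

subject = (40.7831, -73.9712)
comps = {"A": (40.7790, -73.9680), "B": (40.8610, -73.8900)}
within_1mi = {k: v for k, v in comps.items()
              if haversine_miles(*subject, *v) <= 1.0}  # keeps only "A"
```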

Listing Duplication must be reduced as much as possible, again depending on the application; at the least, listings from different MLSs need to be linked with a common unique property ID.

Raw listing data from an MLS will trickle in with multiple updates, improving in quality over the first few days or weeks after a property is listed. Listing history records may have to be merged to improve data consistency.

Often a listing will move into a Cancelled/Withdrawn status before it is recorded as Sold. In such cases the listing history data may require a consolidation to drop the superfluous status transitions.
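
A small sketch of that consolidation, assuming status snapshots are sorted by date (the statuses and dates are illustrative):

```python
history = [
    {"date": "2019-01-05", "status": "Active"},
    {"date": "2019-02-10", "status": "Withdrawn"},  # superfluous: re-recorded as Sold
    {"date": "2019-02-12", "status": "Sold"},
]

def consolidate(records: list[dict]) -> list[dict]:
    """Drop a Cancelled/Withdrawn snapshot that is immediately
    followed by a Sold record for the same listing."""
    out = []
    for i, rec in enumerate(records):
        nxt = records[i + 1]["status"] if i + 1 < len(records) else None
        if rec["status"] in {"Cancelled", "Withdrawn"} and nxt == "Sold":
            continue
        out.append(rec)
    return out

print([r["status"] for r in consolidate(history)])  # ['Active', 'Sold']
```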

Very often sale and rental listings may get mixed up in different RETS resources/classes. It may be required to reclassify such listings appropriately.

Click here to continue to Part 2 of this article, which explores the following ideas:

  • Data Quality Best Practices
  • Data Standardization vs. Data Quality – What’s the difference
  • Practical data quality for real estate

References

Please click here to provide your contact information to be alerted when Data QA Tool is published.


Improve the Quality of Your Real Estate Data

PropMix published Part 1 of its latest Point of View series of articles, “Improve the Quality of Your Real Estate Data”, earlier this week. The first part covers in detail the relevance of data quality in real estate, why it is difficult to maintain good quality in real estate data, and ways to measure data quality. Click here to read Part 1 of the article by Daniel Mancino, Sakeer Hassan, and Dr. Umesh Harigopal.