Machine Learning Analysis on Real Estate Values in Southern Hampton Roads

By Josh Glessner

Description and Background

Known as Hampton Roads, the 757 area code in southeastern Virginia is a unique blend of demographics, income levels and industries. Home to Norfolk, Chesapeake, Portsmouth and Virginia Beach, among other cities, it is a widely diverse area, with a large share of its economic activity focused on the nearby bodies of water: the Chesapeake Bay to the north and the Atlantic Ocean to the east. There is a large naval base in Norfolk, beach tourism in Virginia Beach, commercial and industrial activity in Portsmouth, and a large tract of fertile agricultural land in Chesapeake (an advantage given how close it is to a major port).

For this study, we will focus on the southern region of the Greater Hampton Roads area, defined as the area south of the I-64 Hampton Roads Bridge-Tunnel (HRBT), extending as far south and west as the City of Chesapeake and east to the coast. The upper Hampton Roads area, including Newport News and Williamsburg, is omitted from this study for simplicity.

Where to Invest in Southern Hampton Roads?

The real estate market in Hampton Roads is complex, to say the least. With widely varying industries, a somewhat transient population, and a wide range of incomes, there are many factors to consider when looking for an investment opportunity in southern Hampton Roads.

So where is the best place to invest?

Conventional wisdom says, “By the beach!” Many real estate pros would generally agree, but for the best long-term real estate investment, where can one find the highest return?

This analysis uses machine learning (ML) clustering techniques on a wide variety of data to explore that very question.

We will explore the correlations between home values and Foursquare location data, and work towards building a predictive model for possible business development or real estate investment opportunities.

The population is unevenly distributed across Hampton Roads: the western side (the cities of Norfolk and Portsmouth) is more densely populated than Virginia Beach and Chesapeake.

It should also be taken into account that there are several large military facilities in the area. Norfolk Naval Station, the largest naval base in the world, sits on the northwest tip of the above map. The others are Dam Neck Naval Station (home of the famous Navy SEALs), Little Creek Amphibious Base, Fort Story, and NAS Oceana, home of the F-18 fighters that can often be seen buzzing overhead.

To the south, the population spreads out and some of the land is used for farming, with the North Carolina border beyond. The far southeastern corner is largely undeveloped beach and wildlife refuge, which makes for some beautiful photo opportunities (both photos below taken by yours truly!).

False Cape State Park, 3 miles from the North Carolina border.
Virginia Beach Pier, 16th St.

The problem faced by many commercial builders and residential real estate agencies in the area is finding a financially viable area to invest in.

The Hampton Roads area is very dynamic, with a significant military population spread across five major installations. The constant flow of military families leaving the area on orders and new families arriving makes for a fairly stable real estate market compared to other areas of Virginia.

In this study, we will use available data to identify viable real estate investment opportunities, judged by current home value and the return expected over a decade.

Data Description and Sources

For our analysis, we will use data from a variety of sources:

  • Venue data from the Foursquare API for the area around each neighborhood, including venue category and popularity.
  • Real estate valuation data from Zillow, including property values, current listings and rental indices.
  • Neighborhood JSON data from OpenDataSoft to define neighborhood boundaries.

Methodology

Several pieces of data had to be gathered for this analysis. First, information on the neighborhoods and their boundaries came from the OpenDataSoft repository, which gave me a CSV of the map coordinates of all the ‘neighborhoods’ in the region.
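Foursquare venue searches are made around a single point, while the OpenDataSoft data describes neighborhood boundaries. One way to reduce a boundary to a query point is its area-weighted centroid, computed with the shoelace formula. The sketch below assumes each boundary is available as a list of (longitude, latitude) vertices; the actual layout of the OpenDataSoft export is not shown in this article, so that input format is an assumption, and treating degrees as flat x/y coordinates is an approximation that holds up at neighborhood scale.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    Uses the shoelace formula. The ring may be open or closed (first vertex
    repeated at the end), and may be listed clockwise or counterclockwise.
    """
    if len(vertices) > 1 and vertices[0] == vertices[-1]:
        vertices = vertices[:-1]  # drop the closing duplicate vertex
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least 3 distinct vertices")

    twice_area = 0.0
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # signed (doubled) area contribution of this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * A) with A = twice_area / 2, i.e. divide by 3 * twice_area.
    return cx / (3 * twice_area), cy / (3 * twice_area)
```

For example, `polygon_centroid([(0, 0), (2, 0), (2, 2), (0, 2)])` returns `(1.0, 1.0)`.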

Zillow, a leading U.S. real estate data aggregator, publishes its data on its website. I was able to gather several CSV files containing data such as:

  • Median Home Values
  • Growth (month over month, quarter over quarter, year over year)
  • Rental Rates
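The growth figures are simple ratios on a monthly series. As a minimal sketch, assuming a hypothetical per-neighborhood list of monthly median values ordered oldest to newest (Zillow's actual file layout and column names are not reproduced here), each rate is the fractional change from the value 1, 3 or 12 months before the latest one:

```python
from typing import Dict, List


def growth_rates(monthly_values: List[float]) -> Dict[str, float]:
    """Month-over-month, quarter-over-quarter and year-over-year growth.

    `monthly_values` is a hypothetical series of median home values, one per
    month, oldest first. Each rate compares the latest value with the value
    `lag` months earlier and returns the fractional change (0.05 == 5%).
    """
    lags = {"MoM": 1, "QoQ": 3, "YoY": 12}
    if len(monthly_values) <= max(lags.values()):
        raise ValueError("need at least 13 monthly values for YoY growth")
    latest = monthly_values[-1]
    return {
        name: latest / monthly_values[-1 - lag] - 1.0
        for name, lag in lags.items()
    }
```

With values rising from $200,000 by $1,000 a month over 13 months, the YoY figure is 212,000 / 200,000 - 1, or 6%.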

We used the Foursquare API to gather location data on the area, including each venue's type (‘category’), its popularity, and its distance from a given neighborhood. From this data, we will attempt to draw conclusions about the correlation between the types of venues near a neighborhood and its rate of growth in property value.

Using the Python folium library, we visualize the neighborhoods' geographic distribution, shown below.

Because of this geographic distribution, there is a wide variance in the number of venues returned within a 1,000 m radius when we query the Foursquare API:

Some neighborhoods hit the 100-venue limit, while others do not come close.
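The article does not show its Foursquare requests or say which endpoint or API version it used. As an assumption, the sketch below targets the legacy v2 `venues/explore` endpoint with the 1,000 m radius and 100-venue limit mentioned above, and parses the v2 response layout (`response.groups[0].items[].venue`). Foursquare now also offers a newer Places API, so treat the URL, parameters and field names as things to verify against current documentation. The credentials and the default version date are placeholders.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode

# Legacy Foursquare v2 endpoint (assumed; verify against current Foursquare docs).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id: str, client_secret: str, lat: float, lng: float,
                      radius_m: int = 1000, limit: int = 100,
                      version: str = "20200101") -> str:
    """Build a venues/explore request URL centered on one neighborhood point."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # version date (YYYYMMDD); this default is illustrative
        "ll": f"{lat},{lng}",  # search center, e.g. a neighborhood centroid
        "radius": radius_m,    # search radius in meters (1,000 m in this analysis)
        "limit": limit,        # cap on venues per request (100 in this analysis)
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (venue name, primary category) pairs from a decoded JSON response."""
    items = payload["response"]["groups"][0]["items"]
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use the first listed category; fall back to a placeholder if none is given.
        category = categories[0]["name"] if categories else "Unknown"
        venues.append((venue["name"], category))
    return venues
```

The network call is left out on purpose (any HTTP client works). A neighborhood where `len(parse_venues(payload))` equals `limit` is one that hit the cap.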

Using pandas, I created a table of the five most common venue categories in each neighborhood (within the 1,000 m radius).
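The article built this table with pandas. A dependency-free sketch of the same idea, assuming the Foursquare results have been flattened into hypothetical (neighborhood, venue category) pairs, simply counts categories per neighborhood. Ranking by raw counts gives the same order as ranking by each category's share of that neighborhood's venues.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_venue_categories(rows: Iterable[Tuple[str, str]], n: int = 5) -> Dict[str, List[str]]:
    """Return the n most common venue categories for each neighborhood.

    `rows` holds (neighborhood, venue category) pairs, one per venue found
    within the search radius. Ties keep the order in which categories were
    first seen (Counter.most_common behavior).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {hood: [cat for cat, _ in c.most_common(n)] for hood, c in counts.items()}
```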

After one-hot encoding the venue categories (so that the categorical data can be analyzed numerically), we use the elbow method to decide how many clusters to use: we compute the K-means cost (the within-cluster sum of squared distances) for a range of k values and look for the ‘elbow’, the point after which adding clusters stops reducing the cost substantially. The cost curve shows a pronounced elbow at k = 6, and six zones is a good number for our analysis.
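Both steps in this paragraph can be sketched without extra libraries. `one_hot_frequencies` turns hypothetical (neighborhood, category) pairs into one row per neighborhood holding the share of its venues in each category; in pandas this is usually `pd.get_dummies` followed by a group-by mean, though the article does not show which calls it used. `elbow_k` is a simple automated stand-in for reading the elbow off the cost plot: it picks the k whose cost point lies farthest from the straight line joining the first and last points. The analysis itself chose k = 6 by inspecting the curve.

```python
from typing import Dict, List, Sequence, Tuple


def one_hot_frequencies(
    rows: Sequence[Tuple[str, str]],
) -> Tuple[List[str], List[str], List[List[float]]]:
    """One-hot encode venue categories and average them per neighborhood.

    Returns (neighborhoods, categories, matrix), where matrix[i][j] is the
    share of neighborhood i's venues that fall in category j.
    """
    categories = sorted({cat for _, cat in rows})
    col = {cat: j for j, cat in enumerate(categories)}
    sums: Dict[str, List[float]] = {}
    totals: Dict[str, int] = {}
    for hood, cat in rows:
        vec = sums.setdefault(hood, [0.0] * len(categories))
        vec[col[cat]] += 1.0  # add this venue's one-hot row to its neighborhood
        totals[hood] = totals.get(hood, 0) + 1
    hoods = sorted(sums)
    matrix = [[v / totals[h] for v in sums[h]] for h in hoods]
    return hoods, categories, matrix


def elbow_k(costs: Dict[int, float]) -> int:
    """Pick the k whose (k, cost) point is farthest from the chord joining
    the first and last points of the cost curve."""
    ks = sorted(costs)
    x0, y0 = ks[0], costs[ks[0]]
    x1, y1 = ks[-1], costs[ks[-1]]

    def distance(k: int) -> float:
        # Numerator of the point-to-line distance; the denominator is the same
        # for every k, so it does not change which k is farthest.
        return abs((y1 - y0) * k - (x1 - x0) * costs[k] + x1 * y0 - y1 * x0)

    return max(ks, key=distance)
```

With hypothetical costs `{1: 100, 2: 60, 3: 35, 4: 20, 5: 17, 6: 15, 7: 14}`, `elbow_k` returns 4.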

We then run the unsupervised K-means algorithm on the data to get our six clusters, and plot them on the folium map using the ‘Stamen Terrain’ tileset:
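The article does not say which K-means implementation it used, so here is a minimal stand-in for reference. This sketch of Lloyd's algorithm alternates between assigning each neighborhood vector to its nearest centroid and moving each centroid to the mean of its members. It restarts from several random initializations and keeps the run with the lowest inertia (the within-cluster sum of squares, which is the cost used for the elbow plot). The `seed`, `n_init` and `max_iter` defaults are illustrative choices.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, n_init: int = 10, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]], float]:
    """Lloyd's K-means with random restarts.

    Returns (labels, centroids, inertia) for the restart with the lowest
    inertia, i.e. the smallest within-cluster sum of squared distances.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]  # random initial centers
        labels = None
        for _ in range(max_iter):
            # Assignment step: attach every point to its nearest centroid.
            new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                          for p in points]
            if new_labels == labels:
                break  # assignments stopped changing: converged
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # keep the old centroid if a cluster empties out
                    centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[2]:
            best = (labels, centroids, inertia)
    return best
```

Costs for an elbow plot can be collected with `{k: kmeans(X, k)[2] for k in range(1, 11)}`, where `X` is the per-neighborhood frequency matrix. If scikit-learn is available, `KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)` exposes the equivalent results as `labels_` and `inertia_`.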

After studying the most common venues for each cluster, we can characterize the clusters by what their neighborhoods have in common:

  • Cluster 0 – Recreational areas. These neighborhoods have very close access to outdoor recreation; many of their top 5 venues are trails, parks, beaches, surf spots and baseball fields.
  • Cluster 1 – Reveals part of the demographic makeup of Hampton Roads. There is a significant concentration of people of Eastern European descent in the area, something commonly known to residents here. These neighborhoods have a distinct Eastern European flair, as evidenced by their most common venues: dumpling restaurants and Eastern European restaurants.
  • Cluster 2 – Only one neighborhood falls into this category, and it is far from most of the population. We will consider it an outlier.
  • Cluster 3 – The biggest cluster. These neighborhoods are heavy with shops, restaurants and strip malls.
  • Cluster 4 – The beach neighborhoods. Beach is their top venue, with resorts and harbors nearby.
  • Cluster 5 – Mostly heavily residential areas, with a wide variety of venues nearby.

The analysis that gets to the heart of the problem is how real estate values move within each cluster, and here we find some very interesting trends.

Here, we compare each cluster with the mean of its neighborhoods' Zillow median home values. There are no real surprises: the top home values are in the Beach Areas (Cluster 4) and the Recreational Areas (Cluster 0), and the recreational areas are often located near water. Cluster 2, the single-neighborhood outlier, does not appear in the table.

Cluster Label | Cluster Alias | Mean Zillow Home Value
0 | Recreational Area | $444,083
1 | Eastern European Area | $301,675
3 | Strip Mall/Commercial | $219,742
4 | Beach Areas | $476,675
5 | Residential Areas | $246,850
Cluster vs. Home Values
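Each row of the table appears to be a group-by mean: every neighborhood carries its own Zillow median home value, and the cluster figure averages those medians. A minimal sketch of that aggregation, extended to year-over-year growth so clusters can also be ranked by how fast they appreciate, assuming hypothetical (cluster label, median value, YoY growth) rows per neighborhood:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def cluster_means(rows: Iterable[Tuple[int, float, float]]) -> Dict[int, Tuple[float, float]]:
    """Mean home value and mean YoY growth per cluster.

    Each row is (cluster label, neighborhood median home value,
    neighborhood YoY growth as a fraction, e.g. 0.04 for 4%).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # value total, growth total, count
    for label, value, yoy in rows:
        s = sums[label]
        s[0] += value
        s[1] += yoy
        s[2] += 1
    return {label: (v / n, g / n) for label, (v, g, n) in sums.items()}


def rank_by_growth(means: Dict[int, Tuple[float, float]]) -> List[int]:
    """Cluster labels ordered from fastest to slowest mean YoY growth."""
    return sorted(means, key=lambda label: means[label][1], reverse=True)
```

Ranking by mean growth rather than mean value is what separates the most expensive clusters from the fastest-appreciating ones.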

Results – Investment Opportunity

The real surprise, and the most significant insight of this analysis, is which clusters are growing fastest in value, as measured by year-over-year growth in home valuations. Look at which clusters have the highest growth rates:

Discussion

Performing K-means clustering analysis on the southern Hampton Roads area leads to a surprising discovery. While mean home values are highest near the beaches and recreational areas, as many would expect, the real investment opportunity is in Clusters 0 and 3.

Cluster 0

Cluster 0 covers the recreational areas of southern Hampton Roads, which hold the second-highest mean home value. What should interest investors is that Cluster 0 also has the second-highest value growth rate in the area, at over 4% annually. If that rate holds, a property purchased for $500k would be valued at around $740k in a decade.
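The decade projections assume the current annual growth rate compounds unchanged for ten years, ignoring costs, taxes and rental income: future value = price × (1 + rate)^years. A minimal sketch that reproduces the Cluster 0 figure; the same function reproduces the Cluster 3 projection that follows.

```python
def project_value(price: float, annual_growth: float, years: int = 10) -> float:
    """Value after `years` of constant annual growth, compounded once a year."""
    return price * (1.0 + annual_growth) ** years


# Cluster 0: roughly 4% a year on a $500k purchase.
print(round(project_value(500_000, 0.04)))    # prints 740122
# Cluster 3: 4.34% a year on a $200k purchase.
print(round(project_value(200_000, 0.0434)))  # prints 305871
```

The projection is only as good as the assumption that today's growth rate persists for a decade.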

Cluster 3

The biggest opportunity for real estate investors in the southern Hampton Roads area is Cluster 3, the ‘Strip Mall Areas’. These areas are filled with shops, restaurants and necessities. Cluster 3 neighborhoods also start from a relatively low mean home value of under $220k. At a 4.34% annual growth rate, a $200k home purchased now could be valued at over $305k in a decade. This is a significant real estate investment opportunity with a low barrier to entry.

Cluster 4

Cluster 4, despite having the highest mean home value, is not the best investment opportunity, contrary to what many might expect of homes close to the beach. Higher maintenance costs and the risk of tropical storms, along with only half the growth rate of Clusters 0 and 3, make this a low-growth cluster and not one ripe for investment returns.

Conclusion

In this study, we used data from multiple sources, including the Foursquare API, Zillow, OpenDataSoft and others, to determine the best real estate investment opportunities in the southern Hampton Roads area. The neighborhoods in Clusters 0 and 3 represent the highest-growth areas in the region.

The clusters were developed by running the unsupervised K-means algorithm on the one-hot encoded Foursquare venue categories for each neighborhood.

Real estate investors and agencies should find this information valuable, and it will hopefully help them generate returns for their investors.
