Recently, I've been looking at a problem where data is joined with enrichment data stored in a Pandas dataframe. For each record being processed, a lookup is performed and a single record is selected from the enrichment data based on a key. Essentially, this is a SQL join operation, but one of the datasets couldn't fit into memory. Having had experience with dataframes being slow (particularly when iterating through rows), I investigated whether the lookup would be faster if the enrichment data were stored in a Python dict rather than a dataframe. In my experiments, I got roughly a 70-times speed improvement using a dict over a dataframe, even when the dataframe was indexed.
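To make the comparison concrete, here is a minimal sketch of the kind of lookup benchmark described above. The key column `id`, the synthetic values, and the data size are illustrative assumptions of mine, not the actual experiment:

```python
import time
import pandas as pd

# Illustrative enrichment data keyed by an integer id (assumed for this sketch).
n = 100_000
enrichment_df = pd.DataFrame({
    "id": range(n),
    "value": [f"record-{i}" for i in range(n)],
}).set_index("id")  # index the dataframe so .loc lookups are as fast as possible

# The same enrichment data as a plain Python dict: id -> value.
enrichment_dict = enrichment_df["value"].to_dict()

# Keys to look up, standing in for the records being processed.
keys = list(range(0, n, 7))

# Per-record lookup via the indexed dataframe.
start = time.perf_counter()
for k in keys:
    _ = enrichment_df.loc[k, "value"]
df_elapsed = time.perf_counter() - start

# Per-record lookup via the dict.
start = time.perf_counter()
for k in keys:
    _ = enrichment_dict[k]
dict_elapsed = time.perf_counter() - start

print(f"dataframe lookups: {df_elapsed:.4f}s")
print(f"dict lookups:      {dict_elapsed:.4f}s")
print(f"speedup:           {df_elapsed / dict_elapsed:.1f}x")
```

The Python code used in the experiments is here: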