Posts

Showing posts from July, 2020

Using Pandas dataframes to perform a lookup

Recently, I've been looking at a problem where data is joined with enrichment data stored in a Pandas dataframe. For each record being processed, a lookup is performed and a single record is selected from the enrichment data based on a key. Essentially, this is a SQL join operation, but one of the datasets couldn't fit into memory. Having had experience with dataframes being slow (particularly when iterating through rows), I investigated whether the lookup would be faster if the enrichment data was stored in a Python dict rather than a dataframe. In my experiments, lookups against a dict were about 70 times faster than lookups against a dataframe, even when the dataframe was indexed on the key. The Python code used in the experiments is here:
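As a rough illustration of the kind of comparison described (this is not the code from the experiments), the sketch below times a single-key lookup against an indexed dataframe and against a dict built from the same data. The column names `key` and `value`, the data size, and the helper names are all assumptions made for the example, and the speedup you measure will depend on your data and Pandas version.

```python
import timeit

import pandas as pd

# Hypothetical enrichment data: one row per key (names and sizes are made up).
enrichment = pd.DataFrame({
    "key": [f"id{i}" for i in range(10_000)],
    "value": range(10_000),
})

# Option 1: index the dataframe on the lookup key.
indexed = enrichment.set_index("key")

# Option 2: convert the same data to a plain dict of key -> value.
as_dict = indexed["value"].to_dict()


def lookup_dataframe(key):
    """Fetch one value from the indexed dataframe."""
    return indexed.at[key, "value"]


def lookup_dict(key):
    """Fetch one value from the dict."""
    return as_dict[key]


def compare(key="id1234", number=10_000):
    """Return (dataframe_seconds, dict_seconds) for `number` repeated lookups."""
    df_seconds = timeit.timeit(lambda: lookup_dataframe(key), number=number)
    dict_seconds = timeit.timeit(lambda: lookup_dict(key), number=number)
    return df_seconds, dict_seconds


if __name__ == "__main__":
    df_seconds, dict_seconds = compare()
    print(f"dataframe: {df_seconds:.4f}s, dict: {dict_seconds:.4f}s")
```

Building the dict costs time and memory up front, so this approach only pays off when the same enrichment data is looked up many times.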

Neo4j 4.1 in Docker

Getting Neo4j 4.1.0 to work in Docker has been a real struggle! The docker-compose file was:

    version: "3"
    services:
      neo4j:
        image: neo4j
        container_name: neo4j
        ports:
          - 7474:7474
          - 7687:7687
        environment:
          - "NEO4J_AUTH=none"

Note that authentication has been turned off, so just log in with a blank username and password.

The browser kept returning:

    WebSocket connection failure. Due to security constraints in your web browser, the reason for the failure is not available to this Neo4j Driver. Please use your browsers development console to determine the root cause of the failure. Common reasons include the database being unavailable, using the wrong connection URL or temporary network problems. If you have enabled encryption, ensure your browser is configured to trust the certificate Neo4j is configured to use. WebSocket `readyState` is: 3

To solve this, when you log in, change the scheme of the connection URL from neo4j:// to bolt://