Using Open Data to find a license plate

Samuel

2023-12-12

bash, data-analysis, machine-learning, python

I recently had the unpleasant experience of discovering someone had hit my shiny new red car. It happened on the parking lot of my partners job. You would imagine that if you hit a colleague you have the decency to leave your details, but they didn’t. So there we were, no details and significant damage.

Luckily we have a modern car, a car that records a video after detecting an impact.

The video contained the entire series of events, but it was impossible to read the other car’s license plate. At best we could make out 2 letters with some imagination, which is not enough for anything. So what to do, what to do, ..

After a bit of searching I discovered that the Netherlands has een open dataset for all the cars on the road. Including their license plate, make and model, color and other attributes.

So we got to work. While my partner used her machine learning skills to try and get more details out of the video. I went to work filtering down the dataset from 16 million cars, to hopefully one.

Below are the commands I used to filter the dataset down to about 1.500 cars. Lots of filtering was based on the video, but also some common sense was applied.

Download the Open Dataset from https://opendata.rdw.nl/Voertuigen/Open-Data-RDW-Gekentekende_voertuigen/m9d7-ebf2 and rename it to data.csv

Filter out all company cars

1	awk -F';' '$2 == "Personenauto" { print $0 }' data.csv > filtered-1.csv

Filter out all white cars

1	awk -F';' '$10 == "WIT" { print $0 }' filtered-1.csv > filtered-colour.csv

Filter out cars with 4 or 5 seats

1	awk -F';' '$9 == "4" \|\| $4 == "5" { print $0 }' filtered-colour.csv > filtered-seats.csv

Filter out cars heavier than 2.5T

1	awk -F';' '$14 < 2500 { print $0 }' filtered-seats.csv > filtered-mass.csv

Filter out expensive cars

1	awk -F';' '$22 < 60000 { print $0 }' filtered-doors.csv > filtered-price.csv

Filter out brand

1	awk -F';' '$3 == "AUDI" { print $0 }' filtered-price.csv > filtered-brand.csv

Filter out model

1	awk -F';' '$4 == "A4" \|\| $4 == "AUDI A4" { print $0 }' filtered-brand.csv > filtered-type.csv

Run a regex on the license plate

import csv
import re

# Define your regex pattern here
pattern = re.compile(r'^....PT')

# Open the CSV file
with open('filtered-type.csv', newline='') as csvfile:
# Create a CSV reader that expects a semicolon delimiter
csvreader = csv.reader(csvfile, delimiter=';')

# Iterate over each row in the CSV
for row in csvreader:
# Check if the first column matches the regex pattern
  if pattern.match(row[0]):
    # If it does, print the whole line
    print(';'.join(row))

After all the filtering and the final regex (based on the letters we could identify in the license plate) we had one car left. Success!

I’m honestly not sure if my insurance will accept this, but it was fun to do anyway.

(I did notice an odd behavior with sed on MacOS. For some reason it performs slower compared to Linux, but that’s the topic of another blogpost.)