Convert FCC ULS Database To SQL

The FCC provides databases of license information called the Universal Licensing System (ULS). The data is useful for a number of reasons and comes in a pipe-delimited format.

Table of Contents

  1. Why
  2. Building a Tool


Why

The FCC provides an online search capability; however, it can be slow at times. I also wanted to manipulate the data in ways the FCC's online search tool doesn't allow. My goal was to get the data into an SQLite database, both as a learning experience and as preparation for experimenting with large datasets.

Building a Tool

The ULS data files obtained from the FCC contain pipe-delimited data, which can be imported into any common spreadsheet application such as Microsoft Excel. To convert this to a valid SQLite schema, I needed to understand the fields.
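Pipe-delimited data can be split with Python's standard csv module by changing the delimiter. A minimal sketch, using an invented record for illustration:

```python
import csv
import io

# A ULS-style pipe-delimited record (field values invented for illustration)
sample = "HD|123456789|0001234567||WQAB123\n"

# The csv module handles the splitting once the delimiter is set to '|'
reader = csv.reader(io.StringIO(sample), delimiter="|")
row = next(reader)

print(row[0])  # the first field identifies the record type
```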

The FCC provides a PDF file containing the definitions. Since the PDF file is over 80 pages of information, I wanted to come up with a programmable way of extracting the definitions from the PDF file. I tried a few different methods using various PDF modules available in Python, but nothing was reliable. The tables in the PDF file in particular caused most of the headaches in trying to come up with a reliable parser.

Sheepishly, it took me a bit to realize that the FCC also provides a plain text SQL definitions file, which could be reliably parsed.

create table dbo.PUBACC_A2
(
      Record_Type               char(2)              null,
      unique_system_identifier  numeric(9,0)         not null,
      ULS_File_Number           char(14)             null,
      EBF_Number                varchar(30)          null,
      spectrum_manager_leasing  char(1)              null,
      defacto_transfer_leasing  char(1)              null,
      new_spectrum_leasing      char(1)              null,
      spectrum_subleasing       char(1)              null,
      xfer_control_lessee       char(1)              null,
      revision_spectrum_lease   char(1)              null,
      assignment_spectrum_lease char(1)              null,
      pfr_status                char(1)              null
)
The record type is the two characters after the underscore in the table name: in the table dbo.PUBACC_A2 above, the record type is A2. The table body then lists the fields for that record type. With this information, I wrote a parser that builds a table of valid record types and their associated fields, which also accounts for any record types or fields the FCC might add or remove in the future.
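The parsing described above can be sketched with a regular expression on the table name plus a pass over the column lines. This is a minimal illustration, not the tool's actual code; the SQL text below is a shortened version of the A2 table:

```python
import re

# Shortened excerpt of the FCC's SQL definitions file (the real file
# defines many more tables and fields)
sql_text = """\
create table dbo.PUBACC_A2
(
      Record_Type               char(2)              null,
      unique_system_identifier  numeric(9,0)         not null,
      ULS_File_Number           char(14)             null
)
"""

tables = {}          # record type -> list of field names
current = None
for raw in sql_text.splitlines():
    line = raw.strip()
    m = re.match(r"create table dbo\.PUBACC_(\w+)", line)
    if m:
        current = m.group(1)   # the record type follows the underscore
        tables[current] = []
    elif current and line and line not in ("(", ")"):
        # the first token on a column line is the field name
        tables[current].append(line.split()[0])

print(tables)
```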

Once valid record types could be identified, I was able to run through a ULS database, carve out valid record types, and place them into CSV files, one per record type. Part of this process included adding a header row to each CSV file containing the column names, which are later used to create the SQLite table schema.
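The carving step can be sketched as grouping each line by its first field and dropping anything that isn't a known record type. The FIELDS dictionary and file names here are hypothetical stand-ins for the definitions parsed from the SQL file:

```python
import csv
from collections import defaultdict

# Hypothetical field definitions parsed from the SQL file (truncated here)
FIELDS = {"HD": ["record_type", "unique_system_identifier", "uls_file_number"]}

def carve(lines):
    """Group pipe-delimited lines by record type, keeping only known types."""
    records = defaultdict(list)
    for line in lines:
        row = line.rstrip("\n").split("|")
        if row[0] in FIELDS:           # skip anything that isn't a valid record type
            records[row[0]].append(row)
    return records

def write_csv(path, record_type, rows):
    """Write one record type's rows to a CSV, header row first."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS[record_type])  # header becomes the table schema
        writer.writerows(rows)

out = carve(["HD|123456789|0001234567", "ZZ|not|a|known|type"])
print(out)
```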

For the SQL schema, my goal was a single database in which each table corresponds to a valid record type, populated from the corresponding CSV file. Each table's columns are the ones generated in the CSV header row for that record type.

With the generated CSV files for each record type, I used the sqlite-utils Python module to import the data into an SQLite database.
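sqlite-utils creates the table from the CSV header and inserts the rows for you; to make the steps visible, here is a rough standard-library equivalent. The CSV content and table name are invented for illustration:

```python
import csv
import io
import sqlite3

# A tiny stand-in for one generated CSV file (header row plus one record)
csv_text = "record_type,unique_system_identifier,uls_file_number\nHD,123456789,0001234567\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

db = sqlite3.connect(":memory:")
# one table per record type; columns come from the CSV header row
db.execute(f"CREATE TABLE HD ({', '.join(header)})")
db.executemany(f"INSERT INTO HD VALUES ({', '.join('?' * len(header))})", data)
db.commit()

print(db.execute("SELECT uls_file_number FROM HD").fetchone()[0])
```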

Once the data is populated into an SQLite database, you can perform any number of SQL queries against it to access the data in meaningful ways. Find expired licenses, locate frequencies assigned to businesses, and more.
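For instance, finding expired licenses is a simple query against the license header (HD) table. This sketch builds a toy table with invented rows; the column names follow the FCC's HD record definition, where a license_status of 'E' marks an expired license, but check the generated schema for the exact names:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Toy HD table with invented rows; real databases have many more columns
db.execute("CREATE TABLE HD (call_sign TEXT, license_status TEXT, expired_date TEXT)")
db.executemany(
    "INSERT INTO HD VALUES (?, ?, ?)",
    [("WQAB123", "A", "12/31/2030"), ("KD9XYZ", "E", "01/15/2020")],
)

# Licenses the FCC has marked expired (status code 'E')
expired = db.execute(
    "SELECT call_sign FROM HD WHERE license_status = 'E'"
).fetchall()
print(expired)
```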

It was a fun project, with plenty of room for improvement. Some of the larger databases take a while to process and would probably benefit from some multiprocessing to speed things up. I made an initial attempt at making this happen, but I'm still wrapping my head around the concept.
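The parallelization idea is that each record type can be carved and converted independently, so the per-type work maps cleanly onto a Pool. This sketch uses the thread-backed multiprocessing.dummy so it stays self-contained; for CPU-bound parsing you would swap in multiprocessing.Pool, which has the same API. The worker function is a placeholder, not the tool's real code:

```python
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, thread-backed

def process_record_type(record_type):
    # placeholder for the real carve/convert work for one record type
    return record_type, f"{record_type}.csv"

record_types = ["HD", "EN", "AM", "A2"]
with Pool(4) as pool:
    results = pool.map(process_record_type, record_types)

print(results)
```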

You can download the tool here.