The dictionary within a dictionary was a clever trick. I did change
if re.compile("-?\d+(\.\d+)?").match(values[3]):
to be more general, using ret instead of values[3].
Also, in CRSP extracts with a long enough date range, tickers are missing (read in as NULL, I think) for a few PERMNOs, so I used
if ticker and re.compile("-?\d+(\.\d+)?").match(ret):
Another option is to use
isinstance(ret,(int,float))
to test for a true number, although that only works once ret has been converted from the string read out of the file.
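To make the difference between these checks concrete, here is a minimal sketch. The helper names is_usable and parses_as_float are made up for illustration and are not from the original post, and the sample values are invented. Note that match() only anchors at the start of the string, so something like "1.5abc" would also pass the regex test.

```python
import re

# Compile once, outside any loop. Same pattern as above, written as a raw string.
NUMERIC = re.compile(r"-?\d+(\.\d+)?")

def is_usable(ticker, ret):
    """Hypothetical helper: True when ticker is non-empty and ret starts with a number."""
    return bool(ticker) and NUMERIC.match(ret) is not None

def parses_as_float(ret):
    """Hypothetical alternative: True when float() accepts the whole value."""
    try:
        float(ret)
        return True
    except (TypeError, ValueError):
        return False

print(is_usable("IBM", "-0.0123"))         # ticker present, numeric return
print(is_usable("", "0.0100"))             # missing ticker
print(is_usable("IBM", "C"))               # letter code instead of a return
print(isinstance("0.0100", (int, float)))  # a field read from text is still a str
print(parses_as_float("0.0100"))
```

The try/float() version is stricter than the regex because it has to consume the whole string, which may or may not be what you want for this data.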
Finally, I am fairly sure that opening a file as 'rb' avoids any need to worry about \n breaks. And it is probably not worth using for this case, but Python has a nice csv module for reading delimited text files. I thought it could either read in large chunks or iterate through row by row, but looking at the documentation I am not sure about reading row blocks. I have used the writerows() method of a csv writer object as a very fast writer of row blocks.
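On the row-block question: as far as I can tell the csv reader only iterates row by row, but itertools.islice can pull a fixed number of rows from it at a time. The sketch below assumes Python 3, where the csv documentation says to open files in text mode with newline='' rather than 'rb'. The read_blocks helper, the column names, and the sample rows are all made up for illustration, and an in-memory buffer stands in for the real extract.

```python
import csv
import io
from itertools import islice

def read_blocks(reader, block_size):
    """Hypothetical helper: yield lists of up to block_size rows from a csv reader."""
    while True:
        block = list(islice(reader, block_size))
        if not block:
            return
        yield block

# In-memory stand-in for an extract file; columns and values are invented.
source = io.StringIO(
    "PERMNO,date,ticker,ret\n"
    "10001,20100104,ABC,0.01\n"
    "10001,20100105,ABC,-0.02\n"
    "10002,20100104,,C\n"
)
reader = csv.reader(source)
header = next(reader)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
for block in read_blocks(reader, 2):
    # writerows() writes a whole block of rows in one call.
    writer.writerows(block)
```

With a real file you would replace the StringIO objects with open(path, newline='') for reading and open(path, 'w', newline='') for writing.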