From Michael Boldin (WRDS):

The dictionary in a dictionary was a clever trick. I did change

   if re.compile("-?\d+(\.\d+)?").match(values[3]):

to be more general, using ret instead of values[3].

Also, there are cases in CRSP extracts where tickers are missing (read in as NULL, I think) for a few PERMNOs if the date range is long enough, so I used

        if ticker and re.compile("-?\d+(\.\d+)?").match(ret):

Another option is to use

     isinstance(ret,(int,float))

to test for a true number, although this only works once ret has been converted from the string that was read from the file.
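To make the difference between the two tests concrete, here is a minimal sketch in Python 3. The helper name is_valid_row and the sample values are made up for illustration and are not from the original script. The pattern is the same one quoted above, written as a raw string and compiled once. Note that match() only anchors at the start of the string, so something like "1.2abc" would still pass.

```python
import re

# Same pattern as quoted above, as a raw string, compiled once.
NUMERIC = re.compile(r"-?\d+(\.\d+)?")

def is_valid_row(ticker, ret):
    """Hypothetical helper: True when ticker is non-empty and ret starts with a number."""
    # bool(ticker) is False for both None and the empty string.
    return bool(ticker) and NUMERIC.match(ret) is not None

print(is_valid_row("IBM", "0.0123"))   # numeric return, ticker present
print(is_valid_row(None, "0.0123"))    # missing ticker
print(is_valid_row("IBM", "C"))        # letter code in the return field

# isinstance() sees the raw field as a string until it is converted.
print(isinstance("0.0123", (int, float)))
print(isinstance(float("0.0123"), (int, float)))
```

So the regex test works on the text exactly as it comes out of the file, while the isinstance() test is only useful after a float() conversion has already succeeded.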

Finally, I am fairly sure that opening a file as 'rb' avoids any need to worry about \n breaks. And it is probably not worth using for this case, but Python has a nice csv module for reading delimited text files. I thought it could either read in large chunks or iterate row by row, but looking at the documentation I am not sure about reading row blocks. I have used the writerows() method of a csv.writer object as a very fast writer of row blocks.
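As a rough sketch of the csv points, assuming Python 3: csv.reader is an iterator over rows, so one way to get row blocks is to pull a fixed number of rows at a time with itertools.islice. The helper read_in_blocks and the sample rows below are invented for illustration, and an in-memory io.StringIO stands in for a real file.

```python
import csv
import io
from itertools import islice

def read_in_blocks(reader, block_size):
    """Hypothetical helper: yield lists of up to block_size rows from a csv.reader."""
    while True:
        block = list(islice(reader, block_size))
        if not block:
            return
        yield block

# Made-up sample rows: PERMNO, date, ticker, return.
sample = (
    "10001,20100104,AAA,0.01\n"
    "10001,20100105,AAA,-0.02\n"
    "10002,20100104,BBB,0.03\n"
)

reader = csv.reader(io.StringIO(sample))
blocks = list(read_in_blocks(reader, 2))

# writerows() writes a whole block of rows in one call.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for block in blocks:
    writer.writerows(block)

print(len(blocks))
print(out.getvalue())
```

With a real file in Python 3, the csv documentation recommends opening it in text mode with newline='' rather than 'rb', so that the module handles line endings itself.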
