It fell to me to script the removal of the comments. Being a bit of a Python fan, I went searching for a Pythonic, regexp-based comment remover. I found an existing C decommenter, but it needed a few modifications to work with Verilog comments. The modified version is below.
```python
#!/usr/bin/env python3
# remove_comments.py
import re
import sys


def remove_comments(text):
    """Remove C-style (and therefore Verilog) comments.

    text: blob of text with comments (can include newlines)
    returns: text with comments removed
    """
    pattern = r"""
        ## --------- COMMENT ---------
        /\*                 ## Start of /* ... */ comment
        [^*]*\*+            ## Non-* followed by 1-or-more *'s
        (                   ## group 1
            [^/*][^*]*\*+   ##
        )*                  ## 0-or-more things which don't start with /
                            ##   but do end with '*'
        /                   ## End of /* ... */ comment
      |                     ## -OR-
        //[^\n]*            ## // comment to end of line
      |                     ## -OR- various things which aren't comments:
        (                   ## group 2
            ## ------ " ... " STRING ------
            "               ## Start of " ... " string
            (               ##
                \\.         ## Escaped char
              |             ## -OR-
                [^"\\]      ## Non "\ characters
            )*              ##
            "               ## End of " ... " string
          |                 ## -OR-
            ## ------ ANYTHING ELSE -------
            .               ## Any other char
            [^/"'\\]*       ## Chars which don't start a comment, string
        )                   ##   or escape
    """
    regex = re.compile(pattern, re.VERBOSE | re.MULTILINE | re.DOTALL)
    # Group 2 is only set when the match is NOT a comment, so keeping
    # group 2 and discarding everything else strips the comments.
    noncomments = [m.group(2) for m in regex.finditer(text) if m.group(2)]
    return "".join(noncomments)


# Renamed from "copyright" to avoid shadowing Python's built-in copyright.
COPYRIGHT_HEADER = """// --------------------------------------------------------------
//
// My Company Inc. - Confidential Information
// Copyright 2005-2008
//
// --------------------------------------------------------------"""

if __name__ == "__main__":
    filename = sys.argv[1]
    with open(filename) as fh:
        code_w_comments = fh.read()
    code_wo_comments = remove_comments(code_w_comments)
    # with open(filename + ".nocomments", "w") as fh:
    #     fh.write(code_wo_comments)
    print(COPYRIGHT_HEADER)
    print(code_wo_comments)
```
First of all, I added an alternative to the regexp to spot one-line comments that start with //, as mentioned in the Perl FAQ (the //[^\n]* line in the code above). I also removed the single-quoted string matching from the regexp, because Verilog doesn't have such strings. Worse, that branch was accidentally treating the code between two number specifiers as a string, which prevented the removal of any comments inside it. For example, the comment below would not be removed:
```verilog
assign a = 1'b0;
// Some comment
assign b = 1'b1;
```
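To see the failure concretely, here is a minimal sketch that runs the example above through two simplified, non-verbose variants of the pattern: one with a C-style single-quoted string alternative and one without. The pattern constants and the decomment() helper are illustrative assumptions, not code from the original C decommenter.

```python
import re

# Simplified, non-verbose pieces of the decommenter pattern (illustrative only).
COMMENT = r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//[^\n]*'  # /* */ and // comments
DQ_STRING = r'"(?:\\.|[^"\\])*"'                     # "..." string (Verilog has these)
SQ_STRING = r"'(?:\\.|[^'\\])*'"                     # '...' string (C char literals)
OTHER = r""".[^/"'\\]*"""                            # any other run of code


def decomment(text, keep_alternatives):
    """Drop comments; keep whatever the named 'keep' group matches."""
    regex = re.compile(
        COMMENT + "|(?P<keep>" + "|".join(keep_alternatives) + ")", re.DOTALL)
    return "".join(m.group("keep") for m in regex.finditer(text)
                   if m.group("keep"))


verilog = "assign a = 1'b0;\n// Some comment\nassign b = 1'b1;\n"

# With the C-style '...' branch, everything from 'b0; up to 1' is taken as
# one "string", so the comment inside it survives unchanged.
print(decomment(verilog, [DQ_STRING, SQ_STRING, OTHER]))
# Without it, the // comment is removed (its line is left empty).
print(decomment(verilog, [DQ_STRING, OTHER]))
```

The first call returns the input untouched; the second drops only the comment text, leaving the newline that ended it.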
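The group that matters in the regexp is group 2: it captures non-comment text, either a double-quoted string or a run of ordinary code. Group 1 is not a "comment group"; it is only the inner repeated part of the /* ... */ pattern and is often empty even when a comment matches. Every match is either a comment or a non-comment. If the regexp matches a comment, group 2 is empty (None), so keeping only group 2 effectively "removes" the comment. If it matches a non-comment, group 2 holds the text we want to keep. Joining all the group 2 pieces therefore gives the code with its comments removed.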
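The group logic can be checked directly. The sketch below is a compact, non-verbose rewrite of the script's pattern (written for illustration, so treat it as an assumption) together with a hypothetical show_matches() helper that lists each match with its groups. Its remove_comments() uses re.sub with a callback instead of finditer plus join; the two give the same result here because the "any other char" branch can match any single character, so no part of the input falls between matches.

```python
import re

# Compact equivalent of the verbose pattern (illustrative sketch).
PATTERN = re.compile(
    r'/\*[^*]*\*+([^/*][^*]*\*+)*/'   # /* ... */ comment; group 1 = inner part
    r'|//[^\n]*'                      # // comment to end of line
    r'|("(?:\\.|[^"\\])*"'            # group 2: a "..." string
    r"""|.[^/"'\\]*)""",              # ... or any other non-comment run
    re.DOTALL,
)


def show_matches(text):
    """Return (whole match, group 1, group 2) for every match in text."""
    return [(m.group(0), m.group(1), m.group(2)) for m in PATTERN.finditer(text)]


def remove_comments(text):
    # Replace each match by its group 2, i.e. by "" when it was a comment.
    return PATTERN.sub(lambda m: m.group(2) or "", text)


for match in show_matches('x = "a//b"; /* note */ y = 1; // end\n'):
    print(match)
print(repr(remove_comments('x = "a//b"; /* note */ y = 1; // end\n')))
```

Note that the // inside the string survives, because at the opening quote only the string branch can match, and the string then swallows the slashes.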
This decommenter script is used as part of an overall script that prepares our code for handover. The RTL is exported from our CVS directory, decommented and tar.gz'd, ready for secure FTPing to our customer.
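The decomment-and-pack part of that handover step can be sketched with Python's standard tarfile module. This is a minimal sketch under stated assumptions, not the actual handover script: it assumes the RTL tree has already been exported from CVS into a local directory, that RTL files end in .v, and that other files are simply skipped. package_rtl() and its parameters are hypothetical, and the optional header mirrors the copyright banner printed by the script above.

```python
import io
import re
import tarfile
import time
from pathlib import Path

# Compact form of the decommenter regex; group 1 = non-comment text to keep.
_DECOMMENT = re.compile(
    r'/\*[^*]*\*+(?:[^/*][^*]*\*+)*/'  # /* ... */ comment (dropped)
    r'|//[^\n]*'                       # // line comment (dropped)
    r'|("(?:\\.|[^"\\])*"'             # group 1: "..." string (kept)
    r"""|.[^/"'\\]*)""",               # group 1: other code (kept)
    re.DOTALL,
)


def remove_comments(text):
    return _DECOMMENT.sub(lambda m: m.group(1) or "", text)


def package_rtl(src_dir, tar_path, header="", suffixes=(".v",)):
    """Hypothetical helper: decomment RTL files under src_dir into a .tar.gz.

    Files whose suffix is not in `suffixes` are skipped (an assumption).
    If `header` is given, it is prepended to every file on its own line.
    """
    src = Path(src_dir)
    with tarfile.open(tar_path, "w:gz") as tar:
        for path in sorted(src.rglob("*")):
            if not path.is_file() or path.suffix not in suffixes:
                continue
            text = remove_comments(path.read_text())
            data = ((header + "\n" if header else "") + text).encode()
            # Store the file under its path relative to the export root.
            info = tarfile.TarInfo(name=path.relative_to(src).as_posix())
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
```

For example, package_rtl("export/rtl", "handover.tar.gz", header=COPYRIGHT_HEADER) would pack every decommented .v file with the banner on top, assuming the banner constant from the script above is in scope. Building the archive in memory avoids writing .nocomments copies to disk.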