Application Data Integration: 2013

Timezone and DST

A list of US 'time zone IDs' for use with java.util.TimeZone
If you program in Java and have to convert dates and times between different timezones, then you will know that the Date, Calendar & TimeZone classes of the java.util package are the way to go. The JavaDoc for java.util.TimeZone mentions that you can use a 'time zone ID' of "America/Los_Angeles" to get US Pacific Time. It doesn't give examples of any other time zone IDs, so here is a list of the standard US time zone IDs:

Area	Std Offset	DST Offset	Std Abbrev	DST Abbrev	Time Zone ID
Eastern Time	UTC-5	UTC-4	EST	EDT	America/New_York
Central Time	UTC-6	UTC-5	CST	CDT	America/Chicago
Mountain Time	UTC-7	UTC-6	MST	MDT	America/Denver
Mountain Time (no DST)	UTC-7		MST		America/Phoenix
Pacific Time	UTC-8	UTC-7	PST	PDT	America/Los_Angeles
Alaska Time	UTC-9	UTC-8	AKST	AKDT	America/Anchorage
Hawaii-Aleutian Time	UTC-10	UTC-9	HST	HDT	America/Adak
Hawaii-Aleutian Time (no DST)	UTC-10		HST		Pacific/Honolulu

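Note: Since 2007, Daylight Saving Time in the zones that observe it runs from the second Sunday in March at 02:00 local time until the first Sunday in November at 02:00 local time. (The earlier rule, first Sunday in April to last Sunday in October, applied before 2007.) (Table sourced from Statoids: Time Zones of the United States.)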
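The IDs in the table are names from the tz database, which most Unix-like systems also install, so a quick way to sanity-check them outside Java is to set the TZ environment variable for a single date command. A minimal sketch, assuming GNU date (for the -d @SECONDS form) and an installed tz database; show_in_zone is a hypothetical helper name:

```shell
#!/bin/sh
# show_in_zone ZONE_ID EPOCH_SECONDS
# Prints the given instant as local wall-clock time in ZONE_ID.
# Assumes GNU date (the -d @SECONDS form) and a system tz database.
show_in_zone() {
    TZ="$1" date -d "@$2" '+%Y-%m-%d %H:%M %Z'
}

# 1372680000 is 2013-07-01 12:00:00 UTC, a date when DST is in effect.
for zone in America/New_York America/Chicago America/Denver \
            America/Phoenix America/Los_Angeles America/Anchorage \
            America/Adak Pacific/Honolulu; do
    printf '%-20s %s\n' "$zone" "$(show_in_zone "$zone" 1372680000)"
done
```

In July the DST-observing zones print their daylight abbreviations (EDT, CDT and so on), while America/Phoenix and Pacific/Honolulu stay on MST and HST.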

Useful *nix Commands

Large Files

Find files larger than 10MB in the current directory downwards…

find . -size +10000000c -ls

Find files larger than 100MB…

find . -size +100000000c -ls
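The c suffix makes -size count in bytes, and the + prefix means strictly more than that many bytes. A minimal sketch wrapping the command in a hypothetical helper, large_files, and checking it against two throwaway files:

```shell
#!/bin/sh
# large_files DIR BYTES
# Lists regular files under DIR that are strictly larger than BYTES bytes.
# (-size +Nc compares exact byte counts; without the c suffix find counts
# 512-byte blocks.)
large_files() {
    find "$1" -type f -size +"$2"c -print
}

# Build a small test tree: one 2000-byte file and one 10-byte file.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big.dat" bs=1000 count=2 2>/dev/null
dd if=/dev/zero of="$demo/small.dat" bs=10 count=1 2>/dev/null

large_files "$demo" 1000    # prints only the path of big.dat
```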

Old Files

Find files last modified more than 30 days ago…

find . -type f -mtime +30 -ls

Find files last modified more than 365 days ago…

find . -type f -mtime +365 -ls

Find files last accessed more than 30 days ago…

find . -type f -atime +30 -ls

Find files last accessed more than 365 days ago…

find . -type f -atime +365 -ls
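A bare number such as -mtime 30 matches files whose age, rounded down to whole days, is exactly 30; -mtime +30 means more than 30 and -mtime -30 less than 30, and -atime follows the same rules. The sketch below back-dates one file with touch -t (the file names are illustrative) so the difference is visible:

```shell
#!/bin/sh
demo=$(mktemp -d)
touch -t 201301010000 "$demo/old.log"   # modified (and accessed) on 1 January 2013
touch "$demo/new.log"                   # modified now

# Exactly 30 days old (after rounding down): matches neither file here.
find "$demo" -type f -mtime 30 -print
# More than 30 days old: matches old.log only.
find "$demo" -type f -mtime +30 -print
# Less than 30 days old: matches new.log only.
find "$demo" -type f -mtime -30 -print
# touch -t also back-dated the access time, so this matches old.log too.
find "$demo" -type f -atime +365 -print
```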

Find Recently Updated Files

There have been instances where a runaway process is seemingly using up any and all space left on a partition. Finding the culprit file is always useful.
If the file is being updated right now, then we can use find to list files modified within the last 24 hours…

find . -type f -mtime -1 -ls

Better still, if we know a file is being written to now, we can touch a marker file, wait a few seconds, and then ask find to list any files modified after the marker's timestamp. Logically, that list will include the rogue file in question.

touch testfile

sleep 10

find . -type f -newer testfile -ls
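A runnable sketch of the -newer technique on a throwaway directory, with a single append standing in for the runaway process (the file names are made up). The pause matters: only writes that happen after the marker's timestamp make a file "newer".

```shell
#!/bin/sh
demo=$(mktemp -d)
echo "quiet" > "$demo/quiet.log"
touch -t 201301010000 "$demo/quiet.log"   # an old file that is not growing

touch "$demo/marker"                      # timestamp to compare against
sleep 1
echo "more data" >> "$demo/rogue.log"     # stands in for the runaway writer

# Lists rogue.log only: the marker is not newer than itself, and
# quiet.log was last modified long before the marker was touched.
find "$demo" -type f -newer "$demo/marker" -print
```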

Finding tar Files

Cleaning up redundant tar (backup) files after completing a piece of work is sometimes forgotten. Conversely, if the tar files are still needed, they can be identified and compressed (using compress or gzip), if that has not already been done, to help save space. Either way, the following commands list the tar files for review.

find . -type f -name "*.tar" -ls

find . -type f -name "*.tar.Z" -ls

find . -type f -name "*.tar.gz" -ls
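Once the list has been reviewed, the same find expression can hand each uncompressed tar file to gzip, which replaces x.tar with x.tar.gz. A minimal sketch on a throwaway directory (the archive and file names are illustrative):

```shell
#!/bin/sh
demo=$(mktemp -d)
echo "some data" > "$demo/notes.txt"
tar -cf "$demo/backup.tar" -C "$demo" notes.txt   # an uncompressed archive

# Review first ...
find "$demo" -type f -name "*.tar" -ls
# ... then compress every uncompressed tar file in place.
find "$demo" -type f -name "*.tar" -exec gzip {} \;
```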

Large Directories

List the disk usage of each file and sub-directory in the current directory, sorted so that the largest appear last (units are kilobytes)…

du -sk * | sort -n

Sometimes it is useful to then cd into the largest directory and re-run the du command, repeating until the large files are found.
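Because sort -n puts the largest entries last, reversing the sort and trimming with head gives a short "top N" list instead. A sketch, with top_dirs as a hypothetical helper name and a throwaway tree:

```shell
#!/bin/sh
# top_dirs DIR N
# Prints the N largest entries directly under DIR, biggest first, in KB.
top_dirs() {
    (cd "$1" && du -sk -- * | sort -rn | head -n "$2")
}

demo=$(mktemp -d)
mkdir "$demo/big" "$demo/small"
# Random data, so filesystem compression cannot shrink the file on disk.
dd if=/dev/urandom of="$demo/big/blob" bs=1024 count=512 2>/dev/null
echo "tiny" > "$demo/small/note"

top_dirs "$demo" 1    # the only line printed names "big"
```

Note that * skips names beginning with a dot, so hidden directories are not included.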

Removing Files using Find

The above find commands can be edited to remove the files found rather than list them: replace the “-ls” switch with “-exec rm {} \;”.
e.g.

find . -type f -mtime +365 -exec rm {} \;

It is always prudent to run the command with the “-ls” switch first, to see what will be removed.
The “-ls” switch prints summary information about each file (such as owner and permissions). If just the file name is required, swap the “-ls” switch for “-print”.
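The list-first, delete-second routine as a runnable sketch on a throwaway directory (file names are made up). It ends -exec with + rather than \;, which POSIX find supports and which passes many names to each rm call instead of one; GNU and BSD find also have -delete, but that is not in POSIX.

```shell
#!/bin/sh
demo=$(mktemp -d)
touch -t 201201010000 "$demo/ancient.tmp"   # well over a year old
touch "$demo/current.tmp"                   # modified now

# 1. Preview: list files last modified more than 365 days ago.
find "$demo" -type f -mtime +365 -print
# 2. Delete the same set; "+" batches file names into as few rm calls as possible.
find "$demo" -type f -mtime +365 -exec rm {} +
```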

GREP Reference

GREP (the name comes from the ed editor command g/re/p, "global regular expression print") is a search tool for finding strings in text files. I frequently use it for finding specific pieces of code in source files.

This section of the document describes some of the ways I use GREP, and some of the helpful options and types of regular expressions which make GREP more than just a string finder.

One thing to keep in mind is that GREP is case sensitive by default.

If you want to pass multiple single-letter options to GREP, you can combine them after a single - (for example, -ni is equivalent to -n -i). The order of the options does not matter, but the options should be specified before the regular expression.

Searching through all your source files (*.?)
Since a typical C project uses only two types of source file, C (.c) and header (.h), you can tell GREP to search your entire set of source files by using *.? to specify all files with single-letter extensions.

For example, the following command finds all occurrences of the string PIX_LEN anywhere in your source:

grep "PIX_LEN" *.?
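On Unix it is the shell, not GREP, that expands *.? into a list of file names. A quick sketch with made-up file names shows which files the wildcard picks up; search_sources is a hypothetical helper, and -l makes grep print only the names of matching files:

```shell
#!/bin/sh
# search_sources DIR PATTERN
# Lists the single-letter-extension files in DIR that contain PATTERN.
search_sources() {
    (cd "$1" && grep -l "$2" *.?)
}

demo=$(mktemp -d)
printf '#define PIX_LEN 640\n' > "$demo/defs.h"
printf 'int w = PIX_LEN;\n'     > "$demo/main.c"
printf 'PIX_LEN is the width\n' > "$demo/notes.txt"

# *.? matches defs.h and main.c but not notes.txt (three-letter extension).
search_sources "$demo" PIX_LEN
```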
Passing special characters (the \ character)
If you want to search for special characters (the ones which GREP uses, or which the command shell would otherwise interpret) you can specify them in the regular expression by putting \ before them.

For example, to search for the string "HELP" (where the quotes are part of the search), you could use the command:

grep "\"HELP\"" *.c
The \ characters stop the shell from treating the inner quotes as the end of the string, so GREP receives "HELP" with its quotes and searches for them literally.
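An equivalent that is often easier to read is to wrap the whole pattern in single quotes, inside which a Unix shell treats " as an ordinary character. A small check on made-up files that both spellings find the same line:

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'puts("HELP");\n' > "$demo/menu.c"
printf 'int HELP = 1;\n' > "$demo/flags.c"

# Both commands search for HELP together with its surrounding quotes,
# so only the line in menu.c matches.
grep "\"HELP\"" "$demo"/*.c
grep '"HELP"' "$demo"/*.c
```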

Matching on either case (the -i option)
Usually, when looking up something in the code, you will want to ignore the case of the string (that is, you will want to match, even if you did not get the capitalization correct). To tell GREP to ignore case, use the -i switch.

For example,

grep -i "reset" *.c
will find all occurrences of reset in any C file, whatever the capitalization (Reset, RESET and so on).

Getting the line numbers for the match (the -n option)
If you are trying to fix code, you may want the line numbers for the lines which match the expression. GREP provides the -n option to give the line numbers at the start of the line.

For example:

grep -ni "reset" *.c
will find all occurrences of reset, independent of capitalization, and report the line numbers. This gives you a list of places you might want to check.
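A quick illustration of -i and -n together on a made-up source file. When grep searches a single file it prints line:text; with several files, each match is also prefixed by the file name.

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'void Reset(void);\nint count;\n/* RESET the counter */\n' > "$demo/timer.c"

# -i ignores case, -n prefixes each match with its line number:
# prints lines 1 and 3. Without -i nothing matches, since the file
# never spells reset in lower case.
grep -ni "reset" "$demo/timer.c"
```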

Searching for 2 words on the same line ( "word1.*word2" )
To search for 2 words on the same line, where you know the order in which they will appear, but not what will appear between them, you can use the command:

grep -in "word1.*word2" *.c
will find all lines containing word1 followed by any number of any characters, followed by word2. The sequence .* tells GREP to match any character (.) any number of times (*).

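Note: Using * tells GREP that it can match . (any character) zero or more times. If you want to specify one or more times, use + instead; + is only special in extended regular expressions (grep -E), so with plain grep write ..* for the same effect.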
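A sketch on three made-up lines showing the difference between zero-or-more and one-or-more in practice:

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'open then close\nopenclose\nclose then open\n' > "$demo/lines.txt"

# .* allows zero characters between the words: matches lines 1 and 2.
grep -n "open.*close" "$demo/lines.txt"
# .+ requires at least one character (extended syntax): line 1 only.
grep -nE "open.+close" "$demo/lines.txt"
# ..* is the basic-regular-expression spelling of the same thing.
grep -n "open..*close" "$demo/lines.txt"
```

In plain grep, "open+close" would look for a literal + sign and match none of these lines.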

Searching for assignments to a variable ("var *=[^=]")
To get GREP to search for all assignments to a variable called var, you can use the command:

grep "var *=[^=]" *.c
This tells GREP to match any line containing var, followed by any number of spaces ( *), followed by an equals sign, followed by any character which is not an equals sign ([^=]) (to eliminate the == comparison).

This use of the character set ([]) and its "not a member" option (the ^ as the first character in the set) can be used for several other complicated searches.
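A check of the assignment pattern against a few made-up lines. It also matches names that merely end in var, such as myvar; the second command is an extended-syntax variant, offered as an assumption rather than part of the original recipe, that additionally requires the start of the line or a non-identifier character before var.

```shell
#!/bin/sh
demo=$(mktemp -d)
printf '%s\n' 'var = 1;' 'if (var == 2)' 'myvar = 3;' 'var=4;' 'if (var != 5)' \
    > "$demo/assign.c"

# Assignments to var (and, as a side effect, to myvar): lines 1, 3 and 4.
grep -n "var *=[^=]" "$demo/assign.c"
# Require ^ or a non-identifier character before var: lines 1 and 4.
grep -nE '(^|[^A-Za-z0-9_])var *=[^=]' "$demo/assign.c"
```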

Searching for setting a specific value (" =[^=] *value")
To get GREP to search for all assignments which assign a specific value to any variable (for example, for looking for changes to a specific state), you can use the command:

grep " =[^=] *value" *.c
This tells GREP to match any occurrence of =, followed by something which is not an equals sign, followed by any number of spaces, then followed by the value of interest.

Note that the use of =[^=] to skip over equality comparisons only works because of our coding standard which surrounds equals signs with spaces. That allows GREP to find the 'non-equal-sign' character before finding the value.
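A check of the value pattern with a hypothetical state name, STATE_IDLE. The leading space is what keeps == (and !=, <=, >=) out of the results, which is also why an assignment written without spaces slips through:

```shell
#!/bin/sh
demo=$(mktemp -d)
printf '%s\n' 'state = STATE_IDLE;' 'if (state == STATE_IDLE)' \
    'if (state != STATE_IDLE)' 'state=STATE_IDLE;' > "$demo/state.c"

# Only line 1 matches; line 4 is an assignment too, but it breaks the
# spaces-around-= convention, so the pattern misses it.
grep -n " =[^=] *STATE_IDLE" "$demo/state.c"
```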

Find all conditionals relating to TIME ("#if.*_TIME[^a-zA-Z_]")
You can get GREP to find all conditionals based on symbols ending in _TIME by specifying:

"#if.*_TIME[^a-zA-Z_]"
as the regular expression.

This tells GREP to match any line which contains #if, followed by any number of any character, followed by _TIME, followed by anything but a letter or the underscore character.

This should find any conditional compile which uses any symbol ending in _TIME, including those with trailing comments or with multiple symbols in the condition, as long as at least one character follows the symbol on the line.
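Because [^a-zA-Z_] has to match an actual character, a line that ends right after the symbol is not found. A sketch on made-up conditionals, with an extended-syntax variant (an addition, not part of the original recipe) that also accepts end of line:

```shell
#!/bin/sh
demo=$(mktemp -d)
printf '%s\n' '#if defined(LOG_TIME) && DEBUG' '#ifdef WAIT_TIME /* ms */' \
    '#ifdef BOOT_TIME' '#if RUN_TIMER' > "$demo/cond.h"

# Original pattern: finds lines 1 and 2, but misses line 3 because nothing
# follows BOOT_TIME for [^a-zA-Z_] to match. Line 4 (_TIMER) is rightly skipped.
grep -n '#if.*_TIME[^a-zA-Z_]' "$demo/cond.h"
# Extended variant: "not a letter or underscore" OR end of line ($).
grep -nE '#if.*_TIME([^a-zA-Z_]|$)' "$demo/cond.h"
```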

Find something, excluding marked lines from the search
If you mark your source code with initial characters to indicate sections which have been #if'd out, then you can restrict GREP's search to skip these lines by adding a check to the beginning of the search string. The following example assumes that lines to be ignored are marked with either * or > in the first column.

Here you tell grep to find any line whose first character is not * or >, followed by any number (including zero) of any character, followed by the string searching for.

grep -n "^[^*>].*searching for" *.?
If you want to restrict the search even further, to eliminate ASSERT lines, and if you know that none of the valid lines will have A as the first real character, then you can build a more exacting match by telling grep to require spaces, followed by something which is neither a space nor an A.

Note that you need to tell grep that you want spaces followed by something which is not a space, or it will match anything, since it can use the last space to match against the "not A" requirement.

grep -n "^[^*>] *[^ A].*searching for" *.?

The following are some commands that might be useful if you want to find files in Unix/Linux.

Tuesday, October 29, 2013