NESUG 2008 Programming Beyond the Basics

Extending the Power of SAS® with Java™

Kevin Harper, Matrix Consulting
Steve Schwartz, Prudential Financial
Joseph Belice, Prudential Financial

ABSTRACT

Combining SAS and Java can solve many difficult problems, yet combining the two is not itself a difficult problem. We need to understand the situations that benefit from this combination, the bridging mechanism, and a style of Java programming that SAS programmers can easily support. We will use the SAS Macro Facility to let our SAS programmers use Java classes.

INTRODUCTION

Many problems can be solved effectively using SAS software; many others cannot or should not be. Why would we choose not to use our trusted, tested and verified SAS software? First, there are problems that cannot be solved using SAS, such as reading or writing Excel (xls) files on Unix. In other cases, we might not have the budget to purchase additional SAS modules to solve a problem. Our installation might not have SAS graph, or SAS Intrnet, or even SAS Access; in fact, our installation might be limited to only Base and Stat. We might also want a solution that can be used with software from several vendors. Correctly designed Java code provides multi-platform, multi-vendor solutions that can be supported by your existing SAS team.

Using SAS and Java Together (Plain Vanilla)

Before we get into the nuts and bolts of bridging SAS and Java we need to dispel some myths:

1. When talking about Java we are dealing exclusively with web based applications
2. I need to have SAS Connect, SAS Share, SAS IOM, SAS WebAF, SAS Intrnet to use Java with SAS
3. I need to have a strong object-oriented background to understand Java
4. I need special IDEs to write Java

None of these myths is true. Java is a programming language that is widely used for both web based and non-web based applications.
Additional SAS products might not be helpful, and in many cases they slow development if your team does not have expertise in those specific SAS modules. Although there are many Java IDEs, they are not necessary and might only add confusion as you begin your Java journey.

SAS Macro Structure

We will use the SAS Macro Facility to bridge Java and SAS. Let's consider the standard SAS procedure layout. The proc call includes the proc name with a set of predefined options. Since that format is successful and familiar, we will use it. Let's write a macro call that will create an Odometer graph for a typical dashboard application (left side).

Figure 1. Example Graphic

Example 1 – Odometer Graphic

First, the macro call to produce an odometer should look something like this:

    %odometer(outfile=odometer.png,
              percent=30,
              title=Overall Response,
              titlefont=helvetica,
              titlefontsize=24,
              titlefontstyle=bold italic,
              titlefontcolor=red,
              backgroundcolor=#ffffff,
              tickcolor=#000000,
              sidecolor=#00cccc,
              pointercolor=black,
              numbercolor=#ff0000,
              radius=300);

Figure 2. Macro Call

It seems reasonable and straightforward to define these attributes. We should define the macro itself with defaults for some fields in case our user doesn't want to specify every attribute with every call. Let's use this as the macro definition:

    %macro odometer(outfile=?,
                    percent=?,
                    TiTle=?,
                    titlefont=helvetica,
                    titlefontsize=24,
                    titlefontstyle=bold italic,
                    titlefontcolor=red,
                    backgroundcolor=#ffffff,
                    tickcolor=#000000,
                    sidecolor=#dbeaf5,
                    pointercolor=black,
                    numbercolor=#ff0000,
                    radius=300);

Figure 3. Macro Definition

Now, let's add some edit checks:

    **************************************************************************************;
    **** Check and set variables                                                      ****;
    **************************************************************************************;
    %if &title = ? %then %let title =;

    %let dot = 0;
    %let dot = %index(&outfile,.);
    %if &dot ne 0 %then %let graphtype = %substr(&outfile, &dot + 1 );
    %else %if &dot eq 0 and &outfile ne ? %then %let graphtype = png;

    %if &outfile = ? %then %do;
       %let outfile = png.png;
       %let graphtype=png;
    %end;

    %if &percent=? %then %do;
       %put No Percent Was Specified for Odometer ;
       %abort ;
    %end;

Figure 4. Macro Edit Check

Finally, we'll write out the parameters to a configuration file and call our Java program to build the graph.

    **************************************************************************************;
    **** Write template file that holds graph specifications                          ****;
    **************************************************************************************;
    %cwd ;
    data _null_;
       file "template.odometer";
       if _n_ = 1 then do;
          put "outfile:&outfile";
          put "graphtype:&graphtype";
          put "percent:&percent";
          put "TiTle:&title";
          put "titlefont:&titlefont";
          put "titlefontsize:&titlefontsize";
          put "titlefontstyle:&titlefontstyle";
          put "titlefontcolor:&titlefontcolor";
          put "backgroundcolor:&backgroundcolor";
          put "tickcolor:&tickcolor";
          put "sidecolor:&sidecolor";
          put "pointercolor:&pointercolor";
          put "numbercolor:&numbercolor";
          put "radius:&radius";
       end;
       else stop;
    run;

    **************************************************************************************;
    **** Call java class and pass the template created above                          ****;
    **************************************************************************************;
    x "java -classpath /home/x140418/graphics odometer &cwd./template";
    run;
    %mend odometer;

Figure 5. Write Configuration File and call Java

Let's clear up what the X command in Figure 5 is doing. When we successfully compile a Java program, a Java class is created. To run the compiled class you use the java command (1), the -classpath keyword (2) to specify the directory that stores the class (3), and the name of the class being executed (4).
In our case, we are passing one parameter to the odometer class: the odometer configuration file (5). The Java program appends the .odometer extension itself, so &cwd./template refers to template.odometer.

           (1)  (2)        (3)                    (4)      (5)
    x "java -classpath /home/x140418/graphics odometer &cwd./template";

We are writing the configuration file (template.odometer) to the directory from which the SAS code is being run. The macro variable &cwd is created by the following macro:

    %macro cwd;
       filename here pipe "pwd";
       data _null_;
          if _n_ = 1 then do;
             length here $500. ;
             infile here lrecl=500 pad ;
             input here;
             call symput('cwd',trim(left(here)) );
          end;
          else stop;
       run;
       %put current directory is: &cwd ;
       run;
    %mend cwd;

Figure 6. Determine working directory

Bridging Macro Summary

1. Determine what parameters the process needs. Simply think about the parameters you would send to an existing SAS proc to specify your results.
2. Build the code to do edit checks and output the parameters to a configuration file that will be read by the Java program.
3. Call the Java program using an X command.

Java Program

There are many ways to write Java programs. The style we will use for this paper is one that is easily supported by SAS programmers. The Java program will resemble a traditional third generation modular program. Using this approach will moderate the learning curve and allow a SAS team to support the full application.

A few rules:

1. Java variables must be declared prior to their use.
2. The Java class name (program name) and filename must be the same. In our example the class is odometer, so the filename must be odometer.java.
3. The entry point to a Java program is: public static void main(String args[])

Okay, let's dump the program and see what's going on.
    import java.awt.*;
    import java.awt.image.*;
    import java.io.*;
    import javax.imageio.*;
    import java.awt.geom.*;
    import java.awt.Dimension;
    import java.awt.Color;
    import java.awt.Graphics;
    import java.awt.font.*;
    import java.lang.String;
    import java.lang.Math;
    import java.text.DecimalFormat;
    import java.awt.FontMetrics;

    /*
       Invoked by the command:
       java odometer test
       (where test is test.odometer the config file )
    */
    public class odometer {

        public static int[] conv_rgb_num(String colorstr){
            int[] rgb = new int[3];
            String s1,s2,s3;
            int holdint;

            if (colorstr.indexOf("black") > -1){
                rgb[0] = 0; rgb[1] = 0; rgb[2] = 0;
            }
            else if (colorstr.indexOf("red") > -1){
                rgb[0] = 255; rgb[1] = 0; rgb[2] = 0;
            }
            else if (colorstr.indexOf("blue") > -1){
                rgb[0] = 0; rgb[1] = 0; rgb[2] = 255;
            }
            else if (colorstr.indexOf("cyan") > -1){
                rgb[0] = 0; rgb[1] = 255; rgb[2] = 255;
            }
            else if (colorstr.indexOf("darkgray") > -1){
                rgb[0] = 80; rgb[1] = 80; rgb[2] = 80;
            }
            /* lightgray is tested before gray because "lightgray" also contains "gray" */
            else if (colorstr.indexOf("lightgray") > -1){
                rgb[0] = 192; rgb[1] = 192; rgb[2] = 192;
            }
            else if (colorstr.indexOf("gray") > -1){
                rgb[0] = 128; rgb[1] = 128; rgb[2] = 128;
            }
            else if (colorstr.indexOf("magenta") > -1){
                rgb[0] = 255; rgb[1] = 0; rgb[2] = 255;
            }
            else if (colorstr.indexOf("green") > -1){
                rgb[0] = 0; rgb[1] = 255; rgb[2] = 0;
            }
            else if (colorstr.indexOf("orange") > -1){
                rgb[0] = 255; rgb[1] = 200; rgb[2] = 0;
            }
            else if (colorstr.indexOf("pink") > -1){
                rgb[0] = 255; rgb[1] = 175; rgb[2] = 175;
            }
            else if (colorstr.indexOf("white") > -1){
                rgb[0] = 255; rgb[1] = 255; rgb[2] = 255;
            }
            else if (colorstr.indexOf("yellow") > -1){
                rgb[0] = 255; rgb[1] = 255; rgb[2] = 0;
            }
            else if(colorstr.indexOf("#") > -1){
                //need 3 pairs of dec/hex (eg 255 255 255) numbers -
                //use substring to pull the pair, then parseInt to base 10
                s1 = colorstr.substring(1,3);
                s2 = colorstr.substring(3,5);
                s3 = colorstr.substring(5,7);
                rgb[0] = Integer.parseInt(s1,16);
                rgb[1] = Integer.parseInt(s2,16);
                rgb[2] = Integer.parseInt(s3,16);
            }
            return rgb;
        } /* End conv_rgb_num */

        public static void main(String[] args) throws IOException {
            /*************************************************************************************/
            /**** Pull Config filename from args, then read and parse the file setting all    ****/
            /**** attributes passed in the file                                               ****/
            /*************************************************************************************/

            /*input file name passed as parameter */
            String fn = args[0] + ".odometer";

            /* Initialize potential configuration parameters */
            String outfile = "odometeroutput.png";   /*initialize output filename */
            String str = " ";
            String backgroundcolor="#ffffff";        /*initialize background color to white*/
            String sidecolor="#dbeaf5";              /*initialize side color */
            String pointercolor="#000000";           /*initialize pointercolor to black */
            String numbercolor="#000000";            /*initialize numbercolor to black */
            String tickcolor="#000000";              /*initialize tickcolor to black */
            String graphtype="png" ;                 /*initialize graph type to png */
            int radius = 200;                        /*initialize radius of odometer */
            double percent = 0.0;                    /*initialize percent to zero */
            String title = " ";                      /*initialize title */
            String titlefont = "helvetica";
            int titlefontsize = 18;
            int titlefontstyle = 3;
            String titlefontcolor="#000000";
            int i ;
            int xsize, ysize;
            double dxsize, dysize;
            int[] colorarray = new int[3] ;

            try {
                /*********************************************************************************/
                /**** Read any config parameters                                              ****/
                /**** If the config file does not exist then trap the error                   ****/
                /*********************************************************************************/
                BufferedReader in = new BufferedReader(new FileReader(fn));
                int p,q,l ;
                char ch;
                String uhold, cstr;

                while ((str = in.readLine()) != null) {
                    uhold = str.toUpperCase();
                    if (uhold.startsWith("OUTFILE:")){
                        p = uhold.indexOf("OUTFILE:") + 8;
                        outfile = str.substring(p);
                    }
                    else if (uhold.startsWith("SIDECOLOR:")){
                        p = uhold.indexOf("SIDECOLOR:") + 10;
                        sidecolor = str.substring(p);
                    }
                    else if (uhold.startsWith("TICKCOLOR:")){
                        p = uhold.indexOf("TICKCOLOR:") + 10;
                        tickcolor = str.substring(p);
                    }
                    else if (uhold.startsWith("POINTERCOLOR:")){
                        p = uhold.indexOf("POINTERCOLOR:") + 13;
                        pointercolor = str.substring(p);
                    }
                    else if (uhold.startsWith("NUMBERCOLOR:")){
                        p = uhold.indexOf("NUMBERCOLOR:") + 12;
                        numbercolor = str.substring(p);
                    }
                    else if (uhold.startsWith("BACKGROUNDCOLOR:")){
                        p = uhold.indexOf("BACKGROUNDCOLOR:") + 16;
                        backgroundcolor = str.substring(p);
                    }
                    else if (uhold.startsWith("TITLE:")){
                        p = uhold.indexOf("TITLE:") + 6;
                        title = str.substring(p);
                    }
                    else if (uhold.startsWith("TITLEFONT:")){
                        p = uhold.indexOf("TITLEFONT:") + 10;
                        titlefont = str.substring(p);
                    }
                    else if (uhold.startsWith("TITLEFONTCOLOR:")){
                        p = uhold.indexOf("TITLEFONTCOLOR:") + 15;
                        titlefontcolor = str.substring(p);
                    }
                    else if (uhold.startsWith("TITLEFONTSIZE:")){
                        p = uhold.indexOf("TITLEFONTSIZE:") + 14;
                        cstr = str.substring(p);
                        titlefontsize = Integer.parseInt(cstr);
                    }
                    else if (uhold.startsWith("TITLEFONTSTYLE:")){
                        p = uhold.indexOf("TITLEFONTSTYLE:") + 15;
                        cstr = str.substring(p).toUpperCase();
                        if (cstr.equals("PLAIN")){titlefontstyle = 0;}
                        else if(cstr.equals("BOLD")){titlefontstyle = 1;}
                        else if(cstr.equals("ITALIC")){titlefontstyle = 2;}
                        else if(cstr.equals("BOLD ITALIC")){titlefontstyle = 3;}
                    }
                    else if (uhold.startsWith("GRAPHTYPE:")){
                        p = uhold.indexOf("GRAPHTYPE:") + 10;
                        graphtype = str.substring(p);
                    }
                    else if (uhold.startsWith("RADIUS:")){
                        p = uhold.indexOf("RADIUS:") + 7;
                        cstr = str.substring(p);
                        radius = Integer.parseInt(cstr);
                    }
                    else if (uhold.startsWith("PERCENT:")){
                        p = uhold.indexOf("PERCENT:") + 8;
                        cstr = str.substring(p);
                        try{
                            percent = Double.valueOf(cstr.trim()).doubleValue();
                        }
                        catch (NumberFormatException nfe) {
                            //Do not abend program with bad numeric data - set to zero
                            //System.out.println("NumberFormatException: " + nfe.getMessage());
                            percent = 0.0;
                        }
                    }
                } /* end of while */
                in.close(); /*close file buffer */
            } /* end of try - read configuration file */
            catch (IOException e) {System.out.println("Input CFG file not found"); } /*end catch*/

            /************** Instantiate Graphics2D object **************************************/
            xsize = radius + (int)(radius * .22);
            ysize = radius + (int)(radius * .17);
            dxsize = (double) xsize;
            dysize = (double) ysize;
            double pointerlength = 0.9 * (radius / 2);
            RenderedImage myimg = new BufferedImage(xsize,ysize,BufferedImage.TYPE_INT_RGB);
            Graphics2D g = ((BufferedImage)myimg).createGraphics();

            /************** Background Color ***************************************************/
            colorarray = conv_rgb_num(backgroundcolor);
            g.setPaint(new Color(colorarray[0],colorarray[1],colorarray[2]));
            g.fill(new Rectangle2D.Double(0.0,0.0,dxsize,dysize));

            int x = 0; /*define left corner x */
            int y = 0; /*define left corner y */
            Stroke drawingstroke = new BasicStroke(3);
            g.setStroke(drawingstroke);
            int[] colors = new int[3];

            /************** Set and fill the Side Color ****************************************/
            colors = conv_rgb_num(sidecolor);
            g.setPaint(new Color(colors[0],colors[1],colors[2]));
            g.fillArc(x,y,(int) (radius + (radius * .15)) ,(int) radius ,269,182);

            /************** Set and fill the Odometer Face *************************************/
            g.setPaint(new Color(255,255,255));
            g.fill(new Ellipse2D.Double(x, y, (int) radius , (int) radius));

            /************** Draw the Odometer Face Circle **************************************/
            g.setPaint(new Color(colors[0],colors[1],colors[2]));
            double x2,y2;
            g.draw(new Ellipse2D.Double(x, y,(int) radius, (int) radius));

            /************** Fill in the Odometer Face ******************************************/
            double angle, newangle, newradian;
            double xstart, xstop, ystart, ystop;

            /************** Draw Tick Marks ****************************************************/
            Stroke tickstroke = new BasicStroke(2);
            g.setStroke(tickstroke);
            double f1,f2,r2;
            r2 = radius / 2;
            for(i=1;i<100;i++){
                colors = conv_rgb_num(tickcolor);
                g.setPaint(new Color(colors[0],colors[1],colors[2]));
                angle = i * 360.0 / 100 ;
                newangle = 270 - angle;
                newradian = newangle * Math.PI/180.0;
                x2 = Math.abs(Math.cos(newradian) ) ;
                y2 = Math.abs(Math.sin(newradian) ) ;
                if ((i <= 25) && (i >= 0)){x2 = Math.abs(x2) * -1; y2 = Math.abs(y2) * 1; }
                if ((i <= 50) && (i >= 25)){x2 = Math.abs(x2) * -1; y2 = Math.abs(y2) * -1; }
                if ((i <= 75) && (i >= 50)){x2 = Math.abs(x2) * 1; y2 = Math.abs(y2) * -1; }
                if ((i <= 100) && (i >= 75)){x2 = Math.abs(x2) * 1; y2 = Math.abs(y2) * 1; }
                if (i % 10 > 0){
                    /*************** Minor Tick Marks ******************************************/
                    f1 = r2 - 5;
                    f2 = r2 ;
                    xstart = (f1 * x2) + r2;
                    xstop = (f2 * x2) + r2;
                    ystart = (f1 * y2) + r2;
                    ystop = (f2 * y2) + r2;
                    g.draw(new Line2D.Double(xstart,ystart,xstop,ystop));
                }
                else {
                    /*************** Major Tick Marks ******************************************/
                    f1 = r2 - 15;
                    f2 = r2 ;
                    xstart = (f1 * x2) + r2;
                    xstop = (f2 * x2) + r2;
                    ystart = (f1 * y2) + r2;
                    ystop = (f2 * y2) + r2;
                    g.draw(new Line2D.Double(xstart,ystart,xstop,ystop));

                    /*************** Major Tick Mark Labels ************************************/
                    int addonx = (int) r2;
                    int addony = (int) r2;
                    if (i > 50){addonx = (int) r2 - 10;}
                    xstart = ( (r2 - 25) * x2) + addonx;
                    ystart = ( (r2 - 25) * y2) + addony;
                    String istring = Integer.toString(i);
                    colorarray = conv_rgb_num(numbercolor);
                    g.setPaint(new Color(colorarray[0],colorarray[1],colorarray[2]));
                    g.drawString(istring, (int) (xstart) ,(int)(ystart) );
                }
            } /*end for loop */

            /*************** Odometer Label ****************************************************/
            Font titlefontw = new Font(titlefont,titlefontstyle,titlefontsize);
            int sw; /* String Width */
            double titlepos_x, titlepos_y;
            g.setFont(titlefontw);
            FontMetrics title_metrics = g.getFontMetrics(titlefontw);
            colorarray = conv_rgb_num(titlefontcolor);
            g.setPaint(new Color(colorarray[0],colorarray[1],colorarray[2]));
            sw = title_metrics.stringWidth(title);
            titlepos_x = .5 * (dxsize ) - .5 * sw -(.1 * radius) ;
            if ( (percent>=30) && (percent <=70) ){
                titlepos_y = .6 * radius;
            }
            else{
                titlepos_y = .4 * radius;
            }
            g.drawString(title, (int) (titlepos_x) ,(int) (titlepos_y) );

            /************** Compute Pointer ****************************************************/
            angle = percent * 360 / 100;
            newangle = 270 - angle;
            newradian = newangle * Math.PI/180.0;
            x2 = Math.abs(Math.cos(newradian) ) ;
            y2 = Math.abs(Math.sin(newradian) ) ;
            if ((percent <= 25) && (percent >= 0)){x2 = Math.abs(x2) * -1; y2 = Math.abs(y2) * 1; }
            if ((percent <= 50) && (percent >= 25)){x2 = Math.abs(x2) * -1; y2 = Math.abs(y2) * -1; }
            if ((percent <= 75) && (percent >= 50)){x2 = Math.abs(x2) * 1; y2 = Math.abs(y2) * -1; }
            if ((percent <= 100) && (percent >= 75)){x2 = Math.abs(x2) * 1; y2 = Math.abs(y2) * 1; }
            x2 = (pointerlength * x2) + r2;
            y2 = (pointerlength * y2) + r2;
            colors = conv_rgb_num(pointercolor);
            g.setPaint(new Color(colors[0],colors[1],colors[2]));
            Stroke pointerstroke = new BasicStroke(4);
            g.setStroke(pointerstroke);
            g.draw(new Line2D.Double(r2, r2, x2, y2));

            /************** Write Output Graphics File *****************************************/
            ImageIO.write(myimg,graphtype,new File(outfile));
        } /* end of main */
    } /* end of class */

Figure 7. Java Program

Let's see how this program fits together. The purpose of this program is to read the configuration file created by our SAS macro, %odometer. We will create the odometer graphics file by reading the odometer's attributes and using the capabilities of Java to create our graphics file.

    Code                                  Purpose
    ------------------------------------  ----------------------------------------------------
    public class odometer                 Class name -- the filename must correspond to this
                                          (odometer.java).
    import statements                     Including external Java code necessary for building
                                          graphs. The complete reference for these is located
                                          at http://java.sun.com/j2se/1.4.2/docs/api/overview-summary.html
    conv_rgb_num method                   Java methods are equivalent to functions in other
                                          languages. In this case, the method takes a string
                                          that contains color information and returns an
                                          integer array that is used by Java to define fill
                                          colors. Information on web colors can be found at:
                                          http://www.w3.org/TR/html4/types.html#h-6.5
    main method                           In Java, every program must have a main method (this
                                          is the entry point for the program). The format is
                                          always: public static void main(String[] args)
                                          The args array will contain all arguments passed to
                                          the program. In our case we are passing the
                                          configuration file name.

    Comments on the Code
    ------------------------------------  ----------------------------------------------------
    if (colorstr.indexOf("black") > -1)   Equivalent to if index(colorstr,"black") in SAS.
    rgb[0] = Integer.parseInt(s1,16);     Converts the two hex digits in s1 to an integer and
                                          places it in the first element of the rgb array.
    Declarations at the top of main       Variable initialization.
    try / catch around the file read      Try / Catch sequence for reading the attributes
                                          file. If template.odometer is missing, the program
                                          reports the problem instead of failing with an
                                          exception and carries on with the default values.
    BufferedReader in = ...               Declare file buffer for reading input file.
    while loop                            Loop that reads and processes each line from the
                                          input file.
    if / else if chain inside the loop    Parsing code that reads the odometer's attributes.
    xsize through createGraphics()        Compute graph size and initialize graph object.
    Background Color block                Typical graphics sequence:
                                          colorarray = conv_rgb_num(backgroundcolor);
                                            - get 3 element integer array for color
                                          g.setPaint(new Color(colorarray[0],colorarray[1],colorarray[2]));
                                            - set current color
                                          g.fill(new Rectangle2D.Double(0.0,0.0,dxsize,dysize));
                                            - fill a rectangle with the current color. The
                                              rectangle's dimensions are defined by (0,0) upper
                                              left and (dxsize, dysize) lower right.
    Background Color through pointer      These lines build the various shapes and text that
                                          will be displayed on the odometer.
    ImageIO.write(myimg,graphtype,        Write the graphics image to the output file. The
        new File(outfile));               supported file types are jpg and png; adding the
                                          Java Advanced Imaging Image I/O will add: gif, bmp,
                                          accelerated jpg, accelerated png, pnm, tiff, wbmp.
                                          See: http://java.sun.com/developer/technicalArticles/Media/AdvancedImage/

Summary of this Java Code:

The program flow in this Java code is the same as in SAS code. The function (method) calls are slightly different, but are essentially the same functions (indexOf vs index, substring vs substr). File handling in Java requires that you define a buffer and then read from (or write to) it. All in all, with a little effort, a SAS programmer should be able to understand this code.

What about using Java Code (Libraries) found on the Web?
Example 2 – Reading MS Excel on a Unix Platform

There are many free, quality Java libraries that can provide additional functionality to SAS. Let's use the task of reading Excel files on Unix as an example. The workbooks could be converted to csv or XML files and SAS on Unix would handle them fine. But if you have an upload servlet for users to post information, you will probably receive a fair share of .xls files. The nicest way to handle these files is by using a Java library from the Apache POI Project (http://poi.apache.org/).

First, download the Java library from Apache, then unzip and install it on your Unix machine under your SAS macro directory (note: this package lets you read most Microsoft formats: .doc, .ppt, .xls). You will also be adding the Java programs you write to this directory.

A note on directory structure and Java

A typical import line in Java code will look like:

    import org.apache.poi.poifs.filesystem.POIFSFileSystem;

This defines a directory path. On Unix you would need the path org/apache/poi/poifs/filesystem underneath your application and POIFSFileSystem.class stored in that directory.

Okay, let's follow the same process we used in the Odometer Example. First we'll define what our macro call should look like. Something like the following should work:

    %read_xls_named_sheet(xlsfile=/miswork/workarea/znavird/excel/qa2.xls,
                          dsout=test2,
                          sheet=Sheet2,
                          getnames=yes,
                          cleanup=no );

Figure 8. Macro Call

The macro requires the following:

    xlsfile  – incoming workbook
    dsout    – name of the SAS dataset to create
    sheet    – name of worksheet to be read
    getnames – like proc import – will the first row contain the variable names
    cleanup  – will the macro remove the work files

What will this macro do to get Excel data into SAS? Let's read the .xls file with Java, dump that to a temporary .csv file, then use Proc Import to pull it into SAS.
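With this design the Java program and the SAS macro never exchange the name of the temporary file, so each side has to derive the same name from the workbook path. A minimal sketch of one such rule, assuming the file goes under /tmp and keeps the workbook's base name up to its first dot (the class TempCsvName and the method tempCsvPath are hypothetical names used only for illustration):

```java
// Hypothetical helper for illustration; not one of the paper's listings.
final class TempCsvName {

    // Derive the temporary CSV path for a workbook path:
    // drop the directories, cut the file name at its first dot,
    // and place the result under /tmp with a .csv extension.
    static String tempCsvPath(String xlsPath) {
        String name = xlsPath.substring(xlsPath.lastIndexOf('/') + 1);
        int dot = name.indexOf('.');
        if (dot > 0) {
            name = name.substring(0, dot);
        }
        return "/tmp/" + name + ".csv";
    }

    public static void main(String[] args) {
        // Prints /tmp/qa2.csv
        System.out.println(tempCsvPath("/miswork/workarea/znavird/excel/qa2.xls"));
    }
}
```

One consequence of a rule like this is that two workbooks with the same base name in different directories map to the same temporary file, so concurrent runs could overwrite each other.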
Here's how the macro will look:

    %macro read_xls_named_sheet(xlsfile=?,
                                dsout=?,
                                sheet=?,
                                getnames=y,
                                cleanup=y );

       *** This is non-interactive and should be run -noterminal ***;
       %if &sysenv ne BACK %then %abort ;

       x "java -classpath /miswork/workarea/znavird/excel dump_excel_to_csv_named_sheet &xlsfile &sheet";
       run;

       data _null_ ;
          if _n_ = 1 then do;
             str = "&xlsfile";
             fn = scan(str,-1,"/");
             p = index(fn,'.');
             fn1 = substr(fn,1,(p-1));
             fn1 = compress('/tmp/' || fn1 || '.csv');
             call symput('fn',trim(left(fn1)));
          end;
          else stop;
       run;

       %let dsout = &dsout. ;
       %let getnames = &getnames. ;

       %if %upcase(%substr(&getnames,1,1)) ne N %then %let getnames = yes;
       %else %let getnames = no;

       proc import datafile= "&fn"
                   out= &dsout
                   dbms=dlm replace;
          guessingrows = 32767;
          delimiter='~';
          getnames= &getnames ;
       run;

       %if %upcase(%substr(&cleanup,1,1)) ne N %then %do;
          x "/usr/bin/rm -f &fn ";
          run;
       %end;
       run;
    %mend read_xls_named_sheet;

Figure 9. Macro

Let's look at the macro flow.

1. The Java program dumps the .xls file to csv format.
2. We derive the name of the csv file from the macro variables.
3. We use proc import to read the csv file (in this case it is really a tilde (~) delimited file).

The last item to look at is the Java program.
    import java.io.*;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Iterator;
    import org.apache.poi.poifs.filesystem.POIFSFileSystem;
    import org.apache.poi.hssf.usermodel.HSSFCell;
    import org.apache.poi.hssf.usermodel.HSSFSheet;
    import org.apache.poi.hssf.usermodel.HSSFWorkbook;
    import org.apache.poi.hssf.usermodel.HSSFRow;

    /**
       Class:      dump_excel_to_csv_named_sheet
       Written by: Kevin Harper
       Purpose:    This class takes an excel workbook and dumps
                   the named spreadsheet into a csv file
    */
    public class dump_excel_to_csv_named_sheet {

        public static void main( String [] args ) throws IOException {
            String fn = args[0] ;          /*input file name passed as parameter */
            String sheetname = args[1];    /*sheet name is second parameter */
            String outfile ;
            FileOutputStream out1;
            PrintStream p;
            String outname;
            int maxrows, maxcols, beginrow, begincol, endrow, endcol;
            int worksheet;
            maxrows = 0;
            maxcols = 0;
            worksheet = 0;
            String xx;

            int position = args[0].lastIndexOf("/");
            String str1 = args[0].substring( (position + 1) );
            if (str1.indexOf(".") > 0){
                position = str1.indexOf(".");
                str1 = str1.substring(0,position);
            }
            outfile = "/tmp/" + str1 + ".csv";
            out1 = new FileOutputStream(outfile);
            p = new PrintStream( out1 );

            try {
                InputStream input = new FileInputStream( fn );
                POIFSFileSystem fs = new POIFSFileSystem( input );
                HSSFWorkbook wb = new HSSFWorkbook(fs);
                HSSFSheet sheet = wb.getSheet(sheetname);

                // Iterate over each row in the sheet
                Iterator rows = sheet.rowIterator();
                while( rows.hasNext() ) {
                    HSSFRow row = (HSSFRow) rows.next();
                    Iterator cells = row.cellIterator();
                    while( cells.hasNext() ) {
                        HSSFCell cell = (HSSFCell) cells.next();
                        switch ( cell.getCellType() ) {
                            case HSSFCell.CELL_TYPE_NUMERIC:
                                if ( cells.hasNext() ){
                                    p.print( Double.toString(cell.getNumericCellValue()) + "~" );
                                }
                                else {
                                    p.print( Double.toString(cell.getNumericCellValue() ) );
                                }
                                break;
                            case HSSFCell.CELL_TYPE_STRING:
                                if ( cells.hasNext() ){
                                    xx = cell.getStringCellValue().replaceAll(",","|") ;
                                    p.print( xx + "~" );
                                }
                                else {
                                    xx = cell.getStringCellValue().replaceAll(",","|") ;
                                    p.print( xx );
                                }
                                break;
                            default:
                                if ( cells.hasNext() ){
                                    p.print( " " + "~" );
                                }
                                else {
                                    p.print( " " );
                                }
                                break;
                        }
                    }
                    p.println();
                }
                p.close();
                out1.close();
            }
            catch ( IOException ex ) {
                ex.printStackTrace();
            }
        }
    }

Figure 10. Java Code to use POI Libraries

Example 3 – Using JDBC to connect to a SQL-Server Database

First, obtain the tools essential to this project by downloading and installing the Microsoft SQL Server JDBC Driver (http://msdn.microsoft.com/en-us/data/aa937724.aspx). This driver comes with extensive documentation that covers installation, use and examples.

Using our top-down approach, let's consider the SAS macro call. For selecting data we will need a structure like:

    %macro sql(dbname=?,
               server=?,
               port=?,
               userid=?,
               passwd=?,
               outfile=%str(/tmp/sql_results.csv),
               limit=999999,
               ds=?,
               sql=? );

Figure 11. Macro parameters for sql select

The macro call includes the following parameters:

1. Database Name
2. Server Name
3. Port
4. User ID
5. Password
6. Output CSV file – this will have a default value for our project
7. Limit on the number of rows returned from the query
8. SAS Dataset Name
9. SQL Clause

    %sql(dbname=lusdbx,
         server=njros1bxx0060,
         port=1433,
         userid=mkxxxxx,
         passwd=mktxxxr,
         limit=25,
         ds=test,
         sql=%nrstr(SELECT product_code, count(product_code) as count1
                    from AONDB group by product_code) );
    run;

Figure 12. Example Macro Invocation

The SAS macro is very similar to the Excel example above. It takes parameters, produces a template file, calls the Java code, then reads the results into a SAS dataset.
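The template file is the whole interface between the SAS macro and the Java class, so it helps to see how little is needed to read one. The following is a minimal sketch, not the paper's process_sql code. It assumes one key:value pair per line, keys matched case-insensitively as in the odometer example, and everything after a line that starts with sql: treated as the SQL statement. The class TemplateSketch and the method parse are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a key:value template reader; not one of the paper's listings.
final class TemplateSketch {

    // Parse template lines into a map of lower-cased keys to values.
    // Assumption: once a "sql:" line is seen, all remaining non-blank
    // lines are joined with single spaces to form the SQL statement.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> settings = new LinkedHashMap<>();
        StringBuilder sql = new StringBuilder();
        boolean inSql = false;
        for (String line : lines) {
            if (inSql) {
                String text = line.trim();
                if (text.isEmpty()) {
                    continue;                      // ignore blank lines inside the statement
                }
                if (sql.length() > 0) {
                    sql.append(' ');
                }
                sql.append(text);
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;                          // not a key:value line
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("sql")) {
                inSql = true;                      // the statement may start on this line
                sql.append(value);
            } else {
                settings.put(key, value);
            }
        }
        settings.put("sql", sql.toString());
        return settings;
    }

    public static void main(String[] args) {
        List<String> template = Arrays.asList(
            "server:njros1bxx0060", "port:1433", "limit:25",
            "sql:", "SELECT product_code", "from AONDB");
        System.out.println(parse(template));
    }
}
```

Splitting on the first colon only keeps values that themselves contain colons intact, which matters for paths and SQL text.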
The complete macro is listed below:

%macro sql(dbname=?, server=?, port=1433, userid=?, passwd=?,
           outfile=%str(/tmp/sql_results.csv), limit=999999, ds=?, sql=? );

  proc optsave out = curroptions;   /* Store the current options before modifying them */
  run;
  options nonotes nomprint nomlogic nosymbolgen ;
  run;

  %if &dbname=? %then %do;
    %put No Database Name Defined ;
    %abort;
  %end;
  %if &server=? %then %do;
    %put No Server Name Defined ;
    %abort;
  %end;
  %if &userid=? %then %do;
    %put No USERID Defined ;
    %abort;
  %end;
  %if &passwd=? %then %do;
    %put No PASSWD Defined;
    %abort;
  %end;
  %if &outfile=? %then %do;
    %put No Output File Defined;
    %abort;
  %end;
  %if %length(&sql) < 2 %then %do;
    %put No SQL Defined;
    %abort;
  %end;

  data _null_;   /* Compute a file suffix so sessions will not bump into each other */
    if _n_ = 1 then do until (ok = 1);
      suffix = int( ranuni(0) * 100000);
      fn = compress("/tmp/sql_results" || suffix || ".csv");
      ok = not fileexist( fn );
      if ok then call symput('suffix', trim(left(suffix)) );
    end;
    else stop;
  run;

  data _null_;
    length sql_statement $1000. ;
    file "/tmp/sql&suffix..sql";
    if _n_ = 1 then do;
      outfile = compress("/tmp/sql_results" || "&suffix" || ".csv",' ');
      importfile = compress("/tmp/import" || "&suffix" || ".sas",' ');
      call symput('importfile',trim(left(importfile)) );
      call symput('outfile',trim(left(outfile)) );
      sql_statement = symget('sql');
      sql_statement = tranwrd(sql_statement,'"',"'");
      put "server:&server";
      put "port:&port";
      put "dbname:&dbname";
      put "userid:&userid";
      put "passwd:&passwd";
      outstr = compress("outfile:"||outfile);
      put outstr;
      put "limit:&limit";
      put "sql:";
      put sql_statement ;
    end;
    else stop;
  run;

  /* process_sql reads three arguments: the template file prefix, the suffix and the dataset name */
  x "java -classpath /miswork/workarea/znavird/sql_server/sqljdbc_1.2/enu:/miswork/workarea/znavird/sql_server/sqljdbc_1.2/enu/sqljdbc.jar process_sql /tmp/sql&suffix &suffix &ds";
  run;

  %if &ds ne ? %then %do;
    %include "&importfile";   /* Pull in the SAS import file built in the Java Program */
  %end;
  run;

  x "rm -f &outfile.";
  run;
  x "rm -f /tmp/sql&suffix..sql";
  run;
  x "rm -f &importfile";
  run;

  proc optload data = curroptions;   /* Restore the options to the original values */
  run;
%mend sql;

Figure 13. Macro that calls JDBC Driver Class

Let's take a look at the Java code (with line numbers added):

  1  import java.io.*;
  2  import java.sql.*;
  3  public class process_sql {
  4    public static void main(java.lang.String[] args) {
  5      String sqlstr = " ";
  6      String str = " ";
  7      boolean insql = false;
  8      String outfile = " ";
  9      String limitc = "0";
 10      int limit = 0;
 11      String uhold = " ";
 12      int pl = 0;
 13      String dbname = " ";
 14      String userid = " ";
 15      String passwd = " ";
 16      String server = " ";
 17      String port = " ";
 18      String suffix = " ";
 19      String dataset = " ";
 20      try {
 21        /* Pull External Variables */
 22        String fn = args[0] + ".sql";
 23        suffix = args[1] ;
 24        dataset = args[2];
 25        BufferedReader in = new BufferedReader(new FileReader(fn));
 26        while ((str = in.readLine()) != null) {
 27          uhold = str.toUpperCase();
 28          if (uhold.startsWith("OUTFILE:")){
 29            pl = uhold.indexOf("OUTFILE:") + 8;
 30            outfile = str.substring(pl);
 31          }
 32          else if (uhold.startsWith("LIMIT:")){
 33            pl = uhold.indexOf("LIMIT:") + 6;
 34            limitc = str.substring(pl);
 35            limit=Integer.parseInt(limitc);
 36          }
 37          else if (uhold.startsWith("DBNAME:")){
 38            pl = uhold.indexOf("DBNAME:") + 7;
 39            dbname = str.substring(pl);
 40          }
 41          else if (uhold.startsWith("USERID:")){
 42            pl = uhold.indexOf("USERID:") + 7;
 43            userid = str.substring(pl);
 44          }
 45          else if (uhold.startsWith("PASSWD:")){
 46            pl = uhold.indexOf("PASSWD:") + 7;
 47            passwd = str.substring(pl);
 48          }
 49          else if (uhold.startsWith("SERVER:")){
 50            pl = uhold.indexOf("SERVER:") + 7;
 51            server = str.substring(pl);
 52          }
 53          else if (uhold.startsWith("PORT:")){
 54            pl = uhold.indexOf("PORT:") + 5;
 55            port = str.substring(pl);
 56          }
 57          else if (uhold.startsWith("SQL:")){
 58            insql = true;
 59          }
 60          else sqlstr = sqlstr + " " + str ;
 61        } /* end reading the while */
 62      } /* end try reading input file */
 63      catch (IOException e) {
 64        System.out.println("Input CFG file not found");
 65      } /* end catch */
 66      try {
 67        // This is where we load the driver
 68        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
 69      }
 70      catch (ClassNotFoundException e) {
 71        System.out.println("Unable to load Driver Class");
 72        return;
 73      }
 74      try {
 75        String connectionUrl = "jdbc:sqlserver://"+server+":"+port+";"+
 76          "databaseName="+dbname+";user="+userid+";password="+passwd+";responseBuffering=adaptive";
 77        //Connection String
 78        Connection con = DriverManager.getConnection(connectionUrl);
 79        try {
 80          //p file is the delimited results file
 81          FileOutputStream out1;
 82          out1 = new FileOutputStream(outfile);
 83          PrintStream p;
 84          p = new PrintStream( out1 );
 85          //s file is the SAS import file that will be included in SAS Macro
 86          FileOutputStream out2;
 87          String xx2 = "/tmp/import"+suffix+".sas";
 88          out2 = new FileOutputStream(xx2);
 89          PrintStream s;
 90          s = new PrintStream( out2 );
 91          Statement stmt = con.createStatement( );
 92          //Generate Result Set and Meta Data for Result Set
 93          ResultSet rs = stmt.executeQuery( sqlstr );
 94          ResultSetMetaData rsmd = rs.getMetaData();
 95          int colCount = rsmd.getColumnCount();
 96          String varname = " ";
 97          s.println("data " + dataset + ";" );
 98          s.println("infile \"" + outfile + "\" delimiter='~' firstobs = 2 dsd missover lrecl=32767;");
 99          for( int i = 1; i <= colCount; i++ )
100          {
101            s.print( "informat " + rsmd.getColumnName( i ));
102            if (rsmd.getColumnType(i) > 2){
103              s.println( " best32.;");
104              s.println("format " + rsmd.getColumnName( i ) + " best12. ;" );
105            }
106            else {
107              s.println(" $"+rsmd.getColumnDisplaySize(i)+". ;");
108              s.println("format " + rsmd.getColumnName( i ) + " $"+rsmd.getColumnDisplaySize(i)+". ;");
109            }
110          }
111          s.println("input ");
112          //Print Column Headers
113          for( int i = 1; i <= colCount; i++ )
114          {
115            s.println(rsmd.getColumnName( i ));
116            if (i != colCount){
117              p.print(rsmd.getColumnName( i ) + "~" );
118            }
119            else {
120              p.print(rsmd.getColumnName( i ) );
121            }
122          }
123          p.println();
124          int cc = 0;
125          String val = " ";
126          /* Print the result set */
127          while(rs.next( ) && (cc < limit)) {
128            cc = cc + 1;
129            for( int i = 1; i <= colCount; i++ )
130            {
131              varname = rsmd.getColumnName( i );
132              if (i != colCount){
133                val = rs.getString( varname );
134                p.print( val + "~" );
135              }
136              else {
137                val = rs.getString( varname );
138                p.print( val );
139              }
140            } /* for */
141            p.println();
142          } /* while */
143          s.println("; run;");
144          s.close();
145          // Close the csv file
146          p.close();
147          // Make sure our database resources are released
148          rs.close( );
149          stmt.close( );
150          con.close( );
151        }
152        catch ( IOException ex ) {
153          ex.printStackTrace();
154        }
155      }
156      catch (SQLException se) {
157        // Inform user of any SQL errors
158        System.out.println("SQL Exception: " + se.getMessage( ));
159        se.printStackTrace(System.out);
160      }
161    }
162  }

Figure 14. Java Code with JDBC Call

Let's see what this Java code is doing:

Line Numbers   Purpose
1-2            Include the Java classes that are used for I/O and SQL
3              Define the class name "process_sql" (note: the filename should be process_sql.java)
4              Entry point
5-19           Variable initialization
21-24          Read the arguments passed to the Java program
25-62          Read the attributes file created in the SAS macro
66-73          Load the JDBC driver
74-78          Build the connect string and open the connection
79-84          Set up the output file (tilde-delimited results file)
85-90          Set up the SAS import file
91             Create a statement object from the connection
92-93          Create a result set from your query
94             Create a meta data set that stores the attributes of your result set
95             Get the number of columns returned in the result set
97-111         Begin writing the SAS import file (s.print => SAS file)
112-123        Print a tilde-separated list of the column names to the output file (and the input statement to the SAS file)
127-142        Print the data lines returned in the result set to the output file (separated by tildes)
143-146        Close the file buffers
147-150        Close anything pointing to the database
151-160        Error trapping code

This code creates a SAS file from the result set meta data that is %included in the calling macro. This eliminates the need for a PROC IMPORT call and lets the SAS informats and formats follow the column attributes reported by the DBMS.

Figure 15. Explanation of Java Code with JDBC Call

Example 4 – Setting up a Data Mining Environment

The JSR (Java Specification Request) 73 established the JDMAPI (Java Data Mining API). Now, the JEG (Java Expert Group) has grown and is working on JSR 247 for JDMAPI 2. This expert group is made up of players from the major Data Mining houses including Oracle, Sun Microsystems, SAS Institute, etc. (see http://jcp.org/en/jsr/detail?id=247 ) and uses its expertise to add functionality to the javax.datamining package.
For this example I will use WEKA, which is a well known, open source collection of data mining Java classes. Using this package will minimize development time and ensure reliability.

Here are the tasks to get the Data Mining Environment up and running:

1. Download the WEKA software ( http://www.cs.waikato.ac.nz/ml/weka/ )
2. Run some straightforward command-line tests to confirm that the environment is configured correctly
3. Develop a SAS Data Mining Macro Library

WEKA and other data mining products use Attribute-Relation File Format (arff) files for data input. One of the macros in your Data Mining Macro Library will be a routine to convert SAS datasets to .arff files. Other macros in the library will be used to call WEKA classes and save the results back to SAS.

The ARFF layout is easy to understand. The file consists of two parts, with the top portion defining the variables and their attributes, and the bottom portion containing the data in csv format.

  % This is a toy example, the UCI weather dataset.
  % Any relation to real weather is purely coincidental.

Comment lines at the beginning of the dataset should give an indication of its source, context and meaning.

  @relation golfWeatherMichigan_1988/02/10_14days

Here we state the internal name of the dataset. Try to be as comprehensive as possible.

  @attribute outlook {sunny, overcast, rainy}
  @attribute windy {TRUE, FALSE}

Here we define two nominal attributes, outlook and windy. The former has three values: sunny, overcast and rainy; the latter two: TRUE and FALSE. Nominal values with special characters, commas or spaces are enclosed in 'single quotes'.

  @attribute temperature real
  @attribute humidity real

These lines define two numeric attributes. Instead of real, integer or numeric can also be used. While double floating point values are stored internally, only seven decimal digits are usually processed.

  @attribute play {yes, no}

The last attribute is the default target or class variable used for prediction. In our case it is a nominal attribute with two values, making this a binary classification problem.

  @data
  sunny,FALSE,85,85,no
  sunny,TRUE,80,90,no
  overcast,FALSE,83,86,yes
  rainy,FALSE,70,96,yes
  rainy,FALSE,68,80,yes

The rest of the dataset consists of the token @data, followed by comma-separated values for the attributes -- one line per example. In our case there are five examples.

Figure 16. Example of .arff from wekadoc

The arff producing macro should have a format similar to:

%ds2arff(ds=test, response=life_cov_amt, outfile=/tmp/test.arff );

(note: ds = input dataset, response = response variable, outfile = output file)

Figure 17. Example of .arff macro call

The arff macro is short:

%macro ds2arff(ds=?, response=?, outfile=? );

  /** Create Name Vector **/
  proc contents data = &ds noprint out=names(keep = name);
  run;

  /** Pull Name Vector (except response variable) into Macro Variable **/
  proc sql noprint;
    select name into :vlist separated by ' '
    from names
    where (name ne "&response");
  quit;
  run;

  /** Add response variable to end of macro variable list **/
  %let vlist = &vlist &response ;

  /** Create CSV file from input dataset **/
  ods csv file = "/tmp/xx.csv";
  proc print data = &ds noobs ;
    var &vlist ;
  run;
  ods csv close;
  run;

  /** Run WEKA Routine to create arff from csv **/
  x "java -cp /miswork/workarea/znavird/data_mining weka.core.converters.CSVLoader /tmp/xx.csv>/tmp/xx.arff";
  run;

  /** Change any SAS missing values (dots) to question marks **/
  x "sed 's/[.],/\?,/g' /tmp/xx.arff>&outfile.";
%mend ds2arff;

Figure 18. ARFF Macro

Using the WEKA product requires no additional Java programming. You simply call the appropriate Java class and redirect its output (with a few enhancements) to the SAS listing file.
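The missing-value step in the ARFF macro deserves a closer look. The sed expression only matches a dot that is followed by a comma, so a missing value in the last column (the response variable) is left as a dot. As a hedged alternative, the sketch below does the same replacement field by field in Java. It assumes plain comma-separated data lines with no quoted commas, treats a blank field as missing as well, and is meant only for lines after @data. The class name ArffMissingSketch and the method toArffDataLine are hypothetical.

```java
import java.util.StringJoiner;

// Hypothetical helper: rewrites SAS missing values in one ARFF data line.
class ArffMissingSketch {

    /** Replaces fields that are exactly "." or blank with the ARFF missing marker "?". */
    public static String toArffDataLine(String csvLine) {
        // Limit -1 keeps trailing empty fields, so a blank last column is not dropped.
        String[] fields = csvLine.split(",", -1);
        StringJoiner out = new StringJoiner(",");
        for (String field : fields) {
            String value = field.trim();
            boolean missing = value.equals(".") || value.isEmpty();
            out.add(missing ? "?" : value);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toArffDataLine("sunny,FALSE,.,85,no"));  // prints sunny,FALSE,?,85,no
        System.out.println(toArffDataLine("rainy,TRUE,70.5,."));    // prints rainy,TRUE,70.5,?
    }
}
```

Because each field is compared as a whole, decimal values such as 70.5 are left untouched.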
The following macro shows how to call the WEKA J48 classifier and route its output to the SAS listing:

%macro ranuni;
  %global ranuni;
  %let ranuni = %sysfunc( ranuni(0) );
%mend;

%macro j48(arff=?);
  proc optsave out = curroptions;                    %*** Save User Options ***;
  options nomprint nomlogic nosymbolgen nonotes;     %*** Run Silently ***;
  %ranuni;

  %*** Call Java Process ***;
  x "java -cp /miswork/workarea/znavird/data_mining weka.classifiers.trees.J48 -t &arff -i > /tmp/j48.output&ranuni";

  data _null_;                                       %*** Write Results and Notes to lst ***;
    length str $100. ;
    infile "/tmp/j48.output&ranuni" lrecl=100 pad end=last;
    file print ;
    input @1 str $100. ;
    if _n_ = 1 then do;
      put "--------------------------------------------------------------------------------------------";
      put "                                 J48 Classification Tree ";
      put "                              Implemented by WEKA Software ";
      put "--------------------------------------------------------------------------------------------";
    end;
    put str $100. ;
    if last then do;
      put "--------------------------------------------------------------------------------------------";
      put "                                    J48 Definitions ";
      put " TP (True Positive)   Proportion of examples which were classified as class x, among all";
      put "                      examples which truly have class x ";
      put " FP (False Positive)  Proportion of examples which were classified as class x, but belong ";
      put "                      to a different class";
      put " Precision            Proportion of the examples which truly have class x among all those ";
      put "                      which were classified as class x";
      put " Recall               = TP ";
      put " F-Measure            2 * Precision * Recall/(Precision + Recall)";
      put ;
      put "Refer to Data Mining: Practical Machine Learning Tools and Techniques";
      put "By: Ian H. Witten and Eibe Frank for additional information ";
      put ;
      put "--------------------------------------------------------------------------------------------";
    end;
  run;

  x "/usr/bin/rm -f /tmp/j48.output&ranuni.";        %*** Clean-Up Work File ***;
  proc optload data = curroptions;                   %*** Restore User Options ***;
  run;
%mend j48;

Figure 19. J48 Macro Call

SAS System                                          09:30 Friday, June 13, 2008   1
--------------------------------------------------------------------------------------------
                                 J48 Classification Tree
                              Implemented by WEKA Software
--------------------------------------------------------------------------------------------
J48 pruned tree
------------------
outlook = overcast: yes (3.0)
outlook = rainy: yes (6.0)
outlook = sunny: no (6.0)

Number of Leaves  : 3
Size of the tree  : 4

Time taken to build model: 0.07 seconds
Time taken to test model on training data: 0.02 seconds

=== Error on training data ===

Correctly Classified Instances        15      100 %
Incorrectly Classified Instances       0        0 %
Kappa statistic                        1
Mean absolute error                    0
Root mean squared error                0
Relative absolute error                0 %
Root relative squared error            0 %
Total Number of Instances             15

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
   1         0          1          1         1           1       no
   1         0          1          1         1           1       yes

=== Confusion Matrix ===

 a b   <-- classified as
 6 0 | a = no
 0 9 | b = yes

Figure 20. Partial Output from SAS Listing

Summary

We have worked through four examples that demonstrate how Java can be used to extend the capabilities of SAS. The examples have a consistent structure, with SAS handling the execution of Java classes through an "X" command. In the first example, the execution chain ends with the creation of the Odometer Graph. The second and third examples run the Java execution, then read the results into the SAS execution stream. The final example runs a Java class and returns the output to the SAS Listing file. Any of these examples could be used as part of a web application by calling the initial SAS program through either a Servlet or a Scriptlet in a JSP.
Here are a few sources for excellent, free Java libraries on the web:

JFreeChart (graphics library)     http://www.jfree.org/jfreechart/download.html
POI (reading Microsoft files)     http://poi.apache.org/
FOP (print formatter)             http://xmlgraphics.apache.org/fop/
Common reuse libraries            http://commons.apache.org/
Java math tools                   http://www.mathtools.net
Data mining                       http://www.cs.waikato.ac.nz/~ml/weka/
                                  http://jcp.org/en/jsr/detail?id=247