Where are my Viya files?

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

When working with files like SAS programs, images, documents, logs, etc., we are used to accessing them in operating system directories. In Viya, many of these files are not stored on the file-system. In this blog, we will look at where and how files are stored in Viya, and how to manage them.

In Viya, external files are stored in the Infrastructure Data server and accessed using the file micro-service. The file service manages a wide variety of file types including, images, html files, text files, csv files, SAS programs, media files, pdfs, office documents and logs. The file service is not a complete file system, but rather a method of accessing individual files stored in the infrastructure data server identified via their URI (a unique identifier).

Some files may also be accessible in Viya folders. The file service stores resources like images, SAS programs, etc. in the Infrastructure Data server, but they are accessible from Viya folders in the Content Area of Environment Manager or in SAS Drive.

The screenshot below shows the properties of an image file that was uploaded to a Viya folder using SAS Drive. You can see the filename, the URI (including the ID) and the location in the folder structure.

For files that are surfaced in folders, we can manage the files (view, copy, move, delete etc.) using SAS Environment Manager Content area, SAS Drive or the Folders command line interface. To learn more, watch this video from the SAS Technical Insights and Expertise Series.

However, many system generated files are managed by the file service and stored in the infrastructure data server, but are not accessible from the folders interfaces. A good example of system generated files that an administrator may need to manage are logs.

Many times when processing in Viya, logs are created and stored in the infrastructure data server. For example, when CAS table state management jobs, data plans, file imports, model or scoring processes are executed, a log is generated and stored by the file service.
How do we manage these files? While the files generated are mostly small in size, a large active Viya system with a long history will need management of the log and other files stored in the infrastructure data server.

One way we can access these files is using the REST API of the file service.

You can use your favorite tool to access the REST API. In this blog, I will use the GEL pyviyatools, initially the basic callrestapi tool. For more details on the pyviyatools, read this blog or go directly to the SAS GitHub site.

To get a list of all files are stored in the infrastructure data server send a GET request to the /files/files endpoint of the files service.

/./callrestapi.py -m get -e /files/files -o simple/

Partial output shows the image file referenced above and what could be a system generated log file.
=====Item 0 =======
contentType = image/jpeg
createdBy = geladm
creationTimeStamp = 2019-01-24T14:40:15.085Z
description = None
encoding = UTF-8
id = 6d2995b3-0fa6-4090-a338-1fbfe51fb26b
modifiedBy = geladm
modifiedTimeStamp = 2019-01-25T17:59:28.500Z
name = company_logo.JPG
properties = {}
size = 327848
version = 2

=====Item 164 =======
contentType = text/plain
createdBy = sasadm
creationTimeStamp = 2019-01-31T12:35:00.902Z
description = None
encoding = UTF-8
id = e4d4c83d-8677-433a-a8d3-b03bf00a5768
modifiedBy = sasadm
modifiedTimeStamp = 2019-01-31T12:35:09.342Z
name = 2019-01-31T12:35.823Z-results.log
parentUri = /jobExecution/jobs/380d3a0c-1c31-4ada-bf8d-4db66e786669
properties = {}
size = 2377
version = 2

To see the content of a file, use a GET request, the id of the file and the content endpoint. The content of the log file is displayed in the call below.

/./callrestapi.py -m get -e /files/files/01eb020f-468a-49df-a05a-34c6f834bfb6/content/

This will display the file contents. The output shows the log from a CAS table state management job.

———-JOB INFORMATION———-

Job Created: 2019-01-31T12:35.823Z
Job ID: 380d3a0c-1c31-4ada-bf8d-4db66e786669
Heartbeat interval: PT0.3S
Job expires after: PT168H
Running as: sasadm
Log file: /files/files/e4d4c83d-8677-433a-a8d3-b03bf00a5768 [ 2019-01-31T12:35.823Z-results.log ]
Arguments:

Options:
{
“enabled” : true,
“type” : “LOAD”,
“settings” : {
“refresh” : false,
“refreshMode” : “newer”,
“refreshAccessThreshold” : 0,
“varChars” : false,
“getNames” : true,
“allowTruncation” : true,
“charMultiplier” : 2,
“stripBlanks” : false,
“guessRows” : 200,
“scope” : “global”,
“encoding” : “utf-8”,
“delimiter” : “,”,
“successJobId” : “”
},
“selectors” : [ {
“serverName” : “cas-shared-default”,
“inputCaslib” : “hrdl”,
“outputCaslib” : “hrdl”,
“filter” : “or(endsWith(tableReference.sourceTableName,’.sashdat’), endsWith(tableReference.sourceTableName,’.SASHDAT’),\nendsWith(tableReference.sourceTableName,’.sas7bdat’),\nendsWith(tableReference.sourceTableName,’.csv’)\n)”,
“settings” : { }
} ]
}

————————————————–

Created session cas-shared-default: d4d1235a-6219-4649-940c-e67526be31ed (CAS Manager:Thu Jan 31 07:35:01 2019)
Using session d4d1235a-6219-4649-940c-e67526be31ed
Server: cas-shared-default
Input caslib: hrdl
Output caslib: hrdl
Effective Settings:

———-LOAD STARTING———-

— loaded –> unloaded – HR_SUMMARY unloaded – HR_SUMMARY_NEW unloaded – HRDATA unloaded – PERFORMANCE_LOOKUP <– performance_lookup.sas7bdat
Access denied.
———-LOAD COMPLETE———-

———-CLEANUP STARTING———-

Session deleted: cas-shared-default: d4d1235a-6219-4649-940c-e67526be31ed
———-CLEANUP COMPLETE———-

Final Job State: completed
Log file:/files/files/e4d4c83d-8677-433a-a8d3-b03bf00a5768

I started my journey into the file service to discover how an administrator could manage the log files. In reviewing the files and their contents using the REST API, I discovered that there was no easy way to uniquely identify a log file. The table below shows the attributes of some of log files from the GEL Shared Viya environment.

As this post has illustrated, file service REST API can be used to list and view the files. It can also be used for other file management activities. To help administrators manage files that are not visible via a user interface, a couple of new tools have been added to the pyiyatools.

listfiles.py provides an easy interface to query what files are currently stored in the infrastructure data server. You can sort files by size or modified date, and query based on date modified, user who last modified the file, parentUri or filename. The output provides the size of each file so that you can check the space being used to store files. Use this tool to view files managed by the file service and stored in the infrastructure data server. You can use lisfiles.py -h to see the parameters.

For example, if I want to see all potential log files older than 6 days old created by the /jobexecution service, I would use:

/./listfiles.py -n log -p /jobExecution -d 6 -o csv/

The output is a list of files in csv format:

id ,name ,contentType ,documentType ,createdBy ,modifiedTimeStamp ,size ,parentUri

“f9b11468-4417-4944-8619-e9ea9cd3fab8″,”2019-01-25T13:35.904Z-results.log”,”text/plain”,”None”,”sasadm”,”2019-01-25T13:35:09.490Z”,”2459″,”/jobExecution/jobs/37536453-fe2f-41c2-ba63-ce72737e482c”

“dffdcc97-3fb0-47c1-a47d-e6c8f24f8077″,”2019-01-25T12:35.869Z-results.log”,”text/plain”,”None”,”sasadm”,”2019-01-25T12:35:08.739Z”,”2459″,”/jobExecution/jobs/99708443-2449-40b7-acc3-c313c5dbca23″

“5fa889b6-93fb-4496-98ba-e0c055ca5999″,”2019-01-25T11:35.675Z-results.log”,”text/plain”,”None”,”sasadm”,”2019-01-25T11:35:09.118Z”,”2459″,”/jobExecution/jobs/eb182f88-4853-41f4-be24-225176991e8a”

“87988659-3c2d-4602-b61a-8042b34022ac”,”2019-01-25T10:35.657Z-results.log”,”text/plain”,”None”,”sasadm”,”2019-01-25T10:35:09.881Z”,”2459″,”/jobExecution/jobs/73fffe47-a7ef-4c1d-b7bf-83ef0b86319e”

archivefiles.py allows you to read files from the file service and save them to a directory on the file system. Optionally, the tool will also delete files from the file service to free up space. For example, if I want to archive all the files I listed above, I would use:

/./archivefiles.py -n log -d 6 -p /job -fp /tmp/

This tool will create a timestamp directory under /tmp and save a copy of each file in the directory.

If you want to archive and delete, add the -x option.

IMPORTANT: Use the archive tool carefully. We recommend that you run a Viya Backup prior to running the tool to delete files.

Now you know where your files are and you have some help with managing them. For the full details of how you can manage files using the file service REST API you can view the file service REST API documentation on developer.sas.com. If you would like to suggest any changes to the existing tools, please enter a suggestion on GitHub.

Where are my Viya files? was published on SAS Users.

This post was kindly contributed by SAS Users - go there to comment and to read the full post.